The Semantic Web and Healthcare ICT.

March 27, 2013
  • Imagine a medical information system that anonymously publishes combinations of data-sets, and of the actions taken on those data-sets, to a cloud service.
  • Imagine that service being used by thousands of medical professionals.
  • Imagine that service ranking the incoming data-set combinations by how often they occur.
  • Imagine that many professionals act the same way on specific data combinations; this makes the information more trustworthy.
  • Imagine a medical information system showing these combined, ranked, trustworthy data-sets to medical professionals whenever a recognizable combination of data occurs in their system.

I think this is more feasible in an OpenEHR environment, because the recurring problem is recognizing similar situations, which is needed in order to rank for trustworthiness. In an OpenEHR environment, the archetype-ids and paths can help to recognize similar situations and solutions.
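To make the ranking idea a bit more concrete, here is a minimal sketch. It assumes that a "situation" can be reduced to a set of (archetype-id, path) pairs and that every anonymous submission pairs such a set with the action that was taken. The identifiers, paths and action names are invented for the illustration, and a real service would need far more than a counter.

```python
from collections import Counter

# Hypothetical identifiers, invented for this illustration only.
BP = ("example-EHR-OBSERVATION.blood_pressure.v1", "/data/systolic")
PULSE = ("example-EHR-OBSERVATION.pulse.v1", "/data/rate")

# Each anonymous submission: (situation, action taken on it).
submissions = [
    (frozenset({BP, PULSE}), "action_a"),
    (frozenset({BP, PULSE}), "action_b"),
    (frozenset({BP, PULSE}), "action_a"),
    (frozenset({BP}), "action_c"),
]

def rank_actions(submissions, situation):
    """Return (action, count) pairs for one situation, most frequent first."""
    counts = Counter(action for keys, action in submissions if keys == situation)
    return counts.most_common()

# The order of the keys inside a situation does not matter.
ranking = rank_actions(submissions, frozenset({PULSE, BP}))
print(ranking)
```

Because the situation is keyed on archetype-ids and paths, two systems that use the same archetypes produce the same key without any further agreement.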


Archetypes, how do they work for us?

December 16, 2009

In my post “What is OpenEHR” you will find a short explanation of archetypes and of their role in an OpenEHR-system.

In short: archetypes are constraints on, and descriptions of, data-sets which are extracted from or stored in an OpenEHR-system. Unlike many other health-information-systems, an OpenEHR system does not have a rigid data-model with a lot of domain-knowledge built in.


First, let us see how legacy systems (from the early eighties onwards) stored their data and how they were structured. Systems in those days, and many systems today, are rigidly modeled in a database-structure and, one level higher, in a class structure, and the two mostly do not map one-to-one onto each other. This is because the classes stay close to the domain-model of the information they contain, while the database-structure is mostly normalized (following Codd’s methodology). There are some in-between steps/mappings between the domain-classes and the database-storage. Even though this can be done using persistence frameworks like “hibernate”, a system built in such a way remains hard to maintain and error-prone. In addition, it needs a lot of maintenance in the domain-layers, because the domain-knowledge is directly reflected in the software.

Why did we build systems that needed a lot of maintenance and at the same time were very hard to maintain?

The reasons are simple. Computing power, network-bandwidth and storage were very expensive in the eighties and nineties. Also, there is a school of programmers who are well experienced in designing software like this. It is a cultural situation.

Archetyped System

As we can see, the source of the main problem in legacy systems (which caused the other problems) is that domain-knowledge is integrated in the system. A programmer must be able to understand the domain-requirements.

The answer is that the domain-knowledge must be separated from the software. If this is the case, requirement-changes are not reflected in the software and no software maintenance is needed for them.

We can also take into account that computers and networks are so much faster, and storage is so much cheaper: thousands or even millions of times cheaper and faster. If we aim for the same response-time in GUIs as the machines of twenty years ago offered, so much more is possible. It is astonishing that many software-designers do not realize that.

The archetyped systems take advantage of new concepts. Data fed into the system are not described in the system, but in archetypes.

The system itself is a simple storage machine based on a reference-model which, in the case of OpenEHR, is optimized for handling health-related data. I will describe the reference model in a later post on this blog. For now, it is enough to understand that it has all the capabilities needed to work with every possible construction of medical data in a very generic way. The reference model is very stable and mature. It has existed for several years, and it is developed by a team of experts and by a community of ICT-in-Health specialists. Typically, it has not changed for years, apart from some small changes. The reference-model is suitable for use by GPs, specialists, dentists, veterinarians, etc.

How do we store complex data-sets in this rather generic reference model?

We use archetypes to describe the data-sets. Archetypes are by definition reflections of the generic reference model. They enrich the data by defining terminologies and constraints. There are constraints on the data, but also on the structure of class-instances. Archetypes are in fact class-structures in the form of a tree, with the primitive data on the leaves. When data-sets are stored, the data are validated against the constraints and terms in the archetypes.
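The validation step can be sketched roughly as follows. This is not how an OpenEHR kernel is implemented; it only assumes that an archetype can be reduced to a tree whose inner nodes list their allowed children and whose leaves carry a type, an optional range and an optional set of allowed terms. All names and limits are made up for the example.

```python
# Hypothetical, heavily simplified "archetype": a tree with constraints on the leaves.
BLOOD_PRESSURE = {"children": {
    "reading": {"children": {
        "systolic": {"type": float, "range": (0.0, 1000.0)},
        "diastolic": {"type": float, "range": (0.0, 1000.0)},
    }},
    "position": {"type": str, "allowed": {"sitting", "standing", "lying"}},
}}

def validate(data, node, path=""):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    if "children" in node:  # structural constraint
        if not isinstance(data, dict):
            return [f"{path or '/'}: expected a structure"]
        for name, child in node["children"].items():
            if name not in data:
                errors.append(f"{path}/{name}: missing")
            else:
                errors.extend(validate(data[name], child, f"{path}/{name}"))
        for name in data:
            if name not in node["children"]:
                errors.append(f"{path}/{name}: not allowed by the archetype")
        return errors
    # leaf: constraint on a primitive value
    if not isinstance(data, node["type"]):
        return [f"{path}: expected {node['type'].__name__}"]
    if "range" in node:
        low, high = node["range"]
        if not (low <= data <= high):
            errors.append(f"{path}: {data} outside {low}..{high}")
    if "allowed" in node and data not in node["allowed"]:
        errors.append(f"{path}: {data!r} is not an allowed term")
    return errors

good = {"reading": {"systolic": 120.0, "diastolic": 80.0}, "position": "sitting"}
bad = {"reading": {"systolic": 1200.0, "diastolic": 80.0}, "position": "flying"}
print(validate(good, BLOOD_PRESSURE))
print(validate(bad, BLOOD_PRESSURE))
```

The point of the sketch is that the validating code knows nothing about blood pressure; everything domain-specific sits in the data structure that plays the role of the archetype.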

Inside the archetype paths are defined, and every node has an id, called the node-id. The archetype itself also has a name, called the archetype-id. The archetype-id should be unique around the world. This is achieved by using the company/hospital/university/organization name in the id, together with the purpose of the archetype and a version identifier. Because of the version identifier, archetypes can be improved without throwing away the older ones. As a result, every node in every archetype has a path that is unique worldwide.

Archetypes are a kind of script, partly written in a language called ADL (Archetype Definition Language). Archetypes reflect the domain-knowledge. If archetypes were easily readable by humans, they could be edited or created by the holders of the domain-knowledge: the GPs, specialists, etc. Although archetypes are written in a script-language, it is not easy for a non-technical person to read and understand them. But there is tooling, archetype-editors, which opens up the richness of archetypes to non-technical holders of the domain-knowledge. In effect, no technician is needed to create a new data-structure as a result of a domain-related requirement-change.
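As an illustration of how an archetype-id and a path combine into a worldwide-unique address, here is a small sketch. It assumes ids of the shape originator-model-class.concept.version, as seen in published archetypes such as openEHR-EHR-OBSERVATION.blood_pressure.v1. The regular expression is my own approximation and not the normative grammar, and the path used is only an example.

```python
import re
from typing import NamedTuple

class ArchetypeId(NamedTuple):
    originator: str   # organization that published the archetype
    rm_name: str      # reference-model name
    rm_class: str     # reference-model class being constrained
    concept: str      # purpose of the archetype
    version: str      # e.g. "v1"

# Approximate pattern, an assumption made for this sketch.
_PATTERN = re.compile(
    r"^([A-Za-z0-9_]+)-([A-Za-z0-9_]+)-([A-Za-z0-9_]+)\.([A-Za-z0-9_-]+)\.(v\d+)$"
)

def parse_archetype_id(text):
    """Split an archetype-id into its parts, or raise ValueError."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognizable archetype-id: {text!r}")
    return ArchetypeId(*match.groups())

def global_path(archetype_id, path):
    """Prefix a path inside an archetype with the archetype-id."""
    parse_archetype_id(archetype_id)  # reject malformed ids early
    return f"{archetype_id}{path}"

parsed = parse_archetype_id("openEHR-EHR-OBSERVATION.blood_pressure.v1")
print(parsed.originator, parsed.concept, parsed.version)
print(global_path("openEHR-EHR-OBSERVATION.blood_pressure.v1", "/data[at0001]"))
```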

Apart from creating your own archetypes, you can draw on large repositories of archetypes created by universities, health-related organizations like the NHS, standardization-bodies like NEN, etc. Many of these archetypes can be used for free.

Archetypes, or parts of archetypes, can be grouped in templates. Templates are a way of representing or collecting data. They are the GUI-related interface to archetypes. I will post an article about templates later as well.

Archetype principles

Archetypes are typically defined by the domain-specialists.

Below is a small list of principles which describe the archetype-concept:

  • Archetypes should define distinct domain-level models of content.
  • Archetypes define constraints on the structures of instances of the reference-model classes which they represent. Archetypes also define constraints on the primitive values inside the reference-model classes.
  • Inside archetypes there can be slots for (connection-points to) other archetypes. This is especially the case in the Composition-archetypes, which are based on the Composition-class in the reference model.
  • Archetypes are specializations of reference-model classes. Archetypes can also be specializations of other archetypes.
  • All nodes inside an archetype have a “path” and a unique “node-identifier”. Every path to an end-node (leaf-node) is unique. This makes it possible to define queries, including complicated ones. The query language, called AQL (Archetype Query Language), is in some aspects similar to XPath/XQuery.
  • Archetypes are neutral with respect to languages and terminologies.
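The role of paths and node-identifiers in querying can be illustrated with a small sketch. This is not AQL and not an OpenEHR API; it only assumes a tree in which every node has a node-id and named attributes, and it resolves a path of the form /attribute[node-id]/... to exactly one node. The tree content and the node-ids are invented.

```python
import re

# Hypothetical data tree; the "atNNNN" codes imitate the style of node-ids.
TREE = {
    "node_id": "at0000",
    "attributes": {
        "data": [
            {"node_id": "at0001", "attributes": {
                "items": [
                    {"node_id": "at0004", "attributes": {}, "value": 120},
                    {"node_id": "at0005", "attributes": {}, "value": 80},
                ],
            }},
        ],
    },
}

_SEGMENT = re.compile(r"^(\w+)\[(\w+)\]$")

def resolve(node, path):
    """Follow a path such as /data[at0001]/items[at0004] down the tree."""
    for segment in path.strip("/").split("/"):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"bad path segment: {segment!r}")
        attribute, node_id = match.groups()
        candidates = node["attributes"].get(attribute, [])
        matches = [c for c in candidates if c["node_id"] == node_id]
        if len(matches) != 1:  # uniqueness of node-ids makes this unambiguous
            raise KeyError(segment)
        node = matches[0]
    return node

print(resolve(TREE, "/data[at0001]/items[at0004]")["value"])
```

Because each segment names both the attribute and the node-id, a path can never point at two nodes at once, which is what makes it usable as a query address.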

More information on archetypes-principles can be found on the OpenEHR website in the document archetype-principles.

Archetype Object Model

The archetype object model describes the construction of an archetype, not the data to which an archetype refers (that is the reference model).

An archetype object-instance represents an archetype. It has three main parts: the description-part, the definition-part and the ontology-part. A complete specification is found in the document: Archetype Object Model.

The description-part

This part has all the informational items and meta-information of the archetype. It can contain information about the archetype: what it is about, who wrote/designed it, who owns it, license-information and version-information. The languages used are also mentioned.

The definition-part

The definition part is the actual representation of an (archetype-able) reference-model class. When the archetype is parsed by the OpenEHR-kernel, the definition is an object of type CComplexObject. A CComplexObject is the representation of a constraint on a non-primitive reference-model class. As said, the definition-part has a tree structure. A CComplexObject can contain any number of other CComplexObjects, and also CPrimitiveObjects, each of which represents a primitive object in the reference model. Primitive objects are classes representing primitives which have, in the reference model, some extra information. For example, a Text-value in the OpenEHR reference model contains, apart from the text, also information about the code-page.

Apart from directly described instances of reference-model classes inside the constraint-objects, it is also possible to use paths to items in other archetypes, paths to items in the same archetype, or even complete other archetypes. For example, an archetype describing a Person can have a link to an archetype describing an Address.
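The tree of constraint-objects can be pictured with a few lines of code. The class names below are borrowed from the text, but the fields are a simplification of my own: the real Archetype Object Model has more classes and more properties, so read this as a sketch of the shape only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

@dataclass
class CPrimitiveObject:
    rm_type_name: str  # hypothetical field: name of the primitive type

@dataclass
class CComplexObject:
    rm_type_name: str  # reference-model class being constrained
    node_id: str
    # attribute name -> constraint-objects allowed under that attribute
    attributes: Dict[str, List[Union["CComplexObject", CPrimitiveObject]]] = field(
        default_factory=dict
    )

def leaf_paths(node, prefix=""):
    """List the paths that end in a primitive constraint."""
    paths = []
    for name, children in node.attributes.items():
        for child in children:
            if isinstance(child, CComplexObject):
                paths.extend(leaf_paths(child, f"{prefix}/{name}[{child.node_id}]"))
            else:
                paths.append(f"{prefix}/{name}")
    return paths

# Invented example content.
definition = CComplexObject("OBSERVATION", "at0000", {
    "data": [
        CComplexObject("ELEMENT", "at0004", {"value": [CPrimitiveObject("Real")]}),
        CComplexObject("ELEMENT", "at0005", {"value": [CPrimitiveObject("Real")]}),
    ],
})
print(leaf_paths(definition))
```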

The ontology-part

There are no linguistic parts in the definition-part. The linguistic items are defined in the ontology-part, in such a way as to allow them to be translated into other languages in blocks. As described in the openEHR ADL document, there are four major parts in an archetype ontology: term definitions, constraint definitions, term bindings and constraint bindings. The two definition sections define the meanings of the various terms and textual constraints which occur in the archetype; they are indexed with unique identifiers which are used within the archetype definition body. The two binding sections describe the mappings of terms used internally to external terminologies.
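A rough picture of how the ontology-part keeps language out of the definition: the definition only carries codes, and the text is looked up per language. The structure, the codes, the translations and the terminology name below are all invented for the illustration, and only two of the four sections are shown.

```python
# Hypothetical ontology content, reduced to term definitions and term bindings.
ONTOLOGY = {
    "term_definitions": {
        "en": {"at0004": "Systolic", "at0005": "Diastolic"},
        "nl": {"at0004": "Systolisch"},
    },
    "term_bindings": {
        "EXAMPLE_TERMINOLOGY": {"at0004": "code-123"},
    },
}

def term_text(ontology, code, language, fallback="en"):
    """Return the text for a code, falling back to another language."""
    definitions = ontology["term_definitions"]
    for lang in (language, fallback):
        text = definitions.get(lang, {}).get(code)
        if text is not None:
            return text
    raise KeyError(code)

def external_code(ontology, terminology, code):
    """Return the bound external code, or None when there is no binding."""
    return ontology["term_bindings"].get(terminology, {}).get(code)

print(term_text(ONTOLOGY, "at0004", "nl"))
print(term_text(ONTOLOGY, "at0005", "nl"))  # no Dutch text, falls back to English
print(external_code(ONTOLOGY, "EXAMPLE_TERMINOLOGY", "at0004"))
```

Adding a language then means adding one block of texts, without touching the definition-part.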

Interchange of data

Because data are very precisely described and constrained in the archetypes, it is very easy for another OpenEHR-system to import data. Everything an importing system needs to know is in the accompanying archetypes.

Apart from this functionality, it is also possible to map data to any other messaging-system (like HL7). This is done in the same way as any other system has to do it, so here OpenEHR offers neither much advantage nor much disadvantage.


Archetype-based systems separate domain-knowledge from software-structures. They are very stable and hardly need any technical maintenance, except for GUIs. Maintenance of archetypes can be done by domain-knowledge-holders, like medical doctors.

CEN13606 and HL7 in the Netherlands, controversy or complementary?

November 21, 2009

In the context of Nictiz (the Dutch governmental ICT in Health organisation), HL7 is used as a message standard.

None of the Dutch information-systems that need to connect to the Dutch Health-network is modeled according to this HL7-standard. So they need to “translate” their data to or from an HL7 version 3 message.

One Dutch HL7 message (called PriCa = Primary Care) is derived from the previous standard serving the same purpose (Medeur, an Edifact message). That is why it is very similar, except for its form. Roughly, its content is:

  • Patient: name/address
  • Complaint
  • GP/nurse/specialist: name/address
  • Treatment: medication, etc.

The purpose is to exchange this kind of information.

Already in 2001, the Netherlands chose to support only this HL7 standard as the message standard. The advantage is that all system builders know what to do, and can create a unified messaging model.

Health information systems on the Dutch market come in all sorts: simple or complicated, with different information levels, using different coding standards. A new kind of business model is also emerging which will require information-exchange too: Microsoft HealthVault and Google Health. They need to conform to this HL7-messaging. Tooling needs to be created to support this. A good opportunity for a third party?

CEN13606-systems, or OpenEHR systems, will come to market. These systems will all need to communicate on the Dutch ICT/Health backbone using HL7 messages. There is a possible controversy here, because OpenEHR has its own message standard, and OpenEHR-implementors would like to use theirs. But apart from OpenEHR there is no system using this messaging extract, and thus an OpenEHR system on the Dutch market also needs to be able to use HL7-messaging for information-exchange with other systems. So there should be no controversy between health-systems; instead, we should look for what is complementary.

When systems communicate using this HL7 message-format, they are restricted to what this message is able to contain, which is rather basic, but good enough in 99% of cases. Perfect is impossible, because there will always be some information-loss when exchanging information. Information exchange in health-care will never be perfect. We need to work on that, but we must also see the limitations.
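The information-loss can be made visible with a tiny sketch. It does not build a real HL7 version 3 message; a plain dictionary stands in for the message, and the four field names are simply taken from the rough content list given earlier in this post. The record and its extra field are invented.

```python
# Stand-in for what the message can carry (an assumption, not the HL7 schema).
MESSAGE_FIELDS = ("patient", "complaint", "provider", "treatment")

def to_message(record):
    """Copy what fits into the message and report what had to be left out."""
    message = {key: record[key] for key in MESSAGE_FIELDS if key in record}
    dropped = sorted(key for key in record if key not in MESSAGE_FIELDS)
    return message, dropped

# Hypothetical record from a richer source system.
record = {
    "patient": "name/address",
    "complaint": "headache",
    "provider": "GP name/address",
    "treatment": "medication",
    "measurement_context": "sitting, left arm",
}

message, dropped = to_message(record)
print(message)
print(dropped)
```

Reporting the dropped fields, instead of silently discarding them, at least lets the sending system know which limitations it is accepting.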

What is OpenEHR

November 18, 2009

OpenEHR is a concept describing a system for storing health-related data. There are many systems with a similar purpose, but OpenEHR is different.

I will try to explain this in a few words, and hope that this

  • will make you go to the OpenEHR-website for more information,
  • and will help you to get started when you begin reading the overwhelming amount of information on that website.

OpenEHR is a community-driven concept which has existed for several years now. It is supported by an international community. People, companies, universities and hospitals in Australia, the UK, Turkey, Brazil, the Netherlands, Sweden, China, Japan and other countries are involved.

The community publishes open-source projects in several programming languages, such as Java, Eiffel, C#, Python, Ruby, etc. The projects include tooling like archetype-editors, but also full kernel-implementations.

Apart from open-source implementations, there are also closed-source implementations. For example, the founders and most active supporters of the community run a company in Australia called Ocean Informatics. Their website explains very well what their business is.

The OpenEHR-concept can be used in compliance with the EN13606 standard which, like OpenEHR, is also based on archetypes.

The OpenEHR kernel

The kernel is the part of OpenEHR software which stores data, validates data and retrieves data.


Archetypes describe the data and their context, give validation rules and describe the coding and terminology used. They are used when storing data, and they help to retrieve the data and to understand them on retrieval.

How does it work?

There are repositories of archetypes in different countries. The NHS has published hundreds. They are focused on different aspects of health-care, like observations, treatments, medications and evaluations. The archetypes describe the data-structures, the terminology used and the data-constraints for validation. Archetypes represent the domain-knowledge. Archetypes are typically created by persons with knowledge of the domain, for example GPs, dentists and medical specialists, possibly supported by specialists in medical informatics. It is possible to use already created archetypes, but it is also possible to create one’s own archetypes. What is wise depends on the situation and the need for interoperability.

The kernel takes the data and the archetype used, validates the data and then stores the information. Inside the archetypes is a tree-structure with a system of paths along which the data are described. These paths are used to retrieve the information with a newly defined query-language called AQL (Archetype Query Language). Its concept is similar to that of XPath.

Archetypes are the way to model data which will be stored in the OpenEHR-system. This concept makes the system flexible: there is no need for database-changes, code-changes or other technical arrangements when the information-requirements change. Changes of terminology, constraints (validation-rules) and data-structures are all arranged in the archetypes. The kernel remembers everything and is always able to reproduce data in the same context as they were stored.

Archetypes are modeled according to a reference-model, which is in fact a class model inside the kernel. It has complex EHR-structures, like compositions, which are built from other classes/structures like observation and/or action, and many more. Towards the leaves of an archetype we find complex and generic data-structures like Item-Lists or Item-Tables. The leaves are the OpenEHR datatypes, like DvDateTime or Coded_Text or even DVMultimedia, which can store binaries like X-rays or recorded conversations.

The OpenEHR system is a complete system which can be used in virtually every medical profession. It is just a matter of having (or writing) the right archetypes.

I hope this helps you to get started in understanding the OpenEHR-concepts. Go to the OpenEHR-website for more information. Feel free to subscribe to the mailing lists; you will find a friendly and helpful community.