By Nick Crofts, Martin Doerr and Tony Gill - February 2003
Nick Crofts, Martin Doerr and Tony Gill report on CIDOC and its work on the Conceptual Reference Model, an aid to comprehension and dialogue.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CIDOC, the Comité International pour la Documentation, is one of more than twenty committees that form part of ICOM - the International Council of Museums. CIDOC's members are primarily museum professionals working in the field of cultural heritage information and technology. It organises an annual conference and encompasses a number of working groups. The Documentation Standards working group - formed originally from the fusion of the data modelling and terminology working groups - took the decision in 1996 to embark on the development of a detailed conceptual model of the domain of cultural heritage information, known as the Conceptual Reference Model (CRM). The CRM was intended initially to extend and finally to replace the existing CIDOC relational data model [1], and the initial scope of the CRM was restricted to that of the International Guidelines for Museum Object Information: The CIDOC Information Categories, published in June 1995 [2]. This document, edited by a joint team of the CIDOC Data and Terminology and the Data Model Working Groups, resulted from the consolidation of earlier initiatives which had been in gestation since 1980. The Guidelines thus represented the fruit of many years of collective effort and reflection concerning museum information and constituted an obvious starting point for the development phase of the CRM.
The first published version of the CRM was released in Melbourne in 1998. Although originally developed by the CIDOC Documentation Standards working group, the CRM is now maintained by the CRM Special Interest Group (CRM-SIG), organised by CIDOC but open to non-members. Version 3.2 of the CRM has been accepted by ISO TC 46/SC4 [3]. Currently at "Committee Draft" stage (ISO/CD 21127), the CRM will subsequently be published as an International Standard. This will transfer formal responsibility for the publication, review and maintenance of the CRM to ISO.
Development of the original CIDOC relational model had been prompted by the need to provide a common framework for the exchange of cultural heritage information. By 1995, this data model had arrived at the limits of manageable development. The number of entities was growing exponentially (the final version of the relational data model contained 430 entities), and the overall structure of the model was no longer apparent, even to many of the authors. Yet the model was still insufficiently developed in many areas. The publication of the International Guidelines for Museum Object Information: The CIDOC Information Categories [2], and the absence of a clearly defined mapping with the data model, only served to highlight the shortcomings of the approach. One of its key objectives - to provide a framework for understanding cultural heritage information - was clearly not being fulfilled.
The CIDOC data model had been developed on the assumption, widespread at the time, that common data schemas were required if data exchange were to be made possible [4]. However, this view was gradually being replaced by a new vision based on the possibility of mediation systems capable of managing data from heterogeneous sources. This new paradigm prompted a shift from seeing the CIDOC data model as a low-level blueprint for a database schema to seeing it as a high-level conceptual definition. Viewed in this new light, many conceptual shortcomings of existing relational models became apparent. The group realised that a new sort of model was required, one that would provide the common ground needed for the development of compatible information systems but which did not pre-define implementation issues.
The essence of this new approach can be expressed in terms of the distinction between information and data. Information, in this context, can be defined as the meaning that is common to different forms of expression. 'Je suis fatigué', 'I'm tired', and 'I need to rest' all say much the same thing in different ways - they carry much the same information. Data, on the other hand, are tokens - words, letters and symbols, bits and bytes, etc. - which have a particular meaning because of their rôle within a semiotic context. The original CIDOC data model was a data schema because it attempted to define a specific representation for cultural heritage information. By contrast, the CRM is an information level model. It is intended to specify and clarify the concepts that are needed for the exchange of cultural heritage information. Different representations and different data schemas can be defined that are nonetheless faithful to this information model.
Using current terminology from information science, the CIDOC Conceptual Reference Model can be defined as a "domain ontology" for cultural heritage information. The term ontology is derived from philosophy, where it refers to the assumptions about existence underlying a particular world view, in other words what sorts of things exist in the world and what the relationships are between them [5]. In computer science, the term has taken on a more specific meaning and refers to the formal definition of a philosophical ontology [6]. The CRM is an ontology in this latter sense since it aims to define and clarify a set of underlying concepts. Qualifying the CRM as a domain ontology further refines the notion: the CRM is intended to cover a specific area of interest, not the whole universe. Paraphrasing the initial definition, we can say that the CRM provides a formal definition of assumptions about what sorts of things exist, and the relationships between them, in the context of cultural heritage information.
This ontology is represented as an object-oriented model, composed of classes, organised into a hierarchy and related to each other through property links. This structure of classes and properties provides a framework for describing the complex interrelations that exist between objects, actors, events, places and concepts in the field of cultural heritage.
This orientation of the CRM as a domain ontology means that implementation level questions are not addressed. In particular, the CRM makes no assumptions about business procedures and institutional rules; it contains no methods or procedures; it does not define validation rules and constraints - other than those required for compatibility with the CRM - nor does it define data formats or user interface elements. Using the CRM does not ensure byte level or data level compatibility between different systems; it can, however, help to ensure conceptual compatibility.
Largely in reaction to the experience of previous integration projects, including work on the CIDOC data model, the CRM was designed with the following specific goals in mind [7]:
Within the life cycle of the design and implementation of information systems, the CRM has a specific role to play as a conceptualisation of the domain of cultural heritage information.
Figure 1: Theoretical frame of information
This theoretical frame is one commonly used in information science and present, in various forms, in a number of standard methodologies for the analysis and development of information systems. It is based on the fundamental distinction between the conceptual elaboration and the technical implementation of an information system, and the domain it is intended to support.
This general approach to information systems follows the classic cycle of analysis, conception and design. The initial objective is the analysis of the domain and its conceptualisation as a formal ontology. This abstract level of design is then applied to the design and realisation of a practical system.
The intended scope of the CRM should be understood as the domain that the CRM would ideally aim to cover, given sufficient time and resources, and is expressed as a definition of principles. The practical scope is, necessarily, a subset of the intended scope. The intended scope is difficult to define with the same degree of precision as the practical scope since it depends on concepts such as "cultural heritage" which are themselves complex and difficult to define. The objectives provided by the intended scope are important, however, since they allow appropriate sources to be selected for inclusion in the practical scope. The practical scope is expressed in terms of the reference documents and sources that have been used in its elaboration. The CRM covers the same domain as these reference sources (see below). This means that data encoded in accordance with one of those sources can be transformed or integrated into a CRM-compatible form without loss, insofar as the reference source remains within the intended scope of the CIDOC CRM.
The intended scope of the CRM may be defined as all information required for the scientific documentation of cultural heritage collections, with a view to enabling wide area information exchange and integration of heterogeneous sources. This definition requires some explanation:
As of autumn 2002, formal mappings have been established for the following data structures; all elements that fall within the intended scope are covered by the CIDOC CRM:
Many other data structures have been taken into account in the development of the CIDOC CRM by informal investigation, and more mappings are under way.
A domain ontology such as the CRM is designed to be explanatory and extensible rather than prescriptive and restrictive. Currently, no specific formalism for semantic models has been widely accepted as a standard; nevertheless, the semantic deviations between the various available models are minimal.
Consequently, the model has been formulated as an object-oriented semantic model, which can easily be converted into other object-oriented models. The TELOS data model [8],[9] has been used as a reference system throughout the development of the CRM, though without use of its assertional language. TELOS, in common with many other knowledge representation languages, decomposes knowledge into elementary propositions - declarations of individuals, classes, unary and binary relations. The properties of TELOS relevant for the purposes of the CIDOC CRM are similar to those of RDF and RDFS [10]. Since the Resource Description Framework (RDF) may soon become a de facto standard for the applications we target (other competitors being DL-based systems, DAML+OIL etc.), we have adopted terminology close to that of RDF Schema (RDFS), which is more familiar to members of the Web technology community than that used by TELOS. As our primary interest is ontological, we intend to produce the CRM in various representations, such as RDFS, Extensible Markup Language Document Type Definition (XML DTD), etc. The primary source for the CRM remains a complete implementation in TELOS on the SIS knowledge management system [11]. Logical assertions are omitted from this implementation since they can be added at a later stage, once the ontological commitment of the primitive classes, properties and isA relations is established.
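The idea of decomposing knowledge into elementary propositions can be pictured with a short sketch. The Python below is only an illustration: the class, property and individual names (Production, bell-1, "carried out by" and so on) and the helper functions are assumptions made for this example, and are not taken from the CRM definition, from TELOS or from any RDF toolkit.

```python
from typing import List, NamedTuple


class Proposition(NamedTuple):
    """One elementary statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# All names below are hypothetical and chosen only for illustration.
statements: List[Proposition] = [
    # isA relations between classes
    Proposition("Production", "isA", "Activity"),
    Proposition("Activity", "isA", "Temporal Entity"),
    # declarations of individuals as instances of classes
    Proposition("casting-of-bell-1", "instanceOf", "Production"),
    Proposition("bell-1", "instanceOf", "Physical Object"),
    Proposition("foundry-a", "instanceOf", "Actor"),
    # binary relations (properties) between individuals
    Proposition("casting-of-bell-1", "has produced", "bell-1"),
    Proposition("casting-of-bell-1", "carried out by", "foundry-a"),
]


def about(subject: str, props: List[Proposition]) -> List[Proposition]:
    """Return every proposition whose subject is the given item."""
    return [p for p in props if p.subject == subject]


def superclasses(cls: str, props: List[Proposition]) -> List[str]:
    """Follow isA links upwards and collect all ancestors of a class."""
    found: List[str] = []
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for p in props:
            if p.subject == current and p.predicate == "isA" and p.obj not in found:
                found.append(p.obj)
                frontier.append(p.obj)
    return found


if __name__ == "__main__":
    print(superclasses("Production", statements))
    for p in about("casting-of-bell-1", statements):
        print(p)
```

Because every statement has the same three-part shape, the same content could be written out in RDFS, an XML format or a relational table without changing what is asserted.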
On its own, the formal definition of the CRM is not easily understood. The use of rich specialisation hierarchies generates a rich set of inherited properties and cross-references. Consequently, this relatively compact definition of 211 elements corresponds to several thousand properties of the declared classes. A full set of direct and inherited declarations can be automatically generated from the original definition, and is available as a separate document on the CIDOC CRM website [12]. This document is implemented as HTML hypertext, so that all referred concepts are accessible by a single click, as needed when using the model.
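To see why a compact definition expands into a much larger set of declarations, consider the following sketch of how inherited properties might be generated from a class hierarchy. It is a toy under stated assumptions: the four classes and five properties are hypothetical stand-ins, not the actual CRM elements, and the function is not the tool used to produce the document on the CRM website.

```python
from typing import Dict, List

# Hypothetical miniature hierarchy: each class lists its direct superclasses.
PARENTS: Dict[str, List[str]] = {
    "Entity": [],
    "Temporal Entity": ["Entity"],
    "Activity": ["Temporal Entity"],
    "Production": ["Activity"],
}

# Properties declared directly on each class (hypothetical names).
DIRECT_PROPERTIES: Dict[str, List[str]] = {
    "Entity": ["is identified by", "has type"],
    "Temporal Entity": ["has time-span"],
    "Activity": ["carried out by"],
    "Production": ["has produced"],
}


def all_properties(cls: str) -> List[str]:
    """Direct properties of a class followed by those inherited from ancestors."""
    props = list(DIRECT_PROPERTIES.get(cls, []))
    for parent in PARENTS.get(cls, []):
        for prop in all_properties(parent):
            if prop not in props:
                props.append(prop)
    return props


def full_declarations() -> Dict[str, List[str]]:
    """Expand every class into its complete set of usable properties."""
    return {cls: all_properties(cls) for cls in PARENTS}


if __name__ == "__main__":
    direct = sum(len(v) for v in DIRECT_PROPERTIES.values())
    expanded = sum(len(v) for v in full_declarations().values())
    print(f"{direct} direct declarations expand to {expanded}")
```

Even in this tiny hierarchy the expanded listing is nearly three times the size of the original declarations; deeper and broader hierarchies multiply the effect.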
It is worth noting the reasoning behind the choice of an object-oriented formalism as the basis for the definition and presentation of the CRM. This decision was motivated by several factors:
The use of the object-oriented model is specifically not intended to influence decisions about implementation. The CRM has been used as the basis for successful implementations using a wide range of technical platforms, including relational databases [7].
The CRM does not aim to provide a complete philosophical analysis of the concepts it defines, nor to provide formal criteria for determining whether or not a particular item is an instance of one of its classes. Rather, it seeks to provide a core language that will facilitate tasks such as the semantic integration of heterogeneous data structures and the design of new data structures. Our aim is that an expert's grasp of CRM concepts should be sufficient to allow parallels to be drawn between elements in the planned system and compatible CRM concepts. Consequently, the CRM is intentionally focused on a set of fundamental, shared concepts that can safely be standardised.
But what are the practical applications of the CIDOC CRM? Used as a methodological tool in cultural heritage technology projects, the CIDOC CRM can improve communication and help avoid potentially costly misunderstandings. As a reference for good practice it can be used to compare and evaluate existing systems. In a technical context the CRM can be used as a basis for data archiving, exchange and integration - an important contribution to the creation of a global network for cultural heritage information. These different applications are discussed below.
Perhaps the most immediate role for the CRM is simply as an aid to comprehension and dialogue; as its name indicates, the CRM is a reference document that can help to establish the conceptual "common ground" between different disciplines and domains. The need for clear and unambiguous communication is critical to technology projects in the cultural heritage sector that bring together domain experts, (such as historians, archaeologists, and biologists), with system developers and other technicians. In order to design and build satisfactory information systems, technical experts are faced with the difficult task of coming to terms with all the complexities and subtleties of cultural heritage information. At the same time, domain experts need to explain their requirements in terms that IT specialists can understand and evaluate the solutions they propose. Misunderstandings in the design of information systems can turn out to be extremely costly.
By providing a rich and detailed analysis of the cultural heritage domain, the CRM can facilitate dialogue between cultural heritage experts and technical specialists. The classes and property relations of which it is composed are all clearly defined through textual scope notes, examples, cross-references, and their position within the formal structure. This multiple and "redundant" presentation is intended to be accessible to technicians and domain experts alike - cultural heritage professionals may see it as a formal representation of familiar concepts, while IT specialists can view it as a high-level blueprint for an information system. The CRM provides, in effect, a basis for mutual comprehension.
Apart from its role as a purely conceptual reference, the CRM can also serve as a technical reference for use in comparing and evaluating information systems and data schemas. Comparing existing or projected information systems and schemas with the CRM helps to highlight divergences - both in scope and in structure - which can then be examined in more detail to see if they are justified or not.
The value of the CRM as a technical reference becomes particularly apparent when it is used as the basis for data transfer between incompatible systems. The CRM can provide the semantic backbone for a common data format, for example an XML or RDF Schema, that can be shared by a number of different systems: a technical lingua franca that allows data to be transferred from one system to another. If data need to be shared between a number of different systems, the use of a single intermediate reference format is a simple and efficient way to proceed; otherwise, every pair of systems needs its own transfer and mapping protocol, and the number of protocols grows quadratically as more systems are included.
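The saving offered by an intermediate format can be estimated with a back-of-envelope count. The sketch below assumes that one mapping is needed per direction of transfer, which is a simplification introduced here for illustration; the function names are hypothetical.

```python
def direct_mappings(n_systems: int) -> int:
    """Mappings needed when every system exchanges directly with every other.

    Assumes one mapping per ordered pair of systems (A to B and B to A).
    """
    return n_systems * (n_systems - 1)


def hub_mappings(n_systems: int) -> int:
    """Mappings needed when each system maps to and from one shared format."""
    return 2 * n_systems


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(f"{n} systems: {direct_mappings(n)} direct, {hub_mappings(n)} via shared format")
```

Under these assumptions the two approaches break even at three systems; beyond that, each additional system adds only two mappings to the shared format, against two for every system already present in the direct case.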
Providing an extensible basis for data transfer between heterogeneous systems and schemas is of enormous value since it facilitates both data transfer between institutions and data migration between systems. A common semantic model such as the CRM can also be used as the basis for system-independent data formats for the long-term archiving of digital cultural information.
The CRM can be used as a reference guide when creating technical specifications for the design of new cultural heritage information systems. It is important to underline that it is not necessary to implement the entire CIDOC CRM as is. The model is intended to cover the entire field of cultural heritage information, at a level of detail acceptable for scientific research. This means that some aspects of the model would be superfluous for a specific implementation and that others would need to be extended to support institution-specific requirements. The CRM has been designed to make this process of adaptation as simple as possible by providing 'plug-in' points and guidelines for extensions that remain compatible with the overall structure. The CRM has been used successfully as the basis for the design and implementation of a number of cultural heritage database applications - such as Geneva City's Musinfo project [13] and RLG Cultural Materials [14]. By using the CRM as a starting point for a technical specification, much of the trial and error involved in modelling an information system from scratch can be avoided, resulting in a more flexible design which can be more readily adapted to future needs.
Possibly the most ambitious application of the CRM is in the development of integrated query tools, mediation systems and data warehouses. At present, much of the information stored in library catalogues, archival finding aids and museum collection management systems remains isolated. Different information resources normally need to be queried individually, and cross-system links are rare. The ability to combine and integrate information from multiple sources has the potential to add significant value to existing data - facilitating research and enhancing the quality of the user's experience.
Physically combining data into a single system may be impossible, for technical, organisational or economic reasons, so mediation systems aim instead to federate information sources, making distributed queries possible without the need to physically aggregate information into a single monolithic database. A typical mediation system acts as a single interface for users. It accepts and interprets queries and distributes them to participating systems. These systems reply to the mediator, which consolidates the results for the end user. In order for a query mediation system to function correctly, it has to be able to communicate with each participating system in a way it can understand, and interpret the results. Participating systems are unlikely to have identical data schemas and may well store different levels of detail about similar objects, so the mediation system needs to be a semantic polyglot.
Using the CRM as the basis for the mediation system's data schema makes distributed query systems much easier to design. By mapping each participating system's internal data representation to the canonical form provided by the CRM, it becomes possible to integrate and interpret data stored in otherwise incompatible systems.
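The flow of a mediated query can be sketched in a few lines. Everything in the example is an assumption made for illustration: the two source systems, their field names, and the flat canonical vocabulary ("title", "creator") are hypothetical stand-ins. A real mapping would target CRM classes and properties rather than a pair of field names.

```python
from typing import Dict, List

Record = Dict[str, str]


def to_canonical(record: Record, mapping: Dict[str, str]) -> Record:
    """Rename a record's local fields to the shared canonical field names."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


class Source:
    """A participating system with its own schema and a mapping to the canonical form."""

    def __init__(self, name: str, records: List[Record], mapping: Dict[str, str]):
        self.name = name
        self.records = records
        self.mapping = mapping  # local field name -> canonical field name

    def search(self, field: str, text: str) -> List[Record]:
        # Translate the canonical field into this system's local field name.
        local_field = next(
            (loc for loc, canon in self.mapping.items() if canon == field), None
        )
        if local_field is None:
            return []  # this source holds nothing comparable
        hits = [r for r in self.records if text.lower() in r.get(local_field, "").lower()]
        return [{**to_canonical(r, self.mapping), "source": self.name} for r in hits]


class Mediator:
    """Single entry point: distributes a query and consolidates the replies."""

    def __init__(self, sources: List[Source]):
        self.sources = sources

    def query(self, field: str, text: str) -> List[Record]:
        results: List[Record] = []
        for source in self.sources:
            results.extend(source.search(field, text))
        return results


# Hypothetical data: two systems describing similar things with different schemas.
museum = Source(
    "museum",
    [{"obj_title": "Bronze bell", "maker": "Foundry A"},
     {"obj_title": "Oil painting", "maker": "Unknown"}],
    {"obj_title": "title", "maker": "creator"},
)
archive = Source(
    "archive",
    [{"Titel": "Letter about a bronze bell", "Urheber": "J. Smith"}],
    {"Titel": "title", "Urheber": "creator"},
)

mediator = Mediator([museum, archive])
results = mediator.query("title", "bell")

if __name__ == "__main__":
    for r in results:
        print(r)
```

The user formulates the query once, in canonical terms; each source translates it into its own schema and returns its answers already converted, so the data never have to be physically merged.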
Conversely, the CRM can be used to inform the design of data warehouse systems, which take the opposite approach. Relevant data are copied into the data warehouse from different sources at regular intervals, integrated into a single database and 'normalised' to remove duplicates and merge identical instances. The data warehouse is used both as a source of consolidated knowledge, and as an index to the original data sources.
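A warehouse load of this kind might be sketched as follows. The example assumes that records have already been converted to a shared canonical form with "title" and "creator" fields, and it treats two records as the same instance when those fields match after trimming and lower-casing. Both the field names and this matching rule are simplifying assumptions made here; real identity resolution is considerably harder.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def normalise_key(record: Record) -> Tuple[str, str]:
    """Naive identity key: title and creator, trimmed and lower-cased."""
    return (record["title"].strip().lower(), record["creator"].strip().lower())


def load_warehouse(batches: Dict[str, List[Record]]) -> List[dict]:
    """Copy records from every source, merging those judged to be identical.

    Each warehouse entry keeps the list of sources it came from, so the
    warehouse also serves as an index back to the original systems.
    """
    warehouse: Dict[Tuple[str, str], dict] = {}
    for source_name, records in batches.items():
        for record in records:
            entry = warehouse.setdefault(
                normalise_key(record),
                {
                    "title": record["title"].strip(),
                    "creator": record["creator"].strip(),
                    "sources": [],
                },
            )
            if source_name not in entry["sources"]:
                entry["sources"].append(source_name)
    return list(warehouse.values())


# Hypothetical periodic extracts from two source systems.
batches = {
    "museum": [{"title": "Bronze bell", "creator": "Foundry A"}],
    "archive": [
        {"title": " bronze bell ", "creator": "foundry a"},
        {"title": "Letter", "creator": "J. Smith"},
    ],
}

consolidated = load_warehouse(batches)

if __name__ == "__main__":
    for entry in consolidated:
        print(entry)
```

Unlike the mediation approach, the merged entries live in the warehouse itself; the "sources" list is what allows a user to follow a consolidated record back to the systems that hold the original data.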
The CRM has been specifically designed with mediation and data warehouse applications in mind, allowing data to be combined from heterogeneous data sources in a meaningful way and without loss of detail.
The CIDOC CRM can be described in the traditional way starting with the major classes. This has been done in the main definition document [15] and in [7]. These high-level classes are those which emerged as a result of the logical grouping of shared properties [16]. These groups are concerned with fundamental notions such as identification, participation, location, purpose, motivation and use etc. The diagram below presents an overview in which Temporal Entities, and hence events, occupy a central place.
Figure 2: A qualitative metaschema of the CIDOC CRM
All property paths to dates go through Temporal Entities, as do most of the property paths to places. Those place properties which bypass temporal entities should be understood as short cuts of temporal entities. Similarly, Actors are only seen as relating material and immaterial things (Physical Stuff, Conceptual Objects) through Temporal Entities.
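One way to picture a short cut is as a direct property that stands for a longer path through an implied event. The sketch below expands such a property into the full path. The property names, the class name and the naming of the implied event are all hypothetical, chosen for this example only; they are not the CRM's own definitions of its short cuts.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, str, str]

# Hypothetical short cut table:
# short cut property -> (property leading to the event, event class, property leaving the event)
SHORTCUTS: Dict[str, Tuple[str, str, str]] = {
    "was made at": ("was produced by", "Production", "took place at"),
}


def expand_shortcut(subject: str, prop: str, obj: str) -> List[Statement]:
    """Replace a short cut with the path through the temporal entity it implies.

    Statements that are not short cuts are returned unchanged.
    """
    if prop not in SHORTCUTS:
        return [(subject, prop, obj)]
    to_event, event_class, from_event = SHORTCUTS[prop]
    event = f"{event_class} of {subject}"  # placeholder identifier for the implied event
    return [
        (subject, to_event, event),
        (event, "instanceOf", event_class),
        (event, from_event, obj),
    ]


if __name__ == "__main__":
    for statement in expand_shortcut("bell-1", "was made at", "Geneva"):
        print(statement)
```

Expanding short cuts in this way lets data that record only the direct link be queried alongside data that document the intervening event in full.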
Any instance of a class may be identified by a number of Appellations. These are the names, labels, titles or other means of identification used in the historical context. We model the ambiguous relation of items to their names as part of the historical process of knowledge acquisition. The notion of identification used here should not be confused with that of database identifiers in implementations of the Model, which are not part of the ontology.
All class instances can be refined (specialised) into more detailed categories through the use of Types. Types frequently consist of a range of properties that refer in general to things of a certain kind, such as "a dress made for a wedding" in contrast to "the dress made for my wedding".
CRM properties can be grouped by the following list of meta properties
The CIDOC Conceptual Reference Model is maintained by the CIDOC CRM Special Interest Group [17], a diverse international group of museum information professionals with an official mandate from ICOM-CIDOC (the Documentation Committee of the International Council of Museums), dating from August 2000, to develop and promote the standard.
The membership of the CIDOC CRM Special Interest Group is diverse, both in terms of the members' geographical and professional backgrounds; the group currently has 50 members from across the globe, spanning Europe, North America, Asia and Australasia, and includes museum curators from various disciplines, collections information managers, information scientists, librarians, representatives of regional, national and international standards bodies, natural historians, museum data management consultants, and system vendors.
The CIDOC CRM Special Interest Group also contains two sub-groups. Several members of the SIG are also members of Working Group 9, Sub Committee 4, Technical Committee 46 of the International Organization for Standardization (normally identified by the rather cumbersome alphanumeric string ISO TC46 SC4 WG9!). This group is made up of technical experts representing ISO "P-member" (i.e. voting) countries, and they are responsible for guiding the CIDOC CRM through the ISO standard development process. The ISO process normally consists of six separate stages, but since the CIDOC CRM was already a relatively mature standard developed by ICOM-CIDOC (an internationally-recognised body with standards development experience), the CRM was eligible for the "Fast Track Procedure" and entered the ISO process at stage 3, the Committee Stage.
The second sub-group within the CIDOC CRM Special Interest Group consists of the participants in the Cultural Heritage Interchange Ontology Standardization (CHIOS) Project. This Thematic Network project is also devoted to the development, standardisation and promotion of the CRM, and is generously funded by the European Commission's Fifth Framework IST programme.
Since the members of the CIDOC CRM Special Interest Group are spread across four continents, electronic communications are used extensively to discuss a variety of issues; chief among these are the electronic mailing list, crm-sig, and the Web site at <http://cidoc.ics.forth.gr/>, both hosted by ICS-FORTH in Crete.
However, the CRM-SIG meets in the real world too; to ensure that the momentum is kept up, the group aims to meet three times a year, at least until the CRM is published as an ISO standard. The CHIOS funding is invaluable in this regard, because it enables a core group of long-standing SIG members to attend the meetings regularly, providing vital continuity and momentum for the development of the standard.
The CRM-SIG has adopted a rigorous procedure for managing outstanding issues, based in part upon the process developed by the Dublin Core community (some members of the SIG also took part in the Dublin Core Metadata Initiative in the past). Outstanding issues can be raised either at meetings or on the e-mail list, but an issue must be submitted at least two weeks in advance of a meeting to be on that meeting's agenda; otherwise it is deferred until the next meeting.
Issues are given an identifying number, assigned to a particular working group, and tracked on the Web site. Before each meeting, proposals for addressing the issues are sought from the membership, and each proposal is also added to the issue log on the Web site. The group then votes for the proposals for each issue on the agenda at the following meeting.
The CHIOS funding has also been used to good effect to invite experts from specific disciplines when particular types of professional input have been required; for example, the group was able to sponsor experts from the natural history community to participate who would otherwise not have been able to take part. As a result, the CIDOC CRM was modified slightly to address the description of taxon creation and assignment so fundamental to the natural history community, but which has traditionally been excluded from mainstream museum documentation information standards.
The benefits of this kind of expert involvement are threefold: the CIDOC CRM standard itself is enriched to cover a broader scope; the natural history museum community will now have a much more useful tool at their disposal, one that will facilitate interoperability both between their peer natural history institutions and the wider museum community; and the CIDOC CRM can demonstrate additional community input and support, an important criterion for ISO in the standardisation process.
Members of the CRM-SIG are also increasingly promoting the CRM in professional arenas by developing support materials, presenting papers and running training workshops at conferences. The SIG is increasingly focusing its efforts on this kind of dissemination activity as the standard becomes ever more stable, and emphasis shifts to application and deployment rather than development. Again, the CHIOS funding has been invaluable in facilitating this essential outreach and support work.
The CRM SIG will shortly be holding its most ambitious outreach event yet; the "Sharing the Knowledge" Symposium [18], organised jointly by the CIDOC CRM SIG and the Smithsonian Institution. This event, to be held on 26th-27th March 2003 at the Smithsonian's International Center in Washington, D.C., will bring together researchers and practitioners from many disciplines to address the technical, organisational and philosophical challenges to the effective sharing of cultural knowledge from museums, libraries, archives and beyond.
The final phases of the ISO process will probably be completed within a year. However, even though the ISO standardisation process is not yet finished, the CRM can be, and already is being, used. Partners in the CRM-SIG have successfully developed applications based on the CRM ranging from data consolidation and data migration to full scale information systems. Current work on the CRM within ISO is aimed at finalising formal aspects of the model, ensuring coherence and facilitating comprehension. The basic conceptual constructs used in the model can be considered as stable and are unlikely to be modified in the near future.
Nick Crofts
Head of Documentation
Musées d'art et d'histoire
Rue Charles-Galland 2
Geneva, CH-1206
Switzerland
URL: <http://mah.ville-ge.ch/>
Email: nicholas.crofts@mah.ville-ge.ch
After studying Philosophy and History of Art in Canterbury, UK, and a brief spell in radio journalism, Nick Crofts started working at the National Sound Archives in London where he first became interested in information management. Nick studied Information Technology in Geneva and spent several years working in the documentation department of Geneva's Musées d'art et d'histoire. He has worked as project manager for Musinfo - computerising Geneva's museums - and is currently Head of Documentation of the Musées d'art et d'histoire. Nick is also co-ordinator of the ICOM/CIDOC Documentation Standards Group.
Martin Doerr
Researcher
Foundation for Research and Technology - Hellas (FORTH)
Institute of Computer Science
Vassilika Vouton
P.O. Box 1385
Heraklion, Crete
71110 Greece
Email: martin@ics.forth.gr
URL: <http://zeus.ics.forth.gr/forth/ics/isl/people/people_individual.jsp?Person_ID=2>
Martin Doerr studied Mathematics and Physics from 1972 to 1978 and holds a PhD in Physics from the University of Karlsruhe, Germany. He has been Senior Researcher at FORTH since 1990. He has done theoretical work in knowledge representation as well as system and application development of various advanced information systems. Since 1992 he has participated in a series of projects on cultural information systems and teaches courses in cultural informatics. He is chair of the CIDOC CRM Special Interest Group, a Working Group of the International Council of Museums, and collaborates with several cultural organisations on the development of advanced information systems and IT environments. His research interests are ontology-driven systems, cultural data models and terminology management.
Tony Gill
The Andrew W. Mellon Foundation
140 E. 62nd Street
New York, NY 10021
U.S.A.
URL: <http://www.mellon.org/Staff/Gill/Gill.htm>
Email: tg@mellon.org
Phone: +1 (212) 838 8400 x2265
Fax: +1 (212) 223 2778
Tony Gill is the Director of Metadata for ARTstor at the Andrew W. Mellon Foundation, with strategic and operational responsibility for analysing, enhancing and integrating heterogeneous descriptions of art and material culture in order best to meet the needs of scholars and educators. He participates actively in the international art and museum standards and knowledge management communities.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
For citation purposes:
Crofts, N., Doerr, M. and Gill, T., "The CIDOC Conceptual Reference Model: A standard for communicating cultural contents", Cultivate Interactive, issue 9, February 2003
URL: <http://www.cultivate-int.org/issue9/chios/>
Copyright ©2000 - 2001 Cultivate. Published by UKOLN. Design by ILRT.