OCHRE Database
A description of the schema and querying mechanism of the highly efficient and scalable graph database that powers the OCHRE platform.
By David Schloen and Sandra Schloen (May 2023)
Representing Hierarchical Data in a Semistructured Graph Database
The OCHRE database is not a relational database. It does not use the relational data model but rather the semistructured data model, for which the Extensible Markup Language (XML) provides a convenient text-based notation. JSON (JavaScript Object Notation) is also a notation for semistructured data, albeit a more limited one; it is simpler than XML and has become popular among Web app developers.
(For discussions of the semistructured data model and XML, see Database Systems: The Complete Book, 2d edition, by H. Garcia-Molina, J. D. Ullman, and J. Widom [Upper Saddle River, N.J.: Pearson Prentice Hall, 2009], pp. 483–515, and Dan Suciu, “Semi-structured Data Model,” in Encyclopedia of Database Systems, ed. Ling Liu and M. Tamer Özsu; Springer Link, 27 January 2017.)
The semistructured data model is a variant of the graph data model that leverages the power of “tree” structures to represent information as open-ended hierarchies of entities while also allowing cross-hierarchy links among the entities in different trees. This data model is well suited for hierarchically structured data, which is ubiquitous in the study of languages, texts, artifacts, and other cultural materials. Hierarchies make it possible to exploit the power of recursion, which is used extensively in the OCHRE software.
For example, relations of spatial containment are easily represented by means of recursive “parthood” hierarchies (see the discussion of the Tree category below), such that a spatially situated unit of observation contains smaller spatial units and these in turn contain still smaller units, and so on down the hierarchy (e.g., an archaeological site contains soil layers that each contain many artifacts). Likewise, it is easy to see how temporal, linguistic, and textual entities and relations can be modeled as recursive “parthood” hierarchies in which smaller entities are nested within larger entities of the same kind. Other kinds of entities and relationships lend themselves to non-recursive “grouping” hierarchies that represent group membership rather than part-whole relationships. For example, hierarchies of agents and resources serve to organize these entities without implying that one entity is contained within another.
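The recursive traversal of a parthood hierarchy can be sketched in a few lines of Python. This is only an illustration: the class name SpatialUnit, the function descendants, and the sample site are hypothetical and are not part of the OCHRE software.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class SpatialUnit:
    """Hypothetical node in a parthood hierarchy (not an OCHRE class)."""
    name: str
    parts: List["SpatialUnit"] = field(default_factory=list)


def descendants(unit: SpatialUnit, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Recursively yield (depth, name) for a unit and everything it contains."""
    yield depth, unit.name
    for part in unit.parts:
        yield from descendants(part, depth + 1)


# Invented sample data: a site containing layers that contain artifacts.
site = SpatialUnit("Site", [
    SpatialUnit("Layer 1", [SpatialUnit("Potsherd"), SpatialUnit("Bead")]),
    SpatialUnit("Layer 2", [SpatialUnit("Figurine")]),
])

for depth, name in descendants(site):
    print("  " * depth + name)
```

The same function works unchanged however deep the hierarchy goes, which is the practical benefit of recursion mentioned above.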
In addition to hierarchical tree structures, the semistructured data model works well with loosely structured networks of entities because it includes not just hierarchies but also cross-hierarchy links. And it can represent highly structured information, e.g., tables of data organized in rows and columns. This flexibility explains why XML and JSON, as notations for semistructured data, have become the standard formats for exchanging information on the Web.
In contrast, the relational data model that underlies most commercial databases is well suited for highly structured data but is cumbersome to use for hierarchical data. OCHRE database hierarchies that are easily represented and queried using XML would require many inefficient table joins in a relational database. On the other hand, the more generalized network graph data model is appropriate for unstructured data in which there are few constraints on how entities are related to one another, as in the World Wide Web (which is an unconstrained network graph, from a database perspective). However, the network graph data model is much less efficient than the semistructured graph data model when dealing with hierarchical data, for which XML provides structural constraints that permit efficient querying.
The relational data model, the network graph data model, and the semistructured graph data model are all universal data models that can be used to represent information of any kind, but each model has its own advantages, depending on the kind of data one is working with. The OCHRE ontology and resulting item categories (described below) could be — and once were — implemented in a relational database using tuples (table rows) and SQL. And this ontology could also be implemented in a network graph database using RDF and SPARQL, for example. But implementing the ontology in a semistructured graph database using XML is the best approach in principle and has proven to be the most elegant and efficient method in practice.
The OCHRE Database Uses XML Schema and XQuery Instead of SQL
More specifically, the semistructured graph database on the back end of the OCHRE platform is based on the XML Schema and XML Query (XQuery) standards of the World Wide Web Consortium (W3C). The database runs on an enterprise-class native-XML database management system (DBMS) that has a highly optimized XQuery processor and supports indexing on the “ancestor axis” for efficient searching of hierarchically organized data. It is a transactional multi-user database with password-protected user accounts, record-locking, and mechanisms to ensure data security and disaster recovery. It is a professionally engineered and highly scalable database that meets the “ACID” requirements of atomicity, consistency, isolation, and durability.
Information in the database is stored in a large number of Extensible Markup Language (XML) “documents” that conform to various predetermined document types. These XML documents serve as the atomic keyed-and-indexed data objects in the database (or what we prefer to call “items,” as explained below). Each XML document has an attribute containing a universally unique identifier (UUID) that functions as its database key. The database maintains indexes on the keys of all the documents to ensure efficient querying and joining of data via XQuery. Links between documents are established by adding the UUID key of the target document to an element of the source document.
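As a rough sketch of this keying and linking scheme, the Python example below builds two small XML items, links one to the other by UUID, and resolves the link through a key index. The element and attribute names (uuid, name, link, target) are assumptions made for illustration and do not reproduce the actual OCHRE schema; the dictionary stands in for the index a native-XML DBMS would maintain.

```python
import uuid
import xml.etree.ElementTree as ET


def make_item(category: str, name: str) -> ET.Element:
    """Build a small XML item keyed by a fresh UUID (names are illustrative)."""
    item = ET.Element(category, {"uuid": str(uuid.uuid4())})
    ET.SubElement(item, "name").text = name
    return item


def add_link(source: ET.Element, target: ET.Element) -> None:
    """Link source to target by storing the target's UUID key in a child element."""
    ET.SubElement(source, "link", {"target": target.get("uuid")})


site = make_item("spatial", "Tell A")
photo = make_item("resource", "Aerial photograph")
add_link(site, photo)

# An in-memory stand-in for the key index that a DBMS would maintain.
index = {item.get("uuid"): item for item in (site, photo)}

# Resolving a link is a key lookup, which is the essence of the join step.
linked = [index[link.get("target")] for link in site.findall("link")]
print([target.findtext("name") for target in linked])
```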
Note that XML documents need not correspond to real-world documents and in the OCHRE database they rarely do. The term “document” in this context reflects the fact that XML is a notation for serializing complex data structures as linear character strings using Unicode character codes (i.e., using one of the UTF encoding schemes). This means that any modern operating system that supports Unicode text files can handle data stored as XML documents. Since it was invented in 1998, XML has become ubiquitous in computing because it provides a text-based and thus cross-platform notation for all kinds of data — structured, semistructured, and unstructured — and not just for documents as normally understood.
XML Schema is used to specify the elements and attributes of each XML document type in the OCHRE database. An XML Schema specification of the structure and sequence of an XML document’s elements and attributes is analogous to a relation schema for a relational database table, which defines the names, types, and sequence of the attributes (normally displayed as table column headings) that are associated with the attribute values in a tuple (normally displayed as a table row). The advantage of using XML Schema to specify the structure of the atomic data objects in the OCHRE database is that their internal structures can be more complex than a tuple, i.e., they can contain internal tree structures represented by nested elements inside the XML document. This has many benefits for the kind of information stored in the OCHRE database.
The XML Query (XQuery) querying language is used to create, read, update, and delete the XML documents in the OCHRE database — the “CRUD” operations — and also to join them together into larger configurations based on their keys. The XML documents managed by a native-XML database are analogous to the tuples (table rows) in a relational database, i.e., they are the atomic data objects that are individually indexed and retrieved. Thus XQuery is analogous to the SQL querying language used in relational databases, although XQuery is a Turing-complete language, unlike SQL, and in principle is more powerful. Relational tables and SQL are not well suited for semistructured data organized in open-ended hierarchies of entities whereas XML documents and XQuery are very well suited for such data. That is why XML Schema and XQuery were chosen to implement the OCHRE database, which makes extensive use of hierarchies to represent the parthood relations, class-subclass relations, and cataloging relations commonly found in the study of languages, texts, artifacts, and other cultural materials.
OCHRE Database Categories
The OCHRE database contains millions of small XML documents that represent entities of interest in a highly granular fashion. Each XML document conforms to one of 16 different XML document types that collectively constitute the internal schema of the OCHRE database. These document types correspond to ontological classes in the foundational ontology (meta-ontology) implemented in the database.
The foundational ontology implemented in OCHRE has its own name. It is called the Comprehensive Hierarchical Ontology for Integrative Research (CHOIR) to distinguish it from the OCHRE database schema that implements this ontology. The schema of a working database system, specified in terms of logical data structures, is distinct from the abstract conceptual ontology it implements because the same conceptual ontology can be implemented in different ways in different database systems. The CHOIR ontology specification can be implemented using XML documents, as in OCHRE, or by means of relational tables or network graphs.
CHOIR’s foundational conceptual classes of entities correspond to what are called “categories” in the OCHRE database, with a nod to Immanuel Kant. Instances (individuals) of these ontological classes are called “items” in OCHRE, emphasizing the notion that we are referring here to any item of interest that might be singled out by an agent who names it and makes a statement about it. Items of interest to scholars include mental concepts and linguistic utterances as well as spatiotemporal entities. Each item of interest that someone has singled out, no matter how small, is represented in the database by a keyed and indexed XML document of the appropriate document type. These atomized items can then be combined into larger configurations, as needed, to provide different views of the data to end users.
XML document types have been specified using XML Schema for the following categories of OCHRE database items. The brief descriptions below can be supplemented by consulting OCHRE: An Online Cultural and Historical Research Environment (Eisenbrauns, 2012), which is out of date in some respects but still useful. Note that the user interface for adding content to the OCHRE database employs different names for some categories to make them less abstract and easier to understand; e.g., Agent items are called “Persons & organizations” in the user interface, Spatial items are called “Locations & objects,” Temporal items are called “Periods,” and Attribute items are called “Variables.”
Two of the categories of database items described below have subcategories: the Attribute category and the Tree category. These subcategories correspond to subclasses of the corresponding ontological class in CHOIR. The fact that a database item belongs to a subcategory of a main category is represented in the database, not by a different XML document type, but by an internal element within the XML document. The subcategory specified in this internal element triggers the appropriate behavior in the software for handling an item belonging to that subcategory.
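One way to picture how an internal element can trigger subcategory-specific behavior is a lookup table keyed on that element's content. The sketch below is hypothetical: the element names subcategory and value, the PARSERS table, and the function read_value are invented for illustration, and only a few of the Attribute subcategories are covered.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal

# Hypothetical mapping from subcategory name to a value parser.
PARSERS = {
    "Boolean": lambda text: text in ("true", "1"),  # xsd:boolean lexical forms
    "Integer": int,
    "Decimal": Decimal,
    "Date": date.fromisoformat,  # handles plain YYYY-MM-DD dates only
    "Alphanumeric": str,
}


def read_value(document: str):
    """Parse an Attribute document and convert its value per its subcategory."""
    item = ET.fromstring(document)
    subcategory = item.findtext("subcategory")
    return PARSERS[subcategory](item.findtext("value"))


doc = "<attribute><subcategory>Integer</subcategory><value>42</value></attribute>"
print(read_value(doc))
```

The document type stays the same for every subcategory; only the content of the internal element decides which handler runs.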
Project
A database item in the Project category represents a research project that controls the database items that were added to the database by members of the project. Items are associated with projects by a link (pointer) stored in each item that contains the unique identifier (database key) of the relevant Project item.
The data in the OCHRE database is organized by projects. A project’s director or designated project administrator determines who can view or edit the project’s data. There is one and only one Project item for each project. Tree-Group items (see below) are used to organize Project items in groups and sub-groups that indicate associations among separate projects. Tree-Parthood items (see below) are used to organize Project items into nested recursive hierarchies to represent sub-projects that are constituent parts of larger projects and not merely associated with them.
Agent
A database item in the Agent category represents a social agent of any kind, as defined by the project that created the item. An Agent item might represent an individual person or a collective group. The agent may be real or fictitious and may be contemporary or historical. All items in the database are attributed in some way to a person or persons represented by an Agent item, even if the attribution is made only implicitly to the members of the research project that added the items to the database.
The members of a project are represented by Agent items, which are linked to the database items they create when they enter data into the database. Agent items may also represent persons outside the project who are responsible for the project’s data in some way as authors, editors, observers, photographers, illustrators, resource creators, or data-entry staff. In this way, all the information in the database is normally attributed to one or more named agents. Tree-Group items (see below) are used to organize Agent items in named groups and sub-groups.
Spatial
A database item in the Spatial category represents a spatially situated unit of observation (or imaginary spatial unit) of any size or kind, as defined by the project that created the item. Tree-Parthood items (see below) are used to organize Spatial items into nested recursive hierarchies that represent relations of strict spatial containment. For example, a project in archaeology or art history might create Spatial items and organize them via a Tree-Parthood item to represent geographical regions containing settlements containing stratigraphic layers containing buildings containing artifacts containing components of those artifacts.
Temporal
A database item in the Temporal category represents a temporal unit of any duration, be it a geological era, a cultural period, or a sudden event, as defined by the project that created the item. Tree-Parthood items are used to organize Temporal items into nested recursive hierarchies that represent relations of temporal sequencing and sub-sequencing, i.e., sub-periods may be nested within longer periods of time. For example, a project in history might create Temporal items and organize them via a Tree-Parthood item to represent cultural ages containing historical eras containing political periods (e.g., royal dynasties) containing the reigns of particular rulers.
Epigraphic
A database item in the Epigraphic category represents an epigraphic unit of any size, i.e., some part of the physical expression of a particular text. For example, an Epigraphic item could represent (in the case of codices) a book, leaf, page, column, line, character, or smaller grapheme. This hierarchy of possible epigraphic units is just an example: OCHRE does not prescribe the way the text will be divided into epigraphic components. This is determined by the project that created the Epigraphic items or by the person who analyzed the text in accordance with the degree of atomization they need to do their research. In some cases, not just individual characters but diacritical marks will be represented by individual Epigraphic items. In other cases, it may be sufficient to have Epigraphic items that represent entire lines or pages and not subdivide the text any further.
Tree-Parthood items (see below) are used to organize Epigraphic items into nested recursive hierarchies that represent part-whole relations within the epigraphic dimension of the text, e.g., to show that the text consists of a book that contains leaves that contain pages that contain columns that contain lines that contain characters that may contain smaller graphemes. Note that an Epigraphic item represents an actual region of inscription in a particular text. It does not represent an ideal sign or character in a writing system, which would be represented instead by a Sign item. Epigraphic items may contain link(s) to the relevant Sign item(s) instantiated in the text or even to a particular allograph of a sign, which is useful in some cases, but this is not required.
Discourse
A database item in the Discourse category represents a discourse unit of any size, i.e., some part of the linguistic meaning of a particular text, large or small. For example, a Discourse item could represent a chapter (considered as a unit of discourse), paragraph, sentence, clause, phrase, word, or morpheme. This hierarchy of possible discourse units is just an example: OCHRE does not prescribe the way a text will be divided into linguistic or grammatical components, which is determined by the project that created the Discourse items or the person who interpreted the text.
Tree-Parthood items (see below) are used to organize Discourse items into nested recursive hierarchies that represent part-whole relations within the discursive dimension of the text, e.g., to show that the text consists of chapters that contain paragraphs that contain sentences that contain clauses that contain phrases that contain words that contain morphemes. A Discourse item will normally contain links to Epigraphic items that have been read as constituting the discourse unit being represented.
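The relationship between the two dimensions of a text can be sketched as follows. The dictionaries, keys, and the function reading are hypothetical simplifications: real OCHRE items are XML documents keyed by UUIDs, and the example word is invented.

```python
# Hypothetical epigraphic units (regions of inscription), keyed by identifier.
epigraphic = {"e1": "c", "e2": "a", "e3": "t", "e4": "s"}

# Hypothetical discourse units; each leaf lists the epigraphic units
# that have been read as constituting it.
discourse = {
    "word": {"label": "cats", "parts": ["stem", "plural"]},
    "stem": {"label": "cat", "epigraphic": ["e1", "e2", "e3"]},
    "plural": {"label": "-s", "epigraphic": ["e4"]},
}


def reading(unit_key: str) -> str:
    """Recursively assemble the characters underlying a discourse unit."""
    unit = discourse[unit_key]
    own = "".join(epigraphic[key] for key in unit.get("epigraphic", []))
    return own + "".join(reading(part) for part in unit.get("parts", []))


print(reading("word"))
```

Because the discourse units only point at epigraphic units, a different interpretation of the same inscription could be added as a second discourse hierarchy without touching the epigraphic data.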
Sign
A database item in the Sign category represents a graphic sign or character in a writing system of any kind, e.g., alphabetic, syllabic, logosyllabic, or logographic. A Sign item usually contains a Unicode codepoint so it can be displayed using a Unicode font, but if Unicode does not contain the relevant sign then an image of the sign or the conventional Roman-alphabet transcription of its name can be included within the Sign item. A Sign item may contain information about the different reading values of a sign and its allographic variants (e.g., upper-case “A” and lower-case “a” are allographs of the same sign in the alphabetic writing system used in this website, and this sign also has different phonetic reading values in different contexts).
Tree-Group items (see below) are used to organize Sign items in named groups and sub-groups to represent a writing system. Hierarchical nesting of a Sign item within another Sign item is used to represent a compound sign.
Text
A database item in the Text category normally contains a link to a Tree-Parthood item that represents a hierarchy of Epigraphic items plus a link to another Tree-Parthood item that represents a hierarchy of Discourse items. In this way, the Text item represents a particular text in both its epigraphic and discursive dimensions, which are too often conflated in digital humanities.
However, there is no requirement that a Text item be linked to both kinds of hierarchy. In some cases, only the epigraphic dimension of the text is represented (e.g., in the case of an undeciphered text) or only the discursive dimension is represented (for research that can ignore the physical expression of the text). Likewise, there is no requirement that the Text item be linked to only one epigraphic hierarchy and only one discourse hierarchy. Multiple analyses of the same text on both the epigraphic and discursive levels can be represented by linking to any number of Tree-Parthood items representing multiple epigraphic hierarchies and multiple discourse hierarchies. Branching within the same hierarchy can also be used to represent different readings of portions of the text.
Epigraphic items, Discourse items, and Text items (via Tree-Parthood items) can be used to represent texts in any genre and language that have been written using any writing system. They can even represent born-digital texts. Epigraphic, Discourse, and Text items are normally used to represent texts that are objects of study and analysis in their own right in the context of historical or literary research. Other kinds of digital texts (e.g., scholarly reports and secondary literature) are normally represented by Resource items (see below). But the decision concerning how to represent a given text is up to the project.
Tree-Group items (see below) are used to organize Text items in named groups and sub-groups.
Lexical
A database item in the Lexical category represents a word (lemma) in a dictionary or glossary for a particular language. A Lexical item may contain just one meaning (definition), several meanings, or a hierarchy of meanings and sub-meanings for each grammatical form of a word, with the option of including textual citations of the use of the grammatical form in context. Orthographic variants of each grammatical form of the word may also be included. Depending on how much detail is included, a Lexical item may contain only a brief glossary entry or a full OED-style dictionary entry.
Lexical items contain links to Discourse items that instantiate the word in its particular grammatical forms and orthographic renditions within particular texts. Tree-Group items (see below) are used to organize Lexical items in named groups and sub-groups to represent a dictionary or glossary for a particular language or dialect.
Bibliographic
A database item in the Bibliographic category represents a bibliographic reference to a published work. OCHRE can link Bibliographic items to the Zotero online citation system to automatically populate the content of the bibliographic reference and style it according to the user’s preference. Tree-Group items (see below) are used to organize Bibliographic items in named groups and sub-groups to represent a citation list or bibliography.
Resource
A database item in the Resource category represents an external digital resource of any kind (for example, a 2D image, 3D model, document, spreadsheet, geospatial shapefile, audio file, or video clip) that resides outside the OCHRE database and is fetched dynamically as needed from an FTP server or HTTP Web server. A Resource item contains the name, description, and URL of the external digital resource. Like any other OCHRE database item, it may be linked to Attribute items which are linked in turn to Value items (or contain values) to represent the properties of the digital resource, such as its metadata. Tree-Group items (see below) are used to organize Resource items in named groups and sub-groups.
Concept
A database item in the Concept category represents a concept defined by a research project that does not correspond to any of the built-in OCHRE categories or subcategories. For example, units of measurement can be represented as Concept items, which can be linked to Attribute items to indicate the units of a numeric attribute. Likewise, artifact styles can be represented as Concept items, or anything a project needs to describe and relate to other database items.
Tree-Parthood items (see below) are used to organize Concept items into nested recursive hierarchies that represent semantic class-subclass relations among Concept items. In addition, Tree-Group items (see below) may be used to organize Concept items in named groups and sub-groups.
Attribute
A database item in the Attribute category represents an attribute or variable (qualitative, quantitative, or relational) that has been defined by the project or borrowed from another project. Any database item in any category may contain a link to one or more Attribute items that indicate its properties. Each Attribute item will either contain a link to a Value item (for qualitative nominal or ordinal attributes) or else the Attribute item will itself contain the value (for quantitative and relational attributes). The term “property” is used in OCHRE to refer to the attribute-plus-value, not just the attribute alone.
Linking database items in this way results in an item-attribute-value triple that makes a “statement” about an entity, analogous to the subject-predicate-object triples in RDF. Each such statement in the OCHRE database can be credited to a named author or observer by means of a link to an Agent item.
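A minimal sketch of such statements, assuming a simple in-memory representation: the class name Statement, the function properties_of, and the sample items are invented for illustration and do not reflect how OCHRE stores properties internally.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    """Hypothetical item-attribute-value triple, credited to an agent."""
    item: str
    attribute: str
    value: object
    agent: Optional[str] = None


# Invented sample statements made by two observers.
statements = [
    Statement("Jar 17", "Material", "Ceramic", agent="Observer A"),
    Statement("Jar 17", "Height (cm)", 23, agent="Observer B"),
    Statement("Bead 4", "Material", "Glass", agent="Observer A"),
]


def properties_of(item: str) -> dict:
    """Collect the attribute-value properties stated about one item."""
    return {s.attribute: s.value for s in statements if s.item == item}


print(properties_of("Jar 17"))
```

Keeping the agent on each statement, rather than on the item as a whole, is what allows different observers to be credited for different properties of the same item.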
Tree-Group items (see below) are used to organize Attribute items in named groups and sub-groups. In addition, there is a Tree-Taxonomy item for each project that organizes its Attribute items and Value items in a taxonomic hierarchy (described in more detail below).
An Attribute item may contain internal links to one or more other Attribute items with an indication of the semantic relation between them: “close match” (synonym), “broader term,” “narrower term,” or “related term.” This permits a thesaurus-style view of a project’s terminology to be generated from a list of Attribute items (the list itself will be represented by a Tree-Group item).
There are several subcategories of Attribute items:
Nominal
An Attribute item in this subcategory contains the name and description of a nominal attribute whose possible values have no inherent order. A nominal property is attributed to a database item by linking that item to an Attribute item which has an internal XML element that indicates it belongs to the Nominal subcategory. The Attribute item in turn contains a link to a Value item.
Ordinal
An Attribute item in this subcategory contains the name and description of an ordinal attribute whose possible values have a rank order (e.g., the values small, medium, and large for an ordinal attribute called Size). An ordinal property is attributed to a database item by linking that item to an Attribute item which has an internal XML element that indicates it belongs to the Ordinal subcategory. The Attribute item in turn contains a link to a Value item.
Boolean
An Attribute item in this subcategory contains the name and description of a logical attribute that takes a Boolean value, i.e., true or false. A Boolean property is attributed to a database item by linking that item to an Attribute item which has an internal XML element that indicates it belongs to the Boolean subcategory. The value of the attribute is stored internally in the Attribute item in an XML element of the xsd:boolean data type.
Integer
An Attribute item in this subcategory contains the name and description of a numeric attribute that takes an integer value. An integer property is attributed to a database item by linking that item to an Attribute item which has an internal XML element that indicates it belongs to the Integer subcategory. The value of the attribute is stored internally in the Attribute item in an XML element of the xsd:integer data type.
Decimal
An Attribute item in this subcategory contains the name and description of a numeric attribute that takes a decimal value. A decimal property is attributed to a database item by linking that item to an Attribute item which has an internal XML element that indicates it belongs to the Decimal subcategory. The value of the attribute is stored internally in the Attribute item in an XML element of the xsd:decimal data type.
Date
An Attribute item in this subcategory contains the name and description of an attribute that takes a calendar date as its value. A date property is attributed to a database item by linking that item to an Attribute item which has an internal XML element that indicates it belongs to the Date subcategory. The value of the attribute is stored internally in the Attribute item in an XML element of the xsd:date data type.
Coordinates
An Attribute item in this subcategory contains the name and description of an attribute that takes map coordinates as its value. A coordinates property is attributed to a database item by linking that item to an Attribute item which has an internal XML element that indicates it belongs to the Coordinates subcategory. The value of the attribute is stored internally in the Attribute item in XML elements that store unprojected geographical coordinates or planar coordinates with information about the associated map projection (e.g., UTM coordinates).
Alphanumeric
An Attribute item in this subcategory contains the name and description of an attribute that takes an alphanumeric string as its value. An alphanumeric property is attributed to a database item by linking that item to an Attribute item which has an internal XML element that indicates it belongs to the Alphanumeric subcategory. The value of the attribute is stored internally in the Attribute item in an XML element of the xsd:string data type. An alphanumeric value is similar to a nominal value, insofar as they both consist of character strings, but alphanumeric Attribute items are not linked to Value items and are not part of the project’s taxonomy.
Serial Number
An Attribute item in this subcategory contains the name and description of an attribute that takes an integer serial number as its value. A serial number property is attributed to a database item by linking that item to an Attribute item which has an internal XML element that indicates it belongs to the Serial Number subcategory. The value of the attribute is stored internally in the Attribute item in an XML element of the xsd:integer data type. Serial number Attribute items are similar to integer Attribute items but they behave differently because the software automatically increments the serial number each time the Attribute item is used.
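The auto-incrementing behavior can be pictured with a small counter. This is a hypothetical sketch: the class SerialNumberAttribute, its next_value method, and the starting value of 1 are assumptions, and a multi-user database would also need to make the increment transactional.

```python
import itertools


class SerialNumberAttribute:
    """Hypothetical stand-in for a serial-number attribute that self-increments."""

    def __init__(self, name: str, start: int = 1) -> None:
        self.name = name
        self._counter = itertools.count(start)

    def next_value(self) -> int:
        """Return the next serial number; each use advances the counter."""
        return next(self._counter)


registration = SerialNumberAttribute("Registration number")
first = registration.next_value()
second = registration.next_value()
print(first, second)
```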
Relational
An Attribute item in this subcategory contains the name and description of an attribute that takes as its value the UUID database key of another item in the database. A relational property is attributed to a database item by linking that item to an Attribute item which has an internal XML element that indicates it belongs to the Relational subcategory. The value of the attribute is the UUID database key of the related item, which is stored internally in an XML element in the Attribute item.
Relational Attribute items are used for project-defined relations between items, spanning hierarchies and even categories and supplementing the inter-item relations created by means of Tree items (see below). In other words, a named relation between any two items in the database can be created using a relational Attribute item. For example, a Spatial item might contain a link to a relational Attribute item named “is above” that contains as its attribute value the UUID database key of another Spatial item to represent the fact that one thing is situated spatially above another.
A group of database items that have been linked together by relational Attribute items can be regarded as a network graph. The OCHRE user interface can use these inter-item relations to display visualizations of network graphs using node-link diagrams and can analyze the networks using standard graph-analysis algorithms. This is valuable for social network analysis, for example, to identify clusters and cliques in a social network of Agent items.
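To illustrate how relational properties yield a network graph, the sketch below builds an adjacency structure from a few invented relations and finds its connected components with a breadth-first search. Connected components are used here only as a simple stand-in for cluster detection; the text does not say which algorithms OCHRE applies, and all names in the example are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical relational properties: (source item, relation name, target item).
relations = [
    ("Agent A", "corresponds with", "Agent B"),
    ("Agent B", "corresponds with", "Agent C"),
    ("Agent D", "corresponds with", "Agent E"),
]

# Treat each relation as an undirected edge for clustering purposes.
graph = defaultdict(set)
for source, _relation, target in relations:
    graph[source].add(target)
    graph[target].add(source)


def clusters(adjacency):
    """Return the connected components of the graph using breadth-first search."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in component:
                continue
            component.add(node)
            queue.extend(adjacency[node] - component)
        seen |= component
        result.append(component)
    return result


print(clusters(graph))
```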
The combination of the hierarchies of database items organized by Tree items and the cross-hierarchy links created by relational Attribute items — not to mention the internal XML hierarchies frequently found within database items belonging to other categories — yields a semistructured graph database that can represent any kind of information.
Value
A database item in the Value category represents a qualitative nominal or ordinal value that has been defined by the project. A qualitative property is attributed to a database item by linking that item to a nominal or ordinal Attribute item (see above) that in turn contains a link to a Value item, which stores the qualitative value internally as a character string. Many different item properties can therefore use the same value by pointing to the same Value item. This avoids error-prone and storage-consuming duplication of data and maintains the regularity and “cleanliness” of the data because the name of the value exists in only one place in the database and a change in the value’s name will be propagated instantly wherever it is displayed. This also permits the organization of qualitative values as taxonomic terms independently of the properties in which they are used.
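The effect of storing a value's name in only one place can be shown with a short sketch. The dictionaries and short keys below are hypothetical simplifications (real OCHRE keys are UUIDs and the items are XML documents), but the principle of resolving the name at display time is the same.

```python
# Hypothetical store of Value items keyed by identifier.
values = {"v1": {"name": "Ceramic"}}

# Two properties on different items point at the same Value item by key.
properties = [
    {"item": "Jar 17", "attribute": "Material", "value_key": "v1"},
    {"item": "Bowl 3", "attribute": "Material", "value_key": "v1"},
]


def display(prop: dict) -> str:
    """Render a property by resolving its value key at display time."""
    return f'{prop["item"]}: {prop["attribute"]} = {values[prop["value_key"]]["name"]}'


before = [display(p) for p in properties]
values["v1"]["name"] = "Pottery"  # rename the value in exactly one place
after = [display(p) for p in properties]
print(before, after, sep="\n")
```

Neither property record is edited, yet both displays change, because each one stores only the key of the shared value.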
Quantitative integer and decimal values do not need to be defined in OCHRE because the ontological class of numbers is, in effect, predefined for everyone and is digitally represented by standard data types (e.g., the XML Schema (XSD) data types) that are interpreted in the same way everywhere on the Web. The same is true of Boolean values, calendar dates, and map coordinates.
A Value item may contain internal links to one or more other Value items with an indication of the semantic relation between them: “close match” (synonym), “broader term,” “narrower term,” or “related term.” This permits a thesaurus-style view of a project’s terminology to be generated from a list of Value items (the list itself will be represented by a Tree-Group item).
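A thesaurus-style view of this kind might be assembled along the following lines. This is a sketch under stated assumptions: the keys, the terms, and the function `thesaurus_view` are invented for the example, and only the four relation names come from the text.

```python
from collections import defaultdict

# Hypothetical Value items, keyed by made-up identifiers.
value_names = {"v-jar": "jar", "v-vessel": "vessel", "v-pot": "pot"}

# Internal links between Value items: (source, semantic relation, target).
value_links = [
    ("v-jar", "broader term", "v-vessel"),
    ("v-jar", "close match", "v-pot"),
]

def thesaurus_view(value_names, value_links):
    """Group each term's related terms by name, sorted for stable display."""
    view = defaultdict(list)
    for source, relation, target in value_links:
        view[value_names[source]].append((relation, value_names[target]))
    return {term: sorted(entries) for term, entries in sorted(view.items())}

view = thesaurus_view(value_names, value_links)
```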
Query
A database item in the Query category represents the search criteria for a database query that can be executed to select database items. Search criteria can be named and saved in a Query item for repeated use. The criteria can be quite complex, involving both the intrinsic attribute-value properties of database items and their extrinsic relations with other items. The extrinsic relations among database items can be specified by Tree items (see below) or by relational Attribute items.
Boolean algebraic operators (AND, OR, NOT) and relational operators (<, >, <=, >=, ==, !=) are supported in query expressions, as well as the nesting of expressions via parentheses. Readable query expressions are constructed in the back-end user interface of the database via drop-down pick lists. The user can easily specify the scope of the query by selecting the projects and item categories to include and then specify the attributes, attribute value ranges, and operators of the search criteria.
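The way Boolean and relational operators combine into nested criteria can be sketched in Python as composable predicates. This is not how OCHRE builds or stores its query expressions; the helper names (`compare`, `AND`, `OR`, `NOT`) and the sample items are assumptions made purely to show the logic.

```python
import operator

# The six relational operators named in the text, mapped to Python functions.
RELATIONAL = {
    "<": operator.lt, ">": operator.gt, "<=": operator.le,
    ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
}

def compare(attribute, op, value):
    """Build a criterion that tests one attribute of an item."""
    test = RELATIONAL[op]
    return lambda item: attribute in item and test(item[attribute], value)

def AND(*criteria):
    return lambda item: all(c(item) for c in criteria)

def OR(*criteria):
    return lambda item: any(c(item) for c in criteria)

def NOT(criterion):
    return lambda item: not criterion(item)

# Hypothetical items with attribute-value properties.
items = [
    {"id": "a", "material": "ceramic", "height_cm": 12.0},
    {"id": "b", "material": "ceramic", "height_cm": 40.0},
    {"id": "c", "material": "bronze", "height_cm": 8.5},
]

# (material == "ceramic" AND height_cm < 20) OR NOT (material == "ceramic")
criteria = OR(
    AND(compare("material", "==", "ceramic"), compare("height_cm", "<", 20)),
    NOT(compare("material", "==", "ceramic")),
)
selected = [item["id"] for item in items if criteria(item)]
```

Nesting the function calls plays the role that parentheses play in a written query expression.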
When the query is executed, the user-created query criteria are automatically converted to XQuery and sent to the native-XML database management system for execution, returning a list of item identifiers (UUID database keys) as the query result set. Separate queries can be chained sequentially to select items based on the intersection or union of their result sets.
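Because each query returns a set of item identifiers, chaining queries amounts to set intersection or union. The short Python sketch below shows the idea; the function `chain` and the shortened keys are hypothetical, and real result sets would contain UUID database keys returned by the database.

```python
def chain(result_sets, mode):
    """Combine several query result sets by intersection or union."""
    sets = [set(result) for result in result_sets]
    if not sets:
        return set()
    if mode == "intersection":
        return set.intersection(*sets)
    if mode == "union":
        return set.union(*sets)
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical result sets from two separately executed queries.
ceramic_items = ["key-1", "key-2", "key-3"]
items_in_room_5 = ["key-2", "key-3", "key-4"]

both = chain([ceramic_items, items_in_room_5], "intersection")
either = chain([ceramic_items, items_in_room_5], "union")
```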
Tree-Group items (see below) are used to organize Query items in named groups and sub-groups.
Tree
A database item in the Tree category contains an internal tree structure that is used to organize other database items in hierarchies and lists. Many different Tree items can link to the same database item and a single Tree item can have multiple links to the same database item from different locations in its tree. Thus an item may occur in more than one hierarchy or in more than one branch of the same hierarchy. This allows multiple overlapping configurations of the same information without duplicating any of the database items that represent that information.
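The following Python sketch illustrates how trees can refer to the same item from several places without copying it. The `TreeNode` class, the keys, and the item names are assumptions for the example; the essential point is that tree nodes hold only an item's key, while the item itself is stored once.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TreeNode:
    item_key: str                       # key of the item this node points to
    children: List["TreeNode"] = field(default_factory=list)

# Each item is stored exactly once, under its (hypothetical) key.
items = {"k-room": "Room 5", "k-jar": "Storage jar", "k-types": "Vessel types"}

# One tree places the jar in a room; another lists it twice under a catalog.
by_location = TreeNode("k-room", [TreeNode("k-jar")])
by_type = TreeNode("k-types", [TreeNode("k-jar"), TreeNode("k-jar")])

def occurrences(node, key):
    """Count how many nodes in a tree point to the given item key."""
    own = 1 if node.item_key == key else 0
    return own + sum(occurrences(child, key) for child in node.children)

jar_links = occurrences(by_location, "k-jar") + occurrences(by_type, "k-jar")
```

Three tree nodes refer to the jar here, yet `items` still contains a single entry for it.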
There are four subcategories of Tree items:
Tree-Parthood
An item in this subcategory organizes other database items in a recursive nested hierarchy in which each item is the child of a parent item that belongs to the same category, e.g., Spatial items nested within Spatial items. A parent item may have one or more child items, but each child item has one and only one parent; thus there is a single parent item at the root of the tree.
The meaning of the hierarchical relations in a Tree-Parthood item depends on the category of items it organizes. In the case of Spatial, Temporal, Epigraphic, and Discourse items the tree structure represents part-whole relations (mereological “parthood”). In the case of Concept items the tree structure represents semantic class-subclass relations, which some philosophers consider to be parthood relations (see David K. Lewis, Parts of Classes [1991]).
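The single-parent rule and the recursion it enables can be sketched in a few lines of Python. The item names and the `parent_of` mapping are hypothetical; a mapping from child to parent is used here only because it makes the one-parent constraint impossible to violate.

```python
# Each child maps to exactly one parent, so the single-parent rule holds
# by construction. "site" has no entry and is therefore the root.
parent_of = {"layer-1": "site", "layer-2": "site", "sherd-7": "layer-1"}

def children_of(parent_of, parent):
    """Return the direct children of an item, in sorted order."""
    return sorted(child for child, p in parent_of.items() if p == parent)

def descendants(parent_of, item):
    """Recursively collect everything contained within an item."""
    found = []
    for child in children_of(parent_of, item):
        found.append(child)
        found.extend(descendants(parent_of, child))
    return found

def path_to_root(parent_of, item):
    """Follow parent links upward until the root is reached."""
    path = [item]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path

contained_in_site = descendants(parent_of, "site")
context_of_sherd = path_to_root(parent_of, "sherd-7")
```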
Tree-Group
An item in this subcategory organizes other database items in a non-recursive hierarchy of parent items and child items that may belong to different categories. A Tree-Group item does not represent parthood relations or class-subclass relations but rather the grouping and sub-grouping of items according to some cataloging scheme. For example, catalogues of Agent items and Resource items can be represented by means of Tree-Group items.
A Tree-Group item can be used to organize database items in a flat list (ordered or unordered) rather than a hierarchy. This is simply a matter of including only one generation of child items (all siblings) under a root item. In this case, the root item represents the list itself.
Flat lists of items can be displayed in the user interface as tables with rows and columns, where each item in the list is a row and the properties of the items (variable-value pairs) appear as columns in which the property variables are the column headings and the property values are shown in the table cells. The items in a list can also be displayed in the user interface as network graphs using node-link diagrams.
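The mapping from a flat list to a table can be sketched as follows. The item structure and the function `to_table` are assumptions for illustration; the sketch simply shows property variables becoming column headings and property values filling the cells, with an empty cell where an item lacks a property.

```python
def to_table(items):
    """Turn a flat list of items into a header row and data rows."""
    columns = []
    for item in items:
        for variable in item["properties"]:
            if variable not in columns:
                columns.append(variable)   # keep first-seen column order
    rows = [
        [item["name"]] + [item["properties"].get(c, "") for c in columns]
        for item in items
    ]
    return ["Item"] + columns, rows

# Hypothetical list members with variable-value properties.
pottery = [
    {"name": "Jar 1", "properties": {"Material": "ceramic", "Height (cm)": 32}},
    {"name": "Lamp 4", "properties": {"Material": "bronze"}},
]

header, rows = to_table(pottery)
```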
Executing a query in OCHRE yields a named list of database items that is organized by a Tree-Group item, which can be saved and displayed as a data table or as a network graph.
Publishing data from the database on the back end of the OCHRE platform to the front end yields a list of “denormalized” (structurally simplified) XML documents and equivalent JSON documents suitable for use by Web app developers via the OCHRE Web API. Each published document will normally correspond to a real-world entity such as an artifact or text, or to some other entity or topic that app developers will prefer to handle as a modular unit of information. Although these published documents are not database items that conform to the XML document types (categories) in the back-end database, they contain persistent URLs in their internal elements that include the same UUID keys as the database items to which the contents of the published documents correspond. Thus Tree-Group items in the back-end database can use these database keys to keep track of what has been published to the front end of the platform.
A Tree-Group item can be used to organize other Tree-Group items in named groups and sub-groups.
Tree-Sequence
An item in this subcategory organizes other database items into sequences and sub-sequences, e.g., to represent a timeline of temporal events or a flowchart of processes. In this kind of Tree item, the internal tree structure is supplemented by elements that represent looping (cycles) within the tree.
A Tree-Group item can be used to organize Tree-Sequence items in named groups and sub-groups.
Tree-Taxonomy
An item in this subcategory has a modified internal tree structure that organizes Attribute items and Value items in a taxonomic hierarchy in which Attribute items alternate with Value items at successive levels of the hierarchy. This allows a project to specify the allowable values for each qualitative attribute by making them children of that attribute in the tree.
It is possible for an Attribute item to be a child of a Value item that is itself a child of the same Attribute item. This recursive structure, repeating the same Attribute item at lower levels of the taxonomic hierarchy as a descendant of itself, represents the genus-species relation between more general and more specific values of an attribute, allowing queries to use a general term to find more specific terms within the taxonomic hierarchy and vice versa.
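A minimal Python sketch of this alternating structure is given below, using nested dictionaries in which Attribute levels and Value levels alternate. The attribute "Material", its values, and both functions are invented for the example and say nothing about how OCHRE stores or queries its taxonomy internally.

```python
# Attribute and Value levels alternate. "Material" reappears beneath the
# value "ceramic" to hold the more specific kinds of ceramic.
taxonomy = {
    "Material": {                       # Attribute
        "ceramic": {                    # Value
            "Material": {               # same Attribute, repeated
                "earthenware": {},      # more specific Values
                "porcelain": {},
            },
        },
        "metal": {},
    },
}

def values_under(value_node, attribute):
    """Collect values of `attribute` nested below a Value node."""
    found = []
    for value, child in value_node.get(attribute, {}).items():
        found.append(value)
        found.extend(values_under(child, attribute))
    return found

def specific_terms(taxonomy, attribute, general_value):
    """Return a value together with every more specific value beneath it."""
    def search(attribute_level):
        for value, child in attribute_level.get(attribute, {}).items():
            if value == general_value:
                return [value] + values_under(child, attribute)
            deeper = search(child)
            if deeper:
                return deeper
        return []
    return search(taxonomy)

ceramic_terms = specific_terms(taxonomy, "Material", "ceramic")
```

A query for the general term "ceramic" could then match items recorded with any of the returned terms.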
Each OCHRE project has one and only one Tree-Taxonomy item, which specifies the taxonomy used in the properties of database items owned by that project.
Linking OCHRE Database Items to External Controlled Vocabularies
Several categories of OCHRE database items can be linked semantically to external controlled vocabularies of terms and concepts such as Wikidata and the Getty Vocabularies. This can be done for the following categories: Agent, Spatial, Temporal, Text, Resource, Concept, Attribute, and Value.
Users can enter and save SPARQL queries associated with an item in any of these eight categories. The SPARQL queries are used to search a given external vocabulary and find the URLs of published concepts that could be linked to the item. An OCHRE item can be semantically linked to one or more external terms or concepts from any number of published vocabularies. The semantic linkage may be characterized as a “close match” (synonym), “broader term,” “narrower term,” or just a “related term.” If desired, the external term can be displayed in the OCHRE user interface as the name of the item instead of using a project-defined name. This will often be appropriate in the case of a close semantic match, allowing projects to employ standard terms curated by reputable organizations in various domains of research, such as the Getty Research Institute in the domain of cultural heritage.
These semantic linkages clarify the meaning of terms used by OCHRE projects and provide interoperability with other systems. They solve the problem of homographs (i.e., words that have the same written form but different meanings, such as “light” as in weight versus “light” as in color). They allow OCHRE projects to employ any language, not just English, and to translate their terms using standard terminologies. More generally, semantic linkages to external controlled vocabularies enable cross-project querying within the OCHRE environment among projects that use different nomenclatures. If each project links its terms to one or more external controlled vocabularies, an OCHRE database query can retrieve similar items that have been described differently by different projects. Alternatively, a project can borrow a taxonomy or part of a taxonomy from another project entirely within the OCHRE database platform itself, as long as the second project has made its taxonomy public for other OCHRE projects to use. This provides another (and often more efficient) way to achieve semantic integration among projects.
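The cross-project retrieval described above can be sketched in Python as a lookup on shared external concept URLs. Everything specific here is hypothetical: the project names, the terms, the `vocab.example.org` URLs, and the function `find_by_concept` are placeholders, and only the "close match" relation name is taken from the text.

```python
# Hypothetical terms from two projects, each linked to an external concept.
# The URLs are placeholders, not real vocabulary identifiers.
linked_terms = [
    {"project": "Project A", "term": "storage jar",
     "links": [("close match", "https://vocab.example.org/concept/1")]},
    {"project": "Project B", "term": "pithos",
     "links": [("close match", "https://vocab.example.org/concept/1")]},
    {"project": "Project B", "term": "lamp",
     "links": [("close match", "https://vocab.example.org/concept/2")]},
]

def find_by_concept(linked_terms, concept_url, relation="close match"):
    """Return (project, term) pairs linked to one external concept."""
    return [
        (entry["project"], entry["term"])
        for entry in linked_terms
        if (relation, concept_url) in entry["links"]
    ]

matches = find_by_concept(linked_terms, "https://vocab.example.org/concept/1")
```

The two projects use different names for the same kind of object, yet both are retrieved because they share a link to one external concept.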