Entity and Property Inference for Semantic Archives
The National Archive of Torre do Tombo (hereafter TT) is the backbone of Portuguese institutional memory, managed by DGLAB, the public administration partner in EPISA. It holds the country's most relevant cultural heritage collection, largely digitized and accessed both by history researchers and by laypeople from all the Portuguese-speaking countries and beyond. The vast amounts of archival description metadata help them find and contextualize the documents they seek. Being at the forefront of the archival world, TT designed its online description system 20 years ago, according to the standards of the International Council on Archives (ICA). Metadata in TT is mainly composed of textual descriptions of the context and contents of the documents. Meanwhile, the archival assets have evolved to encompass growing amounts of born-digital information, and the interoperability requirements of cultural heritage repositories have grown. A new generation of description tools is needed that spans libraries, archives and museums (LAM) and is more fine-grained, more flexible and, above all, more machine-actionable. These are the characteristics of linked open data (LOD) in semantic networks, and preliminary work in TT led to the choice of the CIDOC Conceptual Reference Model (CRM) (ISO, 2014), a standard developed in the museum community. The conceptual model of CIDOC CRM is a graph where nodes are entities and edges are relations. The huge step represented by such a paradigm shift raises many issues, some of which this project sets out to solve. The first problem is the effective migration from the ICA standards to CIDOC CRM, requiring both the use of existing crosswalks and the inference of the new relations with semi-automated methods. The second problem is support for description, with tools that automate part of the generation of the more complex CIDOC CRM metadata records.
The third problem concerns interfaces for both human users and machines, which should improve user access to archives and promote interoperability with both archives and global semantic networks. The role of TT as a large archival institution (it comprises the headquarters in Lisbon and most of the district archives) and also as a regulator for the state, municipal and private archives ensures that the project results will have impact if the paradigm shift becomes the norm. Furthermore, TT's extensive record of innovation makes it a respected voice in the ongoing debate on the evolution of archival description. Three main impacts are expected from the project. First, the proposed change in cultural heritage metadata will give users better knowledge of the repository and an improved tool for more precise and richer retrieval. The second impact is a stronger presence in the aggregators, mainly Europeana, which already uses a similar description approach. The third impact is the potential to deal with metadata assets in different platforms, from Excel files to archival description systems, and thus to contribute to the integration of diverse administrative and research assets into the Digital Archive of the Public Administration.
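
To make the graph model and the crosswalk-based migration mentioned above more concrete, the following is a minimal sketch in Python and not the project's actual tooling. It assumes a flat description record with the hypothetical fields `reference_code` and `title`, and a hypothetical two-entry crosswalk. The class and property names (E31 Document, E35 Title, E42 Identifier, P102 has title, P1 is identified by) come from CIDOC CRM, but the choice of mapping, the node naming scheme and the sample record are illustrative assumptions only.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject node, relation, object node)

# Hypothetical crosswalk from flat description fields to a CIDOC CRM
# (property, class of the target node) pair. Field names are illustrative.
CROSSWALK: Dict[str, Tuple[str, str]] = {
    "title": ("P102_has_title", "E35_Title"),
    "reference_code": ("P1_is_identified_by", "E42_Identifier"),
}


def record_to_triples(record: Dict[str, str]) -> List[Triple]:
    """Turn one flat description record into graph edges."""
    # Assumed node naming scheme: the reference code identifies the document node.
    doc = f"doc/{record['reference_code']}"
    triples: List[Triple] = [(doc, "rdf:type", "E31_Document")]
    for field, (prop, target_class) in CROSSWALK.items():
        value = record.get(field)
        if not value:
            continue  # field absent from this record: emit nothing
        node = f"{field}/{value}"
        triples.append((doc, prop, node))  # edge from document to value node
        triples.append((node, "rdf:type", target_class))  # type of value node
    return triples


if __name__ == "__main__":
    # Invented sample record, not taken from the TT catalogue.
    sample = {"reference_code": "X-001", "title": "Sample letter"}
    for triple in record_to_triples(sample):
        print(triple)
```

A crosswalk of this kind only covers fields that map one-to-one. Relations that are implicit in the free-text descriptions would still need the semi-automated inference the project proposes.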