The Digital Archaeological Record--an Analytic Data Repository for Archaeology

In the past 150 years, the discipline of archaeology has changed dramatically: excavation procedures, field methods, and record keeping have both improved and become formalized. Before the digital era, the physical records of an excavation (the papers, data tables, journals, and monographs) were preserved as artifacts alongside the excavated materials in museums and repositories. More recently, archaeologists have been quick to adopt new technology, from punch cards in the 1960s to spreadsheets, databases, GIS, and 3D scanning. Yet these modern files, images, data sets, and documents, if not properly preserved, are more fragile than the objects they describe. With the Digital Archaeological Record (tDAR), we hope to change this.

tDAR was designed as a domain-specific digital repository focused on the preservation of, and access to, archaeological documents, reports, data sets, and images. The most successful digital repositories provide value to their users beyond the core mission of preservation. Repositories such as arXiv or the University of Rochester's digital repository succeed because of additional factors such as reputation or community. For tDAR, the additional value comes from research tools built on top of the repository. These tools aim to promote new synthetic and comparative research using the data sets stored within the repository.

tDAR's architecture includes three components: a backend preservation repository based on the California Digital Library's Micro-Services model, an interactive web interface, and a research platform. Metadata is stored within tDAR using an extension of the Library of Congress MODS schema, modified to add archaeologically significant metadata. This includes descriptive metadata about the site, location, culture, and materials found, among other attributes. Data sets ingested into tDAR function differently from data sets in a traditional repository. Once a data set has been uploaded, users are guided through the process of documenting it within the system. This process focuses on capturing information that a machine cannot infer from the values alone, such as whether numeric data represents a measurement or a count, and on translating coded values or lookup tables into human-readable values. Once documentation is complete, tDAR's additional features make it possible to compare, contrast, and analyze data within the system.
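The documentation step can be pictured as attaching a small record to each column. The sketch below is a minimal illustration under stated assumptions, not tDAR's actual data model: the names `ColumnType`, `ColumnDoc`, and `decode_rows`, the example columns, and the burning codes are all hypothetical. It records whether a column is a measurement, a count, or a coded value, and uses a coding table to translate coded values into human-readable labels.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class ColumnType(Enum):
    """Information about a column that cannot be inferred from its values alone."""
    MEASUREMENT = "measurement"  # e.g. a length in millimetres
    COUNT = "count"              # e.g. number of identified specimens
    CODED = "coded"              # values are keys into a coding (lookup) table


@dataclass
class ColumnDoc:
    """Hypothetical documentation record a contributor fills in for one column."""
    name: str
    column_type: ColumnType
    unit: Optional[str] = None  # only meaningful for measurements
    coding_table: Dict[str, str] = field(default_factory=dict)


def decode_rows(rows: List[dict], docs: List[ColumnDoc]) -> List[dict]:
    """Return copies of rows with coded values replaced by human-readable labels.

    Columns documented as measurements or counts, and undocumented columns,
    are passed through unchanged; the input rows are not modified.
    """
    by_name = {doc.name: doc for doc in docs}
    decoded = []
    for row in rows:
        out = {}
        for column, value in row.items():
            doc = by_name.get(column)
            if doc is not None and doc.column_type is ColumnType.CODED:
                # Keep unknown codes visible instead of silently dropping them.
                out[column] = doc.coding_table.get(value, f"UNKNOWN CODE {value!r}")
            else:
                out[column] = value
        decoded.append(out)
    return decoded


# Example (hypothetical): a faunal table whose "burn" column uses a site-specific code.
docs = [
    ColumnDoc("burn", ColumnType.CODED,
              coding_table={"0": "unburned", "1": "charred", "2": "calcined"}),
    ColumnDoc("nisp", ColumnType.COUNT),
    ColumnDoc("length", ColumnType.MEASUREMENT, unit="mm"),
]
rows = [{"burn": "1", "nisp": 4, "length": 23.5},
        {"burn": "9", "nisp": 1, "length": 10.0}]
decoded = decode_rows(rows, docs)
```

Decoding produces new rows rather than overwriting the uploaded values, in keeping with the goal of preserving the data as it was collected; an unrecognized code such as `"9"` is flagged rather than guessed.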

A significant challenge for many disciplines is performing synthetic research. Data from archaeological excavations commonly combine standardized observational data, such as Munsell codes recording sediment color and GPS/GIS readings, with more qualitative assessments of artifact types or of the amount of "burning" on faunal elements. Within the context of a specific site, these are easily reconciled, as the team develops a common understanding of the terms. However, using these classifications outside a given site, region, or community of archaeologists can be challenging. Certain data may lend itself to universal classification schemes, including data that is more scientific or derived from a well-documented period. More qualitative data, however, may not map as easily to a universal classification model, since definitions of terms vary between archaeologists or over time. Instead, contributors may provide or develop their own classification scheme (ontology) to describe their data. Both approaches are well-trodden roads in research and practice, each with distinct benefits. However, any useful comparison requires that a mapping be developed.

tDAR does not force users to map data to universal data models or classification schemes. Instead, it takes a different approach: maintaining the data in its original form and capturing the intent of the archaeologist. The application has been developed with reference ontologies for certain data elements, including faunal species data, among others. tDAR also enables users to create additional ontologies within the repository or to upload existing ones in the OWL format. Once a column of data is associated with an ontology, users are presented with straightforward tools to map its unique data values to terms within the ontology. We believe this process serves several purposes: it not only maintains and represents the data as it was collected, but also provides opportunities for collaboration and communication within the discipline as archaeologists share data and discuss their intent.
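The value-mapping step can be sketched as follows. This is a minimal illustration, not tDAR's implementation: the ontology is a tiny hypothetical taxonomic fragment held as a Python parent map rather than an OWL file, and the names `ONTOLOGY`, `distinct_values`, and `apply_mapping` are invented. The sketch lists a column's distinct values, applies a user-supplied mapping, and reports values that are still unmapped (or mapped to a term the ontology does not contain) so the user can resolve them. The raw values themselves are never changed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ontology fragment: each term maps to its parent (None for the root).
ONTOLOGY: Dict[str, Optional[str]] = {
    "Mammalia": None,
    "Artiodactyla": "Mammalia",
    "Cervidae": "Artiodactyla",
    "Odocoileus": "Cervidae",
    "Odocoileus hemionus": "Odocoileus",
    "Lagomorpha": "Mammalia",
    "Leporidae": "Lagomorpha",
    "Lepus": "Leporidae",
    "Sylvilagus": "Leporidae",
}


def distinct_values(rows: List[dict], column: str) -> List[str]:
    """Unique values of one column, in first-seen order, as a mapping tool might list them."""
    seen: List[str] = []
    for row in rows:
        if row[column] not in seen:
            seen.append(row[column])
    return seen


def apply_mapping(values: List[str], mapping: Dict[str, str],
                  ontology: Dict[str, Optional[str]]) -> Tuple[Dict[str, str], List[str]]:
    """Split values into those mapped to a valid ontology term and those still unmapped."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for value in values:
        term = mapping.get(value)
        if term is not None and term in ontology:
            mapped[value] = term
        else:
            # Either no mapping yet, or the mapping points at a term outside the ontology.
            unmapped.append(value)
    return mapped, unmapped


# Example with invented site-specific identifications.
rows = [{"taxon": "MD"}, {"taxon": "jackrabbit"}, {"taxon": "MD"},
        {"taxon": "rabbit"}, {"taxon": "bird?"}]
mapping = {"MD": "Odocoileus hemionus", "jackrabbit": "Lepus", "rabbit": "Sylvilagus sp."}
values = distinct_values(rows, "taxon")
mapped, unmapped = apply_mapping(values, mapping, ONTOLOGY)
```

Because the mapping lives alongside the data rather than replacing it, the original labels (such as "MD") remain available for anyone who wants to revisit how a contributor's terms were interpreted.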

Once data has been mapped, the application guides the user through the data integration process: selecting data sets, identifying columns to compare, fine-tuning any mapping issues, and producing the new combined data set. In an analog context, or outside of tDAR, this process can be time-consuming for a single data set and overwhelming for several. Within tDAR, a complex process that would take days or weeks when performed manually becomes much more fluid, taking hours. With the technology doing much of the heavy lifting, the archaeologist is free to focus on the specific questions and details of their research.
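A rough sketch of the integration step, assuming each data set has already been mapped to a shared ontology. The data set names, counts, ontology fragment, and the functions `roll_up` and `integrate` are all hypothetical. Rolling mapped terms up to a chosen level of the hierarchy (here, genus or family) is one plausible way to make identifications of differing precision comparable; it is not necessarily the method tDAR uses.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical fragment of a faunal ontology: term -> parent term.
PARENT: Dict[str, str] = {
    "Odocoileus hemionus": "Odocoileus",
    "Odocoileus virginianus": "Odocoileus",
    "Odocoileus": "Cervidae",
    "Lepus": "Leporidae",
    "Sylvilagus": "Leporidae",
}


def roll_up(term: Optional[str], targets: Set[str]) -> Optional[str]:
    """Walk up the ontology until reaching one of the target terms; None if none is found."""
    while term is not None:
        if term in targets:
            return term
        term = PARENT.get(term)
    return None


def integrate(datasets: Dict[str, Tuple[List[dict], Dict[str, str]]],
              targets: Set[str]) -> Tuple[Dict[str, Dict[str, int]], List[Tuple[str, str]]]:
    """Combine counts from several data sets at the level of the chosen target terms.

    datasets maps a data set name to (rows, mapping); each row has a raw "taxon"
    value and a "count", and mapping sends raw values to ontology terms.
    Returns ({target term: {data set name: total count}}, unresolved values).
    """
    table: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    unresolved: List[Tuple[str, str]] = []
    for name, (rows, mapping) in datasets.items():
        for row in rows:
            target = roll_up(mapping.get(row["taxon"]), targets)
            if target is None:
                # Report rather than drop, so the user can fine-tune the mapping.
                unresolved.append((name, row["taxon"]))
            else:
                table[target][name] += row["count"]
    return {term: dict(counts) for term, counts in table.items()}, unresolved


# Example with two invented data sets that use different local vocabularies.
site_a = ([{"taxon": "MD", "count": 12}, {"taxon": "jackrabbit", "count": 30}],
          {"MD": "Odocoileus hemionus", "jackrabbit": "Lepus"})
site_b = ([{"taxon": "deer", "count": 5}, {"taxon": "cottontail", "count": 41},
           {"taxon": "unid. bird", "count": 3}],
          {"deer": "Odocoileus", "cottontail": "Sylvilagus"})
table, unresolved = integrate({"Site A": site_a, "Site B": site_b},
                              {"Odocoileus", "Leporidae"})
```

Returning unresolved values instead of discarding them mirrors the fine-tuning step described above: the user can extend a mapping and rerun the integration, while the source data sets stay unchanged.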

While tDAR is still under development, its data integration tools have already enabled archaeozoologists to ask novel questions about the cultural and ecological circumstances under which species are overhunted or subsistence strategies change. We hope that tDAR's core values of access, preservation, and integration will enable us to ask, understand, and evaluate new questions and ideas that would otherwise be impossible within the field of archaeology.

