Digital Humanities Lab - Universität Basel (University of Basel)
Introduction
Digital storage systems are like tins. They may hold some content, but the only evidence of it is a label, a caption, or some other kind of lettering. The weight of the tin may offer a further hint about what is inside, but to be certain the tin has to be opened so that the content can be identified correctly. Opening it requires an appropriate tool, and taking something out requires another, such as a fork or a spoon.
A digital storage device such as a hard drive behaves very similarly. With magnetic recording, however, the data is not only invisible from inside and outside the device; a human being cannot detect it physically at all. Digital data has no appearance that can be touched, and we have no sense for it.
In addition, any bit stream stored on a data carrier needs to be migrated after a certain time to ensure accessibility and consistency, because the following four factors endanger the digital archiving process:
Storage media decay over time and can fail through aging.
Hardware becomes incompatible, so that accessing the data becomes impossible.
File formats (technical metadata) change and develop over time, so programs that interpret the file content might not be available in the future.
Missing contextual metadata make any digital bit stream more or less useless.
Migration
Each of these factors is a major drawback for cultural heritage preservation, and each one turns data into digital waste: data that is either lost completely or stripped of its meaning. Continuous migration and copying of storage content at periodic intervals are today’s best practice for carrying binary information into the future. In theory, migration works very well because digital data can be copied without loss, which is one of the major advantages of any digital code, such as binary information or even our alphabet. The ability to copy data losslessly not only allows the data carrier to be replaced at will, it also allows redundancy to be increased by storing multiple copies, as proposed, for example, by LOCKSS (1). The downside of migration is the financial effort associated with it. Independent of the specific costs per migration, archiving becomes expensive sooner or later because of the short lifetime of the technology. In addition, its dependence on numerous cascaded technologies makes it a fragile process that can cause dramatic data loss if only one of the components involved fails.
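The claim that digital copies are lossless, and that redundant copies protect each other, can be checked mechanically. The following is a minimal sketch, not the LOCKSS protocol itself: it assumes the copies are available as byte strings, uses SHA-256 digests as the comparison, and treats the most common digest as the reference. The function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a copy as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(copies: Dict[str, bytes]) -> List[str]:
    """Return the names of copies whose digest differs from the majority.

    Assumption for this sketch: the digest shared by most copies is
    taken to be the intact one.
    """
    digests = {name: digest(data) for name, data in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority_digest)


if __name__ == "__main__":
    original = b"archived bit stream"
    copies = {
        "site_a": original,
        "site_b": original,
        "site_c": b"archived bit strean",  # one corrupted byte
    }
    print(find_damaged_copies(copies))
```

A damaged copy found this way can be replaced from any intact one, which is exactly the step that has to be repeated at every migration and drives its cost.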
Migration can be omitted if the storage medium fulfills the following requirements:
It must contain human-readable metadata in order to describe the archived object and its context.
Information on how to recover the original file (the decoding manual) must be part of the metadata. This knowledge is the key to interpret the archived byte stream.
The file format must be well documented (open format) and it must be widely used to ensure its accessibility over time.
Digital data must be stored as hardware-independently as possible, so that it is not affected by changes in technology.
If a medium claims to be suitable for long-term preservation of digital data, it has to fulfill further requirements. Lunt et al. (2) identified seven characteristics that are particularly interesting to archivists regarding the preservation of digital data: 1) no active maintenance or migration is required to preserve the actual data; 2) no special storage conditions are necessary to preserve the storage media; 3) a lifetime of at least 100 years, preferably more, is supported; 4) no energy is required to maintain the data; 5) the media are easily transported; 6) the data format is widely adopted; 7) the medium has a large storage capacity.
Bits-on-Film Approach
In view of these facts, the Digital Humanities Lab of the University of Basel has developed “Monolith”, a workflow for migration-less preservation of digital data on optical media. It combines the advantages of photographic material and standard digital imaging technology to create a long-term, migration-less archiving system. This is achieved through the hybrid characteristics of the optical carrier: any arbitrary binary bit stream is placed right beside human-readable technical, structural, and contextual metadata. Original files of any appropriate format are stored on film as visual 2D barcodes. Technically speaking, every bit of the original bit stream (the file to be stored) is converted into a spot representation on film. A full bit stream then results in a two-dimensional image, an “image of bits”. In other words, the logical data bits are transformed into, and represented by, the dye or silver of photographic film. This process can be regarded as a materialization of binary data, which becomes visual and physical. Monolith places no limitation on the format of the file to be archived. However, the documentation of the file format must be part of the metadata, and the format should therefore be an open standard such as the widely used PDF/A or an image format such as JPEG 2000 (3). Metadata can be stored in binary form or as human-readable text, e.g. written in letters on film according to a well-known standard such as Dublin Core or METS.
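The conversion of a bit stream into an “image of bits” can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions, not Monolith's actual barcode layout: bits are written most significant bit first, rows have a fixed width chosen by the caller, the last row is padded with zeros, and the original length in bytes is assumed to be recorded in the metadata so that the padding can be discarded on decoding. All function names are hypothetical.

```python
from typing import List


def bytes_to_bit_rows(data: bytes, width: int) -> List[List[int]]:
    """Lay out the bits of `data` row by row in a grid `width` spots wide."""
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    bits.extend([0] * ((-len(bits)) % width))  # pad the last row with zeros
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def bit_rows_to_bytes(rows: List[List[int]], length: int) -> bytes:
    """Rebuild the first `length` bytes from a grid of bits."""
    bits = [bit for row in rows for bit in row]
    out = bytearray()
    for i in range(0, length * 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def render(rows: List[List[int]]) -> str:
    """Draw the grid as text: '#' for an exposed spot, '.' for a clear one."""
    return "\n".join("".join("#" if bit else "." for bit in row) for row in rows)


if __name__ == "__main__":
    payload = b"Monolith"
    grid = bytes_to_bit_rows(payload, width=16)
    print(render(grid))
    assert bit_rows_to_bytes(grid, len(payload)) == payload
```

The decoder needs nothing but the grid and the length, which is why the decoding manual and the technical metadata can be written in plain text next to the bit pattern.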
This approach has various advantages. First, and most important, the bit stream on film can be captured by any digital camera; no special hardware is necessary to transform the physical representation of the bits back into logical states within the computer system. This can be compared to the process of seeing: just as human beings see letters (which are in fact digital data), the camera sees signs, spots on film that represent binary data. Monolith has a visual interface. The decoding of the binary bit stream is well defined because the explanation of the code is an inseparable part of the technical metadata set written on film. Unlike other storage media, Monolith can contain not only barcodes and text but also images, e.g. thumbnails. Besides its technological features, the storage film has another advantage: it can be stored in the same way as regular archival film. No special storage conditions are necessary, nor does the film need any specific care. Monolith can therefore be regarded more as an “engraved stone” than as a data storage for computer systems. It is a “Digital Rosetta Film”.
But is the use of optical film for archival purposes reasonable these days? Many companies have stopped producing high-fidelity film material, and very likely the quality (not the stability) of film will drop in the future. For the representation of photographic images this is of course a major drawback, since image quality is directly related to film quality. In the case of Monolith this is irrelevant. The only function the film has to fulfill is to separate dots spatially, a requirement that is met by most photographic materials. The quality of digital originals is not affected by film quality because they are stored as binary data, and decay of the material therefore has little impact. In addition, any well-known error correction method can be applied. The concept of a binary representation of data is a simple but very efficient solution, and it is the reason why every computer storage system adopts this concept (4). Even if film is no longer available in the future, this has no impact on any existing Monolith, neither on its sustainability nor on the future ability to recover the data.
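Why a binary representation tolerates material decay can be shown with a small numerical sketch. The source does not say which error correction method Monolith uses; a three-fold repetition code with majority voting is swapped in here purely because it is the simplest one to read. The density values, the fading factor, and the threshold are illustrative assumptions, and the function names are hypothetical.

```python
from typing import List


def read_spots(densities: List[float], threshold: float = 0.5) -> List[int]:
    """Decide for each measured density whether the spot is a 1 or a 0."""
    return [1 if d >= threshold else 0 for d in densities]


def encode_repetition(bits: List[int], copies: int = 3) -> List[int]:
    """Write every bit `copies` times in a row."""
    return [bit for bit in bits for _ in range(copies)]


def decode_repetition(bits: List[int], copies: int = 3) -> List[int]:
    """Recover each bit by majority vote over its group of repeated spots."""
    return [
        1 if sum(bits[i:i + copies]) * 2 > copies else 0
        for i in range(0, len(bits), copies)
    ]


if __name__ == "__main__":
    message = [1, 0, 1, 1]
    coded = encode_repetition(message)

    # Ideal film: exposed spots are dense, clear spots are nearly transparent.
    densities = [0.9 if bit else 0.1 for bit in coded]

    # Uniform fading lowers every density but keeps the two levels apart.
    faded = [d * 0.7 for d in densities]

    # A local defect destroys a single spot.
    faded[0] = 0.2

    recovered = decode_repetition(read_spots(faded))
    assert recovered == message
    print(recovered)
```

Fading changes the measured values but not the decision at the threshold, and the isolated defect is outvoted by its two intact neighbours. A photographic image stored on the same film would show both kinds of damage directly.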
All these features show that materialized bits are not only a nice concept that mimics historic documents but also an efficient way to transport digital cultural heritage into the future (5). In the presentation we will show how Monolith works and what its advantages are.
Monolith™ on 35mm color material
Fig. 1: Monolith™ includes all information necessary for future data recovery, in particular the contextual metadata and the decoding manual needed to understand the structure of the bit pattern.
Conclusion
Monolith is a solution that has made its way from the university to a commercial company. It shows that there is an alternative to classical digital archiving that does not need to be migrated. The advantages are not only technical but also economic. Even though costs for plain storage media decrease with time, the total cost of ownership for archived digital data has increased continuously in recent years, and migration is the primary cost driver of archiving. Monolith can therefore be an answer not only to the technological but also to the economic challenges that digital information faces on its way into the future.
References
1. Vicky Reich & David S.H. Rosenthal. LOCKSS (Lots Of Copies Keep Stuff Safe), Presented at Preservation 2000: An International Conference on the Preservation and Long Term Accessibility of Digital Materials, December 7-8, 2000, York, England. Also published in The New Review of Academic Librarianship, vol. 6, no. 1, 2000, pp. 155-161. doi:10.1080/13614530009516806
2. Barry M. Lunt, Matthew R. Linford, Robert Davies (2012), Research on Another Permanent Data Storage Solution, Proc. Archiving 2012, IS&T, pp. 19-21
3. A list of file formats suited for archiving can be found at www.kost-ceco.ch/wiki/whelp/KaD/index.php.
4. Rodney Shaw, Selected Reading in Image Evaluation, SPSE, ISBN 0-89208-085-X
5. Florian Müller, Peter Fornaro, Lukas Rosenthaler, Rudolf Gschwind (2010) PEVIAR: Digital Originals, ACM Journal on Computing and Cultural Heritage, Volume 3, Issue 1. ACM 2010
Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne
Lausanne, Switzerland
July 7, 2014 - July 12, 2014
377 works by 898 authors indexed
Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/
Attendance: 750 delegates according to Nyhan 2016
Series: ADHO (9)
Organizers: ADHO