Many libraries and archives are in the process of going digital. The advantages of digital technology are well known and its adoption by libraries and archives seems inevitable, inexorable and well-motivated. Yet the fact remains that several key issues concerning the long-term preservation of digital technologies remain unsolved. Two key problems are the fragility of digital media (its shelf life compared with, say, non-acidic paper is extremely short) and, perhaps even more intractable, the rate at which computer hardware and software become obsolete. Many cases have been cited in which valuable data has already been lost because of obsolescence. Moreover, as of today no one knows how to ensure the long-term preservation of multimedia documents nor how to ensure the integrity of documents that may have many links to other documents that may be anywhere in the world. For a brief overview of some digital preservation issues see  and . These problems have, of course, been exercising the library and archive communities for some time but as yet no one solution or set of solutions has been reached. Solutions need to be found urgently if we are not to sink in what Rothenberg  calls technological quicksand.
Given the problems just outlined, the dominant approach to digital preservation has been that of migration. Migration is the process of transferring data from a platform that is in danger of becoming obsolete to a current platform. This process has both dangers and costs. The notable danger is that of data loss, or in some cases the loss of original functionality or the look and feel of the original platform. For these reasons, some have seen emulation as an alternative and superior strategy. The essential idea behind emulation is to be able to access or run original data/software on a new/current platform by running software on the new/current platform that emulates the original platform. The staunchest advocate of emulation has been Jeff Rothenberg whose views I will examine in some detail.
What are the issues with respect to emulation as a digital preservation strategy? I suggest the picture appears as in Figure 1. I discuss briefly each of the major issues listed.
If emulation is to be adopted, the first question is what is to be emulated? Three options are:
- Emulate applications
- Emulate operating systems
- Emulate hardware platforms
I will provide some discussion of these options when discussing Rothenberg.
Figure 1: Issues for emulation
It is here that emulation has its strongest attraction. One of the issues for digital preservation is what needs to be preserved in addition to the pure intellectual content of a digital object. In some cases the look and feel of a digital object may be important, as may its interactivity. The fear is that migration to new platforms may lose these aspects, and that emulation may be the only strategy for preserving them. This problem is exacerbated, of course, by the increasing use of more complex digital objects (multimedia and hypermedia).
Some analysis needs to be undertaken to provide a framework for examining the issue of when these aspects of digital objects are important, with a view to providing data for a cost-benefit analysis for the justification of the likely additional costs of preserving these complex aspects of digital objects.
Intellectual Property Rights (IPR) issues may well be involved in emulating either operating systems or applications.
It seems almost certain that the costs of emulation will mean that emulation will not (if unsupported) be an option for every user. This will be one part of a case for wanting trusted organisations that can undertake the work and make this available for others to use. Of course the nature and cost of these organisations are major issues but not issues that only affect emulation.
Also likely to be important for emulation as a preservation strategy are the issues of standards and open specifications. An open environment and the adoption of standards are likely to make emulation more feasible and cost-effective.
It is highly probable that metadata will play a key role in any emulation strategy -- defining precisely what this metadata should be and in what format is likely to be a major task. See ,  and .
As a general comment, if Rothenberg's scepticism with respect to approaches other than emulation is justified, and on the other hand, his optimism with respect to emulation is not justified (an open issue in my view), then the whole digital project is in serious trouble. Therefore I am concerned to examine Rothenberg's attitudes as well as his main technical arguments. I do not attempt to provide a full account of Rothenberg's approach but rather do attempt to raise some issues for discussion.
First I examine his dismissal of other approaches.
Rothenberg's Views of Other Approaches
Reliance on hard copy
As a simple method of preserving a digital document, why not print it out and save it as hard copy? Rothenberg rightly points out that this is not a complete solution since some multimedia documents cannot be properly printed out at all and any interactivity a document possesses will be lost. It is worth noting, however, that if one's sole concern is with the intellectual content of a document and the document is of a fairly simple nature (a word processed document that can be read by a current word processor) this at least provides some form of security. It is also worth noting that preservation strategies have often accepted loss of some of the characteristics of the originals (microfilm loses both the texture and colour of the original). Rothenberg's attitude seems to be that if we can't have everything, we can't have anything.
Reliance on standards
Rothenberg discusses claims made on behalf of relational databases that they provide a kind of lingua franca; since the same mathematical model (Codd's) underlies them all, any can be translated into any other. This, as he points out, is limited by the fact that some relational databases use proprietary features. Worse still, paradigm shifts -- such as that from relational databases to object-orientated ones -- can bypass or make irrelevant accepted standards. These points are valid, but Rothenberg says that "standardisation sows the seeds of its own destruction by encouraging vendors to implement non-standard features in order to secure market share." (, p.10) This seems to me a very odd apportioning of blame. It is surely not the fault of standardisation if vendors implement non-standard features; it is the fault of the vendors. Vendors have exacerbated the problem of digital preservation, and users need to fight back. Of course, it suits vendors to make everyone constantly upgrade hardware, software and operating systems, and not to worry about backward compatibility, but this does not make such behaviour good for users.
Reliance on computer museums
The idea that computer museums be established, where old machines would run original software to access obsolete documents (Swade 1998 ) is highly problematic for several reasons that Rothenberg cites. These include:
- It is unlikely that old machines could be kept running indefinitely at any reasonable cost.
- Old digital documents (and the original software needed to access them) will rarely survive on their original digital media. The obsolete document would therefore have to be read by an obsolete machine from a new medium for which that machine has no physical drive, no interface, and no device software. The museum approach would therefore require building unique new device interfaces between every new medium and every obsolete computer in the museum as new storage media evolve, as well as coding driver software for these devices, which would demand maintaining programming skills for each obsolete machine.
- Computer chips have a limited life-time.
Rothenberg does see two possible minor roles for computer museums in digital preservation: testing emulators and helping data recovery.
Reliance on migration
In his report  at least, Rothenberg is extremely scathing about migration. He provides a whole list of reasons why he believes migration to be inadequate as a digital preservation strategy. Rothenberg states that migration is:
- Labour intensive
- Risky (causing lost or corrupted information)
and that migration can require new solutions for each new format.
This is a formidable list and in my view there is some justice in all of the points made, but I think two things need to be borne in mind:
- Migration has been the only serious candidate thus far for preservation of large scale archives. (Emulation has been used in some marginal cases, but currently no major archive will adopt emulation as its primary digital preservation strategy).
- Not all of the points apply with equal force all of the time; with migration, many of these claims will vary on a case-by-case basis. Moreover, one has to ask, "Labour intensive, expensive, etc.," compared with what? It is impossible to evaluate these claims against Rothenberg's vision of emulation since so much of it remains, as he says, a long way off.
The Ideal Solution
Rothenberg's idea of the ideal solution is certainly demanding:
"an ideal approach should provide a single, extensible, long-term solution that can be designed once and for all and applied uniformly, automatically, and in synchrony (for example, at every future refresh cycle) to all types of documents and all media, with minimal human intervention. It should provide maximum leverage, in the sense that implementing it for any document type should make it usable for all document types. It should facilitate document management (cataloging, de-accessioning, and so forth) by associating human-readable labeling information and meta-data with each document. It should retain as much as desired (and feasible) of the original functionality, look, and feel of each original document, while minimizing translation so as to minimize both labor and the potential for loss via corruption. If translation is un-avoidable (as when translating labeling information), the approach should guarantee that this translation will be reversible, so that the original form can be recovered without loss.
The ideal approach should offer alternatives for levels of safety and quality, volume of storage, ease of access, and other attributes at varying costs, and it should allow these alternatives to be changed for a given document, type of document, or corpus at any time in the future. It should provide single-step access to all documents, without requiring multiple layers of encapsulation to be stripped away to access older documents, while allowing the contents of a digital document to be extracted for conversion into the current vernacular, without losing the original form of the document. It should offer up-front acceptance testing at accession time, to demonstrate that a given document will be accessible in the future. Finally, the only assumptions it should make about future computers are that they will be able to perform any computable function and (optionally) that they will be faster and/or cheaper to use than current computers."
Even supposing that this ideal approach is feasible, doubts would remain, I think, about whether it would be required in all cases. Would the ideal approach be the most cost effective solution for any and all requirements?
How Should Emulation Be Achieved?
The major parts of Rothenberg's approach to emulation are:
- developing generalizable techniques for specifying emulators that will run on unknown future computers and that capture all of those attributes required to recreate the behavior of current and future digital documents;
- developing techniques for saving -- in human-readable form -- the metadata needed to find, access, and recreate digital documents so that emulation techniques can be used for preservation; and
- developing techniques for encapsulating documents, their attendant metadata, software, and emulator specifications in ways that ensure their cohesion and prevent their corruption.
This list implies a very ambitious plan of work and would seem to imply very large overheads. Rothenberg provides a diagram (see Figure 2) which shows how much needs to be encapsulated:
Figure 2. Encapsulation
First, the information that has to be encapsulated comprises the document and its software environment. Central to the encapsulation is the digital document itself, consisting of one or more files representing the original bit stream of the document as it was stored and accessed by its original software. In addition, the encapsulation contains the original software for the document, itself stored as one or more files representing the original executable bit stream of the application program that created or displayed the document. A third set of files represents the bit streams of the operating system and any other software or data files comprising the software environment in which the document's original application software ran.
The second kind of information is an emulator specification which is supposed to:
"specify all attributes of the original hardware platform that are deemed relevant to recreating the behavior of the original document when its original software is run under emulation." 
This will include:
"... interaction modes, speed (of execution, display, access, and so forth), display attributes (pixel size and shape, color, dimensionality, and so forth), time and calendar representations, device and peripheral characteristics, distribution and networking features, multi-user aspects, version and configuration information, and other attributes." 
The third kind of information to be encapsulated consists of explanatory material, labeling information, annotations, metadata about the document and its history, and documentation for the software and (emulated) hardware included in the encapsulation.
Admittedly, some of this information can be represented by pointers to the information stored in a centralised repository, but it is nevertheless a daunting list, especially since no one currently knows how to produce some of the information (e.g., how does one produce an emulator specification?). Rothenberg himself acknowledges that there could well be IPR issues with respect to operating systems, for example.
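To make the scale of this list concrete, the three kinds of information can be sketched as a simple data structure. What follows is only an illustrative sketch in Python, not Rothenberg's design: the names `BitStream`, `Encapsulation`, `manifest` and `missing_parts` are hypothetical, the use of SHA-256 digests to detect corruption is an assumption made here, and the emulator specification is reduced to a plain dictionary precisely because no one yet knows what form it should take.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class BitStream:
    """One preserved file: a name plus its original, unmodified bytes."""
    name: str
    data: bytes

    def digest(self) -> str:
        # Assumption: a SHA-256 digest is used to detect later corruption.
        return hashlib.sha256(self.data).hexdigest()


@dataclass
class Encapsulation:
    """Hypothetical container for the three kinds of information."""
    # First kind: the document and its software environment.
    document: list[BitStream]
    application: list[BitStream]
    environment: list[BitStream]  # operating system and other files
    # Second kind: attributes of the original hardware platform.
    emulator_spec: dict[str, str]
    # Third kind: explanatory material, labels, history, documentation.
    metadata: dict[str, str] = field(default_factory=dict)

    def manifest(self) -> dict[str, str]:
        """Map every stored file name to the digest of its bit stream."""
        streams = self.document + self.application + self.environment
        return {s.name: s.digest() for s in streams}

    def missing_parts(self) -> list[str]:
        """Name the parts of the encapsulation that are still empty."""
        parts = {
            "document": self.document,
            "application": self.application,
            "environment": self.environment,
            "emulator_spec": self.emulator_spec,
            "metadata": self.metadata,
        }
        return [name for name, value in parts.items() if not value]


# Placeholder bytes stand in for real files in this example.
capsule = Encapsulation(
    document=[BitStream("report.doc", b"original document bytes")],
    application=[BitStream("winword.exe", b"application bytes")],
    environment=[BitStream("os.img", b"operating system bytes")],
    emulator_spec={},  # the part nobody yet knows how to produce
    metadata={"title": "Annual report"},
)
print(capsule.missing_parts())  # prints ['emulator_spec']
```

Even in this toy form, the document itself is only one of five parts that must be collected, kept together and checked, which gives some sense of the overheads involved.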
What should be emulated?
Rothenberg argues that -- normally at least -- what should be emulated is the hardware platform on which an application runs rather than emulating an application or emulating an operating system. This raises quite a large number of issues. With respect to multimedia systems, for example, the number of possible configurations is very large -- consider the rapid rate at which the speed of CD ROMs changed. I suspect that a mere 2x speed CD ROM drive is now practically an obsolete piece of equipment. How many changes in sound cards have there been? There have at least been the changes from 8 bit to 16, 32 and 64 bit, without taking into account the difference in the quality of speakers that different people will have had. The preservation of the precise quality of a given multimedia production would be a formidable challenge.
Discussion of Rothenberg
Rothenberg's approach is highly theoretical and one could almost say absolutist; it seems that nothing short of a complete, once and for all solution will satisfy him. His approach does not seem to concern itself with particularities of the immediate context. In some ways this is admirable, but to my mind it raises great doubts about the feasibility of his approach. Aside from questions about the technical feasibility of the approach, there are equally large doubts about its practical cost-effectiveness. The Task Force report  recognised the need for trusted organisations on a number of grounds. It seems to me that were such organisations to be created, one of their roles could be to provide support for emulation -- under certain well defined conditions. I believe that this would be essential to make Rothenberg's approach work since it seems highly probable that only if such institutions were created could the overheads implied by the approach be met. But, of course, the creation of such institutions will necessarily be a political matter -- which argues against the kind of one-off complete theoretical solution that Rothenberg demands. Equally important, Rothenberg's approach will also come up against political issues when it comes to IPR. The theoretical nature of Rothenberg's approach does not seem to recognise that the current situation we are in could be described as a digital jungle. The extreme pace of technological change is exacerbating the challenge of digital preservation. Digital preservation problems were created when the world moved much more slowly -- from now on it is going to get much tougher.
Bearman on Rothenberg
In an opinion piece in D-Lib Magazine, April 1999 , David Bearman provided some fairly trenchant criticism of Rothenberg's views. At the core of these criticisms was Bearman's view that,
"... Rothenberg is fundamentally trying to preserve the wrong thing by preserving information systems functionality rather than records. As a consequence, the emulation solution would not preserve electronic records as evidence even if it could be made to work and is serious overkill for most electronic documents where preserving evidence is not a requirement."
It is important to understand that what Bearman means by a record is something much stronger than what many of us non-archivists understand (in our naivety probably thinking of a database record as a RECORD). In another of his papers  Bearman says,
"most information created and managed in information systems is not a record because it lacks the properties of evidence. Information captured in the process of communication will only be evidence if the content, structure and context information required to satisfy the functional requirements for record keeping is captured, maintained and usable."
I have no reason or authority to question this claim, and it is indeed an important one, but I do think it is perhaps a little hard on Rothenberg to criticise him for not anticipating a particularly specialised set of requirements.
Another criticism Bearman makes  is as follows,
"It is worth noting that systems migration literature suggests that a significant contributor to high cost and low reliability migrations is the absence of unambiguous specifications of the source and target environments. The problem, quite often is that either (or both) source and target are protected by proprietary interests of the software producers. If this is a hurdle in systems migration, it is an absolute barrier to the viability of the emulation which will try to replicate a system environment years, decades, or perhaps centuries, after it became obsolete. Rothenberg acknowledges that "saving proprietary software, hardware specifications and documentation, as required by this emulation strategy, raises potential intellectual property issues". He correctly identifies this as "required" for the solution to function, yet refers to it as an "ancillary issue" in the title he gives the two paragraphs (section 8.3) in which it is discussed. Nowhere does Rothenberg suggest the dimensions of the Herculean social task that would be involved in creating a trusted agency for acquiring such proprietary information and administering it over time."
These are important points; Rothenberg's almost casual, hand-waving references to emulator specifications surely gloss over a potential minefield of difficulties.
Writing in 1997 about the difficulties of digital preservation, Bennett  makes some telling points pertinent to the view that the idea of a complete emulator specification may well be naïve,
"... if a digital item is captured today, its components will represent a legacy of technology, possibly from the last five years. An item can be assessed as to the age of its components.
Presuming that the item was captured (written, edited, scanned, composed) on the latest equipment, it is likely that less than 5% of the total is represented by 1997 technology, for example, bug fixes. The latest Microsoft suite of office software (Office 97) will contribute 50% of the technology legacy, but it will be of 1996 vintage, which was when it was tested, possibly on advance shipments of the new hardware. The rest is mainly 1995 (35%), being standard core routines from Windows 95, unchanged by Office 97. Finally, elements of the base MS-DOS operating system (DOS version 7, and DOS emulation code) will remain embedded in the architecture of the PC system, this may still account for perhaps 10%. In contrast, Windows NT and OS/2 were written without any progenitors, and have a completely different composition ... If the same document, image or spreadsheet were captured in March 1998, the proportions would have changed, particularly if the hardware and software configuration had been kept up to date. In the main, however, many PC users are using software which is based on a platform which is pre-1995. Because it is suitable for their purposes, is reliable, at least with known glitches, they have made no attempt to change the basic configuration, adding components when required, year by year.
... even the concept of "migration" is not adequate to describe the changes that are occurring in every aspect of technology, hour by hour. We are using hardware and software components that are in a continuous state of transition in our office systems ... Compaq built up their reputation by guaranteeing that the internal construction of their PC does not change, whether you order 10 or 100 from stock. Many other suppliers deliver varying internal configurations for the same model, which causes many problems during upgrades, maintenance and trouble shooting. Software fixes are embodied within the next release of office packages as they are shipped, and the new configurations are rarely announced. We are using these "chameleon" PC systems to record critical aspects of our culture. The trend is to ever more complex technical implementations, easier to use for the user, but hiding increasingly complex interactions on the inside."
Consider also the emulation experiment reported by Ross and Gow . They experimented with emulating Sinclair Spectrum tapes on a PC.
"To run an emulation you need to access the information on the tapes. We used a standard radio cassette recorder to "play" the tapes. One of the advantages of having a Sinclair expert is the pleasure of watching him listen to the tapes to discern the difference between the Spectrum loader or earlier Sinclair systems, such as the ZX80 or ZX81. The different loaders sound different when played through a normal cassette recorder. This sort of knowledge only comes with experience. It is this experience that can assist in data recovery. Faced with unmarked cassettes, it is possible to identify a Sinclair Spectrum or indeed any system by listening to it. When the various clicks and beeps were explained, it is easy to make the distinction. But the expertise comes from many years of working with the tapes."
If the above example seems outlandish, consider the fact that many of us have experienced hours of frustration when installing new hardware or software on a PC when, in principle, there should have been no difficulty at all.
Three Positions on Emulation
There would appear to be two extreme positions on emulation and one middle ground position. One extreme position is what Bearman aptly describes as the magic bullet position:
"a simple, universally applicable, one-time fix."
Approximately, at least, that is Rothenberg's position [ii] and it is subject to the difficulties discussed above.
The other extreme position is that emulation has no role to play in digital preservation. Bearman does not exactly adopt this view but comes quite close to it.
Neither of the extreme positions seems to me to be tenable. What is the middle ground? A very modest view would be that emulation, sometimes at least, may play a role in rescue operations. (Indeed, it already has.) I think it would be hard to deny that emulation may play such a role, but is there a more general, strategic role it could play?
Bennett says the following,
"In an archive, it may be necessary to handle some emulations, but this can only be tenable in the short term, while both the emulated and the host emulator are current in technology terms. Obsolescence for the host environment will bring double jeopardy for the emulated environment. Archiving of an emulation and its dependants should be considered only for the near term, and in the advent of destructive forces."
This is indeed in the middle ground as I have defined it. But I'm inclined to think that this position is too conservative. In particular I question whether it is essential that the emulated environment must be current. Consider the following scenario. An institution has a large archive of documents in Word 6 format. These documents are important, but it is rare for any given document to be accessed, so the institute wants to retain the ability to retrieve any particular document when the need arises. In this case I want to suggest that in principle there could be a strategic choice between migration and emulation. The institute could choose to migrate all the documents to a new format. Alternatively, it could choose to retain the documents in their original format (refreshing the media as required), providing the institute ensures that it retains the ability to read Word 6 documents. That ability, at the current time, would hardly constitute emulation at all, but surely current platforms could change in such a way that emulation would be required. (It could be that Windows emulation becomes required.) If that were the case, would not the archive be justified in making the strategic decision to opt for emulation even though the original platform had become obsolete?
Emulation as a Strategic Digital Preservation Strategy - Some Imaginary Scenarios
I want to consider the example just mooted in more detail with the hope of showing that, over a long period of time, different preservation strategies may need to be employed. In this view the best strategy at a given time could be contingent on particular factors.
By definition I take a platform to be obsolescent when any part of that platform becomes obsolete. Consider the following situations:
Archive A exists as a set of Word 6 documents. These are stored on 8mm DLT tape. It is not only WINTEL machines that can read these documents, however; they can be read on many other computers and operating systems as well (I believe this to be the case). Now it could be argued that the application Word (version 6) is obsolete. Should the archive migrate all the documents to a more recent version of Word? In this case it would hardly seem worth the expense, since the archive rarely needs to access these documents and the means for reading them seems in no danger of becoming obsolete. The archive could read the documents with a more recent word processor which would convert the document to a new format. When the archive does access a document, what should it do with it afterwards, save it in the new format? This would seem a sensible thing to do.
Suppose that at time T1 nothing else has changed but the tapes are reaching the end of their shelf-life. The relevant part of the platform (DLT tape readers) is not obsolete, however, so the archive chooses to refresh the tapes.
By this time DLT tape drives are becoming less common and are in danger of becoming obsolete. However, the ability to read Word 6 documents on current computers and operating systems is not threatened. In this case the archive decides to adopt migration with respect to the media, copying it to the recently discovered molecular storage system, whilst sticking to the original format of Word 6.
Radical changes in application software and operating systems start to threaten the ability to read Word 6 documents. The archive decides to use an emulator developed by the European Institute for Digital Archiving, which had been established ten years earlier, and to stick with emulation as its main strategy for the larger part of its archive. However, those documents which had been accessed and saved in a more recent word processor format the archive chooses to migrate to the new platform, application and operating system. The media is refreshed. Here one would have emulation, migration and refreshing all being employed simultaneously.
At this time the effort required to maintain the Word 6 emulator is becoming onerous, so the European Institute for Digital Archiving issues a warning. The archive decides that the best option is to migrate all the documents to the latest standard and platform.
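The sequence of decisions in these scenarios can be summarised as a small decision procedure. The sketch below is a deliberately simplified illustration in Python under stated assumptions: the class `PlatformState`, its four flags and the functions `is_obsolescent` and `choose_actions` are hypothetical names introduced here, and a real archive would weigh costs and risks rather than four yes/no conditions.

```python
from dataclasses import dataclass


@dataclass
class PlatformState:
    """Hypothetical snapshot of an archive's platform at one point in time."""
    media_near_end_of_life: bool  # e.g. tapes reaching the end of shelf life
    media_reader_obsolete: bool   # e.g. tape drives can no longer be relied on
    format_readable: bool         # current software can still read the format
    emulator_maintained: bool     # a trusted body still supports an emulator


def is_obsolescent(state: PlatformState) -> bool:
    """A platform is obsolescent when any part of it has become obsolete."""
    return state.media_reader_obsolete or not state.format_readable


def choose_actions(state: PlatformState) -> list[str]:
    """Pick preservation actions for the current state (simplified rules)."""
    actions = []
    # Media: migrate if the reader is going, otherwise refresh ageing media.
    if state.media_reader_obsolete:
        actions.append("migrate media")
    elif state.media_near_end_of_life:
        actions.append("refresh media")
    # Format: emulate while an emulator is supported, otherwise migrate.
    if not state.format_readable:
        if state.emulator_maintained:
            actions.append("emulate")
        else:
            actions.append("migrate format")
    return actions or ["no action"]


# Ageing tapes, but drives and format are still usable: refresh only.
t1 = PlatformState(media_near_end_of_life=True, media_reader_obsolete=False,
                   format_readable=True, emulator_maintained=False)
print(choose_actions(t1))  # prints ['refresh media']
```

The point of the sketch is only that the chosen strategy is a function of the state of the platform at a given time, so that refreshing, migration and emulation can each be the right answer at different moments in the life of the same archive.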
I should add that, in agreement with many others, I would always favour preserving the original bit streams.
Emulation and the CAMiLEON Project
CAMiLEON (Creative Archiving at Michigan and Leeds) is a joint NSF/JISC funded project whose aim is to assess emulation as a digital preservation strategy. To achieve this aim the project will:
- Evaluate publicly available emulators
- Explore emulator development
- Conduct test cases (using Apple II and BBC Micro computers and specifically investigating the use of emulation applied to the BBC Domesday Project) from both technical and user perspectives
- Conduct user trials comparing original systems with emulation of those systems
- Undertake cost-benefit analysis of emulation vs. other digital preservation strategies.
There is a case, I believe, for the view that emulation may have a strategic role to play in digital preservation. My view is more optimistic than that of most commentators but not as optimistic as Rothenberg's view. I do not believe that emulation can resolve all digital preservation issues.
Preserving Digital Information: Report of the Task Force on Archiving of Digital Information, Commission on Preservation and Access and The Research Libraries Group, Inc., May 1, 1996,
Digital Collections: A Strategic Policy Framework for Creating and Preserving Digital Resources, Arts and Humanities Data Service, N. Beagrie & D. Greenstein, 1998. Available at:
Metadata to Support Data Quality and Longevity, Jeff Rothenberg, IEEE, 1996,
Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation, Jeff Rothenberg, 1999. Available at:
"Who will create the Metadata for the Internet?" Charles F. & Linda S. Griffin, First Monday Vol. 3 No. 12, December 7, 1998. Available at:
The CAMiLEON Project,
"Metadata and Digital Preservation: a plea for cross-interest co-operation," Stewart Granger, VINE, 1999, Theme Issue on Metadata, Issue 117, Part 2, ISSN 0305 5728, pp.24-29 London University. Also available at:
"Reality and Chimeras in the Preservation of Electronic Records", David Bearman, D-Lib Magazine, April 1999. Available at:
"Toward a Reference Model for Business Acceptable Communication", David Bearman, 1994. Available at:
"Digital Archaeology: Rescuing Neglected and Damaged Data Resources", S. Ross and A. Gow, February 1999. Available at:
"A Framework of Data Types and Formats, and Issues Affecting the Long Term Preservation of Digital Materials", J.C. Bennett, 1997. Available at:
"The Problems of Software Conservation", Doron Swade, Computer Conservation Society. Available at:
[i] Issues about emulation as a digital preservation strategy are being studied in a joint JISC/NSF Project: CAMiLEON: Creative Archiving at Michigan and Leeds: Emulating the Old on the New. See .
[ii] Perhaps the caveat that would have to be made is that Rothenberg does not claim that the solution is simple, but he does claim that it would be universal and one-off (if we can ever achieve it!).
Copyright© 2000 Stewart Granger