IEEE TCDL Bulletin
 

Current 2006
Volume 2   Issue 2

 

Shortcomings of the Reference Model for an Open Archival Information System (OAIS)

Alexander Egger
Campus02 University of Applied Sciences
Graz, Austria
alexander.egger@campus02.at

 

Abstract

The OAIS model is a well-known guideline for implementing digital archives. When building a software system for a digital archive that conforms to the OAIS model, one faces the problem that the term "OAIS conformity" is only very vaguely defined.

This paper shows how the OAIS model can be integrated into a standard software development process, using the model as user requirements. These requirements are used to identify the use cases of a digital archive that are necessary to develop a software architecture and a software design.

The results of the use case analysis show that the OAIS model has several shortcomings if it is used as a basis for developing a software system. It is therefore necessary to develop additional specifications which fill the gap between the OAIS model and software development.

Section 1 of this paper describes the assumptions made for this paper. The connection between the OAIS model and software development, which has to be established, is outlined as well as the embedding of this problem into a larger project. Section 2 shows how a basic set of use cases can be identified using the OAIS specification. The main scenarios of these use cases are developed in Section 3. The resulting use cases are listed in Section 4. Section 5 sums up the major findings of this paper and suggests developing specifications and models that describe digital archives in more detail than the OAIS specification.

 

Introduction

This paper shows how the Austrian Literature Online consortium tried to integrate the OAIS model into its project to build a trusted digital repository. The consortium has already implemented such a digital repository, which is currently used to run the digital library (http://www.literature.at). As this system is becoming outdated and no longer fulfils the requirements of the consortium, a new system has to be designed.

For the design of the system two sources of user requirements have to be taken into account: the requirements of the consortium, and the requirements of standards that have to be used. In particular, the new system should follow the suggestions of The Reference Model for an Open Archival Information System (OAIS) [CCS02]. In this paper, the author describes his experiences using the OAIS model as a source for software design.

The OAIS model is one of the basic principles for building a digital archive that provides services for a digital library system. It describes all aspects of such an archive and pays particular attention to the challenge of long-term preservation of digital data.

The OAIS specification clearly states that it is not a guideline for an implementation or a design:

This reference model does not specify a design or an implementation. Actual implementations may group or break out functionality differently. [CCS02, page 1-2]

In trying to build a software system for an OAIS-conformant digital archive, one faces the problem of having only a very vague definition of what is understood by the term "OAIS conformity". The fact that the OAIS specification must not be seen as a technical guideline (see Note 1) prompts the question: What relationship does the OAIS specification have to the system to be developed?

The answer to this question is that the OAIS model is part of the user requirements that serve as the starting point for the software development process. However, in order to describe a complete system, the requirements of the specific project have to be added to the requirements of the OAIS model. Actual archive implementations can use their specific requirements to extend this basic design. If the basic system design is highly modular, the additional functionality does not break the OAIS conformance of the basic design.

Based on the assumption that the OAIS model completely describes the basic functionality of a digital archive, it is possible to create a system design of an OAIS-conformant archive. Using the OAIS specification as the definition of the requirements, the design of the system can be developed as part of a standard software development process. For this paper, the software development process described by Bernd Bruegge and Allen H. Dutoit is used [BD04]. The requirements are analysed by means of use cases, as defined by Alistair Cockburn [Coc00]. The next step, which is not described in this paper, is the selection of a system architecture model that determines the basic properties of the system, such as performance and scalability. The detailed description of the elements of the system architecture then results in a system design.

The use case analysis is carried out in two steps: In the first step, Chapter 2.3 OAIS High-level External Interaction of the OAIS specification is used to identify the actors of the system and the use cases they trigger. In the second step, using Chapter 4.1 Functional Model, the main scenarios of the identified use cases are developed. If necessary, new use cases and actors are introduced in this step.

Identification of Actors and Use Cases

The identification of the actors in a system is done according to Cockburn [Coc00], and it is the first step for developing the use cases of a system. Chapter 2 of the OAIS specification contains the first general descriptions that can be used for that purpose.

Using the OAIS environment model, as shown in Figure 1, the following actors can be identified:

  • Producer: A Producer is a person or a system providing information for the archive.
  • Management: Management defines guidelines for the OAIS. This does not include system administration.
  • Consumer: A Consumer is a person or a system retrieving information from the archive.

Chart showing the three actors within the environmental model

Figure 1: OAIS environment model

The first use cases can be derived from Section 2.3 of the OAIS specification. Subsection 2.3.1 of the OAIS specification describes the interaction of the actor "Management" with the archive. Only legal and organisational functions, such as the development of a pricing strategy, are defined here. Since these functions have no effect on the technical systems of the archive, they need not be considered in the use case analysis.

Section 2.3.2 of the OAIS specification describes two interactions with the system for the actor "Producer":

  1. Establish an agreement about the submission of information. This agreement identifies the data that is to be transmitted to the archive, and determines the size and the time period of the transmission. Additionally, details like data formats are defined in this process. These descriptions lead to the first use case called "Establish submission agreement".
  2. Transmit the data to the archive. Within the scope of a "Submission Agreement" several "Submission Information Packages" (SIPs) are transferred to the archive. This leads to the use case "Submit Data".

For the third actor, "Consumer", the analysis of section 2.3.3 of the OAIS specification results in five use cases.

  1. Similar to the Producer, the Consumer has to establish an agreement with the archive about the delivery of information. This agreement does not have to be explicit. It can be a simple click on a link in a web interface of the archive. However, this function is necessary to establish prices and usage guidelines. This leads to the use case "Establish order agreement".
  2. For the retrieval of data from the archive, "Data Dissemination Sessions" are established. During these sessions the requested information is transmitted to the user. This leads to the use case "Disseminate data".
  3. If the users do not already know the data they want to retrieve from the archive, they use the search function of the archive to identify the relevant data. This leads to the use case "Search data".
  4. Once the customer has identified the data he wants to retrieve, he files an order for that data. The OAIS specification distinguishes two types of orders: the "Ad Hoc Order" and the "Event Based Order". An "Ad Hoc Order" is used if the requested data already exists in the system. This leads to the use case "Place ad hoc order".
  5. An "Event Based Order" refers to data not yet available in the system. The data is delivered to the customer as soon as it becomes available. Functions such as subscriptions to documents on a certain topic could be implemented using "Event Based Orders". This leads to the use case "Place event based order".

Figure 2 shows the use cases identified so far using Chapter 2.3 of the OAIS specification. Although "Disseminate data" is described as a separate use case in the OAIS specification, it is only carried out as part of the use cases "Place ad hoc order" and "Place event based order". Therefore, this use case is drawn as an extension to these use cases.

Image showing the identified use cases

Figure 2: Identified use cases
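The assignment of use cases to actors identified so far can also be written down as a small data structure. The following Python sketch is only an illustration of that catalogue: the names `UseCase`, `USE_CASES` and `use_cases_for` are assumptions of this example and are not taken from the OAIS specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UseCase:
    name: str
    actor: str
    # Use cases within which this one is carried out (drawn as an extension).
    extends: Tuple[str, ...] = ()


# Use cases derived from Chapter 2.3 of the OAIS specification.
# "Management" triggers no use cases of the technical system.
USE_CASES = [
    UseCase("Establish submission agreement", "Producer"),
    UseCase("Submit data", "Producer"),
    UseCase("Establish order agreement", "Consumer"),
    UseCase("Search data", "Consumer"),
    UseCase("Place ad hoc order", "Consumer"),
    UseCase("Place event based order", "Consumer"),
    UseCase(
        "Disseminate data",
        "Consumer",
        extends=("Place ad hoc order", "Place event based order"),
    ),
]


def use_cases_for(actor: str) -> List[str]:
    """Return the names of all use cases triggered by the given actor."""
    return [uc.name for uc in USE_CASES if uc.actor == actor]
```

Calling `use_cases_for("Consumer")` lists the five Consumer use cases, while `use_cases_for("Management")` returns an empty list, which mirrors the decision to leave the management functions out of the technical analysis.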

In order to complete the analysis, the next step is to describe the main scenarios for each identified use case. This step is outlined in the next section.

Use Case Scenarios

Chapter 4 of the OAIS specification describes the functional model of an OAIS, as shown in Figure 3.

Chart showing the OAIS functional model and the relationship of its parts

Figure 3: OAIS functional model

Section 4.1.1 of the OAIS specification deals with detailed descriptions of the functional entities of an OAIS. These descriptions can be used to define the main scenarios of the use cases identified in the previous section. As use cases are described from the actor's point of view, a use case often uses functions of several OAIS modules (for example, the use case "Submit data" handles functions of the modules "Ingest", "Data Management" and "Archival Storage"). The OAIS specification describes a function only within the borders of a module. For the use case analysis the functions of all three modules have to be taken into account.

One problem when trying to describe OAIS functions as use case scenarios is that the OAIS specification uses different levels of detail for these descriptions. Some functions are specified nearly at an implementation level; others are general guidelines. For example, on one hand, the module "Archival Storage" has a function called "Replace Media", which is a very low level function. On the other hand, the module "Administration" defines a function called "Establish Standards and Policies". For consistent system analysis, all these descriptions have to be brought to the same level.

Another problem is that there is no actor for some functions in the module "Administration". This is a sign of a poor concept: no system has functions that just happen without a cause. Therefore, new actors have to be defined that are responsible for carrying out these functions.

The subsequent sections summarise the descriptions of the OAIS functional modules and show how these descriptions can be used to define steps for the main scenarios of the use cases.

Ingest

The transfer of a "Submission Information Package" (SIP) to the archive starts with the function "Receive Submission" of the module Ingest. Once the information of the SIP has been stored successfully in the archive, an acknowledgement message is sent to the Producer.

The function "Receive Submission" triggers several other functions of the archive in different modules.

"Quality Assurance" checks whether the transmitted package fulfils the quality standards of the archive. For example, the calculation of check sums may be part of this function.

"Generate AIP" transforms the SIP into an "Archival Information Package" (AIP). This includes data format conversions and the integration of additional information from other modules of the archive.

"Generate Descriptive Information" creates the descriptive information. Additional information (for example, thumbnails) for the presentation of the objects is generated in this step.

Finally, the function "Coordinate Updates" writes the data to storage media and finishes the transaction. In case of an error, this function tries to recover from the error and takes appropriate action.

These descriptions can be used to define steps for the main scenario of Use Case 1 "Submit data".
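How the Ingest functions could chain together for "Submit data" is sketched below. This is a minimal illustration under several assumptions, not a prescribed design: the classes `SIP` and `AIP`, the use of SHA-256 as the checksum, and the plain dictionaries standing in for Archival Storage and Data Management are all hypothetical choices made for this example. Format conversion and error recovery are omitted.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SIP:
    identifier: str
    payload: bytes
    checksum: str  # assumed: SHA-256 hex digest declared by the Producer


@dataclass
class AIP:
    identifier: str
    payload: bytes
    descriptive_info: dict = field(default_factory=dict)


def quality_assurance(sip: SIP) -> bool:
    """Quality Assurance: compare the declared checksum with the computed one."""
    return hashlib.sha256(sip.payload).hexdigest() == sip.checksum


def generate_aip(sip: SIP) -> AIP:
    """Generate AIP: wrap the SIP content (format conversion omitted here)."""
    return AIP(identifier=sip.identifier, payload=sip.payload)


def generate_descriptive_information(aip: AIP) -> dict:
    """Generate Descriptive Information: derive simple metadata from the AIP."""
    return {"identifier": aip.identifier, "size": len(aip.payload)}


def receive_submission(sip: SIP, archival_storage: dict, data_management: dict) -> str:
    """Receive Submission: run the ingest steps and return an acknowledgement."""
    if not quality_assurance(sip):
        raise ValueError(f"checksum mismatch for SIP {sip.identifier}")
    aip = generate_aip(sip)
    aip.descriptive_info = generate_descriptive_information(aip)
    # Coordinate Updates: store the AIP and its descriptive information.
    archival_storage[aip.identifier] = aip
    data_management[aip.identifier] = aip.descriptive_info
    return f"SIP {sip.identifier} accepted"
```

The single entry point `receive_submission` reflects the observation that one use case spans several OAIS modules: the Producer sees one interaction, while the steps touch Ingest, Archival Storage and Data Management.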

Archival Storage

The function "Receive Data" of the module Archival Storage starts with the submission of a SIP to the archive. It is thus part of use case 1, "Submit data". Steps for the storage of the AIP can be added to the main scenario of this use case.

"Manage Storage Hierarchy" manages the data storage devices of the archive. Such functions are features of state of the art hardware storage products. If an archive is built upon such a system, these functions need not be implemented by the software of the archive system. The same is true for the functions "Replace Media", "Error Checking" and "Disaster Recovery".

The request for an object through the Access module triggers the function "Provide Data", which provides the relevant data. This function adds steps to the use case 2 "Disseminate data".

Data Management

Search requests of the users through the Access module are forwarded to the Data Management module. The function "Perform Queries" executes the user's queries. It is part of the use case 5 "Search data".
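A query against the descriptive information might look like the following sketch. It assumes, purely for illustration, that Data Management holds one dictionary of descriptive fields per object identifier and that a query is a set of exact-match criteria; the function name `perform_query` and the field names in the usage example are hypothetical.

```python
from typing import Dict, List


def perform_query(data_management: Dict[str, dict], **criteria) -> List[str]:
    """Return the sorted identifiers of all objects matching every criterion."""
    return sorted(
        object_id
        for object_id, info in data_management.items()
        if all(info.get(key) == value for key, value in criteria.items())
    )


# Usage example with invented descriptive information.
catalogue = {
    "obj-1": {"author": "Stifter", "language": "de"},
    "obj-2": {"author": "Rilke", "language": "de"},
    "obj-3": {"author": "Rilke", "language": "fr"},
}
matches = perform_query(catalogue, author="Rilke", language="de")
```

A real archive would delegate this step to a database or search engine; the sketch only shows where the function sits between the Access module and the stored descriptive information.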

The function "Generate Report" is executed by the modules Ingest, Access and Administration. The description of this function can be applied to add main scenario steps to the use cases 1 "Submit data" and 6 "Request report".

"Receive Database Update" and "Administer Database" are internal functions of the system without any actor interaction. They add no further information to any of the use cases.

Administration

The functions "Negotiate Submission Agreement", "Establish Standards and Policies" and "Physical Access Control" describe general management functions that cannot be implemented in a technical system.

Using "Manage system configuration", the system can be configured and monitored. For this function a new use case that has not been identified so far has to be introduced. This use case is called "Manage system configuration". As the OAIS specification does not mention an actor for this function, a new actor called Administrator is added.

"Archive Information Update" is part of the use case "Manage system configuration". This function is executed when information in the archive has to be updated.

"Audit Submission" checks to see if the data that is to be stored in the archive fulfils the guidelines of the archive. This function is used by the modules Ingest and Preservation Planning. It is therefore part of the use case "Submit data".

The function "Active Requests" stores data of event driven orders and checks to see if the orders can be carried out. This adds steps to the main scenario of use case 4 "Place event based order".
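The behaviour described for "Active Requests" can be sketched as a small registry of standing orders. The sketch assumes that the condition of an event based order can be expressed as a predicate on the descriptive information of a newly ingested object; the names `EventBasedOrder`, `ActiveRequests`, `place` and `on_new_object` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EventBasedOrder:
    consumer: str
    # Assumed form of a condition: a predicate on descriptive information.
    condition: Callable[[dict], bool]


class ActiveRequests:
    """Stores event based orders and checks them against new objects."""

    def __init__(self) -> None:
        self._orders: List[EventBasedOrder] = []

    def place(self, order: EventBasedOrder) -> None:
        """Store the request and its conditions for later monitoring."""
        self._orders.append(order)

    def on_new_object(self, descriptive_info: dict) -> List[str]:
        """Return the consumers whose stored conditions the new object meets."""
        return [o.consumer for o in self._orders if o.condition(descriptive_info)]
```

Orders stay registered after a match, which corresponds to the subscription-like use mentioned for event based orders; generating and transmitting the DIP for each returned consumer is left to the dissemination step.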

Customer service functions include the management of user information. As with "Manage system configuration", a new use case has to be defined for this function. The use case is called "Manage customers". The actor Administrator can be used for this use case as well. As the functions for managing users are well-known, they need not be described by scenarios.

Preservation Planning

The functions of the module "Preservation Planning" describe how the data stored in an OAIS can be kept readable. There are two main processes: "Monitor Technologies" and "Monitor Designated Community". These processes compare the current state of technical systems with the data in the archive. Appropriate long-term preservation steps are carried out if necessary.

"Preservation Planning" is a management process that, in the first step, has no technical implementation. Therefore, no use cases for a technical system are defined by these functions.

The definition of preservation planning as a non-technical task is also discussed in [BRSS03] and [Ber02]. The reason for this lies in the difficulty of knowing what technical systems will look like in the future. It is therefore impossible to implement technical systems that meet the needs of future systems.

Access

The functions of the module Access are triggered by the actor Consumer, who can make three types of requests to the system:
  • Search requests - search the descriptive data of the archive for a specific object,
  • Report requests - create reports using descriptive data and AIPs, and
  • Order requests - these are requests that trigger the transmission of information. They can either be ad hoc orders or event based orders.

For the creation of a report, a new use case called "Request report" (use case 6) is created. The Consumer is the actor for this use case.

The function "Coordinate Access Activities" adds further steps to the main scenarios of the use cases 5 "Search data", 3 "Place ad hoc order" and 4 "Place event based order".

The description of the functions "Generate DIP" and "Deliver Response" can be applied for use case 2 "Disseminate data".
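The interplay of "Provide Data", "Generate DIP" and "Deliver Response" in use case 2 can be illustrated as follows. The sketch assumes that Archival Storage and Data Management are simple mappings from object identifiers to content and descriptive information, and that a DIP is a plain dictionary; the function `disseminate` and its `transform` parameter are hypothetical names for this example.

```python
from typing import Callable, Dict


def disseminate(
    object_id: str,
    archival_storage: Dict[str, bytes],
    data_management: Dict[str, dict],
    transform: Callable[[bytes], bytes] = lambda data: data,
) -> dict:
    """Build a DIP for one object; delivery to the Consumer is left to the caller."""
    # Step 1: the AIP content is retrieved from Archival Storage.
    content = archival_storage[object_id]
    # Step 2: additional information is requested from Data Management.
    info = data_management.get(object_id, {})
    # Step 3: the data is transformed into the requested format.
    converted = transform(content)
    # Step 4: the DIP is generated.
    return {"id": object_id, "content": converted, "descriptive_info": info}
```

Because the function takes no actor input of its own, it is naturally called from the order use cases, which matches the decision to model "Disseminate data" as an extension of "Place ad hoc order" and "Place event based order".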

Use Cases

This section lists the use cases and their main scenarios that were developed in the previous sections. Figure 4 shows an overview of the use cases and their connections to the actors of the system.

Use Case diagram

Figure 4: Use case overview

For a better picture of the system it is also useful to further describe the use cases using UML activity diagrams. Figure 5 shows an example of such a diagram for the use case "Submit data". Activity diagrams can be used as starting point for a UML-based system design.

Activity diagram

Figure 5: Activity diagram of the use case submit data

Use Cases 1 - 6

Use Case 1: Submit data
Primary Actor: Producer
Main success scenario:
1. A SIP is submitted to the OAIS by the Producer.
2. The SIP is validated. Checksums are calculated and checked.
3. The SIP is checked for compliance with the archive standards.
4. The AIP is generated.
4a. File formats are converted.
4b. Descriptive information is requested from Data Management.
5. Descriptive Information is generated.
5a. Descriptive Information is extracted from the SIP.
5b. Additional information is requested from Data Management.
5c. Additional files are generated for viewing and browsing.
6. The AIP is stored in Archival Storage.
6a. The appropriate storage media is selected.
6b. The AIP is transferred to the media.
6c. A storage confirmation is generated.
7. Descriptive Information is stored in Data Management.
8. A confirmation of receipt is returned to the Producer.

Use Case 2: Disseminate data
Primary Actor: Consumer
Main success scenario:
1. The AIP for this request is retrieved from Archival Storage.
1a. The AIP request is received.
1b. The AIP is identified.
1c. The AIP is transferred to Access.
2. Additional information for the DIP is requested from Data Management.
3. The data is transformed into the requested data formats.
4. The DIP is generated.

Use Case 3: Place ad hoc order
Primary Actor: Consumer
Main success scenario:
1. An object is requested by the Consumer.
2. The DIP is generated.
3. The DIP is transmitted to the Consumer.

Use Case 4: Place event based order
Primary Actor: Consumer
Main success scenario:
1. An object is requested, and conditions for the transmission are specified by the Consumer.
2. The request and the conditions are stored and monitored.
3. The conditions can be met, and the DIP is generated.
4. The DIP is transmitted to the Consumer.

Use Case 5: Search data
Primary Actor: Consumer
Main success scenario:
1. A query is submitted by the Consumer.
2. The query is executed in Data Management.
2a. The query is received.
2b. The query result is generated.
3. The query result is returned to the Consumer.

Use Case 6: Request report
Primary Actor: Consumer
Main success scenario:
1. A report is requested by the Consumer.
2. The report is generated using data from Data Management and Archival Storage.
3. The report is returned to the Consumer.

Conclusion

This paper shows how use cases necessary for the development of a software system for a digital archive can be developed using the OAIS specification as user requirements. Several use cases were identified and specified using main scenarios.

The results of this paper have already been used for the design and implementation of an archival system that conforms to the OAIS model. The resulting software called ADIGRES - Austrian Literature Online Digital Repository Software [ADI05] is available as open source.

However, the results of the analysis are somewhat disappointing, because the OAIS model mixes management functionality with technical functionality. For the development of a technical system, all management functionality has to be removed. Furthermore, the OAIS model uses different levels of abstraction to describe its functionality. This leads to the strange split of the use case "Place order" into "Place ad hoc order" and "Place event based order", whereas at such a high level of system description one would expect only the use case "Place order".

It can be said that the OAIS model has shortcomings when it is used as a basis for developing a software system. The mixture of management and technical functionality makes it hard to define the parts of the system that can actually be implemented by software or hardware. Because of the different levels of description, decisions that belong to the software design are already predetermined in the requirements definition. This is a well-known risk for software development projects.

As the OAIS specification has already foreseen, there is a clear need for additional specifications. These specifications should define system architectures and designs that conform to the OAIS model. In areas related to digital archives, such specifications are currently being developed. One example is the Java Specification Request "JSR 170 Content Repository for Java Technology API" [JSR04], which might be a good starting point for defining an OAIS implementation recommendation.

Note

1. The OAIS specification looks like a technical guideline, which led to a lot of confusion in the past and which might be the reason why the specification now states in bold letters that it is not a technical guideline.

Bibliography

[ADI05] ADIGRES - Austrian Literature Online Digital Repository Software. WWW page, 2005. <http://adigres.sourceforge.org>.

[BD04] Bernd Bruegge and Allen H. Dutoit. Objektorientierte Softwaretechnik. Pearson Studium, August 2004.

[Ber02] Bryan Bergeron. Dark Ages II, When the Digital Data Die. Prentice Hall PTR, 2002.

[BRSS03] Uwe M. Borghoff, Peter Rödig, Jan Scheffczyk, and Lothar Schmitz. Langzeitarchivierung. dpunkt.verlag, 2003.

[CCS02] Reference Model for an Open Archival Information System (OAIS). Technical report, Consultative Committee for Space Data Systems, January 2002.

[Coc00] Alistair Cockburn. Writing Effective Use Cases. Addison-Wesley, 2000.

[JSR04] JSR-000170 Content Repository for Java Technology API, 2004. <http://www.jcp.org/aboutJava/communityprocess/review/jsr170/>.

Biography

Alexander Egger studied Telematics at the Technical University Graz and the University Carlos III in Madrid. He has worked on the development of software for digital libraries in several national and international research projects. He is currently employed as a research scientist by the University of Applied Sciences Campus02 and is working on his doctoral thesis at the Kepler University Linz. Alexander Egger does research on digital libraries as a member of the Austrian Literature Online research group.

Austrian Literature Online was founded in 1999 to digitise the one thousand most important books of Austria. The Austrian Universities of Graz, Linz and Innsbruck were the first members of the project. Alexander Egger developed a first prototype of a digital library for this project and has since been in charge of most software development projects of the group.

The most important sub projects of ALO are METAe, Books2u and Reuse, which were funded by the European Union.

With METAe, the Austrian Literature Online group, in cooperation with libraries, universities and firms from all over Europe and the USA, developed software that automatically extracts text and metadata from digital page images of books. Using OCR and layout analysis technologies, METAe is able to recognise the logical structure of a book (for example, the chapter structure, chapter headings, footnotes and the table of contents). Books2U replaced inter-library loan with a workflow based on digitisation. Old and valuable books are no longer sent to users but are digitised on demand and can be accessed via the internet. Reuse tries to add "born digital" objects to the digital library.

Currently the Austrian Literature Online group works together with the Humboldt University Berlin in the Sun Center of Excellence for Trusted Digital Repositories.

 

© Copyright 2006 Alexander Egger
