IEEE-TCDL Position Statement
The IEEE Computer Society established the Technical Committee on Digital Libraries in 1997. Its purpose is to promote research in the theory and practice of all aspects of Collective Memory, i.e. the fields of Digital Libraries, Digital Museums and Digital Archives of all kinds.
In the past, global networks have usually transported textual information, but there is a growing need for these networks to transport other forms of information such as images, video, and audio. Until recently, electronic information sources served mainly specialized clients, but now these sources will be accessed by a wide range of users, from computer specialists, discipline experts, and engineers to the general public, including novice computer users and students at all levels.
These trends have created an important research discipline: digital libraries. Several US agencies, including NASA, ARPA, and NSF, have made a considerable amount of money available over the past few years to support research in this field. The European Union and countries including Canada, China, Japan, India and Australia have also invested in digital library development.
More recently, the term “(Digital) Collective Memory” has been used to describe the convergence of libraries, museums, archives and collections of all kinds, including those of private citizens. Especially in Europe, the connection of Collective Memory with Cultural Heritage has been recognized and plays an important role in interdisciplinary research.
For these reasons, we will try to cover a wide range of “Collective Memory” topics and their technologies in our Technical Committee on Digital Libraries, and we encourage everybody who is interested in these subjects to join, whether they are applied to culture, corporate knowledge, government, or private life. We will provide a discussion forum for all of us via our Newsletter and our Bulletin, both of which will be edited and maintained by Bonita Wilson (who is also Editor, D-Lib Magazine).
Collective Memory development faces challenges in several areas, including the subdisciplines we summarize here.
- Storage. A Collective Memory’s storage system must be capable of storing a large amount of data in a variety of formats and of accessing this data as quickly as possible. Text-only documents, stored in formats such as ASCII, LaTeX, HTML, XML, and PostScript, are by far the easiest to store. Digital audio, video, simulations and animations are more difficult to store because they require significantly more storage space and their delivery is time-dependent. A typical Collective Memory technical infrastructure might use a variety of database-management systems (DBMSs). Current DBMSs range from relational and extended relational systems to document-oriented database systems (e.g. XML). Relational DBMSs are very often used to store metadata and indexes, with attributes that contain pointers to files in a file system. Document-oriented database systems are slowly gaining acceptance and overcoming earlier standardisation, scaling and implementation problems. A document model can make it easier to specify, store, and work with real-world objects such as images, videos or maps. Compression techniques save storage and speed up transmission. For text-only documents, freeware gzip utilities provide anywhere from 10 to 60 percent compression. Several important compression standards exist for digital images (JPEG), audio (MP3), and video (MPEG). All of these areas will continue to be developed and improved.
- User interface. The user interface, perhaps the most important component of a digital Collective Memory, must incorporate a wide variety of techniques to afford rich, direct-manipulative interaction between users and the information they seek. For computer workstations, graphical user interface systems, including solutions based on HTML or XML/XSL, are the status quo, but they are enriched with many concepts of direct-manipulative behaviour. In addition to common functionality for search, browse, and submission/withdrawal of text-only documents, the user interface of a digital Collective Memory system must also provide new features, such as the handling of video documents, annotation, and cross-language search and retrieval mechanisms. The inclusion of multimedia material requires an extension of the basic submission functionality for textual documents. The interface permits the insertion of video documents and their classification by means of appropriate document and metadata schemas (e.g. based on Dublin Core and RDF), and allows a description and subject classification to be included for each document. The search functionality of user interfaces has to allow the formulation of both content-oriented and metadata-oriented queries. The navigational techniques of user interfaces for browsing usually apply key-frames to display extracts from video clips as thumbnails. The interfaces support the real-time display of video material as well as the display of related meta information, hypermedia annotation, textual content description, abstracts and key-frames.
- Classification and indexing. Classification and indexing schemes are used to collect related content into groups that are intuitive to a user. The Semantic Web research effort has recently recognized, however, that classifying and indexing objects is filled with pitfalls, because individual perceptions of ontologies and metadata engines vary depending on subject area, context and even the feeling of the day. Another complicating factor in indexing and classifying is the tremendous amount of potential content that remains to be indexed. It is clear that manual methods for classification are insufficient for all but the most trivial Collective Memory systems. Automated classification systems differ significantly in their approaches, depending on the type of content under consideration. Classifying short stories is quite different from classifying geographic maps, both in terms of the mechanics involved and the appropriate classes. These distinctions make current automated classification efforts highly domain-specific and error-prone. Automated document classification methods can be grouped into two general approaches (statistical and analytical), but neither can yet sufficiently capture the meaning of the document contents. Image classification approaches are conceptually different from those used for text classification. Although many domain-specific systems allow "content-based" querying, most are restricted to a very narrow range of images and may require the services of human classifiers. Video classification and indexing require systems that can parse video into semantically relevant portions. As with image classification, the type of classification and indexing performed on video is driven by the types of queries posed by users. The classification of audio, musical notation, maps, sequence data, etc. presents additional research challenges.
- Information retrieval. The concepts underlying information retrieval were conceived long before computers and information systems were employed to store library and Collective Memory materials. In this domain, a variety of information retrieval and data-mining techniques exists, including metadata searching and content searching for all data types. It is difficult to quantify the effectiveness of information retrieval and data mining; only an individual user can determine what is truly useful. Techniques to improve retrieval effectiveness include preprocessing documents to extract additional metadata before storing them in a document base. Research also focuses on automating the creation and maintenance of user profiles and applying these profiles to information retrieval. Software agents are an extension of filtering techniques, although filtering tends to imply passive mechanisms whereas the use of agents implies a more proactive approach. Many definitions of software agents have been put forth, ranging from an adaptable information filter to an autonomous program that works in conjunction with or on behalf of a human user. Software agents also embody the notion of improving over time as they record additional user actions and reactions.
- Content delivery. Once an object of interest has been located in the Collective Memory, it may be delivered in several ways. If the content is small, such as 100 pages of text or a 100-Kbyte image file, it may be delivered through the same channel used for information retrieval and querying. Content such as movies and software, however, demands much higher bandwidth. In these cases, delivery may be over dedicated lines (for example, cable TV or videoconferencing systems) or satellite-based systems, as they still offer a more reliable quality of service than the Internet. Increased demands for networking bandwidth come from two main fronts. Firstly, the number of Collective Memory users will undoubtedly increase. If the Internet is any indication, exponential growth in the number of users will be the rule. Secondly, as the delivery of multimedia data becomes the norm, the demands for high bandwidth increase. However, high bandwidth, in and of itself, is not enough to support Collective Memories. The intelligent use of bandwidth and the ability to guarantee a quality of service for a given time period are also required. Today's open networking standards such as TCP/IP and the ever-growing Internet make it clear that successful digital libraries must be built on an open, interoperable networking infrastructure. Current wired or wireless LANs run at 10 to 100 Mbits per second. Wide-area Internet backbones run at 1.5 Mbps to 2.5 Gbps, while links to individual organizations fall in the range of 56 Kbps to 1.5 Mbps. Individual users typically connect to the Internet through service providers, local universities, or other organizations at 64 Kbps to several Mbps. However, individual users who use wireless telephones are frequently restricted to 9600 bps, which poses a big challenge when delivering Collective Memory objects.
- Presentation. Users of a traditional library usually want to read a book or watch a videotape; other uses are rare. With digital technology, it now becomes possible to listen to a book being read, watch a video of a musical performance alongside the original score, or hold a virtual object. Other possible uses are highly personal; individuals may dream up many distinct variations. A Collective Memory’s presentation systems must therefore be flexible and highly customizable. They must also be aware of the output hardware's capabilities and limitations, automatically adjusting to deliver the best possible presentation quality at all times. In addition, the system should know and adjust automatically to the personal preferences of individuals (e.g. blind people) in content and presentation.
- Administration and Preservation. Traditional libraries store a copy of a book or other document. Traditional museums store physical artefacts. Collective Memory systems may store several versions of a document in a way that makes multiple revisions by multiple authors possible. They may store different digital presentations of physical artefacts. In addition, a digital Collective Memory may have multiple owners, in terms of both the sources of the content and the annotations made to the content of the library, and it has to ensure that rights are properly maintained or that violations can at least be tracked. An administrative system ensures that materials intended for public viewing can indeed be viewed by anyone, while private collections and personal annotations may be viewed only by a selected group or a single individual. Data-versioning techniques track the history of such revisions. Security mechanisms must be put into place to ensure that only authorized users gain access. Current digital collections employ the basic security measures offered by the supporting operating systems. For example, any digital library can restrict access using username and password authentication and protect files using group membership and file-access rights. This basic security will not meet the demands of large-scale digital libraries. Digital Collective Memories will have to be preserved and made available for long periods of time. Changes in technology, organisational structures and responsibilities, as well as the simple aging of electronic storage media, endanger preservation and access. Techniques, processes and controls have to be developed and established to solve this serious problem for digital memories.
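To make the figures in the Content delivery item above more concrete, the following minimal Python sketch estimates how long the 100-Kbyte image mentioned there would take to transfer at some of the quoted link rates. It is an idealized illustration only: the function and variable names are hypothetical, 1 Kbyte is assumed to be 1000 bytes, and protocol overhead, latency and congestion are ignored.

```python
# Illustrative only: idealized transfer times for the object size and link
# rates quoted in the Content delivery item. Assumes 1 Kbyte = 1000 bytes and
# ignores protocol overhead, latency and congestion.


def transfer_seconds(size_bytes: int, rate_bps: float) -> float:
    """Return the ideal time in seconds to send size_bytes at rate_bps."""
    if rate_bps <= 0:
        raise ValueError("rate_bps must be positive")
    return size_bytes * 8 / rate_bps


# The 100-Kbyte image file mentioned in the text.
IMAGE_BYTES = 100 * 1000

# Link rates taken from the text, in bits per second.
LINK_RATES_BPS = {
    "wireless telephone (9600 bps)": 9_600,
    "organizational link (56 Kbps)": 56_000,
    "organizational link (1.5 Mbps)": 1_500_000,
    "LAN (10 Mbps)": 10_000_000,
}


def report() -> dict:
    """Map each link name to the ideal transfer time, rounded to 0.01 s."""
    return {
        name: round(transfer_seconds(IMAGE_BYTES, rate), 2)
        for name, rate in LINK_RATES_BPS.items()
    }


if __name__ == "__main__":
    for name, seconds in report().items():
        print(f"{name}: {seconds} s")
```

Under these assumptions, the image needs well over a minute at 9600 bps but less than a second at 1.5 Mbps, which illustrates why low-rate wireless access is singled out as a challenge.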
Contexts and strategic visions
Besides considering a Collective Memory as a container extending conventional library (cataloguing) practice, it is necessary to enlarge the level of granularity, because many new areas dealing with digital repositories of information are under development. Digital Earth, Digital Sky, Digital Bio, Digital Law, Digital Art, Digital Music and, most of all, Digital Libraries for Education provide new metadata schemas focused on the information content rather than on information entities. The traditional handling of electronic versions of books, journal articles, images and videos has to be expanded to cover new concepts, theories, models and hypotheses, as well as experimental results and measurements. The TCDL will take particular care to follow the technical challenges related to these newly emerging fields.
Technical Committee on Digital Libraries
Membership in the Technical Committee on Digital Libraries is free; you don’t even have to be a member of the IEEE Computer Society. We invite you to join and contribute ideas, suggestions, comments, and time. For more information, see our home page at http://www.ieee-tcdl.org or come through the IEEE Computer Society's home page at http://www.computer.org