Position Statement
Past Chairman:
Erich J. Neuhold, Fraunhofer-IPSI, Germany
neuhold@ipsi.fhg.de
Administration Assistant: Marcello L'Abbate, Fraunhofer-IPSI, Germany
labbate@ipsi.fhg.de
The IEEE Computer Society established the Technical Committee on
Digital Libraries in 1997. Its purpose is to promote research in the
theory and practice of all aspects of Collective Memory, i.e. the fields
of Digital Libraries, Digital Museums and Digital Archives of all
kinds.
In the past, global networks have usually transported textual
information, but there is a growing need for these networks to
transport other forms of information such as images, video, and
audio. Until recently, electronic information sources served mainly
specialized clients, but now these sources will be accessed by a
wide range of users: computer specialists, discipline experts,
engineers, and the general public, including novice computer users
and students at all levels.
These trends have created an important research discipline:
digital libraries. Several US agencies, including NASA, ARPA, and
NSF, have provided considerable funding over the past few years to
support research in this field. The European Union and other
countries, including Canada, China, Japan, India, and Australia,
have also invested in digital library development.
More recently, the term “(Digital) Collective Memory”
has been used to describe the convergence of libraries, museums,
archives, and collections of all kinds, including those of private
citizens. Especially in Europe, the connection of Collective Memory
with its Cultural Heritage has been recognized and plays an
important role in interdisciplinary research.
For these reasons, we will try to cover a wide range of
“Collective Memory” topics and their technologies in our
Technical Committee on Digital Libraries, and we encourage everybody
who is interested in these subjects to join, whether they are applied
to culture, corporate knowledge, government, or private life. We will
provide a discussion forum for us all via our Newsletter and our
Bulletin, both of which will be edited and maintained by Bonita
Wilson (who is also Editor, D-Lib Magazine).
Technical challenges
Collective Memory development faces challenges in several areas,
including the subdisciplines we summarize here.
- Storage. A Collective Memory’s storage system must
be capable of storing a large amount of data in a variety of
formats and accessing this data as quickly as possible. Text-only
documents, stored in formats such as ASCII, LaTeX, HTML, XML,
and PostScript, are by far the easiest to store. Digital
audio, video, simulations and animations are more difficult to
store because they require significantly more storage space and
their delivery is time-dependent. A typical Collective Memory
technical infrastructure might use a variety of database-management
systems. Current DBMSs range from relational and extended relational
systems to document-oriented database systems (e.g. XML databases).
Relational DBMSs are very often used for the storage of metadata
and indexes with attributes that contain pointers to files in a
file system. Document-oriented database systems are slowly gaining
acceptance and overcoming earlier standardisation, scaling and
implementation problems. A document model can make it easier to
specify, store, and work with real-world objects such as images,
videos or maps. Compression techniques save storage and speed up
transmissions. For text-only documents, freeware gzip utilities
provide anywhere from 10- to 60-percent compression. Several
important compression standards exist for digital images (JPEG),
audio (MP3), and video (MPEG). All of these subjects will continue
to be developed and improved.
- User interface. The user interface, perhaps the most
important component of a digital Collective Memory, must
incorporate a wide variety of techniques to afford rich,
direct-manipulative interaction between users and the information
they seek. For computer workstations, graphical user interface
systems, including solutions based on HTML or XML/XSL, are the
status quo, but they are enriched with many concepts of direct
manipulative behaviour. In addition to common functionalities for
search, browse, and submission/withdrawal of text-only documents, a
user interface of a digital Collective Memory system must also
provide new features, such as the handling of video documents,
annotation, and cross-language search and retrieval mechanisms. The
inclusion of multimedia material requires an extension of the basic
submission functionality for textual documents. The interface
permits the insertion of video documents and their classification
by means of appropriate document metadata (e.g. based on Dublin
Core and RDF-oriented schemas) and allows a description and subject
classification to be included for each document. The search
functionality of user interfaces has to allow
the formulation of both content-oriented and metadata-oriented
query types. The navigational techniques of user interfaces for
browsing usually apply key-frames to display extracts from video
clips as thumbnails. The interfaces support the real-time display
of video material as well as the display of related meta
information, hypermedia annotation, textual content description,
abstracts and key-frames.
- Classification and indexing. Classification and indexing
schemes are used to collect related content into groups that are
intuitive to a user. The Semantic Web research effort has recently
recognized, however, that classifying and indexing objects is filled
with pitfalls, because individual perceptions of ontology and
metadata engines vary depending on subject area, context, and
even the mood of the day. Another complicating factor in
indexing and classifying is the tremendous amount of potential
content that remains to be indexed. It is clear that manual methods
for classification are insufficient for all but the most trivial
Collective Memory systems. Automated classification systems differ
significantly in their approaches, depending on the type of content
under consideration. Classifying short stories is quite different
from classifying geographic maps, both in terms of the mechanics
involved and the appropriate classes. These distinctions make
current automated classification efforts highly domain-specific and
error-prone. Automated document classification methods can be
grouped into two general approaches (statistical and analytical),
but neither can yet sufficiently capture the meaning of the
document contents. Image classification approaches are conceptually
different from those used for text classification. Although many
domain-specific systems allow "content-based" querying, most are
relegated to a very narrow range of images and may require the
services of human classifiers. Video classification and indexing
requires systems that can parse video into semantically relevant
portions. As with image classification, the type of classification
and indexing performed on video is driven by the types of queries
posed by users. The classification of audio, musical notation,
maps, sequence data, etc. presents additional research
challenges.
- Information retrieval. The concepts underlying
information retrieval were conceived long before computers and
information systems were employed to store library and Collective
Memory materials. In this domain, a variety of information
retrieval and data-mining techniques exist, including metadata
searching and content searching for all data types. It is
difficult to pinpoint quantitatively the effectiveness of
information retrieval and data-mining; only an individual user can
determine what is truly useful. Techniques to improve retrieval
effectiveness include preprocessing documents to extract additional
metadata before storing them in a document base. Research also
focuses on automating the creation and maintenance of user profiles
and applying these profiles to information retrieval. Software
agents are an extension of filtering techniques, although filtering
tends to imply passive mechanisms whereas the use of agents implies
a more proactive approach. Many people have put forth definitions
of software agents, ranging from an adaptable information filter to
an autonomous program that works in conjunction with or on behalf
of a human user. Software agents also embody the notion of
improving over time as they record additional user actions and
reactions.
- Content delivery. Once an object of interest has been
located in the Collective Memory, it may be delivered in several
ways. If the content is small, such as 100 pages of text or a
100-Kbyte image file, it may be delivered through the same channel
used for information retrieval and querying. Content such as movies
and software, however, demands much higher bandwidth. In these
cases, delivery may be over dedicated lines (for example, cable TV
or videoconferencing systems) or satellite-based systems, as they
still offer a more reliable quality of service than the Internet.
Increased demands for networking bandwidth come from two main
fronts. Firstly, the number of Collective Memory users will
undoubtedly increase. If the Internet is any indication,
exponential growth in the number of users will be the rule.
Secondly, as the delivery of multimedia data becomes the norm, the
demands for high bandwidth increase. However, high bandwidth, in
and of itself, is not enough to support Collective Memories. The
intelligent use of bandwidth and the ability to guarantee a quality
of service for a given time period are also required. Today's open
networking standards such as TCP/IP and the ever-growing Internet
make it clear that successful digital libraries must be built on an
open, interoperable networking infrastructure. Current wired or
wireless LANs run at 10 to 100 Mbits per second. Wide-area
Internet backbones run at 1.5 Mbps to 2.5 Gbps, while links to
individual organizations fall in the 56 Kbps to 1.5 Mbps range.
Individual users typically connect to the Internet through service
providers, local universities, or other organizations at 64 Kbps
to several Mbps. However, individual users who connect through
wireless telephones are frequently restricted to 9600 bps,
which poses a major challenge when delivering Collective Memory
objects.
- Presentation. Users of a traditional library usually
want to read a book or watch a videotape; other uses are rare. With
digital technology, it now becomes possible to listen to a book
being read, watch a video of a musical performance alongside the
original score, or hold a virtual object. Other possible uses are
highly personal; individuals may dream up many distinct
variations. A Collective Memory’s presentation systems must be
flexible and highly customizable. They must also be aware of the
output hardware's capabilities and limitations, automatically
adjusting to deliver the best possible presentation quality at all
times. In addition, the system should know and automatically adjust
to the personal preferences of individuals (e.g. blind people) in
content and presentation.
- Administration and Preservation. Traditional libraries store
a copy of a book or other documents. Traditional museums store
physical artefacts. Collective Memory systems may store several
versions of a document in a way that makes multiple revisions by
multiple authors possible. They may store different digital
presentations of physical artefacts. In addition, a digital
Collective Memory may have multiple owners, both of the sources
of the content and of the annotations made to the content of the
library, and it has to ensure that rights are properly maintained
or that violations can at least be tracked. An administrative system
ensures that materials intended for public viewing can indeed be
viewed by anyone, while private collections and personal annotations
may only be viewed by a selected group or a single individual.
Data-versioning techniques track the history of document revisions.
Security mechanisms must be put into place to ensure that only
authorized users gain access. Current digital collections employ
the basic security measures offered by the supporting operating
systems. For example, any digital library can restrict access using
username and password authentication and protect files using group
membership and file-access rights. This basic security will not
meet the demands of large-scale digital libraries. Digital
Collective Memories will have to be preserved and made available
for long periods of time. Changes in technology, organisational
structures, and responsibilities, as well as the simple aging of
electronic storage media, endanger preservation and access.
Techniques, processes and controls have to be developed and
established to solve this serious problem for digital memories.
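The link rates quoted under Content delivery can be turned into
rough delivery times. The following Python sketch is illustrative
only: it assumes 1 Kbyte = 1000 bytes and an ideal link with no
protocol overhead, latency, or congestion, and the names
LINK_RATES_BPS and transfer_seconds are hypothetical helpers, not
part of any existing Collective Memory system.

```python
# Nominal link rates in bits per second, taken from the figures
# quoted in the Content delivery item above.
LINK_RATES_BPS = {
    "wireless telephone (9600 bps)": 9_600,
    "organizational link (56 Kbps)": 56_000,
    "organizational link (1.5 Mbps)": 1_500_000,
    "LAN (10 Mbps)": 10_000_000,
    "LAN (100 Mbps)": 100_000_000,
}


def transfer_seconds(size_bytes: int, rate_bps: int) -> float:
    """Ideal transfer time in seconds for size_bytes over a rate_bps link.

    Assumption: the full nominal rate is available and there is no
    protocol overhead, so real transfers will take longer.
    """
    if rate_bps <= 0:
        raise ValueError("rate_bps must be positive")
    return size_bytes * 8 / rate_bps


# The 100-Kbyte image mentioned in the text (assuming 1 Kbyte = 1000 bytes).
IMAGE_BYTES = 100 * 1000

if __name__ == "__main__":
    for name, rate in LINK_RATES_BPS.items():
        seconds = transfer_seconds(IMAGE_BYTES, rate)
        print(f"{name}: {seconds:.2f} s")
```

Under these assumptions, the same image that crosses a LAN in a
fraction of a second needs well over a minute at 9600 bps, which
illustrates why delivery to wireless telephones is singled out as a
challenge.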
Contexts and strategic visions
Besides considering a Collective Memory as a container extending
conventional library (cataloguing) practice, it is necessary to
enlarge the level of granularity as many new areas dealing with
digital repositories of information are under development. Digital
Earth, Digital Sky, Digital Bio, Digital Law, Digital Art, Digital
Music and most of all Digital Libraries for Education provide new
metadata schemas focused on the information content rather than on
information entities. The traditional handling of electronic
versions of books, journal articles, images and videos has to be
expanded by considering new concepts, theories, models and
hypotheses, as well as experimental results and measurements. The
TCDL will take particular care to follow the technical
challenges related to these newly emerging fields.
Technical Committee on Digital Libraries
Membership in the Technical Committee on Digital Libraries is
free; you don’t even have to be a member of the IEEE Computer
Society. We invite you to join and contribute ideas, suggestions,
comments, and time. For more information, see our home page at
http://www.ieee-tcdl.org, visit the IEEE Computer
Society's home page at http://www.computer.org, or send e-mail to
labbate@ipsi.fhg.de.