IEEE TCDL Bulletin
Volume 2, Issue 2, 2006


What did the Others Say?

Probabilistic Indexing and Retrieval Models in Annotation-based Discussions

Ingo Frommholz
University of Duisburg-Essen
D-47048 Duisburg, Germany



Annotations, supporting different facets of Digital Libraries, can manifest themselves in different forms and dimensions. In the form of shared, textual and nested annotations they can be used to model discussions. Such discussions contain additional information that can be interesting for both document and discussion search. In my thesis I am going to develop a formal annotation model and, based on this, discuss and evaluate probabilistic indexing and retrieval models for annotation-based document and discussion search.


1. Motivation

There is a lot of ongoing research in the design and development of software systems incorporating annotations. As a recent study shows [3], annotations support different facets of Digital Libraries (like creation, management, access and retrieval, and effective use) in several ways. Annotations can manifest themselves in different forms and dimensions, ranging from simple text highlighting and personal notes through (typed) links between documents up to nested and shared annotations with which collaborative discussions about a specific topic are realised. In the latter case, annotations and the annotated object form an annotation thread. Such threads consist of (one or more) documents and nested annotations discussing the documents and their content. Depending on one's viewpoint, either a document and its annotations are equal, or annotations are seen as auxiliary documents for the document they annotate. Examples of annotation threads are the talkback discussions found in ZDNet News and the annotation model created within the COLLATE project, where threads are formed from annotations and discourse structure relations among them [5]. When regarding documents and annotations as equal, annotation threads are similar to discussion threads as we know them from USENET News or email discussion lists; in fact, articles in discussion forums are annotations.

The nodes in an annotation thread and the links among them form a certain kind of document context for each node: the context of "what others said about a document or its content". While traditional information retrieval offers many models and methods for document search, the idea of using the annotation context for retrieval is quite new. In my thesis, I will define and evaluate methods that employ the annotation context for information retrieval. Two main scenarios can be identified:

  1. Document search: Users are only interested in the main documents of the repository. In this scenario, annotations are just auxiliary documents and are not equal to the main documents. Nonetheless, since annotations contain information about the annotated documents, annotation-based document search will exploit the context given by the corresponding annotation thread.
  2. Discussion search: In this scenario, users are interested in parts of or the whole annotation thread satisfying their information need. Discussion search views annotations as equal to the main documents of the repository. The classic example is searching in discussion groups such as those found in USENET.

In both scenarios, it is crucial to consider the context spawned by the annotation thread, which in collaborative discussions means taking into account what others said about a document.

2. Proposed Research

When showing how to exploit the annotation context for information retrieval, three main tasks have to be accomplished:

  1. Find a suitable formalisation for annotations and annotatable objects.
  2. Define and implement suitable context-based retrieval approaches for document and discussion search.
  3. Evaluate these approaches with regard to (w.r.t.) their retrieval effectiveness.

In my work I will focus on the creation and application of probabilistic retrieval models, either logic-based or statistical. Before discussing retrieval issues, I propose a formalisation of annotations and annotatable objects1.

2.1 Annotatable Objects, Annotations and Annotation Threads

Let DO be the set of digital objects in our knowledge space. An annotatable object is an object that can be annotated. Let AO be the set of annotatable objects; AO is a subset of DO.

Examples of such annotatable objects, which are application-dependent, might be whole documents, document passages, sets of documents, folders, persons, locations, and other annotations. Annotatable objects might be parts of other objects. To express this, we define a special containment relation among objects: Let o, o' be digital objects. We say that o contains o' if o' is a part of o and o' ≠ o. For instance, a section s might be part of a chapter c, a textual passage p is contained in the corresponding document d, and a selected region of an image is part of that image.

The set of annotations A is a subset of DO. An annotation a is a tuple <AOa,Ta,Ma,ba> with AOa, a subset of AO, as the set of annotatable objects the annotation refers to; due to the view of annotations as metadata [3], AOa must not be empty. Ta is the set of annotation types of a, Ma is the set of attribute-value pairs containing metadata about the annotation, and ba is the body of the annotation, which might consist of text, references, and graphics.

The definition of an annotation is flexible and covers a variety of annotation scenarios, as the following examples show.

[Figure 1: Annotation Examples]

Example 1: Textual Annotations. Imagine a system that lets users write (textual) comments on one or more documents. Let d1 and d2 be documents. The user Joe judges that these are good introductions to a certain topic and expresses this judgement by writing a new text b="The papers are good introductions", so we get

a= <{d1,d2},{judgement},{(author,"joe")},b>

Going on with the example, another user, Sally, disagrees with Joe's view and expresses this by annotating the body of the first annotation:

a'= <{b},{disagreement},{(author,"sally")},b'>

with, e.g., b'="I don't think so" as the textual body of this annotation. Note that a' actually annotates b. If a' were referring to a instead of b, this could be interpreted as if Sally disagreed that a is a judgement!

Example 2: Referential Annotations. Assume two documents d1 and d2. Joe finds a statement in a passage of d1 that contradicts a statement in a passage of d2, so he identifies the two passages p1, part of d1, and p2, part of d2, and connects them with a reference in the annotation body:

a= <{p1},{contradiction},{(author,"joe")},"p2*">

with p2* meaning "reference to p2". Sally again disagrees with Joe's views and creates another annotation:

a'= <{a},{disagreement},{(author,"sally")},b'>

with b' something like "I don't see any contradiction here." The first annotation creates a typed link between passages. The second one annotates this link, but not the passage p2, so a' refers to a.

Example 3: Highlight Annotations. In a document d, Joe finds a passage p very interesting and highlights it.

a= <{p},{important},{(author,"joe")},"_">

("_" means that the tuple item is empty). The annotation examples are depicted in Fig. 1.

In my research I focus on nested, textual annotations. Nested annotations can support collaboration and can be the building blocks for discussion. Nested annotations together with the documents to which they refer form an annotation thread.

Annotation Thread. An annotation thread is a directed, acyclic graph G=(N,E) with N as the set of nodes and E as the set of edges between nodes. A node n represents either an annotation or a document, or parts of them; leaves in annotation threads are always documents. (n',n) is an edge if n' is an annotation of n or if n' is part of n. The function l labels edges with a set of link types: l((n',n)) = Tn' if n is an element of AOn', and l((p,n)) = {"part_of"} if p is part of n.
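A thread could then be stored, for instance, as a labelled edge set; the following sketch (with invented function names) builds the graph for Example 1:

# Each edge (n', n) of the thread maps to its label set l((n', n)).
thread = {}

def annotate(thread, ann_id, targets, ann_types):
    # ann_id annotates every object in targets; label the edges with T_ann
    for target in targets:
        thread[(ann_id, target)] = set(ann_types)

def add_part(thread, part_id, whole_id):
    thread[(part_id, whole_id)] = {"part_of"}

# Example 1 as a thread: a judges d1 and d2; a2 (Sally's a') annotates a's body b.
annotate(thread, "a", {"d1", "d2"}, {"judgement"})
add_part(thread, "b", "a")
annotate(thread, "a2", {"b"}, {"disagreement"})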

2.2 Document Search with Annotations

Here we regard a scenario where a user is interested in the main documents, not in their annotations. Retrieval functions return a ranking of documents based on their retrieval weights w.r.t. the query.

2.2.1 Retrieval Options

Several options w.r.t. document search can be identified when incorporating annotations. In principle, these are relevant for discussion search as well.

Access to documents through annotations
In this scenario, users look for documents that received a certain comment, and this comment is all they remember. All documents containing relevant annotations are relevant for the user.

Vocabulary problem
Authors of documents and users posing a query might use different terms to express the same concepts. In this scenario, we also deal with a third group, the annotators. The vocabulary problem might be mitigated, since the chances are higher that a document can be topically related to terms not contained in the document itself, but in its annotations.

Topic emphasis
We can raise the certainty that a document is about a topic if it has annotations that are about this topic as well. If, for example, users discussed a certain topic that also appears in the document, they associated the document with this topic.

Passage importance
Users annotating a passage found this passage important enough to spend some time annotating it (even if the annotation is negative). The more users annotate a specific passage, the more we get an n-way consensus [16] that this passage has some value. Section 2.3.1 discusses this scenario.

2.2.2 Feature-based Indexing Approach

In a feature-based approach, a document is represented as a vector of terms with an additional relevance description, a feature vector x(t,d) for each term t in document d. While typical statistical approaches calculate P(R|t,d), the probability that d indexed with t is relevant, a feature-based approach estimates the probability P(R|x(t,d)) and uses this value as the term weight of t in d [10]. The feature vector describes certain features a term has in a document. As an example, in HTML documents, such features might be the occurrence of a term in the title of the document, or the frequency of the term in the document, or whether the term is emphasised or not [13]. When it comes to annotations, certain other term features are possible. Ongoing research has to define and apply suitable features. As an example, the occurrence of a term in the body of a direct annotation a of a document d is a possible feature.
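To illustrate, the following sketch computes a hypothetical feature vector with one annotation-based feature and maps it to P(R|x(t,d)); the feature set and weights are purely illustrative, and a logistic function stands in here for the mapping that, in [10], would be learned from relevance judgements:

import math

def features(t, doc_text, doc_title, annotation_bodies):
    # hypothetical feature vector x(t, d): normalised term frequency,
    # occurrence in the title, and the normalised frequency of t
    # in the bodies of direct annotations of d
    doc = doc_text.lower().split()
    ann = " ".join(annotation_bodies).lower().split()
    return [
        doc.count(t) / max(len(doc), 1),
        1.0 if t in doc_title.lower().split() else 0.0,
        ann.count(t) / max(len(ann), 1),
    ]

def p_relevant(x, weights, bias):
    # P(R | x(t,d)) as a logistic function of the features; in [10] such
    # a mapping is learned from relevance judgements, not set by hand
    return 1.0 / (1.0 + math.exp(-(bias + sum(w * xi for w, xi in zip(weights, x)))))

x = features("retrieval", "retrieval with annotations", "probabilistic retrieval",
             ["a nice survey of retrieval models"])
print(p_relevant(x, weights=[2.0, 1.0, 1.5], bias=-2.0))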

2.2.3 Logic-based Retrieval Approach

Here we follow the notion of retrieval as uncertain inference [19]. The different semantics and pragmatics of annotations, as well as different information needs, can result in different possible retrieval functions. One simple function considers the case that annotations are seen as an extension of the document content. We say that a document is relevant to a query if at least one of its index terms is also a query term, or if there exists at least one annotation that is accessed and relevant to the query. A possible formulation of the problem in predicate logic would be

r(q,d) ← qterm(q,t) ∧ indexterm(d,t)
r(q,d) ← acc(d,a) ∧ r(q,a)

where qterm(q,t) states that t is a query term of q, indexterm(d,t) means that d is indexed with term t, acc(d,a) says that when accessing d, we also access a, and finally r(q,d) expresses that d is relevant to q. To adhere to the notion of information retrieval as uncertain inference, probabilistic datalog (pDatalog) can be applied for the calculation of the retrieval weights (refer to [11] for further details). pDatalog uses function-free Horn clauses and can cope with probabilistic facts and rules. This way, we can assign probabilities to facts, e.g. the weight of indexterm(d,t) might be estimated using a normalisation ntf(t,d) of the frequency of t in d. The weight of indexterm(d,t) is then the probability that d can be indexed with t. The probability of acc(d,a) can be interpreted as the probability that a user reads document d and from there also moves to a. The estimation of this probability might depend on certain values like the author of a or the total number of annotations referring to d. Based on such probabilistic facts, pDatalog calculates the probability that an event occurs as follows: Assuming that n events are independent, pDatalog computes the probability of the combination of these events as

P(e1 ∧ e2 ∧ ... ∧ en) = P(e1) · P(e2) · ... · P(en)

Consider a document d and a corresponding annotation a that are represented as a set of index terms with corresponding term weights; this can be expressed in pDatalog using probabilistic facts, e.g.:

0.5 indexterm(d,information)
0.3 indexterm(d,retrieval)
0.4 indexterm(a,retrieval)
0.6 indexterm(a,search)

which means that the term weight of "information" in d is 0.5, etc. Consider that we access each annotation with probability 0.8, so we get

0.8 acc(d,a).

Having the query q="information retrieval", the query is represented by the facts

qterm(q,information)
qterm(q,retrieval)

(If no probability is given, we assume 1.) Applying the rules above, we get

P(r(q,a)) = 0.4
P(r(q,d)) = 1 − (1 − 0.5) · (1 − 0.3) · (1 − 0.8 · 0.4) = 0.762

We can also introduce probabilities for qterm(q,t), e.g., by taking the normalised inverse document frequency (nidf) of t. This way, we get a tf x idf-like weighting scheme in the model above.
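For illustration, the computation behind this example can be re-done in a few lines; the following toy sketch applies the independence assumption directly and is not the actual pDatalog engine:

from functools import reduce

def p_and(ps):  # conjunction of independent events
    return reduce(lambda acc, p: acc * p, ps, 1.0)

def p_or(ps):   # disjunction of independent derivations
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), ps, 1.0)

indexterm = {("d", "information"): 0.5, ("d", "retrieval"): 0.3,
             ("a", "retrieval"): 0.4, ("a", "search"): 0.6}
access = {("d", "a"): 0.8}
qterms = ["information", "retrieval"]  # qterm(q,t) with probability 1

# Rule 1: r(q,o) <- qterm(q,t) & indexterm(o,t); Rule 2: r(q,d) <- acc(d,a) & r(q,a)
r_a = p_or([indexterm.get(("a", t), 0.0) for t in qterms])
r_d = p_or([indexterm.get(("d", t), 0.0) for t in qterms]
           + [p_and([access[("d", "a")], r_a])])
print(r_a, r_d)  # -> 0.4 and 0.762 (up to rounding)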

2.3 Discussion Search

Two different scenarios can be identified:

  1. The user is looking for relevant annotations; the retrieval system should return a ranked list of annotations.
  2. The user is looking for relevant annotation threads and best entry points within them. This is a refinement of the first scenario: a ranked list of annotations can be clustered w.r.t. the annotation thread to which they belong. The retrieval system would return a ranked list of best entry points. This task is similar to the related tasks of XML retrieval and topic distillation.

2.3.1 Searching for Relevant Annotations

Feature-based indexing approach
This approach is similar to the one described in Subsection 2.2.2. The goal is to find suitable features considering the context provided by the annotation thread.

Logic-based retrieval approaches
Retrieval functions returning annotations should take the subjective nature of annotations into account. Annotations in a discussion often contain unproven or wrong arguments and might even be mere flames. Such annotations are usually not rated highly by users, even if their topical relevance is high. As an example, an annotation attracting a lot of disagreement should be handled differently from an annotation attracting a lot of agreement. With respect to their effect on the retrieval status value, we can identify positive and negative links between annotations – for example, "agreement" would be a positive link type, and "disagreement" a negative one. A link between annotations b and a is given if the annotation thread contains an edge from b to a or to a passage p that is part of a. A negative link decreases the retrieval weight of its source, whereas a positive link increases it. The categorisation of link types means that we can observe positive and negative evidence on statements made in annotations, determined by the corresponding link type. The goal is to use this evidence in a balanced form. In predicate logic, we can express this as

(1) r(q,a) ← qterm(q,t) ∧ indexterm(a,t)
(2) r(q,a) ← pos_link(b,a) ∧ r(q,b)
(3) ¬r(q,a) ← neg_link(b,a) ∧ r(q,b)

Here, pos_link(b,a) and neg_link(b,a) state that there is a positive or negative link, respectively, between b and a. Rules 1 and 2 exploit the positive evidence gained from the annotation's content and from the annotation thread, respectively, while the third rule uses negative evidence to infer that the annotation is not relevant to the query. By possibly inferring both r(q,a) and ¬r(q,a) for an annotation a, we gain inconsistent knowledge. To cope with this, we need a four-valued logic introducing two new truth values besides true and false, namely inconsistent and unknown. Furthermore, we have to assume an open world, which means that missing evidence for r(q,a) does not imply ¬r(q,a). Four-valued probabilistic datalog is a suitable means to deal with these requirements [18]. A possible realisation of the model above with four-valued probabilistic datalog is shown in [9].
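As a toy illustration of the four truth values (not a sketch of the actual four-valued pDatalog machinery of [18]), positive and negative evidence for r(q,a) could be kept separate and mapped to a verdict as follows:

def truth(p_pos, p_neg):
    # map separate positive/negative evidence for r(q,a) to one of the four
    # truth values; open world: no evidence means "unknown", not "false"
    if p_pos > 0.0 and p_neg > 0.0:
        return "inconsistent"
    if p_pos > 0.0:
        return "true"
    if p_neg > 0.0:
        return "false"
    return "unknown"

print(truth(0.6, 0.3))  # an annotation matching the query but attracting disagreement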

An open research question is the categorisation into positive or negative link types if this information is not given explicitly. Machine-learning techniques for a corresponding classification, similar to sentiment classification [17], might be applied to estimate the probability that an annotation is positive or negative and thus provide probabilities for pos_link and neg_link, respectively.
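Such a classifier might look as follows; this sketch uses a simple naive Bayes pipeline in the spirit of [17], assuming scikit-learn is available and with invented training data:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# invented training data: annotation bodies labelled as positive/negative links
bodies = ["I fully agree with this", "good point, well argued",
          "I don't think so", "this is plain wrong",
          "exactly my view", "I disagree completely"]
labels = ["pos", "pos", "neg", "neg", "pos", "neg"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(bodies, labels)

# class probabilities can serve as weights for pos_link / neg_link facts
probs = clf.predict_proba(["I don't see any contradiction here"])[0]
print(dict(zip(clf.classes_, probs)))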

Another possible retrieval function is based on the assumption that the mere fact that annotations refer to parts of the annotated object can be interpreted as a sign that these parts are important, even if the annotations are negative (otherwise users would not spend time annotating them). For an annotation a, we look at the parts p of a that were annotated by some annotation b (i.e., p is an element of AOb)2. We call b a highlight annotation of a in this case. We find this situation in email discussions, where users usually quote some passage from the previous message and write a comment about it. This comment can be seen as an annotation, while the quoted passage marks the part of the previous email the annotation refers to. As discussed before, any passage p of a with p element of AOb is seen as a separate digital object, and all these passages provide a context for the annotation. A possible retrieval strategy is to take the relevance of these passages w.r.t. the query into account, yielding

(4) r(q,a) ← part_of(a,p) ∧ refers_to(b,p) ∧ r(q,p)

Here, part_of(a,p) holds if p is a part of a, and refers_to(b,p) holds if p is an element of AOb. Obviously, p contributes to r(q,a) in two ways – either by being part of a and thus contributing directly to r(q,a), or via r(q,p). Thus p does not introduce new terms, but decreases the uncertainty about whether a is relevant w.r.t. the query.

Annotations are closely connected to the objects they annotate, so the content of these objects is also a context for determining the relevance of an annotation a. Assertions in annotations might not be understandable without considering the context in which they were made. Examples are again email discussions, where one (sometimes) has to read the quoted text in order to understand what an email is about. We thus have to consider whether the object the annotation belongs to is relevant to the query, which might imply that the annotation is relevant itself:

(5) r(q,a) ← refers_to(a,p) ∧ considers(a,p) ∧ r(q,p)

In the above, considers(a,p) describes the fact that the passage p is actually considered and not neglected when reading a. P(considers(a,p)) is thus the degree to which the context given by the referred passage influences the retrieval weight. It should be noted that the models proposed here can be adapted to the problem of document search described in Section 2.2.

2.3.2 Relevant Annotation Threads and Best Entry Points

In a discussion, a single annotation might not be sufficient to satisfy a user's information need. For example, when the user is looking for pros and cons of a specific topic, a single annotation will usually not contain all the relevant information; it is probably scattered among several discussion articles. To satisfy his or her information need, the user has to read the parts of annotation threads where the pros and cons are discussed. Therefore, the system should not simply return a ranked list of annotations, but point the user to relevant subthreads and provide suitable entry points into the discussion. The idea is to post-process the retrieval status values of annotations, as calculated by the means discussed in Subsection 2.3.1, and find suitable ways to identify relevant subthreads and best entry points.
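One conceivable post-processing step is sketched below with an ad hoc decayed-sum aggregation (an illustration, not a method fixed by this proposal): score each node by its own retrieval status value plus the discounted scores of its subthread, and rank candidate entry points accordingly:

def entry_points(children, rsv, decay=0.7):
    # score each node by its own RSV plus decayed scores of its subthread;
    # the decay factor and the sum aggregation are ad hoc choices
    def score(n):
        return rsv.get(n, 0.0) + decay * sum(score(c) for c in children.get(n, ()))
    nodes = set(children) | {c for cs in children.values() for c in cs}
    return sorted(nodes, key=score, reverse=True)

# toy thread: document d with annotations a1, a2; a3 replies to a2
children = {"d": ["a1", "a2"], "a2": ["a3"]}
rsv = {"a1": 0.2, "a2": 0.5, "a3": 0.4}
print(entry_points(children, rsv))  # -> ['a2', 'd', 'a3', 'a1']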

2.4 Evaluation

All proposed retrieval approaches have to be evaluated w.r.t. their retrieval effectiveness. In particular, the following steps have to be performed:

  1. Set up test collections for document and discussion search, and for the classification of annotations into positive and negative ones. To set up such a collection, a snapshot of ZDNet News has been harvested, consisting of approx. 4,700 articles and 91,600 nested annotations. This collection was submitted to the INEX3 heterogeneous collection track, and example queries and relevance judgements (for both documents and annotations) are going to be created during the INEX runs. For discussion search, our group participated in the TREC Enterprise Track4, where a test collection for enterprise search is built upon email archives from the W3C. We submitted runs using both highlight annotations and the context spawned by quotations, which mark the parts of the previous email to which an annotation belongs.
  2. For discovering relevant annotation threads, traditional recall/precision metrics might not be sufficient. A single annotation might not be exhaustive enough; on the other hand, an annotation subthread returned by the system might not be specific, since it contains other topics as well. Exhaustivity and specificity are metrics well known from XML retrieval [15]. The idea is to introduce similar metrics for annotation subthread search as well, since we are interested in subthreads that are both specific and exhaustive.
  3. Further evaluate the effectiveness of the proposed methods by measuring recall and precision, and possibly exhaustivity and specificity.

3. Related Work

Since an annotation thread can be seen as a hypertext, annotation-based retrieval has a strong relation to Hypertext Information Retrieval (HIR) [4]. An example numeric HIR algorithm that is similar to the logic-based approaches proposed here is the spreading activation approach proposed by Frei and Stieger [7]. This approach also incorporates typed and negative links.

An interesting feature-based approach to discussion search is reported by Xi et al. [20]. They define several content, thread-tree and author features, and their results show that using the thread context for discussion search improves retrieval effectiveness. However, Xi et al. neglect quotations from previous messages.

Golovchinsky et al. use annotations for relevance feedback by considering only highlighted terms instead of whole relevant documents for document search [12]. Experiments show a gain in retrieval effectiveness compared with classic relevance feedback.

Rölleke and Fuhr [18] use four-valued probabilistic datalog for the retrieval of structured documents. An open issue is how to determine whether a comment is positive or negative when this information is not given explicitly. This leads to a non-topical categorisation problem similar to sentiment classification, where the goal is to distinguish positive from negative reviews. An interesting approach to sentiment classification using machine learning classifiers is described in [17].

The nature of annotations, their role in Digital Libraries and Collaboratories, and possible annotation-based retrieval functions are discussed in [3, 1, 8, 16]. These reflections about annotations can be the basis for defining intelligent retrieval methods exploiting the annotation context. Agosti and Ferro see annotations as context for document search and describe retrieval methods based on HIR and data fusion [2].


[1] Maristella Agosti and Nicola Ferro. Annotations: Enriching a digital library. In Constantopoulos and Sølvberg [6], pages 88-100.

[2] Maristella Agosti and Nicola Ferro. Annotations as context for searching documents. In Fabio Crestani and Ian Ruthven, editors, Information Context: Nature, Impact, and Role: 5th International Conference on Conceptions of Library and Information Sciences, CoLIS 2005, volume 3507 of Lecture Notes in Computer Science, pages 155-170, Heidelberg et al., June 2005. Springer.

[3] Maristella Agosti, Nicola Ferro, Ingo Frommholz, and Ulrich Thiel. Annotations in digital libraries and collaboratories - facets, models and usage. In Rachel Heery and Liz Lyon, editors, Research and Advanced Technology for Digital Libraries. Proc. European Conference on Digital Libraries (ECDL 2004), Lecture Notes in Computer Science, pages 244-255, Heidelberg et al., 2004. Springer.

[4] Maristella Agosti and Alan F. Smeaton, editors. Information Retrieval and Hypertext. Kluwer Academic Publishers, Boston et al., 1996.

[5] H. Brocks, A. Stein, U. Thiel, I. Frommholz, and A. Dirsch-Weigand. How to incorporate collaborative discourse in cultural digital libraries. In Proceedings of the ECAI 2002 Workshop on Semantic Authoring, Annotation & Knowledge Markup (SAAKM02), Lyon, France, July 2002.

[6] Panos Constantopoulos and Ingeborg T. Sølvberg, editors. Research and Advanced Technology for Digital Libraries. Proc. European Conference on Digital Libraries (ECDL 2003), Lecture Notes in Computer Science, Heidelberg et al., 2003. Springer.

[7] H. P. Frei and D. Stieger. The use of semantic links in hypertext information retrieval. Information Processing and Management, 31(1):1-13, January 1994.

[8] I. Frommholz, H. Brocks, U. Thiel, E. Neuhold, L. Iannone, G. Semeraro, M. Berardi, and M. Ceci. Document-centered collaboration for scholars in the humanities – the COLLATE system. In Constantopoulos and Sølvberg [6], pages 434-445.

[9] Ingo Frommholz, Ulrich Thiel, and Thomas Kamps. Annotation-based document retrieval with four-valued probabilistic datalog. In Thomas Roelleke and Arjen P. de Vries, editors, Proceedings of the first SIGIR Workshop on the Integration of Information Retrieval and Databases (WIRD'04), pages 31-38, Sheffield, UK, 2004.

[10] N. Fuhr and C. Buckley. A probabilistic learning approach for document indexing. ACM Transactions on Information Systems, 9(3):223-248, 1991.

[11] Norbert Fuhr. Probabilistic Datalog: Implementing logical information retrieval for advanced applications. Journal of the American Society for Information Science, 51(2):95-110, 2000.

[12] Gene Golovchinsky, Morgan N. Price, and Bill N. Schilit. From reading to retrieval: freeform ink annotations as queries. In Proceedings of the 22nd International Conference on Research and Development in Information Retrieval, pages 19-25, New York, 1999. ACM.

[13] Norbert Gövert, Mounia Lalmas, and Norbert Fuhr. A probabilistic description-oriented approach for categorising Web documents. In Susan Gauch and Il-Yeol Song, editors, Proceedings of the Eighth International Conference on Information and Knowledge Management, pages 475-482, New York, 1999. ACM.

[14] Kalervo Järvelin, James Allan, Peter Bruza, and Mark Sanderson, editors. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, 2004. ACM.

[15] Gabriella Kazai, Mounia Lalmas, and Arjen P. de Vries. The overlap problem in content-oriented XML retrieval evaluation. In Järvelin et al. [14], pages 72-79.

[16] Catherine C. Marshall. Toward an ecology of hypertext annotation. In Proceedings of the ninth ACM conference on Hypertext and hypermedia: links, objects, time and space - structure in hypermedia systems, pages 40-49, 1998.

[17] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79-86, 2002.

[18] T. Rölleke and N. Fuhr. Retrieval of complex objects using a four-valued logic. In Proceedings of the 19th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 206-214, New York, 1996. ACM.

[19] C. J. van Rijsbergen. A non-classical logic for information retrieval. The Computer Journal, 29(6):481-485, 1986.

[20] Wensi Xi, Jesper Lind, and Eric Brill. Learning effective ranking functions for newsgroup search. In Järvelin et al. [14], pages 394-401.


1. This formalisation should be seen as a preliminary one and subject to further discussion.

2. It is an open question what it means for our scenario if annotations refer to a whole document instead of just a part of it.

3. INitiative for the Evaluation of XML Retrieval.

4. TREC Enterprise Track.


© Copyright 2006 Ingo Frommholz
