Volume 5 Issue 3
Winter 2009
ISSN 1937-7266

Web Resource Categorization Using Social Annotations as Metadata

Sue Yeon Syn

University of Pittsburgh
School of Information Sciences
135 N. Bellefield Ave.
Pittsburgh PA 15260


Abstract

The emergence of social annotation systems opens up the potential to involve large numbers of non-expert web users in the metadata generation process. Annotations that are descriptive of resources can be used as metadata. This research examines how to make effective use of user-created annotations in classifying web resources. Because social annotations are not controlled, it is important to find quality annotations to represent the content of a resource. The goal of this proposed research is to find, from the tag sets provided by users, the set of annotations that best represents a resource, in order to categorize web resources.

Categories and Subject Descriptors

H.3.7 [Information Storage and Retrieval]: Digital Libraries – System issues

General Terms:

Measurement, Performance, Experimentation.


Keywords:

Social Annotation, Web Resource Categorization, Metadata, Measurement.


Introduction

Metadata for web resources is one of the bases of the Semantic Web. Generating metadata for web resources is costly and does not scale well when human catalogers are involved. Research on automatic metadata generation has shown that when human effort is reduced, the quality of metadata is sacrificed. Some studies turned their focus to the possibility of letting novice catalogers participate in the metadata generation process [20, 21]. However, it is difficult to motivate non-professional users to create metadata. In the past few years, social annotation systems have gained popularity among web users as a method of organizing, filtering, and retrieving web resources. Social annotation systems have successfully involved users in the tagging process by providing services that motivate and benefit users who tag, for example bookmarking favorite links, organizing and sharing pictures, and getting recommendations on related web pages. Some researchers have already identified the value of social annotations as the power of people and power to the people [17]. Annotations, as descriptive terms for resources, can be used as metadata.

A previous study has shown how semi-automated systems can allow novices to participate in the metadata generation process [20]. While tools improve the quality of metadata produced by novices, in comparison with experts, novices were less stable in generating proper semantic metadata, i.e. keywords and subject classification. With the increasing popularity of social annotation systems, the potential of using social annotations as a source of metadata is being explored. Social annotation systems can simplify the involvement of a large number of users and improve the metadata generation process, especially for semantic metadata. By using social annotations as a type of metadata, this research aims to find a method to classify web resources. In this research, social annotation systems are considered a channel for non-professional catalogers’ participation in the metadata generation process, and social annotations are considered a type of metadata on web resources. The question arises as to whether tools mining social annotations can enable less skilled classifiers to generate quality metadata. Because social annotations are not a controlled vocabulary, there are still problems in finding quality terms to represent the content of a resource. This research examines ways to deal with those problems in order to obtain a better set of annotations representing the resource from the tags provided by users.


Related Work

Social Annotations

Annotation systems allow users to tag or categorize different types of resources. Annotations in these systems are generally one-word descriptions of a resource. Users benefit from annotation systems because assigned tags/keywords help them find resources easily and organize them better. Social annotation systems let users share their annotations with other users. By socializing the tagging activity and the tagged resources, users can share not only resources but also tags. Users can categorize or assign keywords to resources of similar content. Users can share tags, or systems can suggest tags from other users. These functions help users converge on common terms within specific domains. It is also possible to form user groups with shared interests by sharing resources and collaboratively creating tags. Shared annotations make it possible to use tags for better resource finding. Different types of social annotation systems are being developed, some for electronic resources (e.g. Del.icio.us is a social bookmarking system for web pages; Flickr is a social tagging system for image sharing) and others for non-electronic resources (e.g. CiteULike is a bibliography sharing system that focuses on academic research papers; LibraryThing is a tagging system for books and publications).

Researchers are beginning to look at ways that social annotations might be used. In general, social annotation systems are based on a collection of 3-tuples consisting of users, tags, and resources [6, 7, 9, 11, 13, 23]. One stream of research relates to improving information retrieval (IR). There are many possible uses of social annotations to improve search results. First, one may consider annotations as one type of index for documents. Although using tags as an index does not fully solve the linguistic problems of full-text indexing, tags are expected to provide more precise semantic information with shared agreement, and can be used to index or rank web resources [2, 3, 4, 13, 19, 21]. Second, annotations can be used to build ontologies as part of the Semantic Web [13, 16, 22]. Since annotations provide semantic information about web resources, it may be possible to extend and organize tags into ontologies. In the information retrieval and Semantic Web domains, the tag-resource elements of the 3-tuple receive the most attention. Annotations can also be used to form community networks. This kind of research emphasizes the social aspect of tagging systems [11, 13, 16, 22]. The ease of tag input in many social annotation systems encourages web users to participate in the tag creation process. Since tags are assigned to a resource by different users collaboratively, associations and networks can be formed from the linkage of elements of the triple model. While tag-resource sets are more critical in retrieval research, the user element of the triple becomes very significant in network research.
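The 3-tuple model described above can be sketched as a small data structure. The users, tags, and URLs below are purely illustrative, not from the study's dataset:

```python
from collections import defaultdict

# Each tagging event is a (user, tag, resource) triple, as in the
# 3-tuple model of social annotation systems. Sample data is invented.
triples = [
    ("alice", "semanticweb", "http://example.org/a"),
    ("alice", "rdf",         "http://example.org/a"),
    ("bob",   "semanticweb", "http://example.org/a"),
    ("bob",   "python",      "http://example.org/b"),
]

# Derived views: tag-resource pairs matter for retrieval research,
# while the user element matters for community-network research.
tags_per_resource = defaultdict(set)
users_per_resource = defaultdict(set)
for user, tag, resource in triples:
    tags_per_resource[resource].add(tag)
    users_per_resource[resource].add(user)
```

From these two views one can read off both the index used in IR-oriented work (which tags describe a resource) and the co-bookmarking links used in network-oriented work (which users share a resource).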

Web Resource Categorization

Categorization of web resources has evolved as one method to improve web information retrieval alongside full-text indexing. Up to now, controlled vocabularies and natural language processing have been the most widely used methods for web resource categorization. A controlled vocabulary can address the shortcomings of full-text indexing. However, it cannot be deployed in a scalable fashion due to a lack of qualified professionals and the sheer number of resources that need to be classified. Natural language processing, such as clustering, enables categorization to be done by machine. This automates the process of controlled vocabulary generation but introduces other problems related to semantics. From the Semantic Web point of view, tags can play a role as a type of annotation providing semantic information about web resources for categorization.

There is growing interest in determining whether annotations can be used as a type of metadata useful in web resource classification [10, 12, 13, 14, 17, 19]. Macgregor and McCulloch [10] argued that social annotation systems let users participate in the organization of web resources and make it possible to lower the cost of web resource metadata creation. Noll and Meinel [14, 15] have examined tags by comparing them with web document metadata, i.e. HTML meta tags, to define the characteristics of tags in terms of metadata and web document classification. Quintarelli [17] introduced folksonomy as one type of user-generated classification that emerges through bottom-up consensus. In using annotations, involvement by the public is considered important, although some trade-off between metadata quality and the metadata ecology is necessary. Since users input annotations without any restriction, terms used for tags may include misspelled terms, compound terms, single and plural forms, personal tags, and single-use tags. Although some annotations have a meaning known only to their creator, there are clearly tags that have shared social meaning as well [5]. Shirky [19] discusses how tags should be organized to produce meaning. Annotations can be applied as raw keywords that represent the user’s description of a resource. Rethlefsen [18] proposes structuring tags when presenting them to users so that users can benefit from them effectively. Regarding concerns about tag quality when tags are used as metadata, the results of the steve.museum study [21] showed that the terms provided by non-specialists for museum collections are valuable. It demonstrated that using tags assigned by general users might help bridge the semantic gap between professional discourse and the popular language of the museum visitor. The results therefore also supported using tags and folksonomies as metadata. Guy and Tonkin [5] discuss how to improve tag quality and how to educate tag creators to make better use of folksonomy metadata. They suggested that providing users with helpful heuristics and introducing structure within tags might encourage users to select and create good tags.


Research Questions

This study addresses the following research questions.

  1. How can the quality of annotations be measured?
  2. How can the annotation noise be reduced?
  3. How can the optimal set of annotations for web resource classification be determined from a tag set?
  4. Can a subset of annotations be found that provide quality metadata for classification?


Proposed Approach

As the goal of this study is to classify web resources rather than retrieve or rank them, we want to select important annotations (meta-terms) and remove meaningless ones (noise) from the tag set. Several preliminary observations were made in order to find a method for determining which annotations best represent a resource, that is, which annotations to use as metadata.

Observation 1. Social annotation systems allow users to input one term at a time. Therefore multi-word concepts are input either as separate single terms (e.g. “semantic” and “web”) or as compound terms (e.g. “semanticweb”, “semantic-web”, “semantic_web”).

Observation 2. A user can create only one tag set for a document.

Observation 3. A user can assign a term as an annotation only once in a tag set for a resource. That is, annotation Ai cannot be assigned multiple times by user Ul for resource Rj, nor can a user explicitly weight the importance of an annotation for a resource.

Observation 4. Social annotations include idiosyncratic terms, since users input annotations for personal use. Examples include personal notes (e.g. “*****”) and compound words (e.g. “toread”), which do not provide good metadata information for the resource.
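Observations 1 and 4 suggest a normalization pass before tags are used as metadata. A minimal sketch; the specific rules and the helper name are assumptions for illustration, not part of the proposal:

```python
import re
from typing import Optional

def normalize_tag(tag: str) -> Optional[str]:
    """Illustrative cleanup for the tag variants noted in the
    observations: unify delimiter-joined compounds and drop
    symbol-only personal notes. The rules here are assumptions."""
    tag = tag.strip().lower()
    tag = re.sub(r"[-_]+", " ", tag)     # "semantic-web" -> "semantic web"
    if not re.search(r"[a-z0-9]", tag):  # e.g. "*****" carries no content
        return None
    return tag
```

Run-together compounds such as “semanticweb” or “toread” survive this pass unchanged; detecting them requires the dictionary-based analysis discussed later.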

With the information provided by social annotation systems, possible factors to consider are the order of tag input and the time a tag was created. The basic assumption about tag order is that the earlier a tag appears, the more important it might be. Golder and Huberman [4] argued that the position of a tag and its frequency are related: frequently used tags will appear before less frequently used tags. The first annotation appearing in a tag set should be expected to be the most important annotation for describing the document. The time a tag was created may have an impact on tag frequency, since users can adopt annotations that were already assigned by other users. Therefore the first user who bookmarked a document with some annotation can influence subsequent users in their selection of certain terms as annotations. Both the order and the time of tag input may affect the judged importance of tags. Despite the potential influence of these factors, we find it hard to define their impact precisely. Therefore, we consider the frequency of an annotation as one simple factor for determining proper annotations of a document. However, simple frequency may not be the best measure of the significance of an annotation.

We introduce the concept of annotation dominance (AD) as one way to measure the importance of an annotation. Before turning to what we mean by annotation dominance, it is useful to review some of the things we know. As explained above, social annotation systems generate relationships among resources, annotations, and users. The 3-tuple relationships, along with our observations, imply the following. (1) The number of times an annotation is associated with a resource can be understood as the frequency of a tag; the highest frequency of an annotation for a resource cannot exceed the number of users who bookmarked the resource. (2) The number of resources with an annotation is the portion of documents in the collection that carry that annotation. (3) The number of users bookmarking a resource equals the number of tag sets for that resource. (4) The number of users using an annotation is the portion of the whole user group who use that term as an annotation. These can be considered the factors related to annotation dominance. We begin to think about annotation dominance by thinking about term weighting in information retrieval: could a metric similar to TF-IDF be applied and adjusted to find representative terms in the tag set? Annotation Dominance (AD) measures how often an annotation is used in relation to a resource. Considering that an annotation can be associated with a resource by a user only a single time, AD captures the importance of an annotation in a document. It is formalized as

AD(A_i, R_j) = |U(A_i, R_j)| / |U(R_j)|

where A is an annotation, R is a resource, U(A_i, R_j) is the set of users who assigned annotation A_i to resource R_j, and U(R_j) is the set of users who bookmarked R_j. Annotation Dominance should reflect the difference in importance of annotations when the distribution of annotations on a resource differs. For example, Figure 1 shows different cases of tag distribution for a document. The first case has two annotations with equal frequency; the second case has two annotations, one dominant (annotation A) and another with very low frequency (annotation B). In the first case both annotations are equally important, whereas in the second case only annotation A is important. Therefore the importance of annotation A in case 1 and in case 2 should be treated differently, although annotation A was assigned 10 times for both resources. Obviously, many other situations are possible, making the development of a simple yet comprehensive heuristic difficult.

Another metric, Cross Resource Annotation Usage (CRAU), is considered as a means to offset the weight of general annotations. CRAU is formalized as

CRAU(A_i) = log(|R| / |R(A_i)|)

where A is an annotation, R is a resource, |R| is the number of resources in the collection, and |R(A_i)| is the number of resources annotated with A_i. The Cross Resource Annotation Usage is designed to discount annotations that are used broadly across the document corpus. If an annotation is assigned to every document in the collection, we consider it a weak candidate for clarifying the document classification. The CRAU measure gives a lower score to a general annotation and a higher score to a specific annotation.

Our preliminary test with over 1,700 web resources from Delicious showed that AD effectively represents annotation dominance by frequency: the more users agree on an annotation, the higher its AD. On the other hand, CRAU gives a high score when an annotation is used less in the collection. However, specific annotations are not always topic-specific or domain-specific terms. Annotations with high CRAU include idiosyncratic or personalized terms that are not useful for representing the topic category of the resource content. Thus we modified CRAU to remove idiosyncratic terms; the modified measure is denoted CRAU’.

CRAU'(A_i) = CRAU(A_i) × w(A_i), with w(A_i) = 0 if A_i appears in the collection only once, in a single resource by a single user, and w(A_i) = 1 otherwise,

where A is an annotation, R is a resource, and U is a user. With this last factor added to CRAU, CRAU’ penalizes idiosyncratic annotations by giving weight 0 to an annotation that appears once, in only one resource in the collection, by only one user. In doing so, CRAU’ removes the long tail of the tag distribution. For instance, in Figure 1, CRAU’ discards the annotations that occur only once; in case 2, the annotations that receive a score of 0 from the CRAU’ measure are annotations B through J.
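The three measures can be sketched from (user, tag, resource) triples. Since the paper gives verbal definitions, the functional forms below are assumptions: AD as the share of a resource's bookmarkers who used the tag, an IDF-style CRAU, and a zero weight in CRAU' for tags used once by one user on one resource.

```python
import math
from collections import defaultdict

def annotation_metrics(triples):
    """Sketch of AD, CRAU, and CRAU' from (user, tag, resource)
    triples, following the verbal definitions in the text."""
    users_of = defaultdict(set)        # resource -> users who bookmarked it
    taggers = defaultdict(set)         # (tag, resource) -> users who assigned the tag
    resources_with = defaultdict(set)  # tag -> resources carrying the tag
    tag_users = defaultdict(set)       # tag -> users who ever used the tag
    for user, tag, resource in triples:
        users_of[resource].add(user)
        taggers[(tag, resource)].add(user)
        resources_with[tag].add(resource)
        tag_users[tag].add(user)
    n_resources = len(users_of)

    # AD: share of a resource's bookmarkers who used the annotation.
    ad = {(t, r): len(u) / len(users_of[r]) for (t, r), u in taggers.items()}
    # CRAU: IDF-style specificity (0 when a tag appears on every resource).
    crau = {t: math.log(n_resources / len(rs)) for t, rs in resources_with.items()}
    # CRAU': zero out tags used only once, by one user on one resource.
    crau_prime = {t: 0.0 if len(resources_with[t]) == 1 and len(tag_users[t]) == 1
                  else crau[t] for t in crau}
    return ad, crau, crau_prime
```

Because a user can assign a given tag to a resource only once (Observation 3), the single-use condition in CRAU' reduces to "one resource and one user".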

From our preliminary Delicious dataset, we observed the impact of the introduced metrics in selecting important annotations for classification. Table 1 shows an example of one web page from our dataset with top-ranked annotations by AD, AD×CRAU’, and CRAU’.

Table 1. Annotations from a Webpage Titled
“Online Storage: Online Storage Feature-by-Feature Comparison Chart”

From the table, it can be observed that AD×CRAU’ appears to give a somewhat better selection of annotations than AD alone, suggesting that CRAU’ may have a positive impact on removing generally used annotations such as “web” in this example. In addition, the CRAU’ column shows that, after removing idiosyncratic annotations, annotations with high CRAU’ contained some interesting types of terms, such as proper nouns (e.g. “drop.io” and “box.net” in this example) and compound terms (e.g. “filestorage”, “onlinestorage”, etc. in this case). These compound terms are interesting and may provide a rich source of data for classification. We observe that such terms appear to form compound terms with a ‘subcategory’ and ‘supercategory’ structure. To be specific, all compound terms with “storage” in this example reflect types of storage, i.e. file storage and online storage.


From the example shown in Table 1, we consider that if we can identify the relationship by splitting compound terms, annotations with high CRAU’ scores can provide relevant terms together with their relations. To find optimal terms using the AD and CRAU’ measures, we propose to make use of those types of terms to define categories by developing heuristics and a metric, named the classification potential (CP) measure.

To detect compound terms, a dictionary can be used to determine whether an annotation is made up of two or more words. These can be noun-noun or adjective-noun combinations. Further analysis can be performed to define the relationship between the compounded words. By determining related annotations within the tag set or from compound annotations, it may become easier to find the domain of, or categorize, the resources. Related tags can be identified (1) by finding synonyms in dictionaries or thesauri, and (2) by relating annotations that co-occur frequently in the collection. Detailed analysis of super- and sub-concept relationships would help bring out different concept levels for classifying resources from non-hierarchical tag sets. Proper nouns include names of sites, products, persons, places, concepts, etc. that are possible candidates for instances of categories. Some proper nouns can be discovered easily (1) by dictionary search; others can be determined (2) by checking the URL for the name of a website, or (3) by web search for specific names.
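Dictionary-based compound detection can be sketched as a greedy longest-match split over a word list. The helper and its vocabulary are hypothetical illustrations, not the study's implementation:

```python
def split_compound(tag, vocabulary):
    """Greedy longest-match split of a compound annotation using a
    word list. A real system would consult a full dictionary and then
    analyze noun-noun vs adjective-noun relations between the parts."""
    words, i = [], 0
    while i < len(tag):
        for j in range(len(tag), i, -1):
            if tag[i:j] in vocabulary:
                words.append(tag[i:j])
                i = j
                break
        else:
            return None  # some span of the tag is not a known word
    return words
```

With a vocabulary containing “online”, “file”, and “storage”, the Table 1 compounds split into a modifier plus the shared head “storage”, the ‘subcategory’/‘supercategory’ pattern noted above.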

With further analysis of terms with high CRAU’ weights, AD-CRAU’ weighting can determine the subset of the tag set that consists of annotations useful for classifying a resource. AD-CRAU’ weighting will identify (1) high-frequency terms appearing as annotations for a resource, (2) specific terms representing a resource, and (3) related annotations (possibly with levels). The selected annotations should be representative metadata for a resource and therefore useful for classifying resources.


Evaluation

The goal of this research is to find optimal annotations for categorizing a resource from its tag set. It is expected that the metrics AD and CRAU’ will generate an annotation set optimized for categorization of web resources. At the current stage, two methods are suggested for evaluation.

Method 1: Relevance Judgment

To evaluate performance, we will assess the proposed annotation set to see whether experts agree on its value for classifying resources. The dataset for evaluation will contain both metadata created by experts and annotations created by novice users. Data will be gathered from a source that provides professionally created metadata, i.e. keywords or categories, for example the Open Directory Project. Based on that dataset, annotation information can be crawled from social annotation systems such as Delicious. The proposed annotation set will be compared with the professionally generated metadata. Catalogers, as experts, will rate how well each term represents the resource. The terms from professionally created metadata and user-assigned annotations will be presented to the subjects in random order. Agichtein et al. [1] suggested a modified Discounted Cumulative Gain (DCG) as a means to assess retrieval ranking, called Normalized Discounted Cumulative Gain at K (NDCG at K). NDCG is based on prior work by Jarvelin and Kekalainen [8]. Essentially, human judges rate how relevant each retrieval result is on an n-point scale. We will perform a similar rating, but in this case based on the relevance of each of the randomly ordered proposed classification terms for the given resource.
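NDCG at K over such expert ratings could be computed as below. The linear gain function is an assumption for illustration; the modified form used by Agichtein et al. may differ:

```python
import math

def ndcg_at_k(ratings, k):
    """NDCG at K over expert relevance ratings, listed in the rank
    order produced by the measure under evaluation. Discount is
    1/log2(rank+1); gain is the raw n-point rating (an assumption)."""
    def dcg(rs):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rs[:k]))
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0
```

A ranking that lists the highest-rated terms first scores 1.0; misordered rankings score below 1.0.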

Method 2: Cluster Similarity

To evaluate the performance of the suggested annotation set, we will evaluate the categories of web resources created from the proposed annotation set to see whether they are competitive with automatically created and/or human-created categories. The dataset for evaluation will contain categorization by machines and/or experts as well as annotations by novice users. Data will be gathered from a source that provides categories, for example the Open Directory Project or Yahoo. Based on that dataset, annotation information can be crawled from social annotation systems such as Delicious. The categorization of the collected web resources based on existing categories will be compared with the categories created from the proposed annotations.
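One way to compare annotation-based categories against an existing categorization is a cluster-agreement score such as purity. The function below is an illustrative stand-in, not the study's chosen measure:

```python
from collections import Counter, defaultdict

def purity(predicted, gold):
    """Purity of predicted categories against reference categories,
    both given as dicts mapping resource -> label: the fraction of
    resources falling in their cluster's majority reference label."""
    clusters = defaultdict(list)
    for resource, label in predicted.items():
        clusters[label].append(gold[resource])
    total = sum(len(members) for members in clusters.values())
    return sum(Counter(m).most_common(1)[0][1]
               for m in clusters.values()) / total
```

A score of 1.0 means every annotation-based category maps cleanly onto one reference category (e.g. an ODP category); lower scores indicate mixing.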


Conclusion

This paper outlines proposed work on classifying web resources with social annotations. With the emergence of social annotation systems, there has been much effort to make use of annotations. Much research has addressed improving the ranking of retrieval results and building networks by making good use of the characteristics of social annotations. Given the need for, and the difficulty of, generating metadata for web resources, this research, which applies social annotations as a method of generating metadata, is expected to simplify the process of web resource classification.


References

[1] Agichtein, E., Brill, E., and Dumais, S. 2006. Improving Web Search Ranking by Incorporating User Behavior Information. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Seattle, WA, USA, August 06-11, 2006).
[2] Choochaiwattana, W. 2008. Using Social Annotations to Improve Web Search. Doctoral Thesis. University of Pittsburgh.
[3] Choochaiwattana, W. and Spring, M. B. 2009. Applying Social Annotations to Retrieve and Re-rank Web Resources. In Proceedings of International Conference on Information Management and Engineering (Kuala Lumpur, Malaysia, April 03 - 05, 2009). ICIME 2009.
[4] Golder, S. A., and Huberman, B. A. 2006. Usage patterns of collaborative tagging systems. Journal of Information Science, 32, 2 (Apr. 2006), 198-208.
[5] Guy, M. and Tonkin, E. 2006. Folksonomies – Tidying up Tags? D-Lib Magazine, 12, 1 (Jan. 2006), http://www.dlib.org/dlib/january06/guy/01guy.html
[6] Hotho, A., Jaschke, R., Schmitz, C., and Stumme, G. 2006. Information Retrieval in Folksonomies: Search and Ranking. In Proceedings of the 3rd European Semantic Web Conference (Budva, Montenegro, June 11 - 14, 2006). 411-426.
[7] Hotho, A., Jaschke, R., Schmitz, C., and Stumme, G. 2006. Trend Detection in Folksonomies. In Proceedings of First International Conference on Semantics and Digital Media Technology (Athens, Greece, December 06 - 08, 2006). SAMT 2006. 56-70.
[8] Jarvelin, K. and Kekalainen, J. 2000. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development on Information Retrieval (Athens, Greece, July 24-28, 2000).
[9] John, A. and Seligmann, D. 2006. Collaborative Tagging and Expertise in the Enterprise. Collaborative Web Tagging Workshop in the 15th International World Wide Web Conference (Edinburgh, Scotland, May 23 - 26, 2006). WWW2006.
[10] Macgregor, G. and McCulloch, E. 2006. Collaborative tagging as a knowledge organization and resource discovery tool. Library Review, 55, 5. 291-300.
[11] Marlow, C., Naaman, M., Boyd, D., and Davis, M. 2006. HT06, Tagging Paper, Taxonomy, Flickr, Academic Article, ToRead. In Proceedings of the 17th Conference on Hypertext and Hypermedia 2006 (Odense, Denmark, August 22 - 25, 2006). 31-40.
[12] Mathes, A. 2004. Folksonomies – Cooperative Classification and Communication Through Shared Metadata. Online Report. http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.pdf
[13] Mika, P. 2007. Ontologies Are Us: A Unified Model of Social Networks and Semantics. Journal of Web Semantics, 5, 1 (Mar. 2007). 5-15.
[14] Noll, M. G. and Meinel, C. 2007. Author vs Readers – A Comparative Study of Document Metadata and Content in WWW. In Proceedings of the 2007 ACM Symposium on Document Engineering (Winnipeg, Manitoba, Canada, August 28 – 31, 2007). DocEng 2007. 177-186.
[15] Noll, M. G. and Meinel, C. 2008. Exploring Social Annotations for Web Classification. In Proceedings of the 2008 ACM symposium on Applied Computing (Fortaleza, Ceara, Brazil, March 16 – 20, 2008). SAC 2008. 2315-2320.
[16] Ohmukai, I., Hamasaki, M., and Takeda, H. 2005. A Proposal of Community-based Folksonomy with RDF Metadata. In Proceedings of the ISWC 2005 Workshop on End User Semantic Web Interaction (Galway, Ireland, November 7, 2005). ISWC 2005.
[17] Quintarelli, E. 2005. Folksonokies: power to the people. Online Report. ISKO Italy-UniMIB meetings (Milan, Italy, June 24, 2005). http://www.iskoi.org/doc/folksonomies.htm
[18] Rethlefsen, M. L. 2007. Tags Help Make Libraries Del.icio.us: Social bookmarking and tagging boost participation. Library Journal, 15 (Sep. 2007). http://www.libraryjournal.com/article/CA6476403.html
[19] Shirky, C. 2005. Ontology is Overrated: Categories, Links, and Tags. http://www.shirky.com/writings/ontology_overrated.html
[20] Syn, S.Y. and Spring, M.B. 2008. Can a system make novice users experts?: Analysis of metadata created by novices and experts with varying levels of assistance. Int. J. Metadata, Semantics and Ontologies, 3, 2. 122–131.
[21] Trant, J. 2006. Exploring the potential for social tagging and folksonomy in art museums: proof of concept. New Review of Hypermedia and Multimedia, 12, 1 (Jun. 2006). 83-105.
[22] Wu, H., Zubair, M., and Maly, K. 2006. Harvesting Social Knowledge from Folksonomies. In Proceedings of the 17th Conference on Hypertext and Hypermedia 2006 (Odense, Denmark, August 22 - 25, 2006). 111-114.
[23] Wu, X., Zhang, L. and Yu, Y. 2006. Exploring Social Annotations for the Semantic Web. In Proceedings of the 15th International World Wide Web Conference (Edinburgh, Scotland, May 23 - 26, 2006). WWW2006. 417-425.