Volume 4 Issue 1
Spring 2008
ISSN 1937-7266

Merging Gazetteers

Øyvind Vestavik and Ingeborg T. Sølvberg

Norwegian University of Science and Technology
{oyvindve, ingeborg}@idi.ntnu.no

In geographic information retrieval we need gazetteers in order to reason about place names occurring in textual documents. Many gazetteers are available, with different scope, granularity and structure. Given this diversity, and the fact that there are no global identifiers for places, merging gazetteers requires two mappings: a mapping of formats (schema-level mapping) to identify fields/attributes holding the same kind of information, and an instance-level mapping to identify records in the gazetteers being merged that describe the same place.

We propose an instance-level mapping based on four heuristics: 1) geographic proximity, 2) similar name(s), 3) the same or nearly the same kind of place (feature type), and 4) common administrative regions.
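As an illustration only, each of the four heuristics can be expressed as a score between 0 and 1. The sketch below is not the authors' implementation: the `Entry` record, the 50 km distance cut-off, the use of `difflib` string similarity, the small table of related feature types and the Jaccard overlap of administrative regions are all assumptions chosen for readability.

```python
import math
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Entry:
    """Hypothetical minimal gazetteer record (field names are assumptions)."""
    names: list[str]
    lat: float
    lon: float
    feature_type: str
    admin_regions: set[str] = field(default_factory=set)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def proximity_score(a, b, max_km=50.0):
    # 1) Geographic proximity: 1.0 at distance 0, falling linearly to 0.0
    #    at max_km and beyond (max_km is an assumed cut-off).
    d = haversine_km(a.lat, a.lon, b.lat, b.lon)
    return max(0.0, 1.0 - d / max_km)


def name_score(a, b):
    # 2) Similar names: best case-insensitive difflib ratio over all name pairs.
    return max(SequenceMatcher(None, x.lower(), y.lower()).ratio()
               for x in a.names for y in b.names)


# Hypothetical pairs of feature types treated as "nearly the same".
RELATED_TYPES = {frozenset({"city", "town"}), frozenset({"lake", "reservoir"})}


def feature_type_score(a, b):
    # 3) Same feature type scores 1.0, a related type 0.5, anything else 0.0.
    ta, tb = a.feature_type.lower(), b.feature_type.lower()
    if ta == tb:
        return 1.0
    return 0.5 if frozenset({ta, tb}) in RELATED_TYPES else 0.0


def admin_score(a, b):
    # 4) Common administrative regions: Jaccard overlap of region names;
    #    0.5 (neutral, an assumption) if either entry has no region data.
    if not a.admin_regions or not b.admin_regions:
        return 0.5
    shared = a.admin_regions & b.admin_regions
    return len(shared) / len(a.admin_regions | b.admin_regions)


def heuristic_scores(a, b):
    """Scores for heuristics 1-4, each in [0, 1]."""
    return [proximity_score(a, b), name_score(a, b),
            feature_type_score(a, b), admin_score(a, b)]
```

For example, a "Trondheim" city entry and a "Trondhjem"/"Trondheim" town entry a few hundred metres apart score close to 1 on proximity and names, 0.5 on feature type, and partially on administrative regions.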

For each pair of entries from the gazetteers being merged, we calculate an average score based on the heuristics above. Mappings for pairs of entries that obviously cannot describe the same place according to one of the heuristics are discarded. From the remaining mappings, the best matches are selected, and mappings with a very low overall score are discarded.
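The scoring and selection steps above might be sketched as follows, assuming each candidate pair already has one score in [0, 1] per heuristic. The veto rule (a pair is dropped if any heuristic scores 0), the 0.4 minimum average and the greedy one-to-one choice of best matches are hypothetical; the abstract does not specify thresholds or how best matches are chosen.

```python
from statistics import mean

# Hypothetical thresholds: the abstract gives no concrete values.
VETO_AT = 0.0       # a pair is discarded if any heuristic scores <= this
MIN_AVERAGE = 0.4   # accepted pairs must have at least this average score


def select_mappings(scores, veto_at=VETO_AT, min_average=MIN_AVERAGE):
    """scores maps (id_a, id_b) -> list of per-heuristic scores in [0, 1].

    Returns {(id_a, id_b): average} for the accepted one-to-one mappings.
    """
    # Discard pairs that a single heuristic rules out; average the rest.
    candidates = {pair: mean(s) for pair, s in scores.items() if min(s) > veto_at}

    # Greedy best-match selection (an assumption): highest average first,
    # each entry of either gazetteer used at most once.
    used_a, used_b, accepted = set(), set(), {}
    for (a, b), avg in sorted(candidates.items(), key=lambda kv: kv[1], reverse=True):
        if avg < min_average:
            break  # every remaining pair scores even lower
        if a in used_a or b in used_b:
            continue
        used_a.add(a)
        used_b.add(b)
        accepted[(a, b)] = avg
    return accepted
```

A greedy pass is simple but not guaranteed to maximize the total score; an assignment algorithm such as the Hungarian method could be used instead. A strict one-to-one constraint may also be too restrictive when one gazetteer records as several entries what another records as one.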

Challenging aspects of the mapping procedure include names being spelled differently in different languages (including transcription of language-specific characters), different names being used for the same place in different languages, culturally biased classification of feature types, and changing administrative borders and structures.
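Transcription of language-specific characters is the most mechanical of these challenges. A minimal sketch, assuming a small hand-written table of Scandinavian and German letters (the `TRANSCRIPTIONS` table and the `name_keys` helper are hypothetical), generates every plausible ASCII spelling of a name so that, for example, "Tromsø" and "Tromso" share a key.

```python
import unicodedata

# Hypothetical transcription table: each letter maps to its accepted ASCII spellings.
TRANSCRIPTIONS = {
    "æ": ["ae", "a"],
    "ø": ["oe", "o"],
    "å": ["aa", "a"],
    "ä": ["ae", "a"],
    "ö": ["oe", "o"],
    "ü": ["ue", "u"],
    "ß": ["ss"],
}


def strip_accents(text):
    # NFKD splits e.g. "ç" into "c" plus a combining cedilla; drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def name_keys(name):
    """Return every ASCII match key for a place name under TRANSCRIPTIONS."""
    variants = [""]
    # Normalize to NFC first so "a" + combining ring is seen as "å" by the table.
    for ch in unicodedata.normalize("NFC", name).lower():
        options = TRANSCRIPTIONS.get(ch, [ch])
        variants = [v + o for v in variants for o in options]
    return {strip_accents(v) for v in variants}
```

Two names can then be treated as spelling variants when their key sets intersect. This does not help with genuinely different names for the same place (e.g. "Munich" and "München"), which would need a list of alternate names; the number of keys also grows exponentially with the number of special letters, which is acceptable for short place names.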

For more information on this work, please visit www.idi.ntnu.no/~oyvindve

