Having developed the Greek DBpedia, OKFN Greece, and specifically the OKFN’s Open Linguistics Working Group, is now contributing to the development of Greek linguistic Linked Open Data by introducing the Greek DBpedia Spotlight.
DBpedia Spotlight is an application that automatically spots and disambiguates words or phrases in text documents that may refer to DBpedia resources, and annotates them with DBpedia URIs. DBpedia Spotlight works in three stages:
- Spotting stage: In this stage the application spots words or phrases that may indicate mentions of DBpedia resources.
- Candidate selection stage: In this stage, the DBpedia resource names retrieved for the spotted words/phrases are mapped to candidate disambiguations.
- Disambiguation stage: In this stage, the text around each spotted term is used as context to select the most likely disambiguation among the candidates selected in the previous stage.
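The three stages can be illustrated with a minimal sketch. The tiny lexicon, the context word counts, and all function names below are hypothetical and chosen only for illustration; the real application works over an index built from DBpedia and Wikipedia data and uses a more refined scoring scheme.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: surface form -> candidate DBpedia URIs.
LEXICON = {
    "washington": [
        "http://dbpedia.org/resource/George_Washington",
        "http://dbpedia.org/resource/Washington,_D.C.",
    ],
}

# Hypothetical context words observed around mentions of each resource.
CONTEXT = {
    "http://dbpedia.org/resource/George_Washington":
        Counter({"president": 3, "general": 2, "army": 2}),
    "http://dbpedia.org/resource/Washington,_D.C.":
        Counter({"city": 3, "capital": 3, "district": 2}),
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def spot(tokens):
    """Spotting: return (position, token) pairs found in the lexicon."""
    return [(i, t) for i, t in enumerate(tokens) if t in LEXICON]


def select_candidates(surface_form):
    """Candidate selection: map a surface form to its candidate URIs."""
    return LEXICON[surface_form]


def disambiguate(candidates, context_tokens):
    """Disambiguation: pick the candidate whose known context words
    overlap most with the words of the input text."""
    def score(uri):
        return sum(CONTEXT[uri][t] for t in context_tokens)
    return max(candidates, key=score)


def annotate(text):
    """Run the three stages and return (position, surface form, URI) triples."""
    tokens = tokenize(text)
    return [
        (pos, form, disambiguate(select_candidates(form), tokens))
        for pos, form in spot(tokens)
    ]


print(annotate("Washington is the capital city of the United States"))
```

In this toy setting the words "capital" and "city" steer the annotation towards the city rather than the person, which is the role the surrounding text plays in the disambiguation stage.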
An additional offline training and configuration stage must precede the above stages. It extracts an extended lexicon and creates an index from DBpedia and Wikipedia data. The extended lexicon and the index include the following:
- DBpedia Labels created from Wikipedia page titles
- DBpedia Redirects that indicate synonyms or alternative expressions of terms described by specific URIs
- DBpedia Disambiguations, that is, terms that can be associated with many URIs, so it is ambiguous which URI is the most suitable to disambiguate them.
- Wikilinks, that is, the page links that interconnect the Wikipedia articles and whose anchor text contains terms of the above categories. Along with each extracted wikilink, the paragraph representing the context of the wikilink’s occurrence is also pre-processed.
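As a rough illustration of how the first three sources could be merged into a single lexicon, the sketch below maps each lowercased surface form to the set of URIs it may refer to. The function name, the input shapes, and the sample data are assumptions made for this example and do not reflect the actual indexing code.

```python
from collections import defaultdict


def build_lexicon(labels, redirects, disambiguations):
    """Merge three hypothetical inputs into surface form -> set of URIs.

    labels:          {uri: label taken from the page title}
    redirects:       {alternative name: target uri}
    disambiguations: {ambiguous term: [candidate uris]}
    """
    lexicon = defaultdict(set)
    for uri, label in labels.items():
        lexicon[label.lower()].add(uri)
    for alias, target in redirects.items():
        lexicon[alias.lower()].add(target)
    for term, uris in disambiguations.items():
        lexicon[term.lower()].update(uris)
    return dict(lexicon)


# Hypothetical sample data, for illustration only.
labels = {
    "http://dbpedia.org/resource/Athens": "Athens",
    "http://dbpedia.org/resource/Athens,_Georgia": "Athens, Georgia",
}
redirects = {"Athina": "http://dbpedia.org/resource/Athens"}
disambiguations = {
    "Athens": [
        "http://dbpedia.org/resource/Athens",
        "http://dbpedia.org/resource/Athens,_Georgia",
    ],
}

lexicon = build_lexicon(labels, redirects, disambiguations)
print(sorted(lexicon["athens"]))
```

A surface form that ends up with more than one URI, such as "athens" here, is exactly the case that the disambiguation stage has to resolve.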
DBpedia Spotlight implements the Aho-Corasick string matching algorithm in the spotting stage described above, and uses Apache Lucene over the index built in the offline training/configuration stage. For the disambiguation of the spotted words/phrases, a vector space model (VSM) representation of the DBpedia resources is used, along with a variant of the TF-IDF technique that weights words by their ability to distinguish between the candidates of a given term.
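To show why Aho-Corasick suits the spotting stage, here is a compact textbook-style sketch of the algorithm that finds all lexicon entries in a text in a single pass. It is not the code used by DBpedia Spotlight; the function names and the sample surface forms are assumptions for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie, failure links and output lists for the patterns."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # failure link of each state
    out = [[]]     # patterns recognised when a state is reached
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Breadth-first traversal: depth-1 states keep the root as failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the matches of the longest proper suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start index, pattern) for every occurrence in the text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


# Hypothetical surface forms, including one that is a prefix of another.
surface_forms = ["Aristotle", "Aristotle University", "Thessaloniki"]
automaton = build_automaton(surface_forms)
for start, form in find_all("Aristotle University of Thessaloniki", automaton):
    print(start, form)
```

The text is scanned once regardless of how many surface forms the lexicon holds, and overlapping mentions such as "Aristotle" inside "Aristotle University" are all reported, leaving the choice between them to later stages.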
In addition, some configuration parameters can be tuned through the web interface of the application. The user can:
- Execute SPARQL queries in order to narrow down the disambiguations to only the ones they are interested in, e.g. cities with a population over 100,000.
- Specify only the classes they are interested in (e.g. persons, cities, etc.). DBpedia Spotlight makes full use of the DBpedia ontology in order to disambiguate only terms of the selected classes (this feature is not yet available in the Greek DBpedia Spotlight).
- Set the following tuning parameters:
a. Confidence: It ranges from 0 to 1. The larger the value of confidence is, the stricter and more selective the application becomes as far as the disambiguation process is concerned, taking into account the topical pertinence of the words/terms and the overall ambiguity of the text.
b. Contextual score: It ranges from 0 to 1. When large values of contextual score are selected, the application does not annotate terms with little topical pertinence.
c. Support: It is used to specify the minimum number of inlinks a DBpedia resource has to have in order to be annotated.
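The same parameters can also be passed when the service is called programmatically. The sketch below only builds the request URL and sends nothing over the network. The `/rest/annotate` path and the `text`, `confidence`, `support`, `types` and `sparql` query parameters follow the general DBpedia Spotlight web service; that the Greek deployment exposes them at this exact address is an assumption, and the helper name `build_annotate_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: host taken from the Greek deployment, path assumed
# to match the standard DBpedia Spotlight web service.
BASE_URL = "http://dbpedia-spotlight.okfn.gr/rest/annotate"


def build_annotate_url(text, confidence=0.5, support=0,
                       types=None, sparql=None, base_url=BASE_URL):
    """Build an annotation request URL with the chosen tuning parameters."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if support < 0:
        raise ValueError("support must be a non-negative number of inlinks")
    params = {"text": text, "confidence": confidence, "support": support}
    if types:
        # Restrict annotations to the given ontology classes.
        params["types"] = ",".join(types)
    if sparql:
        # Restrict annotations to resources returned by a SPARQL query.
        params["sparql"] = sparql
    # urlencode percent-encodes non-ASCII text as UTF-8 by default.
    return base_url + "?" + urlencode(params)


url = build_annotate_url("Η Θεσσαλονίκη είναι πόλη της Ελλάδας",
                         confidence=0.4, support=20)
print(url)
```

Raising `confidence` or `support` in such a request should yield fewer but more reliable annotations, mirroring the behaviour of the sliders in the web interface.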
The Greek DBpedia Spotlight, which is fully compatible with Greek characters encoded in UTF-8, was implemented by graduate student Ioannis Avraam under the supervision of Dr. Charalampos Bratsas, coordinator of OKFN Greece. The project was organised by OKFN Greece in coordination with the Web Science Master Program of the Aristotle University of Thessaloniki. The Greek DBpedia Spotlight is deployed as a Web service and features a user interface at http://dbpedia-spotlight.okfn.gr/. The source code is open and available under the Apache License, Version 2.0 at https://github.com/iavraam/dbpedia-spotlight.git (dbpediaSpotlight_el branch).
- Kontokostas D., Bratsas C., Auer S., Hellmann S., Antoniou I., Metakides G., 2012, Internationalization of Linked Data. The case of the Greek DBpedia edition. In the Journal of Web Semantics: Science, Services and Agents on the World Wide Web, Volume 15, Sept 2012, pp. 51–61, http://dx.doi.org/10.1016/j.websem.2012.01.001.
- Status Quo and Perspectives, by Christian Chiarcos and Sebastian Hellmann