The Cl@rity Program is a revolutionary initiative of the Hellenic Public Sector: through a single website, every citizen can access the public documents, laws and official decisions of government agencies, public institutions and independent public authorities.
Nevertheless, despite the great effort put into the Cl@rity Program, there is still room for improvement. Many value-added services can be built on top of the great Cl@rity Open Data API. One of these applications is ΥπερΔιαύγεια (a.k.a. SuperCl@rity).
SuperCl@rity provides full-text indexing and search over the content of all documents published through the Cl@rity program (currently around 6 million items). This lets users run better searches and discover content they would not otherwise find, since Cl@rity itself does not offer this feature: it is restricted to metadata search (and, more recently, Google site search). SuperCl@rity's advanced features include:
- Full-text index of all documents published in Cl@rity.
- Optical Character Recognition (OCR): many documents in Cl@rity are image-only PDFs (e.g. scanned faxes). Such documents contain no extractable text, which hinders indexing and prevents Cl@rity search from working properly. To resolve this, an OCR text extraction facility was implemented using the open-source Tesseract OCR engine.
- Document preview for faster searching: users no longer need to download large PDF files only to discover that they are looking at the wrong document. A preview of the first page of every document speeds up searching, browsing the results and selecting the most relevant item.
- Advanced search filters: by organization, document type, signer and publication date.
- Open access APIs to SuperCl@rity via the OpenSearch and OAI-PMH protocols: anyone is free to reuse SuperCl@rity content.
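To make the full-text index and the search filters above more concrete, here is a minimal in-memory sketch of an inverted index with metadata filtering. It is not SuperCl@rity's actual implementation, which the post does not describe; the `Decision` record, its field names and the `FullTextIndex` class are hypothetical. Tokens are case-folded and stripped of accents (Greek tonos) so that "Απόφαση" and "ΑΠΟΦΑΣΗ" match, and multi-word queries use AND semantics.

```python
import re
import unicodedata
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Set


@dataclass
class Decision:
    """Hypothetical record: extracted (or OCR'd) text plus searchable metadata."""
    doc_id: str
    text: str
    organization: str
    doc_type: str
    signer: str
    published: date


def tokenize(text: str) -> List[str]:
    """Case-fold, strip accents (e.g. Greek tonos) and split into word tokens."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # \w is Unicode-aware for str patterns, so Greek letters are kept.
    return re.findall(r"\w+", stripped)


class FullTextIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # term -> doc ids
        self.docs: Dict[str, Decision] = {}

    def add(self, doc: Decision) -> None:
        self.docs[doc.doc_id] = doc
        for term in tokenize(doc.text):
            self.postings[term].add(doc.doc_id)

    def search(self, query: str, organization: Optional[str] = None,
               doc_type: Optional[str] = None, signer: Optional[str] = None,
               published_from: Optional[date] = None,
               published_to: Optional[date] = None) -> List[str]:
        """Return ids of documents containing ALL query terms and matching every given filter."""
        terms = tokenize(query)
        if not terms:
            return []
        # Intersect the posting sets of all query terms (AND semantics).
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        results = []
        # No relevance ranking here: matches are simply sorted by id.
        for doc_id in sorted(hits):
            doc = self.docs[doc_id]
            if organization is not None and doc.organization != organization:
                continue
            if doc_type is not None and doc.doc_type != doc_type:
                continue
            if signer is not None and doc.signer != signer:
                continue
            if published_from is not None and doc.published < published_from:
                continue
            if published_to is not None and doc.published > published_to:
                continue
            results.append(doc_id)
        return results
```

At the scale of millions of documents, a real deployment would rely on a dedicated search engine with ranking and on-disk indexes; the sketch only shows how posting-list intersection and metadata filters combine.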
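The OCR step described above can be sketched as a fallback: read the PDF's embedded text layer first and run OCR only on pages where it is (almost) empty, as with image-only scanned faxes. This is an assumption about how such a pipeline could be wired, not a description of SuperCl@rity's code. The 50-character threshold and the function names are hypothetical, rendering pages to images is left to the caller, and `tesseract_ocr` assumes the `tesseract` binary is on `PATH` with the Greek (`ell`) and English (`eng`) language data installed.

```python
import subprocess
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")

# Hypothetical threshold: pages whose text layer yields fewer characters
# than this are treated as image-only and sent to OCR.
MIN_CHARS_PER_PAGE = 50


def tesseract_ocr(image_path: str) -> str:
    """OCR one rendered page image with the Tesseract CLI (assumed to be installed)."""
    # Using "stdout" as the output base makes Tesseract print the text instead of writing a file.
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", "ell+eng"],
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout


def extract_document_text(
    pages: Sequence[Page],
    extract_text_layer: Callable[[Page], str],
    run_ocr: Callable[[Page], str],
    min_chars: int = MIN_CHARS_PER_PAGE,
) -> List[str]:
    """Return the text of every page, falling back to OCR for image-only pages."""
    texts = []
    for page in pages:
        text = extract_text_layer(page).strip()
        if len(text) < min_chars:
            # Little or no embedded text (e.g. a scanned fax): OCR the page instead.
            text = run_ocr(page).strip()
        texts.append(text)
    return texts
```

Passing the extractor and OCR function as callables keeps the routing logic independent of any particular PDF library; a caller could render each page to an image and hand its path to `tesseract_ocr`.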
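Since SuperCl@rity exposes OAI-PMH, a client could harvest its records with the protocol's standard `ListRecords` verb, following `resumptionToken` values until the list is exhausted. The sketch below uses only the Python standard library and the protocol's mandatory `oai_dc` (Dublin Core) metadata format. The base URL is a placeholder because the post does not give the endpoint, and OAI-PMH error responses such as `noRecordsMatch` are not handled.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Optional, Tuple

# Namespaces defined by the OAI-PMH 2.0 specification and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, resumption_token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument in OAI-PMH."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_data) -> Tuple[List[Dict], Optional[str]]:
    """Extract identifier and titles per record, plus the next resumptionToken, if any."""
    root = ET.fromstring(xml_data)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text or "" for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    # An empty or missing <resumptionToken> means this is the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, token or None


def harvest(base_url: str) -> Iterator[Dict]:
    """Yield every record, requesting further pages while a resumptionToken is returned."""
    token = None
    while True:
        with urllib.request.urlopen(list_records_url(base_url, token)) as resp:
            records, token = parse_list_records(resp.read())
        yield from records
        if token is None:
            break
```

Iterating over `harvest("https://example.org/oai")` (a placeholder URL) would yield one dict per record with its OAI identifier and Dublin Core titles, fetching further pages transparently.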
To conclude, we have to admit that the Cl@rity program is a challenge for any programmer who sees themselves as a responsible and active citizen. Publishing public data in Cl@rity is only the first step towards reusing and exploiting that data for innovation and transparency in the Greek Public Sector. Other good initiatives already exist, such as the Cl@rity Linked Data project http://thedatahub.org/dataset/diavgeia and Greek Spending Visualization http://www.greekspending.com/, but this is only the beginning.
For more information regarding SuperCl@rity, check out this SlideShare presentation.