RDF Graph Summarization Based on Approximate Patterns

Mussab Zneika, ETIS Lab, ENSEA - University of Cergy-Pontoise - CNRS, Cergy, France
Claudio Lucchese, HPC Lab., ISTI-CNR, Pisa, Italy
Dan Vodislav, ETIS Lab, ENSEA - University of Cergy-Pontoise - CNRS, Cergy, France
Dimitris Kotzinos, ETIS Lab, ENSEA - University of Cergy-Pontoise - CNRS, Cergy, France

Apr. 20 2016

Short paper accepted at ISIP ’15: PostProceeding of the 9th International Workshop on Information Search, Integration and Personalization [1].

Abstract. The Linked Open Data (LOD) cloud brings together information described in RDF and stored on the web in (possibly distributed) RDF Knowledge Bases (KBs). The data in these KBs are not necessarily described by a known schema, and it is often extremely time-consuming to query all the interlinked KBs in order to acquire the necessary information. Even when the KB schema is known, we still need to know which parts of the schema are actually used. We address this problem by summarizing large RDF KBs using top-K approximate RDF graph patterns, which we transform into an RDF schema that describes the contents of the KB. This schema describes the KB more accurately, even when a schema already exists, because it reflects the schema actually used by the existing data. We annotate the summary with the number of instances of each pattern, which makes it possible to estimate the expected number of query results. We can then query the RDF graph summary to determine whether the necessary information is present and, if it is present in significant numbers, whether the KB should be included in a federated query.
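The workflow described in the abstract (summarize the KB, attach instance counts to each pattern, then consult the summary before deciding whether to query the KB) can be illustrated with a small sketch. This is not the authors' method: it assumes a toy in-memory list of (subject, predicate, object) triples, and it replaces top-K approximate pattern mining with exact grouping of subjects by the set of predicates and rdf:type values they use. All function names and the min_count threshold are hypothetical.

```python
from collections import Counter, defaultdict


def subject_profiles(triples):
    """Map each subject to the frozenset of features it uses.

    A feature is either a predicate name or "type:<class>" for rdf:type triples.
    """
    profiles = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            profiles[s].add("type:" + o)
        else:
            profiles[s].add(p)
    return {s: frozenset(features) for s, features in profiles.items()}


def top_k_patterns(triples, k):
    """Summary: the k most frequent feature sets, each with its instance count.

    Simplification: patterns here are exact feature sets, not approximate patterns.
    """
    counts = Counter(subject_profiles(triples).values())
    return counts.most_common(k)


def instances_with(summary, required):
    """Estimate, from the summary alone, how many instances carry all required features."""
    required = frozenset(required)
    return sum(n for pattern, n in summary if required <= pattern)


def worth_querying(summary, required, min_count):
    """Hypothetical decision rule: include the KB in a federated query only if
    the summary predicts at least min_count matching instances."""
    return instances_with(summary, required) >= min_count


if __name__ == "__main__":
    kb = [
        ("a1", "rdf:type", "Person"), ("a1", "name", "Ann"), ("a1", "email", "a@x"),
        ("a2", "rdf:type", "Person"), ("a2", "name", "Bob"), ("a2", "email", "b@x"),
        ("a3", "rdf:type", "Person"), ("a3", "name", "Cy"),
        ("b1", "rdf:type", "Book"), ("b1", "title", "T"),
    ]
    summary = top_k_patterns(kb, 3)
    for pattern, n in summary:
        print(sorted(pattern), n)
    print(worth_querying(summary, {"type:Person", "email"}, min_count=2))
```

The summary is queried instead of the KB itself, so the decision costs time proportional to the number of patterns, not the number of triples. In the approximate setting the abstract describes, a pattern may also cover instances that lack a few of its features, so counts obtained this way would be estimates rather than exact figures.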

References

[1]   Mussab Zneika, Claudio Lucchese, Dan Vodislav, and Dimitris Kotzinos. RDF graph summarization based on approximate patterns. In ISIP ’15: PostProceeding of the 9th International Workshop on Information Search, Integration and Personalization. Springer CCIS Series, 2015.

Share on