Summarizing Linked Data RDF Graphs Using Approximate Graph Pattern Mining
Accepted at EDBT ’16: the 19th International Conference on Extending Database Technology.
Abstract. The Linked Open Data (LOD) cloud brings together information described in RDF and stored on the web in (possibly distributed) RDF Knowledge Bases (KBs). The data in these KBs are not necessarily described by a known schema, and querying all the interlinked KBs to acquire the necessary information is often extremely time consuming. To tackle this problem, we propose a method of summarizing large RDF KBs using top-K approximate RDF graph patterns, which we transform into an RDF schema that describes the contents of the KB. We also record the number of instances of each pattern. We can then query the RDF graph summary to identify whether the necessary information is present and, if it is present in significant numbers, whether it should be included in a federated query result.
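To make the idea of a pattern-based summary with instance counts concrete, the following is a minimal sketch and not the method of the paper: it counts exact (subject type, predicate, object type) patterns rather than mining approximate graph patterns. The function names `summarize` and `is_significant`, the `untyped` placeholder, and the `ex:` sample triples are all assumptions introduced for illustration.

```python
from collections import Counter

RDF_TYPE = "rdf:type"
UNTYPED = "untyped"  # hypothetical placeholder for literals and resources with no declared type


def summarize(triples, k):
    """Return the k most frequent (subject type, predicate, object type) patterns with counts.

    Simplification: exact counting over typed triples, not approximate pattern mining.
    """
    # Map each resource to the first type declared for it.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, o)

    # Count one pattern instance per non-type triple.
    counts = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        pattern = (types.get(s, UNTYPED), p, types.get(o, UNTYPED))
        counts[pattern] += 1
    return counts.most_common(k)


def is_significant(summary, pattern, min_count):
    """Check whether a pattern is in the summary with at least min_count instances."""
    return any(pat == pattern and n >= min_count for pat, n in summary)


# Hypothetical sample data.
triples = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:paper1", RDF_TYPE, "ex:Paper"),
    ("ex:paper2", RDF_TYPE, "ex:Paper"),
    ("ex:paper1", "ex:author", "ex:alice"),
    ("ex:paper2", "ex:author", "ex:bob"),
    ("ex:paper1", "ex:title", "Graph Summaries"),
]

summary = summarize(triples, k=1)
print(summary)
print(is_significant(summary, ("ex:Paper", "ex:author", "ex:Person"), min_count=2))
```

In this sketch, a federated query planner could consult `is_significant` before contacting a KB, skipping sources whose summary lacks the requested pattern or contains too few instances of it.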