Now, We Open Up the Power
In our recent installments we have been looking at how to search — ultimately, of course, related to how to extract — information from our knowledge graph, KBpedia, and the various large-scale knowledge bases to which it maps, such as Wikipedia, DBpedia, and Wikidata. We’ve seen that owlready2 offers us some native search capabilities, and that we can extend that by indexing additional attributes. What is powerful about knowledge graphs, however, is that all nodes and all edges are structural from the get-go, and we can easily add meaningful structure to our searches by how we represent the pieces (nodes) and by how we relate, or connect, them using the edges.
Today’s knowledge graphs are explicit in organizing information by structure. The exact scope of this structure varies across representations, and certainly one challenge in getting information to work together from multiple locations and provenances is the diversity of these representations. Those are the questions of semantics, and, fortunately, semantic technologies and parsers give us rich ways to retrieve and relate that structure. So, great, we now have structure galore! What are we going to do with it?
Well, this structured information exists, literally, everywhere. We have huge online structured datastores, trillions of semi-structured Web pages and records, and meaningful information and analysis across a rich pastiche of hierarchies and relationships. What is clear in any attempt to solve a meaningful problem is that we need much external information as well as much grounding in our own internal circumstances. Problem solving cannot be separated from obtaining and integrating meaningful information.
Thus, it is essential that we be able to query external information stores on an equivalent basis to our local ones. This equivalence requires that both internal and external sources be structured and queryable on the same basis, which is where the W3C semantic standards and SPARQL come in.
The Role of SPARQL
I think one can argue that the purpose of semantic technologies like RDF and OWL is to enable a machine-readable format for human symbolic information. As a result, we now have a rich suite of standards and implementations using those standards.
The real purpose, and advantage, of SPARQL is to make explicit all of the structural aspects of a knowledge graph to inspection and query. Because of this intimate relationship, SPARQL is more often than not the most capable and precise language for extracting information from ontologies or knowledge graphs. SPARQL, pronounced “sparkle”, is a recursive acronym for SPARQL Protocol and RDF Query Language, and has many syntactical and structural parallels with the SQL database query language.
All explicit assignments of a semantic term in RDF or OWL or their semantic derivatives can be used as a query basis in SPARQL. Thus, SPARQL is the sine qua non option for obtaining information from an ontology or knowledge graph. SPARQL is the most flexible and responsive way to manipulate a semantically structured information store.
Let’s inspect the general components of a SPARQL query specification:
This figure is from Lee Feigenbaum’s SPARQL slides, included with other useful links in the Additional Documentation section below.
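In textual terms, those components line up as in the skeleton below. This is a minimal sketch only: the ex: namespace, class, and patterns are illustrative placeholders, not KBpedia terms.

```python
# A skeletal SPARQL query with its structural parts labeled in (SPARQL)
# comments; all names here are illustrative placeholders only.
skeleton = """
PREFIX ex:   <http://example.org/>                    # prologue: prefix declarations
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?thing ?label                         # result clause: the query form
FROM <http://example.org/mygraph>                     # dataset clause: graph(s) to query
WHERE {                                               # query pattern to match
  ?thing a ex:SomeClass ;
         rdfs:label ?label .
}
ORDER BY ?label                                       # solution modifiers
LIMIT 10
"""
```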
Note that every SPARQL query gets directed to a specific endpoint, where access to the underlying RDF datastore takes place. These endpoints can be either local or accessed via the Web, with both examples shown below. Within a standalone query, the FROM keyword names the RDF dataset (graph) against which the query runs. In our examples using RDFLib via Owlready2, these locations are instead set to a Python object.
Extended Startup
Let’s start again with the start-up script we used in the last installment, only now also importing rdflib and relating its graph object to the world namespace of KBpedia:
```python
main = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
# main = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core'
kko_file = 'C:/1-PythonProjects/kbpedia/sandbox/kko.owl'
# kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'

from owlready2 import *

world = World()
kb = world.get_ontology(main).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')
skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)
kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)

import rdflib
graph = world.as_rdflib_graph()
```
We could have put the import statement for the RDFLib package at the top, but anywhere prior to formatting the query is fine.
We may now manipulate the knowledge graph in the standard way using (in this case) the world namespace for Owlready2, and access all of the additional functionality available via RDFLib using (in this case) the graph object. This is a great example of the Python ecosystem at work.
Further, because of this tight integration, some native commands in Owlready2 have been mapped to RDFLib, making the syntax and conventions of working with the two libraries easier to keep straight.
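As a small illustration of that mapping (a sketch assuming the graph object and loaded ontologies from the startup above), the same query string can be run through RDFLib’s native graph.query(), which yields RDFLib terms, or through the mapped graph.query_owlready(), which yields Owlready2 entity objects like the results shown later in this installment:

```python
# Sketch: the same query via RDFLib's native .query() and via the
# Owlready2-mapped .query_owlready(); the first returns RDFLib URIRef
# terms, the second returns Owlready2 entity objects.
q = """
PREFIX rc:   <http://kbpedia.org/kko/rc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x WHERE { ?x rdfs:subClassOf rc:Mammal . }
"""

r_rdflib   = list(graph.query(q))
r_owlready = list(graph.query_owlready(q))
```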
Basic SPARQL Forms
In the last installment we presented two wrinkles for how to express SPARQL queries to your local datastore. The first form, as I noted, looks closer to a standard SPARQL expression as shown in Figure 1:
```python
form_1 = list(graph.query_owlready("""
  PREFIX rc: <http://kbpedia.org/kko/rc/>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  SELECT DISTINCT ?x ?label
  WHERE
  {
    ?x rdfs:subClassOf rc:Mammal.
    ?x skos:prefLabel ?label.
  }
  """))
print(form_1)
```
```
[[rc.AbominableSnowman, 'abominable snowman'], [rc.Afroinsectiphilia, 'Afroinsectiphilia'], [rc.Eutheria, 'placental mammal'], [rc.Marsupial, 'pouched mammal'], [rc.Australosphenida, 'Australosphenida'], [rc.Bigfoot, 'Sasquatch'], [rc.Monotreme, 'monotreme'], [rc.Vampire, 'vampire'], [rc.Werewolf, 'werewolf']]
* Owlready2 * Warning: ignoring cyclic subclass of/subproperty of, involving:
  http://kbpedia.org/kko/rc/Person
  http://kbpedia.org/kko/rc/HomoSapiens
```
The query above triggers a warning message we can safely ignore, and lists all of the direct sub-classes of Mammal in KBpedia.
The last installment also offered a second form, which is the one I will be using hereafter. I am doing so because this form, and its further abstraction, is the more repeatable approach: we can take this format and abstract it into a ‘wrapper’ that keeps the mechanics of making the SPARQL call separate from the actual SPARQL specification (see the sketch after the example below). We will increasingly touch on these topics, but for now this is the format we will take:
```python
form_2 = """
  PREFIX rc: <http://kbpedia.org/kko/rc/>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  SELECT DISTINCT ?x ?label
  WHERE
  {
    ?x rdfs:subClassOf rc:Mammal.
    ?x skos:prefLabel ?label.
  }
  """

results = list(graph.query_owlready(form_2))
print(results)
```

```
[[rc.AbominableSnowman, 'abominable snowman'], [rc.Afroinsectiphilia, 'Afroinsectiphilia'], [rc.Eutheria, 'placental mammal'], [rc.Marsupial, 'pouched mammal'], [rc.Australosphenida, 'Australosphenida'], [rc.Bigfoot, 'Sasquatch'], [rc.Monotreme, 'monotreme'], [rc.Vampire, 'vampire'], [rc.Werewolf, 'werewolf']]
```
These two examples cover how to access the local datastore.
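As promised above, here is a minimal sketch of such a wrapper. The function name local_query is my own placeholder for illustration, not a fixture of this series:

```python
# Minimal sketch of the 'wrapper' idea: the call mechanics are
# encapsulated, so only the SPARQL string varies between queries.
def local_query(query, graph=graph):
    return list(graph.query_owlready(query))

# Same result as the form_2 example above:
print(local_query(form_2))
```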
External SPARQL Examples
We really like what we have seen with the SPARQL querying of the internal datastore using RDFLib within Owlready2. But what of querying outside sources? (And would it not be cool to be able to mix-and-match internal and external results?)
As we try to use RDFLib as-is against external SPARQL endpoints, we quickly see that we are not adequately identifying ourselves to and communicating with these sites. We have been here before: the nature of Python packages and their dependencies often requires adding another capability.
Some quick poking turns up that we lack an HTTP-aware ‘wrapper’ for external sites. A promising package is sparqlwrapper. It is available on conda-forge, so we back out of the system and, at the command line, add the package:
```
$ conda install sparqlwrapper
```
We again get feedback to the screen as the Anaconda configuration manager does its thing. When the package is finally installed and the prompt returns, we again load up Jupyter Notebook and return to this notebook page.
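(If you are not running Anaconda, the same package is also available from PyPI, so the pip equivalent should work as well:)

```
$ pip install sparqlwrapper
```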
We are now ready to try our first external example, this time to Wikidata, after we import SPARQLWrapper and set our endpoint target to Wikidata (https://query.wikidata.org/sparql):
```python
from SPARQLWrapper import SPARQLWrapper, JSON
from rdflib import Graph

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

sparql.setQuery("""
  PREFIX schema: <http://schema.org/>
  SELECT ?item ?itemLabel ?wikilink ?itemDescription ?subClass ?subClassLabel WHERE {
    VALUES ?item { wd:Q25297630
                   wd:Q537127
                   wd:Q16831714
                   wd:Q24398318
                   wd:Q11755880
                   wd:Q681337
    }
    ?item wdt:P910 ?subClass.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
  """)

sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print(results)
```
```
{'head': {'vars': ['item', 'itemLabel', 'wikilink', 'itemDescription', 'subClass', 'subClassLabel']}, 'results': {'bindings': [{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q537127'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q8667674'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'road bridge'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'bridge that carries road traffic'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Road bridges'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q11755880'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q8656043'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'residential building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building mainly used for residential purposes'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Residential buildings'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q16831714'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q6259373'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'government building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building built for and by the government, such as a town hall'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Government buildings'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q24398318'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q5655238'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'religious building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building intended for religious worship or other activities related to a religion; ceremonial structures that are related to or concerned with religion'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Religious buildings and structures'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q25297630'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q7344076'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'international bridge'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'bridge built across a geopolitical boundary'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:International bridges'}}]}}
```
Great! It works, and it marks our first information retrieval from an external site!
Let me point out a couple of things about this format. First, the endpoint already has some built-in prefixes (wd: and wdt:), so we did not need to declare them in the query header. Second, there are some query capabilities unique to the Wikidata site, noted by the SERVICE designation.
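Because the JSON result converts to a plain Python dictionary, individual fields are also easy to pick out rather than printing the whole structure. A small sketch against the bindings returned above:

```python
# Each result row is a dict under results['results']['bindings']; the
# 'value' key holds the actual literal or URI for each variable.
for row in results['results']['bindings']:
    print(row['itemLabel']['value'], '->', row['subClassLabel']['value'])
```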
When first querying a new site it is perhaps best to stick to vanilla forms of SPARQL, but as one learns more it is possible to tailor queries more specifically. We also see that our setup will allow us to take advantage of what each endpoint gives us.
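For example, a more vanilla way to retrieve labels, without the Wikidata-specific label service, is to match rdfs:label directly and filter on language. This is a sketch only, reusing the sparql object from above with two of the same QIDs:

```python
# Sketch: plain rdfs:label matching instead of the wikibase:label
# SERVICE; the two QIDs are reused from the query above.
sparql.setQuery("""
  SELECT ?item ?itemLabel WHERE {
    VALUES ?item { wd:Q537127 wd:Q16831714 }
    ?item rdfs:label ?itemLabel .
    FILTER (LANG(?itemLabel) = "en")
  }
  """)
results = sparql.query().convert()
```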
So, let’s take another example, this one using the DBpedia endpoint, to show how formats may also differ from endpoint to endpoint:
```python
from SPARQLWrapper import SPARQLWrapper, RDFXML
from rdflib import Graph

sparql = SPARQLWrapper("http://dbpedia.org/sparql")

sparql.setQuery("""
  PREFIX dbo: <http://dbpedia.org/ontology/>
  PREFIX schema: <http://schema.org/>
  CONSTRUCT {
    ?lang a schema:Language ;
          schema:alternateName ?iso6391Code .
  }
  WHERE {
    ?lang a dbo:Language ;
          dbo:iso6391Code ?iso6391Code .
    FILTER (STRLEN(?iso6391Code)=2) # to filter out non-valid values
  }
  """)

sparql.setReturnFormat(RDFXML)
results = sparql.query().convert()
print(results.serialize(format='xml'))
```
Notice again how the structure of our query code is pretty patterned. We also see in the two examples how we can specify different query results serializations (JSON and RDFXML in these examples) for our results sets.
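Note, too, that because a CONSTRUCT query returns triples rather than variable bindings, the converted RDFXML result arrives as an rdflib Graph. As a sketch of the mix-and-match idea raised earlier, those external triples can be merged directly into a local graph:

```python
# 'results' from the DBpedia CONSTRUCT above is an rdflib Graph, so its
# triples can be merged into any local rdflib graph.
local_graph = Graph()
local_graph += results                 # add the external triples
print(len(local_graph), 'triples merged')
```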
Additional Documentation
A SPARQL tutorial is outside the defined scope of this CWPK series. But the power of SPARQL is substantial, and it is well worth the time to learn more about this flexible language, which reminds one of SQL in many ways but has its own charms and powers. Here are some great starting links about SPARQL:
- Lee Feigenbaum’s SPARQL by Example: The Cheat Sheet
- SPARQL in 11 minutes video by Bob DuCharme
- Learning SPARQL by Bob DuCharme
- SPARQLWrapper documentation
- Wikidata SPARQL query examples.