owlready2 Appears to be a Capable Option
In the CWPK #2 and #4 installments to this Cooking with Python and KBpedia series, we noted that we would reach a decision point when we needed to determine how we will manipulate our knowledge graphs (ontologies) using the Python language. We have now reached that point. Our basic Python environment is set (at least in an initial specification) and we need to begin inputting and accessing KBpedia to develop and test our needed management and build functions.
In our own efforts over the past five years or more, we have used the Java OWL API initially developed by the University of Manchester. The OWL API is an integral part of the Protégé IDE (see CWPK #5) and supports OWL2. The API is actively maintained. We have been very pleased with the API’s performance and stability in our earlier KBpedia (and other ontology) efforts. In our own Clojure-based work we have used a wrapper around the OWL API. A wrapper using Python is certainly a viable (perhaps even best) approach to our current project.
We still may return to this approach for reasons of performance or capabilities, but I decided to first explore a more direct approach using a Python language option. This decision is in keeping with this series’ Python education objectives. I prefer for these lessons to use a consistent Python style and naming conventions, rather than those in Java. I was also curious to evaluate and test what presently exists in the marketplace. We may gain some advantages from a more direct approach; we may also discover some gotchas or dead ends that initial due diligence missed. We can always return to Plan B with a wrapper around the existing OWL API.
If we do need to revert and take the wrapper approach, the leading candidate for the wrapper is py4j. Initial research suggests other Python bridges to Java such as Jython or JPype are less efficient and less popular than py4j. pyJNIus had a similar objective to py4j but has seen no development activity for 4-6 years. The ROBOT tool for biomedical ontologies points the way to how Python can link through py4j. Even if our Python-based approach herein works great, we still may want to embrace py4j as we move forward given the wealth of ontology-related applications written in Java. But I digress.
There is no acclaimed direct competitor to the OWL API in Python, though there are pieces that may approximate its capabilities. Frankly, once I began my due diligence, I was surprised by the relative dearth of Python tools for working with OWL. Many of the Python projects that do or did exist harken back years. There was a bulge of tool-making in the mid-2000s using Python that has since cooled substantially, with two notable exceptions I discuss below.
One of those exceptions is RDFLib, a Python library for working with RDF. RDFLib provides a useful set of parsers and serializers and a plug-in architecture, but lacks direct OWL 2 support. FuXi was an OWL reasoner based on RDFLib that used a subset of OWL, but is now abandoned. SuRF is an object-RDF mapper based on RDFLib that enables manipulations of RDF triples, but is somewhat dated. rdftools had a similar objective to RDFLib, but was abandoned about five to seven years ago. owlib is a five-year-old API to OWL built using RDFLib to simplify working with OWL constructs; it has not been updated and is inactive. More currently, infixowl is an RDFLib Python binding for the OWL abstract syntax, which makes it more like the wrapper alternative. Though not immediately applicable to our OWL needs, we may later embrace RDFLib for parsers and serializers or as a useful library for the typologies in KBpedia.
Then there are a number of tools independent of RDFLib. SETH, from about a dozen years back, was an attempt at a Python OWL API that still required the JVM; it is now largely abandoned (though available via CVS repository). funowl is a Pythonic API that follows the OWL functional model for constructing OWL, and it provides a py4j or equivalent wrapper to the standard Java OWL libraries. It appears to be active and is worth keeping an eye on. The ontobio Python module is a library for working with ontologies and associations to outside entities, though it is not an ontology manager.
Fortunately, the second exception is owlready2, a module for ontology-oriented programming in Python 3, including an optimized RDF quadstore. A number of things impressed me about owlready2 in my due diligence. First, its functionality fit the bill for what I wanted to see in an ontology manager dealing with all CRUD (create-read-update-delete) aspects of an ontology and its components. Second, I liked the intent and philosophy behind the system as expressed in its original academic paper and home Web site (see Additional Documentation below). Third, the project is being actively maintained with many releases over the past two years. Fourth, the documentation level was comparatively high for an open-source project and clearly written and understandable. And, last, there is an existing extension to owlready2 that adds support for RDFLib, should we also decide to add that route.
One concern arising from my diligence is the lack of direct Notation3 (N3) file support in owlready2, since all of KBpedia’s current ontology files are in N3. According to owlready2’s developer, Jean-Baptiste Lamy, N-Triples, which are a subset of N3, are presently supported by owlready2. We can test and see if our N3 constructs load or not. If they do not, we can save out our ontology files in RDF/XML, which owlready2 does support. (Indeed, use of the RDF/XML format has proven to be the better approach.) Alternatively, we can do file conversions with RDFLib or the Java OWL API. File format conversions and compatibility will be a constant theme in our work, and this potential hurdle is not unlike others we may face.
Thus, while the pickings were surprisingly thin for off-the-shelf OWL tools in Python, owlready2 appears to have the requisite functionality and currentness and to be a reasonable initial choice. Should this choice prove frustrating, we will likely fall back onto the py4j wrapper to the OWL API or funowl.
So, now with the choice made, it is time to set up our directory structure and install owlready2.
Here is our standard main directory structure with the owlready2 additions noted:
|-- PythonProject
    |-- Python
        |-- [Anaconda3 distribution]
    |-- Notebooks
        |-- CWPKNotebook
    |-- owlready2                # place it at top level of project
        |-- kg                   # for knowledge graphs (kgs) and ontologies
        |-- scripts              # for related Python scripts
    |-- TBA
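These folders can simply be created by hand. For those who would rather script the step, here is a minimal sketch using Python's `pathlib` (a library we look at more closely later in this installment). Note that `project_root` below is a throwaway temporary directory standing in for your own project location; that stand-in is an assumption for illustration only, so substitute your real path.

```python
import tempfile
from pathlib import Path

# Assumption: a temporary directory stands in for your real project root
# (for example, the PythonProject folder shown in the tree above).
project_root = Path(tempfile.mkdtemp()) / 'PythonProject'

# The two owlready2 subdirectories named in the tree above.
for sub in ('kg', 'scripts'):
    # parents=True builds any missing intermediate folders;
    # exist_ok=True makes it harmless to run the cell again.
    (project_root / 'owlready2' / sub).mkdir(parents=True, exist_ok=True)

# List what now sits under owlready2
created = sorted(p.name for p in (project_root / 'owlready2').iterdir())
print(created)
```

Because of `exist_ok=True`, rerunning the cell leaves existing folders and their contents untouched.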
After making these changes on disk, it is time to install owlready2, which is easy:
conda install -c conda-forge owlready2
You will see the reports to the screen terminal as we noted before, and you will need to agree to proceed. Assuming no errors are encountered, you will be returned to the command window prompt. You can then invoke ‘Jupyter Notebook’ again.
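Once back in the notebook, one quick way to confirm that the install took is to ask Python whether it can locate the package. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `is_installed` is my own invention for illustration, not part of any library.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    # find_spec returns None when no such top-level package is found
    return importlib.util.find_spec(package_name) is not None

print('owlready2 installed:', is_installed('owlready2'))
```

If this reports `False`, the likely causes are that the install ran in a different conda environment than the one serving the notebook, or that the notebook was started before the install finished.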
Finding and Opening Files
Let’s begin working with owlready2 by loading and reading an ontology/knowledge graph file. Let’s start with the smallest of our KBpedia ontology files, kko.owl (per the instructions above this is the kko.n3 file converted to RDF/XML in Protégé). (You may download this converted file from here.) I will also assume you stored this file under the owlready2/kg directory noted above.
As you begin to work with files in Python on Windows, here are some initial considerations:
- In Windows, a full file directory path starts with a drive letter (C:, D:, etc.). In Linux and OS-X, it starts with “/”
- Python lets you use OS-X/Linux style slashes “/” in Windows. The recommended format is something like ‘C:/Main/FirstDirectory/second-directory/my-file.txt’
- Relative addressing is allowed, with the current directory understood to be the one where you started your interpreter (Jupyter Notebook in our case). However, that is generally not best practice. Python embraces the concept of Current Working Directory. CWD is the folder your Python is operating from, which might vary by application, such as Jupyter Notebook. The CWD is the ‘root’ for your current session. What this means is that relative file addresses can be tricky to use. You are best off using the absolute reference to all of your files.
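To make these points concrete, here is a small sketch using `PureWindowsPath` from the standard `pathlib` library, which applies Windows path rules on any operating system without touching the disk. The path is the illustrative one from the list above, not a real file.

```python
from pathlib import PureWindowsPath

# The illustrative path from the list above, written with forward slashes
p = PureWindowsPath('C:/Main/FirstDirectory/second-directory/my-file.txt')

print(p)                 # rendered with native Windows backslashes
print(p.drive)           # the drive letter portion
print(p.is_absolute())   # absolute: it has both a drive and a root

# A relative path has no drive or root; at run time it would be
# resolved against whatever the CWD happens to be
rel = PureWindowsPath('second-directory/my-file.txt')
print(rel.is_absolute())
```

The forward-slash form goes in and the native form comes out, which is why the forward-slash convention is safe to use in your own code.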
When you work with online file documents, you will need to use different Python commands and conventions, as the examples below show. We will offer more explanation on this specific option when the code below is presented.
Here are some general references that can explain files and paths further:
- https://www.pitt.edu/~naraehan/python2/file_path_cwd.html
- https://docs.python.org/3/library/pathlib.html
- https://realpython.com/working-with-files-in-python/#pythons-with-open-as-pattern
- https://automatetheboringstuff.com/chapter8/.
To find what your CWD is for your current session:
import os
dir(os)
Note there are a couple of things going on in this snippet. First, we have imported the Python built-in module called ‘os’. Not all commands are brought into memory when you first invoke Python. In this case, we are invoking (or ‘importing’) the os module.
Second, we have invoked the dir command to get a listing of the various functions within the os module. So, go ahead and shift+enter this cell or Run it from the Jupyter Notebook menu to see what os contains.
We can invoke other functions with a similar syntax. Another option besides dir is to get help on the current module:
help(os)
Note these same dir and help commands can be applied to any module active in the system.
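Among the many names that dir lists for os is `getcwd`, which answers the CWD question directly. Here is a minimal sketch; `Path.cwd()` from the standard `pathlib` library is shown alongside as an equivalent, and the file name `kko.owl` is used only to show how a relative name gets resolved.

```python
import os
from pathlib import Path

cwd = os.getcwd()     # the CWD as a plain string
print(cwd)

print(Path.cwd())     # the same location as a pathlib Path object

# A relative file name is interpreted against the CWD; abspath shows
# where Python would actually look (the file need not exist)
print(os.path.abspath('kko.owl'))
```

Running this in different notebooks or from different start-up folders will give different answers, which is the reason relative addresses can be tricky.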
This next example shows another function in os called ‘walk’. We invoke this function by calling the combined module and function notation using the dot (.) syntax (‘os.walk’). We will add a couple more statements to get our directory listing to display (‘print()’) the directory file names to screen:
for dirpath, dirnames, files in os.walk('.'):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)
One of the first things you will learn about Python is that there are often multiple modules, and modules within external libraries, that may be invoked for a given task. It takes time to discover and learn these options, but that is also one of the fun parts of the language.
Our next example shows just this, using a new package, pathlib, useful for local files, that has some great path management functions. (This library will be one of our stalwarts moving forward.)
Remember we can import functions from add-ons beyond the Python built-ins. We do so via modules again using the import statement, but we now need to identify the library (or ‘package’) where that module resides. We do so via the ‘from’ statement. Remember, external libraries need to be downloaded and registered via Anaconda (conda or conda-forge) prior to use if they are not already installed on your system. (pathlib itself ships with Python 3, so it needs no separate install.) (Recall that our installed packages are at C:\1-PythonProjects\Python\pkgs based on my own configuration.)
In this next example we are using the home method of the Path class in the pathlib package. The home method returns the home directory of the current user, a convenient ‘root’ from which to build other paths:
from pathlib import Path
home = Path.home()
print(home)
C:\Users\Michael
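Beyond home, a few other pathlib features earn their keep quickly. The sketch below shows the `/` operator for joining path segments and `rglob` for searching a folder tree. To keep it runnable anywhere, it builds a temporary stand-in for the owlready2/kg directory with placeholder files; those files and their contents are assumptions for illustration, not real ontologies.

```python
import tempfile
from pathlib import Path

# Assumption: a temporary folder stands in for the owlready2/kg directory
kg_dir = Path(tempfile.mkdtemp()) / 'owlready2' / 'kg'
kg_dir.mkdir(parents=True)
(kg_dir / 'kko.owl').write_text('placeholder, not a real ontology', encoding='utf-8')
(kg_dir / 'notes.txt').write_text('not an ontology', encoding='utf-8')

# The / operator joins segments without worrying about slash direction
target = kg_dir / 'kko.owl'
print(target.name, target.suffix, target.parent.name)

# rglob searches the folder and all of its subfolders for matching names
owl_files = sorted(f.name for f in kg_dir.rglob('*.owl'))
print(owl_files)
```

Pointing `kg_dir` at your real owlready2/kg folder would list the ontology files you have actually stored there.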
Windows is a tricky environment for handling file names, since the native operating system (OS) requires backslashes (‘\’) rather than forward slashes (‘/’) and also requires the drive designation for absolute paths. We also have the issue of relative paths, which because of CWD (current working directory) can get confused in Python (or rather, in our use of Python).
One good habit is to declare your file target as a variable (say, ‘path’), make sure the reference is good, and then refer to the ‘path’ object in the rest of the code to prevent confusion. One code approach to this, including a print of the referenced file, is:
path = r'C:\1-PythonProjects\owlready2\kg\kko.owl'                                                       # see (A)
# path = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'     # see (A)
with open(path) as fobj:                                                                                 # see (B)
    for line in fobj:
        print(line, end='')
Note, this example may not work unless you are using local files.
We get the absolute file name (A) on Windows by going to its location within Windows Explorer, highlighting our desired file in the right panel, and then right-clicking on the path listing shown above the pane and choosing ‘Copy address as text’; that is the information placed between the quotes on (A). Note also the ‘r’ switch on this line (A) (no space after ‘r’!), which means ‘raw’ and enables the Windows backslashes to be interpreted properly. Go ahead and shift+enter this cell and see the listing (which is also useful to surface any encoding issues, which will appear at the end of the file listing should they exist).
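On the point of making sure the reference is good, a minimal sketch of one way to do that check before reading follows. The helper name `read_text_file` is hypothetical, and the small `demo.owl` file it reads is a throwaway stand-in written to a temporary folder so the cell runs anywhere.

```python
import tempfile
from pathlib import Path

def read_text_file(path):
    """Return the file's text, or raise a clear error if the path is wrong."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f'No file found at: {path}')
    # Stating the encoding avoids surprises from platform defaults
    with open(path, encoding='utf-8') as fobj:
        return fobj.read()

# Assumption: a small stand-in file, not the real kko.owl
demo = Path(tempfile.mkdtemp()) / 'demo.owl'
demo.write_text('<rdf:RDF/>\n', encoding='utf-8')

text = read_text_file(demo)
print(text, end='')
```

Failing early with the full path in the message makes a mistyped drive letter or folder name much easier to spot than a traceback from deeper in the code.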
Now, the example above is for local files. If you are using the system via MyBinder, we need to load and view our files from online. Here is a different format for accessing such information:
import urllib.request
path = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'
for line in urllib.request.urlopen(path):
    print(line.decode('utf-8'), end='')
A few items for this format deserve comment. First, we need to import a new package, urllib, that carries with it the functions and commands necessary for accessing URLs. There are multiple options available in Python for doing so. This particular one presents, IMO, one of the better formats for viewing text files. Second, we declare the UTF-8 encoding, a constant requirement and theme through the rest of this CWPK series. And, third, we add the argument end='' in our print statement to eliminate the extra lines in the printout that occur without it. Python functions often have many similar options or switches available.
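Since we now have two ways of reading a file, one possible convenience is a single helper that picks the right one from the look of the ‘path’ value. What follows is only a sketch under that assumption: the names `is_url` and `read_lines` are hypothetical, and the demonstration reads a throwaway local file so that it runs without a network connection.

```python
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

def is_url(location):
    """True for http(s) addresses; a Windows drive such as C: is not a URL."""
    return urlparse(str(location)).scheme in ('http', 'https')

def read_lines(location):
    """Yield decoded text lines from a web address or a local file."""
    if is_url(location):
        with urllib.request.urlopen(location) as response:
            for raw in response:
                yield raw.decode('utf-8')
    else:
        with open(location, encoding='utf-8') as fobj:
            yield from fobj

# Assumption: a small local stand-in file for the demonstration
demo = Path(tempfile.mkdtemp()) / 'demo.txt'
demo.write_text('first\nsecond\n', encoding='utf-8')

lines = list(read_lines(demo))
for line in lines:
    print(line, end='')
```

Checking explicitly for `http` and `https` matters on Windows, because a drive letter followed by a colon can otherwise be mistaken for a URL scheme.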
In any case, the above gives us the basis to load the upper ontology of KBpedia called KKO. We now turn to how we begin to manage our knowledge graphs.
Import an Ontology
So, let’s load our first ontology into owlready2 applying some of these concepts:
from owlready2 import *
# the local file option
# onto = get_ontology(path).load()
# the remote file (URL) option
onto = get_ontology(path).load()
Inspect Ontology Contents
We do not get a confirmation that the file, now held in the object named onto, loaded OK, other than that no error messages appeared (which is good!). Just to test if everything proceeded OK, let’s ask the system to return (print to screen) a known class from our kko.owl ontology called ‘Generals’:
print(onto.Generals)
This same dot notation can be applied to all of the ontology components (in this case the class, ‘Generals’).
We can also list all of the classes in the ontology:
list(onto.classes())
list(onto.disjoint_classes())
Armed with these basics we can begin to manipulate the components in our knowledge graph, the topic for our next installment.
Additional Documentation
Here is additional documentation on owlready2:
- The original academic paper
- owlready2 documentation
- owlready2 PyPI project
- owlready2 conda-forge project
- Source files
- owlready2 .load() function
- Mailing list.
The working link to owlready2 sources is https://bitbucket.org/jibalamy/owlready2/src/master/
Thx; link fixed.
Mike
Are you sure that Owlready 2 is indeed a capable interface? I tried loading an Ontology file I had built in Protege, and it failed that first test. See https://github.com/pwin/owlready2/issues/13
Furthermore, when I looked into the code, I got a bad code smell almost right away: there is no use of preexisting libraries to read ontologies, it’s all hand-written for Owlready. There is a strong “not invented here” issue in this code base.
What would be best is probably some python interface to OWLAPI, but I have no idea what it would take to build a good one. Fighting through the Java source and whatever javadocs is not an appealing alternative.
I stand by my assertion that owlready2 is the best available Python option.
Thanks, Mike