This Standard Ontology Editor/IDE is an Essential Part of Your Toolkit
Though there are commercial alternatives, one essential part of your starting toolkit to work with ontologies (a term we use interchangeably with knowledge graph, though not all researchers do) is the Protégé editor. Protégé is an open-source ontology development framework (IDE) with more than 370,000 users. Protégé comes in two versions: one for the desktop, now in version 5.x, and one that is Web-based. We will be working with the desktop version for the Cooking with Python and KBpedia series.[1][2]
If you already have Protégé installed and are comfortable with it, you may skip this installment. Otherwise, let’s spend about 15 to 30 minutes so that you can set up your own local environment to work with KBpedia.
You first need to download and install Protégé. Go to the Protégé download page and follow the instructions for your particular operating system. You should fill out the new user registration (though you can claim you are already registered and still download it directly). The version I installed for this example is version 5.5.0, though any version from 5.2 forward should be fine as well. The Protégé distribution comes as a zip file, so you should unzip it into a directory of your choice. To complete the set-up you will also need the most recent version of Java installed on your machine; if you do not have it, here are installation instructions.
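If you are not sure whether Java is already on your machine, a quick check from Python can save a failed start-up. What follows is only a minimal sketch: the helper names `find_java` and `java_version_banner` are my own, and it assumes Java would be reachable through your PATH, which may not hold if it was installed to a custom location.

```python
import shutil
import subprocess


def find_java():
    """Return the full path of the `java` executable on PATH, or None if absent."""
    return shutil.which("java")


def java_version_banner(java_path):
    """Return the text printed by `java -version` (the JVM writes it to stderr)."""
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True, check=False
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    path = find_java()
    if path is None:
        print("No java executable found on PATH; install Java first.")
    else:
        print(f"Found java at {path}")
        print(java_version_banner(path))
```

If the script reports that nothing was found, follow the Java installation instructions before trying to launch Protégé.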
Next, to start up Protégé, invoke the executable in your Protégé directory. It will take a few seconds for the program to load. Once the main screen appears, go to File and then Open from URL, and then pick, say, http://protege.stanford.edu/ontologies/camera.owl, as shown by (1) in Figure 1:
We’ll get into KBpedia in earnest in the next installment, but if you want an early peek, you could also enter either https://github.com/Cognonto/kbpedia/blob/master/versions/2.50/kko.n3 (KBpedia upper ontology) or https://github.com/Cognonto/kbpedia/blob/master/versions/2.50/kbpedia_reference_concepts.zip (the full KBpedia, which you will need to unzip in a Web-accessible location and update this URL) into the dialog box in Figure 1. (Note: you may need to update the version reference to a later version depending on when you read this.) You will note that the next screen shots use the ‘full’ KBpedia example.
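One caution on GitHub links: a URL containing `/blob/` points at GitHub's HTML viewer page rather than at the file itself, so Protégé may report errors when asked to open it. The raw file is served from `raw.githubusercontent.com` instead. The sketch below rewrites one form into the other. The function name `github_blob_to_raw` is hypothetical, and it assumes the branch name contains no slashes.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url):
    """Rewrite a github.com 'blob' URL to its raw.githubusercontent.com form.

    URLs that do not match the expected owner/repo/blob/branch/path layout
    are returned unchanged. Assumes the branch name has no '/' in it.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    owner, repo, _blob, branch, *file_path = segments
    return "https://raw.githubusercontent.com/" + "/".join(
        [owner, repo, branch, *file_path]
    )


if __name__ == "__main__":
    blob_url = "https://github.com/Cognonto/kbpedia/blob/master/versions/2.50/kko.n3"
    print(github_blob_to_raw(blob_url))
```

Paste the printed address into the Open from URL dialog in place of the `blob` link.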
Upon entry, you will see the Protégé main screen as shown in Figure 2. Let me briefly cover some of the main conventions of the program. The three key structural aspects of the Protégé program are its main menu, its tab structure, and the views (or panes) shown for each tab of the standard interface (5). At start-up we always begin at the Active ontology tab, for which I highlight some of its key panes and functionality:
The ontology header section (1) is where all of the metadata for the knowledge graph resides. Such material includes title, creators, version notes and so forth. The metrics for the ontology reside in the second view (2). In this case, for example, this version of KBpedia has about 58,000 classes (reference concepts) and more than 5,000 properties. We also see in the third view (3) that KBpedia requires the SKOS and KKO ontology imports. Also note the search button (4), which we will use frequently, and the tab structure and order (5). We will modify that structure in later installments.
Because Protégé, like many integrated development environments (IDEs), is highly configurable, let’s detour for a short step to see how we can modify how our program looks. I am going to delete and add tabs to make the tab structure conform to the remaining screen shots.
To change tabs in Protégé, let’s refer to Figure 3:
We change the general layout of the system using the Window → Tabs option from the main menu. You delete a tab by clicking on the arrow shown for each tab as presented in the standard interface. You add tabs by selecting one of the options in the Tabs menu (2). Note that active tabs are indicated by the checkmark ( ✓ ). New tabs are added to the right of the tab sequence (3). Thus, to change the ordering of tabs, one must delete and then add tabs in the order desired. You can follow these steps if you want the tab ordering to reflect the screen shots below. This same main menu Window option is where you can change the views (panes) for each tab.
When these class tabs are to your liking, we can apply these same conventions and approaches to the properties (relations) for the knowledge graph, as I show in Figure 4. First, note (1) we have split our properties into three groups: object properties, data properties, and annotation properties:
These are the standard splits in the OWL language. How we use these splits and their relation to the guidance of Charles Sanders Peirce is described in later installments. In essence, object properties are those that connect to an item (with a URI or IRI) already in the system; data properties connect the subject item to literal strings and descriptions; and annotation properties are those that describe or point to the item. We’ll just use an object property example here, though the use and navigation apply to the other two property categories as well.
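To make the three-way split concrete, here is a small sketch in plain Python that groups property declarations by their OWL type. The three type IRIs are the standard OWL ones (Protégé's "data properties" correspond to `owl:DatatypeProperty`). The `ex:` property names are hypothetical and are not drawn from KBpedia; `skos:prefLabel` is a real SKOS annotation property.

```python
OWL = "http://www.w3.org/2002/07/owl#"

# Map each standard OWL property type to the name of the matching Protégé tab.
PROPERTY_KINDS = {
    OWL + "ObjectProperty": "object",
    OWL + "DatatypeProperty": "data",
    OWL + "AnnotationProperty": "annotation",
}

# Illustrative declarations: (property name, declared OWL type).
# The ex: names are made up for this example.
declarations = [
    ("ex:hasIngredient", OWL + "ObjectProperty"),    # links to another item (IRI)
    ("ex:servingCount", OWL + "DatatypeProperty"),   # links to a literal value
    ("skos:prefLabel", OWL + "AnnotationProperty"),  # describes the item itself
]


def group_by_kind(decls):
    """Group property names under 'object', 'data', or 'annotation'."""
    groups = {"object": [], "data": [], "annotation": []}
    for name, owl_type in decls:
        kind = PROPERTY_KINDS.get(owl_type)
        if kind is not None:  # skip anything that is not one of the three types
            groups[kind].append(name)
    return groups


if __name__ == "__main__":
    for kind, names in group_by_kind(declarations).items():
        print(kind, names)
```

Each of the three groups corresponds to one of the property tabs in Protégé.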
The Object properties tab in Figure 4 also has a search function (2), identical to what was described for classes. We also see a tree structure at the left that works the same as for classes (3). As before, you can use a combination of scrolling, tree expansions and searching to discover the other properties in your knowledge graph. Do make sure to check out the Data properties and Annotation properties tabs as well.
Throughout this CWPK series we will be using examples from Protégé and comparing them to direct interaction with the code base using Python. These later installments will cover most of the standard use and maintenance cases you will likely encounter with your knowledge graphs.
A Note on Performance and Preferences
You may experience some performance issues with Protégé as it comes out of the box, especially as we begin working with the relatively large KBpedia in earnest. One likely cause is the memory settings in the run.bat file, which you can find in the main directory where you installed Protégé. As a quick fix, try updating these settings in that file to these values before the next time you start the application:
-Xmx2500M -Xms2000M
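For reference, `-Xmx` sets the maximum Java heap size and `-Xms` sets the initial heap size. If you would rather script the change than edit the file by hand, the sketch below swaps both values in a launcher line. The function name `set_jvm_heap` and the sample line are hypothetical; I have not reproduced the actual contents of run.bat, so check your own file before applying anything like this.

```python
import re


def set_jvm_heap(launcher_text, max_mb=2500, initial_mb=2000):
    """Replace any existing -Xmx / -Xms values with the given sizes in megabytes.

    Flags that are not already present are not added.
    """
    text = re.sub(r"-Xmx\d+[kKmMgG]?", f"-Xmx{max_mb}M", launcher_text)
    text = re.sub(r"-Xms\d+[kKmMgG]?", f"-Xms{initial_mb}M", text)
    return text


if __name__ == "__main__":
    # Hypothetical launcher line, for illustration only.
    sample = "java -Xmx500M -Xms200M -jar launcher.jar"
    print(set_jvm_heap(sample))
```

Keep a backup copy of the original file, and choose values that fit within the physical memory on your machine.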
Also note there are many customization options in Protégé. If you get captivated by the tool, I encourage you to explore the plugins available and the ways to modify the application interface. See especially File → Preferences, where the Renderer and Plugin tabs are good places to look. Again, we will touch on some of these aspects in later articles.
Some Suggested Protégé Resources
- Protégé 5 Introductory Documentation
- Protégé 5 Documentation
- Pizza Tutorial (Protégé 4) (full listing)
- Pizzas in 10 Minutes
- Protégé Plug-ins
- Protégé mailing list
The section on an early peek of KBpedia has the following URL: (https://github.com/Cognonto/kbpedia/blob/master/versions/2.50/kko.n3)
When I tried this URL Protégé produced errors. I then tried the following and was able to successfully load into Protégé:
https://raw.githubusercontent.com/Cognonto/kbpedia/master/versions/2.50/kko.n3