Sometimes These Releases Get Complicated
Well, I just completed a five-part article series on major changes to KBpedia that I have been writing over the past few months. Release version numbers sometimes seem pretty artificial and don't always reflect the real changes underfoot. Such is the case here.
I am pleased to release version 1.50 of KBpedia today. Virtually nothing has changed in this version with respect to size or scope compared to the last release, v1.40 in February 2017. Rather, this release is more a story of consolidation and re-organization of what was already there. Still, these re-organizations have been pretty substantial, and what is being released today is the cleanest version ever. And, oh, by the way, KBpedia now has a complete predicate organizational schema. So, let's look at some of these changes.
The Predicates Addition
The five-part series on relations in KBpedia starts from the observation that most knowledge graphs focus on nouns, and little attention has been given to properties or relations, especially as a classification of signs with key relevance to knowledge representation (KR). Actions, through exertions or perceptions, drive events that create cognition, categorizations and new knowledge. Yet we have a relatively poor KR vocabulary for handling relations of all types, be they attributes, external relations or representations. Since actions drive the real changes in the world, understanding them and their relationships, plus a more rigorous means for identifying and extracting them, should also lead to better fact and relation extraction from unstructured data (text). This is essential for completing the integration of unstructured data with structured data.
The idea of categorizing predicates is not common in the knowledge representation space, but the Logic of Relatives and many other writings of Charles Sanders Peirce [1] provide guidance for how to think about such matters. That guidance has shaped our past three years of thinking specifically about properties (in the OWL sense), or predicates.
Theoretical ideas resulting from reading and study needed to be tested against real data sources and their attributes. In such cases, theory always gives way to facts, so actual data representations, a key benefit of Wikidata, bring practical guidance to theoretical constructs.
Throughout, Peirce’s Logic of Relatives and other writings, particularly his three universal categories, proved invaluable for discerning and deciding edge cases. Broad categorization is relatively easy. The head-scratching always occurs at the interfaces, the margins, the transitions from one understood category to another. Yet this is also the area where the most insight and understanding occurs.
My own current choices have taken some years to gestate, and they may well change yet again. My understanding of Peirce's methodologies is not the limiting factor; rather, it is understanding the nature and attributes of whatever object is under scrutiny. There always seems to be something more to learn about anything.
The focus on verbs vs. nouns also led me to better understand the nature of events and the action-reaction model. This process also helped me see that events, along with entities, are particulars, and that together they represent all of the real things in the present. (Particulars are a Secondness in terms of Peirce's universal categories.)
Follow-ons from the Predicates Addition
The organization of relations into attributes (A:A), external relations (A:B) and representations (re:A) has resulted in the addition of about 66 properties to KBpedia, now expressed in version 1.50. These properties, in turn, have been mapped to about 2500 Wikidata properties, representing more than 90 percent of the property occurrences within that knowledge base. Via one or more properties, this mapping now extends KBpedia's coverage to about 30 million entities. Future efforts will extend this property coverage to some of the other major KBpedia knowledge bases, including the DBpedia ontology, schema.org and GeoNames. Look for these mappings in future releases.
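The 90 percent figure counts property occurrences, not distinct properties. The two measures can differ widely because a few properties are used far more often than others. The following minimal sketch shows how such a coverage figure, and the set of entities a mapping reaches, could be computed. It assumes statements are simple (subject, property, value) triples. The Wikidata property IDs are real, but the KBpedia-side property names, entity IDs and values are hypothetical toy data, not KBpedia's actual mapping.

```python
from collections import Counter

# HYPOTHETICAL mapping for illustration only. The Wikidata property IDs and the
# labels in the comments are real; the KBpedia-side names are invented and are
# NOT actual KBpedia identifiers. Relation types follow the post's scheme:
# "A:A" = attribute, "A:B" = external relation, "re:A" = representation.
WIKIDATA_TO_KBPEDIA = {
    "P569": ("hypotheticalDateOfBirth", "A:A"),  # Wikidata: date of birth
    "P19": ("hypotheticalPlaceOfBirth", "A:B"),  # Wikidata: place of birth
    "P18": ("hypotheticalImage", "re:A"),        # Wikidata: image
}


def occurrence_coverage(statements, mapping):
    """Share of property occurrences (not distinct properties) that are mapped."""
    counts = Counter(prop for _, prop, _ in statements)
    total = sum(counts.values())
    mapped = sum(n for prop, n in counts.items() if prop in mapping)
    return mapped / total if total else 0.0


def entities_reached(statements, mapping):
    """Subjects that carry at least one mapped property."""
    return {subj for subj, prop, _ in statements if prop in mapping}


def occurrences_by_relation_type(statements, mapping):
    """Count mapped occurrences per relation type (A:A, A:B, re:A)."""
    return Counter(mapping[prop][1] for _, prop, _ in statements if prop in mapping)


# Toy (subject, property, value) statements; entity IDs and values are made up.
statements = [
    ("entity_A", "P569", "1900-01-01"),
    ("entity_A", "P19", "entity_X"),
    ("entity_B", "P569", "1950-06-15"),
    ("entity_C", "P_other", "some value"),  # placeholder for an unmapped property
]

print(occurrence_coverage(statements, WIKIDATA_TO_KBPEDIA))        # 0.75
print(sorted(entities_reached(statements, WIKIDATA_TO_KBPEDIA)))   # ['entity_A', 'entity_B']
print(occurrences_by_relation_type(statements, WIKIDATA_TO_KBPEDIA))
```

Three of the four toy occurrences are mapped, so coverage is 0.75. Only entity_C falls outside the mapping.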
The addition of these predicates also resulted in some fairly significant updates to the upper structure of KBpedia, the KBpedia Knowledge Ontology, or KKO. We not only added properties to KBpedia, but also classed and categorized the predicates into the KKO node structure. This parallel treatment of predicates as both properties and classes is one classic technique for enabling reasoning over predicates [2]. More than 10% of the KKO knowledge graph was changed in version 1.50 to accommodate these changes.
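One way to picture this parallel treatment: each property is also assigned to a class in a predicate hierarchy, so questions about predicates become ordinary class-hierarchy queries. The following minimal sketch shows the general idea with plain dictionaries. The class and property names are hypothetical, and this is not how KKO itself is encoded.

```python
# HYPOTHETICAL predicate-class hierarchy (child -> parents). These class names
# are invented for illustration and are not the actual KKO node names.
PREDICATE_CLASS_PARENTS = {
    "Relations": [],
    "Attributes": ["Relations"],
    "ExternalRelations": ["Relations"],
    "TemporalAttributes": ["Attributes"],
}

# Each (hypothetical) property is also "classed" by assigning it to predicate classes.
PROPERTY_CLASSES = {
    "hypotheticalDateOfBirth": ["TemporalAttributes"],
    "hypotheticalPlaceOfBirth": ["ExternalRelations"],
}


def ancestors(cls, parents):
    """All transitive superclasses of `cls` (excluding `cls` itself)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def properties_under(target, property_classes, parents):
    """Properties whose predicate class is `target` or one of its subclasses."""
    return {
        prop
        for prop, classes in property_classes.items()
        if any(c == target or target in ancestors(c, parents) for c in classes)
    }


# Reasoning over predicates becomes a hierarchy lookup.
print(sorted(properties_under("Attributes", PROPERTY_CLASSES, PREDICATE_CLASS_PARENTS)))
# ['hypotheticalDateOfBirth']
print(sorted(properties_under("Relations", PROPERTY_CLASSES, PREDICATE_CLASS_PARENTS)))
# ['hypotheticalDateOfBirth', 'hypotheticalPlaceOfBirth']
```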
Other Notable Changes
In the process of making these changes we noticed another flaw in the KBpedia knowledge graph, largely the result of earlier inheritances from OpenCyc. Namely, the existing subsumption structure often made direct subClassOf assertions to grandparents or more distant ancestors. For example, a wasp is a form of insect, which is a form of arthropod, which in turn is a form of animal. Yet, rather than let inference handle these connections, the original subsumption links might have assigned wasp directly as a sub-class of animal. Though this assertion is correct, it is confusing to mix lower-level classes (such as wasps) with higher-level ones such as birds, reptiles or mammals, which are more directly sub-classes of animals. We found and cleaned up about 8,000 of these mixed subsumption assignments in the earlier KBpedia. This clean-up leads to a much more easily understood and streamlined hierarchical structure. We will continue to clean up such unneeded assignments as they are discovered.
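The clean-up amounts to finding direct subClassOf links that a longer path already implies, which is the transitive reduction of the hierarchy. The following minimal sketch assumes the hierarchy is acyclic and held as a child-to-parents dictionary. The function names and toy data are illustrative and are not the actual KBpedia build scripts.

```python
def redundant_subclass_edges(parents):
    """Find direct (child, parent) links already implied by a longer path.

    Assumes the hierarchy is acyclic (a DAG).
    """
    def reachable_without(child, skipped_parent):
        # Ancestors of `child` reachable without using the direct link to `skipped_parent`.
        seen = set()
        stack = [p for p in parents.get(child, []) if p != skipped_parent]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, []))
        return seen

    return [
        (child, parent)
        for child, direct_parents in parents.items()
        for parent in direct_parents
        if parent in reachable_without(child, parent)
    ]


def prune(parents):
    """Drop redundant links; in a DAG every original ancestor stays reachable."""
    redundant = set(redundant_subclass_edges(parents))
    return {c: [p for p in ps if (c, p) not in redundant] for c, ps in parents.items()}


# Toy hierarchy (child -> direct parents) mirroring the wasp example in the text.
HIERARCHY = {
    "Animal": [],
    "Arthropod": ["Animal"],
    "Insect": ["Arthropod"],
    "Wasp": ["Insect", "Animal"],  # the direct link to Animal is redundant
    "Bird": ["Animal"],
}

print(redundant_subclass_edges(HIERARCHY))  # [('Wasp', 'Animal')]
print(prune(HIERARCHY)["Wasp"])             # ['Insect']
```

After pruning, Wasp still reaches Animal by inference through Insect and Arthropod. Bird keeps its direct link because no longer path exists.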
Another change was the addition of a further 2,200 definitions to existing entries. KBpedia still lacks definitions for about one-quarter of its structure. But, again, we made a 14% improvement in the coverage of definitions and altLabels in this most recent version. We are committed to working through and completing these assignments.
Where possible, we also added missing mappings to Wikipedia and Wikidata. About 76% of KBpedia now has mappings to Wikipedia. We are committed to raising this coverage to the theoretical limit of about 90%.
In the nearly six months since the last version 1.40 release, tens of thousands of changes have been made to KBpedia. We estimate the entire structure has been rebuilt from scratch more than 100 times in the interim, each time testing for logical errors and inconsistencies. The net result is a pretty clean structure from top to bottom, including refinements to all of the existing 80 or so typologies in the system, especially the 30 "core" ones. We believe the overall structure to be much cleaner and more readily understood than prior versions.
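The post does not say which checks each rebuild runs. One common kind of logical test in OWL-style ontologies is disjointness: a class placed under two classes declared disjoint can have no members. The following minimal sketch covers only that single check, using an invented toy taxonomy. A full OWL reasoner would test far more than this.

```python
def all_ancestors(cls, parents):
    """Transitive superclasses of `cls`, including `cls` itself."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def disjointness_violations(parents, disjoint_pairs):
    """Map each class that falls under both members of a disjoint pair to the clashing pairs."""
    problems = {}
    for cls in parents:
        up = all_ancestors(cls, parents)
        clashes = [(a, b) for a, b in disjoint_pairs if a in up and b in up]
        if clashes:
            problems[cls] = clashes
    return problems


# Toy data (child -> direct parents) with one deliberately erroneous assertion.
TAXONOMY = {
    "Organism": [],
    "Animal": ["Organism"],
    "Plant": ["Organism"],
    "CarnivorousPlant": ["Plant", "Animal"],  # erroneous: should sit under Plant only
}
DISJOINT = [("Animal", "Plant")]

print(disjointness_violations(TAXONOMY, DISJOINT))
# {'CarnivorousPlant': [('Animal', 'Plant')]}
```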
Besides these specific changes, we also decided to dedicate KBpedia to its own Web site. This independent identity is in keeping with our desire to establish KBpedia on its own, separate from our company Cognonto as the sponsor. We anticipate further changes along these lines for subsequent releases.
To Learn More
There is much documentation and an active knowledge graph on the KBpedia site. You can also run a demo showing how KBpedia information can inform a relatively simple tagger. The entire upper structure for KBpedia, KKO, is also available for download and inspection. I particularly recommend the separate demo version of KKO, which labels the major nodes according to the Peircean universal categories of Firstness, Secondness and Thirdness. Please note this separate demo version is for learning purposes only, and is not actually used in the online knowledge graph.
The Warmest of Notes About Fred
Another aspect of the changes to KBpedia over the past few months has been the unfortunate end to my business partnership with Fred Giasson. Fred and I have worked directly and constantly with one another for nearly the past decade. While we will continue to be partners in our open source efforts, our formal business relationship has come to an end. My decade of work with Fred has been one of the most rewarding of my career. I have had the great fortune to work with some of the best and most renowned developers of my time. Fred belongs in that pantheon, if not at the top of it. He is one of the most thoughtful, innovative and disciplined computer scientists of my experience.
Fred and I are three decades apart in age, and also have different native tongues. Fred now has two children and family needs that demand greater stability, better benefits and more consistent income than our business years provided. With my own senior years closing in, I personally also want to do more writing and public service. Such are the natural tensions of life that cause highly successful partnerships to move in their own directions. I already miss my daily interactions with Fred. But I am happy to report he is in a stable position with great job satisfaction and prospects. I could not be happier for him and his family. I also know we will be working together for many years to come on our shared passions.
As part of this transition, I now own and run most of the test and build scripts that Fred developed during our joint business tenure. That I, a non-developer, have found it relatively straightforward to adopt and embrace his scripts is a testament to Fred's skills. It is funny. We had worked together for years, but it is only now that I truly appreciate his unique skills and fantastic practices in writing code that others can use and maintain. Kudos, my friend.
Such changes often engender other thoughts and changes. I will be sharing some of these with you in articles to come. For today, however, I am most thankful for being able to release version 1.50 of KBpedia, and to say thanks and pay honor to a computer scientist of the first rank, Fred Giasson.