Intro to an Ongoing Series of More than 70 Recipes to Work with KBpedia
We decided to open source the KBpedia knowledge graph in October 2018. KBpedia is a unique knowledge system that intertwines seven ‘core’ public knowledge bases — Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and standard UNSPSC products and services. KBpedia’s explicit purpose is to provide a computable scaffolding and design for data interoperability and knowledge-based artificial intelligence (KBAI).
Written primarily in OWL 2, KBpedia includes more than 58,000 reference concepts, mapped linkages to about 40 million entities (most from Wikidata), and 5,000 relations and properties, all organized according to about 70 modular typologies. KBpedia’s upper structure is the KBpedia Knowledge Ontology (KKO). KKO weaves the major concepts from these seven core knowledge bases into an integrated whole based on the universal categories and knowledge representation insights of the great 19th century American logician, polymath and scientist, Charles Sanders Peirce.
We have continued to expand KBpedia’s scope and refine its design since its first release. Though the entire structure has been available for download through a number of version releases, it is fair to say that only experienced semantic technologists have known how to install and use these files to their fullest. Further, KBpedia has an innovative ‘build-from-scratch’ design that has not yet been shared. Our objective in this ongoing series is to publish daily ‘recipes’ over a period of about four months on how the general public may learn to use the system and also to build it. With that knowledge, it should be easier to modify KBpedia for other specific purposes.
The mindset we have adopted to undertake this series is that of a focused, needful ‘newbie.’ The individual we have in mind may not know programming and may not know ontologies, but is willing to learn enough about these matters in order to move forward with productive work using KBpedia (or derivatives of it). Perhaps our newbie knows some machine learning, but has not been able to bring multiple approaches and tools together using a consistent text- and structure-rich resource. Perhaps our newbie is a knowledge manager or worker desirous of expanding their professional horizons. This focus leads us to the very beginning of deciding what resources to learn; these early decisions are some of the hardest and most impactful for whether ultimate aims are met. Those with more experience may skip these first installments, but may find some value in a quick scan nonetheless.
The first installments in our series begin with those initial decisions, move on to tools useful throughout the process, and frame how to load and begin understanding the baseline resources of the KBpedia open-source distribution. We then discuss standard knowledge management tasks that may be applied to the resources. One truism about keeping a knowledge system relevant and dynamic is that the effort put into it must continue to deliver value. We then begin to conduct work with the system in useful areas that grow in complexity, from intelligent retrieval to entity and relation extraction, natural language understanding, and machine learning. The intermediate part of our series deals with how to build KBpedia from scratch, how to test it logically, and how to modify it for your own purposes. Our estimate going in is that we will offer about 75 installments in this series, to conclude before US Thanksgiving. Aside from a few skipped days on holidays and such, we will post a new installment every business day between now and then.
The ‘P’ in CWPK (Cooking with Python and KBpedia) comes from using the Python language. In our next installment, we discuss why we chose it for this series over KBpedia’s original Clojure language roots. Because Python is a new language for us, throughout this series we document relevant aspects of learning that language as well. Adding new language aspects to the mix is consistent with our mindset to embrace the newbie. Even if one already knows a programming language, extracting maximum advantage from a knowledge graph well populated with entities and text is likely new. As we go through the process of Python earning its role in the CWPK acronym, we will take some side paths and find interesting applications or fun wrinkles to throw into the mix.
We will also be bringing this series to you from the perspective of our existing systems: Windows 10 on desktops and laptops, and Ubuntu Linux on Amazon Web Services (AWS) cloud servers. These are broadly representative systems. Unfortunately, our guidance and series will have less direct applicability to Apple systems or other Linux distributions.
Look for the next installment tomorrow. As we put out an installment per business day over the next four months, we’ll learn much together through this process. Please let me know how it is going or what you would like to learn. Let the journey begin . . . .