Let’s Recap Some Useful Python Guidance
All installments in this Cooking with Python and KBpedia series have been prepared and written in order, except for this one. I began collecting tips about how best to use the cowpoke package and Python about the time this installment occurred in sequence in the series. I have accumulated use tips up through CWPK #60, and now am reaching back to complete this narrative.
Since we are principally working through the interactive medium of Jupyter Notebook for the rest of this CWPK series, I begin by addressing some Notebook use tips. Most of the suggestions, however, deal with using the Python language directly. I have split that section up into numerous sub-topics.
An interesting aspect of using owlready2 as our API to OWL is its design decision to align classes within RDF and OWL to the class concept in Python. My intuition (and the results as we proceed) tells me that was the correct design decision, since it affords using Python directly against the API. However, it does impose the price of needing to have data types expressed in the right form at the right time. That means one of our Python tips is how to move owlready2 class objects into strings for manipulation and then the reverse. It is these kinds of lessons that we have assembled in this installment.
General Tips
I began these CWPK efforts with a local system and my local file directories. However, as I began to release code and reference ontologies, my efforts inexorably shifted to the GitHub environment. (One can also use Bitbucket if the need is to keep information proprietary.) This focus was further reinforced as we moved our code into the cloud, as discussed in the latter installments. Were I to start a new initiative from scratch, I would recommend starting with a GitHub-like focus first, and use Git to move code and data from there to all local and other remote or cloud environments.
Notebook Tips
In just a short period of time, I have become an enthusiast about the Notebook environment. I like the idea of easily opening a new ‘cell’ and being able to insert code that executes, or to provide nicely formatted narrative that explains what we are doing. I have also come to like the Markdown markup language. I have written in markup languages going back to SGML, then XML, and now HTML and wikitext, and I have found Markdown the most intuitive and fastest to use. So, I encourage you to enjoy your Notebook!
Here are some other tips I can offer about using Notebooks:
- Keep narrative (Markdown) cells relatively short, since when you run a cell the cursor is placed at the bottom of it, and long narrative cells require too much scrolling
- You do not need to keep importing modules at the top of a cell if they have been imported before. However, you can lose notebook state, in which case you need to run all of the cells in order to get back to the current state
- When working through the development of new routines, remember to run Kernel → Restart & Clear Output. You will again need to progress through the cells to return to the proper state, but without clearing after an error you can get a run failure just because of residual interim states. To get to any meaningful state with KBpedia, one needs at least to invoke these resources:
main = 'C:/1-PythonProjects/owlready2/kg/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core'
kko_file = 'C:/1-PythonProjects/owlready2/kg/kko.owl'

from owlready2 import *
world = World()
kb = world.get_ontology(main).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')

skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)
- When using a cell in Markdown mode for narratives, it is sometimes useful to be able to add HTML code directly. A nice Jupyter Notebook WYSIWYG assistant is https://github.com/genepattern/jupyter-wysiwyg; install via:
conda install -c genepattern jupyter-wysiwyg
However, after a period of time I reversed that position, since I found that using the assistant caused all of the cell code to be converted to HTML rather than Markdown. It is actually easier to use Markdown for simple HTML formatting
- I tend to keep only one or two Notebook pages active at a time (closing out by first File → Save and Checkpoint, and then File → Close and Halt), because a Notebook page that is not properly closed is shown as open the next time you open the Notebook
- When working with notebook files, running cells that generate long lists or analyze large data arrays can cause the saved notebook file to grow very large. To keep notebook file sizes manageable, invoke Cell → Current Output → Clear on the offending cells
- When starting a new installment, I tend to first set up the environment by loading all of the proper knowledge bases in the environment, then I am able to start working on new routines.
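If a notebook has already grown large, the same clean-up can also be done outside of Jupyter. The following is a minimal sketch, not part of cowpoke, assuming the standard nbformat 4 JSON layout (a top-level "cells" list whose code cells carry "outputs" and "execution_count"); the function names clear_outputs and clear_notebook_file are hypothetical. Work on a copy or a checkpointed file.

```python
import json
from pathlib import Path


def clear_outputs(notebook):
    """Empty the outputs and execution counts of every code cell.

    Works on an already-parsed notebook dict (nbformat 4 layout assumed)
    and returns the number of code cells that were cleared.
    """
    cleared = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop stored output, usually the bulky part
            cell["execution_count"] = None  # reset the In [n] counter
            cleared += 1
    return cleared


def clear_notebook_file(nb_path):
    """Read a .ipynb file, clear its code-cell outputs, and write it back in place."""
    path = Path(nb_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    cleared = clear_outputs(notebook)
    path.write_text(json.dumps(notebook, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
    return cleared
```

Separating the pure dict function from the file wrapper makes the clearing logic easy to try on a small hand-built notebook before pointing it at a real file.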
Python Tips
We have some clusters of discussion areas below, but first, here are some general and largely unconnected observations of working with Python:
- A file name like exercise_1.py is better than the name exercise-1.py, since hyphens are disfavored in Python (a module whose name contains a hyphen cannot be loaded with an ordinary import statement)
- When in trouble, be aggressive using Web search. There is tremendous online Python assistance
- When routines do not work, make aggressive use of print statements, including a label or recall of a variable to place the error in context (also logging, but that is addressed much later in the series)
- Also use counters to make sure items are progressing properly through loops, which is more important when loops are nested
- Take advantage of the Notebook interactive environment by first entering and getting code snippets to work, then build up to more formal function definitions
- When debugging or trying to isolate issues, comment out working code blocks to speed execution and narrow the range of inspection
- Watch out for proper indenting on loops
- Stay with the most recent and widely used versions of Python. It is not a student’s job to account for the legacy of a language. If compatibility with earlier versions is needed, there are experienced programmers from that era, and you will be better able to recognize the nuances in your modern implementation
- I think I like the dictionary (‘dict’) data structure within Python the best of all. Reportedly Python itself depends heavily on this construct, and I have found dict to be useful (though I have not tested the accompanying performance claims)
- Try to always begin your routines with the ‘preliminaries’ of first defining variables, setting counters or lists to empty, etc.
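Several of these tips (preliminaries first, counters on nested loops, labeled print statements, and a dict for tallies) can be seen together in one small routine. This is an illustrative sketch only; the function name tally_parents and the sample labels are hypothetical and not drawn from KBpedia or cowpoke.

```python
def tally_parents(pairs):
    """Count how many children each parent has, printing labeled progress lines."""
    # Preliminaries: define the containers and counters before any loop runs
    parent_counts = {}   # dict: parent label -> number of children seen
    outer_count = 0
    inner_count = 0

    for child, parents in pairs:
        outer_count += 1
        for parent in parents:
            inner_count += 1
            parent_counts[parent] = parent_counts.get(parent, 0) + 1
            # Labeled print: say which routine and which loop pass produced the line
            print(f'tally_parents()-> outer {outer_count}, inner {inner_count}: {child} -> {parent}')

    print(f'tally_parents()-> done: {outer_count} items, {inner_count} parent links')
    return parent_counts


pairs = [("Luggage", ["Container", "Product"]), ("Suitcase", ["Luggage"])]
counts = tally_parents(pairs)
```

Once the routine behaves, the print lines can be commented out or swapped for logging.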
Anatomy of a Statement
A general Python statement tends to have a form similar to:
world.search(iri = "*luggage*", _case_sensitive = False)
The so-called ‘dot’ notation shows the hierarchy of namespaces and attributes. In this instance, ‘world’ is a namespace, and ‘search’ is a function. In other instances it might be ‘class’, ‘property’, or other hierarchical relationships.
An item that remains confusing to me is when to use namespace prefixes and when not. As I state a couple of times throughout this CWPK series, how ‘namespace’ is implemented in Python is not intuitive to me, and has likely not yet been explained to me properly.
The arguments for the function appear within the parentheses. Many functions are defined with ‘default’ arguments, which are assigned if not specifically stated otherwise. There are some set formats for referring to these parameters; one Web resource is particularly helpful in deciphering them. You may also want to learn about the drawbacks of defaults. Generally, as a first choice, you can test a function with empty parentheses and then decompose from there when it does not work, or does not work as you think it should.
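Default arguments and the keyword style used in world.search can be tried out on a toy function. The sketch below uses hypothetical names (find_labels, collect_bad, collect_good) and plain lists rather than owlready2. Its second half shows one commonly cited drawback of defaults: a mutable default value is created once, when the function is defined, and is then shared across calls.

```python
def find_labels(labels, pattern, case_sensitive=False):
    """Return the labels containing pattern; matching ignores case by default."""
    if not case_sensitive:
        pattern = pattern.lower()
        return [label for label in labels if pattern in label.lower()]
    return [label for label in labels if pattern in label]


# Drawback of defaults: this list is built once and reused on every call
def collect_bad(item, bucket=[]):
    bucket.append(item)
    return bucket


# Usual remedy: default to None and build a fresh list inside the function
def collect_good(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


labels = ["Luggage", "LuggageCart", "Suitcase"]
print(find_labels(labels, "luggage"))                       # ['Luggage', 'LuggageCart']
print(find_labels(labels, "luggage", case_sensitive=True))  # []
```

Calling collect_bad(1) and then collect_bad(2) returns [1, 2] on the second call, whereas collect_good returns [2].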
The built-in dir and type functions can help elucidate what a function’s internal parameters are:
dir(world.search)
['__call__',
'__class__',
'__delattr__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__func__',
'__ge__',
'__get__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__self__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__']
Or type:
type(world.search)
method
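Beyond dir and type, the standard library’s inspect.signature reports a callable’s parameters and their defaults directly. In the sketch below, search is a hypothetical stand-in, not owlready2’s method; whether inspect.signature(world.search) is informative for a particular owlready2 method is something to simply try in your own session. A plain function like this one reports its type as function, while a method bound to an object, like world.search above, reports method.

```python
import inspect


def search(iri="*", _case_sensitive=True, **kwargs):
    """Hypothetical stand-in with a search-like signature, for illustration only."""
    return iri, _case_sensitive, kwargs


sig = inspect.signature(search)
for name, param in sig.parameters.items():
    # param.default is inspect.Parameter.empty when no default was declared
    print(name, param.kind.name, param.default)

print(type(search))   # <class 'function'>
```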
Directories and files
Python accepts any legal directory or file name. On Windows, forward slashes in file paths are generally accepted in place of backslashes, but users of non-Linux systems should investigate the specific compatibilities their operating systems have with Python. The differences and nuances are small, but can be a source of frustration if overlooked.
Here are some tips about directories and files:
- Don’t put a """comment""" in the middle of a dictionary listing
- A """comment""" (docstring) header is best practice for a function likely to be used multiple times
- When reading code, the real action tends to occur at the end of a listing, meaning that understanding the code is often easier working bottom up, as references higher up in the code are more often preliminary or condition setting
- Similarly, many routines tend to build from the inside out. At the core are the key processing and conversion steps; before those comes set-up, and after them comes staging the output
- Follow best practices for directory set ups in Python packages (see CWPK #37).
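The slash handling mentioned above can be checked with the standard pathlib module, and the function below also follows the docstring-header tip. This is a minimal sketch; describe_path is a hypothetical helper, and PureWindowsPath only manipulates path text, so it behaves the same on any operating system and never touches the disk.

```python
from pathlib import PureWindowsPath


def describe_path(raw_path):
    """Return (name, suffix, Windows form, POSIX form) for a Windows-style path.

    Forward slashes are accepted in the input; PureWindowsPath normalizes them.
    """
    win = PureWindowsPath(raw_path)
    return win.name, win.suffix, str(win), win.as_posix()


print(describe_path('C:/1-PythonProjects/owlready2/kg/kko.owl'))
```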
Modules and libraries
A module name must be a valid Python name, limited to letters, digits and ‘_’s.
Modules are the highest-level organizing unit of Python programs. A Python program consists of multiple module files, either included in the base package or imported into the current program. Each module has its own container of variables. Variable names may be duplicated across modules, but they are distinguished, and name clashes prevented, by qualifying them with the module name (see the earlier discussion about ‘dot notation’). You can also assign imported variables to local ones to keep the dot notation to a minimum and to promote easier-to-read code.
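Here is a small illustration of duplicated names kept apart by the module prefix, using two standard library modules that both define sqrt. It is only meant to show the dot-notation mechanics; nothing here is specific to KBpedia.

```python
import cmath
import math

# Both modules define a function named sqrt; the module prefix tells them apart
print(math.sqrt(4))     # 2.0 (real square root)
print(cmath.sqrt(-4))   # 2j  (complex square root)

# Assigning an imported name to a local one shortens later references
real_sqrt = math.sqrt
print(real_sqrt(9))     # 3.0
```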
Namespaces
Python is an object-oriented language, wherein each object in the system has a name identifier. To prevent name conflicts, Python has a namespace construct wherein any grouping of existing object names may be linked to a namespace. The only constraint, within Python’s naming conventions, is that two objects may not share the same name within a given namespace. They may share names between namespaces, but not within.
The namespace construct is both assigned automatically based on certain Python activities and may also be set directly by assignment. Import events or import artifacts, like knowledge graphs or portions thereof, are natural targets for this convenient namespace convention.
When items are declared and how they are declared inform the basis of a namespace for a given item. If an item is only a variable declared in a local context, it has that limited scope. But a variable may be declared in different ways or with different specified scope, in a progression that goes from local to enclosed to global and then built-in. This progression is known as the LEGB scope (see next).
All of this is logical and seemingly straightforward, but what Python requires in a given context depends on just that: context. Sometimes it is difficult to know within a Python program or routine exactly where one is with regard to the LEGB scope. In some cases, prefixes are necessary to cross scope boundaries; in other cases, they are not. About the best rule of thumb I have been able to derive from my own experience is to be aware of the ‘dot notation’ hierarchies in my program objects, and, if I have difficulty getting a specific value, to add or reduce scope definitions in the ‘dot notation’.
To gain a feel for namespace scope, I encourage you to test and run these examples: https://www.programiz.com/python-programming/namespace.
Namespaces may relate to Python concepts like classes, functions, inner functions, variables, exceptions, comprehensions, built-in functions, standard data structures, knowledge bases, and knowledge graphs.
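A namespace is, in effect, a mapping from names to objects. The sketch below uses types.SimpleNamespace as a hypothetical stand-in for two imported namespaces; the names rc and kko echo the KBpedia examples, but these are plain Python objects, not owlready2 ones. It shows the same name living in two namespaces without a clash.

```python
from types import SimpleNamespace

# Two stand-in namespaces, each holding an object named 'Thing' (hypothetical)
rc = SimpleNamespace(Thing='rc version of Thing')
kko = SimpleNamespace(Thing='kko version of Thing')

print(rc.Thing)            # dot notation picks the entry from rc
print(kko.Thing)           # same name, different namespace, no clash
print(sorted(vars(kko)))   # the names held in a namespace: ['Thing']
```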
To understand the Python objects within your knowledge graph namespace, you can Run the following cell, which will bring up a listing of objects in each of the imported files via its associated namespace:
main = 'C:/1-PythonProjects/owlready2/kg/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core'
kko_file = 'C:/1-PythonProjects/owlready2/kg/kko.owl'
print('startup()-> ', dir())

from owlready2 import *
print('owlready()-> ', dir(), '\n')

world = World()
kb = world.get_ontology(main).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')
print('main_import()-> ', dir(), '\n')

skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)
print('skos_import()-> ', dir(), '\n')

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)
print('kko_import()-> ', dir(), '\n')
You’ll see that each of the major namespaces (sometimes ontologies) lists out its internal objects as imported. You may pick any of these objects, and then inspect its attributes:
dir(DataProperty)
You can always return to this page to get a global sense of what is in the knowledge graph. Similarly, you may import the cowpoke package (to be defined soon!) or any other Python package and inspect its code contents in the same manner. So, depending on how you lay out your namespaces, you may readily segregate code from knowledge graph from knowledge base, or whatever distinctions make sense for your circumstance.
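Because dir() output mixes in many underscore-prefixed internals, a filter can make such a listing easier to scan. This minimal sketch inspects the standard json module; public_names is a hypothetical helper, and the same call should work on any object, such as DataProperty or rc once the knowledge graph is loaded.

```python
import json


def public_names(obj):
    """List the attribute names of obj, hiding _private and __dunder__ ones."""
    return [name for name in dir(obj) if not name.startswith('_')]


print(public_names(json))
```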
LEGB Rule
The scope of a name or variable depends on the place in your code where you create that variable. The Python scope concept is generally presented using a rule known as the LEGB rule. The letters in the acronym LEGB stand for Local, Enclosing, Global, and Built-in scopes. When a name is used, Python searches these scopes in that order (local first, built-in last) and uses the first match it finds; a variable’s own scope depends on the context in which it was initially declared. A variable does not apply beyond the scope in which it was defined. One typical mistake, for example, is to declare a local variable and then assume it applies outside of its local scope. Another typical mistake is to declare a local variable that has the same name as one in a broader context, and then to wonder why it does not operate as declared when in a broader scope.
Probably the safest approach to the LEGB scope is to be aware of the names used by the core Python functions (‘built-ins’) or those in imported modules (the ‘global’ ones) and to avoid them in new declarations. Then, be cognizant that what you declare in local routines applies only there, unless you take explicit steps (through declaration mechanisms such as global or nonlocal) or use namespaces and dot notation to make your intentions clear.
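The LEGB lookup order can be seen in a few lines. This sketch assigns the same name, label, at the global, enclosing, and local levels and shows which one each function sees. It also shows a built-in being found last, and the global declaration used as one of the explicit mechanisms mentioned above (nonlocal plays the same role for an enclosing scope).

```python
import builtins

label = 'global label'          # G: module-level (global) scope


def outer():
    label = 'enclosing label'   # E: scope of the enclosing function

    def inner():
        label = 'local label'   # L: local to inner(); shadows the two above
        return label

    def reader():
        return label            # no local 'label', so the enclosing one is found

    return inner(), reader()


def uses_builtin():
    return len('abc')           # B: len is found last, among the built-ins


def rebind_global():
    global label                # explicit declaration: assign to the module-level name
    label = 'changed global label'


print(outer())                            # ('local label', 'enclosing label')
print(uses_builtin(), len is builtins.len)  # 3 True
rebind_global()
print(label)                              # changed global label
```

Without the global statement, the assignment inside rebind_global would simply create a new local variable and leave the module-level label unchanged.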
Setting Proper Data Type
Within owlready2, classes and properties are defined and treated as Python classes. Thus, when you retrieve or manipulate an item, the item needs to be specified to the system as a proper Python class. However, in moving from format to format or doing various conformance checks, the representation may come into the system as a string or list object. Knowing the representation of the inputs compared with that of the desired outputs is critical for certain activities in cowpoke. So, let’s look at the canonical means of shifting data types when dealing with listings of KBpedia classes.
From Python Class to String
Much of the staging of extractions is manipulating labels as strings after retrieving the objects as classes from the system. There is a simple iterator that allows us to obtain sets of classes, loop over them, and convert each item to a string in the process:
new_str_items = []
for item in loop:                    # loop: the set of classes being processed
    a_item = item.curResource        # Retrieves curResource property for item
    a_item = str(a_item)             # Converts item to string
    new_str_items.append(a_item)     # Adds to new string list
If you have nested items within loops, you can keep track of them by adding enumerate to the loop specification, which supplies a running index alongside each item.
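Because owlready2 classes require a loaded knowledge graph, the sketch below substitutes a hypothetical StandIn class that simply carries a curResource attribute, so the conversion can be run on its own. It shows the same conversion written as a list comprehension, and how enumerate supplies positions when one loop is nested inside another. The labels are invented for illustration.

```python
class StandIn:
    """Hypothetical stand-in for an owlready2 class carrying a curResource value."""

    def __init__(self, cur_resource, parents=()):
        self.curResource = cur_resource
        self.parents = list(parents)


loop = [StandIn('rc:Luggage', ['rc:Container']),
        StandIn('rc:Suitcase', ['rc:Luggage', 'rc:Product'])]

# Same class-to-string conversion as the loop above, as a list comprehension
new_str_items = [str(item.curResource) for item in loop]

# enumerate adds a position counter, handy for tracking progress in nested loops
for i, item in enumerate(loop):
    for j, parent in enumerate(item.parents):
        print(f'item {i}, parent {j}: {item.curResource} -> {parent}')
```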
From String to Python Class
The reverse form has us specifying a string and a namespace, from which we obtain the class data type:
for item in loop:
    # str_item and str_parent: bare name strings (no prefix) derived from item
    var1 = getattr(rc, str_item)      # Need to remove prefix and get from proper namespace (RC)
    var2 = getattr(rc, str_parent)    # May need to do so for parent or item for property
    var1.is_a.append(var2)
The general challenge in this form is to make sure that items and parents are in the form of strings without namespace prefixes, and that the proper namespace is referenced when retrieving the actual attribute value. Many code examples throughout this series show how to test for and trap multiple namespaces.
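The string-to-class direction can likewise be sketched without loading KBpedia. Here rc is a hypothetical types.SimpleNamespace holding two plain Python classes, each given its own is_a list to mimic owlready2’s; strip_prefix is a hypothetical helper that reduces either a prefixed name (rc:Luggage) or a full IRI to the bare name that getattr needs. In a real session rc would come from kb.get_namespace('http://kbpedia.org/kko/rc/'), and owlready2’s behavior may differ in details.

```python
from types import SimpleNamespace


class Container:
    is_a = []   # stand-in for owlready2's list of parent classes


class Luggage:
    is_a = []


# Hypothetical stand-in for the rc namespace
rc = SimpleNamespace(Container=Container, Luggage=Luggage)


def strip_prefix(name):
    """Reduce 'rc:Luggage' or 'http://.../rc/Luggage' (or '#Luggage') to 'Luggage'."""
    for sep in ('#', '/', ':'):
        name = name.rsplit(sep, 1)[-1]
    return name


str_item = strip_prefix('rc:Luggage')
str_parent = strip_prefix('http://kbpedia.org/kko/rc/Container')

var1 = getattr(rc, str_item)      # string -> class, looked up in the proper namespace
var2 = getattr(rc, str_parent)
var1.is_a.append(var2)

# A default value traps names that are missing from this namespace
missing = getattr(rc, 'NotAClass', None)
```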
Additional Documentation
I have not been comprehensive in my search, nor have I found a ‘great’ Python book in relation to my needs and skills. I will likely acquire more, but here are three more-or-less general purpose Python introductions that have not met my needs:
- Python Crash Course – the index is lightweight and not generally useful; too much space devoted to their games examples; seems to lack basic program techniques
- Python Cookbook – wow, I hardly found anything of direct relevance or assistance
- Learning Python – perhaps the best of the three, but my version is for Python 2 (2.6) and also lacks the intermediate, practical hands-on material I want (I did not upgrade to a later version because of the scope issues).
It actually seems like online resources, plus directed Web searches when key questions arise, can overcome this lack of a general intro resource per the above.
Another useful source is the RealPython video tutorials (generally the first introductory one in each area is free), with notable ones on:
- classes
- variables
- lists and tuples
- dictionaries
- functions
- inner functions (some nice code examples)
- built-in functions
- exceptions
- comprehensions
- modules and packages
- LEGB scope (see this for namespace exploration examples)
- data types
- reading and writing files, including CSV.