Building a rule-based knowledge system with Protégé-OWL and JessTab.

Liesbeth Flobbe
2005

Introduction

The project

The goal is to create a small application that combines ontologies with rule-based reasoning, and to report on the possibilities and usefulness of such an integration.

I chose to create a system that checks if a configuration of computer parts meets certain constraints. The domain knowledge of such a constraint checking system can be reused for a configuration application (creating the configurations based on certain requirements) or for troubleshooting (determining how a bad configuration can be fixed). I chose this task because it is relatively simple. I did not want to do a classification task, because so much classification can be carried out by OWL reasoners that I wasn't sure I would need a rule-based reasoner for it.

The domain of computer configurations has been chosen because it is very compositional. A hierarchical ontology seems very appropriate for such a domain.

The building blocks

Protégé-OWL

Protégé is a tool to develop ontologies. The Protégé-OWL plugin supports the creation of ontologies in the OWL language, which is based on description logic and has more features than the standard frame-based format that Protégé uses. I used Protégé 3.1.1 for this project.

Jess

Jess is a rule engine written in Java. It carries out forward chaining, rule-based reasoning. I used Jess 7.03b for this project.

JessTab

JessTab is a Protégé-plugin that allows the integration of Jess and Protégé. It maps Protégé-objects to Jess facts and allows Jess to be run from within a Protégé tab. I used JessTab 1.4.

Not used: OWL reasoners

I should mention an important tool that was not used: OWL reasoners. OWL reasoners can perform classification tasks on OWL ontologies.

The domain

The original idea was to check configurations for complete computer systems. However, I had to restrict myself to a few core components: processors, motherboards, and memory modules. Information about specific components was gathered from the Norrod-site, and then cross-checked and complemented using the BalusC-server. I will give a short introduction to the important constraints in the domain.

The core component of a computer, to which all other components are connected, is the motherboard. The motherboard provides a collection of sockets and slots in which other components can be inserted. It also provides the wiring so that everything can communicate with each other. It contains a few chips (the chipset) that regulate this communication. It is ultimately the chipset that determines what components can be used and at what speed these components can communicate. By using certain sockets and slots on the motherboard, additional restrictions are created. For example: some of the chipsets I modelled supported both DDR and DDR2 memory, but none of the motherboards using these chipsets provided any slots in which memory modules of the second type would fit.

Current processors (CPUs) are manufactured by two manufacturers: Intel and AMD. Processors are connected to the motherboard through a socket. The socket used by the motherboard and processor must be identical, or they will not physically fit together. Even if the sockets match, new processors can have features that are not supported by the motherboard (or actually the chipset).

Traditionally, communication with the memory is handled by the chipset. However, for modern AMD processors, the memory controller has been integrated with the processor. Properties that used to be properties of the chipset are now properties of the processor. The socket 939 family of processors requires that memory modules be placed as matched pairs: you cannot use only one module, you must use a multiple of two. For Intel processors, the motherboard determines whether or not this dual-channel memory is required - it is independent of the processor used.

How to use the demo

I did not create a stand-alone application. I did not provide for any input or output mechanisms, except that which was standard in the tools I had.

The Protégé-database is stored in the computerparts.pprj and computerparts.owl files. The rules for JessTab are stored in a separate text file, jesstab-rules.txt. I was unable to store the rules in the knowledge base itself. According to the documentation, it should be possible, but I think it doesn't work because of the difference between the native Protégé format and the OWL format. Once Protégé has opened the knowledge base, the rules should be copy-pasted into the prompt at the bottom of the Jess window. To check if it worked, open the Facts- and Rules-tab and see if a list of entries has appeared there.

To perform constraint checking, type (run) at the Jess prompt. Constraint checking will be performed on all (direct) instances of the Configuration class. If no output appears, this means that the configuration is OK. If you type (run) a second time, nothing will happen unless rules or instances have been changed. Use (reset) to restore Jess to its original state first.

Data representation in Jess

Domain knowledge can be divided into rules and facts. The rules react to additions, deletions, and changes to facts. Jess has two kinds of facts: ordered and unordered.

Ordered facts

An ordered fact is simply a list. The first field acts as a sort of category for the fact. For example, the fact (cpumult 9.0) could be used to represent the fact that the CPU multiplier has a value of 9.0.

Rules will fire if the facts on their left hand side can be matched. This is the case if the facts are all true, or if all the variables in the rule can be assigned such that all the facts are true. An example of a rule would be:

(defrule cpufreq
	(cpumult ?x)			;; the CPU multiplier is x
	(cpufsb ?y)			;; and the front side bus is y
	=>
	(assert (cpufreq (* ?x ?y))))	;; assert the new fact that the cpu frequency is x * y

In this example, I use the first field to name a property, and the next field to give the value of this property. This resembles predicate logic. Another possibility would be to group various properties into one fact:

(cpu 939 200 11.0)

where the second field is the socket type, the third field the front side bus frequency, and the last field the multiplier. However, if you do this, you must be very careful to use the correct field for each property. This is impractical for a larger number of properties.
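
The risk can be made concrete with a minimal sketch (the rule name cpufreq-positional is made up for this illustration). The rule relies purely on field positions, so a fact in which two fields have been swapped still matches and silently yields a wrong frequency:

```jess
;; Sketch only: positional matching on the grouped ordered fact.
(defrule cpufreq-positional
	(cpu ?socket ?fsb ?mult)	;; fields are identified by position only
	=>
	(assert (cpufreq (* ?fsb ?mult))))

(assert (cpu 939 200 11.0))	;; intended order: socket, fsb, multiplier
(assert (cpu 200 939 11.0))	;; socket and fsb swapped by mistake: still matches
(run)
(facts)				;; the fact list now holds a cpufreq fact for each cpu fact
```

Nothing in Jess flags the second fact as malformed, which is why named fields become attractive as the number of properties grows.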

Unordered facts

Unordered facts resemble objects in object-oriented languages. Objects have named fields in which data appears. An example of an unordered fact is:

(Processor (socket 775) (fsb 200) (mult 14.0))

Unordered facts are defined in templates. A template defines all the fields and the kind of data they can contain. Templates can extend other templates. For example, I could create an IntelP4-template as an extension of the Processor-template, with an extra field supports_hyperthreading. Rules that use the Processor-template cannot use this extra field, but rules that use the IntelP4-template can. However, if an IntelP4 'fact' (aka object) is asserted, rules that use the Processor-template can still match this fact. This means that inheritance, a basic feature of hierarchical ontologies, is supported.

The rule for calculating CPU frequency with unordered facts would be:

(defrule cpufreq
	(Processor (fsb ?x) (mult ?y))
	=>
	(assert (cpufreq (* ?x ?y))))
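
The template inheritance described above can be tried out in plain Jess (without JessTab) with a sketch along the following lines. The template and field names follow the text; the rule name cpufreq-report and the choice to print the result rather than assert it are only for illustration.

```jess
;; Sketch only: a parent template and a template that extends it.
(deftemplate Processor
	(slot socket)
	(slot fsb)
	(slot mult))

(deftemplate IntelP4 extends Processor
	(slot supports_hyperthreading))	;; extra field, only visible to IntelP4 patterns

;; This rule is written against the parent template ...
(defrule cpufreq-report
	(Processor (fsb ?x) (mult ?y))
	=>
	(printout t "CPU frequency: " (* ?x ?y) crlf))

(reset)
;; ... but it is also activated by a fact of the child template.
(assert (IntelP4 (socket 775) (fsb 200) (mult 14.0) (supports_hyperthreading TRUE)))
(run)
```

The rule should fire once for the IntelP4 fact, even though its pattern only mentions Processor.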

Facts imported by JessTab

Facts that are imported from Protégé by JessTab have a slightly different structure than in the unordered example above. The easiest way to see what happens is to go to JessTab in Protégé, type (mapclass ComputerPart), and then look at the Facts tab. We see that the head of the facts is not the class name (Intel_P4, Motherboard etc.) but always object. The class name has been stored in a field called is-a. JessTab has also added a field OBJECT which will be explained below. The properties defined in Protégé follow as normal fields with the same name.

The rule for calculating the cpu frequency would be:

(defrule cpufreq
	(object (is-a Processor)
		(fsb ?x)
		(mult ?y))
	=>
	(assert (cpufreq (* ?x ?y))))

Unfortunately, there is a problem. This rule will only match instances that are defined directly in the class Processor, not instances that are defined in subclasses such as Intel_P4 and AMD_K8_K9. To match these instances, we must use an explicit check in the rule:

(defrule cpufreq
	(object (is-a ?class&:(classorsuperclassp Processor ?class))
		(fsb ?x)
		(mult ?y))
	=>
	(assert (cpufreq (* ?x ?y))))

This rule will bind the class of the object to the variable ?class and then immediately check if the Processor class is identical to it or a superclass of it.

The OBJECT field

You may have wondered which processor we were trying to match in the previous rules. All processors that we know of, or only the processor that is selected for use in the system we want to check? The rules shown so far are very clumsy: they fire for each and every processor, but the fact they assert is overwritten every time the rule fires.

The OBJECT field provides a unique reference to each object. This makes it possible to select specific objects. The OBJECT field corresponds to OWL's rdf:ID attribute. The value can be used for object properties (properties whose range is a class, as opposed to datatype properties whose range is a datatype).

In Protégé, I created a Configuration class, which contains computer configurations. To select the specific processor used in a Configuration, the following rule fragment can be used:

	(object (is-a Configuration)
		(hasProcessor ?proc))
	(object (OBJECT ?proc))
	... 
	=> 
	...

The field hasProcessor in the class Configuration is an object property. The value of the object property (the reference) is bound to the ?proc variable, and then the object with this ID is selected. In my actual rules, this mechanism is also used to distinguish different configurations from each other.
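
As an illustration, the fragment can be completed into a full constraint rule. This is a sketch under assumptions, not one of the rules from jesstab-rules.txt: it assumes that both processors and motherboards carry a property named socket with comparable values, and that the relevant classes have already been mapped with mapclass. The rule name socket-mismatch is made up.

```jess
;; Sketch only: follows two object properties of a Configuration
;; and compares a property of the referenced objects.
(defrule socket-mismatch
	(object (is-a Configuration)
		(hasProcessor ?proc)
		(hasMotherboard ?mb))
	(object (OBJECT ?proc)			;; the processor of this configuration
		(socket ?psocket))
	(object (OBJECT ?mb)			;; the motherboard of this configuration
		(socket ?msocket&~?psocket))	;; assumed property; must differ to match
	=>
	(printout t "Socket mismatch: processor uses " ?psocket
		", motherboard uses " ?msocket crlf))
```

Because the second and third patterns select objects through OBJECT rather than through is-a, no explicit subclass check is needed for them.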

Design decisions

I started my design using only primitive classes. This means that instances are explicitly placed in a class (and not implicitly, based on the values of their properties). I later found out that defined classes wouldn't have worked (see next section).

I created two 'top' classes (just below Thing): ComputerPart and Configuration. Processors, Motherboards, Chipsets, and Memory Modules are all subclasses of ComputerPart. Using a Chipset class is not strictly necessary: each motherboard has only one chipset, so all the properties of the chipset could be modelled as properties of the motherboard. However, several motherboards can have the same chipset, so when using chipsets, you only have to enter this data once. Therefore, I used a separate Chipset class, and gave Motherboards an object property hasChipset. I needed to make sure that the user could not directly choose chipsets as part of a Configuration, so I divided my computer parts into buyable parts (processors, motherboards, memory modules), and non-buyable parts.

Processors were further subdivided into AMD_K8_K9 processors and Intel_P4 processors. This is because the K8/K9 families of processors have their own memory controller, and no other processors do. This subclass therefore needs certain extra properties not present in the general Processor class. For Chipsets, the same subdivision was necessary, because again, chipsets for K8/K9 processors have different properties than other chipsets.

The Configuration class contains Configurations that are checked by Jess. Each configuration has three object properties: a processor, a motherboard, and a memory module. It also has one datatype property: the number of memory modules. This format is very restricted: if a user wants to use several different memory modules, this is not possible. However, it makes it easy to check the configuration.
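
To show how this restricted format keeps the checks simple, here is a stand-alone sketch of the matched-pairs constraint in plain Jess. Ordinary templates stand in for the JessTab-mapped classes, and all template, slot, and rule names (module-count, dual-channel-required, unpaired-memory) are hypothetical; they are not the names used in the ontology.

```jess
;; Sketch only: plain templates replace the mapped Protégé classes.
(deftemplate Motherboard
	(slot name)
	(slot dual-channel-required))

(deftemplate Configuration
	(slot name)
	(slot motherboard)	;; reference to a Motherboard by name
	(slot module-count))	;; number of (identical) memory modules

;; Report configurations with an odd number of modules on a board
;; that requires matched pairs.
(defrule unpaired-memory
	(Configuration (name ?c) (motherboard ?mb)
		(module-count ?n&:(oddp ?n)))
	(Motherboard (name ?mb) (dual-channel-required TRUE))
	=>
	(printout t ?c ": " ?n " memory module(s), but " ?mb
		" needs matched pairs" crlf))

(reset)
(assert (Motherboard (name board-a) (dual-channel-required TRUE)))
(assert (Configuration (name config-1) (motherboard board-a) (module-count 1)))
(assert (Configuration (name config-2) (motherboard board-a) (module-count 2)))
(run)
```

Only config-1 should be reported. With a single module type and a single count per configuration, the check needs no iteration over a list of modules.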

The ontology contains a lot of properties that aren't used. For example: no motherboard supports DDR2 memory, so I could have omitted all properties that deal with DDR2 memory. All motherboards for Intel processors I found support hyperthreading, so the properties and the rule dealing with hyperthreading are unnecessary. Certainly, the application would have been a lot more interesting if some cutting edge new components or some older components had been included. Anyway, the design should still be usable if such additions are made, so I didn't delete these superfluous properties.

Testing OWL features

Defined classes

I created a class Socket939Processor, with the necessary and sufficient conditions of being a processor and having the value "socket 939" for the socket property. Of course, nothing happened to my instances. It takes an OWL reasoner to put the appropriate instances in this class. I then manually placed an instance in this class (in addition to its primary, AMD_K8 class). I remapped the Processor class, but unfortunately, nothing changed in Jess. The instance still had the same (is-a AMD_K8) field, and no new fields to represent its membership of Socket939Processor. There wasn't a new fact with (is-a Socket939Processor) either (but two facts for the same instance wouldn't be good, anyway). I couldn't find any functions in the manual that return all the classes of an instance. So I conclude that:

Object hierarchy

For my next test, I made the properties hasMotherboard, hasMemoryModule, and hasProcessor all subproperties of a new property hasComputerPart. I remapped the Configuration class. In Jess, the new Configuration instances had both their old fields and a new field hasComputerPart, containing references to all three parts. I also verified that I could write rules that accessed those parts (but it wasn't easy - one value per field is a lot more convenient). So:

It should be noted that this feature is not unique to OWL, but already part of the standard Protégé environment.

Functional properties

Functional properties are not really relevant to Jess. I made most of my properties functional (a motherboard has only one chipset, a computer configuration has only one motherboard). In OWL, when a property is marked as functional and it relates object A to object B, you are allowed to assume that A is not related to any other object through this property. In a rule-based knowledge system, the assumption is simply that any other relation between A and something is not true because, if the OWL knowledge base was consistent, it cannot be found or proven. This is the closed world assumption.
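
The closed world assumption shows up in Jess most directly in the not conditional element, which succeeds when no matching fact is present. The following stand-alone sketch uses made-up template and rule names (HasChipset, no-chipset-known) and is not part of the project's rule file:

```jess
;; Sketch only: negation in Jess means "no such fact in working memory".
(deftemplate Motherboard (slot name))
(deftemplate HasChipset (slot board) (slot chipset))

(defrule no-chipset-known
	(Motherboard (name ?board))
	(not (HasChipset (board ?board)))	;; absent from the fact list = treated as false
	=>
	(printout t ?board " has no chipset, as far as Jess can tell" crlf))

(reset)
(assert (Motherboard (name board-a)))
(assert (Motherboard (name board-b)))
(assert (HasChipset (board board-a) (chipset chipset-x)))
(run)
```

The rule should fire for board-b only. An OWL reasoner working under the open world assumption would not draw this conclusion from the mere absence of a statement.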

Inverse, symmetric and transitive properties

I created a property partOf as the inverse of hasComputerPart. All ComputerPart instances now have this new property, but Protégé doesn't automatically fill in the values. Again, I suppose it is up to an OWL reasoner to do this. Jess can't do anything with property values it can't see. However, it is possible in Jess to iterate over all configurations and give this property its appropriate values. I conclude:

I think the same is true for Symmetric and Transitive properties: if Protégé doesn't fill in appropriate values for these properties, Jess doesn't see them. But Jess can be used to write functions which do precisely that: instantiate these properties. Of course, so can OWL reasoners.

In conclusion, it seems to me that Jess does not support any features that are specific to OWL, but that its general support for Protégé knowledge bases is still useful, even for OWL projects.

Conclusion

How does the current project compare to a project carried out in Jess only? The use of an ontology has split the domain knowledge into two parts: the facts and the rules. I noticed that as a result, I found it more important to make the ontology self-explanatory and usable by someone who didn't know the rules. The ontology could be reused for an application that uses different rules - for example rules for creating or optimising a configuration. Reusability is of course one of the goals of OWL. Although for a single small project, a pure Jess approach might be easier, once more shared ontologies are available, it can be very useful to build a knowledge application on top of them.

I was however a bit disappointed to see that inheritance is easier to use in a pure Jess application than it was when mapping an ontology through JessTab. Also, it is important to realise that many classification tasks, traditionally the domain of rule-based knowledge systems, are probably better carried out through OWL reasoners. It would be interesting to see what could be accomplished if OWL reasoning and rule-based reasoning were integrated into a single tool.

From the documentation of JessTab, I get the impression that JessTab was not really designed to build expert systems on top of ontologies, but more for managing and maintaining ontologies (for example, by initialising inverse and symmetric properties).

An interesting approach with the same goals as this project is the Semantic Web Rule Language development. The proposal describes a way to represent and store rules - similar to those found in rule-based systems - in an OWL knowledge base itself. Unfortunately, the proposal is only a representation language - not an implementation that actually reasons with it. However, once these implementations are developed, an OWL+SWRL approach would be very appropriate for a project like this.

It should be noted that rule-based knowledge systems operate under a closed world assumption, while the OWL standard assumes an open world. I'm curious how future SWRL reasoners will handle open world reasoning.