DIG 2.0 proposal for the standard extension: Non-standard Inferences

This version: http://lat.inf.tu-dresden.de/~turhan/NSI.html
Authors: Anni-Yasmin Turhan, Dresden University of Technology; Yusri Bong, Dresden University of Technology

Abstract

This document is a proposal for a standard extension and an interface for non-standard inferences for DIG 2.0. Currently, this proposal is under construction.

Introduction
Requirements
The supported NSIs
The NSI XML schema extension
- A. Ask Statements
- B. Response Statements
NSI Examples
Open Issues

Introduction

Non-standard inferences (NSIs) are increasingly recognised as a useful means to realise new kinds of applications, and the number of applications that use them is growing. To implement such applications it is desirable to reuse existing implementations of NSIs, and an extension of the DIG interface to NSIs would give easy access to these implementations. Different research groups investigate and develop different NSIs. This extension proposal focuses on the NSIs studied at TU Dresden; however, the extension is not restricted to this set of inferences. For an overview of the non-standard inferences covered here please see [NSIs]. So far, all NSIs proposed here are the ones implemented in the system SONIC. We try to propose a general scheme for NSIs, but at this stage the proposal is still quite tailored to this system.

The terminologies acceptable as input for most of the NSIs covered here are currently restricted to unfoldable (i.e. acyclic and with unique definitions) terminologies and to a set of concept constructors that is in most cases restricted to the set of sub-boolean operators plus (unqualified) number restrictions.
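The unfoldability restriction can be checked mechanically. The following is a minimal sketch, assuming the TBox is available as a list of (defined name, referenced names) pairs. This representation and the function name `is_unfoldable` are illustrative assumptions and are not part of the proposal or of SONIC.

```python
from collections import Counter


def is_unfoldable(definitions):
    """Check that a TBox is unfoldable: unique definitions and no cycles.

    `definitions` is a list of (defined_name, referenced_names) pairs; this
    flat representation is an assumption made for the sketch.
    """
    # Unique definitions: each concept name is defined at most once.
    counts = Counter(name for name, _ in definitions)
    if any(n > 1 for n in counts.values()):
        return False

    graph = {name: set(refs) for name, refs in definitions}
    state = {}  # name -> "visiting" | "done"

    def has_cycle(name):
        if state.get(name) == "done" or name not in graph:
            return False  # finished, or a primitive name without definition
        if state.get(name) == "visiting":
            return True  # reached a name that is still being expanded
        state[name] = "visiting"
        if any(has_cycle(ref) for ref in graph[name]):
            return True
        state[name] = "done"
        return False

    return not any(has_cycle(name) for name in graph)
```

A component could run such a check before answering a query and report an error when the TBox is not unfoldable.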

Requirements

We have a couple of technical requirements for the environment in which our NSI extension operates. The NSI component works as a DIG server for the application and as a DIG client, since tests for satisfiability and subsumption are needed during the computation of the NSIs as well as access to told information to retrieve the concept definitions of concept names. To avoid duplicate storing and classifying of the KB and to ensure that the NSI component works on the identical KB as the application, the NSI component must use the same instance of the DL reasoner as the application. This way it is guaranteed that the NSI component always works on the current version of the KB even in case the application has re-submitted an edited version of the KB.

In its current implementation the SONIC NSI component is a stand-alone application that is invoked by the application and connects to the same instance of the DIG DL reasoner as the application. It gets the port number etc. from the application. For the forthcoming version of DIG our NSI component also needs a connection to a component that implements the told information access interface as specified by the proposal for accessing told data. We assume here that one connection suffices to access both the told information component and the core DIG reasoner, which can be realised either by a core DIG reasoner supporting the told information access interface or by the middleware reference implementation of the DIG group.

Besides the information for the connection, the NSI component needs the URI of the knowledge base that is loaded into the reasoner. Thus the URI of the KB must be transmitted as an identifier.

In contrast to standard reasoning tasks, most NSIs return complex concept descriptions and not only Boolean values. Furthermore, the NSIs take into account the "target DL" to which they are applied. For example, concept approximation "translates" concept descriptions from one DL L₁ to another DL L₂. Thus the NSI extension schema must provide means to communicate the respective DL.

To sum up, our NSI component needs the following information:

in each ask:
- Port number under which the told information access component and the standard DL reasoner run.
- Machine name where the told information access component and the standard DL reasoner run; the default is localhost.
- URI of the knowledge base as ID of the KB in use.
in some queries:
- the target DL for the NSI.

The supported NSIs

The operators offered by our NSI extension are listed and explained below. For each operator we first give an abstract pseudo-LISP style syntax to illustrate its use. These inference services are offered as ask statements in the XML schema. The whole NSI extension schema can be found here.

Approximation
Intuitively, the approximation of a concept description written in a DL L₁ is a translation into a typically less expressive DL L₂ with a minimal loss of information, for example computing ALEN-approximations of ALCN-concept descriptions. The following ask statement gets the approximation in the target DL w.r.t. the current TBox:
```
(approximation <concept description>, <target DL>)
```
It returns a concept description in the target DL.
Least Common Subsumer (LCS)
The LCS of a set of concept descriptions is a concept description that subsumes all input concepts and is the least one w.r.t. subsumption to do so. The most expressive DL for which the LCS inference is currently available is ALEN. The following ask statement queries for the LCS w.r.t. the current TBox:
```
(lcs <concept description>, <concept description>⁺, <target DL>)
```
It returns a concept description.
Good Common Subsumer (GCS)
The GCS of a set of concept descriptions is a concept description that subsumes all input concepts, but does not need to be the least one w.r.t. subsumption. Furthermore, the implementation performs the computation w.r.t. a background terminology. The approach supports a (usually expressive) background terminology that is extended by the (usually not so expressive) user terminology. The most expressive DL for which the GCS is available is ALE user terminology w.r.t. an ALC background terminology.
The following ask statement queries for the GCS w.r.t. the current TBox:
```
(gcs <concept description>, <concept description>⁺, <target DL>)
```
It returns a concept description.
Find Matching Concepts
This operation has one parameter: a concept pattern. Roughly speaking, a concept pattern is a concept description where variables can appear at the places where concept names can appear in usual concept descriptions. The variables that appear in the concept patterns must be distinguished from concept names in the DIG standard by an extra tag.
```
(Find_Matching_Concepts <concept pattern>)
```
Returns a set of concept names of concepts defined in the knowledge base which can be matched with the concept pattern, i.e., the variables in the concept pattern can be substituted by concept descriptions such that the resulting concept description is equivalent to the concepts in the returned set.

Find Matcher
This operation has two parameters: a concept description and a concept pattern. Matching a concept description C against a concept pattern D answers the question whether and how the variables in D can be assigned concept descriptions such that the resulting concept description is equivalent to C w.r.t. the current TBox.
```
(Find_Matcher <concept description>, <concept pattern>)
```
It returns a (possibly empty) set of assignments, i.e., pairs of the variables and concept descriptions. Each assignment yields a concept description that is equivalent w.r.t. the TBox to the input concept description, if substituted in the concept pattern.
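To illustrate what substituting an assignment into a concept pattern means, here is a minimal sketch. It assumes concept descriptions are encoded as nested tuples with variables marked as ("var", name); this encoding, the function name `substitute` and the sample names are hypothetical and not part of the proposal. Testing that the instantiated pattern is equivalent to the input concept description still requires a DL reasoner and is not shown.

```python
def substitute(pattern, assignment):
    """Replace each variable in `pattern` by its assigned concept description.

    Concept descriptions are nested tuples such as ("and", c1, c2) or
    ("some", role, c); concept and role names are plain strings and
    variables are ("var", name). This encoding is an assumption of the sketch.
    """
    if isinstance(pattern, str):
        return pattern  # concept or role name: nothing to substitute
    if pattern[0] == "var":
        return assignment[pattern[1]]
    # Keep the constructor and substitute recursively in its arguments.
    return (pattern[0],) + tuple(substitute(arg, assignment) for arg in pattern[1:])


# Hypothetical pattern with two variables X and Y, and one assignment for them.
pattern = ("and", ("var", "X"), ("some", "hasChild", ("var", "Y")))
matcher = {"X": "Human", "Y": ("and", "Human", "Female")}
instantiated = substitute(pattern, matcher)
```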
Minimal Rewriting
The minimal rewriting of a concept description C₁ w.r.t. a TBox is a concept description C₂ that is equivalent to C₁ and of minimal size. So, intuitively, a minimal rewriting returns a more compact form of a concept description. Our implementation realises a heuristic for minimal rewriting that returns a small, but not necessarily the smallest, rewriting for ALE-concept descriptions.
```
(minimal_Rewriting <concept description>)
```
It returns a smaller concept description equivalent to the input concept description.

This list is certainly not complete, and the proposal is open to extension by other NSIs such as, for example, explanation.

The XML schema for the NSI extension

Our NSI extension schema mainly focuses on the two following aspects:

Ask statements for the above mentioned inferences. Here the use of variables in the concept patterns and the use of the target DL are the most notable extensions.
Response statements for the queries. Since the core DIG schema does not yet supply a response schema, our proposal is the first in that direction and, of course, under construction.

Most queries require concept descriptions as input or produce concept descriptions as output. These are to be specified according to the DIG core schema. The whole XML schema for the NSI extension can be found here.

A. Ask Statements

The ask statements use parts of the DIG core schema and provide the group nsiQueries with elements that are likely to be useful for other NSIs or inference services that are not yet incorporated in DIG or one of its extensions. The root of an NSI request is the tag <asks>; all NSI queries are children of this element. A request may contain several queries. As stated in the Requirements, the NSI reasoner needs some information about other components. This information is provided by the following attributes of the asks element:

port - The port number on which the told information access component (and the DIG 2.0 standard reasoner) is listening.
machineName - The host name or IP address of the machine that is running the told information access component (and the DIG 2.0 standard reasoner).
kb - The URI that identifies the knowledge base (TBox).

It is necessary to pass this information in each NSI request, since the DIG protocol is stateless. Each request contains at least one query. The schema for an NSI request is the following:

```
<xsd:element name="asks">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:group ref="nsiQueries" minOccurs="1" maxOccurs="unbounded"/>
        </xsd:sequence>
        <xsd:attribute name="machineName" type="xsd:string" use="optional" default="localhost"/>
        <xsd:attribute name="port" type="xsd:int" use="required"/>
        <xsd:attribute name="kb" type="xsd:string" use="required"/>
    </xsd:complexType>
</xsd:element>
```
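As an illustration of how a client might assemble such a request, the following sketch builds an <asks> element with the three attributes of the schema. It is only a sketch under assumptions: namespaces are omitted, the function name `build_asks`, the port and the knowledge base URI are made up, and the element `conceptPlaceholder` stands in for a concept description of the core DIG language, whose element names are not given in this document.

```python
import xml.etree.ElementTree as ET


def build_asks(port, kb, queries, machine_name="localhost"):
    """Build an <asks> request element carrying the connection attributes.

    `queries` is a non-empty list of already built query elements.
    Namespaces are omitted to keep the sketch short.
    """
    if not queries:
        raise ValueError("an <asks> request needs at least one query")
    asks = ET.Element("asks", {"machineName": machine_name,
                               "port": str(port),
                               "kb": kb})
    asks.extend(queries)
    return asks


# Hypothetical query: the concept element inside it is only a placeholder.
query = ET.Element("minimalRewriting", {"id": "q1"})
ET.SubElement(query, "conceptPlaceholder", {"name": "Parent"})

request = build_asks(8080, "http://example.org/family.owl", [query])
xml_text = ET.tostring(request, encoding="unicode")
```

Because the protocol is stateless, a client would attach the same connection attributes to every request it sends.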

Each subelement of <asks> is an element of the group "nsiQueries". The elements of nsiQueries represent "ask" query types for the NSI services. Although currently limited to the ones available in SONIC, the idea is to specify query types that might be useful for other NSIs or even other services. Before turning to the schema, we would like to explain the idea behind the new types and their naming:

basicTargetDLClassAsk: Covers kinds of queries that have a concept description (class) and a target DL as input.
findAssignmentsAsk: Covers kinds of queries that return an assignment. In our case an assignment is a set of pairs, each consisting of a variable and a concept description. This type might also be of interest for the forthcoming conjunctive query proposal. In contrast to the DIG naming scheme, this name points to the return value instead of the input value.
findAssignableConceptsAsk: Covers all kinds of queries that return a variable assignment to concepts. In contrast to the DIG naming scheme, this name points to the return value instead of the input value.
basicTargetDLClassSequenceAsk: Covers kinds of queries that have a sequence (or rather a set) of concept descriptions and a target DL as input.

These types are used in the following way in the schema:

```
<xsd:group name="nsiQueries">
    <xsd:choice>
        <xsd:element name="approximation" type="basicTargetDLClassAsk"/>
        <xsd:element name="findMatcher" type="findAssignmentsAsk"/>
        <xsd:element name="findMatchingConcepts" type="findAssignableConceptsAsk"/>
        <xsd:element name="lcs" type="basicTargetDLClassSequenceAsk"/>
        <xsd:element name="gcs" type="basicTargetDLClassSequenceAsk"/>
        <xsd:element name="minimalRewriting" type="ask:basicClassAsk"/>
    </xsd:choice>
</xsd:group>
```

Next, we present the type for each NSI service that is provided by our extension. Note that in the following we use notions from these "base schemas":

lang: is the namespace for http://dl.kr.org/dig/lang/schema
ask: is the namespace for http://dl.kr.org/dig/lang/asks-schema

In fact all tags extend the basicClassAsk from the core DIG 2.0 ask schema.

Since most NSIs need a target DL as input, the queries must specify it. The target DL should be specified in string format in the attribute targetDL. We propose to choose this string according to the DL community's naming convention for DLs, such as "EL" or "ALEN". An alternative would be to specify the DL in a separate schema that is then used as a URI in the NSI extension. However, if the core DIG 2.0 schema provides a standard for naming DLs, we will use that standard in our extension. In addition, as in the "standard" ask in the core DIG, each query has an attribute id for identification in the response.

A.1 Ask statement for Approximation

The subelement of this tag is the concept description for which the approximation has to be computed; the target DL is given in the attribute targetDL.

```
<xsd:complexType name="basicTargetDLClassAsk">
    <xsd:complexContent>
        <xsd:extension base="ask:basicClassAsk">
            <xsd:attribute name="targetDL" type="xsd:string" use="required"/>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>
```

A.2 Ask statements for LCS and GCS

These two queries have the same type: basicTargetDLClassSequenceAsk. The subelements of this tag are (at least two) concept descriptions. The user must also state the target DL for the LCS or GCS in the attribute targetDL.

```
<xsd:complexType name="basicTargetDLClassSequenceAsk">
    <xsd:complexContent>
        <xsd:extension base="ask:basicAsk">
            <xsd:sequence>
                <xsd:group ref="lang:description" minOccurs="2" maxOccurs="unbounded"/>
            </xsd:sequence>
            <xsd:attribute name="targetDL" type="xsd:string" use="required"/>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>
```
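The constraints of this type (at least two concept descriptions and a required targetDL attribute) can be mirrored on the client side. The sketch below is a hedged illustration only: namespaces are omitted, the helper names `build_sequence_query` and `concept_name` are made up, and `conceptPlaceholder` is a hypothetical stand-in for a concept description of the core DIG language.

```python
import xml.etree.ElementTree as ET


def concept_name(name):
    """Hypothetical stand-in for a named concept of the core DIG language."""
    return ET.Element("conceptPlaceholder", {"name": name})


def build_sequence_query(tag, query_id, target_dl, descriptions):
    """Build an <lcs> or <gcs> query with a target DL.

    Mirrors the schema: at least two concept descriptions are required and
    the targetDL attribute must be present.
    """
    if tag not in ("lcs", "gcs"):
        raise ValueError("tag must be 'lcs' or 'gcs'")
    if len(descriptions) < 2:
        raise ValueError("at least two concept descriptions are required")
    query = ET.Element(tag, {"id": query_id, "targetDL": target_dl})
    query.extend(descriptions)
    return query


lcs_query = build_sequence_query(
    "lcs", "q2", "ALEN", [concept_name("Mother"), concept_name("Father")]
)
```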

A.3 Ask statement for Find Matcher

The subelements of this tag are a concept description followed by a concept pattern.

```
<xsd:complexType name="findAssignmentsAsk">
    <xsd:complexContent>
        <xsd:extension base="ask:basicAsk">
            <xsd:sequence>
                <xsd:group ref="lang:description"/>
                <xsd:group ref="conceptPattern"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>
```

A.4 Ask statement for Find Matching Concepts

The subelement of this tag is a concept pattern.

```
<xsd:complexType name="findAssignableConceptsAsk">
    <xsd:complexContent>
        <xsd:extension base="ask:basicAsk">
            <xsd:group ref="conceptPattern"/>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>
```

A.5 Ask statement for Minimal Rewriting

The only subelement of this tag is the concept description that is going to be rewritten. Thus we can use "basicClassAsk" from the ask schema and do not need an extension for this service.

B. Response Statements

Open question: should a tag nsiResponse be introduced for grouping responses?
Since the core DIG does not yet have a response schema, this proposal is quite preliminary and will evolve as the DIG response schema evolves.
Every individual response has to include the id of the query it refers to. Most of the services that the NSI reasoner provides return (a set of) concept descriptions. These (sets of) concept descriptions should be handled according to the core DIG 2.0 schema. Thus we do not propose a formal and complete schema for this kind of response. Instead we just give an example of a valid XML document for this kind of response in the example section.
The only reasoning service that does not return a set of concept descriptions is "Find Matcher". This service returns a (possibly empty) set of matchers, i.e., assignments, which are pairs of variables and concept descriptions. So, we need a schema definition for a set of assignments, which is then refined to a set of matchers.

```
<xsd:complexType name="basicAssignment">
    <xsd:sequence>
        <xsd:element name="variable" type="lang:named"/>
        <xsd:group ref="lang:description"/>
    </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="basicAssignmentsResponse">
    <xsd:sequence>
        <xsd:element name="assignment" type="basicAssignment" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
    <xsd:attributeGroup ref="ask:basicAskAttributes"/>
</xsd:complexType>

<xsd:element name="matchers" type="basicAssignmentsResponse"/>
```
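A client could read such a response roughly as sketched below. Several details are assumptions rather than part of the proposal: namespaces are omitted, the `id` attribute is assumed to come from ask:basicAskAttributes, the `name` attribute on <variable> is assumed to come from lang:named, each description is assumed to be a single element, and `conceptPlaceholder` is a hypothetical stand-in for a concept description of the core DIG language.

```python
import xml.etree.ElementTree as ET


def parse_matchers(xml_text):
    """Return (query id, list of assignments) from a <matchers> response.

    Each assignment is a (variable name, description element) pair. The
    `name` attribute on <variable> and the `id` attribute on <matchers>
    are assumptions; namespaces are omitted.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "matchers":
        raise ValueError("expected a <matchers> element")
    assignments = []
    for assignment in root.findall("assignment"):
        # Assumed layout: a variable followed by exactly one description element.
        variable, description = list(assignment)
        assignments.append((variable.get("name"), description))
    return root.get("id"), assignments


# Hypothetical response with a single assignment of the variable X.
sample = """
<matchers id="q3">
  <assignment>
    <variable name="X"/>
    <conceptPlaceholder name="Human"/>
  </assignment>
</matchers>
"""
query_id, matchers = parse_matchers(sample)
```

An empty <matchers> element yields an empty list, which corresponds to the case where no assignment was found.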

The complete definition of our proposed NSI extension schema can be found here.

NSI Examples

This page illustrates the use of the inferences by providing sample KBs, queries and the responses.

Open Issues

Architecture of core DIG and stand-alone DIG extensions
Referencing of DLs
Should the targetDL be a string or a URI?
Clever naming scheme for response involving variables, so that it can be re-used by the forthcoming "conjunctive query" extension.
Error Messages
The NSI extension needs different error messages and codes than the core part of DIG. For example, error codes should be reserved for the following kinds of messages:
- Inference not supported for this concept language.
- TBox is not unfoldable.
- The concept pattern given has no variable.
Can some of them be shared with other extensions?

References

[NSIs]: Using Non-standard Inferences in Description Logics — what does it buy me?
[GCS]: Computing the GCS w.r.t. a Background terminology.

Anni-Yasmin Turhan

Last modified: Thu Sep 7 15:23:17 CEST 2006