1
$\begingroup$

I have a custom query language which expresses containment relations between variables. Containment in this context is simply an object (A) in programming language X (java/C#/python etc: a language with support for objects) having a child object (B)

The statements of interest are like these

A CONTAINS B CONTAINS C AND D 

The statement above describes all objects of type A that has a child object of type B which must have child objects of both C AND D. So this language defines containment constraints on objects, with support for boolean operators (AND, OR, NOT). Another example where parenthesis are used to define precedence would be:

A CONTAINS B CONTAINS (C AND (D CONTAINS E OR F)) 

I need to turn these statements to boolean expressions, and then I need to transform the boolean expressions to disjunctive normal form. I've came up with a method to do so, it is just that my method is ad hoc, and I don't really now if I can provide a mathematical definition of my transformation. Here is an attempt:

I'm taking every X CONTAINS Y statement on its own, and creating a variable in propositional logic represented by Y, that is X CONTAINS Y is a proposition that is represented by Y variable. To include X, I simply create a root proposition such as "Current data context CONTAINS X" which gives me X propositional logic variable. Therefore A CONTAINS B CONTAINS C AND D is transformed into the following set of propositions which in turn give me the variables:

CONTEXT CONTAINS A -> Boolean variable A     A CONTAINS B ->Boolean variable B B CONTAINS C ->Boolean variable C B CONTAINS D -> Boolean variable D 

and these propositions are turned into boolean variables and stitched together via boolean and/or/not operators:

A && B && (C && D)  

(I'm using && and || above to distinguish between AND keyword in the query statements which is an AND for containment.)

When boolean expressions are turned into disjunctive normal form, I end up re-writing the original containment statements so that OR operators end up as the root of operands which contain only AND/NOT operators.

Given what I'm doing, can I say that I am transforming the statements to propositional logic and then to boolean expressions? I would not like to use a mathematical definition of what I'm doing unless I'm sure my definition fits what I'm doing.

UPDATE In response to Henning Makholm's comments, here is more detail for anyone who would like to read: The query language I'm (partly) transforming is Archetype Query Language, a domain specific language for constructing queries on electronic health records. see: http://www.openehr.org/wiki/display/spec/Archetype+Query+Language+Description A, B and C above correspond to types of data elements. A real life example to containment query I've written above would be:

EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1] CONTAINS (OBSERVATION o openEHR-EHR-OBSERVATION-laboratory-hba1c.v1 AND OBSERVATION o1 openEHR-EHR-OBSERVATION-laboratory-glucose.v1) 

My transformation pulls OR operators in these queries to top level, because it helps later when I generate SQL to pull data from a quite tricky database schema.

1 Answers 1

1

I have some issues:

First, if you "re-writ[e] the original containment statements so that OR operators end up as the root of operands which contain only AND/NOT operators", then the result is in disjunctive normal form, not conjunctive ditto.

Second, it seems to be a somewhat pointless detour to invent new propositional/Boolean variables for your atomic "this-contains-that" formulas simply in order to put it into disjunctive (or conjunctive) normal form. It is quite common to do such manipulation directly on logical formulae with atoms that are not just propositional variables. It is of course quite alright to let some valuables represent each "this-contains-that" atom, but you should describe those valuables at ranging over such atoms, not over truth values. (And please select them from another end of the alphabet, or from a different alphabet, than the A, B, C you're already using as type names. Otherwise it becomes very confusing to read).

Third, it looks to me like your description is conflating names of element types with variables that range over concrete elements. For example, I suppose that the query A contains B contains A ought to be valid and describe an A that contains some B which again contains some A, not necessarily the original one. (Indeed, the word "contains" seems to suggest that the relation is acyclic). So the query is not just a question of finding one $A$ and one $B$ such that ($A$ contains $B$) and ($B$ contains $A$).

Therefore I think when you translate A CONTAINS (B CONTAINS (C AND D)) you ought to invent new variables to stand for the subexpressions and generate the following clauses:

p is an A q is a B r is a C s is a D (context) contains p p contains q q contains r q contains s 

Furthermore, if you have control over the design of the query language, I wonder whether the intended semantics wouldn't be clearer if the keyword were CONTAINING rather than CONTAINS.

  • 0
    I've tried to simplify the query language aspects. My question is supposed to be at a higher level. It is about the definition of the method. The query language has its tricks, and your warnings seem to relate to your interpretation of the actual implementation. You'd need a lot more information for that, which is not given here :) Thanks for all the input anyway, much appreciated.2012-11-11