Differences

This shows you the differences between two versions of the page.

--- magik-demo:user:school-example [2013/08/23 10:32]
alex
+++ magik-demo:user:school-example [2017/07/06 15:24]
@@ Line 1: / Line 1: @@
-{{:magik-demo:magik_logo.png?nolink&500|}}
-======  Completeness of School Data ======
- Content:
-    * [[#Introduction|Introduction]]
-    * [[#Example 1: TC-QC plain|Example 1: TC-QC plain]]
-    * [[#Example 2: TC-QC under Finite Domain Constraints|Example 2: TC-QC under Finite Domain Constraints]]
-    * [[#Example 3: TC-QC under Foreign Keys|Example 3: TC-QC under Foreign Keys]]
-    * [[#Example 4: TC-QC under Foreign Keys and Finite Domain Constraints|Example 4: TC-QC under Foreign Keys and Finite Domain Constraints]]
-    * [[#Example 5: TC-QC under Set and Bag semantics|Example 5: TC-QC under Set and Bag semantics]]
-===== Introduction =====
-In database applications such as information integration and decision support,
-one is in interested in data sets that are complete, in the sense that the data
-represent all relevant facts that hold in the real world. In many situations, though,
-it is only possible to guarantee partial completeness of the data, which means that
-for certain aspects of the application domain the data are complete, but not for others.
-In such a situation, one would like to know at least whether the available data are
-sufficient to answer a given query completely, that is, whether the answers to the
-query over the avalable data are the same as if the data set were complete.
-An example is the management of school data in the province of Bolzano, which motivated
-the work reported here. Data in the school information system of the province are often
-incomplete because each school individually is responsible for inserting its data into
- the system and because for certain kinds of data the contribution is optional.
-Decision makers, however, need to know whether or not the statistics on which they base
- the allocation of resources to schools are derived from complete data.
-The problem statement that MAGIK solves is the following:
-Given a collection of table completeness (TC) statements
-(meta-statements that states which parts of tables are certainly complete)
- that hold over a database instance
-D, we can ask whether they entail that for
-a query Q the set of answers Q(D) is complete, that is,
-whether Q(D) contains all answer records that one would expect
-if D were complete (it contains all information).
-In the rest we illustrate main reasoning featutes of MAGIK using the following
-toy schema:
-<code sql>
-pupil(name,level,code),
-class(level,code,branch),
-learns(name,lang).
-</code>
-===== Example 1: TC-QC plain =====
-Assume we are interested who are the pupils from the 1st level:
-<code sql>
-Q1=    SELECT p.name
-       FROM   pupil AS p
-       WHERE  p.level='1'
-</code>
-We also assume that we are complete for all pupils in the school, expressed with the following
-TC-statement
-((TC statements are written in a datalog-like syntax. Syntactically,
-a statement for a table R consists of two parts: an R-atom,
-representing a selection on R and, possibly, a condition, representing
-a semijoin with other tables. Here we adapt logic programming convention that variables start with upper-case letters, while constants
-start with lower-case letters.)):
-<code sql>
-TC1=   TABLE: pupil(Name,Level,Code)  WHERE:
-</code>
-We reason in the following way. If we are complete for all pupils in the school, then we are complete also for the pupils from the 1st level.
-Not surprisingly,  if TC1 holds over our database instance the answer obtained from query Q1 will be complete, regardless of the completeness of the other parts of the database.
- Now if we {{:magik-demo:go.jpg?nolink&10| }}run query Q1 by selecting only TC1, the MAGIK reasoner inform us that
-{{:magik-demo:ok.jpg?nolink&10| }}**<fc green>query is complete</fc>**.
-Now assume a different scenario where our database is less complete.
-In particular, we assume that we are complete only for pupils from the class ''1a'':
-<code sql>
-TC2=   TABLE: pupil(Name,1,a)         WHERE:
-</code>
-Now TC2 states the completeness only for  certain pupils on the 1st level.
-Without additional information, we don't
-know how much we are complete in the case of other classes (f.e., ''1b'' or ''1c'').
-Consequently, if we  {{:magik-demo:go.jpg?nolink&10| }}run query Q1 selecting only TC2, MAGIK informs us
-that {{:magik-demo:no.jpg?nolink&10| }}**<fc red>query is not complete</fc>**.
-Considering that the query is not complete, MAGIK analysis which parts of the database are needed for Q1 in order to be complete.
-Parts that are missing are expressed in the form of TC-statements.
-Proposed set of TC-statements is the minimal extension of the existing set of TC-statements.
-In the case our settings, we see that MAGIK suggests to complete all pupils
-from the 1st class:
-<code sql>
-TC3=   TABLE: pupil(P_Name,1,P_Code)  WHERE:
-</code>
-This is expected considering that we do not have other information about possible ''class'' codes.
-After adding TC3 and {{:magik-demo:go.jpg?nolink&10| }} running Q1 again,
- MAGIK informs us that
-{{:magik-demo:ok.jpg?nolink&10| }}**<fc green>query is complete</fc>**.
-===== Example 2: TC-QC under Finite Domain Constraints =====
-In this section we illustrate how finite domain constraints can be used to infer query completeness.
-Let us consider again query Q1
-<code sql>
-Q1=    SELECT p.name
-       FROM   pupil AS p
-       WHERE  p.level='1'
-</code>
-and TC-statement TC2
-<code sql>
-TC2=   TABLE:pupil(Name,1,a)  WHERE:
-</code>
-In addition to the previous example, we have an information that ''code'' attribute in the relation ''pupil'' can be either
-''a'' or ''b''. We express this with the following finide domain constraint:
-<code sql>
-FD1=   pupil(code)  IN {a,b}
-</code>
-We see that considering together TC2 and FD1, to answer Q1 completely, we might be only missing pupils from ''1b''.
-We {{:magik-demo:go.jpg?nolink&10| }} run the query Q1 under this setting and not surprisingly MAGIK informes us that {{:magik-demo:no.jpg?nolink&10| }}<fc red> query is not complete</fc>.
-However, if look at the MAGIK suggestions regarding the missing TC-statements, we can see that MAGIK proposes us to collect the pupil data
-about the class ''1b'', and to confirm it by adding the following TC-statement:
-<code sql>
-TC4=   TABLE:pupil(P_Name,1,b)  WHERE:
-</code>
-We follow those instructions and we call {{:magik-demo:go.jpg?nolink&10| }}run query Q1, under the configuration of TC2,TC4 and FD1.
-MAGIK reasoner inform us the now the {{:magik-demo:ok.jpg?nolink&10| }}<fc green> query is complete</fc>.
-===== Example 3: TC-QC under Foreign Keys =====
-In this section we illustrate how foreign keys impact the query completeness.
-We assume that there exist a foreign keys from ''pupil'' to ''class''
-<code sql>
-FK1: FOREIGN KEY pupil(level,code) REFERENCES class(level,code),
-</code>
-and from ''learns'' to ''pupil''
-<code sql>
-FK2: FOREIGN KEY learns(name)      REFERENCES pupil(name).
-</code>
-=== Enforced semantics ===
-Let us assume that foreign keys obey so ''enforced'' semantics. That is when the foreign keys hold both
- over the available database instance and over the ideal (complete) version of it.
-Note, that we do not have ideal version at the disposal, but only available (incomplete) version of a database.
-In general, one can assume that foreign keys hold only over the
-ideal version, while the available instance can violate foreign keys.
-This situation is plausible in information
-integration scenario. Nevertheless, we for now we assume ''enforced'' semantics.
-Now, we ask for the all pupils in the ''science'' branch that learn French language:
-<code sql>
-Q2=    SELECT p.name
-       FROM   pupila AS p, class AS c, learns AS l
-       WHERE  p.name=l.name      AND l.lang='french'
-       AND    p.level=c.level    AND p.code=c.code
-       AND    c.branch='science'
-</code>
-Additionally, we assume that we are complete for all French class attendances:
-<code sql>
-TC5=   TABLE: learns(Name,french)   WHERE:
-</code>
-Under this setting, we can reason in the following way. If for a some student exists a French class attendance in the ideal database then
-according TC5 it must exists
-also in the available databas instance. Considering that available database instance obeys the foreign key constraint FK1, then for that student there  exists a corresponding record in the ''pupil'' table. Further, considering this and FK2 there must exists a corresponding record
-in the ''class'' table as well. In other words, if this student is a
-''science'' student in ideal database instance then that student is ''science'' in the available instance as well.
-If we {{:magik-demo:go.jpg?nolink&10| }}run the query Q2, MAGIK
-takes all this into consideration,
- and it  returns that {{:magik-demo:ok.jpg?nolink&10| }} <fc green> query is complete</fc>.
-=== Non-Enforced semantics ===
-Now we relax the assumptions from the setttings above, by assuming that foreign keys are not enforced
-over the available database instance.
-If we {{:magik-demo:go.jpg?nolink&10| }} run query Q2 for the FK1, FK2 and TC4, MAGIK inform us that
-{{:magik-demo:no.jpg?nolink&10| }} <fc red> query is not complete</fc>.
-If we look at the suggestions, MAGIK proposes us to add the following TC-statements:
-<code sql>
-TC6=   TABLE: class(C_level,C_code,science)	WHERE: learns(P_name,french), pupil(P_name,P_level,C_code)
-TC7=   TABLE: pupil(P_name,P_level,C_code)	WHERE: class(C_level,C_code,science), learns(P_name,french)
-</code>
-After adding those statements and {{:magik-demo:go.jpg?nolink&10| }} running Q1 again, MAGIK  concludes that
-{{:magik-demo:ok.jpg?nolink&10| }} <fc green> query is complete</fc>.
-===== Example 4: TC-QC under Foreign Keys and Finite Domain Constraints =====
-In this section we illustrate how interplay between foreign keys and finite domains impact the query completeness.
-We illustrate this in the following scenario.
-We assume that there exists a foreign key (no metter if it is enforced or not)
-<code sql>
-FK1=  FOREIGN KEY pupil(level,code) REFERENCES class(level,code),
-</code>
-Then we know that ''branch'' in the ''class'' relation can be either ''science'' or ''humanities''.
-<code sql>
-FD2=  class(branch) IN  {humanities,science},
-</code>
-Lastly, we assume that we are complete for all pupils from the humanities branch and for all pupils from science branch expressed with the following TC-statement:
-<code sql>
-TC8=  TABLE: pupil(Name,Class,Level)  WHERE: class(Level,Code,humanities),
-TC9=  TABLE: pupil(Name,Class,Level)  WHERE: class(Level,Code,science).
-</code>
-Now we have a query that ask for all pupils:
-<code sql>
-Q3=   SELECT p.name
-      FROM   pupila AS p
-</code>
-We reason in the following way. If there is a record of a pupil in the idel database instance, then following the foreign FK1, there exists a
-corresponding records in the ''class'' relation. Due to the FD1 this record has either ''humanities'' or ''science'' on the branch position.
-If the first is the case, then following TC8 the pupil record will exists in the available database.
-If the second is the case, then following TC9 the pupil record will exists in the available database again.
-We {{:magik-demo:go.jpg?nolink&10| }} run query Q3 for FK1, FD2, TC8 and TC9, and MAGIK inform us that
-{{:magik-demo:ok.jpg?nolink&10| }} <fc green> query is  complete</fc>.
-===== Example 5: TC-QC under Set and Bag semantics =====
-In this section we illustrate the difference between set and bag semantics.
-Default SQL semantics is beg semantics. To express set semantics one needs to use ''DISTINCT'' after the ''SELECT''.
-For example consider the following two queries.
-<code sql>
-Q4=   SELECT DISTINCT l1.Name
-      FROM   learns AS l1, learns AS l2
-      WHERE  l1.Name = l2.Name
-      AND    l1.Lang = ’french’
-Q5=   SELECT l1.Name
-      FROM   learns AS l1, learns AS l2
-      WHERE  l1.Name = l2.Name
-      AND    l1.Lang = ’french’
-</code>
-Query Q4 returns a pupil if such a pupil learns French language. Each pupil will be returned at most once, considering  ''DISTINCT'' command in the
-''SELECT''. In this case appearance of ''l2'' is absolute superfouls.
-On the other hand, query Q5 returns each
-name of a learner of French as many times as there are learns tuples with that name. In this case appearance of ''l2'' is absolute necessary
-to preserve the answers that returns Q5.
-Now assume that we are complete for all French learners:
-<code sql>
-TC10=    TABLE: learns(Name,french)        WHERE:
-</code>
-If we {{:magik-demo:go.jpg?nolink&10| }}run query Q4 under consideration of TC10, MAGIK informs us that {{:magik-demo:ok.jpg?nolink&10| }} <fc green> query is  complete</fc>.
-Contrary, if we {{:magik-demo:go.jpg?nolink&10| }}run query Q5 under consideration of TC10, MAGIK informs us that {{:magik-demo:no.jpg?nolink&10| }} <fc red> query is not  complete</fc>.
-In addition to the standard SQL ''select-project-join'' queries, MAGIK allows use of ''count(*)'' and ''GROUP BY''.
-For example, we can write a query, that prints the similar result as query Q5, but in a more elegant way:
-<code sql>
-Q6=   SELECT   l1.name, count(*) AS number
-      FROM     learns AS l1, learns AS l2
-      WHERE    l1.name = l2.name
-      AND      l1.lang = ’french’
-      GROUP BY l1.Name
-</code>
-Now, if we look at MAGIK suggestions for the query Q5 (the same will be for query Q6), MAGIK proposes to
-to complete the learns tuples for the names of those pupils
-who learn French:
-<code sql>
-TC11=    TABLE: learns(Tl2_name,Tl2_lang)  WHERE: learns(Tl2_name,french),
-</code>
-(note, the request is not for all tuples, but only
-those whose Tl2_name variable satisfies the condition). We follow this instruction and
-we {{:magik-demo:go.jpg?nolink&10| }} run query Q5 (Q6) where TC10 are TC11 activated. Finally,
-MAGIK displays that {{:magik-demo:ok.jpg?nolink&10| }} <fc green> query is  complete</fc>.

ognjen.savkovic@wikispace

User Tools

Site Tools

Differences

Page Tools