User Tools

Site Tools


magik-demo:user:school-example

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

magik-demo:user:school-example [2013/08/23 10:32]
alex
magik-demo:user:school-example [2017/07/06 15:24]
Line 1: Line 1:
-{{:​magik-demo:​magik_logo.png?​nolink&​500|}} 
  
- 
-====== ​ Completeness of School Data ====== 
- ​Content:​ 
-    * [[#​Introduction|Introduction]] 
-    * [[#Example 1: TC-QC plain|Example 1: TC-QC plain]] 
-    * [[#Example 2: TC-QC under Finite Domain Constraints|Example 2: TC-QC under Finite Domain Constraints]] 
-    * [[#Example 3: TC-QC under Foreign Keys|Example 3: TC-QC under Foreign Keys]] 
-    * [[#Example 4: TC-QC under Foreign Keys and Finite Domain Constraints|Example 4: TC-QC under Foreign Keys and Finite Domain Constraints]] 
-    * [[#Example 5: TC-QC under Set and Bag semantics|Example 5: TC-QC under Set and Bag semantics]] 
-  
-===== Introduction ===== 
-In database applications such as information integration and decision support, ​ 
-one is in interested in data sets that are complete, in the sense that the data  
-represent all relevant facts that hold in the real world. In many situations, though, ​ 
-it is only possible to guarantee partial completeness of the data, which means that  
-for certain aspects of the application domain the data are complete, but not for others. ​ 
-In such a situation, one would like to know at least whether the available data are  
-sufficient to answer a given query completely, that is, whether the answers to the  
-query over the avalable data are the same as if the data set were complete. 
- 
-An example is the management of school data in the province of Bolzano, which motivated ​ 
-the work reported here. Data in the school information system of the province are often  
-incomplete because each school individually is responsible for inserting its data into 
- the system and because for certain kinds of data the contribution is optional. ​ 
-Decision makers, however, need to know whether or not the statistics on which they base 
- the allocation of resources to schools are derived from complete data. 
- 
- 
-The problem statement that MAGIK solves is the following: ​ 
-Given a collection of table completeness (TC) statements ​ 
-(meta-statements that states which parts of tables are certainly complete) 
- that hold over a database instance 
-D, we can ask whether they entail that for 
-a query Q the set of answers Q(D) is complete, that is, 
-whether Q(D) contains all answer records that one would expect ​ 
-if D were complete (it contains all information). 
- 
-In the rest we illustrate main reasoning featutes of MAGIK using the following 
-toy schema: 
-<code sql> 
-pupil(name,​level,​code),​ 
-class(level,​code,​branch),​ 
-learns(name,​lang). 
-</​code>​ 
-===== Example 1: TC-QC plain ===== 
- 
-Assume we are interested who are the pupils from the 1st level: 
-<code sql> 
-Q1=    SELECT p.name 
-       ​FROM ​  pupil AS p 
-       ​WHERE ​ p.level='​1'​ 
-</​code>​ 
- 
- 
-We also assume that we are complete for all pupils in the school, expressed with the following ​ 
-TC-statement 
-((TC statements are written in a datalog-like syntax. Syntactically,​ 
-a statement for a table R consists of two parts: an R-atom, 
-representing a selection on R and, possibly, a condition, representing 
-a semijoin with other tables. Here we adapt logic programming convention that variables start with upper-case letters, while constants ​ 
-start with lower-case letters.)): 
-<code sql> 
-TC1=   ​TABLE:​ pupil(Name,​Level,​Code) ​ WHERE: 
-</​code>​ 
- 
-We reason in the following way. If we are complete for all pupils in the school, then we are complete also for the pupils from the 1st level. 
-Not surprisingly, ​ if TC1 holds over our database instance the answer obtained from query Q1 will be complete, regardless of the completeness of the other parts of the database. 
- Now if we {{:​magik-demo:​go.jpg?​nolink&​10| }}run query Q1 by selecting only TC1, the MAGIK reasoner inform us that  
-{{:​magik-demo:​ok.jpg?​nolink&​10| }}**<fc green>​query is complete</​fc>​**. 
- 
-Now assume a different scenario where our database is less complete. ​ 
-In particular, we assume that we are complete only for pupils from the class ''​1a'':​ 
-<code sql> 
-TC2=   ​TABLE:​ pupil(Name,​1,​a) ​        ​WHERE:​ 
-</​code>​ 
- 
-Now TC2 states the completeness only for  certain pupils on the 1st level. ​ 
-Without additional information,​ we don'​t ​ 
-know how much we are complete in the case of other classes (f.e., ''​1b''​ or ''​1c''​). 
-Consequently,​ if we  {{:​magik-demo:​go.jpg?​nolink&​10| }}run query Q1 selecting only TC2, MAGIK informs us  
-that {{:​magik-demo:​no.jpg?​nolink&​10| }}**<fc red>​query is not complete</​fc>​**. 
- 
-Considering that the query is not complete, MAGIK analysis which parts of the database are needed for Q1 in order to be complete. 
-Parts that are missing are expressed in the form of TC-statements. ​ 
-Proposed set of TC-statements is the minimal extension of the existing set of TC-statements. 
-In the case our settings, we see that MAGIK suggests to complete all pupils ​ 
-from the 1st class: 
-<code sql> 
-TC3=   ​TABLE:​ pupil(P_Name,​1,​P_Code) ​ WHERE: 
-</​code>​ 
-This is expected considering that we do not have other information about possible ''​class''​ codes. 
-After adding TC3 and {{:​magik-demo:​go.jpg?​nolink&​10| }} running Q1 again, ​ 
- MAGIK informs us that 
-{{:​magik-demo:​ok.jpg?​nolink&​10| }}**<fc green>​query is complete</​fc>​**. 
-===== Example 2: TC-QC under Finite Domain Constraints ===== 
-In this section we illustrate how finite domain constraints can be used to infer query completeness. 
-Let us consider again query Q1 
-<code sql> 
-Q1=    SELECT p.name 
-       ​FROM ​  pupil AS p 
-       ​WHERE ​ p.level='​1'​ 
-</​code>​ 
-and TC-statement TC2 
-<code sql> 
-TC2=   ​TABLE:​pupil(Name,​1,​a) ​ WHERE: 
-</​code>​ 
- 
-In addition to the previous example, we have an information that ''​code''​ attribute in the relation ''​pupil''​ can be either ​ 
-''​a''​ or ''​b''​. We express this with the following finide domain constraint: 
-<code sql> 
-FD1=   ​pupil(code) ​ IN {a,b} 
-</​code>​ 
- 
-We see that considering together TC2 and FD1, to answer Q1 completely, we might be only missing pupils from ''​1b''​. ​ 
-We {{:​magik-demo:​go.jpg?​nolink&​10| }} run the query Q1 under this setting and not surprisingly MAGIK informes us that {{:​magik-demo:​no.jpg?​nolink&​10| }}<fc red> query is not complete</​fc>​. 
- 
-However, if look at the MAGIK suggestions regarding the missing TC-statements,​ we can see that MAGIK proposes us to collect the pupil data  
-about the class ''​1b'',​ and to confirm it by adding the following TC-statement:​ 
-<code sql> 
-TC4=   ​TABLE:​pupil(P_Name,​1,​b) ​ WHERE: 
-</​code>​ 
-We follow those instructions and we call {{:​magik-demo:​go.jpg?​nolink&​10| }}run query Q1, under the configuration of TC2,TC4 and FD1.  
-MAGIK reasoner inform us the now the {{:​magik-demo:​ok.jpg?​nolink&​10| }}<fc green> query is complete</​fc>​. 
-===== Example 3: TC-QC under Foreign Keys ===== 
-In this section we illustrate how foreign keys impact the query completeness. 
-We assume that there exist a foreign keys from ''​pupil''​ to ''​class'' ​ 
-<code sql> 
-FK1: FOREIGN KEY pupil(level,​code) REFERENCES class(level,​code), ​ 
-</​code>​ 
-and from ''​learns''​ to ''​pupil''​ 
-<code sql> 
-FK2: FOREIGN KEY learns(name) ​     REFERENCES pupil(name). 
-</​code>​ 
- 
- 
-=== Enforced semantics === 
- 
-Let us assume that foreign keys obey so ''​enforced''​ semantics. That is when the foreign keys hold both 
- over the available database instance and over the ideal (complete) version of it. 
-Note, that we do not have ideal version at the disposal, but only available (incomplete) version of a database.  ​ 
-In general, one can assume that foreign keys hold only over the  
-ideal version, while the available instance can violate foreign keys.  
-This situation is plausible in information ​ 
-integration scenario. Nevertheless,​ we for now we assume ''​enforced''​ semantics. 
- 
- 
-Now, we ask for the all pupils in the ''​science''​ branch that learn French language: 
-<code sql> 
-Q2=    SELECT p.name 
-       ​FROM ​  ​pupila AS p, class AS c, learns AS l 
-       ​WHERE ​ p.name=l.name ​     AND l.lang='​french'​ 
-       ​AND ​   p.level=c.level ​   AND p.code=c.code 
-       ​AND ​   c.branch='​science'​ 
-</​code>​ 
- 
-Additionally,​ we assume that we are complete for all French class attendances:​ 
-<code sql> 
-TC5=   ​TABLE:​ learns(Name,​french) ​  ​WHERE:​ 
-</​code>​ 
- 
-Under this setting, we can reason in the following way. If for a some student exists a French class attendance in the ideal database then  
-according TC5 it must exists 
-also in the available databas instance. Considering that available database instance obeys the foreign key constraint FK1, then for that student there  exists a corresponding record in the ''​pupil''​ table. Further, considering this and FK2 there must exists a corresponding record 
-in the ''​class''​ table as well. In other words, if this student is a  
-''​science''​ student in ideal database instance then that student is ''​science''​ in the available instance as well.  
-If we {{:​magik-demo:​go.jpg?​nolink&​10| }}run the query Q2, MAGIK  
-takes all this into consideration, ​ 
- and it  returns that {{:​magik-demo:​ok.jpg?​nolink&​10| }} <fc green> query is complete</​fc>​. 
- 
-=== Non-Enforced semantics === 
- 
-Now we relax the assumptions from the setttings above, by assuming that foreign keys are not enforced ​ 
-over the available database instance. 
-If we {{:​magik-demo:​go.jpg?​nolink&​10| }} run query Q2 for the FK1, FK2 and TC4, MAGIK inform us that  
-{{:​magik-demo:​no.jpg?​nolink&​10| }} <fc red> query is not complete</​fc>​. 
-If we look at the suggestions,​ MAGIK proposes us to add the following TC-statements:​ 
-<code sql> 
-TC6=   ​TABLE:​ class(C_level,​C_code,​science) WHERE:​ learns(P_name,​french),​ pupil(P_name,​P_level,​C_code) 
-TC7=   ​TABLE:​ pupil(P_name,​P_level,​C_code) WHERE:​ class(C_level,​C_code,​science),​ learns(P_name,​french) 
-</​code>​ 
-After adding those statements and {{:​magik-demo:​go.jpg?​nolink&​10| }} running Q1 again, MAGIK  concludes that  
-{{:​magik-demo:​ok.jpg?​nolink&​10| }} <fc green> query is complete</​fc>​. 
-===== Example 4: TC-QC under Foreign Keys and Finite Domain Constraints ===== 
-In this section we illustrate how interplay between foreign keys and finite domains impact the query completeness. 
-We illustrate this in the following scenario. 
-We assume that there exists a foreign key (no metter if it is enforced or not) 
-<code sql> 
-FK1=  FOREIGN KEY pupil(level,​code) REFERENCES class(level,​code), ​ 
-</​code>​ 
- 
-Then we know that ''​branch''​ in the ''​class''​ relation can be either ''​science''​ or ''​humanities''​. 
-<code sql> 
-FD2=  class(branch) IN  {humanities,​science}, ​ 
-</​code>​ 
-  
-Lastly, we assume that we are complete for all pupils from the humanities branch and for all pupils from science branch expressed with the following TC-statement:​ 
-<code sql> 
-TC8=  TABLE: pupil(Name,​Class,​Level) ​ WHERE: class(Level,​Code,​humanities), ​ 
-TC9=  TABLE: pupil(Name,​Class,​Level) ​ WHERE: class(Level,​Code,​science). 
-</​code>​ 
- 
-Now we have a query that ask for all pupils: 
-<code sql> 
-Q3=   ​SELECT p.name 
-      FROM   ​pupila AS p 
-</​code>​ 
- 
-We reason in the following way. If there is a record of a pupil in the idel database instance, then following the foreign FK1, there exists a 
-corresponding records in the ''​class''​ relation. Due to the FD1 this record has either ''​humanities''​ or ''​science''​ on the branch position. 
-If the first is the case, then following TC8 the pupil record will exists in the available database. 
-If the second is the case, then following TC9 the pupil record will exists in the available database again. 
-We {{:​magik-demo:​go.jpg?​nolink&​10| }} run query Q3 for FK1, FD2, TC8 and TC9, and MAGIK inform us that  
-{{:​magik-demo:​ok.jpg?​nolink&​10| }} <fc green> query is  complete</​fc>​. 
- 
- 
-===== Example 5: TC-QC under Set and Bag semantics ===== 
- 
-In this section we illustrate the difference between set and bag semantics. 
-Default SQL semantics is beg semantics. To express set semantics one needs to use ''​DISTINCT''​ after the ''​SELECT''​. 
-For example consider the following two queries. 
-<code sql> 
-Q4=   ​SELECT DISTINCT l1.Name 
-      FROM   ​learns AS l1, learns AS l2 
-      WHERE  l1.Name = l2.Name ​ 
-      AND    l1.Lang = ’french’ 
- 
-Q5=   ​SELECT l1.Name 
-      FROM   ​learns AS l1, learns AS l2 
-      WHERE  l1.Name = l2.Name ​ 
-      AND    l1.Lang = ’french’ 
-</​code>​ 
- 
-Query Q4 returns a pupil if such a pupil learns French language. Each pupil will be returned at most once, considering ​ ''​DISTINCT''​ command in the 
-''​SELECT''​. In this case appearance of ''​l2''​ is absolute superfouls. 
- 
-On the other hand, query Q5 returns each 
-name of a learner of French as many times as there are learns tuples with that name. In this case appearance of ''​l2''​ is absolute necessary ​ 
-to preserve the answers that returns Q5. 
- 
-Now assume that we are complete for all French learners: 
-<code sql> 
-TC10=    TABLE: learns(Name,​french) ​       WHERE: 
-</​code>​ 
- 
-If we {{:​magik-demo:​go.jpg?​nolink&​10| }}run query Q4 under consideration of TC10, MAGIK informs us that {{:​magik-demo:​ok.jpg?​nolink&​10| }} <fc green> query is  complete</​fc>​. 
-Contrary, if we {{:​magik-demo:​go.jpg?​nolink&​10| }}run query Q5 under consideration of TC10, MAGIK informs us that {{:​magik-demo:​no.jpg?​nolink&​10| }} <fc red> query is not  complete</​fc>​. 
- 
-In addition to the standard SQL ''​select-project-join''​ queries, MAGIK allows use of ''​count(*)''​ and ''​GROUP BY''​. 
-For example, we can write a query, that prints the similar result as query Q5, but in a more elegant way: 
-<code sql> 
-Q6=   ​SELECT ​  ​l1.name,​ count(*) AS number 
-      FROM     ​learns AS l1, learns AS l2 
-      WHERE    l1.name = l2.name ​ 
-      AND      l1.lang = ’french’ 
-      GROUP BY l1.Name 
-</​code>​ 
- 
-Now, if we look at MAGIK suggestions for the query Q5 (the same will be for query Q6), MAGIK proposes to  
-to complete the learns tuples for the names of those pupils 
-who learn French: 
-<code sql> 
-TC11=    TABLE: learns(Tl2_name,​Tl2_lang) ​ WHERE: learns(Tl2_name,​french),​ 
-</​code>​ 
-(note, the request is not for all tuples, but only 
-those whose Tl2_name variable satisfies the condition). We follow this instruction and  
-we {{:​magik-demo:​go.jpg?​nolink&​10| }} run query Q5 (Q6) where TC10 are TC11 activated. Finally, ​ 
-MAGIK displays that {{:​magik-demo:​ok.jpg?​nolink&​10| }} <fc green> query is  complete</​fc>​. 
- 
-  
magik-demo/user/school-example.txt · Last modified: 2017/07/06 15:24 (external edit)