This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
magik-demo:school-usecase [2012/06/25 01:39] cikm2012 [Example 1: TC-QC plain] |
— (current) | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | {{:magik-demo:magik-logo.jpg?900|}} | ||
- | ====== Use case: Completeness of data in a school information system ====== | ||
- | |||
- | ===== Introduction ===== | ||
- | The motivation for MAGIK comes from a project to create a school information system (SIS) that can give | ||
- | guarantees about the completeness of the school data. Typically, data in SIS is notoriously incomplete. | ||
- | Consequently, the school management is not able use this data for the decision support. | ||
- | |||
- | The problem statement that MAGIK solves is the following: | ||
- | Given a collection of table completeness (TC) statements | ||
- | (meta-statements that states which parts of tables are certainly complete) | ||
- | that hold over an instance | ||
- | D of our school database, we can ask whether they entail that for | ||
- | a certain query Q the set of answers Q(D) is complete, that is, | ||
- | whether Q(D) contains all answer records that one would expect if | ||
- | D were complete. | ||
- | |||
- | In the following we will illustrate main reasoning featutes of MAGIK, by the means of examples. | ||
- | We consider the following toy schema: | ||
- | <code sql> | ||
- | pupil(name,level,code), | ||
- | class(level,code,branch), | ||
- | learns(name,lang). | ||
- | </code> | ||
- | ===== Example 1: TC-QC plain ===== | ||
- | |||
- | Assume we are interested who are the pupils from the 1st level: | ||
- | <code sql> | ||
- | Q1= SELECT p.name | ||
- | FROM pupila AS p | ||
- | WHERE p.level='1' | ||
- | </code> | ||
- | |||
- | |||
- | We also assume that we are complete for all pupils in the school, expressed with the following | ||
- | TC-statement | ||
- | ((TC statements are written in a datalog-like syntax. Syntactically, | ||
- | a statement for a table R consists of two parts: an R-atom, | ||
- | representing a selection on R and, possibly, a condition, representing | ||
- | a semijoin with other tables. Here we adapt logic programming convention that variables start with upper-case letters, while constants | ||
- | start with lower-case letters.)): | ||
- | <code sql> | ||
- | TC1= TABLE: pupil(Name,Level,Code) WHERE: | ||
- | </code> | ||
- | |||
- | We reason in the following way. If we are complete for all pupils in the school, then we are complete also for the pupils from the 1st level. | ||
- | Not surprisingly, if TC1 holds over our database instance the answer obtained from query Q1 will be complete, regardless of the completeness of the other parts of the database. | ||
- | Now if we {{:magik-demo:go.jpg?nolink&10| }}run query Q1 by selecting only TC1, the MAGIK reasoner inform us that | ||
- | {{:magik-demo:ok.jpg?nolink&10| }}**<fc green>query is complete</fc>**. | ||
- | |||
- | Now assume a different scenario where our database is less complete. | ||
- | In particular, we assume that we are complete only for pupils from the class ''1a'': | ||
- | <code sql> | ||
- | TC2= TABLE: pupil(Name,1,a) WHERE: | ||
- | </code> | ||
- | |||
- | Now TC2 states the completeness only for certain pupils on the 1st level. | ||
- | Without additional information, we don't | ||
- | know how much we are complete in the case of other classes (f.e., ''1b'' or ''1c''). | ||
- | Consequently, if we {{:magik-demo:go.jpg?nolink&10| }}run query Q1 selecting only TC2, MAGIK informs us | ||
- | that {{:magik-demo:no.jpg?nolink&10| }}**<fc red>query is not complete</fc>**. | ||
- | |||
- | Considering that the query is not complete, MAGIK analysis which parts of the database are needed for Q1 in order to be complete. | ||
- | Parts that are missing are expressed in the form of TC-statements. | ||
- | Proposed set of TC-statements is the minimal extension of the existing set of TC-statements. | ||
- | In the case our settings, we see that MAGIK suggests to complete all pupils | ||
- | from the 1st class: | ||
- | <code sql> | ||
- | TC3= TABLE: pupil(P_Name,1,P_Code) WHERE: | ||
- | </code> | ||
- | This is expected considering that we do not have other information about possible ''class'' codes. | ||
- | After adding TC3 and {{:magik-demo:go.jpg?nolink&10| }} running Q1 again, | ||
- | MAGIK informs us that | ||
- | {{:magik-demo:ok.jpg?nolink&10| }}**<fc green>query is complete</fc>**. | ||
- | ===== Example 2: TC-QC under Finite Domain Constraints ===== | ||
- | In this section we illustrate how finite domain constraints can be used to infer query completeness. | ||
- | Let us again consider query Q1 | ||
- | <code sql> | ||
- | Q1= SELECT p.name | ||
- | FROM pupila AS p | ||
- | WHERE p.level='1' | ||
- | </code> | ||
- | and TC-statement | ||
- | <code sql> | ||
- | TC2= TABLE:pupil(Name,1,a) WHERE: | ||
- | </code> | ||
- | |||
- | In addition to the previous example, we have an information that ''code'' in the relation ''pupil'' can be either | ||
- | ''a'' or ''b''. We express this with a following constraint: | ||
- | <code sql> | ||
- | FD1= pupil(code) IN {a,b} | ||
- | </code> | ||
- | |||
- | We can see that considering together TC2 and FD1, to answer Q1 completely, we might be only missing pupils from ''1b''. | ||
- | We run the MAGIK reasoner under this settings and not suprisingly it informes us that query is not complete. | ||
- | |||
- | However, if look at the MAGIK suggestions regarding the missing TC-statements, we can see that MAGIK proposes us to collect the pupil data | ||
- | about the class ''1b'', and to confirm it by adding the following TC-statement: | ||
- | <code sql> | ||
- | TC4= TABLE:pupil(Name,1,b) WHERE: | ||
- | </code> | ||
- | We follow the MAGIK instructions and we call {{:magik-demo:go.jpg?nolink&10| }}run query, under the configuration of TC2,TC4 and FD. | ||
- | MAGIK reasoner inform us the now the {{:magik-demo:ok.jpg?nolink&10| }}<fc green> query is complete</fc>. | ||
- | |||
- | ===== Example 3: TC-QC under Foreign Keys ===== | ||
- | In this section we illustrate how foreign keys impact the query completeness. | ||
- | We assume that there exist a foreign keys from ''pupil'' to ''class'' | ||
- | <code sql> | ||
- | FK1: FOREIGN KEY pupil(level,code) REFERENCES class(level,code), | ||
- | </code> | ||
- | and from ''learns'' to ''pupil'' | ||
- | <code sql> | ||
- | FK2: FOREIGN KEY learns(name) REFERENCES pupil(name). | ||
- | </code> | ||
- | |||
- | |||
- | === Enforced semantics === | ||
- | |||
- | |||
- | For the foreign keys we assume, so called ''enforced'' semantics, that is when the foreign keys hold | ||
- | over the available database instance and over the ideal (complete) version of it. | ||
- | In general, one can assume that foreign keys hold only over the | ||
- | ideal version, while the available instance can violate foreign keys. | ||
- | This situation is plausible in information | ||
- | integration scenario. | ||
- | |||
- | |||
- | Now, we ask for the all pupils in the ''science'' department that attents French language: | ||
- | <code sql> | ||
- | Q2= SELECT p.name | ||
- | FROM pupila AS p,class AS,learns as l | ||
- | WHERE p.name=l.name AND l.lang='french' | ||
- | AND p.level=c.level AND p.code=c.code | ||
- | AND p.branch='science'. | ||
- | </code> | ||
- | |||
- | Additionally, we assume that we are complete for all French class attendances: | ||
- | <code sql> | ||
- | TC5= TABLE: learns(Name,french) WHERE: | ||
- | </code> | ||
- | |||
- | Under this setting, we run the query. MAGIK is reasoning in the following way. If for some student exists a French class attendance then it must exists | ||
- | also in the available databas instance. Then considering that available database instance obey foreign key constraint FK1, then for that student there exists a corresponding record in the ''pupil'' table. Further, considering this and FK2 there must exists a corresponding record | ||
- | in the ''class'' table as well. In other words, if this student is a | ||
- | ''science'' student in ideal database instance then that student is ''science'' in the available instance as well. | ||
- | Taking all this into consideration, we {{:magik-demo:go.jpg?nolink&10| }}run the query and MAGIK returns that {{:magik-demo:ok.jpg?nolink&10| }} <fc green> query is complete</fc>. | ||
- | === Non-Enforced semantics === | ||
- | |||
- | Now we relax assumptions from the setttings above, by assuming that foreign keys are not enforced | ||
- | over the available database instance. | ||
- | If we {{:magik-demo:go.jpg?nolink&10| }} run query Q2 for the FK1,FK2 and TC4, MAGIK inform us that | ||
- | {{:magik-demo:no.jpg?nolink&10| }} <fc red> query is not complete</fc>. | ||
- | If we look at the suggestions, MAGIK proposes us to add following TC-statements: | ||
- | <code sql> | ||
- | TC6= TABLE: class(C_level,C_code,science) WHERE: learns(P_name,french), pupil(P_name,P_level,C_code) | ||
- | TC7= TABLE: pupil(P_name,P_level,C_code) WHERE: class(C_level,C_code,science), learns(P_name,french) | ||
- | </code> | ||
- | After adding those statements and {{:magik-demo:go.jpg?nolink&10| }} running Q1 again, MAGIK concludes that | ||
- | {{:magik-demo:ok.jpg?nolink&10| }} <fc green> query is complete</fc>. | ||
- | |||
- | |||
- | ===== Example 4: TC-QC under Foreign Keys and Finite Domain Constraints ===== | ||
- | In this section we illustrate how interplay between foreign keys and finite domains impact the query completeness. | ||
- | We assume that foreign key | ||
- | <code sql> | ||
- | FK1= FOREIGN KEY pupil(level,code) REFERENCES class(level,code), | ||
- | </code> | ||
- | is enforced. | ||
- | Then we know that ''branch'' in the ''class'' relation can be either ''science'' or ''humanities''. | ||
- | <code sql> | ||
- | FD2= class(branch) IN {humanities,science}, | ||
- | </code> | ||
- | |||
- | Also we assume that we are complete for all pupils from the humanities branch and for all pupils from science branch expressed with the following TC-statement: | ||
- | <code sql> | ||
- | TC8= TABLE: pupil(Name,Class,Level) WHERE: class(Level,Code,humanities), | ||
- | TC9= TABLE: pupil(Name,Class,Level) WHERE: class(Level,Code,science). | ||
- | </code> | ||
- | |||
- | Now we have a query that ask for all pupils: | ||
- | <code sql> | ||
- | Q3= SELECT p.name | ||
- | FROM pupila AS p | ||
- | </code> | ||
- | |||
- | We reason on the following way. If there is a record of a pupil in the idel database instance, then following the foreign FK1, there exists a | ||
- | corresponding records in the ''class'' relation. Due to the FD1 this record has either ''humanities'' or ''science'' on the branch position. | ||
- | If the first is the case, then following TC8 the pupil record will exists in the available database. | ||
- | If the second is the case, then following TC89 then again the pupil record will exists in the available database. | ||
- | We {{:magik-demo:go.jpg?nolink&10| }} run query Q3 for the FK1, FD and TC4, and MAGIK inform us that | ||
- | {{:magik-demo:ok.jpg?nolink&10| }} <fc green> query is complete</fc>. | ||
- | |||
- | |||
- | ===== Example 5: TC-QC under Set and Bag semantics ===== | ||
- | |||
- | In this section we illustrate the difference between set and bag semantics. | ||
- | Default SQL semantics is beg semantics. To express set semantics one needs to use ''DISTINCT'' after the ''SELECT''. | ||
- | For example consider the following two queries. | ||
- | <code sql> | ||
- | Q4= SELECT DISTINCT l1.Name | ||
- | FROM learns AS l1, learns AS l2 | ||
- | WHERE l1.Name = l2.Name | ||
- | AND l1.Lang = ’french’ | ||
- | |||
- | Q5= SELECT l1.Name | ||
- | FROM learns AS l1, learns AS l2 | ||
- | WHERE l1.Name = l2.Name | ||
- | AND l1.Lang = ’french’ | ||
- | </code> | ||
- | |||
- | Query Q4 returns a pupil if such a pupil learns French language. Each pupil will be returned at most once, considering ''DISTINCT'' command in the | ||
- | ''SELECT''. In this case appearance of ''l2'' is absolute superfouls. | ||
- | |||
- | On the other hand, query Q5 returns each | ||
- | name of a learner of French as many times as there are learns tuples with that name. In this case appearance of ''l2'' is absolute necessary | ||
- | to preserve the answers that returns Q5. | ||
- | |||
- | Now assume that we are complete for all French learners: | ||
- | <code sql> | ||
- | TC10= TABLE: learns(Name,french) WHERE: | ||
- | </code> | ||
- | |||
- | If we {{:magik-demo:go.jpg?nolink&10| }}run query Q4 under consideration of TC10, MAGIK informs us that {{:magik-demo:ok.jpg?nolink&10| }} <fc green> query is complete</fc>. | ||
- | Contrary, if we {{:magik-demo:go.jpg?nolink&10| }}run query Q5 under consideration of TC10, MAGIK informs us that {{:magik-demo:no.jpg?nolink&10| }} <fc red> query is not complete</fc>. | ||
- | |||
- | In addition to the standard SQL ''select-project-join'' queries, MAGIK allows use of ''count(*)'' and ''GROUP BY''. | ||
- | For example, a more elegant way of writing query Q5 can be: | ||
- | <code sql> | ||
- | Q6= SELECT l1.name, count(*) AS number | ||
- | FROM learns AS l1, learns AS l2 | ||
- | WHERE l1.name = l2.name | ||
- | AND l1.lang = ’french’ | ||
- | GROUP BY l1.Name | ||
- | </code> | ||
- | |||
- | If we look at MAGIK suggestion for the query Q5 (the same for query Q6), MAGIK proposes to | ||
- | to complete the learns tuples for the names of those pupils | ||
- | who learn French: | ||
- | <code sql> | ||
- | TC11= TABLE: learns(Tl2_name,Tl2_lang) WHERE: learns(Tl2_name,french), | ||
- | </code> | ||
- | (note, the request is not for all tuples, but only | ||
- | those whose Tl2_name variable satisfies the condition). We follow this instruction and | ||
- | we {{:magik-demo:go.jpg?nolink&10| }} run query Q5 (Q6) where TC10 are TC11 activated. Finally, | ||
- | MAGIK displays that {{:magik-demo:ok.jpg?nolink&10| }} <fc green> query is complete</fc>. | ||
- | |||
- |