
1 GLASS: A Graphical Query Language for Semi-Structured Data
Wei Ni, Tok Wang Ling
Department of Computer Science, National University of Singapore, Singapore
E-mail: {niwei, lingtw}@comp.nus.edu.sg
2003-3-28, DASFAA2003, Kyoto, Japan

2 Roadmap
1. Introduction and motivation
2. ORA-SS: the data model of our language
3. GLASS, our graphical query language
4. Related works and comparison
5. Conclusion and future work

3 1. Introduction and motivation
- XML: a standard for representation, manipulation and exchange of data.
- Query engines: a crucial application to exploit the full power of XML.
- XQuery: a standard for querying XML data, but too difficult for non-technical users to use.
- Graphical query language: a way to help users query XML data.

4 1. Introduction and motivation (Cont.)
- Criteria of a good query language
  - Expressiveness
  - Completeness
  - User-friendliness
- Graphical query language: user-friendly and easy to understand

5 2. ORA-SS: the data model of our language
- ORA-SS (Object-Relationship-Attribute model for Semi-Structured data): a rich semantic data model.
Example 1: The DTD of “Department.xml” [DTD listing lost in extraction]
[Figure: The ORA-SS schema diagram of “Department.xml”. Node labels: department, course, student, name, code, title, number, name, grade. Edge labels: “2, 1:n, 1:1” and “cs, 2, 1:n, 1:n”; the label “cs” also appears next to grade.]
- There is a binary relationship type, named “cs”, between course and student, where one course may have one or many students and one student can take one or many courses.
- The attribute grade belongs to the binary relationship type “cs” between course and student.

6 3. GLASS: our graphical query language
- GLASS (Graphical Query Language for Semi-Structured Data)
- Targets of GLASS
  - support various queries, including aggregation functions and negation;
  - be clear and concise without ambiguity;
  - provide “freedom” to non-technical users in querying.

7 3. GLASS: our graphical query language
3.1 General concepts in GLASS
1) Data icons
   i. Rectangle: an object class in ORA-SS; a non-terminal element in XML (an element with subelements or attributes).
   ii. Circle: an attribute in ORA-SS; a terminal element with PCDATA only, or an attribute (or attribute list), in XML.
2) Connections
   i. Arrow: relationship type
   ii. Dashed arrow: IDREF in XML
   iii. Line: link between output and original entities

8 3. GLASS: our graphical query language
3.1 General concepts in GLASS (Cont.)
3) Box: the group entity in GLASS query graphs; it consists of all rectangles or circles inside the box.
4) Derived entities: derived object classes and attributes, which are new data types defined by users, are represented as dashed rectangles and dashed circles.
5) Condition Logic Window (CLW): an optional part of a GLASS query in which to write logic expressions and statements (e.g. IF-THEN) for complex query conditions and/or constructions.

9 3. GLASS: our graphical query language
3.1 General concepts in GLASS (Cont.)
6) Path identifier
   - a unique name given to data icons or boxes;
   - indicated by the prefix “$”.
7) Condition identifier
   - a unique name given to connections;
   - written without the prefix “$”.
8) Logic expression: specifies the logic in query conditions.
9) Statement: helps construct complex outputs.

10 3. GLASS: our graphical query language
3.1 General concepts in GLASS (Cont.)
[Figure: Structure of a GLASS query graph. Parts: query condition, result construction, and the CLW (logic expressions and statements). The black parts are optional.]

11 3. GLASS: our graphical query language
3.2 Output construction
Example 1: The DTD of “Department.xml” [DTD listing lost in extraction]
The content of “Department.xml” [XML tags lost in extraction; surviving text]: Software Engineering (John Smith, A; Mel Green, C); Database Design (John Smith, B)
[Figure: The ORA-SS schema diagram of “Department.xml”. Node labels: department, course, student, name, code, title, number, name, grade. Edge labels: “2, 1:n, 1:1” and “cs, 2, 1:n, 1:n”.]

12 3. GLASS: our graphical query language
3.2 Output construction (Cont.)
(a) Extract courses and all information one level under course elements by using the default output method.
[Figure: QUERY, QUERY RESULT and SOURCE DATA panels. The query is a single course rectangle; the source data is the content of “Department.xml”; the surviving result text is “Software Engineering” and “Database Design”. The XML tags were lost in extraction.]
- In the default output method, everything is kept in its original style from the source data.
- The default number of levels is “1”. We use “/2” to represent 2 levels under the course element.
- The default entity type implies both attributes and simple subelements of the course element.

13 3. GLASS: our graphical query language
3.2 Output construction (Cont.)
(b) Extract courses and all information at all levels under course elements by using the default output method.
[Figure: QUERY, QUERY RESULT and SOURCE DATA panels. The query is a course rectangle marked “*”; the surviving result text equals the full content of “Department.xml”. The XML tags were lost in extraction.]
“*” is a wildcard which means all nested levels under the course element.
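The level rules in (a) and (b) can be sketched in ordinary code. The following is only an illustrative reading, not the GLASS implementation: it assumes that a "simple" subelement is one with text only, that "/n" follows complex subelements for n levels, and that "*" removes the limit. The XML sample, the student numbers and the names `copy_levels` and `is_simple` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rendering of "Department.xml"; the student numbers are made up.
DEPARTMENT_XML = """
<department>
  <course code="201">
    <title>Software Engineering</title>
    <student number="s1"><name>John Smith</name><grade>A</grade></student>
    <student number="s2"><name>Mel Green</name><grade>C</grade></student>
  </course>
</department>
"""

def is_simple(elem):
    """A simple subelement carries text only: no child elements, no attributes."""
    return len(elem) == 0 and not elem.attrib

def copy_levels(elem, levels):
    """Copy elem with its attributes and its simple subelements.

    Complex subelements are followed only while levels remain;
    levels=None plays the role of the "*" wildcard (no limit).
    """
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text = elem.text
    unlimited = levels is None
    for child in elem:
        if is_simple(child) or unlimited or levels > 1:
            out.append(copy_levels(child, None if unlimited else levels - 1))
    return out

course = ET.fromstring(DEPARTMENT_XML).find("course")
one_level = copy_levels(course, 1)      # default: code and title, students dropped
two_levels = copy_levels(course, 2)     # "/2": students with name and grade
everything = copy_levels(course, None)  # "*": all nested levels
```

With this sample, the one-level copy keeps the code attribute and the title, while the two-level and wildcard copies also keep both students.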

14 3. GLASS: our graphical query language
3.2 Output construction (Cont.)
(c) Extract courses with all attributes of the course element in the original XML document.
[Figure: QUERY, QUERY RESULT and SOURCE DATA panels. The query is a course rectangle with a circle containing “@”; the source data is the content of “Department.xml”. The XML tags were lost in extraction.]
The circle with “@” inside implies all attributes of the course element in the original XML document.

15 3. GLASS: our graphical query language
3.2 Output construction (Cont.)
(d) Extract courses and all subelements at one level under course elements (except the attributes of course elements).
[Figure: QUERY, QUERY RESULT and SOURCE DATA panels. The query is a course rectangle with a circle containing “e”; the source data is the content of “Department.xml”; the surviving result text is “Software Engineering” and “Database Design”. The XML tags were lost in extraction.]
The circle with “e” inside indicates all simple subelements of the course element.

16 3. GLASS: our graphical query language
3.2 Output construction (Cont.)
(e) Extract courses with their attributes and/or simple subelements as well as the contents of complex subelements at one level under the course element.
[Figure: QUERY, QUERY RESULT and SOURCE DATA panels. The query is a course rectangle with a blank circle and a blank rectangle; the source data is the content of “Department.xml”. The XML tags were lost in extraction.]
- The blank circle implies both attributes and simple subelements of the course element.
- The blank rectangle stands for all complex subelements of the course element.
- Since only the entities at one level under the course element are displayed, the student number at the second level under course doesn’t appear in the result.

17 3. GLASS: our graphical query language
3.2 Output construction (Cont.)
(f) Extract courses with their titles as attributes and codes as subelements in the output: a demonstration of the conversion between attribute and subelement.
[Figure: QUERY, QUERY RESULT and SOURCE DATA panels. Query labels: course with code and title on one side; course with title (marked “@”) and code (marked “e”) on the other, joined by links. The source data is the content of “Department.xml”; the surviving result text is “201” and “303”. The XML tags were lost in extraction.]
The links here imply that the values of the newly defined title attribute and code element come from the title and code in the original data.
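The conversion in (f) amounts to swapping the roles of the two properties. A minimal sketch follows, assuming that code is an attribute and title a subelement in the source (as the XQuery examples in this deck suggest) and that the codes 201 and 303 belong to the two courses in that order; the function name `swap_title_and_code` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; pairing the codes 201/303 with these titles is an assumption.
SOURCE = (
    '<department>'
    '<course code="201"><title>Software Engineering</title></course>'
    '<course code="303"><title>Database Design</title></course>'
    '</department>'
)

def swap_title_and_code(course):
    """Build a new course whose title is an attribute and whose code is a subelement."""
    out = ET.Element("course", {"title": course.findtext("title")})
    ET.SubElement(out, "code").text = course.get("code")
    return out

result = [swap_title_and_code(c) for c in ET.fromstring(SOURCE).iter("course")]
```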

18 3. GLASS: our graphical query language
3.3 Basic query operators: Selection
To select all courses whose codes begin with “2”, we can write an XQuery expression as:
  FOR $c IN $department/course
  WHERE $c/@code/data() = ‘2%’
  RETURN $c
[Figure: GLASS query graph. LHS: course with the condition code = ‘2%’. RHS: course marked “*”.]
The output course in the RHS is the course that satisfies the condition in the LHS.
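The pattern ‘2%’ reads like an SQL LIKE pattern; in standard XPath and XQuery a prefix test is normally written with the starts-with() function. The same selection can be sketched procedurally as below. This is an illustration only: the sample data and the function name `select_courses` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the codes are assumptions.
DEPARTMENT_XML = """
<department>
  <course code="201"><title>Software Engineering</title></course>
  <course code="303"><title>Database Design</title></course>
</department>
"""

def select_courses(department, prefix):
    """Keep the course elements whose code attribute begins with prefix."""
    return [c for c in department.findall("course")
            if c.get("code", "").startswith(prefix)]

selected = select_courses(ET.fromstring(DEPARTMENT_XML), "2")
```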

19 3. GLASS: our graphical query language
3.4 Basic query operators: Projection
To project all courses with their titles, XQuery will give the following expression:
  FOR $c IN $department/course
  RETURN <course> { $c/title } </course>
[Figure: GLASS query graph. Labels: course, title.]
Without the specification of subelements or attributes, the title will remain unchanged as a subelement of course in the result.
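As a rough procedural counterpart of this projection, the sketch below builds one new course element per source course and copies only the title into it. The sample data and the name `project_titles` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample data.
DEPARTMENT_XML = (
    '<department>'
    '<course code="201"><title>Software Engineering</title></course>'
    '<course code="303"><title>Database Design</title></course>'
    '</department>'
)

def project_titles(department):
    """Return new course elements that contain only the title subelement."""
    projected = []
    for course in department.findall("course"):
        out = ET.Element("course")
        title = course.find("title")
        if title is not None:
            out.append(copy.deepcopy(title))  # title stays a subelement
        projected.append(out)
    return projected

projected = project_titles(ET.fromstring(DEPARTMENT_XML))
```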

20 3. GLASS: our graphical query language
3.5 Basic query operators: Join
Suppose we have another XML document, named “Description.xml”, containing the descriptions of all courses.
The DTD of “Description.xml” [DTD listing lost in extraction]
[Figure: The ORA-SS schema diagram of “Description.xml”. Node labels: course, code, title, description.]

21 3. GLASS: our graphical query language
3.5 Basic query operators: Join (Cont.)
Query: Extract everything about the courses from “Department.xml” that have descriptions in “Description.xml”, and put the corresponding descriptions under the courses in the results.
[Figure: GLASS query graph. Labels on the condition side: course (FROM Department.xml), course (FROM Description.xml), code, description. Labels on the output side: course marked “*”, description.]
The FROM labels give the URLs of the data sources.
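A minimal sketch of this join, assuming that courses in the two documents are matched on code and that code is an attribute in both; the slides do not show the actual layout of “Description.xml”, so the sample documents and the name `join_courses` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical samples for both documents.
DEPARTMENT_XML = (
    '<department>'
    '<course code="201"><title>Software Engineering</title></course>'
    '<course code="303"><title>Database Design</title></course>'
    '</department>'
)
DESCRIPTION_XML = (
    '<courses>'
    '<course code="201"><title>Software Engineering</title>'
    '<description>Process and design.</description></course>'
    '</courses>'
)

def join_courses(department, descriptions):
    """Keep courses that have a description and attach that description to them."""
    by_code = {c.get("code"): c.find("description")
               for c in descriptions.iter("course")}
    joined = []
    for course in department.iter("course"):
        description = by_code.get(course.get("code"))
        if description is not None:
            out = copy.deepcopy(course)        # "*": everything under course
            out.append(copy.deepcopy(description))
            joined.append(out)
    return joined

joined = join_courses(ET.fromstring(DEPARTMENT_XML),
                      ET.fromstring(DESCRIPTION_XML))
```

Course 303 has no description in the sample, so only course 201 appears in the joined result.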

22 3. GLASS: our graphical query language
3.6 Aggregation functions: group & average
Query: For all courses, display courses with their information (in the default way) and the average grade of each course.
[Figure: GLASS query graph. Labels: course, student, _group, cs, grade, AVG, and on the output side course and avg_grade (a dotted circle containing “e”).]
- “_group” groups students under course.
- Aggregation functions are available after the “_group” functionality is applied.
- avg_grade is a derived entity, and it will be constructed as a subelement of “course”.
- The character “e” inside the dotted circle implies that avg_grade is an element type.

23 3. GLASS: our graphical query language
3.6 Aggregation functions: group & average
Query: For all courses, display courses with their information (in the default way) and the average grade of each course. (Presented as an XQuery expression as follows; the element constructor tags in the return clause were lost in extraction.)
  for $ccd in distinct-values(document("Department.xml")//course/@code)
  let $c := document("Department.xml")//course,
      $s := $c/student
  where $c/@code = $ccd
  return { $c/title, } {avg($s/grade)}
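The group-and-average query can also be sketched procedurally. The example below is an assumption-laden illustration: it uses numeric grades (the letter grades of the sample data cannot be averaged directly), treats the default output as "attributes plus simple subelements", and uses the hypothetical name `course_with_average`.

```python
import copy
from statistics import mean
import xml.etree.ElementTree as ET

# Hypothetical sample with numeric grades so that an average is defined.
DEPARTMENT_XML = (
    '<department>'
    '<course code="201"><title>Software Engineering</title>'
    '<student number="s1"><name>John Smith</name><grade>80</grade></student>'
    '<student number="s2"><name>Mel Green</name><grade>60</grade></student>'
    '</course>'
    '<course code="303"><title>Database Design</title>'
    '<student number="s1"><name>John Smith</name><grade>75</grade></student>'
    '</course>'
    '</department>'
)

def course_with_average(course):
    """Copy attributes and simple subelements, then add a derived avg_grade element."""
    out = ET.Element("course", dict(course.attrib))
    for child in course:
        if len(child) == 0 and not child.attrib:   # simple subelement
            out.append(copy.deepcopy(child))
    # Grouping students under their course, then averaging their grades.
    grades = [float(g.text) for g in course.findall("student/grade")]
    if grades:
        ET.SubElement(out, "avg_grade").text = str(mean(grades))
    return out

averaged = [course_with_average(c)
            for c in ET.fromstring(DEPARTMENT_XML).iter("course")]
```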

24 3. GLASS: our graphical query language
3.7 Querying order-sensitive data
Suppose we have order-sensitive data, “Bib.xml”.
[Figure, Example 2: The ORA-SS schema diagram of “Bib.xml”, order-sensitive data. Node labels: book, author, content, isbn, title, firstname, lastname. Edge label: “2, +, +, <”.]
The order is important to the [element name lost in extraction].

25 3. GLASS: our graphical query language
3.7 Querying order-sensitive data (Cont.)
Query: Display all books with their ISBNs, titles and first authors.
[Figure: GLASS query graph. Labels: book, isbn, title, author, “[1]”.]
“[1]” means the first element in order.
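The positional condition “[1]” corresponds to a position predicate in XPath. A small sketch using the limited XPath support in Python's ElementTree is given below; the content of “Bib.xml” is not shown on the slides, so the sample data, the choice of isbn as an attribute and the name `first_author_view` are all assumptions.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample of "Bib.xml"; author order is significant.
BIB_XML = """
<bib>
  <book isbn="0-000-00000-1">
    <title>Example Book</title>
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
    <author><firstname>Bob</firstname><lastname>Tan</lastname></author>
  </book>
</bib>
"""

def first_author_view(book):
    """Return a book with its isbn, its title and only its first author."""
    out = ET.Element("book", {"isbn": book.get("isbn", "")})
    ET.SubElement(out, "title").text = book.findtext("title")
    first = book.find("author[1]")  # position predicate: first author in document order
    if first is not None:
        out.append(copy.deepcopy(first))
    return out

views = [first_author_view(b) for b in ET.fromstring(BIB_XML).iter("book")]
```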

26 3. GLASS: our graphical query language
3.8 Advanced features of GLASS
Suppose we have more complex XML data about projects, members and publications.
[Figure, Example 3: The ORA-SS schema diagram of the data about “project, member and publication”. Node labels: project, member, publication, id, name, job_title, number, title. Edge labels: “2, +, +” and “3, *, +”.]
There is a ternary relationship among project, member and publication, where one member in one project can have 0 or many publications, and one publication can belong to one or many (project, member) pairs.

27 3. GLASS: our graphical query language
3.8.1 Group entity: the use of box
Query: Display the names of the members whose names begin with the character “S” and who have taken part in fewer than 5 projects but written more than 6 publications in some project they took part in.
[Figure: GLASS query graph. Labels: member, name = ‘S%’, project with _group and CNT < 5, publication with _group and CNT > 6, the box, and the output member with name.]
- One “_group” groups projects under member.
- The other “_group” groups publications under each (member, project) pair.
- “CNT” means “count”; it counts without eliminating duplicates. We can also use other aggregation functions such as SUM and AVG.

28 3. GLASS: our graphical query language
3.8.1 Group entity: the use of box (Cont.)
The previous query: Display the names of the members whose names begin with the character “S” and who have taken part in fewer than 5 projects but written more than 6 publications in some project they took part in.
[Figure: the previous query graph. Labels: member, name = ‘S%’, project with _group and CNT < 5, publication with _group and CNT > 6, output member with name. Annotation: select publication, GROUP BY (member, project) pairs.]
A new query: Display the names of the members whose names begin with the character “S” and who have taken part in fewer than 5 projects in total and written more than 6 publications in total.
[Figure: the new query graph. Labels: member, name = ‘S%’, project with _group and CNT < 5, publication with CNT_UNIQUE > 6, output member with name. Annotation: select publication only, GROUP BY member.]
CNT_UNIQUE eliminates duplicates in counting.
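The difference between CNT per (member, project) pair and CNT_UNIQUE per member can be illustrated on plain tuples. This is a sketch under the assumption that the data is available as (member, project, publication) triples; the names and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical (member, project, publication) triples.
# Publication "x" belongs to two (project, member) pairs of Sam.
TRIPLES = [
    ("Sam", "p1", "x"),
    ("Sam", "p1", "y"),
    ("Sam", "p2", "x"),
]

def count_publications(triples):
    """Return (CNT per (member, project) pair, CNT_UNIQUE per member)."""
    per_pair = defaultdict(list)
    per_member = defaultdict(list)
    for member, project, publication in triples:
        per_pair[(member, project)].append(publication)
        per_member[member].append(publication)
    cnt = {pair: len(pubs) for pair, pubs in per_pair.items()}          # duplicates kept
    cnt_unique = {m: len(set(pubs)) for m, pubs in per_member.items()}  # duplicates removed
    return cnt, cnt_unique

cnt, cnt_unique = count_publications(TRIPLES)
```

Here Sam has two publications in p1 and one in p2, but only two distinct publications in total, because “x” is counted once by the duplicate-eliminating count.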

29 3. GLASS: our graphical query language
3.8.2 Negation: the use of the Condition Logic Window
Query: Display the names of those members who haven’t written the publication titled “Introduction to XML”.
[Figure: GLASS query graph. Labels: member, publication with title = “Introduction to XML”, the connection identifier “: A :”, and the output member with name.]
CLW: ¬∃ A;
- The identifier “A” means that a member has a publication with title = “Introduction to XML”.
- The logic expression means “does not have A” or “there does not exist A”.
- The colons are not part of the identifiers; they distinguish the identifiers from relationship type names.
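The negated existential condition can be sketched as follows. The XML layout (members nested under projects, members identified by name) and the function name `members_without` are assumptions made for illustration; the slides do not show the actual document.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; members are assumed to be identified by name.
PROJECTS_XML = """
<projects>
  <project id="p1">
    <member><name>Sam</name>
      <publication><title>Introduction to XML</title></publication>
    </member>
    <member><name>Sue</name>
      <publication><title>Query Languages</title></publication>
    </member>
  </project>
  <project id="p2">
    <member><name>Sam</name></member>
    <member><name>Tom</name></member>
  </project>
</projects>
"""

def members_without(root, title):
    """Names of members for whom no publication with the given title exists."""
    names, authors = [], set()
    for member in root.iter("member"):
        name = member.findtext("name")
        if name not in names:
            names.append(name)
        if any(p.findtext("title") == title for p in member.iter("publication")):
            authors.add(name)   # condition A holds for this member
    return [n for n in names if n not in authors]

not_written = members_without(ET.fromstring(PROJECTS_XML), "Introduction to XML")
```

Sam is excluded even though his entry under p2 has no publications, because condition A holds for him in p1.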

30 3. GLASS: our graphical query language
3.8.3 IF-THEN statement
Query: Display the names of the members who have written a publication titled “Introduction to XML” or “Introduction to Internet”; and for those members who have written “Introduction to XML”, also display all information about the projects that they have taken part in.
[Figure: GLASS query graph. Labels: member; publication with title = “Introduction to XML”, identified as “: A :”; publication with title = “Introduction to Internet”, identified as “: B :”; the output member with name and project, identified as “: $pro :”.]
CLW: A ∨ B; {IF (A) THEN EXTRACT $pro;}
- The path identifier “$pro” represents the project element with its attributes and/or simple subelements.
- Without the statement in the CLW, the information about the projects of all members that satisfy the condition would be displayed.
- With the statement in the CLW, the information about the projects is extracted only when condition A is satisfied.
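The combination of a disjunctive condition with a conditional output can be sketched as below. The XML layout, the use of project ids to stand for "all information about the projects", and the name `select_members` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

XML_TITLE = "Introduction to XML"            # condition A
INTERNET_TITLE = "Introduction to Internet"  # condition B

# Hypothetical sample; members are assumed to be identified by name.
PROJECTS_XML = """
<projects>
  <project id="p1">
    <member><name>Sam</name>
      <publication><title>Introduction to XML</title></publication>
    </member>
    <member><name>Sue</name>
      <publication><title>Introduction to Internet</title></publication>
    </member>
  </project>
  <project id="p2">
    <member><name>Sam</name></member>
    <member><name>Tom</name>
      <publication><title>Query Languages</title></publication>
    </member>
  </project>
</projects>
"""

def select_members(root):
    """Members satisfying A or B; projects are attached only when A holds."""
    titles, projects = {}, {}
    for project in root.iter("project"):
        for member in project.findall("member"):
            name = member.findtext("name")
            titles.setdefault(name, set()).update(
                p.findtext("title") for p in member.findall("publication"))
            projects.setdefault(name, []).append(project.get("id"))
    result = []
    for name, written in titles.items():
        a = XML_TITLE in written
        b = INTERNET_TITLE in written
        if a or b:
            entry = {"name": name}
            if a:                       # IF (A) THEN EXTRACT $pro
                entry["projects"] = projects[name]
            result.append(entry)
    return result

members = select_members(ET.fromstring(PROJECTS_XML))
```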

31 4. Related works and comparison
- Form-based graphical XML query languages have limited expressive power.
  - Graphical XML Query Language [13]
  - XMLApe Query Language [15]
- XML-GL [4,5] is not strong in expressing query logic and may cause misunderstanding in some cases.
Ref[4] S. Ceri, S. Comai, E. Damiani, P. Fraternali, S. Paraboschi, and L. Tanca. XML-GL: a graphical language of querying and restructuring XML documents. In Proc. WWW8, Toronto, Canada, May 1999.
Ref[5] S. Ceri, S. Comai, E. Damiani, P. Fraternali, and L. Tanca. Complex Queries in XML-GL. SAC (2) 2000: 888-893.
Ref[13] Ankur Gupta, Zahid Khan. Graphical XML Query Language. Course paper. College of Computing, Georgia Institute of Technology, Sep 2000.
Ref[15] Leo Mark et al. XMLApe. College of Computing, Georgia Institute of Technology. http://www.cc.gatech.edu/projects/XMLApe/

32 4. Related works and comparison
Features of XML-GL
- XML-GL is the world’s first visual language that explicitly addresses the full complexity of querying XML data.
- Has a bipartite structure to express querying and restructuring.
- Supports various XML queries
  - Selection and projection
  - Join from one or more input documents
  - Construction of new documents and new elements
  - Arithmetic and aggregate functions
  - Union, difference, heterogeneous union, and Cartesian product

33 4. Related works and comparison
An example of XML-GL
- The first interpretation: For all MANUFACTURER elements where rank <= 10, output the MODEL subelement with MO-NAME and RANK; and for the remaining, we only output MN-NAME and YEAR. (No MANUFACTURER will appear twice in the result, and all MANUFACTURERs are output together in the original order.)
- The second interpretation: For all MANUFACTURER elements where rank <= 10, output the MODEL subelement with MO-NAME and RANK; and for all MANUFACTURER elements we output MN-NAME and YEAR again but do not output the MODEL subelement. (Some MANUFACTURERs will appear twice in the result.)

34 4. Related works and comparison
How we express both (the 1st interpretation)
For all MANUFACTURER elements where rank <= 10, output the MODEL subelement with MO-NAME and RANK; and for the remaining, we only output MN-NAME and YEAR. (No MANUFACTURER will appear twice in the output, and all MANUFACTURERs are output together in the original order.)
[Figure: GLASS query graph. LHS labels: MANUFACTURER, MODEL, RANK <= 10. RHS labels: MANUFACTURER, MN-NAME, YEAR, MODEL, MO-NAME, RANK.]
- Without any links with the entities in the LHS, all MANUFACTURERs are directly extracted from the data.
- With this link between the two MODELs, only the MODELs with RANK <= 10 appear in the results.

35 4. Related works and comparison
How we express both (the 2nd interpretation)
For all MANUFACTURER elements where rank <= 10, output the MODEL subelement with MO-NAME and RANK; and for all MANUFACTURER elements we output MN-NAME and YEAR again but do not output the MODEL subelement. (The results of both queries are put separately in one file.)
[Figure: two GLASS query graphs. First graph labels: MANUFACTURER, MN-NAME, YEAR. Second graph, LHS labels: MANUFACTURER, MODEL, RANK <= 10; RHS labels: MANUFACTURER, MN-NAME, YEAR, MODEL, MO-NAME, RANK.]
- With the link between the two MODELs, only MODELs with rank <= 10 are kept in the outputs.
- The link between the two MANUFACTURERs implies that only the MANUFACTURER that satisfies the condition in the LHS will be displayed.
- In comparison, the MANUFACTURER without any links with the LHS means all MANUFACTURER elements from the source data.

36 4. Related works and comparison
Comparison
Columns: XML-GL | Graphical XML Query Language | XMLApe Query Language | GLASS
[Some cells were merged in the original slide; the values of each row are listed in the order extracted.]
- Data Model: XML DTD | XML Schema | ORA-SS
- Support Selection, Projection and Join: Yes
- Support Queries on Ordered Data: Yes | No | Yes
- Support “group by” operator: Yes | No | Yes
- Support Aggregation Function: Yes | No | Yes
- Support Negation: No | Yes
- Support Complex Condition: Yes | No | Yes

37 4. Related works and comparison
Comparison (Cont.)
Columns: XML-GL | Graphical XML Query Language | XMLApe Query Language | GLASS
[Some cells were merged in the original slide; the values of each row are listed in the order extracted.]
- Support XPath: Yes | No | Yes
- Support Quantifiers (∃, ∀): No | Yes
- Support Conditional Output Construction (e.g. with IF-THEN clause): No | Yes
- Support User-defined View: Yes | No | Yes
- Support View Validation: No | Yes

38 5. Conclusion
- Query logic can be explicitly expressed in the CLW.
- In comparison with XML-GL, GLASS can express negation and logic among conditions easily and clearly.
- By separating logic expressions from graphical data structures, we can build a clear and concise graphical query language.

39 5. Conclusion (Cont.)
- We aim to express the complexity of XML queries in the XQuery standard, including aggregation functions and negation.
- We support view transformation and validation, especially swapping elements in different levels [6].
Ref[6] Yabing Chen, Tok Wang Ling, Mong Li Lee. Designing Valid XML Views. To appear in the proceedings of the 21st International Conference on Conceptual Modeling (ER'2002), October 7-11, 2002, Tampere, Finland.

40 Future work
- Enhance the language;
- Interpret the graphical expression into the XQuery standard;
- Expand the content of GLASS
  - data manipulation (e.g., INSERT, etc.)
  - data integration
  - view definition
  - view maintenance

41 References
[1] S. Abiteboul, D. Quass, J. McHugh, J. Widom and J. Wiener. The Lorel Query Language for Semistructured Data. Department of Computer Science, Stanford University. International Journal on Digital Libraries, 1(1):68-88, Apr. 1997.
[2] M. Angelaccio, T. Catarci, and G. Santucci. QDB*: A graphical query language with recursion. IEEE Transactions on Software Engineering, 16(10):1150-1163, 1990.
[3] T. Catarci, S.K. Chang, M.F. Costabile, S. Levialdi, and G. Santucci. A graph-based framework for multiparadigmatic visual access to databases. IEEE Transactions on Knowledge and Data Engineering, 8(3):455-475, 1996.
[4] S. Ceri, S. Comai, E. Damiani, P. Fraternali, S. Paraboschi, and L. Tanca. XML-GL: a graphical language of querying and restructuring XML documents. In Proc. WWW8, Toronto, Canada, May 1999.
[5] S. Ceri, S. Comai, E. Damiani, P. Fraternali, and L. Tanca. Complex Queries in XML-GL. SAC (2) 2000: 888-893.
[6] Yabing Chen, Tok Wang Ling, Mong Li Lee. Designing Valid XML Views. To appear in the proceedings of the 21st International Conference on Conceptual Modeling (ER'2002), October 7-11, 2002, Tampere, Finland.
[7] Zhuo Chen. Extracting Schema from XML Documents. SoC, NUS. Honours Year Project Report.
[8] Sara Comai, Ernesto Damiani, Letizia Tanca. The WG-Log System: Data Model and Semantics. INTERDATA technical report, T2-R06, July 1998.
[9] C. J. Date. An Introduction to Database Systems. 3rd Edition, Addison-Wesley Publishing Company, 1981.
[10] Gillian Dobbie, Wu Xiaoying, Tok Wang Ling, Mong Li Lee. ORA-SS: An Object-Relationship-Attribute Model for Semistructured Data. TR21/00, Technical Report, Department of Computer Science, National University of Singapore, December 2000.

42 References (Cont.)
[11] P.W. Eklund, J. Leane, and C. Nowak. GrIT: An implementation of a graphical user interface for conceptual structures. Technical Report TR94-03, Computer Science Department, The University of Adelaide, February 1994.
[12] Extensible Stylesheet Language (XSL) Specification. W3C Working Draft. Apr 1999. http://www.w3.org/TR/1999/WD-xsl-19990421/
[13] Ankur Gupta, Zahid Khan. Graphical XML Query Language. Course paper. College of Computing, Georgia Institute of Technology, Sep 2000.
[14] Joshua S. Hodas, Robert M. Keller, Ingo Muschenets, Jeffrey Polakow, Amy R. Ward and Will Ballard. Condor: A Simple, Expressive Graphical Database Query Language. Department of Computer Science, Harvey Mudd College. Computer Science Technical Report HMC-CS-97-04.
[15] Leo Mark et al. XMLApe. College of Computing, Georgia Institute of Technology. http://www.cc.gatech.edu/projects/XMLApe/
[16] Yuanying Mo, Tok Wang Ling. Storing and Maintaining Semistructured Data Efficiently in an Object-Relational Database. Research Report. SoC, NUS.
[17] Jan Paredaens, Peter Peelman, Letizia Tanca. G-Log: A Graph-Based Query Language. IEEE Transactions on Knowledge and Data Engineering, 7(3):436-453, June 1995.
[18] Jayavel Shanmugasundaram, Kristin Tufte, Gang He, Chun Zhang, David DeWitt and Jeffrey Naughton. Relational Databases for Querying XML Documents: Limitations and Opportunities. VLDB 1999: 302-314. Department of Computer Sciences, University of Wisconsin-Madison.
[19] XML Path Language (XPath) 2.0. W3C. Apr 2002. http://www.w3.org/TR/xpath20/

43 References (Cont.)
[20] XML Query Requirements. W3C. Feb 2001. http://www.w3.org/TR/xmlquery-req
[21] XML Syntax for XQuery 1.0 (XQueryX). W3C. Jun 2001. http://www.w3.org/TR/xqueryx
[22] XQuery 1.0 and XPath 2.0 Data Model. W3C. Apr 2002. http://www.w3.org/TR/query-datamodel/
[23] XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0. W3C. Apr 2002. http://www.w3.org/TR/xquery-operators/
[24] XQuery 1.0 Formal Semantics. W3C. Mar 2002. http://www.w3.org/TR/query-semantics/

44 The End
Thank you very much

