Welcome to CPSC 534B: Information Integration
Laks V.S. Lakshmanan, Rm. 315

Course Objectives
Most applications of information technology require effective and efficient management of information.
– Information may reside anywhere, not just in databases.
– Information can be heterogeneous.
– Information of interest may not all be in one place.
→ Information Integration (II).
→ II is an enabler for a whole class of new applications.

Course Objectives (contd.)
Key technologies:
– RDBMS
– Heterogeneous database systems
– View integration and management
– Semistructured data and XML (data on the Web)
Main goal: learn about the key concepts, techniques, algorithms, languages, and abstractions that make II possible. And have some fun.

Tentative Schedule
Basic Tools (GOFDB)
– Week of Jan. 5: Overview/review of first-order logic (FOL).
– Jan. 12: Review of relational algebra, relational calculus, Datalog, SQL, and integrity constraints.
– Jan. 19: Query containment and equivalence: conjunctive queries; negation and aggregation.
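For the Jan. 19 topic, containment of conjunctive queries has a classical test due to Chandra and Merlin: Q1 ⊆ Q2 exactly when there is a containment mapping (a homomorphism) from Q2 to Q1 that sends Q2's head to Q1's head and every body atom of Q2 to some body atom of Q1. What follows is a minimal brute-force sketch of that test, not material from the course; the tuple-based query representation, the convention that variables start with an uppercase letter, and the names `unify` and `contained_in` are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed representation (not from the course): an atom is (predicate, argument terms)
# and a conjunctive query is (head terms, list of body atoms).
Atom = Tuple[str, Tuple[str, ...]]
Query = Tuple[Tuple[str, ...], List[Atom]]


def is_var(term: str) -> bool:
    # Assumed Datalog-style convention: variables start with an uppercase letter.
    return term[:1].isupper()


def unify(src: Tuple[str, ...], dst: Tuple[str, ...],
          h: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend the mapping h so that h(src) == dst term by term, or return None."""
    if len(src) != len(dst):
        return None
    h = dict(h)  # copy, so that backtracking never sees a half-extended mapping
    for s, d in zip(src, dst):
        if is_var(s):
            if h.setdefault(s, d) != d:  # variable already bound to a different term
                return None
        elif s != d:  # constants must map to themselves
            return None
    return h


def contained_in(q1: Query, q2: Query) -> bool:
    """True iff q1 is contained in q2, i.e. a containment mapping from q2 to q1 exists."""
    head1, body1 = q1
    head2, body2 = q2
    start = unify(head2, head1, {})  # q2's head must map onto q1's head
    if start is None:
        return False

    def search(i: int, h: Dict[str, str]) -> bool:
        # Try to map body atom i of q2 (and all later ones) into the body of q1.
        if i == len(body2):
            return True
        pred, args = body2[i]
        for pred1, args1 in body1:
            if pred1 == pred:
                h2 = unify(args, args1, h)
                if h2 is not None and search(i + 1, h2):
                    return True
        return False

    return search(0, start)


# Usage: q1 asks for nodes starting a path of length 2, q2 for nodes with an out-edge.
q1: Query = (("X",), [("e", ("X", "Y")), ("e", ("Y", "Z"))])
q2: Query = (("X",), [("e", ("X", "Y"))])
q3: Query = (("X",), [("e", ("X", "Y")), ("e", ("X", "W"))])
print(contained_in(q1, q2), contained_in(q2, q1))  # True False
print(contained_in(q2, q3) and contained_in(q3, q2))  # True: q2 and q3 are equivalent
```

Equivalence is simply containment in both directions, as the q2/q3 pair shows. The backtracking search can take exponential time in the worst case, which is expected: containment of conjunctive queries is NP-complete.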

Tentative Schedule (contd.)
Integration Take 1: Global Information Systems
– Jan. 26: Integration models: Global-As-View (GAV) and Local-As-View (LAV); query answering using views (an application).
II Take 2: Dealing with heterogeneity
– Feb. 2: SchemaLog and SchemaSQL.
– Feb. 9: Schema integration and matching.
– Feb. 16: Break!
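To make the Global-As-View model concrete: under GAV, each relation of the global (mediated) schema is defined as a query over the sources, so a user query over the global schema is answered by unfolding those definitions. The sketch below assumes two hypothetical sources with toy data and a single global relation `book(title, price)`; all names and values are invented for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical sources with different schemas (toy data for illustration only).
SRC_A: List[Tuple[str, str, float]] = [("a1", "Title A", 40.0), ("a2", "Title B", 25.0)]  # (id, title, price)
SRC_B: List[Tuple[str, str]] = [("Title C", "30")]  # (title, price stored as text)


def book() -> Set[Tuple[str, float]]:
    """GAV mapping (assumed): global book(title, price) is defined as a query over the sources."""
    from_a = {(title, price) for _id, title, price in SRC_A}
    from_b = {(title, float(price)) for title, price in SRC_B}  # reconcile the price representation
    return from_a | from_b


def cheap_titles(limit: float) -> Set[str]:
    """A user query posed against the global schema only; evaluating it unfolds book()."""
    return {title for title, price in book() if price < limit}


print(sorted(cheap_titles(35.0)))  # ['Title B', 'Title C']
```

Local-As-View reverses the direction: each source is described as a view over the global schema, so answering a query means rewriting it in terms of those views. That rewriting step is the query-answering-using-views problem, and it cannot be handled by simple unfolding.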

Tentative Schedule (contd.)
II Take 3: Dropping (rigid) structure
– Feb. 23: Introduction to semistructured data and XML (data model); XPath and tree pattern queries.
– Mar. 1: XPath (contd.); XQuery.
– Mar. 8: XQuery (contd.); TAX algebra; structural join algorithms.
– Mar. 15: XML storage: native and relational.
– Mar. 22: XML + information retrieval.
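Structural join algorithms evaluate the ancestor-descendant edges of a tree pattern query such as `//section//title`. A common building block is region encoding: number elements during a depth-first traversal, giving each a (start, end) pair, so that a is an ancestor of d exactly when a.start < d.start and d.end < a.end. The sketch below pairs that encoding with a stack-based merge join in the style of the Stack-Tree-Desc algorithm from the structural-join literature. The helper names `Region`, `encode`, and `structural_join` are assumptions, and a real system would read the sorted element lists from an index rather than re-encode the document.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    tag: str
    start: int  # counter value when the element opens (depth-first)
    end: int    # counter value when the element closes


def encode(root: ET.Element) -> List[Region]:
    """Region-encode every element; the result is in document order (sorted by start)."""
    regions: List[Region] = []
    counter = 0

    def visit(elem: ET.Element) -> None:
        nonlocal counter
        counter += 1
        slot = len(regions)
        regions.append(Region(elem.tag, counter, -1))  # end is filled in after the children
        for child in elem:
            visit(child)
        counter += 1
        regions[slot] = regions[slot]._replace(end=counter)

    visit(root)
    return regions


def structural_join(ancestors: List[Region],
                    descendants: List[Region]) -> List[Tuple[Region, Region]]:
    """All (a, d) with a a proper ancestor of d; both inputs must be sorted by start."""
    out: List[Tuple[Region, Region]] = []
    stack: List[Region] = []  # chain of nested regions, innermost on top
    i = 0
    for d in descendants:
        # Push every candidate ancestor that opens before d, keeping the stack nested.
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            while stack and stack[-1].end < a.start:
                stack.pop()
            stack.append(a)
            i += 1
        # Drop regions that closed before d opened; every region left contains d.
        while stack and stack[-1].end < d.start:
            stack.pop()
        out.extend((a, d) for a in stack)
    return out


# Usage: evaluate the pattern //section//title on a small document.
doc = ET.fromstring("<book><section><title/><section><title/></section></section></book>")
regions = encode(doc)
sections = [r for r in regions if r.tag == "section"]
titles = [r for r in regions if r.tag == "title"]
pairs = structural_join(sections, titles)
print([(a.start, d.start) for a, d in pairs])  # [(2, 3), (2, 6), (5, 6)]
```

Because both inputs are sorted by start and the stack only ever holds a chain of nested regions, each input list is scanned once; apart from that, the cost is proportional to the number of output pairs.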

Tentative Schedule (contd.)
II Take 4: Semantic Web (the final frontier?)
– Mar. 29: Semantic Web and II.
– Project talks and demos: April 5 onward.

Marking Scheme
– Assignments: 45%
– Project: 55%
   – Reading papers
   – Critiquing them
   – Innovating
   – Implementing
   – Reporting and presenting
Projects can involve teams of 2-3 people (subject to approval). Each team must include at least one MCS student.

Suggested Project Themes
Ideas and suggestions will be offered throughout the course, so be attentive!
– Data cleaning: a key step required in data integration.
– Mining DTDs/schemas for XML documents: what to do when you must deal with XML data that has no accompanying DTD or schema.
– XML schema integration: different XML data sources may follow different DTDs/schemas. How do you provide a unified, integrated view to the user?
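As one possible starting point for the schema-mining theme, one can record, for every element name, which child elements and whether any text content appear under it, and then emit a loose DTD. This is an assumption-laden baseline, not a proposed solution: it ignores child order and cardinality (every content model becomes a starred choice), whereas real DTD inference must generalize the observed child sequences into tighter regular expressions. The function names `observe` and `dtd_sketch` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Set, Tuple


def observe(docs: Iterable[str]) -> Dict[str, Tuple[Set[str], bool]]:
    """Map each element name to (child element names seen, whether text content was seen)."""
    seen: Dict[str, Tuple[Set[str], bool]] = {}
    for text in docs:
        for elem in ET.fromstring(text).iter():
            kids, has_text = seen.get(elem.tag, (set(), False))
            kids.update(child.tag for child in elem)
            # Text may sit before the first child (elem.text) or after any child (child.tail).
            pieces = [elem.text or ""] + [child.tail or "" for child in elem]
            has_text = has_text or any(p.strip() for p in pieces)
            seen[elem.tag] = (kids, has_text)
    return seen


def dtd_sketch(seen: Dict[str, Tuple[Set[str], bool]]) -> List[str]:
    """Emit one deliberately loose <!ELEMENT> declaration per name (order/cardinality ignored)."""
    decls = []
    for tag in sorted(seen):
        kids, has_text = seen[tag]
        names = sorted(kids)
        if has_text:  # (#PCDATA) alone, or mixed content (#PCDATA | a | b)*
            model = "(" + " | ".join(["#PCDATA"] + names) + ")" + ("*" if names else "")
        elif names:
            model = "(" + " | ".join(names) + ")*"
        else:
            model = "EMPTY"
        decls.append(f"<!ELEMENT {tag} {model}>")
    return decls


docs = [
    "<lib><book><title>T1</title><author>A</author></book></lib>",
    "<lib><book><title>T2</title></book><journal><title>J</title></journal></lib>",
]
for decl in dtd_sketch(observe(docs)):
    print(decl)
```

For the two sample documents this prints declarations for `author`, `book`, `journal`, `lib`, and `title`, e.g. `<!ELEMENT lib (book | journal)*>`. Tightening such models (ordering, `?`, `+`) is where the interesting research questions begin.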

Project Themes (contd.)
– XML query containment/equivalence: given queries (in XQuery or XPath), rewrite them into more efficient equivalent ones, possibly using DTDs or integrity constraints.
– XML query operator evaluation algorithms: develop cost models and cost-based physical optimization strategies.
– XML and data security: how do you ensure that queries are evaluated securely, without divulging anything the user is not supposed to see?

Project Themes (contd.)
– XML and information retrieval: effective ways of querying documents marked up in XML (e.g., Shakespeare's plays). How do you combine IR and database-style XML querying?
– Data integration issues for biology: scientific data tends to be heterogeneous. How can the data integration challenges there be met?
– Query answering using views for XML: extend the QAV technology developed for RDBMSs to XML querying.

Project Themes (contd.)
– Detecting similarity between XML documents: develop notions of similarity between XML documents and implement algorithm(s) for detecting it.
– Ranking answers to keyword search queries over XML data: develop and implement algorithms for ranking answers based on the "quality" of the match.
– XML interoperability: leverage the Semantic Web and ontologies for matching schemas (XML or relational), and develop/implement algorithms for answering cross-queries.
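For the similarity theme, one simple and admittedly coarse notion of structural similarity is the Jaccard coefficient of the sets of root-to-element tag paths in two documents. The sketch below assumes that notion purely for illustration; it ignores text content, sibling order, and repetition, all of which a project would likely want to account for. The names `tag_paths` and `path_jaccard` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Set


def tag_paths(xml_text: str) -> Set[str]:
    """All root-to-element label paths, e.g. 'play/act/scene' (repetitions collapse)."""
    paths: Set[str] = set()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def path_jaccard(doc1: str, doc2: str) -> float:
    """Assumed similarity measure: |shared paths| / |all paths| (1.0 means same path sets)."""
    p1, p2 = tag_paths(doc1), tag_paths(doc2)
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 1.0


d1 = "<play><act><scene/></act></play>"
d2 = "<play><act><scene/><scene/></act><epilogue/></play>"
print(path_jaccard(d1, d2))  # 0.75: 3 shared paths out of 4 distinct ones
```

Note that the second `scene` in `d2` does not lower the score, because paths are compared as sets; a bag- or tree-edit-based measure would distinguish the two documents more finely.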

Project Themes (contd.)
– Explore higher-order logics for tree (XML) querying: example candidates are HiLog and (extensions of) SchemaLog. [Can be purely conceptual, or part conceptual and part implementation.]