Managing Semi-Structured Data. Is the web a database?


1 Managing Semi-Structured Data

2 Is the web a database?

3 Rules? What Rules?
Easy to create web information
Cannot all be stored in relational databases
Cannot be queried in traditional ways
“The web changed the digital information rules.”

4 Semi-structured Data
Fully structured data
  – Databases
  – Hidden web
Fully unstructured data: ordinary text
Semi-structured data: the grey area in between
  – No “good solutions;” no good “software, tools, or methodologies to manipulate [semi-structured data]”
  – “[Researchers] don’t even agree on the shape of the problem—much less, good approaches to solving it.”

5 Nature of the Problem
Information embedded in text
  – Keyword search is insufficient to answer queries
  – Natural language processing is also insufficient
Lack of agreement on vocabularies and schemas
  – “Reaching schema agreements among different communities is one of the most expensive steps in software design.”
  – “We need to be able to process information without requiring … a priori schema and vocabulary agreements among participants.”

6 Example: eBay
“Impossible for … developers to define an a priori schema for the information.”
“Information stored in raw text and searched using only keywords, significantly limiting its usability.”
“Some standard entities (e.g., buyer, date, ask, bid …), but the meat of the information—the item descriptions—has a rich and evolving structure that isn’t captured.”
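
To make the eBay point concrete, here is a minimal Python sketch. The listings, their attribute names, and the helpers keyword_search and attribute_query are all hypothetical, not taken from eBay. It contrasts keyword matching over raw text with a query over attribute-value pairs that tolerates records having different attributes.

```python
# Hypothetical auction listings (illustrative only, not real eBay data).
# Every record shares a few standard fields (seller, ask), but the item
# description carries its own, differing attribute-value pairs.
listings = [
    {"seller": "s1", "ask": 120.0,
     "text": "Vintage Nikon camera, 1978, 50mm lens",
     "item": {"type": "camera", "brand": "Nikon", "year": 1978, "lens_mm": 50}},
    {"seller": "s2", "ask": 15.0,
     "text": "Nikon lens cap, fits 50mm lenses",
     "item": {"type": "accessory", "brand": "Nikon", "fits_lens_mm": 50}},
    {"seller": "s3", "ask": 300.0,
     "text": "Road bike, 1978 frame, Campagnolo parts",
     "item": {"type": "bicycle", "year": 1978, "components": "Campagnolo"}},
]


def keyword_search(records, *words):
    """Sellers whose raw text contains every keyword (case-insensitive)."""
    return [r["seller"] for r in records
            if all(w.lower() in r["text"].lower() for w in words)]


def attribute_query(records, **conditions):
    """Sellers whose item has every given attribute with exactly that value.

    A record that lacks an attribute simply does not match, so no fixed,
    shared schema is required up front.
    """
    return [r["seller"] for r in records
            if all(k in r["item"] and r["item"][k] == v
                   for k, v in conditions.items())]


# "Nikon cameras with a 50mm lens":
print(keyword_search(listings, "Nikon", "50mm"))                            # ['s1', 's2']
print(attribute_query(listings, type="camera", brand="Nikon", lens_mm=50))  # ['s1']
```

The keyword search cannot tell the camera from a lens cap that merely mentions the same words. The attribute query can, but only because someone supplied the item attributes, which is exactly the structure the slide says is not captured.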

7 Why Schemas?
“Schemas assign meaning to the data and … allow automatic data search, comparison, and processing.”
Hierarchy of meaning
  – Raw text: strings (values)
  – Data: attribute-value pairs
  – Information: data in a conceptual framework
  – Knowledge: information with a degree of certainty or community agreement
  – Meaning: knowledge that is relevant or activates
“We have to learn to use and exploit schemas as helpers, but not rely on their existence or allow them to be constraining factors.”
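
One way to read “schemas as helpers” is a partial schema that types the attributes it knows about without rejecting anything else. The Python sketch below assumes that interpretation. PARTIAL_SCHEMA and check_with_partial_schema are hypothetical names, and the slide does not prescribe this design.

```python
from typing import Any

# Hypothetical partial schema: expected types for the few attributes we know about.
PARTIAL_SCHEMA: dict[str, type] = {"brand": str, "year": int, "price": float}


def check_with_partial_schema(
    record: dict[str, Any], schema: dict[str, type]
) -> tuple[dict[str, Any], dict[str, Any], list[str]]:
    """Use a schema as a helper rather than a gate.

    Returns (typed, extra, problems):
      typed    - known attributes whose values have the expected type
      extra    - attributes the schema does not mention (kept, never rejected)
      problems - known attributes whose values have an unexpected type
    Attributes listed in the schema but missing from the record are not errors.
    """
    typed: dict[str, Any] = {}
    extra: dict[str, Any] = {}
    problems: list[str] = []
    for name, value in record.items():
        expected = schema.get(name)
        if expected is None:
            extra[name] = value
        elif isinstance(value, expected):
            typed[name] = value
        else:
            problems.append(name)
    return typed, extra, problems


record = {"brand": "Nikon", "year": 1978, "price": "about 40", "lens_mm": 50}
print(check_with_partial_schema(record, PARTIAL_SCHEMA))
# ({'brand': 'Nikon', 'year': 1978}, {'lens_mm': 50}, ['price'])
```

The schema still lets a program compare and process the typed part automatically. Unknown attributes pass through, and type mismatches are reported rather than fatal.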

8 Schema-Agnostic Tools
Possible Places to Start
Information retrieval (sophisticated search engines?)
  – Find (maybe?) but not answer
  – No DB-like query logic, updates, transactions
XML
  – XML data can exist with or without schemas; schemas can be defined before or after the data
  – Mixed text/data content
  – Languages for query (XQuery) and transformation (XSLT)
OWL & RDF
  – RDF: subject-predicate-object triples
  – OWL: ontological descriptions, usually over RDF triples
  – Classification & inferencing
  – Semantic annotation and tagging
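
The RDF bullets describe data as subject-predicate-object triples, with classification and inferencing layered on top. Below is a minimal Python sketch of that idea. It is a toy in-memory store, not rdflib or any RDF library. The ex: names are hypothetical. It applies two RDFS-style rules (subclass transitivity and type propagation) as a simplified stand-in for the richer inferencing an OWL reasoner performs.

```python
from typing import Optional

# A toy in-memory triple store; this is NOT rdflib or any real RDF library.
Triple = tuple[str, str, str]
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def match(triples: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


def infer_types(triples: set[Triple]) -> set[Triple]:
    """Apply two RDFS-style rules until no new triple appears:
      (A subClassOf B), (B subClassOf C) => (A subClassOf C)
      (x type A),       (A subClassOf B) => (x type B)
    """
    closed = set(triples)
    while True:
        new: set[Triple] = set()
        for a, p1, b in closed:
            if p1 != SUBCLASS:
                continue
            for b2, p2, c in closed:
                if p2 == SUBCLASS and b2 == b:
                    new.add((a, SUBCLASS, c))
            for x, p3, a2 in closed:
                if p3 == TYPE and a2 == a:
                    new.add((x, TYPE, b))
        if new <= closed:
            return closed
        closed |= new


# Hypothetical data: one listing, typed only as a camera.
data = {
    ("ex:item42", TYPE, "ex:Camera"),
    ("ex:item42", "ex:brand", "Nikon"),
    ("ex:Camera", SUBCLASS, "ex:PhotoEquipment"),
    ("ex:PhotoEquipment", SUBCLASS, "ex:Electronics"),
}
inferred = infer_types(data)
print(match(inferred, p=TYPE, o="ex:Electronics"))
# [('ex:item42', 'rdf:type', 'ex:Electronics')]
```

No schema was declared for the listing itself. The class hierarchy, stored as ordinary triples, is enough to classify ex:item42 as electronics.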

9 Are We Stuck?
What’s Next? – Florescu (Embley)
Better information-authoring tools (annotation assistance)
Information extraction (automatic annotation)
Creation and reuse of standard schemas and vocabularies (ontology generation)
Mapping schemas to each other (schema mapping)
Automatic data linking (data linking & merging)
Automatic processing of semi-structured data (free-form queries)
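
Information extraction (automatic annotation) turns raw text into the attribute-value pairs of the “Data” level on slide 7. The Python sketch below uses a naive, regex-based stand-in. PATTERNS and annotate are hypothetical names, and the three rules are illustrative only. Practical extractors need far more robust rules or trained models.

```python
import re

# Hypothetical extraction rules; real information extraction needs far more
# robust patterns or a trained model.
PATTERNS = {
    "year": re.compile(r"\b(19\d{2}|20\d{2})\b"),
    "price": re.compile(r"\$\s?(\d+(?:\.\d{2})?)"),
    "lens_mm": re.compile(r"\b(\d{2,3})\s?mm\b", re.IGNORECASE),
}


def annotate(text: str) -> dict[str, str]:
    """Turn raw text into attribute-value pairs (first match per attribute)."""
    found: dict[str, str] = {}
    for attribute, pattern in PATTERNS.items():
        m = pattern.search(text)
        if m:
            found[attribute] = m.group(1)
    return found


print(annotate("Vintage Nikon camera, 1978, 50mm lens, asking $120"))
# {'year': '1978', 'price': '120', 'lens_mm': '50'}
```

Attributes with no match are simply omitted, so the output is itself semi-structured: different listings yield different sets of attributes.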

10 Dataspace System
What’s beyond a database system? – Franklin, Halevy, Maier
Supports data and applications in a wide variety of formats, all within a dataspace.
Offers an integrated means of searching, querying, updating, and administering the dataspace.
Has varying levels of service (e.g., “best-effort” or approximate answers).
Includes tools to create tighter integration of the data, as necessary.
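
A “best-effort” answer can be illustrated by ranking records by how many query conditions they satisfy rather than demanding an exact match. The Python sketch below assumes one possible policy that the slide does not specify. A contradicted condition excludes a record. A missing attribute only lowers its score. The name best_effort_query, the min_score threshold, and the sample records are hypothetical.

```python
from typing import Any


def best_effort_query(records: list[dict[str, Any]], conditions: dict[str, Any],
                      min_score: float = 0.5) -> list[tuple[float, dict[str, Any]]]:
    """Rank records by the fraction of conditions they satisfy (highest first).

    A record that contradicts a condition (has the attribute with another value)
    is excluded; a record that merely lacks the attribute is kept with a lower score.
    """
    results = []
    for r in records:
        satisfied = 0
        contradicted = False
        for k, v in conditions.items():
            if k in r:
                if r[k] == v:
                    satisfied += 1
                else:
                    contradicted = True
                    break
        if contradicted:
            continue
        score = satisfied / len(conditions) if conditions else 1.0
        if score >= min_score:
            results.append((score, r))
    results.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return results


# Hypothetical records drawn from heterogeneous sources.
sources = [
    {"id": "a", "brand": "Nikon", "type": "camera", "year": 1978},
    {"id": "b", "brand": "Nikon", "type": "camera"},  # year unknown
    {"id": "c", "brand": "Canon", "type": "camera", "year": 1978},
    {"id": "d", "type": "camera"},
]
query = {"brand": "Nikon", "type": "camera", "year": 1978}
print([(round(score, 2), r["id"]) for score, r in best_effort_query(sources, query)])
# [(1.0, 'a'), (0.67, 'b')]
```

The score threshold is one knob for the “varying levels of service” the slide mentions. Lowering it trades precision for recall.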

11 “We are still at day one.”
“We need to find a compromise to the tension between the advantages of having schemas, in terms of better understanding and automatically processing the data, and disadvantages imposed by schemas, in terms of inflexibility and lack of evolution.” – Florescu

