Let us build a platform for structure extraction and matching that... Sunita Sarawagi, IIT Bombay


1 Let us build a platform for structure extraction and matching that...
Sunita Sarawagi, IIT Bombay
http://www.cse.iitb.ac.in/~sunita

2 Knows when it failed
Attaches error-detection logic to every extraction module
Two types of errors
  Precision errors: easier to detect
    Reference databases
    Alternative models
    Human feedback
  Recall errors: much harder
    A research challenge
Represents errors and exposes them to users
  Imprecise data models for the results of extraction and deduplication → another research challenge
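One way to picture "attaching error-detection logic to an extraction module" is a wrapper that cross-checks each extracted value against a reference database and an alternative model, then keeps the warnings with the result instead of hiding them. The sketch below is only an illustration under assumptions: `Extraction`, `with_error_detection`, the two toy city extractors, and `KNOWN_CITIES` are hypothetical names, not part of any system described in the slides. It catches precision errors only; a missed extraction (a recall error) never reaches these checks.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Extraction:
    """An extracted value plus the reasons it might be wrong (hypothetical type)."""
    value: str
    flags: List[str] = field(default_factory=list)

    @property
    def suspect(self) -> bool:
        # A result is suspect as soon as any check raised a flag.
        return bool(self.flags)


def with_error_detection(
    extract: Callable[[str], str],
    alternative: Callable[[str], str],
    reference: Set[str],
) -> Callable[[str], Extraction]:
    """Wrap an extractor so every result carries its own error flags."""
    def checked(text: str) -> Extraction:
        result = Extraction(extract(text))
        # Check 1: reference database lookup (lower-cased keys assumed).
        if result.value.lower() not in reference:
            result.flags.append("not in reference database")
        # Check 2: disagreement with an independent alternative model.
        if alternative(text) != result.value:
            result.flags.append("alternative model disagrees")
        return result
    return checked


def primary_city(text: str) -> str:
    # Toy rule: whatever follows the first " in ", up to the next comma.
    return text.partition(" in ")[2].split(",")[0].strip()


def alternative_city(text: str) -> str:
    # Toy alternative: the first capitalized word that follows "in ".
    m = re.search(r"in ([A-Z][a-z]+)", text)
    return m.group(1) if m else ""


KNOWN_CITIES = {"mumbai", "pune"}  # stand-in reference database
checked_city = with_error_detection(primary_city, alternative_city, KNOWN_CITIES)
```

For "Office in Mumbai, India" both checks pass and the result is clean. For "Born in 1975, in Pune" the primary extractor returns "1975", and both flags are attached, so a user or downstream module can see that the value is doubtful.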

3 Seamlessly integrates rules, humans and statistics
Existing systems are partitioned on
  Rule-based vs. statistical
  Manual vs. learning-based
Smooth co-existence of all combinations is a must, given the varying difficulty of tasks and sophistication of users

4 Treats models as first-class objects
Tens of thousands of schema elements
  Cannot afford a separate extraction and matching model for each
How can models be shared across different levels of hierarchies, natural languages, formatting languages, and versions over time?
How quickly can we interactively adapt to new domains, starting from existing libraries of models?

5 Is selectively lazy
Cannot run away from the hard tasks
  The only way to attack the long tail of missed extractions is via expensive resources
Explicitly represent increasing levels of cost and payoff, and do cost-sensitive processing
  Selective linguistic processing: POS → Chunking → Dependency parsing → Full parsing
  Database lookups: No lookups → Boolean matches → TF-IDF matches → Edit distance → Web searches
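The lookup ladder above can be sketched as a cascade that tries the cheapest matcher first and escalates only when the score is too low and the budget still allows it. This is a minimal sketch under assumptions, not the author's design: the `Level` record, the cost units, the acceptance thresholds, and the three matchers are all illustrative, word-set Jaccard overlap stands in for the Boolean and TF-IDF levels, and the web-search level is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Match = Tuple[Optional[str], float]  # (best candidate, score in [0, 1])


def exact_match(query: str, candidates: List[str]) -> Match:
    """Cheapest level: case-insensitive exact lookup."""
    q = query.strip().lower()
    for c in candidates:
        if c.lower() == q:
            return c, 1.0
    return None, 0.0


def token_overlap(query: str, candidates: List[str]) -> Match:
    """Mid-cost level: Jaccard overlap of word sets."""
    q = set(query.lower().split())
    best, best_score = None, 0.0
    for c in candidates:
        t = set(c.lower().split())
        score = len(q & t) / len(q | t) if (q | t) else 0.0
        if score > best_score:
            best, best_score = c, score
    return best, best_score


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def edit_similarity(query: str, candidates: List[str]) -> Match:
    """Costlier level: edit distance normalized by the longer string."""
    q = query.lower()
    best, best_score = None, 0.0
    for c in candidates:
        t = c.lower()
        score = 1.0 - levenshtein(q, t) / (max(len(q), len(t)) or 1)
        if score > best_score:
            best, best_score = c, score
    return best, best_score


@dataclass
class Level:
    name: str
    cost: float        # assumed cost units, purely illustrative
    matcher: Callable[[str, List[str]], Match]
    accept_at: float   # stop escalating once the score reaches this


LEVELS = [
    Level("exact", 1, exact_match, 1.0),
    Level("token_overlap", 5, token_overlap, 0.6),
    Level("edit_distance", 20, edit_similarity, 0.8),
]


def lazy_match(query: str, candidates: List[str], levels: List[Level],
               budget: float) -> Tuple[Optional[str], str, float]:
    """Return (match, level that resolved it, total cost spent)."""
    spent = 0.0
    for level in levels:
        if spent + level.cost > budget:
            break  # the next level is unaffordable: stay lazy
        spent += level.cost
        match, score = level.matcher(query, candidates)
        if score >= level.accept_at:
            return match, level.name, spent
    return None, "unresolved", spent
```

With candidates `["Sunita Sarawagi", "IIT Bombay"]`, the query "iit bombay" resolves at the exact level for a cost of 1, while the misspelling "Sunita Sarawgi" falls through to the edit-distance level. With a budget of 6 the same misspelling stays unresolved, which makes the cost and payoff trade-off explicit rather than hidden inside a single matcher.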

6 Supports multi-spectrum queries
"Knowledge [Schema] should be like a pocket watch, surfaced only when needed; not like a wrist watch, always flaunted." (A Bengali saying)
Fully schema-aware: SQL, XML, …
Schema-less: keyword queries
Common-sense schema-aware
  User understands Is-a, Part-of, Properties
  Use world knowledge (ontologies, word-nets, etc.) to map both schema and content elements in the query
  Can use limited rounds of user interaction

