CS561: Advanced Topics in Database Systems

1 CS561, Spring 2012, WPI, Mohamed Eltabakh: Suggested Projects

2 Project: Teams of 2 (or 3). Select from several candidate projects (or come up with new ideas). Platform: PostgreSQL or Hadoop. Work closely with the instructor for continuous feedback and direction. A project can be a study and comparison of different techniques, or an exploration of new ideas.

3 Project I: Conflict Management in Scientific Databases
In scientific applications, e.g., in biology, chemistry, or physics, many groups and labs work on the same objects. They may have conflicting data, and there is no single correct version of the data. Currently, detecting, resolving, and managing these conflicts is done manually, which is very expensive, consumes manpower, and is not accurate. Goal: Automate the conflict-management process in database systems. Challenges: Mechanisms to specify the comparable objects (e.g., which tuples?), which field(s) to compare against each other, how to compare them (which function), and how to measure the degree of conflict. An ontology for the comparison. Which conflicting data should be resolved first: frequently queried objects, a priority mechanism, etc.
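The challenges above (which fields to compare, with which function, how to measure the degree of conflict, and which conflicts to resolve first) can be made concrete with a small Python sketch. Everything in it is a hypothetical illustration, not a method proposed in the course: `ComparisonSpec`, the comparison functions, the conflict degree defined as the fraction of compared fields that disagree, and the priority score defined as conflict degree times query frequency are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Comparator = Callable[[Any, Any], bool]  # returns True if two values agree


def exact(a: Any, b: Any) -> bool:
    # Values agree only if they are identical.
    return a == b


def numeric_within(tol: float) -> Comparator:
    # Numeric values agree if they differ by at most `tol`.
    def cmp(a: Any, b: Any) -> bool:
        return abs(float(a) - float(b)) <= tol
    return cmp


@dataclass
class ComparisonSpec:
    # Hypothetical spec: which fields to compare and how to compare each one.
    fields: Dict[str, Comparator]


def conflict_degree(v1: dict, v2: dict, spec: ComparisonSpec) -> float:
    """Fraction of compared fields on which two versions disagree (0.0 = no conflict)."""
    disagreements = sum(1 for f, cmp in spec.fields.items() if not cmp(v1[f], v2[f]))
    return disagreements / len(spec.fields)


def object_conflict(versions: List[dict], spec: ComparisonSpec) -> float:
    """Worst pairwise conflict among all versions of one object (e.g., one per lab)."""
    worst = 0.0
    for i in range(len(versions)):
        for j in range(i + 1, len(versions)):
            worst = max(worst, conflict_degree(versions[i], versions[j], spec))
    return worst


def resolution_order(
    objects: Dict[str, List[dict]], query_freq: Dict[str, int], spec: ComparisonSpec
) -> List[Tuple[str, float]]:
    """Rank conflicting objects by priority = conflict degree * query frequency."""
    scored = []
    for oid, versions in objects.items():
        degree = object_conflict(versions, spec)
        if degree > 0:
            scored.append((oid, degree * query_freq.get(oid, 0)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Taking the maximum pairwise degree means a single outlying version is enough to flag an object; averaging over all pairs would be an equally plausible design choice, and a real system would likely draw the per-field comparators from a domain ontology.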

4 Project II: Keyword Search over Annotations
Annotations are extra data attached to database items (table cells, rows, columns, etc.). They represent comments, feedback, or provenance information. Annotation management is a new, active research topic in database systems: how to annotate the data, how to efficiently store and index the annotations, and how to query them.
Figure: an annotated Gene table.
GID | GName | GSequence
JW0080 | mraW | ATGATGGAAAA…
JW0041 | fixB | ATGAACACGTT…
JW0037 | caiB | ATGGATCATCT…
JW0055 | yabP | ATGAAAGTATC…
Annotations attached to the table: B1: Curated by user admin. B2: possibly split by frameshift. B3: obtained from GenoBase. B4: pseudogene. B5: This gene has an unknown function.
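To illustrate how annotations can be attached at different granularities (cell, row, or column) and retrieved for a given cell, here is a minimal in-memory Python sketch. The `Target` and `AnnotationStore` names, and the convention that a missing row or column means "whole column" or "whole row", are hypothetical choices made for this example; a real design would store annotations in database tables with suitable indexes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Target:
    """What an annotation is attached to (hypothetical scheme).
    column=None means a whole row; row=None means a whole column."""
    table: str
    row: Optional[str]     # row identifier, e.g., a primary-key value
    column: Optional[str]  # column name


class AnnotationStore:
    def __init__(self) -> None:
        self._by_target: Dict[Target, List[str]] = defaultdict(list)

    def annotate(self, target: Target, text: str) -> None:
        # Store the annotation under the exact item it was attached to.
        self._by_target[target].append(text)

    def annotations_for_cell(self, table: str, row: str, column: str) -> List[str]:
        """Annotations visible on a cell: cell-level, then row-level, then column-level."""
        result: List[str] = []
        for t in (Target(table, row, column), Target(table, row, None), Target(table, None, column)):
            result.extend(self._by_target.get(t, []))  # .get avoids creating empty entries
        return result
```

Keying annotations by the item they are attached to makes "what annotates this cell?" cheap; the opposite question (which items carry annotations mentioning a keyword) needs a separate index over annotation text.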

5 Project II: Keyword Search over Annotations
Goal: Apply keyword search to annotations, a problem that has not been addressed yet. Challenges: indexing annotations for keyword search; interrelated annotations; querying annotations together with the base data; ….
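One natural starting point for indexing annotations for keyword search is an inverted index from keywords to annotations, plus a map from each annotation to the tuples it is attached to. The Python sketch below assumes conjunctive (AND) queries, a naive lowercase tokenizer, and hypothetical `(table, row id)` tuple identifiers; it does not address interrelated annotations or ranking.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

TupleId = Tuple[str, str]  # (table, row id): a hypothetical identifier scheme


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; a real system would add stemming, stop words, etc.
    return re.findall(r"[a-z0-9]+", text.lower())


class AnnotationKeywordIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # keyword -> annotation ids
        self.attached_to: Dict[int, Set[TupleId]] = {}         # annotation id -> tuples

    def add(self, ann_id: int, text: str, tuples: Set[TupleId]) -> None:
        # Record where the annotation is attached and index each of its keywords.
        self.attached_to[ann_id] = set(tuples)
        for tok in tokenize(text):
            self.postings[tok].add(ann_id)

    def search(self, query: str) -> Set[TupleId]:
        """Return tuples carrying at least one annotation that contains all query keywords."""
        terms = tokenize(query)
        if not terms:
            return set()
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        result: Set[TupleId] = set()
        for ann_id in ids:
            result |= self.attached_to[ann_id]
        return result
```

Mapping keywords to annotations first, and only then to tuples, lets one annotation attached to many tuples be indexed once; a combined query over annotations and base data could then join these tuple ids with ordinary predicates on the base table.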

6 Project III: Annotation Management in the Presence of Materialized Views
Materialized views (MVs) copy transformed pieces of the data, mostly for performance reasons. Incremental maintenance of MVs is a rich research area. If the base data are annotated, what should happen to the annotations? Goal: Support annotations in the presence of MVs. Challenges: Should the MV carry the annotations? If so, how can the annotations be kept up to date incrementally? If the MV does not carry the annotations, how can they be found efficiently at query time? If the MV itself is annotated, how can those annotations be propagated automatically back to the base tables?
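For the simplest case, a view that carries annotations, here is a hypothetical Python sketch of a select-project view over one base table. Each view row keeps the id of the base row it came from (its lineage), which is what makes both forward propagation (base to view) and backward propagation (view to base) straightforward. The class name and its methods are illustrative assumptions; views with joins or aggregation map many base rows to one view row and need considerably more machinery.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class AnnotatedSelectProjectView:
    """Hypothetical MV = project(select(base, pred), cols) that carries annotations.
    View rows are keyed by their base row id, which serves as lineage."""

    def __init__(self, pred: Callable[[Row], bool], cols: List[str]) -> None:
        self.pred, self.cols = pred, cols
        self.base: Dict[int, Row] = {}
        self.base_ann: Dict[int, List[str]] = defaultdict(list)
        self.view: Dict[int, Row] = {}
        self.view_ann: Dict[int, List[str]] = defaultdict(list)

    def insert_base(self, rid: int, row: Row) -> None:
        # Incremental maintenance: only the new row is checked against the predicate.
        self.base[rid] = row
        if self.pred(row):
            self.view[rid] = {c: row[c] for c in self.cols}
            self.view_ann[rid] = list(self.base_ann[rid])  # carry existing annotations

    def annotate_base(self, rid: int, text: str) -> None:
        # Forward propagation: copy the annotation to the derived view row, if any.
        self.base_ann[rid].append(text)
        if rid in self.view:
            self.view_ann[rid].append(text)

    def annotate_view(self, rid: int, text: str) -> None:
        # Backward propagation: lineage identifies the base row behind the view row.
        if rid not in self.view:
            raise KeyError(rid)
        self.view_ann[rid].append(text)
        self.base_ann[rid].append(text)
```

Copying annotations into the view trades storage and update cost for fast reads; the alternative raised above, leaving the view unannotated, would instead use the same lineage ids to look annotations up in the base tables at query time.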

7 Project IV: Similarity-Based Processing in Hadoop
The Hadoop infrastructure assumes exact key matching for joins and aggregations between the Map and Reduce phases: records are grouped only by identical keys. In many applications, similarity-based processing is essential, e.g., joins and aggregations based on similarity rather than exact match. Two or three recent papers address similarity-based processing in Hadoop. Goal: Study this recent work and compare the different techniques. Hopefully, new ideas will arise.
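To show why exact-key grouping is the obstacle and how it can be worked around, the Python sketch below simulates (without Hadoop) a basic token-based similarity self-join in MapReduce style. The map phase replicates each record under every one of its tokens, so any two records with nonzero Jaccard similarity meet at some reducer; the reducer verifies candidate pairs against a threshold. The function names, the Jaccard measure, and the in-memory shuffle are assumptions for illustration. Published techniques refine this scheme, e.g., with prefix filtering to limit replication.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, Set[str]]  # (record id, token set)


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 1.0


def map_phase(records: Dict[str, Set[str]]) -> Iterable[Tuple[str, Record]]:
    # Emit each record once per token: records sharing a token get an identical key,
    # which is all the exact-match shuffle can group on.
    for rid, tokens in records.items():
        for tok in tokens:
            yield tok, (rid, tokens)


def shuffle(pairs: Iterable[Tuple[str, Record]]) -> Dict[str, List[Record]]:
    # Stand-in for Hadoop's shuffle: group values by exactly equal keys.
    groups: Dict[str, List[Record]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[Record]], threshold: float) -> Set[Tuple[str, str]]:
    out: Set[Tuple[str, str]] = set()
    for values in groups.values():
        for (r1, t1), (r2, t2) in combinations(values, 2):
            if jaccard(t1, t2) >= threshold:
                a, b = sorted((r1, r2))
                out.add((a, b))  # the set removes pairs found under several shared tokens
    return out


def similarity_join(records: Dict[str, Set[str]], threshold: float) -> Set[Tuple[str, str]]:
    """All pairs with Jaccard similarity >= threshold (assumes threshold > 0)."""
    return reduce_phase(shuffle(map_phase(records)), threshold)
```

The threshold must be positive because pairs sharing no token never meet at a reducer. Frequent tokens produce large reducer groups and many duplicate comparisons, which is exactly the cost the techniques in the recent papers try to reduce.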

8 Any other ideas? By next Thursday (Jan. 26), groups should be formed and projects selected.
