Laboratoire LIP6 The Gedeon Project: Data, Metadata and Databases Yves DENNEULIN LIG laboratory, Grenoble ACI MD.

1 Laboratoire LIP6 The Gedeon Project: Data, Metadata and Databases Yves DENNEULIN LIG laboratory, Grenoble ACI MD

2 Context and goals
● Heterogeneous metadata management on grids
  - Clusters of clusters
● High-level queries using metadata
● Easy and flexible deployment and configuration
● Minimal overhead
● Various interfaces
● Initial target application domains
  - Biocomputing (lots of metadata, little data)
  - Microscopic imaging (lots of data, little metadata)

3 The Gedeon middleware
● Metadata management on lightweight grids
  - Records of (attribute, value) pairs stored in files
● Flexible requests
  - Can be combined through scripting
● Various interfaces
  - Command line (tools)
  - Libraries
  - Virtual FS (legacy application support)
● Deployment "à la carte"
  - Composition of various data sources
● Performance
  - Dedicated I/O library
  - Semantic caching

4 Outline
1. General architecture
   a. Gedeon internal structure
   b. Composition of various data sources
2. Practical use
3. « Dual » cache
Conclusion

5 Example of a deployment
[Diagram. Labels: client; query interface (API, FS, GUI, ...); local proxy; interconnect middleware; cache; servers « close » to the client; storage sites.]

6 Gedeon components
● Gedeon kernel
  - fuple
    - I/O library
    - Evaluates the queries
  - lowerG
    - Operators to compose bases
    - Remote access
● Interface
  - API lowerG
  - Virtual FS
● Cache
[Diagram. Labels: application, vSGF, lowerG, fuple, network, cache, local proxy.]

7 What is inside the sources?
● Records of attribute/value pairs
Example record:
  Id        457
  classifA  Bacteria
  classifB  Clostridia
  taille    26
  ref
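A record of this kind maps naturally onto a dictionary. The sketch below is only an illustration in Python: the talk does not describe Gedeon's file format, so plain dicts stand in for records, and the helper name `attributes` is hypothetical. The three records are the ones used in the select examples of this talk.

```python
# Illustrative only: dicts stand in for Gedeon records, whose on-disk
# format is not described in the talk.
# 'taille' is the attribute name used on the slides (French for "size").
records = [
    {"Id": 457, "classifA": "Bacteria", "classifB": "Clostridia", "taille": 26},
    {"Id": 459, "classifB": "Bacteria", "taille": 47},
    {"Id": 460, "classifA": "Firmicutes"},
]

def attributes(recs):
    """Return the sorted attribute names used by at least one record."""
    return sorted({name for rec in recs for name in rec})

# Records are heterogeneous: none is required to carry every attribute.
all_names = attributes(records)
missing_taille = [rec["Id"] for rec in records if "taille" not in rec]
```

The point of the example is that no schema is imposed: each record carries only the attributes that apply to it.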

8 Example of composition of sources
● Metadata can be local or copies
[Diagram. Labels: client; operators +, J and RR; site S1, site S2, site S3.]

9 Union (+)
Source A: record A1, record A2, record A3, record A4, ...
Source B: record B1, record B2, record B3, record B4, ...
A + B:    record A1, record A2, record A3, record A4, record B1, record B2, record B3, record B4, ...
● Unifies the storage space
● Parallel evaluation
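The union operator can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Gedeon's implementation: the function names `union` and `parallel_select` are hypothetical, in-memory lists stand in for sources, and threads stand in for remote sites evaluating a query in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

def union(*sources):
    """Present several record sources as one: A's records, then B's."""
    return list(chain.from_iterable(sources))

def parallel_select(predicate, *sources):
    """Filter every source in its own worker, then concatenate the results.

    Threads stand in for remote sites; this is an assumption of the sketch.
    """
    def evaluate(source):
        return [rec for rec in source if predicate(rec)]

    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        partial_results = list(pool.map(evaluate, sources))  # keeps source order
    return union(*partial_results)

source_a = [{"Id": "A1"}, {"Id": "A2"}, {"Id": "A3"}, {"Id": "A4"}]
source_b = [{"Id": "B1"}, {"Id": "B2"}, {"Id": "B3"}, {"Id": "B4"}]

merged = union(source_a, source_b)
even = parallel_select(lambda rec: int(rec["Id"][1]) % 2 == 0, source_a, source_b)
```

Each source filters its own records, so the work is spread over the sites while the client still sees a single storage space.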

10 Round robin (RR): fault tolerance
[Diagram. Labels: client, RR, Source 1, Source 2.]

11 Round robin (RR): load balancing
[Diagram. Labels: two clients, RR, Source 1, Source 2.]
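Both uses of the round-robin operator can be illustrated with one small sketch. It assumes that the sources hold equivalent content and can be modelled as Python callables, and that an unreachable site shows up as a `ConnectionError`; the class name `RoundRobin` and the sample sources are hypothetical.

```python
from itertools import count

class RoundRobin:
    """Rotate queries over equivalent sources, skipping sources that fail."""

    def __init__(self, sources):
        self.sources = list(sources)  # assumed non-empty
        self._turn = count()          # number of queries dispatched so far

    def query(self, request):
        start = next(self._turn) % len(self.sources)
        last_error = None
        # Try each source once, beginning with the one whose turn it is.
        for offset in range(len(self.sources)):
            source = self.sources[(start + offset) % len(self.sources)]
            try:
                return source(request)
            except ConnectionError as error:
                last_error = error  # fault tolerance: move on to the next source
        raise ConnectionError("all sources failed") from last_error

def source_1(request):
    return ("source 1", request)

def source_2(request):
    return ("source 2", request)

def broken_source(request):
    raise ConnectionError("site unreachable")

# Load balancing: successive queries alternate between the two sources.
balanced = RoundRobin([source_1, source_2])
served_by = [balanced.query("q")[0] for _ in range(4)]

# Fault tolerance: the failing source is skipped transparently.
tolerant = RoundRobin([broken_source, source_2])
answers = [tolerant.query("q")[0] for _ in range(2)]
```

The same rotation gives both properties: the starting source changes with every query, and a failure simply advances to the next source.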

12 Join operator (J)
● Enriches a source with another
Left source:
  Id 457: A1=v1, A2=v2, A3=v3
  Id 458: A1=v4, A2=v5, A3=v6
  ...
Right source:
  Id 457: An=vAn1
  Id 458: An=vAn2
  ...
Result of the join on Id:
  Id 457: A1=v1, A2=v2, A3=v3, An=vAn1
  Id 458: A1=v4, A2=v5, A3=v6, An=vAn2
  ...
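The join on this slide can be sketched as a merge keyed on Id. The Python below is a minimal illustration with a hypothetical `join` function; two behaviours are assumptions that the slide does not settle: left records without a match are kept unchanged, and the left value wins when both sources define the same attribute.

```python
def join(left, right, key="Id"):
    """Enrich each record of `left` with the attributes of the record in
    `right` that has the same key value."""
    right_by_key = {rec[key]: rec for rec in right if key in rec}
    joined = []
    for rec in left:
        extra = right_by_key.get(rec.get(key), {})  # {} when there is no match
        joined.append({**extra, **rec})             # on a name clash the left value wins
    return joined

left = [
    {"Id": 457, "A1": "v1", "A2": "v2", "A3": "v3"},
    {"Id": 458, "A1": "v4", "A2": "v5", "A3": "v6"},
]
right = [
    {"Id": 457, "An": "vAn1"},
    {"Id": 458, "An": "vAn2"},
]
enriched = join(left, right)
```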

13 Outline
1. General architecture
   a. Gedeon internal structure
   b. Composition of various data sources
2. Practical use
3. « Dual » cache
Conclusion

14 Tools 1/2
● Libraries
● CLI
● Operations
  - sort
  - projection
  - select
  - index
  - ...

15 Tools 2/2
● Examples
  - sort
      sort(attr='taille')
      $> cat mesmeta.g | fsort 'taille' > trie_taille.g
  - index
      create_idx(attr='Id')
      .Id.idx
      search_idx('Id', 'P0123')
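The semantics of the two operations can be sketched in Python. The functions `sort_by`, `build_index` and `lookup` below are hypothetical stand-ins written for this illustration, not the Gedeon tools or library calls named on the slide, and placing records that lack the sort attribute last is an assumed policy. Only the identifier P0123 comes from the slide; the other sample values are made up.

```python
def sort_by(records, attr):
    """Sort records by `attr`; records lacking it go last (assumed policy)."""
    return sorted(records, key=lambda rec: (attr not in rec, rec.get(attr, 0)))

def build_index(records, attr):
    """Map each value of `attr` to the positions of the records holding it."""
    index = {}
    for position, rec in enumerate(records):
        if attr in rec:
            index.setdefault(rec[attr], []).append(position)
    return index

def lookup(records, index, value):
    """Return the records whose indexed attribute equals `value`."""
    return [records[position] for position in index.get(value, [])]

records = [
    {"Id": "P0459", "taille": 47},
    {"Id": "P0460"},
    {"Id": "P0123", "taille": 26},
]
by_size = sort_by(records, "taille")
id_index = build_index(records, "Id")
hit = lookup(records, id_index, "P0123")
```

An index trades a one-off pass over the records for direct access afterwards, which is the usual reason to build one before repeated searches on the same attribute.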

16 Language for the requests
● Simple ($, type control with the operators)
● Regular expressions
● Second-order logic

17 Select expression
Input records:
  Id=457, classifA=Bacteria, classifB=Clostridia, taille=26
  Id=459, classifB=Bacteria, taille=47
  Id=460, classifA=Firmicutes
Select $Id>459
Result:
  Id=460, classifA=Firmicutes

18 Select using regexp
Input records:
  Id=457, classifA=Bacteria, classifB=Clostridia, taille=26
  Id=459, classifB=Bacteria, taille=47
  Id=460, classifA=Firmicutes
Select $classifB==/.*a$/
Result:
  Id=457, classifA=Bacteria, classifB=Clostridia, taille=26
  Id=459, classifB=Bacteria, taille=47

19 Select using 2nd order logic
Input records:
  Id=457, classifA=Bacteria, classifB=Clostridia, taille=26
  Id=459, classifB=Bacteria, taille=47
  Id=460, classifA=Firmicutes
Select $/classif[AB]/==Bacteria && $taille>=36
Result:
  Id=459, classifB=Bacteria, taille=47
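The three kinds of select shown on the last three slides can be reproduced with plain Python predicates. This sketch does not parse the Gedeon request language: each request is hand-translated into a function, and `select` and `any_attr` are hypothetical helper names. Treating a missing attribute as a non-match, and requiring the attribute-name pattern to match the whole name, are assumptions.

```python
import re

records = [
    {"Id": 457, "classifA": "Bacteria", "classifB": "Clostridia", "taille": 26},
    {"Id": 459, "classifB": "Bacteria", "taille": 47},
    {"Id": 460, "classifA": "Firmicutes"},
]

def select(recs, predicate):
    """Keep the records for which the predicate holds."""
    return [rec for rec in recs if predicate(rec)]

def any_attr(name_pattern, test):
    """Second-order condition: true if some attribute whose *name* matches
    `name_pattern` has a value that passes `test`."""
    name_re = re.compile(name_pattern)
    def predicate(rec):
        return any(name_re.fullmatch(name) and test(value)
                   for name, value in rec.items())
    return predicate

# $Id>459
simple = select(records, lambda r: "Id" in r and r["Id"] > 459)

# $classifB==/.*a$/
by_regexp = select(
    records,
    lambda r: "classifB" in r and re.search(r".*a$", r["classifB"]) is not None,
)

# $/classif[AB]/==Bacteria && $taille>=36
is_bacteria = any_attr(r"classif[AB]", lambda value: value == "Bacteria")
second_order = select(
    records,
    lambda r: is_bacteria(r) and "taille" in r and r["taille"] >= 36,
)
```

In the last request the pattern ranges over attribute names rather than values, which is what lets one condition cover both classifA and classifB.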

20 Virtual FS interface
● Just a specific file-oriented interface
● Data and metadata can be anywhere in the grid
● Definition of logical directories
  - Ex: cd '$classifB==|.*a$|'
  - « and » between directories
  - 1 filename = value of a metadata attribute: logical view
/fs_virt/$classifB==|.*a$|> ls
457 459
/fs_virt/$classifB==|.*a$|> cat * > /tmp/mater
/fs_virt/$classifB==|.*a$|>
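The logical-directory idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the Gedeon virtual FS: `parse_directory` and `ls` are hypothetical names, only directory names of the form `$attr==|regexp|` are understood, paths are given relative to the mount point, and files are named after the Id attribute.

```python
import re

# Matches directory names of the form $attr==|regexp| (the only form handled).
DIRECTORY = re.compile(r"\$(\w+)==\|(.*)\|")

def parse_directory(name):
    """Turn a directory name such as '$classifB==|.*a$|' into a predicate."""
    match = DIRECTORY.fullmatch(name)
    if match is None:
        raise ValueError(f"unsupported directory name: {name!r}")
    attr, pattern = match.group(1), re.compile(match.group(2))
    return lambda rec: attr in rec and pattern.search(str(rec[attr])) is not None

def ls(path, records, name_attr="Id"):
    """List a logical directory: nested directories are combined with 'and',
    and each matching record appears as a file named after `name_attr`."""
    # Splitting on '/' means a regexp containing '/' is not supported here.
    predicates = [parse_directory(part) for part in path.strip("/").split("/") if part]
    return [str(rec[name_attr]) for rec in records
            if name_attr in rec and all(test(rec) for test in predicates)]

records = [
    {"Id": 457, "classifA": "Bacteria", "classifB": "Clostridia", "taille": 26},
    {"Id": 459, "classifB": "Bacteria", "taille": 47},
    {"Id": 460, "classifA": "Firmicutes"},
]

listing = ls("/$classifB==|.*a$|", records)
nested = ls("/$classifB==|.*a$|/$classifA==|^Bact|", records)
```

Descending into a subdirectory only ever narrows the listing, because every path component adds one more condition to the conjunction.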

21 Outline
1. General architecture
   a. Gedeon internal structure
   b. Composition of various data sources
2. Practical use
3. « Dual » cache
Conclusion

22 Dual cache (1)
● 2 cooperative caches
  - Cache of requests (R, {id, ...}) -> saves computing power
  - Cache of data (id, {attr, ...}) -> saves bandwidth
● Semantic cache
  - Can evaluate a query using the data in the cache
  - Can generate a remainder query to fetch the data missing from the cache

23 Example
● Refinement of a request
  1) '$OC==/Eukaryota/' -> (R, Lid={id1, id2, ...})
  2) '$OC==/Eukaryota/ && $year>=1998' -> Select(*Lid, '$year>=1998')
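The refinement above can be sketched with two dictionaries, one per cache. Everything in this Python illustration is an assumption made for the example: the class name `DualCache`, the in-memory dict standing in for the storage sites, the sample records, and the very simple rule that a request refines a cached one when it extends it with `&&`. Requests are passed together with a hand-written predicate, since the sketch does not parse the request language.

```python
class DualCache:
    """Query cache (request -> ids) plus data cache (id -> attributes)."""

    def __init__(self, server):
        self.server = server      # dict id -> record, stands in for remote storage
        self.query_cache = {}     # request text -> list of matching ids
        self.data_cache = {}      # id -> record
        self.records_fetched = 0  # records transferred from the server
        self.full_scans = 0       # requests evaluated over the whole server

    def fetch(self, ids):
        """Return the records for `ids`, transferring only those not cached."""
        for rec_id in ids:
            if rec_id not in self.data_cache:
                self.data_cache[rec_id] = self.server[rec_id]
                self.records_fetched += 1
        return [self.data_cache[rec_id] for rec_id in ids]

    def _refined_request(self, request):
        """Find a cached request that `request` extends with '&&', if any."""
        for cached in self.query_cache:
            if request.startswith(cached + " && "):
                return cached
        return None

    def select(self, request, predicate):
        if request in self.query_cache:        # exact hit: nothing to compute
            return self.query_cache[request]
        base = self._refined_request(request)
        if base is not None:                   # refinement: reuse the cached ids
            candidates = self.fetch(self.query_cache[base])
        else:                                  # miss: evaluate over everything
            self.full_scans += 1
            candidates = list(self.server.values())
        ids = [rec["id"] for rec in candidates if predicate(rec)]
        self.query_cache[request] = ids
        return ids

server = {
    "id1": {"id": "id1", "OC": "Eukaryota", "year": 1995},
    "id2": {"id": "id2", "OC": "Eukaryota", "year": 2001},
    "id3": {"id": "id3", "OC": "Archaea", "year": 2003},
}
cache = DualCache(server)
first = cache.select("$OC==/Eukaryota/", lambda r: "Eukaryota" in r["OC"])
second = cache.select(
    "$OC==/Eukaryota/ && $year>=1998",
    lambda r: "Eukaryota" in r["OC"] and r["year"] >= 1998,
)
```

The second request is answered from the ids cached for the first one, so only the two Eukaryota records are transferred and the third record is never examined.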

24 Dual cache (2)
● Distributed semantic cache
  - Typically used inside communities
    - Lots of common requests
  - No location constraints
    - Members of the community can be geographically scattered
● Distributed data cache
  - Minimizes time and data transfer
  - Cooperation between topologically close sites

25 Dual cache (3)
[Diagram. Labels: servers; Grenoble; Rennes; dual cache (query cache, object cache); semantic locality (community Eukaryota, community Archaea); geographic locality.]

26 Dual cache (4)
● Work in progress on the notion of distance
  - Find geographical proximity
  - Find common interests between communities
    - Create hybrid communities based on their requests
● Could be used to change the cache parameters
  - Manual and/or automatic

27 Conclusion
● A data integration middleware
  - Handling of metadata
● Distributed and modular
  - Deployment can be done according to architectural/organisational constraints
● Definition of a dual cache infrastructure
  - Reflect both organisational use
● Prototype in use
  - Packaging and documentation needed

28 Questions?

