
1 UPPSALA DATABASE LABORATORY Managing Scientific Queries over Distributed Data in a Grid Environment Ruslan Fomkin

2 Uppsala DataBase Laboratory (UDBL)
UU-IT - UDBL, Ruslan Fomkin, January 20, 2006, NGN workshop, Uppsala
- Supervisor: prof. T. Risch
- Database research: how to make extensible middleware query processing that allows scalable and application-oriented search over different kinds of wrapped information sources
- http://www.it.uu.se/research/group/udbl/

3 AMOS II Virtual Mediator Database
[Figure: mediator architecture. Applications (Simulation, Visualization, Analysis, Patient Monitoring) send queries and continuous queries to the AMOS II mediator, which holds queries and views and reaches the data sources (GRID, hist. measurements, relational databases) through wrappers and plug-ins.]

4 Ongoing Research at UDBL
- Stream Queries on BlueGene: Erik Zeitler, MSc
- FEM Databases: Kjell Orsborn, PhD
- Mediating Web Services: Manivasakan Sabesan, BSc
- Semantic Web Queries to Hidden Web: Johan Petrini, MSc
- Stream Data Manager: Milena Ivanova, PhD
- Expensive GRID Queries: Ruslan Fomkin, MSc

5 Outline
- Introduction
- The project
- Test application
- Developed framework
- Conclusion
- Future work

6 Scientific Applications, Grid and Databases
- A lot of scientific data
  - Complex structure
  - Stored in files distributed in the Grid
- Scientific analyses can be represented as declarative queries
  - Complex queries with numerical computations
  - Long-running or batch queries
- Utilization of the computational resources of the Grid

7 Parallel Object Query System for Expensive Computations (POQSEC)
- Query processor for scientific applications
  - high-level interface to specify the analyses
  - automatically generates execution plans and evaluates them
- Requirements
  - Scalable, efficient, flexible, transparent
- Properties
  - Distributed and parallel

8 Layered Architecture of the System
- POQSEC provides scientific query management
- Grid provides
  - computation management
  - file management
  - NorduGrid middleware
- Application area provides
  - computational libraries
  - data management libraries
  - ROOT library
[Figure: layer diagram with labels User, POQSEC, Application libraries (ROOT), Grid (NorduGrid), Data, Clusters.]

9 Our Test Application
- From Particle Physics
- Analysis of collision events for the presence of Higgs particles
- Data
  - produced by ATLAS simulation software
  - stored in files distributed in the Grid (e.g. NorduGrid)
  - managed by the ROOT library

10 Object-Relational Schema of the Application Data
[Figure: schema diagram. Event (PxMiss, PyMiss) is linked to Particle (Px, Py, Pz, Kf, Ee) by a 1-to-n "particles" relationship. Lepton and Jet inherit from Particle; Muon and Electron inherit from Lepton.]

11 General Query of the Analysis
- Selection of those events that satisfy predicates containing numerical operations

    SELECT ev
    FROM Event ev
    WHERE jetvetocut(ev) AND zvetocut(ev) AND topcut(ev) AND
          misseecuts(ev) AND leptoncuts(ev) AND threeleptoncut(ev);

- Each predicate is called a cut in the application area
- Predicates are defined as queries
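The selection on this slide is a conjunction of cuts evaluated per event. A minimal Python sketch of that pattern follows. The event fields and the two stand-in cuts are hypothetical placeholders chosen for illustration; they are not the ATLAS cuts named in the query.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, float]        # hypothetical event record, for illustration only
Cut = Callable[[Event], bool]   # a cut is a predicate over a single event


def select_events(events: Iterable[Event], cuts: List[Cut]) -> List[Event]:
    """Keep the events that satisfy every cut (the AND chain of the WHERE clause)."""
    return [ev for ev in events if all(cut(ev) for cut in cuts)]


# Two made-up cuts standing in for jetvetocut, zvetocut, and so on.
def few_jets(ev: Event) -> bool:
    return ev["n_jets"] <= 1


def enough_missing_energy(ev: Event) -> bool:
    return ev["missing_e"] >= 40.0


events = [
    {"id": 1, "n_jets": 0, "missing_e": 55.0},
    {"id": 2, "n_jets": 3, "missing_e": 60.0},
    {"id": 3, "n_jets": 1, "missing_e": 10.0},
]
selected = select_events(events, [few_jets, enough_missing_energy])
print([ev["id"] for ev in selected])
```

Because every cut is an independent predicate over one event, the event set can be split into partitions and each partition filtered separately, which is what makes this kind of query a natural fit for parallel execution.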

12 Example of a predicate: Z-veto cut
- Either the event does not have a pair of oppositely charged leptons
- or the invariant mass of the pair is not close to the mass of a Z particle

    CREATE FUNCTION zvetocut(Event ev) -> Event AS
    SELECT ev
    WHERE NOTANY(oppositeLeptons(ev)) OR
          abs(invMass(oppositeLeptons(ev)) - zMass) >= minZMass;

    CREATE FUNCTION oppositeLeptons(Event ev) -> bag of AS
    SELECT l1, l2
    FROM Lepton l1, Lepton l2
    WHERE l1 = particles(ev) AND l2 = particles(ev) AND Kf(l1) = -Kf(l2);
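The same cut can be sketched in Python to make the arithmetic explicit. This is an illustration under stated assumptions, not the POQSEC implementation: the `Lepton` class, the value of `MIN_Z_MASS`, and the choice to require every opposite pair to lie outside the Z window are assumptions. The invariant mass uses the standard relation m² = (E1 + E2)² - |p1 + p2|².

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple

Z_MASS = 91.19      # GeV, approximate mass of the Z boson
MIN_Z_MASS = 10.0   # GeV, hypothetical window; the slides do not give minZMass


@dataclass(frozen=True)
class Lepton:
    px: float
    py: float
    pz: float
    ee: float   # energy
    kf: int     # particle code; the antiparticle carries the negated code


def opposite_leptons(leptons: List[Lepton]) -> List[Tuple[Lepton, Lepton]]:
    """Unordered pairs with negated codes, mirroring Kf(l1) = -Kf(l2)."""
    return [(a, b) for a, b in combinations(leptons, 2) if a.kf == -b.kf]


def inv_mass(a: Lepton, b: Lepton) -> float:
    """Invariant mass of a pair: sqrt((E1 + E2)^2 - |p1 + p2|^2)."""
    e = a.ee + b.ee
    px, py, pz = a.px + b.px, a.py + b.py, a.pz + b.pz
    # Clamp at zero to guard against small negative values from rounding.
    return math.sqrt(max(e * e - (px * px + py * py + pz * pz), 0.0))


def z_veto_cut(leptons: List[Lepton]) -> bool:
    """Pass if there is no opposite pair, or no pair lies within the Z window."""
    pairs = opposite_leptons(leptons)
    return all(abs(inv_mass(a, b) - Z_MASS) >= MIN_Z_MASS for a, b in pairs)


# A back-to-back pair with 45.6 GeV each has invariant mass 91.2 GeV: vetoed.
z_like = [Lepton(45.6, 0.0, 0.0, 45.6, 11), Lepton(-45.6, 0.0, 0.0, 45.6, -11)]
# A back-to-back pair with 20 GeV each has invariant mass 40 GeV: passes.
light = [Lepton(20.0, 0.0, 0.0, 20.0, 11), Lepton(-20.0, 0.0, 0.0, 20.0, -11)]
print(z_veto_cut(z_like), z_veto_cut(light))
```

Since `all()` over an empty list is true, the "no opposite pair" branch of the cut needs no separate test in this sketch.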

13 Current Framework
- Basic tool for utilizing NorduGrid through the Advanced Resource Connector (ARC)
- Submission mechanism
  - submit query
  - parallelize query into several subqueries
  - generate job scripts (one per subquery)
- Babysitter functionality
- Data exchange mechanism through files

14 Client and Coordinator Part
- POQSEC client
  - personal database with application schema
  - ROOT wrapper
- Coordinator server
  - receives queries
  - creates jobs
- Grid Meta-Database
  - computational resources
  - data files
- Submission Database
  - received submissions
  - created jobs
- Babysitter
  - interactions with ARC
[Figure: architecture diagram with labels Client Node, POQSEC Client, Query Coordinator, Coordinator server, Grid Meta-Database, Submission Database, Job queue, Babysitter, Local Storage, ARC Client, Grid.]

15 Query Submission
- Query submission
  - query
  - file name selection
  - degree of parallelism
  - CPU time for each job
- Coordinator server creates jobs
  - same query
  - partitions of data with equal size
  - same CPU time provided by user
  - corresponding job script files
- Submission and its jobs are saved in the Submission Database
- Created jobs are added to the Job queue
- Script files are saved to Local Storage
[Figure: architecture diagram with labels Client Node, POQSEC Client, Query Coordinator, Coordinator server, Grid Meta-Database, Submission Database, Job queue, Babysitter, Local Storage, ARC Client, Grid.]
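The job-creation step can be sketched as follows. This is a hedged illustration: the function names, the dictionary layout of a job, and the file names are hypothetical, and the sketch balances partitions by file count, whereas the slide only says that the data partitions have equal size.

```python
from typing import Dict, List


def partition(files: List[str], degree: int) -> List[List[str]]:
    """Split files into `degree` contiguous partitions whose sizes differ by at most one."""
    if degree < 1:
        raise ValueError("degree of parallelism must be at least 1")
    base, extra = divmod(len(files), degree)
    parts, start = [], 0
    for i in range(degree):
        size = base + (1 if i < extra else 0)   # the first `extra` parts get one more file
        parts.append(files[start:start + size])
        start += size
    return parts


def create_jobs(query: str, files: List[str], degree: int, cpu_minutes: int) -> List[Dict]:
    """One job per non-empty partition: same query, same CPU time, different data."""
    return [
        {"job_id": i, "query": query, "files": part, "cpu_minutes": cpu_minutes}
        for i, part in enumerate(partition(files, degree))
        if part   # skip empty partitions when there are fewer files than jobs
    ]


files = [f"events_{i}.root" for i in range(10)]   # made-up file names
jobs = create_jobs("SELECT ev FROM Event ev WHERE ...", files, degree=4, cpu_minutes=20)
for job in jobs:
    print(job["job_id"], len(job["files"]), job["cpu_minutes"])
```

In the framework described on the slide, each such job would additionally get its own job script file saved to Local Storage; script generation is left out here because the script format is not shown in the talk.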

16 Jobs Submission
- Babysitter
  - takes jobs from the Job queue
  - submits each job to the ARC client
  - changes the status of submitted jobs in the Submission DB
- ARC client
  - finds a Computing Element (CE)
  - submits the job to the corresponding ARC Grid Manager
[Figure: architecture diagram with labels Client Node, POQSEC Client, Query Coordinator, Coordinator server, Grid Meta-Database, Submission Database, Job queue, Babysitter, Local Storage, ARC Client, Grid, and two CEs, each with an ARC Grid Manager.]

17 Job Execution
- ARC Grid Manager
  - downloads input files
  - submits the job to the Local Batch System (LBS)
- After some delay, the LBS starts the Executor on an allocated CE node
- Executor, during execution,
  - executes the given subquery
  - accesses data through the ROOT wrapper
  - saves the result to files on CE Storage
[Figure: diagram with labels ARC Grid Manager, SE, LBS, Queue, CE node, Executor, wrapper, CE Storage.]

18 Downloading Result
- Babysitter
  - polls the ARC client for job statuses
  - requests download of results for finished jobs
- Results are downloaded to Local Storage
- The user can retrieve the result when all jobs are ready
[Figure: architecture diagram with labels Client Node, POQSEC Client, Query Coordinator, Coordinator server, Grid Meta-Database, Submission Database, Job queue, Babysitter, Local Storage, ARC Client, Grid, and two CEs, each with an ARC Grid Manager and CE Storage.]
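The poll-and-download cycle of the babysitter can be sketched as a simple loop. Everything here is a hypothetical stand-in: `get_status` and `download` represent calls to the ARC client, the status strings are invented for the example and are not claimed to be ARC job states, and a real babysitter would wait between polling rounds.

```python
from typing import Callable, Dict, Iterable, List


def babysit(
    job_ids: Iterable[str],
    get_status: Callable[[str], str],   # hypothetical stand-in for a status query
    download: Callable[[str], None],    # hypothetical stand-in for result download
    max_polls: int = 100,
) -> Dict[str, str]:
    """Poll every pending job; download the results of finished ones."""
    pending: List[str] = list(job_ids)
    final: Dict[str, str] = {}
    for _ in range(max_polls):
        if not pending:
            break
        still_pending = []
        for job in pending:
            if get_status(job) == "FINISHED":
                download(job)            # results go to local storage
                final[job] = "FINISHED"
            else:
                still_pending.append(job)
        pending = still_pending
    for job in pending:                  # jobs that never finished within max_polls
        final[job] = "UNFINISHED"
    return final


# Scripted status sequences replace a real grid for this example.
script = {
    "job-a": iter(["QUEUED", "RUNNING", "FINISHED"]),
    "job-b": iter(["RUNNING", "FINISHED"]),
}
downloaded: List[str] = []
result = babysit(list(script), lambda job: next(script[job]), downloaded.append)
print(result, downloaded)
```

The user-facing rule from the slide, that the result is available only when all jobs are ready, corresponds to checking that every value in the returned dictionary is `"FINISHED"`.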

19 Conclusion
- We provide
  - a declarative query interface for representing scientific queries
  - parallel query execution in the Grid (generating scripts)
  - a babysitter to keep track of job execution
- Query parallelization is important

                        Standalone desktop   Grid, one job   Grid, four jobs
    Response time       190 min              225 min         24 min
    Requested CPU time  -                    200 min         20 min
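The response times on this slide translate directly into speedup factors. The short calculation below only divides the reported numbers; the variable names are chosen for the example.

```python
# Response times in minutes, as reported on the slide.
response_min = {
    "standalone desktop": 190,
    "grid, one job": 225,
    "grid, four jobs": 24,
}

four_jobs = response_min["grid, four jobs"]
speedup_vs_desktop = response_min["standalone desktop"] / four_jobs
speedup_vs_one_job = response_min["grid, one job"] / four_jobs

print(f"vs. standalone desktop: {speedup_vs_desktop:.1f}x")
print(f"vs. one grid job:       {speedup_vs_one_job:.1f}x")
```

With the reported numbers, the four-job run responds roughly 7.9 times faster than the standalone desktop and roughly 9.4 times faster than the single grid job.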

20 Future work
- Estimating query execution time
- Dealing with underestimation of execution time
- Automatic decisions on degree of parallelism and resource brokering
  - adaptive, based on current load and job statistics
- Dealing with failures in the Grid
- POOL wrapper

21 Thank you for your attention! Any questions?

