1 Realising Virtual Research Environments by Hybrid Data Infrastructures: the D4Science Experience
Andrea Manzi (CERN); Leonardo Candela, Donatella Castelli, Pasquale Pagano (ISTI-CNR)
ISGC 2014, Taipei, 25 March 2014

2 Outline
– The D4Science Infrastructure
– The Supporting Projects
– The Infrastructure Constituents
– The Virtual Research Environments
– VREs Examples

3 The D4Science Infrastructure
A production-level infrastructure deployed and maintained during the D4Science (2007) and D4Science II (2009) projects:
– Geographically distributed computing infrastructure
– Across administrative boundaries
– Across private and commercial providers
– Service allocation, deployment, monitoring and operation
– Uniform resource and data access

4 Hybrid Data Infrastructure
A Hybrid Data Infrastructure (HDI) is an IT infrastructure where research resources (hardware, software, data) can be shared and exploited on demand:
– built on existing systems, infrastructures and repositories
– supporting an innovative application-delivery model in which computing, storage, data and software are made available as-a-Service
Figure: applications #1, #2, … #N running on the Hybrid Data Infrastructure, which is built over the data, servers, services and apps of infrastructures/systems A, B, … Z

5 Supporting two models of provision
– For end users: a GUI-centric approach, focusing on visual interfaces for accessing Data Infrastructure facilities via a web browser
– For service providers: an API-centric approach, focusing on a comprehensive set of specifications and methods for accessing HDI facilities programmatically

6 Operate a large-scale HDI supporting the Ecosystem Approach to Fisheries and the Conservation of Marine Living Resources
– Exploit the D4Science infrastructure and its interfaces with existing grid (EGI) and cloud (MS Azure via the VENUS-C platform) infrastructures and data sources (via SDMX, TAPIR, DiGIR, …)
– Manage the entire data lifecycle, where data can come from any domain: from species observations to socio-statistical data, documents and environmental monitoring data
– Support ecological niche modelling, temporal and spatial data harmonization, statistical data analysis with R, and data mining
– Serve statisticians, fishery biologists, marine ecologists, economists, lawyers, enforcement bodies (customs, coast guards) and conservationists

7 Operate a large-scale HDI serving Biodiversity Science in Europe and Brazil
– Exploit the D4Science infrastructure and its interfaces with existing cloud infrastructures (MS Azure and COMPSs via the VENUS-C platform) and data sources (via SDMX, TAPIR, DiGIR, …)
– Provide open access to existing grid and cloud resources and software platforms across continents
– Combine Biodiversity Science and the Open Access Movement
– Integrate regional and global taxonomies
– Support biodiversity scientists who want to build, test and project models of species distribution

8 The D4Science infrastructure is powered by the gCube enabling technology (https://www.ohloh.net/p/gCube)

9 Software Platform
Enabling Layer:
– Information System
– Resource Management
– Workflow Engine

10 Enabling Layer: Information System
A scalable and reliable framework:
– supporting an extensible notion of resource (HW, data, services)
– open to modular extensions at runtime by arbitrary third parties
Functions: registration, discovery, notification, monitoring, inspection, assignment, accounting
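To make the registration, discovery and notification functions concrete, here is a minimal in-memory sketch in Python. It is not gCube's Information System API: the `Resource` and `InformationSystem` classes, their fields, the event names and the exact-match discovery rule are all hypothetical, chosen only to show how an extensible resource notion and change notifications can fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Resource:
    """Hypothetical resource record; "kind" keeps the notion extensible."""
    id: str
    kind: str  # e.g. "HW", "Data", "Service"
    properties: Dict[str, str] = field(default_factory=dict)


# A listener receives an event name and the affected resource.
Listener = Callable[[str, Resource], None]


class InformationSystem:
    """Toy in-memory registry with registration, discovery and notification."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        # Notification: every subscriber hears about every registry change.
        self._listeners.append(listener)

    def _notify(self, event: str, resource: Resource) -> None:
        for listener in self._listeners:
            listener(event, resource)

    def register(self, resource: Resource) -> None:
        # Re-registering an existing id is treated as an update.
        event = "updated" if resource.id in self._resources else "registered"
        self._resources[resource.id] = resource
        self._notify(event, resource)

    def unregister(self, resource_id: str) -> None:
        resource = self._resources.pop(resource_id)  # KeyError if unknown
        self._notify("unregistered", resource)

    def discover(self, kind: str, **props: str) -> List[Resource]:
        # Discovery: match the kind and every requested property exactly.
        return [
            r for r in self._resources.values()
            if r.kind == kind
            and all(r.properties.get(k) == v for k, v in props.items())
        ]


if __name__ == "__main__":
    registry = InformationSystem()
    registry.subscribe(lambda event, r: print(event, r.id))
    registry.register(Resource("svc-1", "Service", {"name": "example-service"}))
    print([r.id for r in registry.discover("Service", name="example-service")])
```

A production information system would also persist and replicate the registry and add monitoring, inspection and accounting; the sketch deliberately leaves those out.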

11 Enabling Layer: Resource Management
A distributed framework managing a trusted resource network:
– Dynamic deployment: remote deployment of resources across the infrastructure
– Resource lifetime management: managing the lifetime of resources, from creation and publication to discovery, access and consumption
– Virtual Research Environment management: cost-effective creation, operation and maintenance of Virtual Research Environments
– Interoperability, openness and integration at software level: third-party software can be added to the Data e-Infrastructure at runtime:
  – Web applications (running in Tomcat)
  – Web services (running in service containers, e.g. JAX-WS, Axis)
  – Executables (e.g. POJOs, shell scripts, …)
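Resource lifetime management, as described above, can be pictured as a small state machine that moves a resource from creation and publication to discovery, access and consumption. The sketch below is an illustrative assumption, not gCube's resource manager: the `Stage` names mirror the slide, while the transition table (including repeated access before consumption) and the `ManagedResource` class are hypothetical.

```python
from enum import Enum, auto
from typing import Dict, List, Set


class Stage(Enum):
    CREATED = auto()
    PUBLISHED = auto()
    DISCOVERED = auto()
    ACCESSED = auto()
    CONSUMED = auto()


# Assumed transition table: stages follow the order on the slide, and an
# accessed resource may be accessed again before it is finally consumed.
TRANSITIONS: Dict[Stage, Set[Stage]] = {
    Stage.CREATED: {Stage.PUBLISHED},
    Stage.PUBLISHED: {Stage.DISCOVERED},
    Stage.DISCOVERED: {Stage.ACCESSED},
    Stage.ACCESSED: {Stage.ACCESSED, Stage.CONSUMED},
    Stage.CONSUMED: set(),
}


class ManagedResource:
    """Tracks one resource through its (hypothetical) lifetime stages."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.CREATED
        self.history: List[Stage] = [Stage.CREATED]

    def advance(self, target: Stage) -> None:
        # Reject any move the transition table does not allow.
        if target not in TRANSITIONS[self.stage]:
            raise ValueError(
                f"{self.name}: cannot move from {self.stage.name} to {target.name}"
            )
        self.stage = target
        self.history.append(target)


if __name__ == "__main__":
    r = ManagedResource("example-web-service")
    for step in (Stage.PUBLISHED, Stage.DISCOVERED, Stage.ACCESSED, Stage.CONSUMED):
        r.advance(step)
    print([s.name for s in r.history])
```

Keeping the allowed moves in one table makes the policy easy to inspect and to change without touching the class.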

12 Enabling Layer: Workflow Engine
Based on adaptors for execution on internal or external resources:
– JDLAdaptor: parses a Job Description Language (JDL) definition block and translates the described job, or DAG of jobs, into an Execution Plan that can be submitted to the gCube ExecutionEngine for execution.
– GridAdaptor: constructs an Execution Plan that can contact an EMI UI node to submit a grid job, monitor it and retrieve its output.
– CondorAdaptor: constructs an Execution Plan that can contact a Condor gateway node to submit a Condor job, monitor it and retrieve its output.
– HadoopAdaptor: constructs an Execution Plan that can contact a Hadoop UI node to submit a MapReduce job, monitor it and retrieve its output.
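The JDLAdaptor's two steps (parse a JDL block, then turn a DAG of jobs into an ordered plan) can be sketched roughly as follows. This is a simplification under stated assumptions: real JDL is ClassAd-based and supports nested lists and expressions, which `parse_jdl` ignores, and grouping DAG nodes into parallel "stages" is just one hypothetical plan shape, not the gCube ExecutionEngine's Execution Plan format. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Iterable, List, Tuple

# Matches flat JDL attribute assignments such as: Executable = "/bin/echo";
_ATTRIBUTE = re.compile(r'^\s*(\w+)\s*=\s*"?([^";]*)"?\s*;\s*$')


def parse_jdl(text: str) -> Dict[str, str]:
    """Parse a flat JDL block; nested lists and expressions are not handled."""
    attributes: Dict[str, str] = {}
    for line in text.splitlines():
        match = _ATTRIBUTE.match(line)
        if match:
            attributes[match.group(1)] = match.group(2).strip()
    return attributes


def execution_plan(edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Order DAG nodes into stages; nodes in the same stage are independent.

    Each edge is a (parent, child) pair: the child waits for the parent.
    """
    sorter = TopologicalSorter()
    for parent, child in edges:
        sorter.add(child, parent)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    plan: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        plan.append(ready)
        sorter.done(*ready)
    return plan


if __name__ == "__main__":
    print(parse_jdl('Executable = "/bin/echo";\nArguments = "hello";'))
    print(execution_plan([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]))
```

For the diamond-shaped DAG in the example, B and C end up in the same stage because neither depends on the other, so an adaptor could submit them concurrently.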

13 Infrastructure Constituents: Technologies
The D4Science infrastructure hosts a set of components built on top of different technologies, making available a wide variety of services for managing, manipulating and processing data and metadata within an autonomously managed infrastructure:
– MS Azure
– EGI
– VENUS-C (COMPSs, PMES)
– u.store
– openModeller
– MongoDB, Cassandra, Hadoop
– GeoNetwork
– ElasticSearch
– …

14 Infrastructure Constituents: Services and Data
The D4Science infrastructure leverages existing data sources, ranging from species data (species names, synonyms, taxonomic classifications, spatial occurrences) to literature and images:
– OBIS, http://www.iobis.org/
– MyOcean, http://www.myocean.eu/
– Catalogue of Life, http://www.catalogueoflife.org/
– FishBase, http://www.fishbase.org/
– speciesLink, http://splink.cria.org.br/
– Biodiversity Heritage Library, http://www.biodiversitylibrary.org/
– Bioline International, http://www.bioline.org.br/
– Global Biodiversity Information Facility (GBIF), http://www.gbif.org/

15 Virtual Research Environment (VRE)
A VRE is a distributed and dynamically created environment where a subset of data, services, computational and storage resources, regulated by tailored policies, is assigned to a subset of users via interfaces for a limited timeframe.
L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12
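The VRE definition above combines four ingredients: a subset of resources, a subset of users, tailored policies and a limited timeframe. A minimal sketch of that definition as a data structure might look like this; the `VRE` class, the `Policy` callable type and the example resource and user names are hypothetical illustrations, not gCube's VRE model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, FrozenSet, List

# Hypothetical policy: returns True if the user may use the resource.
Policy = Callable[[str, str], bool]


@dataclass
class VRE:
    """A VRE as resources, members, tailored policies and a time window."""
    name: str
    resources: FrozenSet[str]
    members: FrozenSet[str]
    start: date
    end: date
    policies: List[Policy] = field(default_factory=list)

    def can_use(self, user: str, resource: str, on: date) -> bool:
        # Access requires membership, an assigned resource, a date inside
        # the VRE's timeframe, and approval from every tailored policy.
        return (
            user in self.members
            and resource in self.resources
            and self.start <= on <= self.end
            and all(policy(user, resource) for policy in self.policies)
        )


if __name__ == "__main__":
    vre = VRE(
        "example-vre",
        resources=frozenset({"example-db", "example-cluster"}),
        members=frozenset({"alice", "bob"}),
        start=date(2014, 1, 1),
        end=date(2014, 12, 31),
        policies=[lambda u, r: not (u == "bob" and r == "example-cluster")],
    )
    print(vre.can_use("alice", "example-cluster", date(2014, 3, 25)))
```

Expressing policies as plain callables keeps the sketch open to arbitrary rules (quotas, roles) without changing the class.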

16 Virtual Research Environment (VRE): cost-effective creation and management
Steps: Definition, Creation, Configuration

17 The Social Extension: Workspace
A virtual drive to which users can upload, and from which they can download, the files needed by the services and the results they produce
– Files can be organized into folders, which can be shared
– Support for public URIs
– Support for WebDAV

18 The Social Extension: Messages and Notifications
– Emails in the cloud
– Customizable alerts

19 The Social Extension: News Feed
– Share news
– User-shared news
– Application-shared news

20 Outline: VREs Examples

21 ICIS VRE: Tabular Data Analysis
– Import code lists
– Validate datasets
– Analyse and project
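The "import code lists, then validate datasets" step essentially checks every coded column of a table against its reference list. A minimal sketch, assuming rows are dictionaries and code lists are sets of allowed codes; `validate_rows` and the example columns are hypothetical, and the ICIS VRE's actual validation rules are likely richer (types, ranges, mandatory columns).

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, str]
Error = Tuple[int, str, Optional[str]]


def validate_rows(rows: List[Row], codelists: Dict[str, Set[str]]) -> List[Error]:
    """Return (row index, column, value) for each value missing from its code list.

    A missing column is reported with value None.
    """
    errors: List[Error] = []
    for index, row in enumerate(rows):
        for column, allowed in codelists.items():
            value = row.get(column)
            if value not in allowed:
                errors.append((index, column, value))
    return errors


if __name__ == "__main__":
    # Example code lists: ISO 3166 alpha-3 country codes and species codes.
    codelists = {"country": {"ITA", "BRA"}, "species": {"SKJ", "YFT"}}
    rows = [
        {"country": "ITA", "species": "SKJ", "catch_t": "120"},
        {"country": "XXX", "species": "YFT", "catch_t": "80"},
    ]
    print(validate_rows(rows, codelists))
```

Reporting every offending cell, rather than stopping at the first one, lets a user fix a whole dataset in one pass before analysis.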

22 ScalableDataMining VRE: Features Clustering
– Presence points (FishBase + OBIS)
– Density-based clustering: DBSCAN
– Other methods are also available: K-Means, X-Means, …
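For readers unfamiliar with DBSCAN, the following is a compact textbook version applied to presence points. It is not the ScalableDataMining VRE's implementation: points are treated as planar (longitude, latitude) pairs with Euclidean distance, whereas real occurrence data would call for a great-circle distance, and the O(n²) neighbour search only suits small inputs. `eps` is the neighbourhood radius and `min_pts` the minimum neighbourhood size (the point itself included) for a core point.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar
NOISE = -1


def _neighbours(points: Sequence[Point], i: int, eps: float) -> List[int]:
    # Indices of all points within eps of point i, including i itself.
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = _neighbours(points, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        k = 0
        while k < len(seeds):  # expand the cluster from core points
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = _neighbours(points, j, eps)
            if len(more) >= min_pts:  # j is a core point too
                seeds.extend(more)
    return [label if label is not None else NOISE for label in labels]


if __name__ == "__main__":
    pts = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
    print(dbscan(pts, eps=0.5, min_pts=2))
```

Unlike K-Means, DBSCAN needs no cluster count up front and marks isolated presence points as noise, which is why it suits scattered occurrence records.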

23 AquaMaps VRE: Ecological Modeling
– Access to external databases
– Extensible with predictive algorithms
– Exploits several computational back-ends
– Uses several storage technologies (RDBMS, column store, blob)
– Publishes distributions to geospatial Web services
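One common way to make a modelling service "extensible with predictive algorithms" is a plugin registry that new algorithms join by name. The sketch below illustrates that pattern only; the `register`/`predict` functions, the `Cell` layout and the toy range-envelope rule with made-up tolerance ranges are hypothetical and do not reproduce the AquaMaps model.

```python
from typing import Callable, Dict, List

Cell = Dict[str, float]              # environmental values for one map cell
Algorithm = Callable[[Cell], float]  # returns a suitability score in [0, 1]

ALGORITHMS: Dict[str, Algorithm] = {}


def register(name: str) -> Callable[[Algorithm], Algorithm]:
    """Decorator that plugs a new predictive algorithm into the registry."""
    def decorator(fn: Algorithm) -> Algorithm:
        ALGORITHMS[name] = fn
        return fn
    return decorator


# Toy envelope with made-up tolerance ranges for an example species.
ENVELOPE = {"temperature": (10.0, 28.0), "depth": (0.0, 200.0)}


@register("range-envelope")
def range_envelope(cell: Cell) -> float:
    # 1.0 if every variable lies inside its range, otherwise 0.0.
    inside = all(lo <= cell[var] <= hi for var, (lo, hi) in ENVELOPE.items())
    return 1.0 if inside else 0.0


def predict(algorithm: str, cells: List[Cell]) -> List[float]:
    """Run a registered algorithm over every cell of a map."""
    if algorithm not in ALGORITHMS:
        raise KeyError(f"unknown algorithm: {algorithm}")
    return [ALGORITHMS[algorithm](cell) for cell in cells]


if __name__ == "__main__":
    cells = [{"temperature": 15.0, "depth": 50.0}, {"temperature": 5.0, "depth": 50.0}]
    print(predict("range-envelope", cells))
```

Because callers only go through `predict`, adding an algorithm means registering one function; the same indirection could select among computational back-ends.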

24 SpeciesLab VRE: Cross-Mapper
Detecting and reporting differences between species checklists
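At its simplest, cross-mapping two species checklists means normalising the names and comparing the resulting sets. The sketch below, with hypothetical `normalise` and `compare_checklists` helpers, only does exact matching after collapsing case and whitespace; a real cross-mapper would also need to reconcile synonyms, authorship strings and differing taxonomic classifications.

```python
from typing import Dict, Iterable, Set


def normalise(name: str) -> str:
    # Collapse whitespace and case so "Thunnus  albacares" matches "thunnus albacares".
    return " ".join(name.split()).lower()


def compare_checklists(a: Iterable[str], b: Iterable[str]) -> Dict[str, Set[str]]:
    """Report names found only in a, only in b, and in both."""
    names_a = {normalise(n) for n in a}
    names_b = {normalise(n) for n in b}
    return {
        "only_in_a": names_a - names_b,
        "only_in_b": names_b - names_a,
        "in_both": names_a & names_b,
    }


if __name__ == "__main__":
    report = compare_checklists(
        ["Thunnus albacares", "Katsuwonus pelamis"],
        ["thunnus  albacares", "Xiphias gladius"],
    )
    for key, names in report.items():
        print(key, sorted(names))
```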

25 MarineSearch VRE: Information Retrieval
– Search over several OAI-PMH repositories
– Entity enrichment
– Semantic post-processing
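Searching over OAI-PMH repositories starts with harvesting their records. The sketch below covers only that harvesting side: it builds `ListRecords` request URLs (with `metadataPrefix=oai_dc`, or a `resumptionToken` for the next page) and extracts Dublin Core titles and the resumption token from a response. The base URL is a placeholder, no network call is made, and OAI-PMH error responses as well as MarineSearch's indexing, entity enrichment and semantic post-processing are out of scope.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for Dublin Core records."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no metadataPrefix with it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Extract dc:title values and the resumption token (None on the last page)."""
    root = ET.fromstring(xml_text)
    titles = [el.text or "" for el in root.iter(f"{DC_NS}title")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
```

A harvester would loop: request the first page, parse it, and keep requesting with the returned token until `parse_list_records` yields `None`, repeating the loop for each repository before indexing the titles for search.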

26 Summary
– The D4Science infrastructure, implementing the HDI approach, enables heterogeneous resource sharing between cross-domain infrastructures
– It brings together, in a common environment, resources coming from several e-infrastructures
– It successfully hosts Virtual Research Environments for members of different user communities
– A sustainability plan is under development, targeting future EU funding and/or the exploitation of public-private partnerships

27 Discussion
Figure: landscape of the D4Science e-Infrastructure, the gCube Framework and gCube Apps
Thanks for your attention. Questions?
– www.d4science.org
– i-marine.d4science.org
– eubrazilopenbio.d4science.org
– www.i-marine.eu
– www.eubrazilopenbio.eu

