Technology and Infrastructure Support for Large Scale Information
Marcio Faerman
The Brazilian National Education and Research Network - RNP
firstname.lastname@example.org
www.rnp.br
Generating Large Data Collections
Large data volumes can be generated much faster than they can be analyzed
–Instrument Observations
  Particle Accelerators (CERN LHC)
  Telescopes, Satellites
  Sensor Networks
  Virtual Observatories
–Large Model Simulations
  High resolution, very complex
Scientific Experiments
–Medical imaging (fMRI): ~1 GByte per measurement (day)
–Bio-informatics queries: 500 GByte per database
–Satellite world imagery: ~5 TByte per year
–Current particle physics: 1 PByte per year
–LHC physics (2007): 10-30 PByte per year
–LSST Astronomy (2012): 5 PByte per year
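To put the yearly volumes above in networking terms, the following minimal sketch converts them into the average rate needed to move one year of data within one year. It assumes decimal units (1 PByte = 10**15 bytes) and a 365-day year; the function name and the dictionary are illustrative and not part of the original presentation.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, leap years ignored

# Annual volumes quoted on the slide, in bytes (assumption: decimal units).
ANNUAL_VOLUMES = {
    "Satellite world imagery": 5e12,
    "Current particle physics": 1e15,
    "LHC physics (upper bound)": 30e15,
    "LSST astronomy": 5e15,
}


def sustained_rate_mbit_per_s(bytes_per_year: float) -> float:
    """Average rate in Mbit/s needed to move one year's data in one year."""
    return bytes_per_year * 8 / SECONDS_PER_YEAR / 1e6


for name, volume in ANNUAL_VOLUMES.items():
    print(f"{name}: {sustained_rate_mbit_per_s(volume):,.0f} Mbit/s")
```

Under these assumptions 1 PByte per year corresponds to roughly 254 Mbit/s sustained around the clock, and the 30 PByte upper bound for the LHC to roughly 7.6 Gbit/s. These are averages only; bursts, retransmissions and protocol overhead are ignored.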
Challenges Managing Large Volume Data
Scalability
–What works for small datasets does not necessarily work for large collections
Data Integrity
–At terabyte scale, failures and data corruption are very likely to occur
–Is data provenance reliable?
Efficiency
–Data should be accessed at a rate which keeps work feasible
–More data: need for more speed
Distributed Access
–Data can be at a remote (and possibly unknown) location
Infrastructure Management
–Heterogeneous
–Distributed
–Prone to failures
–Very Complex
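One common way to detect the corruption mentioned under Data Integrity is to record a cryptographic checksum when a file is ingested and recompute it later. The sketch below is an illustration using Python's standard hashlib module, not the mechanism of any specific tool named in this presentation; the function names and the 1 MiB chunk size are assumptions.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks
    so that very large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_hex: str) -> bool:
    """Compare the current digest with the one recorded at ingestion time."""
    return file_sha256(path) == recorded_hex
```

A repository could store the digest next to each file's metadata at ingestion and call `is_intact` after every transfer or on a periodic scan. A mismatch indicates that the stored bytes changed, although it does not say where or why.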
Challenges: Getting to Know Your Data
Extract knowledge from raw data files
–Data product derivation
  Visualization
  Relationships
  Patterns
  New derived quantities
–Cross-institutional and cross-disciplinary collaborations
What-if experiments
–Your data with our model?
Dataset Access
–Multiple formats
  Each sensor or simulation has its own storage format
–Federated collections
–Discovery by content
Technological Response
Integration of compute, communication, storage and instrument resources into a powerful infrastructure: Information Grids
–Very powerful infrastructure
–Economy of scale
Serves a broad range of customers
–biologists, physicists, government, industry
Infrastructure is heterogeneous, distributed, very complex
Middleware and data-oriented tools act as facilitators to tackle data management complexities
Open Access and Preservation Functionalities
Federated Digital Libraries
–Integration of distributed repositories
–Access control: can decide who can see it
–Organize the data in collections
–Describe your data: Metadata
Data Grids
–Access to efficient parallel I/O systems
–Hierarchical Systems
  Disk caches, tapes
  Often Distributed
–Analysis, Data Mining
–Visualization
–Workflow based systems
–Transaction based data ingestion
  Data provenance, Data fingerprinting
–What if virtual lab
End User Oriented Portals
–"I deal with the data in the way it makes sense to me"
Middleware and Tools
Data Management
–Storage Resource Broker (SRB)
–Globus Data Management
–L-Store
–IBP
–Storage Resource Manager (SRM)
Data Representation Libraries
–HDF5
–NetCDF
Portals
–OGCE
–JSR 168
Today's Reality
Exceptional achievements by early adopters
Integration between domain scientists (data users and producers) is still a challenge
–Need much more cross-disciplinary interaction
Emphasis on scale and performance
Failures are still a taboo
–Frustration factor should be addressed in partnership with users
–Focus on failure recovery and quality of service is getting more attention
e-Infrastructure Workshop, NUDI/USP, São Paulo, 07.05.2007
Grid Initiatives around the World
Networking in Latin America
RNP-BR, REUNA-CL, CUDI-MX, RAAP-PE, REACCIUN-VE
Brazilian National Research and Education Network - RNP
In November 2005 the RNP networking infrastructure was entirely renovated. It consists of:
–A multigigabit core connecting 10 capitals at 2.5 and 10 Gbps
–Connections at 34 Mbps to 11 capitals
–Connections up to 16 Mbps to 6 capitals
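The gap between these link classes becomes concrete when expressed as the time needed to move a dataset. The sketch below computes an ideal lower bound for a hypothetical 1 TByte dataset (decimal units), assuming the full nominal link rate is available end to end with no protocol overhead; the dataset size and all names are illustrative assumptions.

```python
def transfer_time_seconds(size_bytes: float, link_bits_per_s: float) -> float:
    """Ideal transfer time, assuming the full link rate is available end to end."""
    return size_bytes * 8 / link_bits_per_s


DATASET_BYTES = 1e12  # hypothetical 1 TByte dataset (assumption, decimal units)

# Nominal link rates quoted on the slide, in bits per second.
LINKS_BITS_PER_S = {
    "core, 10 Gbps": 10e9,
    "core, 2.5 Gbps": 2.5e9,
    "34 Mbps": 34e6,
    "16 Mbps": 16e6,
}

for name, rate in LINKS_BITS_PER_S.items():
    hours = transfer_time_seconds(DATASET_BYTES, rate) / 3600
    print(f"{name}: {hours:.1f} hours")
```

Under these idealized assumptions the same terabyte takes 800 seconds on a 10 Gbps core link and 500,000 seconds (nearly six days) on a 16 Mbps connection. Real transfers would be slower because links are shared and protocols add overhead.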
Infra-estrutura para e-Ciência (Infrastructure for e-Science)
Community Metropolitan Networks
It is not enough to bring high speed connectivity to each city: it is also necessary to bring it to the university campus / research lab.
The metropolitan network is the solution
–Infrastructure sharing to support:
  Interconnection of the campuses of each partner institution
  Access to RNP national network backbone
–This sharing substantially reduces deployment costs
–Preferably, the infrastructure will be owned by the partners themselves (reducing operating costs)
Pilot: The Metrobel project in the city of Belém do Pará in the Amazon region
Redecomep Project (2005-7)
Following Metrobel, the Brazilian Ministry of Science and Technology is supporting the Community Networks for Education and Research (Redecomep) Project with R$ 39.7 M (~ US$ 19.0 M) through Finep (Dec/2004)
Goals:
–Extend the metropolitan optical network to 26 other cities with RNP points of presence
–Promote integration in the metropolitan area
–High speed access to the RNP point of presence
Next steps
Integration between network, data repositories, compute, storage resources and applications
–Identify who needs better connectivity
–Developing Brazilian cyberinfrastructure
–Generally uncoordinated funding for infrastructure resources
–Need a broad vision, at the level of funding agencies and partners, of application requirements and cyberinfrastructure integration
RNP is coordinating with scientific communities and infrastructure providers
e-Science/Infrastructure initiative in Brazil
Developing Together
Information infrastructure is being redefined in Brazil and Latin America
Now is the time to have as much cross-disciplinary interaction as possible to define needs, partnerships and investments
Please contact us
THANK YOU!