CERN/IT/DB A Strawman Model for using Oracle for LHC Physics Data Jamie Shiers, IT-DB, CERN.

Overview
- Focus on scalability and deployment aspects
- Implicit assumption that OCCI (Oracle C++ Call Interface) / OTT (Object Type Translator) can provide the needed functionality
- Learn from experience with Objectivity/DB deployment in LAN and WAN

Basic Concepts
- An "Oracle Database" refers to the datafiles and server processes on a single system or cluster
- User applications can access as many Oracle Databases as required
- Different roles, schemas, transaction boundaries, etc. are all supported out of the box
- Oracle is deployed today at the 1-100 TB level

LHC Datatypes / Volumes
- RAW: 1 PB / year
- ESD: ~100 TB / year
- AOD: ~10 TB / year
- TAG: ~100 GB-1 TB / year

LHC Datatypes & Oracle
- RAW: 1 PB/yr: ~1 'DB' / month
- ESD: ~100 TB/yr: ~1 'DB' / year
- AOD: ~10 TB/yr: ~1 'DB' (in total)
- TAG: ~100 GB-1 TB/yr: ~1 'DB', combined with AOD
- Maybe possible to soften these to ~1 'DB' for all ESD. Would there be a strong advantage?
- Different 'DB's have different access patterns, access control, schema, etc.
- Navigation between DBs is fully supported (database links)
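The granularity choices above translate into a rough size for each 'DB'. This is a minimal sketch assuming decimal units, the upper TAG figure (1 TB/yr), and a hypothetical 10-year lifetime for the single AOD/TAG 'DB'; none of these assumptions come from the slide, and `db_size_tb` is an illustrative helper only.

```python
# Rough per-'DB' sizes implied by the slide's granularity choices.
# Assumptions (not from the slide): decimal units, the upper TAG figure (1 TB/yr),
# and a hypothetical 10-year lifetime for the single AOD/TAG 'DB'.
from typing import Optional

TB = 1.0
PB = 1000 * TB
YEARS = 10  # hypothetical, for illustration only

volume_per_year_tb = {"RAW": 1 * PB, "ESD": 100 * TB, "AOD": 10 * TB, "TAG": 1 * TB}


def db_size_tb(volume_per_year: float, dbs_per_year: Optional[int], years: int = YEARS) -> float:
    """Size of one 'DB': the yearly volume split over dbs_per_year DBs,
    or, when dbs_per_year is None, everything accumulated in a single DB."""
    if dbs_per_year is None:
        return volume_per_year * years
    return volume_per_year / dbs_per_year


sizes = {
    "RAW (1/month)": db_size_tb(volume_per_year_tb["RAW"], 12),
    "ESD (1/year)": db_size_tb(volume_per_year_tb["ESD"], 1),
    "AOD+TAG (1 total)": db_size_tb(volume_per_year_tb["AOD"] + volume_per_year_tb["TAG"], None),
}
for name, size in sizes.items():
    print(f"{name:18s} ~{size:6.1f} TB per 'DB'")
```

Under these assumptions each 'DB' comes out in the ~80-110 TB range, which is why a 100 TB Oracle DB is the unit worth studying.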

A 100 TB Oracle DB
- Single machine or cluster?
- Oracle stresses "Real Application Clusters" (RAC) with Oracle 9i: a set of commodity systems vs a 'datacenter'-style server
- Today's Objectivity/DB servers have ~1 TB of disk per server, accessible through a single network connection
- Scale to a cluster of O(10) systems with O(100 TB) of disk? Seems plausible…

Cluster Architecture (Oracle diagram). Components: users; network; clustered database servers; low-latency interconnect (high-speed switch or interconnect, VIA or proprietary); hub or switch fabric; Storage Area Network; mirrored disk subsystem; centralized management console. Captions: "Drive and exploit industry advances in clustering", "No single point of failure".

Cache Fusion (Oracle slide)
- Full Cache Fusion
- Cache-to-cache data shipping
- Shared cache eliminates slow I/O
- Enhanced IPC
- Allows flexible and transparent deployment
(Diagram labels: users, shared cache, Cache Fusion)
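As a rough illustration of "cache-to-cache data shipping", here is a deliberately simplified toy model. The `Node` and `ToyCluster` classes are hypothetical, not an Oracle API: they only record whether a read is served from the local cache, from another node's cache over the interconnect, or from disk. Everything that makes the real mechanism correct (block ownership and locking, read-consistent versions, write handling) is omitted, so this sketches the idea, not Oracle's implementation.

```python
# Toy model of cache-to-cache block shipping between cluster nodes.
# Hypothetical classes for illustration: not Oracle's Cache Fusion protocol.
# No locking, block ownership or read-consistency is modelled; reads only.
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    name: str
    cache: Set[int] = field(default_factory=set)  # block numbers held in this node's buffer cache


@dataclass
class ToyCluster:
    nodes: List[Node]
    stats: Dict[str, int] = field(
        default_factory=lambda: {"local": 0, "remote_cache": 0, "disk": 0}
    )

    def read(self, node: Node, block: int) -> str:
        """Serve a block read on `node` and report where it came from."""
        if block in node.cache:
            source = "local"
        elif any(block in other.cache for other in self.nodes if other is not node):
            source = "remote_cache"  # shipped over the interconnect instead of re-read from disk
        else:
            source = "disk"
        node.cache.add(block)  # the block is now cached on the reading node too
        self.stats[source] += 1
        return source


a, b = Node("a"), Node("b")
cluster = ToyCluster([a, b])
print(cluster.read(a, 42))  # first read anywhere: disk
print(cluster.read(b, 42))  # b gets a copy from a's cache
print(cluster.read(b, 42))  # now in b's own cache
print(cluster.stats)
```

The point the model makes is the one on the slide: once any node has a block cached, other nodes can avoid the slow disk path.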

O.R.A.C.
- Certified Intel configurations from a number of vendors…
- Compaq: Pentium III Xeon 700 MHz, 4 processors, 4 GB
- FastTango: Oracle 9i cluster on Linux
- Obtaining information from these and other vendors on suitable evaluation configurations…

100 TB DB
- RAW: ~10 tables; assume 1 (worst case)
- Tables can be split into partitions: 65 TB / 2^16 partitions ≈ 1 GB / partition
- Partitions are stored in tablespaces; tablespaces are composed of sets of files
- The number of partitions is no problem for a 100 TB DB; OK for 10 PB?
- 1024 open files; 2^16 files / DB (today) (too low?)
- 1 TB = ~1000 × 1 GB files = ~3 hours of data-taking
- 1 day = ~10 TB: a more natural partitioning level?
- Clearly some work is needed on practical VLDB issues…
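The limits on this slide can be checked with a few divisions. A minimal sketch, assuming decimal units and the slide's own figures (2^16 partitions per table, 2^16 files per DB, ~10 TB per day); the helper names are illustrative.

```python
# Back-of-envelope checks for the 2^16 limits quoted on the slide.
# Assumptions: decimal units (1 TB = 1000 GB) and ~10 TB of RAW data per day,
# as on the slide. Both limits are 2^16, so one helper serves for partitions
# per table and datafiles per database.
LIMIT = 2**16
GB_PER_TB = 1000


def min_unit_size_gb(size_tb: float, max_units: int = LIMIT) -> float:
    """Smallest average partition/file size (GB) that keeps size_tb within max_units."""
    return size_tb * GB_PER_TB / max_units


def hours_of_data(size_tb: float, tb_per_day: float = 10.0) -> float:
    """Hours of data-taking needed to accumulate size_tb at tb_per_day."""
    return size_tb / tb_per_day * 24


for label, size_tb in (("65 TB", 65), ("100 TB", 100), ("10 PB", 10_000)):
    print(f"{label:>6}: needs >= {min_unit_size_gb(size_tb):7.2f} GB per partition/file")

print(f"1 TB ≈ {hours_of_data(1):.1f} hours of data-taking at 10 TB/day")
```

A 100 TB DB already needs files slightly larger than 1 GB to stay within 2^16 datafiles, and 10 PB needs units of ~150 GB, which is the concern behind "(too low?)". The 2.4 hours for 1 TB is of the same order as the slide's ~3 hours.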

Oracle Deployment (diagram; recoverable labels)
- DAQ cluster: current data, no history
- Export tablespaces to the RAW cluster; RAW to/from MSS
- ESD cluster: 1/year? Or 1 in total?
- AOD/TAG: 1 in total? To and from the Regional Centres (RCs)
- Activities shown: reconstruction, 'shift' analysis

100 TB cluster testbed
- BT (British Telecom) has an ~80 TB Oracle DB today
- Visit arranged for July 31
- Other VLDB sites will also be visited, e.g. Deutsche Telekom (DB2), DOCOMO, …

100 TB RAC
- Assume 500 GB drives at 50 MB/s
- 10 TB = 20 drives, giving ~1 GB/s aggregate (20 × 50 MB/s), i.e. ~8 Gbit/s: far more than 1 Gbit Ethernet carries
- Probably need 10 Gbit Ethernet to allow for striping
- 100 TB = 10 × 20 drives
- Today's DB servers are on Gbit Ethernet
- Technology predictions suggest 10 / 100 Gbit Ethernet by the start of LHC production
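The drive and link counts follow from a few multiplications. This sketch redoes them under the slide's 500 GB / 50 MB/s drive assumption, using decimal units and nominal Ethernet line rates with no protocol overhead (an optimistic simplification); the function names are illustrative.

```python
# Disk and network arithmetic behind the 100 TB RAC slide.
# Assumptions: decimal units, 500 GB / 50 MB/s drives (slide figures), and
# nominal Ethernet line rates with no protocol overhead.
import math

DRIVE_GB = 500
DRIVE_MB_S = 50


def drives_for(capacity_tb: float) -> int:
    """Number of drives needed to hold capacity_tb."""
    return math.ceil(capacity_tb * 1000 / DRIVE_GB)


def aggregate_gbit_s(n_drives: int) -> float:
    """Streaming bandwidth of n drives in Gbit/s (MB/s * 8 / 1000)."""
    return n_drives * DRIVE_MB_S * 8 / 1000


def links_needed(gbit_s: float, link_gbit_s: float) -> int:
    """Ethernet links of a given speed needed to carry gbit_s."""
    return math.ceil(gbit_s / link_gbit_s)


n = drives_for(10)  # one 10 TB node
bw = aggregate_gbit_s(n)
print(f"10 TB node: {n} drives, ~{bw:.0f} Gbit/s aggregate")
print(f"  1 Gbit Ethernet links needed: {links_needed(bw, 1)}")
print(f" 10 Gbit Ethernet links needed: {links_needed(bw, 10)}")
print(f"100 TB cluster: {drives_for(100)} drives in total")
```

Under these assumptions a single 10 Gbit Ethernet link is just enough for one 10 TB node streaming from all 20 drives, whereas 1 Gbit Ethernet would need eight links.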

Why Cluster?
Separate DBs
- Simple, no cluster h/w or s/w
- Individual nodes (DBs) can be maintained independently
- Need an additional layer to find the DB
- Machines serving inactive data sit idle
- Each node is a single point of failure
Cluster
- Additional complexity, cost
- Entire cluster must be upgraded together
- No additional s/w layer
- All nodes used all of the time(?)
- Shared cache
- Reliability increases with additional nodes
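The reliability lines ("each node is a single point of failure" vs "reliability increases with additional nodes") can be made concrete with a simple independent-failure model. This is a sketch under loudly stated assumptions not taken from the slide: each node is up with a hypothetical probability p, failures are independent, and shared storage and interconnect never fail.

```python
# Idealized availability of "separate DBs" vs a shared-disk cluster.
# Assumptions (not from the slide): every node is up with probability p,
# node failures are independent, and shared storage/interconnect never fail.


def separate_db_availability(p: float) -> float:
    """A dataset served by one standalone node is reachable only if that node is up."""
    return p


def cluster_availability(p: float, n: int) -> float:
    """With shared disk and cache, any surviving node can serve the data:
    the database is reachable unless all n nodes are down at once."""
    return 1 - (1 - p) ** n


p = 0.99  # hypothetical per-node availability
for n in (1, 2, 4, 10):
    print(f"n={n:2d}: separate DB {separate_db_availability(p):.4f}, "
          f"cluster {cluster_availability(p, n):.8f}")
```

Because the model ignores failures of shared components and the extra complexity listed under "Cluster", it gives an optimistic bound for the cluster column.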

Size of the Largest RDBMS in Commercial Use for DSS (chart, in terabytes, including sizes projected by respondents; source: Database Scalability Program 2000)

Decision Support (2000)
Company                   DB size* (TB)  DBMS partner  Server partner  Storage partner
SBC                               10.50  NCR           NCR             LSI
First Union Nat. Bank              4.50  Informix      IBM             EMC
Dialog                             4.25  Proprietary   Amdahl          EMC
Telecom Italia (DWPT)              3.71  IBM           IBM             Hitachi
FedEx Services                     3.70  NCR           NCR             EMC
Office Depot                       3.08  NCR           NCR             EMC
AT&T                               2.83  NCR           NCR             LSI
SK C&C                             2.54  Oracle        HP              EMC
NetZero                            2.47  Oracle        Sun             EMC
Telecom Italia (DA)                2.32  Informix      Siemens         TerraSystems
*Database size = sum of user data + summaries and aggregates + indexes

Transaction Processing (2000)
Company                   DB size* (TB)  DBMS partner  Server partner  Storage partner
Telstra                           10.36  IBM           IBM, Hitachi    IBM
British Telecom                    8.45  CA            IBM             EMC
United Parcel Service              7.88  IBM           IBM             EMC
Experian                           3.14  IBM           Hitachi         EMC
US Customs Service                 2.70  CA            IBM             Hitachi
Korea Telecom (KT ICIS)            2.26  Oracle        Compaq          StorageTek
Dacom System Tech.                 1.80  Oracle        Pyramid         Seagate
CheckFree                          1.35  IBM           IBM             IBM
Centrelink                         1.27  CCA           IBM             IBM
LG TelCom                          1.13  Oracle        HP              EMC
*Database size = sum of user data + summaries and aggregates + indexes

Summary
- ~100 TB DBs (in the Oracle sense) will be fully supported by mainstream vendors on LHC timescales
- The gap between our requirements and those of commercial firms is narrowing fast