Presentation on theme: "CERN/IT/DB A Strawman Model for using Oracle for LHC Physics Data Jamie Shiers, IT-DB, CERN."— Presentation transcript:

1 CERN/IT/DB A Strawman Model for using Oracle for LHC Physics Data Jamie Shiers, IT-DB, CERN

2 Overview
 Focus on scalability & deployment aspects
 Implicit assumption that OCCI / OTT can provide the needed functionality
 Learn from experience with Objectivity/DB deployment in LAN & WAN

3 Basic Concepts
 An Oracle Database refers to the datafiles & server processes on a single system or cluster
 User applications can access as many Oracle Databases as required
 Different roles / schemas / transaction boundaries etc. are all supported out of the box
 Oracle is deployed today at the 1-100TB level

4 LHC Datatypes / Volumes
 RAW: 1PB / year
 ESD: ~100TB / year
 AOD: ~10TB / year
 TAG: ~100GB-1TB / year

5 LHC Datatypes & Oracle
 RAW: 1PB/yr → ~1 ‘DB’ / month
 ESD: ~100TB/yr → ~1 ‘DB’ / year
 AOD: ~10TB/yr → ~1 ‘DB’
 TAG: ~100GB-1TB/yr → ~1 ‘DB’, combined with AOD
Maybe it is possible to soften these to ~1 ‘DB’ for all ESD, but would there be a strong advantage? Different ‘DB’s have different access patterns, access control, schema, etc. Navigation between DBs is fully supported (links).
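As a quick sanity check, the datatype-to-‘DB’ mapping above implies a fairly uniform per-‘DB’ size. A minimal Python sketch (rates and granularities taken from the slide; decimal units and the ~1-‘DB’-per-month choice for RAW assumed):

```python
# Implied size of one 'DB' per LHC datatype (decimal TB assumed).
PB = 1000  # TB per PB

rates_tb_per_year = {"RAW": 1 * PB, "ESD": 100, "AOD": 10, "TAG": 1}
dbs_per_year = {"RAW": 12, "ESD": 1, "AOD": 1, "TAG": 1}  # ~1 'DB'/month for RAW

for dtype, rate in rates_tb_per_year.items():
    print(f"{dtype}: ~{rate / dbs_per_year[dtype]:.0f} TB per 'DB'")
```

Each ‘DB’ then lands in the ~1-100TB range where Oracle is already deployed today.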

6 A 100TB Oracle DB
 Single machine or cluster?
 Oracle stresses “Real Application Clusters” with Oracle 9i: a set of commodity systems vs. a ‘datacenter’-style server
 Today’s Objy servers have ~1TB of disk accessible through 1 network connection
 Scale to a cluster of O(10) systems with O(100TB) of disk? Seems plausible…
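The O(10)-system estimate above follows directly if each node is scaled up from today’s ~1TB servers to roughly 10TB of disk. A one-line sketch (the 10TB/node figure is an assumed scaled-up value, not from the slide):

```python
import math

# Nodes needed for a 100 TB cluster, assuming ~10 TB of disk per node
# (a hypothetical ten-fold scale-up of today's ~1 TB servers).
target_tb = 100
tb_per_node = 10
nodes = math.ceil(target_tb / tb_per_node)
print(nodes, "nodes")  # -> 10 nodes, i.e. O(10) systems
```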

7 Cluster Architecture
[Diagram: users reach clustered database servers over a hub or switch fabric network; the servers are linked by a low-latency interconnect (VIA or proprietary) and connect through a high-speed switch or interconnect to a mirrored disk subsystem on a storage area network, with a centralized management console. No single point of failure. Tagline: “Drive and Exploit Industry Advances in Clustering”.]

8 Cache Fusion
 Full Cache Fusion: cache-to-cache data shipping
 Shared cache eliminates slow I/O
 Enhanced IPC
 Allows flexible and transparent deployment
[Diagram: users served from a single shared cache spanning the cluster via Cache Fusion.]

9 O.R.A.C.
 Certified Intel configurations from a number of vendors…
 COMPAQ: PIII Xeon 700MHz, 4P, 4GB
 FastTango: Oracle 9i cluster on Linux
 Obtaining information from these and other vendors on suitable evaluation configurations…

10 100TB DB
 RAW: ~10 tables; assume 1 (worst case)
 Tables can be split into partitions
 65TB / 2^16 partitions ≈ 1GB / partition
   Partitions are stored in tablespaces
   Tablespaces are composed of sets of files
 # partitions is no problem for a 100TB DB; OK for 10PB?
 1024 open files; 2^16 files / DB (today) (too low?)
 1TB = 100 × 10GB files = ~3 hours of data-taking
 1 day = ~10TB: a more natural partitioning level?
 Clearly some work remains on practical VLDB issues…
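The partition and file counts above can be checked with a short back-of-envelope script (decimal units, 1 TB = 1000 GB, and the per-table limits as cited on the slide):

```python
# Sanity check of the partitioning arithmetic above.
MAX_PARTITIONS = 2 ** 16      # per-table partition count cited above
raw_gb = 65 * 1000            # 65 TB of RAW in a single table (worst case)
gb_per_partition = raw_gb / MAX_PARTITIONS
print(f"{gb_per_partition:.2f} GB / partition")   # ~1 GB each

file_gb = 10
files_per_tb = 1000 // file_gb        # 1 TB = 100 files of 10 GB
hours_per_tb = 3                      # ~3 hours of data-taking per TB
tb_per_day = 24 / hours_per_tb        # ~8 TB/day, close to the ~10 TB/day above
print(files_per_tb, "files/TB;", tb_per_day, "TB/day")
```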

11 Oracle Deployment
[Diagram: the DAQ cluster holds current data only (no history) and exports tablespaces to a RAW cluster, which stages data to/from the MSS; reconstruction feeds an ESD cluster (1/year? 1 total?) used for ‘shift’ analysis, with data flowing to/from the Regional Centres; AOD/TAG (1 total?) is shipped to the RCs.]

12 100TB cluster testbed
 BT has an ~80TB Oracle DB today
 A visit is arranged for July 31
 Other VLDB sites will also be visited, e.g. Deutsche Telekom (DB2), DOCOMO, …

13 100TB RAC
 Assume 500GB disks @ 50MB/s
 10TB = 20 drives; need 1GB/s aggregate (~8 Gbit/s)
 Probably need 10Gbit Ethernet to allow for striping
 100TB = 10 × 20 drives
 Today’s DB servers are on Gbit Ethernet
 Technology predictions suggest 10 / 100 Gbit Ethernet by the start of LHC production
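The drive-count and bandwidth figures above follow from the assumed per-drive numbers; a minimal sketch (slide assumptions only, decimal units):

```python
# Sketch of the disk and network arithmetic above.
disk_gb = 500          # per-drive capacity assumed on the slide
disk_mb_s = 50         # per-drive streaming rate assumed on the slide

drives_10tb = 10 * 1000 // disk_gb      # 20 drives hold 10 TB
agg_mb_s = drives_10tb * disk_mb_s      # 1000 MB/s, i.e. ~1 GB/s aggregate
gbit_s = agg_mb_s * 8 / 1000            # ~8 Gbit/s -> wants 10 Gbit Ethernet
drives_100tb = 10 * drives_10tb         # 200 drives for 100 TB
print(drives_10tb, agg_mb_s, gbit_s, drives_100tb)
```

The ×8 bits-per-byte conversion is what pushes the requirement past Gbit Ethernet toward 10Gbit.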

14 Why Cluster?
Separate DBs:
 Simple; no cluster h/w or s/w
 Individual nodes (DBs) can be maintained independently
 Need an additional layer to find the right DB
 Machines serving inactive data sit idle
 Each node is a single point of failure
Cluster:
 Additional complexity, cost
 Entire cluster must be upgraded together
 No additional s/w layer
 All nodes used all of the time(?)
 Shared cache
 Reliability increases with additional nodes

15 Size of the Largest RDBMS in Commercial Use for DSS
[Chart: the largest commercial DSS RDBMS grows from ~3TB (1996) to ~50TB (2000), with ~100TB projected by respondents for 2005. Source: Database Scalability Program 2000.]

16 Decision Support (2000)

Company                  DB Size* (TB)  DBMS Partner  Server Partner  Storage Partner
SBC                      10.50          NCR           —               LSI
First Union Nat. Bank     4.50          Informix      IBM             EMC
Dialog                    4.25          Proprietary   Amdahl          EMC
Telecom Italia (DWPT)     3.71          IBM           —               Hitachi
FedEx Services            3.70          NCR           —               EMC
Office Depot              3.08          NCR           —               EMC
AT&T                      2.83          NCR           —               LSI
SK C&C                    2.54          Oracle        HP              EMC
NetZero                   2.47          Oracle        Sun             EMC
Telecom Italia (DA)       2.32          Informix      Siemens         TerraSystems

*Database size = sum of user data + summaries and aggregates + indexes

17 Transaction Processing (2000)

Company                  DB Size* (TB)  DBMS Partner  Server Partner  Storage Partner
Telstra                  10.36          IBM           IBM, Hitachi    IBM
British Telecom           8.45          CA            IBM             EMC
United Parcel Service     7.88          IBM           —               EMC
Experian                  3.14          IBM           Hitachi         EMC
US Customs Service        2.70          CA            IBM             Hitachi
Korea Telecom (KT ICIS)   2.26          Oracle        Compaq          StorageTek
Dacom System Tech.        1.80          Oracle        Pyramid         Seagate
CheckFree                 1.35          IBM           —               —
Centrelink                1.27          CCA           IBM             —
LG TelCom                 1.13          Oracle        HP              EMC

*Database size = sum of user data + summaries and aggregates + indexes

18 Summary
 ~100TB DBs (in the Oracle sense) will be fully supported by mainstream vendors on LHC timescales
 The gap between our requirements & those of commercial firms is narrowing fast

