Large Scale Computing Systems

1 Large Scale Computing Systems
Data, Computations, Infrastructures
A new era: BIG x 3 (big data, big computations, big infrastructures)
Revisit: algorithms, architectures, distributed systems, parallel computing, scalable DBs

2 Big Data
"Moore's Law" for data: the volume of data doubles every 18 months
90% of today's data was created in the last 2 years
  Facebook: 20 TB/day (compressed)
  CERN/LHC: 40 TB/day (15 PB/year)
  NYSE: 1 TB/day
  Many more: web logs, financial transactions, medical records, etc.
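Doubling every 18 months compounds quickly. The following is a minimal sketch, assuming smooth exponential growth (an idealization; the function name `growth_factor` is hypothetical), that turns the doubling period into a growth factor over any time span.

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Multiplicative growth after `years` if the data volume doubles every `doubling_months`."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings
```

For example, `growth_factor(1.5)` is exactly 2, `growth_factor(3)` is 4, and a single year already gives a factor of about 1.59.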

3 Data Growth
1 EB (exabyte, 10^18 bytes) = 1000 PB (petabyte, 10^15 bytes): US mobile data traffic last year (2010)
0.8 ZB (zettabyte, 10^21 bytes) = 800 EB: entire global mass of digital data in 2009, according to IDC
35 ZB: IDC's forecast for all digital data in 2020

4 MapReduce
A programming model
A software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes
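The programming model can be illustrated with the classic word-count example. This is a single-process sketch, not Hadoop's API: the names `map_phase`, `reduce_phase` and `map_reduce` are hypothetical, and the "shuffle" that a real framework performs across the network is simulated with an in-memory dictionary.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    # Emit an intermediate (word, 1) pair for every word in one input record.
    for word in doc.lower().split():
        yield word, 1


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    # Combine all intermediate values that share the same key.
    return word, sum(counts)


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[tuple[str, int]]],
    reducer: Callable[[str, list[int]], tuple[str, int]],
) -> dict[str, int]:
    # Shuffle: group intermediate values by key (done across nodes in a real cluster).
    groups: defaultdict[str, list[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())
```

For instance, `map_reduce(["big data big", "data centers"], map_phase, reduce_phase)` returns `{"big": 2, "data": 2, "centers": 1}`. Because each `mapper` call and each `reducer` call is independent, a framework can spread them over many compute nodes.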

5 Cloud computing
Big Data pushes databases to their limits
NoSQL databases
  Horizontally scalable, schema-free, multi-datacenter data stores that can handle petabytes of data
  Google's BigTable, Facebook's Cassandra, LinkedIn's Voldemort, Amazon's Dynamo, and many more
Cloud computing
  Virtualized resources from distant data centers
  Elastic, "pay as you go" resource provisioning
  Easy resource manipulation through an API
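Horizontal scalability in stores such as Dynamo and Cassandra relies on partitioning keys across nodes, and both use a form of consistent hashing for this. The sketch below is a hypothetical minimal ring (the class `HashRing`, MD5 ring positions and 100 virtual nodes per server are assumptions, not either system's actual implementation). It shows why adding a server moves only a fraction of the keys, and only onto the new server.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    # Stable ring position from MD5 (Python's built-in hash() is randomized per process).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns several points on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's position.
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_position(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

With three servers and 1000 keys, adding a fourth server typically reassigns roughly a quarter of the keys, whereas naive `hash(key) % num_servers` placement would move most of them.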

6 Big computations
Challenges for exascale computing:
  Scalability up to millions of cores
  Programmability (revisit traditional parallel programming models)
  Fault tolerance (with thousands or millions of nodes, several may fail every day)
  Low power consumption (maximize GFLOPS/W)
It's not High-Performance Computing (HPC) anymore… it's High-Efficiency Computing (HEC)
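The fault-tolerance and power points are simple arithmetic at scale. This is a back-of-the-envelope sketch, assuming independent node failures with a constant rate. The 10-year per-node MTBF and 100,000-node machine in the example are hypothetical figures, and the 20 MW budget is a commonly quoted exascale target, used here only for illustration.

```python
def expected_failures_per_day(num_nodes: int, node_mtbf_years: float) -> float:
    # Independent failures: the system failure rate is the sum of the per-node rates.
    return num_nodes / (node_mtbf_years * 365.0)


def system_mtbf_hours(num_nodes: int, node_mtbf_years: float) -> float:
    # Mean time between failures of the whole machine, in hours.
    return 24.0 / expected_failures_per_day(num_nodes, node_mtbf_years)


def gflops_per_watt(sustained_flops: float, power_watts: float) -> float:
    # Energy efficiency: billions of floating-point operations per second per watt.
    return sustained_flops / 1e9 / power_watts
```

With 100,000 nodes that each fail once per 10 years, the machine sees about 27 failures per day (one every 53 minutes or so). Sustaining 10^18 FLOP/s within 20 MW requires 50 GFLOPS/W.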

7 Exascale applications
Computations on sparse matrices: the heart of scientific and engineering simulations
(Huge) graph algorithms: shortest paths, PageRank, etc.
Regular grids: solving PDEs with millions of unknowns
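Sparse matrices and graph algorithms meet in PageRank, whose power iteration is a repeated sparse matrix-vector product. The sketch below stores only the nonzeros in CSR (compressed sparse row) form and uses it for PageRank. The function names, damping factor 0.85 and fixed iteration count are illustrative assumptions. Dangling pages (no out-links) spread their rank uniformly, one common convention among several.

```python
def csr_matvec(indptr, indices, data, x):
    """Return y = A @ x for a sparse matrix A stored in CSR form."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        # Only the stored nonzeros of this row are touched.
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def pagerank(num_nodes, edges, damping=0.85, iters=100):
    """Power-iteration PageRank; each step is one sparse matrix-vector product."""
    out_deg = [0] * num_nodes
    incoming = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        out_deg[src] += 1
        incoming[dst].append(src)
    # Row i of the link matrix holds 1/out_deg(j) for every edge j -> i.
    indptr, indices, data = [0], [], []
    for i in range(num_nodes):
        for j in incoming[i]:
            indices.append(j)
            data.append(1.0 / out_deg[j])
        indptr.append(len(indices))
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iters):
        spread = csr_matvec(indptr, indices, data, rank)
        # Rank held by pages without out-links is shared uniformly by all pages.
        dangling = sum(r for r, d in zip(rank, out_deg) if d == 0)
        base = ((1.0 - damping) + damping * dangling) / num_nodes
        rank = [base + damping * s for s in spread]
    return rank
```

Each iteration costs time proportional to the number of edges rather than `num_nodes ** 2`, which is what makes graphs with billions of vertices tractable.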

8 Big Infrastructures
OS and architectures revisited
Virtualization
Cloud facilities: datacenters
  Distributed storage: 100s of PBs using commodity disks
  HPC clusters: exascale computing using scalable "ingredients"
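Storing hundreds of petabytes on commodity disks means large disk counts and routine failures. This is a rough sizing sketch. Three-way replication matches the HDFS default; the 4 TB disk size and 2% annual failure rate in the example are hypothetical values chosen only for illustration.

```python
import math


def disks_needed(usable_pb: float, disk_tb: float, replication: int = 3) -> int:
    """Commodity disks required to store `usable_pb` petabytes with N-way replication."""
    raw_tb = usable_pb * 1000.0 * replication  # 1 PB = 1000 TB (decimal units)
    return math.ceil(raw_tb / disk_tb)


def disk_failures_per_day(num_disks: int, annual_failure_rate: float) -> float:
    """Expected disk failures per day, assuming independent failures at a fixed annual rate."""
    return num_disks * annual_failure_rate / 365.0
```

Under these assumptions, 100 PB of usable storage needs 75,000 disks, and roughly four of them fail every day. Replication lets the system survive this without losing data.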

