Published by Beatrice Ross. Modified over 9 years ago.
1
DIKW based on HPC-ABDS to integrate streaming and batch Big Data
Source: Internet of Things (Smart Grid), feeding a Pub-Sub layer
Streaming Processing (Iterative MapReduce): Storm
Batch Processing (Iterative MapReduce)
Archival Storage – NoSQL like HBase
Pipeline stages: Raw Data, Data, Information, Knowledge, Wisdom, Decisions
System Orchestration / Dataflow / Workflow
Hosted on: Cloud
2
Data Ingest, feeding a Pub-Sub layer
Streaming Processing (Bolts): Storm
Batch Processing (MapReduce)
Archival Storage – Accumulo
Pipeline stages: Raw Data, Data, Information, Knowledge, Wisdom, Decisions
System Orchestration / Dataflow / Workflow
3
Big Data | HPC
4
HPC-ABDS Integrated Software
Layer: Big Data ABDS | HPC, Cluster
Orchestration: Crunch, Tez, Cloud Dataflow | Kepler, Pegasus
Libraries: MLlib/Mahout, R, Python | Matlab, Eclipse, Apps
High Level Programming: Pig, Hive, Drill | Domain-specific Languages
Platform as a Service: App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages: Java, Erlang, SQL, SPARQL | Fortran, C/C++
Streaming: Storm, Kafka, Kinesis
Parallel Runtime: MapReduce | MPI/OpenMP/OpenCL
Coordination: ZooKeeper
Caching: Memcached
Data Management: HBase, Neo4j, MySQL | iRODS
Data Transfer: Sqoop | GridFTP
Scheduling: YARN | Slurm
File Systems: HDFS, Object Stores | Lustre
Formats: Thrift, Protobuf | FITS, HDF
Virtualization: OpenStack | Docker, SR-IOV
Infrastructure: CLOUDS | SUPERCOMPUTERS
5
HPC-ABDS Integrated Software
Layer: Big Data ABDS | HPC, Cluster
17. Orchestration: Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
16. Libraries: MLlib/Mahout, R, Python | ScaLAPACK, PETSc, Matlab
15A. High Level Programming: Pig, Hive, Drill | Domain-specific Languages
15B. Platform as a Service: App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages: Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
14B. Streaming: Storm, Kafka, Kinesis
13, 14A. Parallel Runtime: Hadoop, MapReduce | MPI/OpenMP/OpenCL
2. Coordination: ZooKeeper
12. Caching: Memcached
11. Data Management: HBase, Accumulo, Neo4j, MySQL | iRODS
10. Data Transfer: Sqoop | GridFTP
9. Scheduling: YARN | Slurm
8. File Systems: HDFS, Object Stores | Lustre
1, 11A. Formats: Thrift, Protobuf | FITS, HDF
5. IaaS: OpenStack, Docker | Linux, Bare-metal, SR-IOV
Infrastructure: CLOUDS | SUPERCOMPUTERS
CUDA, Exascale Runtime
6
Initial Convergence Software
Layer: Big Data ABDS | HPC, Cluster
Orchestration: Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
Libraries: MLlib/Mahout, R, Python | ScaLAPACK, PETSc, Matlab
High Level Programming: Pig, Hive, Drill | Domain-specific Languages
Platform as a Service: App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages: Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
Streaming: Storm, Kafka, Kinesis
Parallel Runtime: Hadoop, MapReduce | MPI/OpenMP/OpenCL
Coordination: ZooKeeper
Caching: Memcached
Data Management: HBase, Accumulo, Neo4j, MySQL | iRODS
Data Transfer: Sqoop | GridFTP
Scheduling: Mesos, Aurora, YARN | Slurm
File Systems: HDFS, Object Stores | Lustre
Formats: Thrift, Protobuf | FITS, HDF
IaaS: OpenStack, Docker | Linux, Bare-metal, SR-IOV
Infrastructure: CLOUDS | SUPERCOMPUTERS
CUDA, Exascale Runtime
10
4 Forms of MapReduce
(1) Map Only: BLAST Analysis, Local Machine Learning, Pleasingly Parallel
(2) Classic MapReduce: High Energy Physics (HEP) Histograms, Distributed search, Recommender Engines
(3) Iterative Map Reduce or Map-Collective: Expectation maximization, Clustering e.g. K-means, Linear Algebra, PageRank
(4) Point to Point or Map-Communication: Classic MPI, PDE Solvers and Particle Dynamics, Graph Problems
Diagram labels: Input, map, reduce, Iterations, Output, Local, Graph
Runtimes: MapReduce and Iterative Extensions (Spark, Twister) | MPI, Giraph
Integrated Systems such as Hadoop + Harp with Compute and Communication model separated
Correspond to first 4 of Identified Architectures
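The iterative form (3), with K-means as its named example, can be illustrated with a minimal single-process sketch. It is not taken from the slides and does not use Hadoop, Harp, Spark, or Twister; the function names (`map_phase`, `reduce_phase`, `kmeans`) and the sample points are hypothetical. Each iteration runs a map step per data partition that emits partial sums, then a reduce (collective) step that combines them into new centroids.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def map_phase(partition, centroids):
    """Map: for one partition, return {centroid index: (vector sum, count)}."""
    dim = len(centroids[0])
    sums = {}
    for point in partition:
        k = nearest(point, centroids)
        vec, count = sums.get(k, ([0.0] * dim, 0))
        sums[k] = ([v + p for v, p in zip(vec, point)], count + 1)
    return sums


def reduce_phase(partials, centroids):
    """Reduce: merge partial sums from all partitions into new centroids."""
    dim = len(centroids[0])
    updated = []
    for k, old in enumerate(centroids):
        total, count = [0.0] * dim, 0
        for part in partials:
            if k in part:
                vec, n = part[k]
                total = [t + v for t, v in zip(total, vec)]
                count += n
        # Keep the old centroid if no point was assigned to it.
        updated.append(tuple(t / count for t in total) if count else tuple(old))
    return updated


def kmeans(partitions, centroids, iterations=10):
    """Iterative MapReduce: repeat map + reduce, feeding centroids back in."""
    for _ in range(iterations):
        partials = [map_phase(p, centroids) for p in partitions]
        centroids = reduce_phase(partials, centroids)
    return centroids


# Hypothetical data: two partitions, as if held by two map tasks.
partitions = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
result = kmeans(partitions, [(0.0, 0.0), (10.0, 10.0)])
print(result)
```

In a distributed setting the list comprehension over partitions would be the parallel map tasks, and `reduce_phase` would be the collective step whose result is broadcast back for the next iteration.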
11
(5) Map Streaming: maps, brokers, Events
(6) Shared memory Map Communicates: Map & Communicate, Shared Memory
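The Map Streaming form (5) chains map tasks through brokers that carry events. A minimal in-process sketch of that shape follows; it is an illustration only and does not use the Storm or Kafka APIs. The `Broker` class, the `run_map` helper, the topic names, and the readings are all hypothetical.

```python
from collections import defaultdict, deque


class Broker:
    """Toy publish-subscribe broker: one FIFO queue of events per topic."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, topic, event):
        self.queues[topic].append(event)


def run_map(broker, in_topic, out_topic, fn):
    """Drain in_topic, apply fn to each event, publish results to out_topic."""
    handled = 0
    while broker.queues[in_topic]:
        event = broker.queues[in_topic].popleft()
        broker.publish(out_topic, fn(event))
        handled += 1
    return handled


broker = Broker()
for reading in (1, 2, 3):  # hypothetical sensor events
    broker.publish("raw", reading)

# Two chained maps, connected only through broker topics.
run_map(broker, "raw", "scaled", lambda x: x * 2)
run_map(broker, "scaled", "flagged", lambda x: x > 4)

flags = list(broker.queues["flagged"])
print(flags)
```

A real streaming system would run each map continuously and in parallel; here each map simply drains its input topic once so the data flow is easy to follow.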
12
6 Data Analysis Architectures
(1) Map Only: BLAST Analysis, Local Machine Learning, Pleasingly Parallel
(2) Classic MapReduce: High Energy Physics (HEP) Histograms, Web search, Recommender Engines
(3) Iterative Map Reduce or Map-Collective: Expectation maximization, Clustering, Linear Algebra, PageRank
(4) Point to Point or Map-Communication: Classic MPI, PDE Solvers and Particle Dynamics, Graph
(5) Map Streaming: Streaming images from Synchrotron sources, Telescopes, IoT
(6) Shared memory Map Communicates: Difficult to parallelize asynchronous parallel Graph Algorithms
Runtimes: MapReduce and Iterative Extensions (Spark, Twister) | MPI, Giraph | Apache Storm
Harp – Enhanced Hadoop
Maps are Bolts
Classic Hadoop in classes 1) 2)
15
[Figure: K-means Clustering. Time (seconds) and Efficiency plotted against number of cores.]
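The slide does not state how efficiency is computed. Assuming the conventional definition (speedup is serial time divided by parallel time, and efficiency is speedup divided by core count), the quantity on such a plot can be derived from timings as sketched below. The timing values are hypothetical and are not read from the slide.

```python
def speedup(t_serial, t_parallel):
    """Speedup: time on one core divided by time on p cores."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, cores):
    """Parallel efficiency: speedup divided by the number of cores."""
    return speedup(t_serial, t_parallel) / cores


# Hypothetical timings in seconds, keyed by core count (not from the slide).
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}

table = {p: efficiency(timings[1], t, p) for p, t in timings.items()}
for cores, eff in table.items():
    print(f"{cores} cores: {timings[cores]:.1f} s, efficiency {eff:.3f}")
```

An efficiency of 1.0 means perfect scaling; values falling as cores increase indicate growing communication or synchronization overhead.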
17
Software-Defined Distributed System (SDDS) as a Service includes:
Infrastructure IaaS: Software Defined Computing (virtual Clusters); Hypervisor, Bare Metal; Operating System
Platform PaaS: Cloud e.g. MapReduce; HPC e.g. PETSc, SAGA; Computer Science e.g. Compiler tools, Sensor nets, Monitors
Network NaaS: Software Defined Networks; OpenFlow GENI
Software (Application Or Usage) SaaS: Use HPC-ABDS; Class Usages e.g. run GPU & multicore; Applications; Control Robot
SDDS-aaS Tools: Provisioning; Image Management; IaaS Interoperability; NaaS, IaaS tools; Expt management; Dynamic IaaS NaaS; DevOps
CloudMesh is an SDDSaaS tool that uses Dynamic Provisioning and Image Management to provide custom environments for general target systems. It involves (1) creating, (2) deploying, and (3) provisioning one or more images on a set of machines on demand. http://mycloudmesh.org/
Dynamic Orchestration and Dataflow