
1 From Stratosphere to Apache Flink, From Big Data Center to Digital Future
Odej Kao, Technische Universität Berlin, Distributed IT Systems; Dean of Faculty EECS; Director tubIT - IT Service Center. Struga, 23rd of September 2016

2 More and more data is available to science and business!
Drivers: Cloud Computing, Internet of Services, Internet of Things, Cyber-physical Systems. Data sources: sensor data, web archives, video streams, simulation data, audio streams, RFID data. Underlying trends: connectivity, collaboration, computer-generated data. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

3 Data-driven applications …
e-sciences, lifecycle management, health, home automation, water management, Industry 4.0, traffic management, market research, energy management, autonomous driving. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

4 Smart Cities from my perspective
Sensor -> Big Data Analytics -> Reaction, with a guaranteed response time < x. Make it reliable and predictive => tactile internet, 5G, ...; make it shorter => edge computing; make it faster => next-generation analytics engine.

5 How to analyse the data? It started with very few systems …
Higher-Level Language -> Parallel Programming Model -> Execution Engine:
Hadoop Stack (Yahoo!, Facebook): Pig, Jaql, Hive -> Map/Reduce -> Hadoop
Dryad Stack (Microsoft): Scope, DryadLINQ -> Dryad -> Dryad
Stratosphere Stack (TU, HU, HPI): SIMPLE/Sopremo -> PACT -> Nephele
Asterix Stack (UCI, UCR, UCSD): AQL -> Algebricks -> Hyracks
Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

6 Data & Analysis: More and More Complex!
Volume: data volume too large. Velocity: data rate too fast. Variability: data too heterogeneous. Veracity: data too uncertain. Data analysis spectrum: Reporting (aggregation, selection), Ad-hoc queries (SQL, XQuery), Integration (map/reduce), Data mining (Matlab, R, Python), Predictive/Prescriptive (Matlab, R, Python). Key challenges: scalability of the DM/ML algorithms. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

7 ... and more: Apache Big Data Stack
Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

8 Interesting questions
Streaming vs. batch processing? Which hardware: HPC vs. HTC vs. cloud? Which components to choose? Value through interdisciplinary collaboration. Virtualization: HPC vs. Docker vs. OpenStack (OpenNebula). Apache Beam vs. Kepler for orchestration, and lots of other HPC vs. "Apache" or "Apache vs. Apache" choices, e.g. Beam vs. Crunch vs. NiFi. Which language should be used: Python/R/Matlab, C++, Java, ...? The HPC-ABDS collection lists 350 software systems, so there is lots of choice; the HPC simulation stack, in contrast, is well defined and highly optimized, and the user makes few choices. © Geoffrey Fox. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

9 Stratosphere from 10,000 feet
Machine learning, graph analysis, SQL analysis, and more, on top of Stratosphere ("Above the Clouds"); the data is stored in the Hadoop file system. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

10 Efficient Parallel Data Processing in the Cloud, ACM-MTAGS
Timeline, starting with the paper "Efficient Parallel Data Processing in the Cloud" (ACM MTAGS).

11 Stratosphere from 10,000 feet
Stratosphere program (data flow) -> program compiler and Stratosphere optimizer (picks data shipping and local strategies and the operator order) -> execution plan / job graph / execution graph -> runtime (hash- and sort-based out-of-core operator implementations, memory management) -> parallel runtime (task scheduling, network data transfers, resource allocation). Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

12 Programs are Data Flows
Input (Source) -> Transformations (Map, Join, Reduce, ...) -> Output (Sink). Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin
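Such a dataflow can be written down directly. A minimal sketch, assuming the Apache Flink Scala DataSet API that grew out of the Stratosphere Scala frontend (illustrative, not code from the slides): a source, a Map-style transformation, a Reduce-style aggregation, and a sink.

```scala
import org.apache.flink.api.scala._

object DataflowSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Source
    val lines = env.fromElements("to be or not to be", "big data is big")

    // Transformations: Map-style tokenization, Reduce-style aggregation
    val counts = lines
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .groupBy(0)
      .sum(1)

    // Sink
    counts.print()
  }
}
```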

13 Operators Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

14 Parallelization Contracts (PACTs)
A PACT pairs a second-order function with a first-order function (the user code). The second-order function describes how the input is partitioned into groups ("what is processed together"); the first-order UDF is called once per input group. Map PACT: each input record forms its own group, so each record is processed independently by the UDF. Reduce PACT: one attribute is the designated key, and all records with the same key value form a group. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin
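To make the difference concrete, a small sketch in the Flink Scala DataSet API (data and names are made up for illustration): map invokes the UDF once per record, while groupBy followed by reduceGroup invokes the UDF once per key group, mirroring the Map and Reduce PACTs.

```scala
import org.apache.flink.api.scala._

object PactSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val sales = env.fromElements(("berlin", 10), ("munich", 5), ("berlin", 7))

    // Map PACT: every record is its own group; the UDF sees one record at a time.
    val withTax = sales.map { case (city, amount) => (city, amount * 1.19) }

    // Reduce PACT: the first field is the key; all records with the same key
    // form one group, and the UDF is called once per group.
    val perCity = withTax
      .groupBy(0)
      .reduceGroup { records: Iterator[(String, Double)] =>
        val list = records.toList
        (list.head._1, list.map(_._2).sum)
      }

    perCity.print()
  }
}
```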

15 The execution engine – Nephele
Shared resource management: abandon the assumption that the execution engine "owns" its nodes; instead, nodes are temporarily "leased" from an IaaS cloud. The job must express the tasks' data dependencies (which task's input is required as which task's output), which is needed to safely terminate virtual machines. Mapping between tasks and VM types: which task shall run on which type of virtual machine? This information could be provided by the programmer. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

16 Nephele features
Exploiting the cloud's elasticity: detecting parallelization constraints, scale in / scale out. Mitigating I/O bottlenecks: adaptive online compression. Inferring physical network topologies: cloud topology inference. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

17 Nephele elasticity
Standard master/worker pattern on an IaaS cloud, with worker allocation on demand as the workload changes over time. The client connects over the public network (Internet) to the cloud management interface; master and workers run in a private/virtualized network with access to persistent storage. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

18 Nephele Job Graphs
A job graph (tasks A–F) is expanded into an execution graph for parallel execution. Tasks consume data streams and produce data streams; they are connected by channels (e.g. network channels and in-memory channels), which are spanned according to a "distribution pattern". Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

19 Exploiting Elasticity
Which degree of parallelization is suitable for which task? This is hard to anticipate for arbitrary user code and must be assessed online. Bottlenecks can be CPU bottlenecks or I/O bottlenecks and show up locally, e.g. in the average CPU utilization of a task and its inputs and outputs: parallel instances may no longer be fed fast enough, and the overall utilization depends on the task's complexity, its degree of parallelism and their relationship, which is hard to estimate in advance. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

20 Bottleneck Detection Profiling component runs on every worker node
Profiling provides: pt(vi), the percentage of time that parallel instance i of vertex v actually used its allotted CPU time during the last t seconds (assuming sequential user code and independence of the parallel instances); and st(ej), the percentage of time that parallel instance j of edge e was saturated during the last t seconds (channels have bounded capacity). The values of pt(vi) and st(ej) are propagated to the master every t seconds. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

21 Bottleneck Detection Algorithm
LRTS ← ReverseTopologicalSort(G)
for all v in LRTS do
  v.isCpuBottleneck ← IsCPUBottleneck(v, G)
end for
if ∃ v ∈ LRTS : v.isCpuBottleneck then
  Ev ← {(v,w) | w ∈ VG ∧ (v,w) ∈ EG}
  for all e ∈ Ev do
    e.isIoBottleneck ← IsIOBottleneck(e, G)
  end for
end if
Example (vertices 1 -> 2 -> 3 -> 4): pt(v1) = 35%, pt(v2) = 99%, pt(v3) = 10%, pt(v4) = 27%; st(e1) = 99%, st(e2) = 16%, st(e3) = 9%.
CPU bottleneck criteria: pt(v) > α (α = 90%) and no successor vertex of v is a CPU bottleneck. I/O bottleneck criteria: st(e) > β (β = 90%) and no successor edge of e is an I/O bottleneck. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin
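A self-contained sketch of these two criteria applied to the example values above; the graph representation, names, and recursive evaluation are illustrative, not the actual Nephele implementation.

```scala
object BottleneckCriteria {
  val Alpha = 0.90 // CPU utilisation threshold
  val Beta  = 0.90 // channel saturation threshold

  // DAG from the slide's example: 1 -> 2 -> 3 -> 4 with profiled values.
  val cpuUtil    = Map(1 -> 0.35, 2 -> 0.99, 3 -> 0.10, 4 -> 0.27)       // p_t(v)
  val saturation = Map((1, 2) -> 0.99, (2, 3) -> 0.16, (3, 4) -> 0.09)   // s_t(e)
  val edges      = saturation.keys.toSeq

  def succVertices(v: Int): Seq[Int]            = edges.collect { case (`v`, w) => w }
  def succEdges(e: (Int, Int)): Seq[(Int, Int)] = edges.filter(_._1 == e._2)

  // CPU bottleneck: p_t(v) > alpha and no successor vertex is a CPU bottleneck.
  def isCpuBottleneck(v: Int): Boolean =
    cpuUtil(v) > Alpha && succVertices(v).forall(!isCpuBottleneck(_))

  // I/O bottleneck: s_t(e) > beta and no successor edge is an I/O bottleneck.
  def isIoBottleneck(e: (Int, Int)): Boolean =
    saturation(e) > Beta && succEdges(e).forall(!isIoBottleneck(_))

  def main(args: Array[String]): Unit = {
    println(cpuUtil.keys.toSeq.sorted.map(v => s"v$v cpuBottleneck=${isCpuBottleneck(v)}").mkString("\n"))
    println(edges.map(e => s"e$e ioBottleneck=${isIoBottleneck(e)}").mkString("\n"))
  }
}
```

For the example values this marks v2 (99% CPU utilisation) as the CPU bottleneck and e1 (99% saturation) as the I/O bottleneck, matching the figure.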

22 Evaluation (1/2) Evaluation job Properties of job Goal of evaluation
Evaluation job: conversion of an article database, 40 GB of bitmap images, to PDF. Properties of the job: tasks of different computational complexity (File Reader, OCR Task, PDF Creator, PDF Writer, Inverted Index Task, Index Writer); each parallel instance runs on a separate VM (with 1 CPU core); the input data resides on external storage. Goal of the evaluation: find the ideal degree of parallelization for each task. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

23 Evaluation (2/2)
Scaling out as bottlenecks are detected: 4 VMs: 5:10 h -> 7 VMs: 1:15 h -> 22 VMs: 0:25 h -> 23 VMs: 0:24 h; the figure marks a CPU bottleneck early on and an I/O bottleneck at the end. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

24 Topology detection The network is a scarce resource
The network is a scarce resource: it is used for communication among nodes, used by the distributed file system, and possibly shared with other virtual machines. Network performance is therefore hard to predict; the available throughput may change over time and can lead to I/O bottlenecks and starvation. Idea: handle varying I/O performance at the application layer, through adaptive compression and topology detection in the parallel data flow. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

25 Next Generation
Advanced data analysis programs: declarative specification and optimization of iteration and state. Low latency: trading off virtualization. Evolving datasets: first results fast. Algorithms: analyzing and understanding "big data". Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

26 Extended execution engine
Executes stateful, iterative, data-parallel plans; manages state; low-latency stream processing; concurrent queries; seamless integration into the infrastructure (e.g. YARN); iterations. (In Greek mythology, aurae are the nymphs of breezes. Aura handles "fast data", succeeding Stratosphere I's Nephele, which was named after "nephos", the Greek word for cloud.) Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

27 Resource Management and Execution
Architecture: a Task Manager and Execution Manager for resource management and execution, operator implementations (Op1–Op4) for the computation, and a State Manager with physical views (PV1–PV4) and data structures for data/state management. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

28 State Management
Concept: physical views model evolving datasets for data-parallel processing. Unified naming and access to mutable state, streams and static datasets. Description of dataset properties: format, partitioning schema, persistence (stored, cached, live streaming, replicated, ...), access mode (read/write, push/pull, update in place, ...). Mutable datasets for iterations and data sharing.
We propose to represent state and intermediate datasets as physical views (PVs). A physical view can be seen as a lower-level analogue to (materialized) views in relational databases. Like relational views, PVs are named and can be accessed by the runtime engine. Unlike relational views, they are not defined by a SQL statement, but by their physical properties, such as whether the PV is physically materialized (e.g., due to a previous checkpointing decision) or needs to be recomputed, whether the PV is distributed and how (e.g., range-partitioned on a specific key), whether the PV is replicated, or the type of data structure used for local storage (e.g., B+-tree, hash table, heap file, connector to a streaming data source, etc.). In this architecture, an evolving dataset would be represented as one variant of a PV. The PVs are realized through a set of data structures that implement methods of accessing or updating/appending data and represent characteristics like materialization or streaming. The State Manager plays the critical role of governing the resources that store the state and handling the communication to keep the different local data structures consistent. It also decides to move state as required by elasticity or failure/recovery constraints. We envision it to effectively "virtualize" the storage of intermediate results in a similar way as the hardware abstraction virtualizes the computational resources. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin
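For illustration only (these types are hypothetical and not part of Stratosphere or Flink), the dataset properties listed above could be captured in a small descriptor:

```scala
// Illustrative only: hypothetical types sketching a physical-view descriptor.
sealed trait Persistence
case object Stored extends Persistence
case object Cached extends Persistence
case object LiveStream extends Persistence
case object Replicated extends Persistence

sealed trait AccessMode
case object ReadOnly extends AccessMode
case object ReadWrite extends AccessMode
case object UpdateInPlace extends AccessMode

// Unified naming plus the physical properties named on the slide:
// format, partitioning schema, persistence, access mode.
case class PhysicalView(
  name: String,
  format: String,
  partitioning: Option[String],   // e.g. Some("range(vertexId)") or None
  persistence: Persistence,
  accessMode: AccessMode
)

object PhysicalViewExample {
  // A mutable, cached dataset that an iterative job updates in place.
  val workset = PhysicalView(
    name         = "pagerank-workset",
    format       = "binary records",
    partitioning = Some("range(vertexId)"),
    persistence  = Cached,
    accessMode   = UpdateInPlace
  )
}
```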

29 State Management Synchronization
Data reuse in bulk and incremental iterations. Scaling out/down: migrate, modify and rebalance state between nodes. Fault tolerance: replication and recovery in state management. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

30 Resource Management and Execution
Sopremo Execution Plans (SEPs): from a DAG of black-boxed UDFs to stateful, iterative, data-parallel programs. Model, schedule, and execute iterative data flows. Adaptation to varying workload with evolving datasets. Predictable performance: meet pre-defined latency/throughput requirements. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

31 Extensions in Stratosphere II
Basic operators: Map, Reduce, Join, Cross, CoGroup, Union. Extensions in Stratosphere II: Iterate and IterateDelta, iterations that make graph and machine-learning algorithms run really fast. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

32 Stratosphere offers two types of iterations
Bulk iterations: the iterative function is applied to the entire dataset in each superstep, starting from the initial dataset and producing the result. Delta iterations (aka workset iterations): the iterative function maintains state consisting of an initial solution set and an initial workset; only the workset, i.e. the changing part of the data, is recomputed in each superstep. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin
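A minimal bulk-iteration sketch using the Flink Scala DataSet API, adapted from the well-known Pi-estimation example (illustrative, not from the slides); delta iterations additionally carry a solution set and a workset between supersteps.

```scala
import org.apache.flink.api.scala._
import scala.util.Random

object BulkIterationSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Bulk iteration: the step function consumes the entire dataset of the
    // previous superstep and produces the dataset for the next superstep.
    val initial = env.fromElements(0)
    val count = initial.iterate(10000) { previous =>
      previous.map { hits =>
        val x = Random.nextDouble()
        val y = Random.nextDouble()
        hits + (if (x * x + y * y < 1) 1 else 0)
      }
    }

    // Monte-Carlo estimate of Pi from the iteration result.
    count.map(_ / 10000.0 * 4).print()
  }
}
```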

33 Why Delta Iterations?
(Chart: runtime in seconds of the computations performed in each iteration, bulk vs. delta, for the connected communities of a social graph.) Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

34 Automatic Optimization
Execution Plan A vs. Execution Plan B: different plans are optimal for running the same program in different environments and on different data, e.g. on a sample on a laptop vs. on large files on the cluster. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

35 Streaming
Batch-job workloads have the usual goal "minimize time-to-solution", which translates to "maximize throughput". Streaming workloads may have different goals: meet pipeline latency and throughput requirements; minimize pipeline latency without caring about throughput; max/min or other custom metrics. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

36 Meeting Latency Requirements
QoS goal: meet latency constraint X, then maximize throughput. Based on our observations we designed two strategies: adaptive output buffer sizing and dynamic task chaining. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin
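The strategies above were research prototypes on Nephele. As a point of comparison (an assumption about today's tooling, not part of the slides), Apache Flink's streaming API exposes the same latency/throughput trade-off through a network buffer flush timeout and operator chaining:

```scala
import org.apache.flink.streaming.api.scala._

object LatencyTuningSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Ship network buffers after at most 5 ms even if they are not full:
    // lower latency at some cost in throughput (a larger timeout does the reverse).
    env.setBufferTimeout(5)

    env.fromElements("a", "b", "c")
      .map(_.toUpperCase)    // map and filter are chained into one task by
      .filter(_.nonEmpty)    // default, avoiding an extra serialization hop
      .print()

    env.execute("latency-tuning-sketch")
  }
}
```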

37 Latency measurements
Without optimization: latency around 4 s; large buffers lead to "latency waves". With adaptive buffer sizing + task chaining: final latency ≈ 300 ms (≈ 94% improvement). Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

38 National Big Data Center
climbing the next level Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

39 Machine Learning + Data Management = X
Goal: data analysis without system programming! From the ML side: think ML algorithms in a scalable way (feature engineering, representation, algorithms such as SVMs and GPs, error estimation, active sampling, regression, Monte Carlo, statistics, convergence, curse of dimensionality). From the DM side: process iterative algorithms in a scalable way (relational algebra/SQL, data warehouse/OLAP, NF2/XQuery, query optimization, indexing, parallelization, compiler, sketches, hashing, memory management and memory hierarchy, isolation, fault tolerance, resource management, scalability, hardware adaptation). Technology X combines both: declarative languages, automatic adaptation, scalable and declarative processing, a data analysis language, iterative algorithms, control flow and dataflow, linear algebra, mathematical programming.

40 BBDC Goals Developing an integrated, declarative, highly scalable open-source system (Apache Flink)

41 "What", not "how". Example: k-Means Clustering
Declarative data analysis program with automatic optimization, parallelization and hardware adaption.
"What" (Scala frontend, Apache Flink): 65 lines of code, short development time, robust runtime. The slide shows the complete Stratosphere Scala program: it parses the input points and cluster centers, crosses every point with every center to compute distances, groups by point to pick the nearest center, groups by center to average the assigned points into new centers, and wraps these steps in an iterate over the requested number of iterations before writing the final centers.
"How" (Hadoop): 486 lines of code, long development time, non-robust runtime, hand-optimized code (data-, load- and system-dependent). The slide shows the Mahout k-Means implementation spread over Cluster.java, KMeansUtil.java, KMeansMapper.java, KMeansCombiner.java, KMeansReducer.java, KMeansClusterMapper.java, KMeansDriver.java and KMeansJob.java.

42 Big Data Analytics without System Programming! ("What", not "How")
The data analyst gives a description of the "what" (a declarative specification); the machine derives the "how" (the state of the art in scalable data analysis: Map/Reduce, MPI). Benefits: a larger human base of "data scientists", reduction of "human" latencies, cost reduction.

43 BBDC Research Key Achievement: ACM SIGMOD 2015 Research Highlight Award
Stratosphere was accepted as an Apache incubator project (2014) and renamed to Flink. Flink has been a top-level Apache project since 2015. The spin-off DataArtisans drives the further development.

44 Data Center Aware Processing
Improve data processing with Flink on existing middleware and infrastructure features such as virtualization (e.g. OpenStack, YARN), software-defined networking (e.g. OpenDaylight) and advanced network topologies (e.g. hierarchical multi-path). Current research: fault tolerance, improving network throughput, reducing job execution time, mapping the Flink execution graph onto the infrastructure, and improving resource management through network-aware virtualization. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

45 The Challenge of Digitalization
Climbing the next level: the Einstein Center Digital Future. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

46 Changes through digitalization – History repeats
Epochs: industrialization, electrification, communication, IT and automation, digitalization. (Example, electrification: light, drives, transportation; infrastructure: power grids; standards: alternating current, voltage levels.) All epochs share: the brightest minds design the future; various application domains; mobility and daily life change; infrastructures arise; new industrial branches emerge; society is divided into followers and left-behinds. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

47 Learn from the past. Design the future
Attract the brightest minds Listen to the ideas Involve all stakeholders: universities, medical centers, corporations, research institutes, politics Timing: Provide real use cases, real data, real world deployment and evaluation Create excellent research environment Support business and production for sustainability Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

48 Einstein Center Digital Future
ECDF as a platform to research, collaborate, develop and educate: an interdisciplinary collaboration environment and a fast research framework (recruiting, project set-up, show-casing) to match the impressive pace of digitalization. ECDF deployed a private/public partnership to acquire funding for 50 new professorships with supporting PhDs, collaborative projects (2M Euro/year), and a nucleus for future excellence, embedded in a state-wide digital initiative. Funding: donors' endowment, matching funds, collaboration funding. The Einstein Center Digital Future covers collaborative projects, professorships, infrastructure, master studies, public outreach and show cases. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

49 Most cited researchers
Some facts on excellence: 77 Principal Investigators, among them Leibniz Prize holders, GI digital heads, digital champions, and several of the most cited scientists in Germany (h-index 70+). Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

50 Digital methods, algorithms, infrastructures
Research program: the core "digital methods, algorithms, infrastructures" connects the application areas Digital Humanities and Society, Digital Industry and Services, and Digital Health. Two directions: design, improve and adapt digital methods, models and infrastructures; and gain new insights with digital methods, models and infrastructures. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

51 Can we work interdisciplinary?
Wide commitment to the Berlin Digital Agenda: digitalization as the focus of long-term collaborative, interdisciplinary research; a joint master study program on digitalization; joint infrastructure (5G Lab, OpenLabs, Big Data Center, SmartCity testbeds, ...); the majority of ECDF funding goes into collaborative, interdisciplinary projects; a House of Digitalisation with OpenLab, show rooms and a public forum. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

52 Professorships
Core IT: Data Policies/Trust, Distributed Security Infrastructure, 5G / Future Internet, Industry Grade Networks & Clouds, Mobile Cloud Computing, Open and Secure IoT Ecosystem, Physical Principles of IT Security, Real-time Signal Processing for Optical Internet Communication, Secure/reliable network-based system architectures, Semantic Data Intelligence, Statistics / Machine Learning, Terahertz- and Laser-Spectroscopy, Terahertz Sensor Technology, Virtual and Augmented Reality.
Digital Health: Big Data approaches to test genotype-phenotype associations, Bioinformatics and personalized medicine, Biomedical Imaging, Digital Laboratory Diagnostics, E-Health and Shared Decision Allocation, New methods of clinical data documentation and harmonization, New methods of genomics data analysis, Open Cancer Informatics.
Digital Industry and Services: Climate and Traffic, Digital Civil Engineering, Digital Transformation and IT-Infrastructures, Enterprise Architecture Management, Industry 4.0*, Liner-SimLab, Smart Buildings in Smart Cities, Smart Cities / Cognitive Cities, Smart Housing: Networks, Data and Energy, Smart Mobility, Water and Wastewater.
Digital Humanities and Society: Digital Curation & Research Management, Digital Education, Digital Public Spheres*, Digitalization and Multicultural Aspects, Digitalization of the World of Employment, Internet of Things, Self-determination in the Digital Society, Socio-Ecological Transformation and Sustainable Digitalization, Wearable Computing.
Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

53 Announcements (First 18 professorships)
Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

54 Public outreach Show rooms and open labs Town hall and public forums
Show rooms and open labs; town hall and public forums; the Long Night of Sciences (Lange Nacht der Wissenschaften); collaboration with museums, forums, training centers, ... On 11 June 2016, more than 70 universities, research institutes, universities of applied science, and technology-oriented companies in Berlin and at the Telegrafenberg in Potsdam opened their doors between 17:00 and 24:00: about 2,000 programme items, spectacular experiments, lectures, science shows and guided tours through normally inaccessible laboratories, in the event's 16th year. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

55 New formats for research and debate
Open Research & Education: Open Access, Open Source, Open Educational Resources, Open Data. A transdisciplinary research space for experimental prototyping and reflection; a lab with digital production technology (FabLab); an interface to the public with debates, hackathons and meet-ups on current societal challenges linked to digitization; a physical space for informal encounter and coincidence. © Gesche Joost, UdK. Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

56 Thank you Further information www.cit.tu-berlin.de www.bbdc.berlin
be-digital.berlin stratosphere.eu flink.apache.org Contact berlin.de Odej Kao ■ CIT – Complex and Distributed IT Systems, TU Berlin

