Distributed Computing with Turing Machine

Turing machine  Turing machines are an abstract model of computation. They provide a precise, formal definition of what it means for a function to be computable.  It is similar to a finite automaton but with an unlimited and unrestricted memory.  Use infinite tape as inputs unlimited memory;  It has a head can read and write symbols and move R/L on the tape;  The tape contains input string and the other tapes is blank.

Distributed computing, illustrated with big data  A big data system is a distributed system: it has a distributed file system named HDFS, which can store large data sets on a cluster and manage them.  If the input is a very large string, then even when the Turing machine computes in polynomial time, it still spends a long time solving the problem on a single machine.

Similarities between big data and the Turing machine  Mass storage: The Turing machine model uses an infinite tape as its unlimited memory, so it can store an arbitrarily long input tape and its instructions. A big data system draws its data from the internet, so its storage is also ample.  Main control system: A Turing machine has a transition function that controls how the head reads, writes, and moves along the tape. A big data system also has a main control system, the NameNode, which manages the DataNodes and lets clients operate on them.

Differences between big data and the Turing machine  Some problems can be solved on a deterministic Turing machine in polynomial time; the running time depends on the size of the input and on the transition function that controls the movement of the head.  All input and all computation happen on the machine's own single tape.  Big data systems use HDFS to manage the data, so the computation can be executed on many computers at once.

HDFS  Distributed File System  Large Data Assets  HDFS Parts NameNode ◦manage the filesystem namespace ◦manages opening, closing, renaming, etc. ◦maps blocks to datanodes DataNodes ◦manage stores (blocks) – create/delete ◦serves reads/writes for data blocks

HDFS: Data loading

Key/Value pairs  Take a collection of (key, value) pairs.  Map them onto a different collection of (key, value) pairs: Map(k1,v1) -> (k2,v2). Shuffling example: the mappers emit (A,1),(B,2),(C,3); (B,3),(A,2),(C,1); (D,1),(C,1),(B,2),(A,5). Shuffled: (A,(1,2,5)) (B,(2,3,2)) (C,(3,1,1)) (D,(1))
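The shuffle step in the slide's example is just "group values by key". A minimal sketch (our own `shuffle` helper, not a framework call) that reproduces the example:

```python
# Sketch of the shuffle step: group mapper outputs by key.
from collections import defaultdict

def shuffle(mapper_outputs):
    grouped = defaultdict(list)
    for pairs in mapper_outputs:          # one list per mapper
        for key, value in pairs:
            grouped[key].append(value)
    return dict(grouped)

mapper_outputs = [
    [("A", 1), ("B", 2), ("C", 3)],
    [("B", 3), ("A", 2), ("C", 1)],
    [("D", 1), ("C", 1), ("B", 2), ("A", 5)],
]
print(shuffle(mapper_outputs))
# {'A': [1, 2, 5], 'B': [2, 3, 2], 'C': [3, 1, 1], 'D': [1]}
```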

Map-Reduce Map-Reduce decomposes a large computing problem into many small blocks. The Map function distributes them to many computers over the network, so every single machine can compute over its own data at the same time. Reduce is a kind of combining step; it depends on the key-value model.

Map-Reduce process 1. In the mapping phase, MapReduce takes the input data and feeds each data element to the mapper. 2. In the reducing phase, the reducer processes all the outputs from the mapper and arrives at a final result. 3. In simple terms, the mapper is meant to filter and transform the input into something that the reducer can aggregate over.
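The three phases above can be shown end to end with the classic word-count example, run in a single process. This is an in-memory sketch of the pattern, not Hadoop code (the function names `mapper`, `reducer`, `map_reduce` are ours):

```python
# Minimal in-process MapReduce word count: map, shuffle, reduce.
from collections import defaultdict

def mapper(line):
    for word in line.split():
        yield word, 1            # emit (key, value) pairs

def reducer(word, counts):
    return word, sum(counts)     # aggregate all values for one key

def map_reduce(lines):
    grouped = defaultdict(list)
    for line in lines:                       # 1. mapping phase
        for key, value in mapper(line):
            grouped[key].append(value)       #    shuffle: group by key
    return dict(reducer(k, v) for k, v in grouped.items())  # 2. reducing phase

print(map_reduce(["big data big", "data big"]))
# {'big': 3, 'data': 2}
```

In a real cluster the mapper calls run on many machines and the shuffle moves pairs across the network, but the logic is exactly this.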

Distributed Task Execution Problem Statement: There is a large computational problem that can be divided into multiple parts, and the results from all parts can be combined to obtain a final result. Case Study: Simulation of a Digital Communication System. A software simulator of a digital communication system such as WiMAX passes some volume of random data through the system model and computes the error probability or throughput. Each Mapper runs the simulation for a specified amount of data, 1/Nth of the required sampling, and emits its error rate; the Reducer computes the average error rate. Solution: The problem description is split into a set of specifications, and the specifications are stored as input data for the Mappers. Each Mapper takes a specification, performs the corresponding computations, and emits its result. The Reducer combines all emitted parts into the final result.
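The case study above follows a simple shape: N mappers each simulate 1/Nth of the samples and emit an error rate, and the reducer averages them. A sketch under stated assumptions (the random coin-flip "channel" with error probability `p_error` is a stand-in for the real WiMAX model; `mapper`, `reducer`, and the spec fields are our own names):

```python
# Distributed Task Execution sketch: N mappers each simulate 1/N of the
# samples; the reducer averages the per-mapper error rates.
import random

def mapper(spec):
    # Simulate this mapper's share of the data; a seeded RNG stands in
    # for the real communication-system model.
    rng = random.Random(spec["seed"])
    errors = sum(rng.random() < spec["p_error"] for _ in range(spec["samples"]))
    return errors / spec["samples"]

def reducer(error_rates):
    # Equal shares per mapper, so the combine step is a plain mean.
    return sum(error_rates) / len(error_rates)

N = 4
specs = [{"seed": i, "p_error": 0.1, "samples": 10_000} for i in range(N)]
overall = reducer([mapper(s) for s in specs])
# overall is close to the true error probability, 0.1
```

Note the design choice: because every mapper handles the same number of samples, the reducer can take an unweighted average; with unequal shares it would need a weighted one.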

Conclusion If the input is very large, perhaps 10 TB or more, a single Turing machine will spend a lot of time solving the problem. Distributed computing splits the computation into many blocks, where each block is one computer (a Turing machine); every computer solves a relatively small input, just 1/N of the original. This takes less time and makes the computation more efficient.

Thank you