7 Fun Things to do with MapReduce Chris Hillman – Teradata Data

Presentation transcript:

7 Fun Things to do with MapReduce Chris Hillman – Teradata Data

Agenda
Map Tasks: Face Detection, Character Recognition, Speech to Text
Shuffling: Mass Spectrometer processing
Reducers: Text Mining, Actual Mining
Cluster Building
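
The agenda follows the three phases of a MapReduce job: map, shuffle, reduce. As a rough illustration only, here is a minimal in-memory sketch of those phases in Python. The names run_mapreduce, mapper and reducer, and the sample file list, are made up for this example and are not from the talk.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Toy single-process MapReduce: map, then shuffle, then reduce."""
    # Map: every input record may emit zero or more (key, value) pairs.
    mapped = [pair for record in records for pair in mapper(record)]

    # Shuffle: bring all values that share a key together.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def mapper(record):
    # Hypothetical input: (file name, size in KB). Key on the file extension.
    name, size_kb = record
    yield name.rsplit(".", 1)[-1].lower(), size_kb


def reducer(key, values):
    # Total size per extension.
    return sum(values)


files = [("a.jpg", 120), ("b.wav", 900), ("c.JPG", 80), ("d.xml", 300)]
result = run_mapreduce(files, mapper, reducer)
print(result)
```

A real cluster runs the map and reduce calls on different machines and does the shuffle over the network; the data flow is the same.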

Face Detection in Images
Step 1. Get a good Open Source Library
Step 2. Check the Example
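
A sketch of how face detection might be wrapped as a map task. The slide does not name the open source library, so OpenCV and its bundled Haar cascade are an assumption here, and map_image, opencv_face_detector and fake_detect are hypothetical names. The detector is passed in as a function so the map logic can be tried without OpenCV installed.

```python
def opencv_face_detector(image_bytes):
    """Assumed detector: OpenCV's bundled frontal-face Haar cascade.

    Needs the opencv-python and numpy packages; imported lazily so the
    rest of this sketch runs without them.
    """
    import cv2
    import numpy as np

    gray = cv2.imdecode(np.frombuffer(image_bytes, np.uint8), cv2.IMREAD_GRAYSCALE)
    if gray is None:  # bytes were not a decodable image
        return []
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(v) for v in box) for box in boxes]


def map_image(record, detect):
    """Map task: one image in, one (file name, face count) pair out."""
    name, image_bytes = record
    yield name, len(detect(image_bytes))


def fake_detect(image_bytes):
    # Stand-in detector for the demo: one "face" per letter F in the bytes.
    return [(0, 0, 10, 10)] * image_bytes.count(b"F")


records = [("one.jpg", b"..F.."), ("none.jpg", b"....")]
face_counts = dict(pair for rec in records for pair in map_image(rec, fake_detect))
print(face_counts)
```

Each image is processed on its own, so no shuffle or reduce step is needed unless the counts are aggregated afterwards.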

Character Recognition
More complex task than Face Detection
SELECT * FROM RecognizeNumberPlate( ON anpr.vehiclelogs

Speech to Text
Fed up with word count examples?
How about counting words in a recorded wav
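
A sketch of word counting over recordings, under the assumption that some speech-to-text engine is available as a function. The slide does not say which engine was used, so transcribe is a pluggable parameter and fake_transcribe is only a stand-in; map_recording and reduce_counts are hypothetical names.

```python
import re
from collections import Counter


def map_recording(record, transcribe):
    """Map task: transcribe one recording, then emit (word, 1) per word."""
    _name, wav_bytes = record
    text = transcribe(wav_bytes)
    for word in re.findall(r"[a-z']+", text.lower()):
        yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each word."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return totals


def fake_transcribe(wav_bytes):
    # Stand-in for a real speech-to-text call: treats the bytes as ASCII text.
    return wav_bytes.decode("ascii")


recordings = [("call1.wav", b"hello world"), ("call2.wav", b"Hello again")]
pairs = (pair for rec in recordings for pair in map_recording(rec, fake_transcribe))
word_counts = reduce_counts(pairs)
print(word_counts.most_common())
```

The expensive part, the transcription, sits entirely inside the map task, which is why it parallelises well across recordings.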

Proteomics
Mass Spectrometers create a lot of data….
In XML format….
It’s nasty to work
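
A sketch of one way to stream peaks out of mass spectrometer XML and key them so the shuffle groups nearby peaks together. The run/scan/peak element names, the mz and intensity attributes and the bin width are invented for illustration; real instrument formats are more involved, and this is not necessarily how the talk processed the data.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily simplified spectrometer output.
SAMPLE = b"""<run>
  <scan id="1"><peak mz="100.02" intensity="5.0"/><peak mz="250.40" intensity="2.0"/></scan>
  <scan id="2"><peak mz="100.07" intensity="3.0"/></scan>
</run>"""


def map_peaks(xml_bytes, bin_width=0.5):
    """Map task: stream the XML and emit (m/z bin, intensity) per peak."""
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "peak":
            mz = float(elem.get("mz"))
            yield int(mz // bin_width), float(elem.get("intensity"))
        elif elem.tag == "scan":
            elem.clear()  # drop the finished scan's children to keep memory flat


def shuffle(pairs):
    """Shuffle: group intensities by m/z bin."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


grouped = shuffle(map_peaks(SAMPLE))
totals = {mz_bin: sum(values) for mz_bin, values in grouped.items()}
print(totals)
```

Choosing the key (here the m/z bin) decides which peaks meet at the same reducer, which is the part of the job the shuffle does for you.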

Text Mining
First phases are map tasks
Text Extraction and

Actual Mining
Comparing Seismic surveys taken at different points in

Cluster Building
Why build your own cluster?
1. It’s fun
2. You learn lots
3. It gets you invited to parties
Physical or Virtual?
Physical: more fun, looks impressive, harder to build, maintain, use, cost of power
Virtual: performance? Easier to test, try different versions,

Thank you
Chris