報告人 : 葉瑞群 日期 :2012/01/9 出處 : IEEE Transactions on Knowledge and Data Engineering.

Slides:



Advertisements
Similar presentations
CACORE TOOLS FEATURES. caCORE SDK Features caCORE Workbench Plugin EA/ArgoUML Plug-in development Integrated support of semantic integration in the plugin.
Advertisements

HTML5 ETDs Edward A. Fox, Sung Hee Park, Nicholas Lynberg, Jesse Racer, Phil McElmurray Digital Library Research Laboratory Virginia Tech ETD 2010, June.
LIBRA: Lightweight Data Skew Mitigation in MapReduce
EHarmony in Cloud Subtitle Brian Ko. eHarmony Online subscription-based matchmaking service Available in United States, Canada, Australia and United Kingdom.
Technology of Data Analytics. INTRODUCTION OBJECTIVE  Data Analytics mindset – shallow and wide, deep when you need it  Quick overview, useful tidbits,
Authors: Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, Geoffrey Fox Publish: HPDC'10, June 20–25, 2010, Chicago, Illinois, USA ACM Speaker: Jia Bao Lin.
Parallel and Distributed IR
Greenstone Digital Library Usage and Implementation By: Paul Raymond A. Afroilan Network Applications Team Preginet, ASTI-DOST.
Undergraduate Poster Presentation Match 31, 2015 Department of CSE, BUET, Dhaka, Bangladesh Wireless Sensor Network Integretion With Cloud Computing H.M.A.
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc
Data Mining on the Web via Cloud Computing COMS E6125 Web Enhanced Information Management Presented By Hemanth Murthy.
作者 :Jin Li, Qian Wang, Cong Wang, Ning Cao, Kui Ren, and Wenjing Lou 出處 :IEEE Transactions on Knowledge and Data Engineering(2011) 日期 :2012/05/15 報告人 :
By Mihir Joshi Nikhil Dixit Limaye Pallavi Bhide Payal Godse.
報告人 : 葉瑞群 日期 : 2011/11/10 出處 : IEEE Transactions on Knowledge and Data Engineering.
Committed to Deliver….  We are Leaders in Hadoop Ecosystem.  We support, maintain, monitor and provide services over Hadoop whether you run apache Hadoop,
資訊工程系智慧型系統實驗室 iLab 南台科技大學 1 Optimizing Cloud MapReduce for Processing Stream Data using Pipelining 出處 : 2011 UKSim 5th European Symposium on Computer Modeling.
C# A 1 CSC 298 Introduction to C#. C# A 2 What to expect in this class  Background: knowledge of an object oriented language of the C++, Java, … family.
Pepper: An Elastic Web Server Farm for Cloud based on Hadoop Author : S. Krishnan, J.-S. Counio Date : Speaker : Sian-Lin Hong IEEE International.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
Location-aware MapReduce in Virtual Cloud 2011 IEEE computer society International Conference on Parallel Processing Yifeng Geng1,2, Shimin Chen3, YongWei.
Authors: Jiann-Liang Chenz, Szu-Lin Wuy,Yang-Fang Li, Pei-Jia Yang,Yanuarius Teofilus Larosa th International Wireless Communications and Mobile.
CS525: Special Topics in DBs Large-Scale Data Management Hadoop/MapReduce Computing Paradigm Spring 2013 WPI, Mohamed Eltabakh 1.
Face Detection And Recognition For Distributed Systems Meng Lin and Ermin Hodžić 1.
Software Engineering for Business Information Systems (sebis) Department of Informatics Technische Universität München, Germany wwwmatthes.in.tum.de Data-Parallel.
CHAPTER FOUR COMPUTER SOFTWARE.
W HAT IS H ADOOP ? Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity.
Introduction to Apache Hadoop Zibo Wang. Introduction  What is Apache Hadoop?  Apache Hadoop is a software framework which provides open source libraries.
Hadoop/MapReduce Computing Paradigm 1 Shirish Agale.
Service Computation 2010November 21-26, Lisbon.
Next-Generation Formotus Forms Replace Paper and InfoPath with Mobile Business Applications Created and Deployed Using Microsoft Azure MICROSOFT AZURE.
CSE 548 Advanced Computer Network Security Document Search in MobiCloud using Hadoop Framework Sayan Cole Jaya Chakladar Group No: 1.
Optimizing Cloud MapReduce for Processing Stream Data using Pipelining 作者 :Rutvik Karve , Devendra Dahiphale , Amit Chhajer 報告 : 饒展榕.
MARISSA: MApReduce Implementation for Streaming Science Applications 作者 : Fadika, Z. ; Hartog, J. ; Govindaraju, M. ; Ramakrishnan, L. ; Gunter, D. ; Canon,
An Architecture for Distributed High Performance Video Processing in the Cloud 作者 :Pereira, R.; Azambuja, M.; Breitman, K.; Endler, M. 出處 :2010 IEEE 3rd.
Performance Evaluation of Image Conversion Module Based on MapReduce for Transcoding and Transmoding in SMCCSE Speaker : 吳靖緯 MA0G IEEE.
Alastair Duncan STFC Pre Coffee talk STFC July 2014 The Trials and Tribulations and ultimate success of parallelisation using Hadoop within the SCAPE project.
Grid Computing at Yahoo! Sameer Paranjpye Mahadev Konar Yahoo!
2007. Software Engineering Laboratory, School of Computer Science S E Web-Harvest Web-Harvest: Open Source Web Data Extraction tool 이재정 Software Engineering.
McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 4 Computer Software.
Modern Information Retrieval Chapter 9: Parallel and Distributed IR Section 9.1: Introduction Section : MIMD Architectures Inverted Files November.
Module 4 Part 2 Introduction To Software Development : Programming & Languages Introduction To Software Development : Programming & Languages.
CSE 548 Advanced Computer Network Security Trust in MobiCloud using Hadoop Framework Updates Sayan Cole Jaya Chakladar Group No: 1.
Digital Libraries1 David Rashty. Digital Libraries2 “A library is an arsenal of liberty” Anonymous.
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
 Frequent Word Combinations Mining and Indexing on HBase Hemanth Gokavarapu Santhosh Kumar Saminathan.
 Programming - the process of creating computer programs.
Web Log Data Analytics with Hadoop
Expose your Texts Text Analysis on the Internet Geoffrey Rockwell McMaster University Text Analysis Portal for.
CSE 548 Advanced Computer Network Security Trust in MobiCloud using Hadoop Framework Updates Sayan Kole Jaya Chakladar Group No: 1.
Powered by Microsoft Azure, Auctori Is the Next Generation in Multilingual, Global, Search Engine Optimized Web Content Management Systems MICROSOFT AZURE.
Hadoop/MapReduce Computing Paradigm 1 CS525: Special Topics in DBs Large-Scale Data Management Presented By Kelly Technologies
HEMANTH GOKAVARAPU SANTHOSH KUMAR SAMINATHAN Frequent Word Combinations Mining and Indexing on HBase.
HDFS MapReduce Hadoop  Hadoop Distributed File System (HDFS)  An open-source implementation of GFS  has many similarities with distributed file.
A Project of the University Libraries Ball State University Libraries A destination for research, learning, and friends.
Cloud Distributed Computing Environment Hadoop. Hadoop is an open-source software system that provides a distributed computing environment on cloud (data.
Derek Weitzel Grid Computing. Background B.S. Computer Engineering from University of Nebraska – Lincoln (UNL) 3 years administering supercomputers at.
Built on the Powerful Microsoft Azure Platform, Forensic Advantage Helps Public Safety and National Security Agencies Collect, Analyze, Report, and Distribute.
Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND.
INTRODUCTION TO HADOOP. OUTLINE  What is Hadoop  The core of Hadoop  Structure of Hadoop Distributed File System  Structure of MapReduce Framework.
Big Data Javad Azimi May First of All… Sorry about the language  Feel free to ask any question Please share similar experiences.
Part 1 The Basics of Information Systems. Purpose of Information Systems Information systems ◦ Collects, stores and organizes information ◦ Retrieves.
Built on the Powerful Microsoft Azure Platform, HarmonyPSA Is a Cloud-Based Customer Service and Billing System for IT Solution Providers MICROSOFT AZURE.
MICROSOFT AZURE APP BUILDER PROFILE: RAVERUS LTD. Raverus is a customer-driven company engaged in providing software applications designed to improve and.
Information Systems & Semantic Web University of Koblenz ▪ Landau, Germany Cloud Computing What, why, how? Noam Bercovici Renata Dividino.
Item Based Recommender System SUPERVISED BY: DR. MANISH KUMAR BAJPAI TARUN BHATIA ( ) VAIBHAV JAISWAL( )
Your Interactive Guide to the Digital World Discovering Computers 2012 Chapter 13 Computer Programs and Programming Languages.
Leverage Big Data With Hadoop Analytics Presentation by Ravi Namboori Visit
Big Data is a Big Deal!.
Sushant Ahuja, Cassio Cristovao, Sameep Mohta
Distributed Production
Presentation transcript:

報告人 : 葉瑞群 日期 :2012/01/9 出處 : IEEE Transactions on Knowledge and Data Engineering

1.Introduction 2.Background and Related Work 3.TAPoR 4.Transitioning a Service to Hadoop 5.Evaluation 6.Future Work 7.Conclusion 2

 Improving the response times of text-analysis tools so that users can comparatively analyze large text corpora across a variety of dimensions.  We discuss our experience redesigning and re- implementing four basic TAPoR operations on Hadoop and we report on the performance improvements enabled by the migration. 3

 Hadoop is an open-source implementation of the MapReduce framework in Java.  The canonical example of MapReduce is “counting the occurrences of a word in a set of documents”. 4

 TAPoR is a web-based application developed by digital humanists to support a suite of lexical-analysis tools. The tools offered include listing words and word counts, finding word co- occurrence, generating word clouds, and more. 5

 Hiding the complexity of inter process communications while running an application in a cluster from the user is one of the significant advantages of using Hadoop.  It also provides a virtualized distributed file system, the Hadoop Distributed File System (HDFS). 6

 Using multi-core Cell processors in a cluster environment, an evolutionary code extractor for Java systems, on a 4 machine Hadoop cluster. 7

 The Text Analysis Portal for Research (TAPoR) is a webbased application that provides a suite of text-analysis tools to scholars and researchers in the Digital-Humanities.  It includes a back-end web service called TAPoRware. 8

 A. Determine the feasibility or necessity  Hadoop is always the right tool. If the goal is to leverage additional computing power,existing strategies for high performance computing (HPC)may be better suited. Hadoop’s algorithmic strengths are index building and batch processing; 9

 B. Design the new service  We designed our indexes to allow operations to re-use the same index, to allow us to divide collections into subcollections easily,and to be quickly searchable and sortable 10

11

12

 Other file formats: Our current implementation deals only with plain text files. A collection can in reality be HTML or XML (typically not well-formed), PDF, or office file formats (Word, Open Office). When allowing users to add collections, we should detect the file type and convert it. 13

 This is why we have started migrating the TAPoR operations on Hadoop,This is why we have started migrating the TAPoR operations on Hadoop. 14

END 15