Big Data Workflows
Name: Ashok Padmaraju
Course: Topics on Software Engineering
Instructor: Dr. Sergiu Dascalu


Introduction to Big Data workflows
• "Big Data" is a broad term for datasets so large or complex that traditional data processing applications are inadequate to handle them.
• Workflows are task-oriented and often require more specific data than processes do.
• A process is designed around higher-level scenarios and supports decision making at the organizational level.
• A Big Data workflow is best illustrated by comparing traditional IT workloads with Big Data workloads:
  • A Big Data workload may require many servers to run one application, whereas a traditional IT workload typically uses one server to run many applications.
  • Big Data workloads run to completion, whereas traditional IT workloads run indefinitely.

How Big Data Makes Big Impacts

Characteristics (5 Vs and 1 C):
• Volume:
  • The amount of data being generated increases drastically every day.
  • The size of the data determines its value and potential, and whether it can be considered Big Data at all.
• Velocity:
  • The speed at which data is generated.
  • How fast the generated data is processed to meet demand.
• Variety:
  • The different formats of data.
  • E.g. documents, emails, videos, images, audio, machine logs, sensor-generated data.

• Variability:
  • How consistent the data is in terms of availability or reporting interval.
  • Refers to the inconsistency the data can show at times.
• Veracity:
  • The quality of the captured data can vary greatly.
  • The accuracy of any analysis depends on the veracity of the source data.
• Complexity:
  • Data management can be a very complex process, especially when large volumes of data come from multiple sources.
  • The data needs to be linked, connected and correlated before information can be extracted from it.
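The Complexity point above (linking and correlating data that arrives from multiple sources) can be illustrated with a minimal sketch. The source names, field names and sample records are hypothetical and chosen only for illustration; a real pipeline would use a distributed join rather than in-memory dictionaries.

```python
from collections import defaultdict


def link_records(sources, key):
    """Group records from several sources by a shared key field."""
    linked = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            if key not in record:
                continue  # a record without the key cannot be linked
            linked[record[key]][source_name] = record
    return dict(linked)


# Hypothetical sample data from two independent sources.
sources = {
    "sensor": [
        {"device_id": "d1", "temp_c": 21.5},
        {"device_id": "d2", "temp_c": 19.0},
    ],
    "maintenance_log": [
        {"device_id": "d1", "last_service": "2015-03-01"},
    ],
}

linked = link_records(sources, "device_id")

# Keep only the keys for which every source contributed a record.
complete = {k: v for k, v in linked.items() if len(v) == len(sources)}
```

Here `linked` holds every device seen in any source, while `complete` keeps only devices that could be correlated across all of them.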

Big Data Software Tools:
• Platform:
  • Apache Hadoop
  • SAP HANA, etc.
• Business Analytics:
  • JasperSoft BI Suite
  • Pentaho Business Analytics
  • Karmasphere Studio and Analyst
  • Talend Open Studio, etc.
• Databases/Data Warehouses:
  • Cassandra: NoSQL database originally developed at Facebook, now an Apache project
  • HBase: Apache project
  • Hive: Hadoop's data warehouse, etc.

Big Data Software Tools:
• Data Mining:
  • RapidMiner
  • Orange
  • KEEL
  • SPMF, etc.
• Software programming and frameworks:
  • R: statistical software
  • Python
  • Julia: expressive language, generally faster than R, fairly easy to learn
  • Hadoop and Hive
  • Java
  • Scala: JVM-based language for building high-level algorithms
  • Kafka and Storm

Figure: IBM Big Data Platform (Source: IBM Research)

Intel Distribution for Apache Hadoop

Challenges:
• Data Challenges:
  • Dealing with the size of big data.
  • Handling the multiplicity of types, sources and formats.
  • Reacting to the flood of information within the time required by the application.
  • Handling uncertainty in data quality and data availability, including how timely the readings are.
  • Finding high-quality data in the vast collections of data.
  • Scalability in generating the data.
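The data-quality and timeliness challenges above can be made concrete with a small sketch that measures how many readings are complete and how many are recent enough. The field names, the ten-minute freshness threshold and the sample readings are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def quality_report(readings, required_fields, now, max_age):
    """Return the share of readings that are complete and the share that are fresh."""
    total = len(readings)
    complete = 0
    fresh = 0
    for reading in readings:
        # Complete: every required field is present and not None.
        if all(reading.get(field) is not None for field in required_fields):
            complete += 1
        # Fresh: the reading is no older than max_age.
        if now - reading["timestamp"] <= max_age:
            fresh += 1
    return {
        "total": total,
        "complete_ratio": complete / total if total else 0.0,
        "fresh_ratio": fresh / total if total else 0.0,
    }


# Hypothetical sensor readings.
now = datetime(2015, 3, 18, 12, 0)
readings = [
    {"timestamp": datetime(2015, 3, 18, 11, 55), "value": 1.0},
    {"timestamp": datetime(2015, 3, 18, 11, 58), "value": None},  # missing value
    {"timestamp": datetime(2015, 3, 18, 11, 30), "value": 2.0},   # stale reading
    {"timestamp": datetime(2015, 3, 18, 11, 59), "value": 3.0},
]

report = quality_report(readings, ["value"], now, timedelta(minutes=10))
```

Ratios like these give a first, rough signal of veracity before any analysis is run on the data.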

• Process Challenges:
  • Analyzing the data.
  • Finding the right model for analysis.
  • Being able to iterate quickly.
  • Deriving insights from the captured data.
  • Aligning data from different sources.
  • Transforming data into a form suitable for analysis and modeling.
• Management Challenges:
  • Data privacy, security, governance and ethical issues.
  • Ensuring that data is used correctly.
  • Tracking how the data is used, transformed and derived.
  • Managing its lifecycle.

Workload Optimization in Big Data
1. A task is divided into several subtasks.
2. The Map step breaks the task into several smaller tasks and indexes them for processing.
3. Optimization: order the small units of work.
4. The Reduce step combines the many results into a single result set.
Figure: Steps to optimize performance of Big Data workloads (Source:
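The Map and Reduce steps listed above can be sketched as a single-process word count. This is only an illustration of the idea, not Hadoop's API: the function names and sample documents are hypothetical, and a real cluster would run the map and reduce calls on many servers in parallel.

```python
from collections import defaultdict


def map_step(document):
    """Map: break one document into small (word, 1) units of work."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_step(pairs):
    """Group the mapped values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    # Run the map step on every document and flatten the results.
    mapped = [pair for doc in documents for pair in map_step(doc)]
    grouped = shuffle_step(mapped)
    # Process the keys in sorted order, then merge into one result set.
    return dict(reduce_step(key, values) for key, values in sorted(grouped.items()))


documents = ["big data workflows", "big data tools", "data"]
counts = map_reduce(documents)
```

Because each call to `map_step` and `reduce_step` depends only on its own input, the calls can be distributed across machines, which is what lets one application use many servers.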

MapReduce Explained [3:15 - 8:00]

Application areas for Big Data:
• Private Sector
  • Retail: e.g. Walmart, Best Buy, Target.
  • Retail Banking: e.g. Bank of America, Wells Fargo, Citibank.
  • Real Estate: e.g. Windermere Real Estate.
• Science and Research: e.g. NASA.
• Manufacturing
• Government
• Financial sector
• Technology
  • Social Networking: e.g. Facebook, LinkedIn, Twitter.
  • Electronic Commerce: e.g. eBay, Amazon.
• Internet of Things.

1. Briefly explain the challenges in Big Data.
2. Describe the characteristics of Big Data (5 Vs and 1 C).
3. What are the major application areas for Big Data?

Questions?

Thank you