Data Analytics (CS40003) Introduction to Data Lecture #1

Slides:



Advertisements
Similar presentations
Big Data Management and Analytics Introduction Spring 2015 Dr. Latifur Khan 1.
Advertisements

Big Data and Predictive Analytics in Health Care Presented by: Mehadi Sayed President and CEO, Clinisys EMR Inc.
Big Data Workflows N AME : A SHOK P ADMARAJU C OURSE : T OPICS ON S OFTWARE E NGINEERING I NSTRUCTOR : D R. S ERGIU D ASCALU.
StorIT Certified - Big Data Sales Expert Name of the course: StorIT Certified Bigdata Sales Expert Duration: 1 day full time Date: November 12, 2014 Location:
Fraud Detection in Banking using Big Data By Madhu Malapaka For ISACA, Hyderabad Chapter Date: 14 th Dec 2014 Wilshire Software.
CS525: Special Topics in DBs Large-Scale Data Management
Evolution in Coming 10 Years: What's the Future of Network? - Evolution in Coming 10 Years: What's the Future of Network? - Big Data- Big Changes in the.
Meeting of the Management of Statistical Information System (MSIS 2014) Innovation – Open Data Initiative of Government of India – Fostering Innovations,
Lecture-8/ T. Nouf Almujally
Big Data A big step towards innovation, competition and productivity.
Hadoop Ecosystem Overview
BIGDATA AND DATASCIENCE By Sigma Analytics and Computing.
Big Data. What is Big Data? Big Data Analytics: 11 Case Histories and Success Stories
© 2012 IBM Corporation IBM Security Systems 1 © 2013 IBM Corporation 1 Ecommerce Antoine Harfouche.
Distributed Indexing of Web Scale Datasets for the Cloud {ikons, eangelou, Computing Systems Laboratory School of Electrical.
+ Hbase: Hadoop Database B. Ramamurthy. + Motivation-0 Think about the goal of a typical application today and the data characteristics Application trend:
Big Data Analytics Large-Scale Data Management Big Data Analytics Data Science and Analytics How to manage very large amounts of data and extract value.
+ Big Data IST210 Class Lecture. + Big Data Summary by EMC Corporation ( More videos that.
1 Melanie Alexander. Agenda Define Big Data Trends Business Value Challenges What to consider Supplier Negotiation Contract Negotiation Summary 2.
Cloud Computing & Big Data Group 9 Femme L H Sabaru | Aditya Gisheila N P | Aninda Harapan | Harry | Andrew Khosugih.
CISC 849 : Applications in Fintech Namami Shukla Dept of Computer & Information Sciences University of Delaware iCARE : A Framework for Big Data Based.
What we know or see What’s actually there Wikipedia : In information technology, big data is a collection of data sets so large and complex that it.
CPS 216: Data-intensive Computing Systems Information about Project 1 Shivnath Babu.
BUSINESS INTELLIGENCE & ADVANCED ANALYTICS DISCOVER | PLAN | EXECUTE JANUARY 14, 2016.
Big Data Yuan Xue CS 292 Special topics on.
Smart Grid Big Data: Automating Analysis of Distribution Systems Steve Pascoe Manager Business Development E&O - NISC.
Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND.
Big Data Javad Azimi May First of All… Sorry about the language  Feel free to ask any question Please share similar experiences.
This is a free Course Available on Hadoop-Skills.com.
BIG DATA. Big Data: A definition Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database.
INTRODUCTION TO INFORMATION SYSTEMS LECTURE 9: DATABASE FEATURES, FUNCTIONS AND ARCHITECTURES PART (2) أ/ غدير عاشور 1.
Grid Technology CERN IT Department CH-1211 Geneva 23 Switzerland t DBCF GT Our experience with NoSQL and MapReduce technologies Fabio Souto.
Microsoft Ignite /28/2017 6:07 PM
Hadoop in the Wild CMSC 491 Hadoop-Based Distributed Computing Spring 2016 Adam Shook.
BI 202 Data in the Cloud Creating SharePoint 2013 BI Solutions using Azure 6/20/2014 SharePoint Fest NYC.
From RDBMS to Hadoop A case study Mihaly Berekmeri School of Computer Science University of Manchester Data Science Club, 14th July 2016 Hayden Clark,
A Tutorial on Hadoop Cloud Computing : Future Trends.
Big Data-An Analysis. Big Data: A definition Big data is a collection of data sets so large and complex that it becomes difficult.
CNIT131 Internet Basics & Beginning HTML
CSCI5570 Large Scale Data Processing Systems
Data Analytics 1 - THE HISTORY AND CONCEPTS OF DATA ANALYTICS
Sushant Ahuja, Cassio Cristovao, Sameep Mohta
An Open Source Project Commonly Used for Processing Big Data Sets
CS122B: Projects in Databases and Web Applications Winter 2017
Chapter 14 Big Data Analytics and NoSQL
Hadoopla: Microsoft and the Hadoop Ecosystem
Big-Data Fundamentals
Vocabulary byte - The technical term for 8 bits of data.
Mohammad J. Mansourzadeh
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Hadoop Clusters Tess Fulkerson.
Department of Information Systems
Ministry of Higher Education
Database Design (constructed by Hanh Pham based on slides from books and websites listed under reference section) Big Data.
Big Data - in Performance Engineering
Data Warehousing and Data Mining
Tools for Processing Big Data Jinan Al Aridhee and Christian Bach
Big Data Overview.
Big Data 5 exabytes (1018 bytes) of data were created by human until Today this amount of information is created in two days. In 2012, digital world.
IT Megatrends that shape the Digital Future…
Zoie Barrett and Brian Lam
Charles Tappert Seidenberg School of CSIS, Pace University
Big Data Analysis in Digital Marketing
Data Mining: Concepts and Techniques
Big DATA.
Data Analysis and R : Technology & Opportunity
V. Uddameri Texas Tech University
CS 239 – Big Data Systems Fall 2018
UNIT 6 RECENT TRENDS.
Big Data.
Presentation transcript:

Data Analytics (CS40003) Introduction to Data Lecture #1 Dr. Debasis Samanta Associate Professor Department of Computer Science & Engineering

In today’s discussion… Introduction to data Current trend Big data Big data vs. small data Tools and techniques CS 40003: Data Analytics

Introduction to data Example: 10, 25, …, Kharagpur, 10CS3002, namo@gov.in Anything else? Data vs. Information 100.0, 0.0, 250.0, 150.0, 220.0, 300.0, 110.0 Is there any information? CS 40003: Data Analytics

How large your data is? What is the maximum file size you have dealt so far? Movies/files/streaming video that you have used? What is the maximum download speed you get? To retrieve data stored in distant locations? How fast your computation is? How much time to just transfer from you, process and get result? CS 40003: Data Analytics

Growth of data CS 40003: Data Analytics

Sources of data “Every day, we create 2.5 quintillion bytes of data So much that 90% of the data in the world today has been created in the last two years alone. The data come from several sources sensors used to gather climate information posts to social media sites, digital pictures and videos purchase transaction records cell phone GPS signals etc. …… to name a few! CS 40003: Data Analytics

Examples Social media and networks Scientific instruments (All of us are generating data) Scientific instruments (Collecting all sorts of data) Mobile devices (Tracking all objects all the time) Sensor technology and networks (Measuring all kinds of data) CS 40003: Data Analytics

Now data is Big data! No single standard definition! ‘Big-data’ is similar to ‘Small-data’, but bigger …but having data bigger consequently requires different approaches techniques, tools and architectures …to solve: new problems …and, of course, in a better way Big data is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it… CS 40003: Data Analytics

Characteristics of Big data: V3 CS 40003: Data Analytics

V3 : V for Volume Volume of data, which needs to be processed is increasing rapidly More storage capacity More computation More tools and techniques CS 40003: Data Analytics

V3: V for Variety Various formats, types, and structures Text, numerical, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc… Static data vs. streaming data A single application can be generating/collecting many types of data To extract knowledge all these types of data need to be linked together CS 40003: Data Analytics

V3: V for Velocity Data is being generated fast and need to be processed fast For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value Scrutinize 5 million trade events created each day to identify potential fraud Analyze 500 million daily call detail records in real-time to predict customer churn faster Sometimes, 2 minutes is too late! The latest we have heard is 10 ns (nano seconds) delay is too much CS 40003: Data Analytics

Big data vs. small data - Optimizations and predictive analytics - Complex statistical analysis - All types of data, and many sources - Very large datasets - More of a real-time - Ad-hoc querying and reporting - Data mining techniques - Structured data, typical sources - Small to mid-size datasets CS 40003: Data Analytics

Big data vs. small data Big data is more real-time in nature than traditional applications Big data architecture Traditional architectures are not well-suited for big data applications (e.g. Exa-data, Tera-data) Massively parallel processing, scale out architectures are well-suited for big data applications CS 40003: Data Analytics

Who are the major players in the world of Big data? Challenges ahead… The Bottleneck is in technology New architecture, algorithms, techniques are needed Also in technical skills Experts in using the new technology and dealing with Big data Who are the major players in the world of Big data? CS 40003: Data Analytics

Big data players CS 40003: Data Analytics

Major players… Google Hadoop MapReduce Mahout Apache Hbase Cassandra CS 40003: Data Analytics

Tools available NoSQL MapReduce Storage Servers Processing DatabasesMongoDB, CouchDB, Cassandra, Redis, BigTable, Hbase, Hypertable, Voldemort, Riak, ZooKeeper MapReduce Hadoop, Hive, Pig, Cascading, Cascalog, mrjob, Caffeine, S4, MapR, Acunu, Flume, Kafka, Azkaban, Oozie, Greenplum Storage S3, HDFS, GDFS Servers EC2, Google App Engine, Elastic, Beanstalk, Heroku Processing R, Yahoo! Pipes, Mechanical Turk, Solr/Lucene, ElasticSearch, Datameer, BigSheets, Tinkerpop CS 40003: Data Analytics

Any question? You may post your question(s) at the “Discussion Forum” maintained in the course Web page. CS 40003: Data Analytics