1
Introduction to Big Data
Big Data Computing: Introduction to Big Data
2
Preface: Content of this Lecture
In this lecture, we will give a brief introduction to Big Data: why Big Data and where it came from, the challenges and applications of Big Data, and the characteristics of Big Data, i.e., Volume, Velocity, Variety, and more V's.
3
What's Big Data? Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. The trend toward larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."
4
Facts and Figures
Walmart handles 1 million customer transactions/hour.
Facebook handles 40 billion photos from its user base.
Facebook inserts 500 terabytes of new data every day.
Facebook stores, accesses, and analyzes 30+ petabytes of user-generated data.
A flight generates 240 terabytes of flight data in 6-8 hours of flight.
More than 5 billion people are calling, texting, tweeting, and browsing on mobile phones worldwide.
Decoding the human genome originally took 10 years; now it can be achieved in one week.
The largest AT&T database boasts titles including the largest volume of data in one unique database (312 terabytes) and the second largest number of rows in a unique database (1.9 trillion), which comprises AT&T's extensive calling records.
5
An Insight
Byte: one grain of rice
KB (10^3): one cup of rice
MB (10^6): 8 bags of rice
GB (10^9): 3 semi trucks of rice
TB (10^12): 2 container ships of rice
PB (10^15): blankets half of Jaipur
Exabyte (10^18): blankets the West Coast, or 1/4th of India
Zettabyte (10^21): fills the Pacific Ocean
Yottabyte (10^24): an earth-sized rice bowl
Brontobyte (10^27): astronomical in size
(Scale eras from the slide: Desktop, Internet, Big Data, Future)
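The unit ladder above can be sketched as a small conversion helper. This is a minimal illustration of our own (the function name and the decimal powers-of-1000 convention follow the slide's exponents, not any library API):

```python
# Decimal (SI) byte units, matching the slide's exponents 3, 6, 9, ...
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: float) -> str:
    """Convert a raw byte count into the largest sensible unit."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} {UNITS[-1]}"

# Facebook's daily intake from the previous slide, 500 TB:
print(human_readable(500 * 10**12))  # prints "500.0 TB"
```

For example, one grain of rice is a byte, and `500 * 10**12` grains would be two container ships' worth.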
6
What's making so much data?
Sources: people, machines, organizations. Ubiquitous computing: more people are carrying data-generating devices (mobile phones with Facebook, GPS, cameras, etc.). Data on the Internet: Internet Live Stats.
7
Source of Data Generation
12+ TB of tweet data every day
25+ TB of log data every day
? TB of data every day
4.6 billion camera phones worldwide
100s of millions of GPS-enabled devices sold annually
30 billion RFID tags today (1.3B in 2005)
2+ billion people on the Web by end of 2011
76 million smart meters in 2009; 200M by 2014
8
An Example of Big Data at Work
Crowdsourcing
9
Where is the problem? Traditional RDBMS queries aren't sufficient to get useful information out of the huge volume of data. Searching it with traditional tools to find out whether a particular topic was trending would take so long that the result would be meaningless by the time it was computed. Big Data comes up with solutions to store this data in novel ways in order to make it more accessible, and also with methods of performing analysis on it.
10
Challenges: Capturing, Storing, Searching, Sharing, Analyzing, Visualizing
11
IBM characterizes Big Data by the 3V's:
Volume, Velocity, and Variety.
12
Volume (Scale): Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information. Turn 12 terabytes of Tweets created each day into improved product sentiment analysis. Convert 350 billion annual meter readings to better predict power consumption.
13
Volume (Scale): Data volume is increasing exponentially: a 44x increase, from 0.8 zettabytes to 35 ZB.
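The 44x figure follows directly from the two endpoints quoted on the slide:

```python
# Growth factor behind the slide's claim: 0.8 ZB growing to 35 ZB.
start_zb = 0.8
end_zb = 35
growth = end_zb / start_zb
print(round(growth))  # prints 44 (exactly 43.75)
```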
14
Example 1: CERN's Large Hadron Collider (LHC)
CERN's Large Hadron Collider (LHC) generates 15 PB a year.
15
Example 2: The EarthScope
The EarthScope is the world's largest science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data. It analyzes seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more. (/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI)
16
Velocity (Speed): Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. Scrutinize 5 million trade events created each day to identify potential fraud. Analyze 500 million daily call detail records in real time to predict customer churn faster.
17
Examples: Velocity (Speed)
Data is being generated fast and needs to be processed fast. Online data analytics: late decisions mean missed opportunities. Examples: E-promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you. Healthcare monitoring: sensors monitoring your activities and body; any abnormal measurement requires an immediate reaction.
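The healthcare-monitoring example above can be sketched as a per-reading check that reacts as each measurement arrives, instead of batching readings for later analysis. The heart-rate bounds and sensor readings below are invented purely for illustration:

```python
# Assumed normal range in beats per minute (illustrative, not clinical).
NORMAL_HEART_RATE = (40, 180)

def check_reading(bpm: int) -> str:
    """React to a single sensor reading the moment it arrives."""
    low, high = NORMAL_HEART_RATE
    if bpm < low or bpm > high:
        return "ALERT"  # abnormal measurement: immediate reaction required
    return "ok"

# Readings arriving one at a time, processed as a stream.
stream = [72, 75, 210, 68]
results = [check_reading(r) for r in stream]
print(results)  # prints ['ok', 'ok', 'ALERT', 'ok']
```

The point is the shape of the computation: each reading is handled on arrival, so the alert for 210 bpm fires immediately rather than after a nightly batch job.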
18
Real-Time/Fast Data: Mobile devices (tracking all objects all the time), social media and networks (all of us are generating data), scientific instruments (collecting all sorts of data), sensor technology and networks (measuring all kinds of data). Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.
19
Variety (Complexity): Big data is any type of data:
Structured data (example: tabular data)
Unstructured data: text, sensor data, audio, video
Semi-structured data: web data, log files
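The three shapes of data can be illustrated with Python's standard library. Here the same "temperature" fact appears in structured form (a fixed tabular schema), semi-structured form (JSON that carries its schema with it), and unstructured form (free text the schema must be extracted from). The city and temperature values are made up:

```python
import csv, io, json, re

# Structured: tabular data (CSV with a known, fixed schema).
table = list(csv.DictReader(io.StringIO("city,temp\nJaipur,31\n")))

# Semi-structured: JSON (field names are embedded in the data itself).
record = json.loads('{"city": "Jaipur", "temp": 31}')

# Unstructured: free text (structure must be extracted, e.g. by regex).
match = re.search(r"(\d+) degrees", "It was 31 degrees in Jaipur today.")

print(table[0]["temp"], record["temp"], match.group(1))  # prints 31 31 31
```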
20
Examples: Variety (Complexity)
Relational data (tables/transactions/legacy data)
Text data (Web)
Semi-structured data (XML)
Graph data: social networks, Semantic Web (RDF), ...
Streaming data: you can only scan the data once
A single application can be generating/collecting many types of data
Big public data (online, weather, finance, etc.)
To extract knowledge, all these types of data need to be linked together.
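The "scan the data once" constraint on streaming data is what one-pass algorithms are built for. A minimal sketch is a running mean that keeps only a count and the current mean, so no reading is ever stored or revisited:

```python
def running_mean(stream):
    """One-pass mean over a stream: O(1) memory, each item seen once."""
    count, mean = 0, 0.0
    for x in stream:
        count += 1
        mean += (x - mean) / count  # incremental (Welford-style) update
    return mean

# An iterator can only be consumed once, mirroring a real data stream.
print(running_mean(iter([2, 4, 6, 8])))  # prints 5.0
```

The same single-pass pattern extends to variance, heavy-hitter counts, and sketches, which is why it underpins most streaming analytics.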
21
The 3 Big V's (+1): Volume, Velocity, Variety, plus Value.
22
The 3 Big V's (+1) (+ n more)
Plus many more: Veracity, Validity, Variability, Viscosity & Volatility, Viability, Venue, Vocabulary, Vagueness, ...
23
Value: Integrating Data
Reducing data complexity, increasing data availability, and unifying your data systems: all three lead to increased data collaboration and add value to your big data.
24
Veracity refers to the biases, noise, and abnormality in data, and to the trustworthiness of data. 1 in 3 business leaders don't trust the information they use to make decisions. How can you act upon information if you don't trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grows.
25
Valence refers to the connectedness of big data, such as in the form of graph networks.
26
Validity: Accuracy and correctness of the data relative to a particular use. Example: gauging storm intensity: satellite imagery vs. social media posts; prediction quality vs. human impact.
27
Variability How the meaning of the data changes over time
Language evolution Data availability Sampling processes Changes in characteristics of the data source
28
Viscosity & Volatility
Both are related to velocity. Viscosity: data velocity relative to the timescale of the event being studied. Volatility: the rate of data loss and the stable lifetime of data. Scientific data often has a practically unlimited lifespan, but social/business data may evaporate in finite time.
29
More V's
Viability: which data has meaningful relations to questions of interest?
Venue: where does the data live, and how do you get it?
Vocabulary: metadata describing structure, content, and provenance; schemas, semantics, ontologies, taxonomies, vocabularies
Vagueness: confusion about what "Big Data" means
30
Harnessing Big Data
OLTP: Online Transaction Processing (DBMSs)
OLAP: Online Analytical Processing (Data Warehousing)
RTAP: Real-Time Analytics Processing (Big Data architecture & technology)
31
The Model Has Changed... The model of generating/consuming data has changed. Old model: a few companies generate data; all others consume it. New model: all of us generate data, and all of us consume data.
32
Big Data Analytics: Big data is more real-time in nature than traditional data warehouse (DW) applications. Traditional DW architectures (e.g., Exadata, Teradata) are not well suited for big data apps. Shared-nothing, massively parallel processing, scale-out architectures are well suited for big data apps.
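A toy sketch of the shared-nothing idea (our own illustration, not any particular vendor's engine): each partition of the data is processed independently with no shared state, and only the small partial results are merged afterwards, MapReduce-style:

```python
from collections import Counter

def map_partition(lines):
    """Runs independently on each 'node' over its own partition."""
    return Counter(word for line in lines for word in line.split())

def reduce_counts(partials):
    """Merge step: combine partial counts from all nodes."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

# Two partitions standing in for two shared-nothing nodes.
partitions = [["big data big"], ["data data"]]
counts = reduce_counts(map_partition(p) for p in partitions)
print(counts["data"], counts["big"])  # prints 3 2
```

Because the map step touches no shared state, scaling out is just adding partitions and nodes, which is exactly why this architecture suits big data workloads.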
33
Big Data Technology
34
Conclusion: In this lecture, we defined Big Data and discussed the challenges and applications of Big Data. We also described the characteristics of Big Data, i.e., Volume, Velocity, Variety, and more V's, along with Big Data Analytics, the Big Data Landscape, and Big Data Technology.
35
References: https://nptel.ac.in/courses/106/104/106104189/