Zhangxi Lin, Texas Tech University. ISQS 6339, Data Management & Business Intelligence (Data Mgmt & BI): Introduction

1 ISQS 6339, Data Management & Business Intelligence: Introduction. Zhangxi Lin, Texas Tech University

2 \\TechShare\coba\d\isqs3358

3 Outline
- Big Data
- Definitions of BI
- Categorizations of BI
- BI Trend
- BI tools

4 What is Business Intelligence
A simple definition: the applications and technologies that transform business data into action.
Business intelligence (BI) is a business management term that refers to the applications and technologies used to gather, provide access to, and analyze data and information about a company's operations. BI systems can help companies gain more comprehensive knowledge of the factors affecting their business and make better business decisions.
YouTube: What is BI? (2'); Microsoft Business Intelligence Surface Demo (6'34")

5 Data, information, and knowledge
- Data: a collection of raw value elements or facts used for calculating, reasoning, or measuring.
- Information: the result of collecting and organizing data in a way that establishes relationships between data items, thereby providing context and meaning.
- Knowledge: an understanding of information, based on recognized patterns, that provides insight into that information.

6 Online Video
- What is business intelligence? (10'36")
- Retail and Big Data Revolution (2'12")
- Big data (7'12")
- Big data terms (31'19")

7 Driving force - Big Data
- Big data is a collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools.
- Difficulties include capture, storage, search, sharing, analysis, and visualization.
- The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data.
8/14/2012, Copyright 2012

8 ISQS7339, Fall 2012 8

9 Zettabyte (ZB)
A quantity of information or information storage capacity equal to 10^21 bytes, or 1,000 exabytes.
- As of April 2012, no storage system has achieved one zettabyte of information.
- The combined space of all computer hard drives in the world was estimated at approximately 160 exabytes in 2006.
- Seagate reported selling 330 exabytes worth of hard drives during the 2011 fiscal year.
- As of 2009, the entire World Wide Web was estimated to contain close to 500 exabytes, which is half a zettabyte.
1,000,000,000,000,000,000,000 bytes = 1000^7 bytes = 10^21 bytes
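The unit arithmetic on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the `UNITS` table and the `convert` helper are names introduced here, and the units are assumed to be decimal (powers of 1000), as the slide's own definition implies.

```python
# Decimal (SI) storage units expressed in bytes.
# UNITS and convert() are illustrative names, not from the slides.
UNITS = {
    "KB": 1000**1, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
    "PB": 1000**5, "EB": 1000**6, "ZB": 1000**7,
}


def convert(value, from_unit, to_unit):
    """Convert a quantity between two decimal storage units."""
    return value * UNITS[from_unit] / UNITS[to_unit]


# The slide's identity: 1 ZB = 1000^7 bytes = 10^21 bytes.
assert UNITS["ZB"] == 10**21

print(convert(1, "ZB", "EB"))    # one zettabyte expressed in exabytes
print(convert(500, "EB", "ZB"))  # the 2009 web estimate expressed in zettabytes
```

The second call confirms the slide's remark that 500 exabytes is half a zettabyte.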

10 Data Scale

11 Market
- "Big data" has increased the demand for information management specialists; major companies have spent more than $15 billion on this.
- The industry is worth more than $100 billion and is growing at almost 10% a year.
- There are 4.6 billion mobile-phone subscriptions worldwide, and between 1 billion and 2 billion people access the internet.
- The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007.
- Internet traffic is predicted to reach 667 exabytes annually by 2013.
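To get a feel for how fast the telecommunication capacity figures grow, one can compute the implied compound annual growth rate between the reported years. The sketch below uses only the four figures quoted on the slide; the `cagr` helper and the variable names are introduced here for illustration, and 1 exabyte is assumed to be 1,000 petabytes.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


PB_PER_EB = 1000  # assumption: decimal units

# Effective capacity to exchange information, in petabytes (figures from the slide).
capacity_pb = {
    1986: 281,
    1993: 471,
    2000: 2.2 * PB_PER_EB,
    2007: 65 * PB_PER_EB,
}

years = sorted(capacity_pb)
for a, b in zip(years, years[1:]):
    rate = cagr(capacity_pb[a], capacity_pb[b], b - a)
    print(f"{a}-{b}: {rate:.1%} per year")

overall = cagr(capacity_pb[1986], capacity_pb[2007], 2007 - 1986)
print(f"1986-2007: {overall:.1%} per year")
```

Over the whole 1986 to 2007 span the implied rate is a little under 30% per year, with most of the acceleration after 1993.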

12 Approach - Cloud Computing
- Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet).
- The name comes from the use of a cloud-shaped symbol as an abstraction for the complex infrastructure it contains in system diagrams.
- Cloud computing entrusts remote services with a user's data, software, and computation.
- Buzzwords: SaaS/IaaS/PaaS

13 Distributed business intelligence
Deal with big data: the open & distributed approach
- LAMP
- Hadoop
- MapReduce
- HDFS
- NoSQL
- ZooKeeper
- Storm

14 Apache Hadoop
An open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. The Apache Hadoop framework is composed of the following modules:
- Hadoop Common: contains libraries and utilities needed by other Hadoop modules.
- Hadoop Distributed File System (HDFS).
- Hadoop YARN: a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
- Hadoop MapReduce: a programming model for large-scale data processing.
Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
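The MapReduce programming model mentioned above can be illustrated without a cluster. The following is a minimal single-process sketch in plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample sentences are assumptions made for this example. It shows the three conceptual steps: map each record to key/value pairs, group the pairs by key, and reduce each group to a result.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input records.
counts = word_count(["big data big insight", "data drives decisions"])
print(counts)
```

In a real Hadoop job the map and reduce functions run in parallel on different nodes and the framework performs the shuffle; here everything runs in one process purely to show the data flow.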

15 A Multi-node Hadoop Cluster

23 Hadoop 2: Big data's big leap forward
- The new Hadoop is the Apache Foundation's attempt to create a whole new general framework for the way big data can be stored, mined, and processed.
- The biggest constraint on scale has been Hadoop's job handling. In Hadoop 1, all jobs run as batch processes through a single daemon called JobTracker, which creates a scalability and processing-speed bottleneck.
- Hadoop 2 uses an entirely new job-processing framework built on two daemons: ResourceManager, which governs all jobs in the system, and NodeManager, which runs on each Hadoop node and keeps the ResourceManager informed about what is happening on that node.
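The division of labor between the two daemons can be pictured with a toy model. The sketch below is not YARN code and does not use any Hadoop API; the class names simply mirror the daemon names from the slide, and the slot counts, the heartbeat format, and the "most free slots first" placement rule are all assumptions made for illustration.

```python
class NodeManager:
    """Toy stand-in for the per-node daemon: tracks local work and reports status."""

    def __init__(self, node_id, total_slots):
        self.node_id = node_id
        self.total_slots = total_slots
        self.running = []

    def heartbeat(self):
        # Report how much capacity this node still has.
        return {"node": self.node_id, "free": self.total_slots - len(self.running)}

    def launch(self, job):
        self.running.append(job)


class ResourceManager:
    """Toy stand-in for the cluster-wide daemon: places jobs using node reports."""

    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}
        self.status = {}

    def collect_heartbeats(self):
        for node in self.nodes.values():
            report = node.heartbeat()
            self.status[report["node"]] = report["free"]

    def submit(self, job):
        # Assumed policy: pick the node reporting the most free slots.
        self.collect_heartbeats()
        node_id, free = max(self.status.items(), key=lambda kv: kv[1])
        if free == 0:
            return None  # cluster is full; the job is not placed
        self.nodes[node_id].launch(job)
        return node_id


rm = ResourceManager([NodeManager("node-a", 2), NodeManager("node-b", 1)])
placements = [rm.submit(f"job-{i}") for i in range(4)]
print(placements)
```

The point of the sketch is only that scheduling decisions are made centrally from per-node status reports, rather than every job being run through one job-tracking daemon.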

24 MapReduce 2.0: YARN (Yet Another Resource Negotiator)

25 The process of BI
Data -> information -> knowledge -> actionable plans
- Data -> information: the process of determining what data is to be collected and managed, and in what context.
- Information -> knowledge: the process involving the analytical components, such as data warehousing, online analytical processing, data quality, data profiling, business rule analysis, and data mining.
- Knowledge -> actionable plans: the most important aspect of a BI process.
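The chain from data to actionable plans can be made concrete with a very small example. Everything in the sketch below is hypothetical: the purchase records, the "spend fell by more than half" pattern, and the retention-offer action are invented purely to illustrate the three transitions, and are not taken from the course material.

```python
from collections import defaultdict

# Data: raw facts (hypothetical monthly purchase records).
transactions = [
    ("alice", "2012-06", 120.0), ("alice", "2012-07", 40.0),
    ("bob", "2012-06", 80.0), ("bob", "2012-07", 95.0),
]

# Data -> information: organize the facts so related items sit together.
spend = defaultdict(dict)
for customer, month, amount in transactions:
    spend[customer][month] = spend[customer].get(month, 0.0) + amount


# Information -> knowledge: recognize a pattern (assumed rule: latest month's
# spend is below half of the first month's spend).
def declining(months, threshold=0.5):
    ordered = [months[m] for m in sorted(months)]
    return ordered[-1] < threshold * ordered[0]


at_risk = sorted(c for c, months in spend.items() if declining(months))

# Knowledge -> actionable plan: turn the insight into something someone can do.
actions = [f"send retention offer to {c}" for c in at_risk]
print(actions)
```

In practice the middle step is where data warehousing, OLAP, and data mining come in; the final step still depends on someone in the organization being empowered to act.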

26 Actionable Knowledge
- An information asset retains its value only if the converted knowledge is actionable.
- Methods are needed for extracting value from knowledge.
- This is not a technical issue but an organizational one: empowered individuals in the organization are needed to take the action.
- There is an issue of return on investment (ROI).

27 BI Problems
Structured:
- Detecting credit card fraud
- Setting loan parameters
- Market segmentation/mass customization
- Deciding marketing mix
- Customer churn
- Reducing employee turnover
- Improving quality/efficiency
- …
Unstructured:
- Data exploration
- Utilization of resources (stored knowledge) to maximum effectiveness
- …

28 BI Applications
Customer Analytics:
- Customer profiling
- Targeted marketing
- Personalization
- Collaborative filtering
- Customer satisfaction
- Customer lifetime value
- Customer loyalty
Sales Channel Analytics:
- Marketing
- Sales performance and pipeline

29 BI Applications (2)
Supply Chain Analytics:
- Supplier and vendor management
- Shipping
- Inventory control
- Distribution analysis
Behavior Analysis:
- Purchasing trends
- Web activity
- Fraud and abuse detection
- Customer attrition
- Social network analysis

30 The Evolution of Business Intelligence
- 1st Generation: traditional analytics (query and reporting)
- 2nd Generation: traditional analytics (OLAP, data warehousing)
- 2.5 Generation: "new traditional" analytics
- 3rd Generation: advanced analytics
  - Rules, predictive analytics, and real-time data mining
  - Stream analytics

31 Business Intelligence Classifications
Legacy BI: Traditional Analytics
- 1st Generation Analytics (Query & Reporting)
- 2nd Generation Analytics (OLAP, Data Warehousing)
"New Traditional" Analytics
- "2.5-Gen" Analytics (In-Memory OLAP, Search-Based)
3rd-Generation BI
- Advanced Analytics/Optimization: Rules; Predictive Analytics; Real-time and traditional Data Mining
- Stream Analytics*: Real-time, continuous, sequential analysis (ranging from basic to advanced analytics)
* In lieu of stream analytics, "embedded analytics," although architecturally different, could potentially play the same role.
Source: Bill O'Connell, IBM, Aug 2007

32 Business Intelligence Use Cases
Traditional Analytics
- 1st Generation Analytics (Query & Reporting)
- 2nd Generation Analytics (OLAP, Data Warehousing)
"New Traditional" Analytics
- "2.5-Gen" Analytics (In-Memory OLAP, Search-Based)
Advanced Analytics/Optimization
- Rules
- Predictive Analytics
- Real-time and traditional Data Mining
Stream Analytics*
- Real-time, continuous, sequential analysis (ranging from basic to advanced analytics)
* In lieu of stream analytics, "embedded analytics," although architecturally different, could potentially play the same role.
Example Target Solutions: Fraud Detection / Risk; CRM Analytic; Supply Chain Optimization; RFID / Spatial Data; Other High-Volume
- Focus on what is happening RIGHT NOW: Real-Time Threshold.
- Focus on what will happen: analytic applications that apply statistical relationships in the form of RULES.
- Focus on what did happen: turning data into information is limited by the relationships which the end-user already knows to look for.
- Data mining to determine why something happened by unearthing relationships that the end-user may not have known existed.
Source: Bill O'Connell, IBM, Aug 2007

33 Data Center - The Headquarters of Big Data. Case of BaoCloud Center at Shanghai

34 The land for the data center at Shanghai

35 Customizable Data Center: Baocloud data center
