Advisor: Professor 吳思佩. Group members: 981730 張懷文, 981740 嚴訢, 981745 余凡, 991721 林蔚城, 991723 古欣玉, 991725 陳皓君, 991728 李穎宣, 991732 張育華, 991741 莊雅涵, 991760 李嘉翎.

1 Advisor: Professor 吳思佩. Group members: 981730 張懷文, 981740 嚴訢, 981745 余凡, 991721 林蔚城, 991723 古欣玉, 991725 陳皓君, 991728 李穎宣, 991732 張育華, 991741 莊雅涵, 991760 李嘉翎

2 (a) Describe its possible definitions 991760 李嘉翎

3 What is Big Data? Taken literally, Big Data means very large volumes of data. But does big data just mean that the volume of data is very big? In fact, the definition is not as simple as the literal reading suggests. Big Data also covers the set of tools, processes, methods, and procedures for handling huge data. Reference 1

4 Big Data Definitions “Big data is the term used to describe the huge volumes of data generated by traditional business activities and from new sources such as social media.” Typical big data includes information from stores, bank ATMs, Facebook posts and YouTube videos. Reference 2

5 Big Data Definitions “Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, analyze, manage, and process the data within a tolerable elapsed time.” Big Data refers to the generation, collection, and storage of huge data, together with the tools, platforms, analysis systems, techniques, and methods for extracting meaningful and valuable information from that data. Reference 3

6 Big Data characteristic--3Vs model Reference 4

7 Another model for describing Big Data characteristics ── 4Vs model Reference 4

8 Why is Big Data generated? And who is generating Big Data?

9 Why is Big Data generated? Big Data is a revolution in information development. As information technology has advanced, people have become more dependent on IT, and data has become more varied and more digitized. This digitized data is generated very quickly and in huge volumes. Reference 9

10 Who is generating Big Data? Social media and networks (all of us are generating data); scientific instruments (collecting all sorts of data); mobile devices (tracking all objects all the time); sensor technology and networks (measuring all kinds of data). Reference 5

11 (b) Its possible challenges and opportunities 991760 李嘉翎

12 Big Data Challenges

13 Challenges of Big Data Challenge 1 ── Heterogeneity & Incompleteness: How can we understand and use big data when it arrives in unstructured formats (e.g., text, image, video, or audio), given how complex it is? Challenge 2 ── Timeliness: How can we capture the most important data as it happens and deliver it to the right people in real time? Reference 6

14 Challenges of Big Data Challenge 3 ── Technology: How can we store the data when big data is typically measured in terabytes or even petabytes? Challenge 4 ── Technical skills: How can we analyze and understand it, given its size and our computational capacity? “The Bottleneck is in technology ◦New architecture, algorithms, techniques are needed Also in technical skills ◦Experts in using the new technology and dealing with big data” Reference 7 1 TB = 1000 GB, 1 PB = 1000 TB

15 Challenges of Big Data Challenge 5 ── Privacy: Privacy and security in data access and deployment. As more data becomes available, the acceptable use of personal data becomes a greater concern to people. Reference 7

16 Big Data Opportunities

17 Opportunities of Big Data Industry: improve productivity; increase sales; avoid fraud and risk; gain competitive advantage; create substantial value for the world economy. Reference 8

18 Opportunities of Big Data Medical science: Diagnose the cause of disease more accurately and quickly. Predict which illnesses might occur and give treatment in advance to keep patients from getting worse. Improve medical methods and skills. Reference 8

19 Opportunities of Big Data Online shopping & ad agencies: Recommend items you might like or be interested in. Predict who the customers of different items or services will be. Show different ads to different groups to reduce cost and improve performance. Reference 8

20 Reference Reference 1 Big Data 是什麼 ? Reference 2 Big Data Reference 3 Wiki Reference 4 雲端時代的殺手級應用 —Big Data 海量資料分析 Reference 5 Reference 6 Reference 7 Reference 8 Business 數位時代 Reference 9 遠見電子報

21 (c) Explain how a corporation can deal with the problems associated with big data and explain its possible solutions ---- Association Rules learning 991725 陳皓君

22 Association Rules learning Reference 1 Theoretical basis: Association rules are an important topic in data mining. They are used to discover useful associations among items in big data. We usually use association rules to answer questions such as "If a consumer purchases products A and B, what else is he likely to buy?" Association rules turn a large amount of disorganized data into a small amount of statistical data that is easy to observe and understand.

23 Reference 1 Association Rules learning Market basket analysis is a classic example of association rules. By mining the association rules in its database of customer purchase records, a supermarket can discover customers' buying habits and promote sales. The aim is to find out customers' buying habits and understand why they buy these products together; corporations gain benefits and build competitive advantage by mining these rules.

24 Association Rules learning----Case “Beer and Nappies” Reference 2 On Friday afternoons, young American males who buy diapers (nappies) also have a predisposition to buy beer. No one had predicted that result. Once the correlation was uncovered, it was easy to back extrapolate from the effect to the cause.

25 Association Rules learning----Case “Beer and Nappies” 1. Young American males frequently indulge in ritualized carousing behavior with friends on Friday nights. 2. Carousing usually involves the consumption of beer. 3. Most young American males only buy diapers after they have fathered offspring. After seeing the results of the data mining, Wal-Mart moved the beer next to the diapers and both sales went up. Reference 2

26 Computing method Support: P(A ∩ B), the probability that a transaction contains both items, for example both diapers and beer. Confidence: P(B|A) = P(A ∩ B)/P(A), a conditional probability, for example the probability of buying beer given that diapers are bought. Reference 1

27 Computing method Minimum support and minimum confidence: Both thresholds are chosen by the user. A rule is worth considering only when its support and confidence meet or exceed the minimum support and minimum confidence. Reference 1

28 Computing example A simple example: Table 1 on the next page is a database of customer purchase records containing six transactions. The items are tennis rackets, tennis balls, sports shoes, and badminton. Consider the association rule: tennis racket ⇒ tennis balls. Reference 1

29 Computing example Transactions 1, 2, 3, 4, and 6 contain tennis rackets; transactions 1, 2, and 6 contain both tennis rackets and tennis balls. Support = 3/6 = 0.5, Confidence = 3/5 = 0.6. Table 1 Reference 1

30 Computing example Given a minimum support of 0.5 and a minimum confidence of 0.6, the association rule tennis racket ⇒ tennis balls meets both thresholds and is worth considering. We can conclude that purchasing a tennis racket and purchasing tennis balls are strongly associated. Reference 1
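The support and confidence figures from the computing example can be reproduced with a short Python sketch. Table 1 is not included in this transcript, so the item lists below are hypothetical: they are invented only to match the counts stated on the slides (six transactions, rackets in 1, 2, 3, 4, 6, and both items in 1, 2, 6). The function names are also illustrative.

```python
# Hypothetical transactions: Table 1 is missing from the transcript, so these
# baskets are invented to match the counts given on the slides.
transactions = {
    1: {"tennis racket", "tennis ball", "sports shoes"},
    2: {"tennis racket", "tennis ball"},
    3: {"tennis racket"},
    4: {"tennis racket", "sports shoes"},
    5: {"sports shoes", "badminton"},
    6: {"tennis racket", "tennis ball"},
}


def support(antecedent, consequent, db):
    """Fraction of all transactions that contain every item on both sides."""
    both = antecedent | consequent
    hits = sum(1 for items in db.values() if both <= items)
    return hits / len(db)


def confidence(antecedent, consequent, db):
    """Among transactions containing the antecedent, the share that also
    contain the consequent."""
    with_antecedent = [items for items in db.values() if antecedent <= items]
    if not with_antecedent:
        return 0.0
    both = antecedent | consequent
    hits = sum(1 for items in with_antecedent if both <= items)
    return hits / len(with_antecedent)


A = {"tennis racket"}
B = {"tennis ball"}
s = support(A, B, transactions)
c = confidence(A, B, transactions)

# Thresholds from the slide: minimum support 0.5, minimum confidence 0.6.
MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.6
rule_is_strong = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE
print(s, c, rule_is_strong)  # 0.5 0.6 True
```

Scanning every transaction for every candidate rule is fine for six baskets, but it illustrates why the cost grows quickly as the number of items and transactions increases.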

31 Advantages & Disadvantages Advantages: The more finely the items are classified, the more practical the analysis results are, which increases their reference value. Disadvantages: As the amount of data grows, more computing resources and time are needed, and the difficulty can increase exponentially. Reference 3

32 Conclusion These calculated measures can serve as indicators that support "strategic cross-sell capabilities": finding out which product information is suitable to present to a customer, so that customers see the things they are most likely to buy. The successful case shows a company that understands its products thoroughly. Because association rules are a tool for analyzing potential relationships, we must have sufficient knowledge of the algorithms when using them; otherwise the big data will still not be properly utilized. Reference 4

33 Reference Reference 1 5%BC%8F%E8%A7%84%E5%88%99 Reference 2 Reference 3 ssociation_Folder/DM_association.htm Reference 4

34 (c) Explain how a corporation can deal with the problems associated with big data and explain its possible solutions ---- Classification 991732 張育華

35 Classification trees “Classification trees are used to predict membership of cases or objects in the classes of a categorical dependent variable from their measurements on one or more predictor variables. ” “Classification tree analysis is one of the main techniques used in so-called Data Mining.” Reference 1

36 The goal of classification trees “The goal of classification trees is to predict or explain responses on a categorical dependent variable.” “The goal of classification tree analysis is to obtain the most accurate prediction possible.” “The flexibility of classification trees make them a very attractive analysis option.” Reference 1

37 Classification Question • "How likely is person X to buy the newest BMW M5?" • “By creating a classification tree (a decision tree), the data can be mined to determine the likelihood of this person to buy a new M5. Possible nodes on the tree would be age, income level, current number of cars, marital status, kids, homeowner, or renter. The attributes of this person can be used against the decision tree to determine the likelihood of him purchasing the M5.” Reference 5

38 Classification tree Picture 1

39 Steps in Developing a Classifier Step 1. Create a training set “A training set contains a list of objects with known classifications. Ideally the training set should contain many examples (often thousands of objects) so that it includes both common and rare types of objects. The training set should be accurately classified, and it should be a representative sample. Both of these goals can be hard to achieve.” Reference 4

40 Steps in Developing a Classifier Step 2. Select powerful features “The choice of features to measure for each object is crucial in determining the ultimate accuracy of the classifier. Ideally the features should be relevant to the classification, independent, and powerful in separating the different classes.” Reference 4

41 Steps in Developing a Classifier Step 3. Train the classifier “The choice of an algorithm for classification is in many ways the easiest part of developing a scheme for object classification.” Step 4. Assess classifier accuracy “Once a potentially useful classifier has been constructed, the accuracy of the classifier must be measured. Knowledge of the accuracy is necessary both in the application of the classifier and also in comparison of different classifiers.” Reference 4
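The four steps can be illustrated with a deliberately tiny Python sketch. It is not the method of any cited source: the data, the two features (income in thousands, home ownership), and the function names are all hypothetical, and the "tree" is only a one-level decision stump rather than a full classification tree.

```python
# Step 1: a training set with known classifications (hypothetical data).
# Each row is ((income_in_thousands, owns_home), bought_car).
training_set = [
    ((30, 0), 0), ((45, 0), 0), ((60, 1), 0),
    ((85, 1), 1), ((95, 0), 1), ((120, 1), 1),
]
# Held-out examples used only for Step 4.
test_set = [((40, 1), 0), ((110, 0), 1), ((90, 1), 1), ((55, 0), 0)]


def train_stump(rows):
    """Steps 2 and 3: try every feature and threshold, and keep the single
    split 'feature >= threshold' that makes the fewest training errors."""
    best = None
    n_features = len(rows[0][0])
    for feature in range(n_features):
        for threshold in sorted({x[feature] for x, _ in rows}):
            errors = sum(1 for x, y in rows if int(x[feature] >= threshold) != y)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    _, feature, threshold = best
    return feature, threshold


def predict(model, x):
    """Classify one object with the learned split."""
    feature, threshold = model
    return int(x[feature] >= threshold)


def accuracy(model, rows):
    """Step 4: share of rows whose predicted class matches the known class."""
    return sum(1 for x, y in rows if predict(model, x) == y) / len(rows)


model = train_stump(training_set)
test_accuracy = accuracy(model, test_set)
print(model, test_accuracy)
```

Measuring accuracy on examples that were not used for training is what makes the Step 4 figure meaningful; accuracy on the training set alone tends to be optimistic.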

42 Unique Features of STATISTICA Data Miner (analysis software) “Data Mining with STATISTICA offers a wealth of options and techniques not available in competing products. These features can be critical to maximize ROI in a competitive environment.” “STATISTICA Data Miner can be used by novices, and even offers automatic model builder and wizard-like "Data Miner Recipes," yet offers the most comprehensive selection of methods and techniques for experts to solve even the most complex problems.” • Example: h=RandomForests/Examples/Example1ClassificationRandomForests Reference 6

43 Netflix Based on consumers' ratings of past films, Netflix uses data analysis to predict what kind of film a user will want to watch next. Netflix developed Cinematch, a video recommendation engine that uses big data and data mining to recommend videos to consumers. Reference 2 Picture 2

44 Netflix Netflix records all of its users' behavior and preferences. • According to reports in 2012, Netflix had more than 25 million users and recorded a daily average of about 30 million play actions (play, pause, fast forward, rewind), 4 million ratings, and 3 million searches. • Every click, play, pause, fast forward, rewind, viewing time, frequency, and viewing period becomes an event. In addition, each movie is tagged with labels such as director, actor, screenwriter, producer, genre, and plot. These records are kept and fed into back-end data analysis. • According to official data, 75% of users mostly accepted Netflix's movie recommendations. Reference 2

45 Data Netflix was revealed to be collecting as of 2011 (at least): more than 25 million users; about 30 million plays per day (and it tracks every time you rewind, fast forward and pause a movie); more than 2 billion hours of streaming video watched during the last three months of 2011 alone; about 4 million ratings per day; about 3 million searches per day; geo-location data; device information; time of day and week (it now can verify that users watch more TV shows during the week and more movies during the weekend); metadata from third parties such as Nielsen; social media data from Facebook and Twitter. Reference 3

46 House of Cards “ “House of Cards” is one of the first major test cases of this Big Data-driven creative strategy. For almost a year, Netflix executives have told us that their detailed knowledge of Netflix subscriber viewing preferences clinched their decision to license a remake of the popular and critically well regarded 1990 BBC miniseries.” Reference 3 Picture 3

47 House of Cards “Netflix’s data indicated that the same subscribers who loved the original BBC production also gobbled down movies starring Kevin Spacey or directed by David Fincher. Therefore, concluded Netflix executives, a remake of the BBC drama with Spacey and Fincher attached was a no-brainer, to the point that the company committed $100 million for two 13-episode seasons.” Reference 3 Picture 3

48 REFERENCE Reference 1 Reference 2 what-user-wants/ Reference 3 & Picture 3 Reference 4 Reference 5 Reference 6 Picture 1 Picture 2 580x446.jpg

49 (c) Explain how a corporation can deal with the problems associated with big data and explain its possible solutions ---- Cluster Analysis 991728 李穎宣

50 Cluster analysis • Cluster analysis is an exploratory data analysis tool for solving classification problems. • The goal is for objects in the same group to be more similar to each other than to those in other groups. • The purpose is to simplify data. • How is it done? Reference 1,2,6

51 Two main types - Hierarchical • Hierarchical ◦To form clusters using a hierarchical cluster analysis, you must select: • A criterion for determining similarity or distance between cases • A criterion for determining which clusters are merged at successive steps • The number of clusters you need to represent your data • Objects that belong to a child cluster also belong to the parent cluster. Reference 5

52 Two main types - Nonhierarchical • Nonhierarchical - K-Means ◦Each case is assigned to the cluster whose mean is closest to it. ◦Start with an initial set of means and classify cases based on their distances to these centers. ◦Next, compute the cluster means again, using the cases assigned to each cluster. ◦Then reclassify all cases based on the new set of means, and keep repeating this step until the cluster means change little between successive steps. ◦Finally, calculate the means of the clusters once more and assign the cases to their permanent clusters. Reference 5
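The K-Means steps above can be sketched in plain Python. This is a minimal illustration, not the procedure of any particular statistics package: the points, the initial means, the stopping tolerance, and the function name are all assumptions made for the example.

```python
def k_means(points, means, max_iter=100, tol=1e-9):
    """Alternate 'assign each case to the nearest mean' and 'recompute the
    means' until the means stop moving (or max_iter is reached)."""
    clusters = [[] for _ in means]
    for _ in range(max_iter):
        # Assignment step: each case goes to the cluster with the closest mean.
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(
                range(len(means)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, means[i])),
            )
            clusters[nearest].append(p)

        # Update step: recompute each mean from its assigned cases.
        new_means = []
        for i, members in enumerate(clusters):
            if members:
                new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_means.append(means[i])  # keep the old mean for an empty cluster

        # Stop when no mean moved by more than the tolerance.
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(means, new_means)
        )
        means = new_means
        if shift <= tol:
            break
    return means, clusters


# Hypothetical 2-D cases forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
initial_means = [(0.0, 0.0), (10.0, 10.0)]
final_means, final_clusters = k_means(points, initial_means)
print(final_means)
print(final_clusters)
```

The result depends on the initial means, which is why the number of clusters and the starting centers have to be chosen before the procedure runs.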

53 Opinion Research Corporation • Who they are: ORC International is a leading global market research firm uniquely able to integrate our people, methods, technology and insights to address clients’ strategic issues, challenges and opportunities. • What they do: ORC International specializes in research related to Customer Equity, Employee Engagement, Business & Market Expansion and Product Development & Innovation. Reference 3

54 ORC – Customer Strategies • Customer Equity Model: a framework for measuring the customer experience and determining effective strategies for influencing consumer behavior and buying decisions, which will ultimately impact your top and bottom line • Brand Assessment & Management • Linkage with Employee Engagement • Win/Loss Analysis • Net Promoter Score/Customer Advocacy • Competitive Analysis • Customer Segmentation Reference 3

55 How is Brand Assessment & Management done?

56 Brand Assessment & Management • Brand Research: core value and competitive advantage; analyze the corporate brand • Brand Management Reference 3

57 Brand Management • Brand Management Integration • Brand loyalty • Market positioning and segmentation: understand clearly how the company's own brand and its competitors are positioned at different scopes (such as global, regional, and local); use perceptual mapping, cluster analysis, and other related techniques to identify customers' demand factors and the relationships among them; and then develop appropriate positioning and strategy for each target group. • Attitudes and usage • Brand coordination services Reference 4

58 Using cluster analysis to answer • Who are our target and potential consumers? • What are the needs of consumers? • How can companies grow? Reference 4

59 Successful • Identifying current and potential risks to your company’s reputation and their likely impact on your profitability and growth • Developing and maintaining corporate positioning that will resonate with your customers • Understanding your brand’s relevance in the market: whether it meets genuine user needs and offers real benefits compared to competitive offerings • Determining your brand’s credibility and sustainability • Monitoring the health of brands over time via in-depth tracking studies on brand/advertising programs • Developing brand and communication strategies and the supporting data-driven rationales/business cases Reference 3

60 Reference Reference 1 Reference 2 Wikipedia Reference 3 px Reference 4 tw/%E7%BE%8E%E5%9B%BD%E6%AC%A7%E7%BB%B 4%E5%B8%8C%E5%85%AC%E5%8F%B8 Reference 5 Reference 6 Youtube

61 (c) Explain how a corporation can deal with the problems associated with big data and explain its possible solutions ---- Neural Networks 991721 林蔚城

62 What Is A Neural Network? “The simplest definition of a neural network, more properly referred to as an artificial neural network (ANN), is provided by the inventor of one of the first neurocomputers, Dr. Robert Hecht-Nielsen. He defines a neural network as: ◦"...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs.” “Although the mathematics involved with neural networking is not a trivial matter, a user can rather easily gain at least an operational understanding of their structure and function.”

63 The Basics of Neural Networks “Neural networks are typically organized in layers. Layers are made up of a number of interconnected nodes which contain an 'activation function'. “ “Patterns are presented to the network via the input layer, which communicates to one or more 'hidden layers' where the actual processing is done via a system of weighted 'connections'. “ “The hidden layers then link to an output layer where the answer is output as shown in the graphic below.”

64 The Basics of Neural Networks Picture 1

65 Backpropagational Neural Networks “ Most ANNs contain some form of 'learning rule' which modifies the weights of the connections according to the input patterns that it is presented with. “ “ The delta rule is often utilized by the most common class of ANNs called 'backpropagational neural networks’ (BPNNs). Backpropagation is an abbreviation for the backwards propagation of error.” “ With the delta rule, as with other types of backpropagation, 'learning' is a supervised process that occurs with each cycle or 'epoch' (i.e. each time the network is presented with a new input pattern) through a forward activation flow of outputs, and the backwards error propagation of weight adjustments. More simply, when a neural network is initially presented with a pattern it makes a random 'guess' as to what it might be. It then sees how far its answer was from the actual one and makes an appropriate adjustment to its connection weights. “
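The "guess, measure the error, adjust the weights" cycle can be made concrete with a minimal Python sketch of the delta rule for a single linear unit. This is a simplification and an assumption of the example: it has no hidden layers, so it is not a full backpropagational network, and the training patterns, learning rate, and function name are hypothetical.

```python
import random


def train_delta_rule(samples, epochs=500, learning_rate=0.1, seed=0):
    """Train one linear unit: output = bias + sum(weight * input).
    After each pattern, every weight moves in proportion to the error
    (target - output) and to its own input (the delta rule)."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # The initial weights are random, so the first outputs are random guesses.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = rng.uniform(-0.5, 0.5)

    for _ in range(epochs):
        for inputs, target in samples:
            # Forward pass: compute the unit's current answer.
            output = bias + sum(w * x for w, x in zip(weights, inputs))
            # How far the answer was from the actual one.
            error = target - output
            # Adjust the connection weights to reduce that error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical patterns generated by target = 2*x1 - 1*x2 + 0.5.
samples = [((0, 0), 0.5), ((0, 1), -0.5), ((1, 0), 2.5), ((1, 1), 1.5)]
weights, bias = train_delta_rule(samples)
print(weights, bias)
```

In a multi-layer network the same error signal is propagated backwards through the hidden layers, which is where the name backpropagation comes from.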

66 Backpropagational Neural Networks Picture 2

67 How Do Neural Networks Differ From Conventional Computing? “A conventional 'serial' computer has a central processor that can address an array of memory locations where data and instructions are stored. Computations are made by the processor reading an instruction as well as any data the instruction requires from memory addresses, the instruction is then executed and the results are saved in a specified memory location as required. In a serial system (and a standard parallel one as well) the computational steps are deterministic, sequential and logical, and the state of a given variable can be tracked from one operation to another.” “In comparison, ANNs are not sequential or necessarily deterministic. There are no complex central processors, rather there are many simple ones which generally do nothing more than take the weighted sum of their inputs from other processors. ANNs do not execute programed instructions; they respond in parallel (either simulated or actual) to the pattern of inputs presented to it. There are also no separate memory addresses for storing data. Instead, information is contained in the overall activation 'state' of the network. 'Knowledge' is thus represented by the network itself, which is quite literally more than the sum of its individual components.”

68 What Applications Should Neural Networks Be Used For? Neural networks are universal approximators, and they work best if the system you are using them to model has a high tolerance to error. They usually work well for: ◦capturing associations or discovering regularities within a set of patterns; ◦problems where the volume, number of variables or diversity of the data is very great; ◦problems where the relationships between variables are vaguely understood; ◦problems where the relationships are difficult to describe adequately with conventional approaches.

69 What Are Their Limitations? “Backpropagational neural networks (and many other types of networks) are in a sense the ultimate 'black boxes'. Apart from defining the general architecture of a network and perhaps initially seeding it with random numbers, the user has no other role than to feed it input and watch it train and await the output. In fact, it has been said that with backpropagation, "you almost don't know what you're doing".” “Backpropagational networks also tend to be slower to train than other types of networks and sometimes require thousands of epochs. If run on a truly parallel computer system this issue is not really a problem, but if the BPNN is being simulated on a standard serial machine (i.e. a single SPARC, Mac or PC) training can take some time. This is because the machine's CPU must compute the function of each node and connection separately, which can be problematic in very large networks with a large amount of data. However, the speed of most current machines is such that this is typically not much of an issue.”

70 What Are Their Advantages Over Conventional Techniques? “Depending on the nature of the application and the strength of the internal data patterns you can generally expect a network to train quite well. This applies to problems where the relationships may be quite dynamic or non-linear. “ “ANNs provide an analytical alternative to conventional techniques which are often limited by strict assumptions of normality, linearity, variable independence etc. Because an ANN can capture many kinds of relationships it allows the user to quickly and relatively easily model phenomena which otherwise may have been very difficult or impossible to explain otherwise.”

71 Reference Reference 1 tml Reference 2 Reference 3 Picture 1 tml Picture 2 html

72 (c) Explain how a corporation can deal with the problems associated with big data and explain its possible solutions ---- Pattern Recognition 991741 莊雅涵

73 Introduction “In machine learning, pattern recognition is the assignment of a label to a given input value.” “Pattern recognition algorithms generally aim to provide a reasonable answer for all possible inputs and to perform "most likely" matching of the inputs, taking into account their statistical variation.” Reference 1

74 Application “Character recognition”: Characters on paper can be scanned and easily matched against the character patterns stored in a database. For example: 1. “Processing bank checks”: To meet the audit requirements of a company's accounting staff, checks used to be photocopied as backup records, but photocopies are inefficient to search through later. A check scanner can instead scan checks quickly and accurately identify the symbols on them, greatly speeding up the accounting staff's verification work. Reference 2

75 Application- Character recognition 2. “Scanner captures an image of the text. Image is converted into constituent characters” : barcode Reference 3

76 Application “Face Recognition: C-VIS C-VIS has taken biometrics out of the lab and into the workplace by translating vision technologies into user-friendly applications that meet real-world needs. C-VIS products are currently in use for video surveillance, security systems, airports, casinos, etc.” Reference 4

77 Application-Face Recognition “In recording mode, the FaceSnap RECORDER screen simultaneously shows the live camera shot and the latest sequence of captured facial images.” In search mode, the user can define image groups and search criteria for face identification. Automatic functions for image pattern recognition can also be used to search the image database. Reference 5 Picture 1 Picture 2

78 Application-Face Recognition In the comparison window, images can be viewed, analyzed and compared in detail. Reference 5 Picture 3

79 Application Face-scan roll call: During roll call, the classroom teacher takes photos of the students from several angles, one section of the classroom at a time, and uploads them to a server, where they are automatically stitched into one overall picture. The system then automatically numbers and identifies the students' faces in the picture and finally displays each student's personal information, next to which are "It's me" and "Not me" options. Reference 6 Picture 4

80 Application Fingerprint Recognition: Microsoft Mouse Reader. A mouse that recognizes you. This stylish wireless optical mouse offers the Fingerprint Reader to eliminate password hassles: now you can log on to Web sites and your computer with the touch of a finger. Tilt wheel technology makes navigation easy, and more than six months of battery life lets you stay productive. Reference 3

81 Advantages & Disadvantages Advantages: Patterns can be recognized from different angles. Disadvantages: The system sometimes makes recognition errors, e.g., confusing my face with my friend’s face. Reference 7

82 Reference Reference 1 Wiki : Reference 2 Plustek 全新支票辨識掃描器 讓支票管理變得簡單又可靠 Reference 3 Pattern Recognition : nition.pdf nition.pdf Reference 4 FaceSnap RECORDER 多用途人像处理系统 : Reference 5, Picture 1, Picture 2, Picture 3 C-VIS : Reference 6, Picture 4 刷脸点名神器发明者:实为娱乐而非考勤 : 01/31/c_124301074.htm 01/31/c_124301074.htm Reference 7 Pattern Recognition : 2307856 2307856

83 (c) Explain how a corporation can deal with the problems associated with big data and explain its possible solutions ---- Anomaly Detection 981745 余凡

84 Anomaly Detection Anomaly: a situation that deviates from normal. Detection: checking whether something is OK or not.

85 WHAT “The system compares daily logs with golden logs. If the difference is too large, there must be an anomaly.” --ref1 Golden logs: log files that have been collected and analyzed to form a profile of normal activity.

86 WHAT Statistical Analysis: “Use statistical methods to compute measures such as the mean and variance and determine whether an anomaly has occurred.” --ref2 Neural Network Analysis: “A neural network system can learn by itself. After some training, it can recognize anomalies on its own.” --ref2

87 HOW 1. Collect a large number of log files. 2. Analyze these logs to summarize a profile of normal activity. 3. Compare daily logs against this profile. 4. If the difference is too large, the anomaly detection system alerts the system manager.
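The four steps can be sketched with the statistical approach (mean and variance) mentioned earlier. This is only a minimal illustration under stated assumptions: the metric (logins per hour), the baseline values, the threshold of three standard deviations, and the function names are all hypothetical.

```python
from statistics import mean, stdev


def build_profile(baseline):
    """Steps 1 and 2: summarize the collected 'golden' measurements as a
    normal-activity profile (mean and sample standard deviation)."""
    return mean(baseline), stdev(baseline)


def is_anomalous(value, profile, threshold=3.0):
    """Step 3: flag a value that lies more than `threshold` standard
    deviations away from the normal mean."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical golden-log metric: logins per hour on normal days.
baseline_logins_per_hour = [52, 48, 50, 47, 53, 49, 51, 50]
profile = build_profile(baseline_logins_per_hour)

# Step 4: compare today's measurements and collect the ones to alert on.
todays_logins_per_hour = [51, 49, 120, 50]
alerts = [v for v in todays_logins_per_hour if is_anomalous(v, profile)]
print(profile, alerts)
```

Lowering the threshold catches more anomalies but also raises more false alarms, which is the misjudgment trade-off this kind of detection has to manage.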

88 HOW Picture 1

89 WHY Advantages: “No signature files are needed, unlike misuse detection. Frequent updates are not needed. Can detect errors we do not yet know about.” --ref2 Disadvantage: “The rate of misjudgment is usually higher than with misuse detection.” --ref2

90 WHO

91 Antivirus software Monitors network bandwidth usage in real time and compares the real-time usage profile with the normal profile.

92 Real-Time Camera Anomaly Detection “The system can judge whether there is an anomaly by detecting the difference between the real-time screenshot and the normal screenshot.” --ref2 Demo Video: &v=NOadEZbAgHc

93 Credit Card Fraud Detection “Uses neural network analysis. A database records the rules of credit card fraud. The system compares the user's credit card records with the rule database to recognize whether a transaction is fraud.” --ref3

94 Picture 2

95 Reference Picture 1,3&Ref 1: 創意海豚的部落格 : Picture 2&Ref 3: 信用卡偵測之研究 : Ref 2:Intelligent system lab: anomaly-detection-using-salient-region-for-real-world-video- surveillance anomaly-detection-using-salient-region-for-real-world-video- surveillance

96 (c) Explain how a corporation can deal with the problems associated with big data and explain its possible solutions ---- Predictive Modelling 981740 嚴訢

97 Predictive Modelling Predictive modelling is the process by which a model is created or chosen to try to best predict the probability of an outcome. In many cases the model is chosen on the basis of detection theory to try to guess the probability of an outcome given a set amount of input data. For example: given an email, determining how likely it is to be spam. Reference 1
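The spam example can be sketched as a model that maps an email to a probability. The sketch below assumes a logistic model over word indicators; the words, weights, and bias are hypothetical values chosen by hand for illustration, whereas a real model would learn them from labelled emails.

```python
import math

# Hypothetical hand-picked weights: positive values push towards "spam",
# negative values push towards "not spam". A real model would learn these.
WEIGHTS = {"free": 1.8, "winner": 2.2, "meeting": -1.5, "invoice": -0.7}
BIAS = -1.0


def spam_probability(email_text):
    """Return the modelled probability that the email is spam, using the
    logistic function on a weighted sum of the words that are present."""
    words = set(email_text.lower().split())
    score = BIAS + sum(weight for token, weight in WEIGHTS.items() if token in words)
    return 1.0 / (1.0 + math.exp(-score))


likely_spam = spam_probability("free winner prize")
likely_ham = spam_probability("meeting agenda attached")
print(round(likely_spam, 3), round(likely_ham, 3))
```

The output is a probability rather than a yes/no label, so the business can choose its own cut-off depending on how costly a missed spam email is compared with a wrongly filtered one.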

98 Predictive Modelling 20 years ago, with big data and predictive analytics, the focus was on building a single statistical model and looking for knowledge; we generally used regression algorithms to analyze data; and we used high end workstations for the computations. Today, with big data, we tend to think of collections of models (ensembles, cubes of models, etc.) and focus on the actions (not the knowledge) that are possible; we would more typically use algorithms that compute trees or support vector machines; and we do computations over clusters of workstations. Reference 2

99 Phase of Predictive Modeling Reference 3

100 Predictive Modelling for business With the rise of big data, the predictive analytics market has woken up; firms now understand the opportunity to use big data to increase their knowledge of their business, their competitors, and their customers. Firms can use predictive analytics models to reduce risks, make better decisions, and deliver more personal customer experiences. Forrester defines big data predictive analytics solutions as: “Software and/or hardware solutions that allow firms to discover, evaluate, optimize, and deploy predictive models by analyzing big data sources to improve business performance or mitigate risk." Reference 4

101 Predictive Modelling for Business Predictive analytics is hard to do without the right tools and technologies, given the increasing challenge of storing, processing, and accessing the volume, velocity, and variety of big data. This isn’t a one-time operation; firms must rerun their analysis on new data to make sure the models are still effective and to respond to changes in customer desires and competitors. Many firms analyze data weekly or even continuously. Reference 3
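Rerunning the analysis on new data can be sketched as a simple check: score the existing model on the latest batch and flag it for retraining if accuracy has dropped. This is only an illustrative sketch. The churn rule, the sample batches, the 0.05 tolerance, and the function names are all hypothetical and do not come from the referenced material.

```python
def accuracy(model, batch):
    """Fraction of (features, outcome) pairs the model predicts correctly."""
    correct = sum(1 for features, outcome in batch if model(features) == outcome)
    return correct / len(batch)


def needs_retraining(model, new_batch, baseline_accuracy, tolerance=0.05):
    """Flag the model when accuracy on fresh data falls below baseline - tolerance."""
    return accuracy(model, new_batch) < baseline_accuracy - tolerance


# Hypothetical churn rule: predict churn when monthly usage is under 10 hours.
def churn_model(monthly_hours):
    return monthly_hours < 10


# Hypothetical batches of (monthly_hours, churned) observations.
last_week = [(2, True), (5, True), (30, False), (45, False)]
this_week = [(2, False), (5, True), (30, True), (45, False)]  # behaviour has shifted

baseline = accuracy(churn_model, last_week)
retrain = needs_retraining(churn_model, this_week, baseline)
```

Here the rule fits last week's data but gets only half of this week's cases right, so `retrain` is `True`. A firm that analyzes data weekly could run such a check on each new batch.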

102 A Continuous Process Fuels Big Data Predictive Analytics In order to maximize success with predictive analytics programs, organizations must Reference 4

103 Using Predictive Modeling in Several Different Ways. Telecommunications: Estimating Demand, Predicting Churn. Financial Services: Estimating Customer Lifetime Value, Estimating Credit Risk, Targeting Customers, Predicting Startup Success. E-Commerce: Determining the Next Best Offer. Technology: Filtering Spam. Health Care: Improving Care Services. Government: Predicting Equipment Failure. Transportation: Optimizing Customer Service Levels. Reference 3

104 REFERENCE Reference 1 Reference 2 Reference 3 7/wondering-what-lies-ahead-the-power-of- predictive-modeling/ Reference 4 cument/85601/oid/1-KWYFVB

105 (c) Explain how a corporation can deal with the problems associated with big data and explain its possible solutions: Sentiment Analysis 991723 古欣玉

106 Sentiment Analysis “Sentiment analysis refers to the application of natural language processing and text analytics to identify and extract subjective information in source materials. Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer. The attitude may be his or her judgment or evaluation.” “Also known as social media intelligence, this is the branch of business intelligence (BI) whose primary concern is monitoring and interpreting activity on social networks such as Facebook, LinkedIn, YouTube, or Pinterest.” Reference 1

107 Sentiment Analysis “It includes concepts such as social sentiment (the general “feeling” expressed towards your brand by social network users) and predictive analytics, which involves using historical data to make predictions about future activities.” Sentiment analysis can be applied in many areas, such as marketing or consumer research. Sentiment analysis divides people’s feelings into two classes, positive and negative. For example: “I like to drink Starbucks.” is positive; “I think Starbucks is too expensive.” is negative. Reference 2
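The positive/negative split can be illustrated with a minimal word-list classifier. This is a sketch under stated assumptions, not the method of any particular product: the word lists `POSITIVE` and `NEGATIVE` are hypothetical and deliberately tiny, and the extra "neutral" label for ties is an addition not found in the slides.

```python
import re

# Hypothetical, deliberately tiny word lists; real lexicons are far larger.
POSITIVE = {"like", "love", "great", "good", "delicious"}
NEGATIVE = {"expensive", "bad", "hate", "slow", "terrible"}


def classify(sentence):
    """Label a sentence by counting positive and negative words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: ties are reported as neutral


first = classify("I like to drink Starbucks.")
second = classify("I think Starbucks is too expensive.")
```

The two example sentences come out as "positive" and "negative" respectively. Because the approach only counts words, a sentence such as "I do not like Starbucks" would still be labelled positive.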

108 Sentiment Analysis Advantage: “Sentiment analysis is the future of brand marketing. Before your brand image and message has been fully determined, understanding how your target audience feels about a product or industry can help you shape your brand message.” Disadvantage: if people write with irony, sentiment analysis may classify the text as positive, and when a negative word appears in a sentence, it may classify the whole sentence as negative. Both effects reduce the accuracy of the results. Reference 3

109 Starbucks wants to predict brand reactions and how people feel about its products Picture 1 Picture 2 Reference 4

110 The problem There are many types of social media on the internet now, such as Facebook, Twitter, and microblogs. These sites are filled with people’s comments and suggestions about products and companies, as well as likes, posts, and chat records. “The information from social network platforms is enormous. For many businesses, online opinion has turned into a kind of virtual currency. It can make a product succeed or help it enter the market.” Reference 4

111 How can useful information be extracted from so many comments and posts, and even from chat records? Many enterprises are facing this problem. Reference 5

112 Use sentiment analysis Social software tools can help companies understand how people feel about them. Starbucks is a classic successful example. “The firm uses a technology platform based on natural language processing and sentiment analysis software, combined with Web site traffic and online news readership data, to track the volume of brand mentions and analyze the sentiments expressed.” Reference 5
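Tracking the volume of brand mentions together with the sentiments expressed can be sketched as a per-day summary. The sketch assumes the posts have already been given a sentiment label. The sample posts, their dates, and the function name `brand_summary` are hypothetical, and nothing here reflects how the platform used by Starbucks works.

```python
from collections import defaultdict

# Hypothetical posts that already carry a sentiment label: (date, text, sentiment)
POSTS = [
    ("2013-05-01", "I like to drink Starbucks.", "positive"),
    ("2013-05-01", "I think Starbucks is too expensive.", "negative"),
    ("2013-05-01", "Tea at home today.", "positive"),
    ("2013-05-02", "Starbucks again this morning, love it.", "positive"),
]


def brand_summary(posts, brand):
    """Per day: how many posts mention the brand and what share are positive."""
    daily = defaultdict(lambda: {"mentions": 0, "positive": 0})
    for date, text, sentiment in posts:
        if brand.lower() in text.lower():
            daily[date]["mentions"] += 1
            if sentiment == "positive":
                daily[date]["positive"] += 1
    return {
        date: {
            "mentions": counts["mentions"],
            "positive_share": counts["positive"] / counts["mentions"],
        }
        for date, counts in sorted(daily.items())
    }


summary = brand_summary(POSTS, "Starbucks")
```

Posts that do not mention the brand are skipped, so the third sample post does not affect the result.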

113 Why is Starbucks successful? Starbucks enables comments on its products and can reply to customers; many corporations do not open this function. Starbucks knows that it needs to listen to what customers really want. The more data you have, the more useful information you can use. Based on everyone’s comments about Starbucks and their views on the brand, the company tries its best to improve its products to meet customers’ needs and singles out the comments that need a response. Reference 5

114 Reference Reference 1 : Reference 2 : Reference 3 : analysis-to-predict-brand-reactions/ Reference 4 :,1878 Reference 5 : -effective-use-of-social-media/ Picture 1 ks-deals-100-coupon-and-bonus.html Picture 2

115 Assume that you are a team of IT staff and your team is assigned to provide a cost and benefit evaluation for any of the big data solutions: Sentiment analysis 981730 張懷文

116 How to conduct a sentiment analysis from big data? Use Sendible. What is Sendible? Sendible is a platform for engaging with customers, measuring results and monitoring your brand across multiple social media channels at once. Reference:

117 How to Use Sendible? Sendible Tutorial: XFusLrQRWYg byoJJizjJAI

118 How to conduct a sentiment analysis on Sendible? Step 1. Register a new account. Step 2. Add a monitoring service (add “Brand and Keyword monitoring”). Reference:

119 Step 3. Type your brand information (keyword in the description field, Brand Name). Reference:

120 How to conduct a sentiment analysis on Sendible? Step 4. Click My Reports (shows the dashboard). Reference:

121 Benefits of Sentiment Analysis: fast, understanding. Take advantage of big data to: know your brand image; know how to do brand marketing; understand your audience (customers); know how to interact with your audience.

122 Cost of Sendible Reference:

