Slide 1: Text Analytics for Unlocking the Potential of Big Data
Bhavani Raskutti @ Pacific Brands
1 Text analytics & big data
2 New opportunities with text analytics
3 Challenges when mining text
4 Solutions to overcome challenges
5 Wrap-up


Slide 3: Text Analytics & Big Data
Data used for Analytics Now / Other Data Available
Customer Data: Demographics, Usage summary, Product Usage
Traditional customer feedback: Surveys, Customer complaints, Inbound emails
Transactional data: Usage records, Sales receipts
Outputs from sensors
Service assurance
Social media data: Facebook discussions, Twitter feeds, Blogs, Youtube videos
Product Data: Mix & usage
Access Device Data: GPS & locale data
……
Linear growth / Exponential growth



Slide 9: New Opportunities with Text Analytics
Mine freely available social media data for:
–Understanding customer sentiment
–Identifying major customer concerns
–Tracking sentiment/issues over time
Business implications:
–Ability to act on negative sentiments quickly
–Respond to customer concerns in a timely manner
–Target initiatives appropriately by continuous tracking
–Superior market research & focus group outcomes

Slide 10: Sentiment Analysis (New Opportunities)
Methodology:
–Score based on positive & negative sentiment words, OR
–Use supervised learning with labelled examples
No sarcasm detection
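The word-scoring approach on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the two word lists are hypothetical stand-ins for a real sentiment lexicon, and the score is simply the count of positive words minus the count of negative words.

```python
import re

# Hypothetical mini-lexicons for illustration only; real lists are far larger.
POSITIVE = {"good", "great", "love", "happy", "fast"}
NEGATIVE = {"bad", "slow", "hate", "poor", "broken"}

def sentiment_score(text):
    """Return (#positive words - #negative words) for one text entry."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg

print(sentiment_score("Love the new plan, great value"))  # 2
print(sentiment_score("Slow network and poor support"))   # -2
print(sentiment_score("Oh great, another broken bill"))   # 0
```

The last entry is sarcastic, yet "great" still counts as positive, which is the "no sarcasm detection" limitation noted on the slide.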

Slide 11: Topic Detection (New Opportunities)
Methodology:
1. Create term frequency matrix from text sequences
2. Use unsupervised learning to create clusters
3. Create cluster descriptions
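The three steps can be sketched end to end as follows. The slide does not name a clustering algorithm, so k-means is an assumption here, as are the sample entries, the choice of seeding centroids with the first k rows (to keep the run deterministic), and describing each cluster by its highest-weighted terms.

```python
import re
from collections import Counter

def tf_matrix(entries):
    """Step 1: one row per entry, one column per term (sorted vocabulary)."""
    counts = [Counter(re.findall(r"[a-z]+", e.lower())) for e in entries]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iterations=10):
    """Step 2: plain k-means; centroids are seeded with the first k rows."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def describe(vocab, centroid, top=2):
    """Step 3: describe a cluster by its highest-weighted terms."""
    ranked = sorted(zip(vocab, centroid), key=lambda p: (-p[1], p[0]))
    return [term for term, weight in ranked[:top] if weight > 0]

entries = [
    "bill too high this month",
    "network coverage is poor",
    "high bill again",
    "poor network coverage at home",
]
vocab, rows = tf_matrix(entries)
labels, centroids = kmeans(rows, k=2)
print(labels)                                      # [0, 1, 0, 1]
print([describe(vocab, c) for c in centroids])     # [['bill', 'high'], ['coverage', 'network']]
```

With only four short entries the clusters are obvious; on real data the sparsity issues discussed later in the deck make this step much harder.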


Slide 13: Challenges in Text Analytics
1. Creating term frequency matrix for machine learning
–One row for each entry
–One column for each term/feature describing the entries
Treat non-alpha as white space
Case-insensitive
Term = word
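A minimal sketch of the construction rules on this slide: every non-alphabetic character becomes white space, case is ignored, and each distinct word becomes a column. The two sample entries are made up for illustration.

```python
import re

def tokenize(entry):
    # Treat every non-alphabetic character as white space and ignore case.
    return re.sub(r"[^A-Za-z]", " ", entry).lower().split()

def term_frequency_matrix(entries):
    tokens = [tokenize(e) for e in entries]
    terms = sorted({t for row in tokens for t in row})        # one column per term
    matrix = [[row.count(t) for t in terms] for row in tokens]  # one row per entry
    return terms, matrix

terms, matrix = term_frequency_matrix(
    ["Bill error: bill sent twice!", "No error on my 2nd bill"]
)
print(terms)   # ['bill', 'error', 'my', 'nd', 'no', 'on', 'sent', 'twice']
print(matrix)  # [[2, 1, 0, 0, 0, 0, 1, 1], [1, 1, 1, 1, 1, 1, 0, 0]]
```

Note the side effect of the non-alpha rule: "2nd" turns into the term "nd", a small example of the noise these simple rules introduce.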

Slide 14: 1. Term Frequency Matrix (Challenges)
Presence of non-informative words
Different forms of the same words
Spelling errors & typos
Synonyms
Homonyms

Slide 15: 2. Very Large Feature Space (Challenges)
Many different terms within a single entry
–10^4 features with just 50 to 100 entries
–Sparse entries: many zeros in the matrix
Unsupervised learning
–Hard to form cohesive clusters with sparse entries
Supervised learning
–Traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature
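To make the scale of the problem concrete, the arithmetic behind these figures can be worked through as below. The figure of 20 distinct terms per entry is an assumption chosen for illustration; the other numbers come from the slide.

```python
def labelled_examples_needed(num_features, per_feature=10):
    # Rule of thumb from the slide: at least 10 labelled examples per feature.
    return num_features * per_feature

def density(nonzero_terms_per_entry, num_features):
    # Fraction of a row's cells that are non-zero.
    return nonzero_terms_per_entry / num_features

needed = labelled_examples_needed(10**4)
print(needed)               # 100000 labelled examples
print(needed // 100)        # 1000 times more than the 100 entries available
print(density(20, 10**4))   # 0.002, i.e. 99.8% of each row is zero (assumed 20 terms per entry)
```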


Slide 17: 1. Term Frequency Matrix (Solutions)
Presence of non-informative words
–Create a list of stopwords
–Remove them from consideration
Different forms of the same words
–Use rule-based stemming to remove suffixes
Spelling errors & typos
–Use a spell-checker, OR
–Use n-grams (character sequences) as features
 5-grams for 'single bill': 'singl', 'ingle', 'ngle ', 'gle b', 'le bi', 'e bil', ' bill'
Synonyms
–Use a thesaurus (manual or statistical)
Homonyms
–Provide context by using word pairs or triplets as features
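Several of these fixes are simple enough to sketch directly. The stopword list and the suffix rules below are hypothetical and deliberately crude (a production system would more likely use an established stemmer); the character n-gram function reproduces the 'single bill' example from the slide, and word pairs show how context can be attached to an ambiguous word.

```python
import re

STOPWORDS = {"the", "a", "is", "on", "my", "and", "to", "of"}  # hypothetical short list
SUFFIXES = ("ing", "ed", "es", "s")  # crude illustrative rules, not a full stemmer

def stem(word):
    # Strip the first matching suffix, keeping at least 3 characters of stem.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def words(text):
    # Lower-case, drop stopwords, then stem what remains.
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def char_ngrams(text, n=5):
    # Overlapping character sequences; a typo only corrupts a few of them.
    return [text[i : i + n] for i in range(len(text) - n + 1)]

def word_pairs(tokens):
    # Adjacent word pairs give context to ambiguous words.
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

print(words("The bills and billing errors"))  # ['bill', 'bill', 'error']
print(char_ngrams("single bill"))
# ['singl', 'ingle', 'ngle ', 'gle b', 'le bi', 'e bil', ' bill']
print(word_pairs(words("pay my bill")))       # ['pay bill']
```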

Slide 18: 2. Very Large Feature Space (Solutions)
Use feature selection to identify significant features
Features are of 3 types:
–Very frequent, low information content (e.g., stopwords)
–Infrequent, low information content (occur once or twice in the set)
–Significant middle-frequency features
Many statistical techniques
–Inverse document frequency weight
–Signal-noise ratio
–Average discrimination value
–…
Unsupervised learning: hard to form cohesive clusters with sparse entries
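As one illustration of the first technique listed, the sketch below computes an inverse document frequency weight, log(N / df), and keeps only middle-frequency terms. The cut-offs (`min_df=2`, `max_df_ratio=0.5`) and the tokenised sample entries are assumptions for illustration; the other techniques on the slide are not covered.

```python
import math
from collections import Counter

def document_frequency(docs):
    """Number of entries each term appears in."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df

def idf_weights(docs):
    """Inverse document frequency: log(N / df). Terms in every entry get 0."""
    n = len(docs)
    return {term: math.log(n / d) for term, d in document_frequency(docs).items()}

def select_features(docs, min_df=2, max_df_ratio=0.5):
    """Keep middle-frequency terms: drop rare terms and near-universal ones."""
    n = len(docs)
    df = document_frequency(docs)
    return sorted(t for t, d in df.items() if d >= min_df and d / n <= max_df_ratio)

docs = [
    ["the", "bill", "is", "wrong"],
    ["the", "network", "is", "slow"],
    ["the", "bill", "arrived", "late"],
    ["the", "network", "dropped"],
    ["the", "refund", "is", "pending"],
]
print(idf_weights(docs)["the"])  # 0.0
print(select_features(docs))     # ['bill', 'network']
```

Here "the" and "is" fall out as too frequent, the one-off words fall out as too rare, and the two topical terms survive.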

Slide 19: 2. Very Large Feature Space (Cont’d) (Solutions)
Use new techniques based on maximal margin separators that can handle large feature space
–Support Vector Machines
Supervised learning: traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature

Slide 20: Support Vector Machines (Solutions)
Customers who churned to other providers
Customers who are loyal
Objective: To learn a separator to identify people likely to churn before they do

Slide 21: Support Vector Machines (Solutions)
What is a good separator?
–Maximises margin between two parallel supporting hyperplanes
–Separator depends on support vectors

Slide 22: Support Vector Machines (Solutions)
Why does maximising margins work?
–Small margin means more choice & overfits data
–Large margin means less choice & no overfitting

Slide 23: 2. Very Large Feature Space (Cont’d) (Solutions)
Use new techniques based on maximal margin separators that can handle large feature space
Support Vector Machines
–Maximises margin between two classes
–Separator depends only on support vectors
–Separator obtained using quadratic programming
Available in some statistical packages
Supervised learning: traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature
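A rough sketch of a linear maximal-margin separator for the churn example follows. Note the substitution: the slide says the separator is obtained with quadratic programming, whereas this sketch uses plain sub-gradient descent on the regularised hinge loss, which approximates the same objective without a QP solver. The two features (say, counts of two hypothetical terms in a customer's messages), the data points, and the hyperparameters are all made up for illustration.

```python
def decision(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=1000):
    """Sub-gradient descent on lam/2 * ||w||^2 + hinge loss (not a QP solver)."""
    w, b = [0.0] * len(points[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            violated = y * decision(w, b, x) < 1      # inside the margin or misclassified
            w = [wi * (1 - lr * lam) for wi in w]     # shrink w: favours a wide margin
            if violated:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1

# Hypothetical data: +1 = churned, -1 = loyal.
points = [[2, 0], [3, 1], [2, 1], [0, 2], [1, 3], [0, 3]]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print([predict(w, b, x) for x in points])
print(predict(w, b, [3, 0]), predict(w, b, [0, 4]))
```

Only the points that end up on or inside the margin trigger weight updates once training settles, which mirrors the slide's remark that the separator depends only on the support vectors.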

Slide 24: Wrap-up
Text analytics creates new opportunities for businesses to understand their customers
–Understanding customer sentiment
–Identifying major customer concerns
–Tracking sentiment/issues over time
A few challenges in implementing text analytics
–Creating term frequency matrix from text sequence
–Large number of features in matrix
Many techniques to overcome these challenges
Now is the time to use text analytics to unlock the potential of big data in your business!!

