Aug 30, 2012, IIIT Delhi © 2012 IBM Corporation Inferring from the Crowd L Venkata Subramaniam.


1 Aug 30, 2012, IIIT Delhi © 2012 IBM Corporation Inferring from the Crowd L Venkata Subramaniam

2 Event Detection and 360-degree Profile Creation

Public Safety Event Detection (example posts)
– Intent: "I am going to the rally tomorrow at 10 Mantar"
– "There is a large fire at Mantralaya"
– Sentiment: "Corruption is a major problem and it sucks that the govt isint doing much about it"

Social Media based 360-degree Consumer Profiles
– Personal Attributes: identifiers (name, address, age, gender, occupation…); interests (sports, pets, cuisine…); life cycle status (marital, parental)
– Relationships: personal relationships (family, friends and roommates…); business relationships (co-workers and work/interest network…)
– Intent: sentiment on products, services, campaigns; personal preferences of products; product purchase history; suggestions on products & services
– Public Safety Events: life-changing events (relocation, having a baby, getting married, getting divorced, buying a house…)

360-degree Social Media Event and People Profiles (example posts)
– Personal Attributes: "I am a engineer, mom, and wife"
– Personal Events: "Looks like we'll be moving to New Orleans sooner than I thought."
– Relationships: "Ritwik and I are both part of the anti makerite movement"

Public Safety Event Alerting, Mitigation and Management
– Integrate social media people profiles with Govt and Security Databases: Immigration Data, Police Records, Passport Data, Mobile Records
– Master Data on Troublemakers & Ringleaders (Internal + External)
– Other diagram labels: Intelligence Social Media Data Intelligence Management; Next Best Action; Citizen Intelligence; Investigative Management; Entity Identification; Citizen Services

3 Social Media based Micro-segmentation and Real-time Correlation

Value Proposition
– Construct a comprehensive view of entities of interest (e.g., people, companies, products, events)
– Identify actionable insights in real time
From
– Tens to hundreds of TBs of social media data from sources such as Twitter, blogs, and forums
Using
– Unstructured data analytics, real-time analytics, and predictive analytics

Continuously analyze social media data from a wide range of sources to construct 360-degree profiles of entities and leverage them in timely decision-making

4 Entity & Relationship Analytics

Diagram labels: Unstructured data sources; Crawl; Extract / Text Analytics; Entity Resolution; Map/Fuse/Aggregate; Entity Views; Entities & Relationships: Object-centric view; BigInsights / BigData Platform; HIL; AQL; Entity Integration

Challenge
• Construct and maintain comprehensive profiles of entities and relationships from unstructured data sources
• Main Problem: Assemble an entity view of the domain, where each entity aggregates data from thousands of different documents
• Multiple stages of complex processing:
 – Information extraction: from each unstructured document, extract relevant structured records
 – Entity resolution: link records (possibly across documents) that are about the same real-world "entity"
 – Entity population (mapping / fusion / aggregation): collect all the facts about the same entity into one rich object with clean values and relationships to other entities
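The three stages named on this slide (information extraction, entity resolution, entity population) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy documents, the `Field: value` layout, and the function names are hypothetical, and the talk's actual tooling is AQL and HIL on BigInsights, not this code.

```python
import re
from collections import defaultdict

# Hypothetical toy documents; real input would be crawled, unstructured text.
DOCS = {
    "doc1": "Name: James Dimon; Role: Chief Executive Officer",
    "doc2": "Name: JAMES DIMON; Committee: Executive Committee",
    "doc3": "Name: Mary Smith; Role: Director",
}

def extract(doc_id, text):
    """Information extraction: turn one document into one structured record."""
    record = dict(re.findall(r"(\w+): ([^;]+)", text))
    record["source"] = doc_id
    return record

def resolve(records):
    """Entity resolution: group records that share a normalized name."""
    groups = defaultdict(list)
    for record in records:
        key = " ".join(record["Name"].lower().split())
        groups[key].append(record)
    return groups

def populate(groups):
    """Entity population: fuse each group into one object holding all facts."""
    entities = {}
    for key, records in groups.items():
        entity = {"name": key.title(), "sources": [], "facts": defaultdict(set)}
        for record in records:
            entity["sources"].append(record["source"])
            for field, value in record.items():
                if field not in ("Name", "source"):
                    entity["facts"][field].add(value)
        entities[key] = entity
    return entities

records = [extract(doc_id, text) for doc_id, text in DOCS.items()]
entities = populate(resolve(records))
```

Exact-match resolution on a lowercased name is the simplest possible choice here; the later slides describe similarity functions and blocking for the realistic case.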

5 360-degree Consumer Profiles from Social Media

Social Media based 360-degree Consumer Profiles
– Personal Attributes: identifiers (name, address, age, gender, occupation…); interests (sports, pets, cuisine…); life cycle status (marital, parental)
– Products Interests: personal preferences of products; product purchase history; suggestions on products & services
– Life Events: life-changing events (relocation, having a baby, getting married, getting divorced, buying a house…)
– Timely Insights: intent to buy various products; current location; sentiment on products, services, campaigns; incidents damaging reputation; customer satisfaction/attrition
– Relationships: personal relationships (family, friends and roommates…); business relationships (co-workers and work/interest network…)

Example posts
– Intent to buy a house: "I'm thinking about buying a home in Buckingham Estates per a recommendation. Anyone have advice on that area? #atx #austinrealestate #austin"
– Life Events: "Looks like we'll be moving to New Orleans sooner than I thought."
– College: "Off to Stanford for my MBA! Bbye chicago!"
– Location announcements: "I'm at Starbucks Parque Tezontle"
– Monetizable intent to buy products: "I need a new digital camera for my food pictures, any recommendations around 300?"; "What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!"

6 Extraction: Loan Records from SEC Documents

Loan document filed by Charles Schwab Corporation on Aug 6, 2009. Extract and cleanse information from headers, tables, main content, and signatures.

Loan Information
Id  Agreement Name    Date           Total Amount
1   Credit Agreement  June 12, 2009  $800,000,000
…

Loan Company Information
Id  Company                     Role                  Commitment
1   Charles Schwab Corporation  Borrower
1   Citibank, N.A.              Administrative Agent
1   Citibank, N.A.              Lender                $90,000,000
1   JPMorgan Chase Bank, N.A.   Lender                $90,000,000
1   Bank of America, N.A.       Lender                $80,000,000
…

7 Who Is James Dimon? Person Information across Documents

Sources shown: Signatures; Biographies; Committee memberships; Insider Transactions

Do these filings refer to the same person?
• Variability in the person's name, lack of a key identifier
• Supporting attributes vary depending on the context (form type)

All these facts need to be linked and integrated

8 Entity Integration: Master entities

Diagram labels: External public data sources (e.g., SEC/FDIC, Twitter, Blogs, Facebook); External data subscriptions (e.g., Acxiom); Extract; Entity Resolution; Map; Fuse; Temporal Analyze; Master entities

Entity Integration: high-level rule language to specify entity integration
– SQL-like statements to populate, aggregate and relate entities
– Combines multiple stages of entity analytics into one framework
– HIL compiles into Jaql and Hadoop

• Entity Population Rules
 – Mapping and transformation, aggregation
 – Cleansing, conflict resolution
 – Entities can be indexed by multiple "dimensions", which facilitates reuse and hierarchical construction of the master data
• Entity Resolution Rules
 – Create links between entities
 – Rules can incorporate similarity functions with thresholds, scoring, and blocking for efficient execution
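The entity resolution rules on this slide mention similarity functions with thresholds and blocking for efficient execution. A minimal Python sketch of that idea follows. It is not HIL: the three-character blocking key, the `difflib` similarity, and the 0.8 threshold are all assumptions chosen for illustration, and the records are toy data.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Toy records; ids and name variants are illustrative only.
RECORDS = [
    {"id": 1, "name": "Citibank, N.A."},
    {"id": 2, "name": "Citibank NA"},
    {"id": 3, "name": "Charles Schwab Corporation"},
    {"id": 4, "name": "Charles Schwab Corp."},
    {"id": 5, "name": "Bank of America, N.A."},
]

def normalize(name):
    """Lowercase and drop punctuation so trivial variants compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ").strip()

def block_key(name):
    """Blocking: only records sharing this key are ever compared (assumed key)."""
    return normalize(name)[:3]

def similarity(a, b):
    """Similarity function in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def resolve(records, threshold=0.8):
    """Return id pairs whose names are similar enough within the same block."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record["name"])].append(record)
    links = []
    for members in blocks.values():
        for left, right in combinations(members, 2):
            if similarity(left["name"], right["name"]) >= threshold:
                links.append((left["id"], right["id"]))
    return links

links = resolve(RECORDS)
```

Blocking trades recall for speed: pairs that land in different blocks are never scored, so the key has to be coarse enough to keep true matches together.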

9 Example Application: Lead Generation
• Micro-segmentation of consumers by hobbies
• Micro-segmentation of product intents by occupation
• Real-time product intents enriched with consumer attributes
• Real-time tracking by micro-segmentation
• Integration across social media sites
• Entries contain promotional messages, wishful thinking, questions, etc.
• For many of the attributes we need to extract, cleanse, normalize and categorize

10 Social Networks and Communities
• A social network is a graph of individuals (nodes) tied by one or more specific types of interdependencies or interactions (edges).
• Social communities are collections of users that display a higher degree of relatedness among themselves than with the rest of the network.

11 Topic User Community Models (WWW 2012)
• Generative Bayesian models for extracting latent communities from a social network using the link structure as well as the content exchanged between users
 – Community memberships are dependent on the topics of interest among users and their link relationships
 – Users can belong to multiple communities
 – Communities can be related to multiple topics (interests)

12 Plate Notation for Topic User Community Model

13 Topic Visualization

14 Visualizing Topics and Communities: (i) Topic proportions for a user, (ii) Community proportions for a user, (iii) Distribution of topics in community 4, (iv) Global distribution of topics within communities

15 Architecture for Public Master Entities

(A1) Unstructured Entity Integration
– Complex analytics to populate master data set
– Text Analytics: rule language (AQL) for extracting entities, events, relationships from text and HTML documents
– Entity Integration: rule language (HIL) to express & customize the integration, cleansing, and aggregation of the master entities

(A2) Entity Repository (on MDM)
– BigInsights Bridge: generation of the MDM model for public master entities, from the BigInsights model; and bulk-loading of master entities
– Query-based Application Development: supports the generation of custom queries for individual applications

Diagram labels: External public data sources (e.g., SEC/FDIC, Twitter, Blogs, Facebook); External data subscriptions (e.g., Acxiom); A1 Text Analytics and Entity Integration; BigInsights; Relational tables with public master entities; Enterprise internal master entities; A2 Probabilistic Matching; MDM; Tooling based on entity model; Queries; DaaS; Data services; Applications and Views

Example query: select cik, Officers, Directors from Company where name = 'Citigroup'

16 Matching Twitter profiles with Internal source

• Twitter: 45M profiles
 – Name: first, last
 – Home location: city, (state), country
 – Employment: company + role
• Employee Directory: 460K entries
 – Name: (first, middle, last, preferred)
 – Work location: (city, state, zip, country)
 – Job description

Diagram labels: Social media profiles (name, address, gender, age, employment, relationship, …); Employment filter; Resolution (name, work location, job description); Social media profiles of IBM employees and their network

The choice of social media profile attributes for linking is constrained by the availability of IBM BluePage attributes.

• Semantic name variations: Bill Chamberlin vs. Chamberlain, William H.; C. Mohan vs. Mohan Chandrasekaran (Mohan)
• Geo proximity: Saratoga, CA vs. San Jose, CA; New Jersey vs. New York
• Job role disambiguation: "Software sales manager at IBM…" vs. "Managing SPSS Sales for Canada…"

• Current scenario focused on linking social media profiles with the employee database
• Similar approach to be taken for linking with customer and prospect databases
• Current demo focused on name and location matching, as well as EmployeeOf information
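The name-variation and geo-proximity examples on this slide can be turned into a small matching sketch. Everything beyond the example strings is an assumption: the nickname table, the approximate city coordinates, the 0.85 surname threshold, and the 50 km radius are illustrative values, not parameters from the talk.

```python
import math
from difflib import SequenceMatcher

# Hypothetical lookup tables; a real system would use much larger resources.
NICKNAMES = {"bill": "william", "bob": "robert"}
CITY_COORDS = {  # approximate (latitude, longitude) in degrees
    "saratoga, ca": (37.26, -122.02),
    "san jose, ca": (37.34, -121.89),
    "new york, ny": (40.71, -74.01),
}

def parse_name(raw):
    """Return (first, last) in lowercase for 'First Last' or 'Last, First M.'."""
    if "," in raw:
        last, rest = [part.strip() for part in raw.split(",", 1)]
        first = rest.split()[0]
    else:
        parts = raw.split()
        first, last = parts[0], parts[-1]
    first = first.lower().strip(".")
    return NICKNAMES.get(first, first), last.lower()

def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def is_match(profile, employee, name_threshold=0.85, max_km=50.0):
    """Link two records if first names agree, surnames are close, and cities are near."""
    first_p, last_p = parse_name(profile["name"])
    first_e, last_e = parse_name(employee["name"])
    if first_p != first_e:
        return False
    surname_similarity = SequenceMatcher(None, last_p, last_e).ratio()
    distance = haversine_km(CITY_COORDS[profile["city"].lower()],
                            CITY_COORDS[employee["city"].lower()])
    return surname_similarity >= name_threshold and distance <= max_km

twitter_profile = {"name": "Bill Chamberlin", "city": "Saratoga, CA"}
directory_entry = {"name": "Chamberlain, William H.", "city": "San Jose, CA"}
linked = is_match(twitter_profile, directory_entry)
```

Job role disambiguation, the third challenge on the slide, is left out of this sketch because it needs text understanding rather than string or distance comparison.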

17 Event Detection – using sensors, crowd sensing, social media, etc.
Event 1 – 12:10 – traffic accident
Event 2 – 14:15 – traffic jam
Event 3 – 14:25 – unidentified object found at train station
Event 4 – 14:45 – fire in commercial establishment
Event 5 – 15:05 – warning, water pipe broken
Event 6 – 15:15 – warning, excessive crowds
Event data is uncertain and progressively changing

18 Event Profile
• December 2011, magnitude 6.5 earthquake in Mexico kills 3 people
• Actual event time: Sunday, December 11, 2011 at 01:47:26 UTC
• Event support: 1123 tweets
• WHAT
 – Methodology: most frequent keywords extracted from the tweets in the event
 – #earthquake, Mexico, magnitude, USGS, #Acapulco
• WHO
 – Methodology: Named Entity Extractor used to extract people and organizations
 – People: guerrero
• WHEN
 – Methodology: time and date of the first tweet in the event
 – Sunday, December 11, 2011 at 2:20:00 UTC
• WHERE
 – Methodology: Named Entity Extractor to extract location names from the tweets; reverse geocoding of the tweets; most frequent profile locations of the users who published the tweets in the event
 – tuxpan guerrero, mexico city, acapulco, iguala, sw mexico, mexico
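The WHAT, WHEN and WHERE methodologies on this slide are simple aggregations, and can be sketched as follows. The three tweets, the stopword list, and the field names are made up for illustration; the WHO step is omitted because it relies on a named entity extractor that the slide does not specify.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical stopword list and tweets; not data from the talk.
STOPWORDS = {"a", "an", "the", "in", "of", "at", "is", "near", "felt"}
TWEETS = [
    {"time": datetime(2011, 12, 11, 2, 20, tzinfo=timezone.utc),
     "text": "#earthquake felt in Mexico", "profile_location": "mexico city"},
    {"time": datetime(2011, 12, 11, 2, 21, tzinfo=timezone.utc),
     "text": "USGS: magnitude 6.5 #earthquake near Acapulco Mexico",
     "profile_location": "acapulco"},
    {"time": datetime(2011, 12, 11, 2, 23, tzinfo=timezone.utc),
     "text": "Strong #earthquake in Mexico", "profile_location": "mexico city"},
]

def tokens(text):
    """Lowercase words with surrounding punctuation removed, minus stopwords."""
    words = (word.strip(".,:;!?") for word in text.lower().split())
    return [word for word in words if word and word not in STOPWORDS]

def build_profile(tweets, top_keywords=2, top_places=1):
    """Aggregate a cluster of tweets into a WHAT / WHEN / WHERE profile."""
    keyword_counts = Counter(word for tweet in tweets for word in tokens(tweet["text"]))
    place_counts = Counter(tweet["profile_location"] for tweet in tweets)
    return {
        "support": len(tweets),                                   # number of tweets
        "what": [w for w, _ in keyword_counts.most_common(top_keywords)],
        "when": min(tweet["time"] for tweet in tweets),           # first tweet
        "where": [p for p, _ in place_counts.most_common(top_places)],
    }

profile = build_profile(TWEETS)
```

As on the slide, WHEN is the time of the first tweet, so it lags the actual event time by however long the crowd takes to react.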

19 Event Profiles
(1) 10:10 river water surging, from accumulation of tweets (**)
(2) 11:15 fast moving water, from accumulation of mobile messages (**)
(3) 11:15 flood, major road blocked, from accumulation of mobile messages (**)
(4) 12:30 flood, from accumulation of mobile messages (**)
(5) 12:30 traffic accident, from accumulation of mobile messages (**)
(**) These are progressive events; they keep changing as more data becomes available and confidence changes

• Events are progressive: they keep updating as more crowd-source data becomes available
• Uncertainty (confidence) is built in, from the event description to the location
• Events reflect aggregated data, to prevent overloading by the large volume of crowd-source data and to reduce uncertainty by fusing multiple posts
• Inter-event distance: events are "close" if they share similar semantic meaning, location, and time
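The inter-event distance described on this slide combines semantic meaning, location, and time. One possible way to combine them is a weighted sum of three normalized terms, sketched below. The slide does not give a formula, so the Jaccard keyword distance, the 5 km and 60 minute caps, the equal weights, and the coordinates are all assumptions.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    keywords: frozenset   # words describing the event
    lat: float            # degrees
    lon: float            # degrees
    minute: int           # minutes since midnight

def jaccard_distance(a, b):
    """Semantic term: 0 for identical keyword sets, 1 for disjoint ones."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def km_between(e1, e2):
    """Equirectangular approximation, adequate for city-scale distances."""
    mean_lat = math.radians((e1.lat + e2.lat) / 2)
    dx = math.radians(e2.lon - e1.lon) * math.cos(mean_lat)
    dy = math.radians(e2.lat - e1.lat)
    return 6371.0 * math.hypot(dx, dy)

def event_distance(e1, e2, max_km=5.0, max_minutes=60.0, weights=(1/3, 1/3, 1/3)):
    """Weighted sum of semantic, spatial, and temporal terms, each capped at 1."""
    semantic = jaccard_distance(e1.keywords, e2.keywords)
    spatial = min(km_between(e1, e2) / max_km, 1.0)
    temporal = min(abs(e1.minute - e2.minute) / max_minutes, 1.0)
    w_sem, w_loc, w_time = weights
    return w_sem * semantic + w_loc * spatial + w_time * temporal

# Events loosely modelled on items (3), (4) and (5); coordinates are invented.
flood_a = Event(frozenset({"flood", "road", "blocked"}), 28.60, 77.20, 11 * 60 + 15)
flood_b = Event(frozenset({"flood"}), 28.61, 77.21, 12 * 60 + 30)
accident = Event(frozenset({"traffic", "accident"}), 28.70, 77.30, 12 * 60 + 30)
```

With these inputs, `event_distance(flood_a, flood_b)` is smaller than `event_distance(flood_a, accident)`, which is the behaviour needed to decide which reports should be fused into one progressive event.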

20 Analytics and Optimization Under Uncertainty
• Observed data (sensor and crowd input) is uncertain and is not available for all points on the city network
 – Data needs to be mathematically estimated for locations that do not have observed data
 – The effect of other disturbances on the main event needs to be modeled, such as the effect of crowd accumulation or flooding on traffic
• There is uncertainty in both the observed data and the modeled data
• Applications such as traffic control and evacuation planning need to do analytics and optimization under uncertainty
 – Suppose segment A depends on segments B and C, and segment B is affected. The dependency can be such that the path from C to A is also affected, even though neither C nor A is directly affected.
 – Based on real-time event detection, we can compute the "cascaded impact" from these dependencies. This essentially "projects" the "reduced capacities" onto the segments that are not directly affected.
 – The projected capacities can then be used for "Evacuation Plans" that satisfy several (source, destination, deadline) requirements, for example (city, airport, short deadline) and (city, suburbs, long deadline), or vice versa depending on the need.
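The "cascaded impact" idea on this slide, where a disruption on one segment projects reduced capacity onto segments that depend on it, can be sketched as a propagation over a dependency graph. The slide gives no propagation rule, so the fixed damping factor of 0.5 per hop and the small network below are assumptions made only to keep the example concrete.

```python
from collections import deque

# Hypothetical network: A depends on B and C, and D depends on A.
DEPENDS_ON = {"A": ["B", "C"], "D": ["A"]}

def cascaded_impact(depends_on, direct_impact, damping=0.5):
    """Project capacity loss from directly affected segments onto dependents.

    Impact values are fractions of lost capacity (0 = none, 1 = blocked).
    Each hop passes on `damping` times the upstream loss (assumed rule).
    """
    # Invert the graph: for each segment, which segments depend on it?
    dependents = {}
    for segment, upstream in depends_on.items():
        for source in upstream:
            dependents.setdefault(source, []).append(segment)

    impact = dict(direct_impact)
    queue = deque(direct_impact)
    while queue:
        segment = queue.popleft()
        for downstream in dependents.get(segment, []):
            projected = impact[segment] * damping
            # Keep the worst projected loss seen so far for each segment.
            if projected > impact.get(downstream, 0.0):
                impact[downstream] = projected
                queue.append(downstream)
    return impact

# Segment B loses 80% of its capacity; A and D are affected indirectly.
impact = cascaded_impact(DEPENDS_ON, {"B": 0.8})
```

The resulting per-segment losses could then be fed to a routing or evacuation planner as reduced capacities; that optimization step is outside the scope of this sketch.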

21 Data Uncertainty at Scale

Chart axes: Data Volume, Velocity, Variety; Uncertainty (1/veracity)
– Precise, authoritative, well formed: Traditional Data & Processing
– Inconsistent, imprecise, uncertain, unverified, spontaneous, ambiguous, deceptive

Chart labels: Smarter Cities; Smarter Traffic; Weather Modeling; Smarter Water; Contact Centers; Homeland Security; Retail Services; Medical Transcription; Predictive Modeling of Outcomes; Disease Progression; Market Trends; Portfolio Risk; Fraud; Smart Grid Sensor Data; Text, Audio, Video; Social Network Data; Patient Records; Call Detail Records; Telco Profiles; Credit Card Transactions; Electronic Data Interchange; SWIFT; Account Management; CRM; Customer Records; Market Feeds

The need for managing uncertainty at scale is widespread

22 Thank You

