Published by Fredrick Dorrell
Modified over 2 years ago
Inferring from the Crowd
L Venkata Subramaniam
Aug 30, 2012, IIIT Delhi
© 2012 IBM Corporation

© 2003 IBM Corporation
Event Detection and 360-degree Profile Creation
Public Safety Event Detection
– Intent: "I am going to the rally tomorrow at 10 am @Jantar Mantar"; "There is a large fire at Mantralaya"
– Sentiment: "Corruption is a major problem and it sucks that the govt isint doing much about it"
Social Media based 360-degree Consumer Profiles
– Personal Attributes: Identifiers (name, address, age, gender, occupation…); Interests (sports, pets, cuisine…); Life Cycle Status (marital, parental)
– Relationships: Personal relationships (family, friends and roommates…); Business relationships (co-workers and work/interest network…)
– Intent: Sentiment on products, services, campaigns; Personal preferences of products; Product purchase history; Suggestions on products & services
– Public Safety Events: Life-changing events (relocation, having a baby, getting married, getting divorced, buying a house…)
Public Safety Event Alerting, Mitigation and Management Intelligence
– Social Media Data Intelligence Management
– Next Best Action
– Citizen Intelligence
– Investigative Management
– Citizen Services
Integrate social media people profiles with Govt and Security Databases
– Immigration Data, Police Records, Passport Data, Mobile Records
– Master Data on Troublemakers & Ringleaders (Internal + External)
– Entity Identification
360-degree Social Media Event and People Profiles
– Personal Attributes: "I am a engineer, mom, and wife"
– Personal Events: "Looks like we'll be moving to New Orleans sooner than I thought."
– Relationships: "Ritwik and I are both part of the anti makerite movement"

Social Media based Micro-segmentation and Real-time Correlation
Value Proposition
– Construct a comprehensive view of entities of interest (e.g., people, companies, products, events)
– Identify actionable insights in real time
From
– Tens to hundreds of TBs of social media data from sources such as Twitter, blogs, and forums
Using
– Unstructured data analytics, real-time analytics, and predictive analytics
Continuously analyze social media data from a wide range of sources to construct 360-degree profiles of entities and leverage them in timely decision-making

Entity & Relationship Analytics
Diagram labels: Unstructured data sources; Crawl; Extract / Text Analytics; Entity Resolution; Map/Fuse/Aggregate; Entity Integration; Entity Views; Entities & Relationships: Object-centric view; BigInsights / BigData Platform; AQL; HIL
Challenge: Construct and maintain comprehensive profiles of entities and relationships from unstructured data sources
Main Problem: Assemble an entity view of the domain, where each entity aggregates data from thousands of different documents
Multiple stages of complex processing:
– Information extraction: from each unstructured document, extract relevant structured records
– Entity resolution: link records (possibly across documents) that are about the same real-world “entity”
– Entity population (mapping / fusion / aggregation): collect all the facts about the same entity into one rich object with clean values and relationships to other entities
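The three stages listed on this slide (information extraction, entity resolution, entity population) can be illustrated with a small Python sketch. It is not AQL or HIL and does not reflect how BigInsights implements these stages. The regular expression, the `resolve_key` rule (equal case-folded names are treated as one entity), the toy documents, and all function names are assumptions made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical pattern for sentences like "<First Last> is <role> of <company>."
PATTERN = re.compile(
    r"(?P<name>[A-Za-z]+ [A-Za-z]+) is (?P<role>\w+) of (?P<company>[\w ]+)\."
)


def extract_records(doc_id, text):
    """Information extraction: one structured record per pattern match."""
    return [dict(m.groupdict(), source=doc_id) for m in PATTERN.finditer(text)]


def resolve_key(record):
    """Entity resolution (assumed rule): equal case-folded names are one entity."""
    return record["name"].casefold()


def build_entities(docs):
    """Entity population: fuse all records with the same key into one object."""
    entities = defaultdict(
        lambda: {"roles": set(), "companies": set(), "sources": set()}
    )
    for doc_id, text in docs.items():
        for record in extract_records(doc_id, text):
            entity = entities[resolve_key(record)]
            entity["roles"].add(record["role"])
            entity["companies"].add(record["company"])
            entity["sources"].add(record["source"])
    return dict(entities)


# Toy documents invented for this example.
DOCS = {
    "doc1": "James Dimon is Chairman of JPMorgan Chase.",
    "doc2": "JAMES DIMON is Director of JPMorgan Chase.",
    "doc3": "Mary Smith is Director of Citigroup.",
}

ENTITIES = build_entities(DOCS)
for key, entity in sorted(ENTITIES.items()):
    print(key, sorted(entity["roles"]), sorted(entity["sources"]))
```

Here the two differently cased mentions of the same name end up in one entity that aggregates facts from both documents, which is the "entity view" the slide describes. A real resolver would need far more than case folding.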

360-degree Consumer Profiles from Social Media
Social Media based 360-degree Consumer Profiles
– Personal Attributes: Identifiers (name, address, age, gender, occupation…); Interests (sports, pets, cuisine…); Life Cycle Status (marital, parental)
– Products Interests: Personal preferences of products; Product purchase history; Suggestions on products & services
– Life Events: Life-changing events (relocation, having a baby, getting married, getting divorced, buying a house…)
– Timely Insights: Intent to buy various products; Current location; Sentiment on products, services, campaigns; Incidents damaging reputation; Customer satisfaction/attrition
– Relationships: Personal relationships (family, friends and roommates…); Business relationships (co-workers and work/interest network…)
Example posts
– Life events, intent to buy a house: "I'm thinking about buying a home in Buckingham Estates per a recommendation. Anyone have advice on that area? #atx #austinrealestate #austin"
– Life events: "Looks like we'll be moving to New Orleans sooner than I thought."; "College: Off to Stanford for my MBA! Bbye chicago!"
– Location announcements: "I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj"
– Monetizable intent to buy products: "I need a new digital camera for my food pictures, any recommendations around 300?"; "What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!"

Extraction: Loan Records from SEC Documents
Loan document filed by Charles Schwab Corporation on Aug 6, 2009
Extract and cleanse information from headers, tables, main content, and signatures
Loan Information
Id | Agreement Name | Date | Total Amount
1 | Credit Agreement | June 12, 2009 | $800,000,000
…
Loan Company Information
Id | Company | Role | Commitment
1 | Charles Schwab Corporation | Borrower |
1 | Citibank, N.A. | Administrative Agent |
1 | Citibank, N.A. | Lender | $90,000,000
1 | JPMorgan Chase Bank, N.A. | Lender | $90,000,000
1 | Bank of America, N.A. | Lender | $80,000,000
…

Who Is James Dimon? Person Information across Documents
Diagram labels: Signatures; Biographies; Committee memberships; Insider Transactions
Do these filings refer to the same person?
– Variability in the person’s name, lack of a key identifier
– Supporting attributes vary depending on the context (form type)
All these facts need to be linked and integrated

Entity Integration: Master entities
Diagram labels: External public data sources (e.g., SEC/FDIC, Twitter, Blogs, Facebook); External data subscriptions (e.g., Acxiom); Extract; Entity Resolution; Map; Fuse; Temporal; Analyze; Master entities
Entity Integration: high-level rule language to specify entity integration
– SQL-like statements to populate, aggregate and relate entities
– Combines multiple stages of entity analytics into one framework
– HIL compiles into Jaql and Hadoop
Entity Population Rules
– Mapping and transformation, aggregation
– Cleansing, conflict resolution
– Entities can be indexed by multiple “dimensions”
– Facilitate reuse and hierarchical construction of the master data
Entity Resolution Rules
– Create links between entities
– Rules can incorporate: similarity functions with thresholds, scoring, and blocking for efficient execution
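To make the entity resolution ingredients named on this slide concrete (a similarity function with a threshold, scoring, and blocking), here is a minimal Python sketch. It is not HIL syntax. The blocking rule (compare only names that share a last token), the use of `difflib.SequenceMatcher` as the similarity function, the 0.75 threshold, and every name other than "James Dimon" are assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(name):
    """Blocking (assumed rule): only names sharing a last token are compared."""
    return name.casefold().split()[-1]


def similarity(a, b):
    """Similarity function: character-level ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def resolve(names, threshold=0.75):
    """Return (a, b, score) links for same-block pairs scoring >= threshold."""
    blocks = defaultdict(list)
    for name in names:
        blocks[blocking_key(name)].append(name)

    links = []
    for members in blocks.values():
        # Pairs are only generated inside a block, never across blocks.
        for a, b in combinations(members, 2):
            score = similarity(a, b)
            if score >= threshold:
                links.append((a, b, round(score, 2)))
    return links


# Hypothetical name mentions; only "James Dimon" appears in the slides.
NAMES = ["James Dimon", "Jamie Dimon", "James S. Dimon", "Robert Dimon", "John Doe"]

LINKS = resolve(NAMES)
for a, b, score in LINKS:
    print(f"{a} <-> {b}  score={score}")
```

Blocking keeps the number of comparisons small, because "John Doe" is never compared with any "Dimon" record. The threshold then separates close variants from names that merely share a surname.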

Example Application: Lead Generation
– Micro-segmentation of consumers by hobbies
– Micro-segmentation of product intents by occupation
– Real-time product intents enriched with consumer attributes
– Real-time tracking by micro-segmentation
– Integration across social media sites
Entries contain promotional messages, wishful thinking, questions, etc.
For many of the attributes we need to extract, cleanse, normalize and categorize

Social Networks and Communities
– A social network is a graph of individuals (nodes) tied by one or more specific types of interdependencies or interactions (edges).
– Social communities are collections of users that display a higher degree of relatedness among themselves than with the rest of the network.

Topic User Community Models (WWW 2012)
Generative Bayesian models for extracting latent communities from a social network using the link structure as well as the content exchanged between users
– Community memberships depend on the topics of interest among users and on their link relationships
– Users can belong to multiple communities
– Communities can be related to multiple topics (interests)

Plate Notation for Topic User Community Model (figure)

Topic Visualization (figure)

Visualizing Topics and Communities (figure panels)
– (i) Topic proportions for a user
– (ii) Community proportions for a user
– (iii) Distribution of topics in community 4
– (iv) Global distribution of topics within communities

Architecture for Public Master Entities
(A1) Unstructured Entity Integration
– Complex analytics to populate master data set
– Text Analytics: rule language (AQL) for extracting entities, events, and relationships from text and HTML documents
– Entity Integration: rule language (HIL) to express and customize the integration, cleansing, and aggregation of the master entities
(A2) Entity Repository (on MDM)
– BigInsights Bridge: generation of the MDM model for public master entities from the BigInsights model, and bulk-loading of master entities
– Query-based Application Development: supports the generation of custom queries for individual applications
Diagram labels: External public data sources (e.g., SEC/FDIC, Twitter, Blogs, Facebook); External data subscriptions (e.g., Acxiom); A1 Text Analytics and Entity Integration; BigInsights; Tooling based on entity model; Relational tables with public master entities; A2 Probabilistic Matching; MDM; Enterprise internal master entities; Queries; Data services; DaaS; Applications and Views
Example query: select cik, Officers, Directors from Company where name = 'Citigroup'

Matching Twitter profiles with Internal source
Twitter: 45M profiles
– Name: first, last
– Home location: city, (state), country
– Employment: company + role
Employee Directory: 460K entries
– Name: (first, middle, last, preferred)
– Work location: (city, state, zip, country)
– Job description
Diagram labels: Social media profiles (name, address, gender, age, employment, relationship, …); Employment filter; Resolution (name, work location, job description); Social media profiles of IBM employees and their network
Choice of social media profile attributes for linking is constrained by the availability of IBM BluePage attributes
– Semantic name variations: Bill Chamberlin vs. Chamberlain, William H.; C. Mohan vs. Mohan Chandrasekaran (Mohan)
– Geo proximity: Saratoga, CA vs. San Jose, CA; New Jersey vs. New York
– Job role disambiguation: “Software sales manager at IBM…” vs. “Managing SPSS Sales for Canada…”
Current scenario focused on linking social media profiles with the employee database
Similar approach to be taken for linking with customer and prospect databases
Current demo focused on name and location matching, as well as EmployeeOf information

Event Detection – using sensors, crowd sensing, social media, etc.
Event 1 – 12:10 – traffic accident
Event 2 – 14:15 – traffic jam
Event 3 – 14:25 – unidentified object found at train station
Event 4 – 14:45 – fire in commercial establishment
Event 5 – 15:05 – warning, water pipe broken
Event 6 – 15:15 – warning, excessive crowds
Event data is uncertain and progressively changing

Event Profile
December 2011: magnitude 6.5 earthquake in Mexico kills 3 people
Actual event time: Sunday, December 11, 2011 at 01:47:26 UTC
Event support: 1123 tweets
WHAT
– Methodology: most frequent keywords extracted from the tweets in the event
– #earthquake, Mexico, magnitude, USGS, #Acapulco
WHO
– Methodology: Named Entity Extractor used to extract people and organizations
– People: guerrero
WHEN
– Methodology: time and date of the first tweet in the event
– Sunday, December 11, 2011 at 2:20:00 UTC
WHERE
– Methodology: Named Entity Extractor to extract location names from the tweets; reverse geocoding of the tweets; most frequent profile locations of the users who published the tweets in the event
– tuxpan guerrero, mexico city, acapulco, iguala, sw mexico, mexico
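The WHAT and WHEN methodologies on this slide (most frequent keywords, time of the first tweet) plus the event support count are simple enough to sketch in Python. This is only an illustration under assumptions: the three tweets are invented, the tokenizer and stopword list are arbitrary choices, and the WHO and WHERE steps are left out because they need a named entity extractor and a geocoder that the slides do not specify.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed minimal stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "near", "felt", "hits", "reports"}


def tokenize(text):
    """Lowercase words and hashtags, dropping short tokens and stopwords."""
    tokens = re.findall(r"#?\w+", text.lower())
    return [t for t in tokens if len(t) > 2 and t not in STOPWORDS]


def build_event_profile(tweets, top_k=2):
    """tweets: list of (ISO timestamp, text) pairs belonging to one event."""
    counts = Counter()
    for _, text in tweets:
        counts.update(tokenize(text))
    return {
        "support": len(tweets),                             # number of tweets
        "what": [word for word, _ in counts.most_common(top_k)],
        "when": min(datetime.fromisoformat(ts) for ts, _ in tweets),
    }


# Invented tweets for illustration; they are not from the original data set.
TWEETS = [
    ("2011-12-11T02:21:30", "USGS reports #earthquake in Mexico near #Acapulco"),
    ("2011-12-11T02:20:00", "Magnitude 6.5 #earthquake hits Mexico"),
    ("2011-12-11T02:25:10", "Strong #earthquake felt in Mexico City"),
]

PROFILE = build_event_profile(TWEETS)
print(PROFILE["support"], PROFILE["what"], PROFILE["when"].isoformat())
```

Because WHEN is taken from the first tweet, it lags the actual event time, which matches the gap between 01:47:26 and 2:20:00 UTC shown on the slide.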

Event Profiles
(1) 10:10 river water surging, from accumulation of tweets (**)
(2) 11:15 fast moving water, from accumulation of mobile messages (**)
(3) 11:15 flood, major road blocked, from accumulation of mobile messages (**)
(4) 12:30 flood, from accumulation of mobile messages (**)
(5) 12:30 traffic accident, from accumulation of mobile messages (**)
(**) These are progressive events; they keep changing as more data becomes available and confidence changes
– Events are progressive: they keep updating as more crowd-source data becomes available
– Uncertainty (confidence) is built in, from the event description to the location
– Events reflect aggregated data, to prevent overloading by a large volume of crowd-source data and to reduce uncertainty by fusing multiple posts
– Inter-event distance: events are ‘close’ if they share similar semantic meaning, location, and time
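The slide does not define the inter-event distance, only that it combines semantic meaning, location, and time. One possible reading, sketched below in Python, averages three normalized components: Jaccard distance between keyword sets, great-circle distance capped at `max_km`, and time gap capped at `max_minutes`. The equal weighting, both caps, the coordinates, and the `Event` class are assumptions for illustration, not the authors' definition.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Event:
    keywords: frozenset
    lat: float
    lon: float
    time: datetime


def jaccard_distance(a, b):
    """Semantic component: 0 for identical keyword sets, 1 for disjoint ones."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (Earth radius taken as 6371 km)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def event_distance(e1, e2, max_km=10.0, max_minutes=120.0):
    """Assumed combination: unweighted mean of three components in [0, 1]."""
    semantic = jaccard_distance(e1.keywords, e2.keywords)
    spatial = min(haversine_km(e1.lat, e1.lon, e2.lat, e2.lon) / max_km, 1.0)
    minutes = abs((e1.time - e2.time).total_seconds()) / 60.0
    temporal = min(minutes / max_minutes, 1.0)
    return (semantic + spatial + temporal) / 3.0


# Events (3), (4) and (5) from the slide, with hypothetical coordinates and date.
FLOOD_ROAD = Event(frozenset({"flood", "road", "blocked"}), 28.61, 77.21, datetime(2012, 8, 30, 11, 15))
FLOOD = Event(frozenset({"flood"}), 28.62, 77.22, datetime(2012, 8, 30, 12, 30))
ACCIDENT = Event(frozenset({"traffic", "accident"}), 28.70, 77.10, datetime(2012, 8, 30, 12, 30))

print(event_distance(FLOOD_ROAD, FLOOD), event_distance(FLOOD_ROAD, ACCIDENT))
```

Under these assumptions the two flood reports come out closer to each other than either is to the traffic accident, so they would be natural candidates for fusing into one event.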

Analytics and Optimization Under Uncertainty
Observed data (sensor and crowd input) is uncertain and is not available for all points on the city network
– Data must be mathematically estimated for locations that have no observed data
– The effect of other disturbances on the main event must be modeled, such as the effect of crowd accumulation or flooding on traffic
There is uncertainty in both the observed data and the modeled data
Applications such as traffic control and evacuation planning need to perform analytics and optimization under uncertainty
– Suppose segment A depends on segments B and C, and segment B is affected. The dependency can be such that the path from C to A is also affected, even though neither C nor A is directly affected.
– Based on real-time event detection, we can compute the “cascaded impact” from the dependencies. This essentially “projects” the “reduced capacities” of the segments that are not directly affected.
– This in turn can be used for “Evacuation Plans” that adhere to several (source, destination, deadline) triples that one might want to satisfy, for example (city, airport, short deadline) and (city, suburbs, long deadline), or vice versa depending on the need.
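The “cascaded impact” idea can be sketched as propagation over a dependency graph. The slides give no formula, so the rule below is an assumption: a segment's effective capacity is its own capacity factor multiplied by the smallest effective capacity among the segments it depends on (a bottleneck model), and the dependency graph is assumed to be acyclic. The function and variable names are hypothetical.

```python
def cascaded_capacity(own_capacity, depends_on):
    """Project reduced capacities onto segments that are not directly affected.

    own_capacity: segment -> capacity factor in [0, 1] from direct observation
    depends_on:   segment -> list of segments it depends on (assumed acyclic)
    """
    effective = {}

    def visit(segment):
        # Memoized depth-first evaluation; each segment is computed once.
        if segment not in effective:
            upstream = [visit(dep) for dep in depends_on.get(segment, [])]
            # Assumed bottleneck rule: the worst dependency limits the segment.
            effective[segment] = own_capacity[segment] * min(upstream, default=1.0)
        return effective[segment]

    for segment in own_capacity:
        visit(segment)
    return effective


# Segment A depends on B and C; only B is directly affected (e.g. by a flood).
OWN = {"A": 1.0, "B": 0.4, "C": 1.0}
DEPENDS_ON = {"A": ["B", "C"]}

EFFECTIVE = cascaded_capacity(OWN, DEPENDS_ON)
print(EFFECTIVE)
```

In this toy network A inherits B's reduced capacity even though A itself is undisturbed, so a route from C through A is slowed as well. An evacuation planner could then use the effective capacities, rather than the directly observed ones, when checking whether each deadline can be met.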

Data Uncertainty at Scale
The need for managing uncertainty at scale is widespread
Axes: Data Volume, Velocity, Variety; Uncertainty (1/veracity)
– Low uncertainty: precise, authoritative, well formed
– High uncertainty: inconsistent, imprecise, uncertain, unverified, spontaneous, ambiguous, deceptive
Diagram labels: Traditional Data & Processing; Smarter Cities; Smarter Traffic; Weather Modeling; Smarter Water; Contact Centers; Homeland Security; Retail Services; Medical Transcription; Predictive Modeling of Outcomes; Disease Progression; Market Trends; Portfolio Risk; Fraud; Smart Grid Sensor Data; Text, Audio, Video; Social Network Data; Patient Records; Call Detail Records; Telco Profiles; Credit Card Transactions; Electronic Data Interchange; SWIFT; Account Management; CRM Customer Records; Market Feeds

Thank You