AnHai Doan, University of Wisconsin: Social Media, Data Integration, and Human Computation


Background
Professor at University of Wisconsin-Madison
In 2010 took unpaid leave and joined Kosmix
  – Bay-area startup, did semantic analysis of social media
Acquired by Walmart in 2011, became WalmartLabs
  – Based in San Bruno, local office in India, hundreds of people
Why did Walmart buy a social-media startup?
  – Wanted to catch up with Amazon ( 35B of Amazon)
  – Major problems if don’t get close in 10 years (see Borders)
  – Kosmix/WalmartLabs helps in many ways
  – Provides a core of technical people, help attract more
  – Improves traditional e-commerce
  – Builds the e-commerce of the future: Social + Local + Mobile

Major R&D Groups at WalmartLabs
Search and Products
  – Polaris
  – Giant product catalog
  – Product intelligence
Demand Generation
  – SEO, SEM
  – Customer targeting and personalization
Social, Mobile, and Local E-Commerce
  – Mining social data
  – Stores + Mobile
  – Build social/mobile apps (Get on the Shelf, gift recommendation, etc.)
Special Initiatives
  – Big Fast Data
  – Large-scale Machine Learning
  – Data Extraction & Integration
  – Crowdsourcing
  – Social Genome

Mine everything we can out of social data
  – From tweets, FB feeds, Foursquare, blogs, etc.
  – Mine users, organizations, products, sentiments, events, etc.
Connect them to those in the traditional Web world
Put them into a giant knowledge base
  – Big, evolves rapidly over time
  – Call this “social genome”
Use social genome to power multiple e-commerce applications
  – Search
  – Product intelligence
  – Gift recommendation
  – Personalized “Groupon”
  – Etc.

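Social Genome
[Figure: fragment of the social genome knowledge base. Node labels: all; people; actors (Angelina Jolie, Mel Gibson, …); FB users (mel-gibson, davesmith, …); events; celebrities, sports, politics, …; Gibson car crash; Egyptian uprising; Tahrir; Cairo; Egypt. Edge labels: the-same-as, related-to, located-in, capital-of. Sample tweets: “Mel crashed. Maserati is gone.” and “Tahrir is packed!”]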
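A knowledge base of this shape can be pictured as a set of labeled edges between nodes. The sketch below is only an illustration of that idea, not the WalmartLabs implementation: the TripleStore class, the "is-a" relation name, and the particular edges loaded into it are assumptions made for the example (the slide shows only the labels the-same-as, related-to, located-in, and capital-of).

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory store of (subject, relation, object) facts."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def add(self, subject, relation, obj):
        self._by_subject[subject].add((relation, obj))

    def objects(self, subject, relation):
        """Return all objects linked from `subject` by `relation`, sorted."""
        return sorted(o for r, o in self._by_subject[subject] if r == relation)

    def ancestors(self, node):
        """Follow "is-a" edges upward and return the chain of categories."""
        chain = []
        current = node
        while True:
            parents = self.objects(current, "is-a")
            # Stop at the root, or if an edge would loop back into the chain.
            if not parents or parents[0] in chain:
                return chain
            current = parents[0]
            chain.append(current)


# Hypothetical edges, chosen to mirror the labels on the slide.
kb = TripleStore()
kb.add("people", "is-a", "all")
kb.add("actors", "is-a", "people")
kb.add("Mel Gibson", "is-a", "actors")
kb.add("mel-gibson", "the-same-as", "Mel Gibson")
kb.add("Tahrir", "located-in", "Cairo")
kb.add("Cairo", "capital-of", "Egypt")

print(kb.ancestors("Mel Gibson"))
print(kb.objects("Tahrir", "located-in"))
```

Keeping the taxonomy edges and the other relations in one structure is what lets an application walk from a social handle to the traditional entity and then up to its categories.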

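Building Social Genome: Three Sample Challenges
[Figure: the social genome fragment with markers 1, 2, and 3 placed on it, one per challenge. Node labels: all; people; actors (Angelina Jolie, Mel Gibson, …); FB users (mel-gibson, davesmith, …); events; celebrities, sports, politics, …; Gibson car crash; Egyptian uprising; Tahrir; Cairo; Egypt. Edge labels: the-same-as, related-to, located-in, capital-of. Sample tweets: “Mel crashed. Maserati is gone.” and “Tahrir is packed!”]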

Extraction and Disambiguation: Traditional Methods Ill Suited for Social Media
[Figure: taxonomy fragment. Node labels: all; people; actors; professors; Angelina Jolie; Mel Gibson; Mel Brocks; places; events; celebrities, sports, politics, …; Gibson car crash; Egyptian uprising.]
Context annotations:
  – Long-term, Web context: actor, movie, Oscar, Hollywood
  – Short-term, social context: crash, car, Maserati
Sample messages:
  – “mel crashed. maserati is gone.”
  – “Mel was arrested again. What a dramatic fall since his Oscar-winning day.”
Pipeline labels:
  – Extraction: use rule-based / NLP / machine learning techniques
  – Extraction: use dictionaries
  – Disambiguation
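The "use dictionaries" label can be illustrated with a longest-match lookup of surface forms against a dictionary derived from the taxonomy. This is a minimal sketch under stated assumptions, not the method used in the talk: the SURFACE_FORMS table, the MAX_NGRAM limit, and the function name are all hypothetical.

```python
import re

# Hypothetical dictionary: lowercase surface form -> candidate entities.
SURFACE_FORMS = {
    "mel": ["Mel Gibson", "Mel Brocks"],
    "mel gibson": ["Mel Gibson"],
    "maserati": ["Maserati"],
    "tahrir": ["Tahrir"],
}
MAX_NGRAM = 2  # longest phrase, in tokens, that the dictionary contains


def extract_mentions(text):
    """Return (surface form, candidates) pairs using longest-match lookup."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    mentions = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first.
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in SURFACE_FORMS:
                mentions.append((phrase, SURFACE_FORMS[phrase]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts here
    return mentions


print(extract_mentions("mel crashed. maserati is gone."))
```

The lookup needs no grammatical sentence, which is why it copes with lowercase, fragmentary messages. It leaves "mel" with two candidates, so a separate disambiguation step is still required.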

Must Maintain a Highly Dynamic Social Genome
[Figure: taxonomy fragment. Node labels: all; people; actors; professors; Angelina Jolie; Mel Gibson; Mel Brocks; places; events; celebrities, sports, politics, …; Gibson car crash; Egyptian uprising.]
Context annotations:
  – Long-term, Web context: actor, movie, Oscar, Hollywood
  – Short-term, social context: crash, car, Maserati
Latency less than 2 seconds, maintained using a fast-data processing system

The Giant Traditional Taxonomy is the Secret Weapon
Without it, dictionary-based extraction is not possible
Provides a framework to
  – “understand” social media, find related concepts, “hang” social contexts
Very hard to develop, takes years
  – Integrate data from multiple sources, like learning a foreign language
Partly explains why it was hard for others to catch up
→ To integrate social media, must integrate traditional data well, then bootstrap
[Figure: taxonomy fragment. Node labels: all; people; actors; Angelina Jolie; Mel Gibson; places; Tahrir; Cairo; Egypt. Edge labels: located-in, capital-of.]

Context is also Absolutely Critical
[Figure: three tweets pass through entity extraction, then context-based disambiguation between SF Giants and NY Giants.]
  – Alice tweets “Go Giants!” Context: Alice lives in NYC. Result: NY Giants
  – Bob tweets “Go Giants!” Context: Bob likes Buster Posey (SF Giants player). Result: SF Giants
  – Charlie tweets “Go Giants!” Context: Charlie tweeted on Feb 4th, the day before the Super Bowl (event); the Web is talking about the NY Giants. Result: NY Giants
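The three cases above share one pattern: the same mention resolves differently depending on what is known about the author or the moment. One simple way to sketch this is to score each candidate by how many of its context terms overlap with the observed context. The term sets and the overlap-count scoring below are assumptions for illustration; the slides do not say how the actual system scores candidates.

```python
# Hypothetical context terms per candidate entity.
CANDIDATE_CONTEXT = {
    "SF Giants": {"san francisco", "buster posey", "baseball"},
    "NY Giants": {"nyc", "super bowl", "football"},
}


def disambiguate(candidates, context_terms):
    """Pick the candidate whose context overlaps most with the observed terms.

    Returns None when no candidate shares any term or the top score is tied.
    """
    observed = {term.lower() for term in context_terms}
    scores = {
        c: len(CANDIDATE_CONTEXT.get(c, set()) & observed) for c in candidates
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


giants = ["SF Giants", "NY Giants"]
print(disambiguate(giants, ["NYC"]))           # Alice: user profile context
print(disambiguate(giants, ["Buster Posey"]))  # Bob: user interest context
print(disambiguate(giants, ["Super Bowl"]))    # Charlie: current-event context
```

Returning None on a tie or on zero overlap keeps the sketch from guessing when the context says nothing useful.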

Building Social Genome: Three Sample Challenges
[Figure repeated: the social genome fragment with challenge markers 1, 2, and 3.]

Event Detection: Current Solutions
Lot of current work in academia / industry
Limitations of most of the current solutions
  – exploit just one kind of heuristics, e.g., find hot, trending, popular words (Egypt, revolt)
  – do not exploit crowdsourcing
  – do not scale
[Figure: streams from Twitter, 4square, Facebook, Myspace, Flickr, … feed event detection, which outputs events (celebrities, sports, politics, …) such as Gibson car crash and Egyptian uprising.]

Event Detection: Our Solution
[Figure: components shown are the sources Twitter, Foursquare, …; Detector 1, Detector 2, …, Detector n, each producing candidate events; an event evaluator and ranker producing ranked events; and Crowdsourcing Populations 1, 2, 3, ...]
Muppet, a platform to process fast data over multiple machines
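The architecture on this slide, several independent detectors feeding one evaluator and ranker with people in the loop, can be sketched roughly as follows. Everything here is a hypothetical stand-in: the two toy detectors, the additive score combination, and the crowd_vote callback are assumptions, and the slides do not describe the real detectors or how crowd input is used.

```python
from collections import Counter, defaultdict


def trending_word_detector(messages, min_count=2):
    """Toy detector: words that appear in at least `min_count` messages."""
    counts = Counter(w for m in messages for w in set(m.lower().split()))
    return {w: float(c) for w, c in counts.items() if c >= min_count}


def hashtag_detector(messages, min_count=1):
    """Toy detector: hashtags, reported without the leading '#'."""
    counts = Counter(
        w for m in messages for w in set(m.lower().split()) if w.startswith("#")
    )
    return {w.lstrip("#"): float(c) for w, c in counts.items() if c >= min_count}


def rank_events(messages, detectors, crowd_vote=None):
    """Sum detector scores per candidate, filter by crowd vote, then rank."""
    combined = defaultdict(float)
    for detector in detectors:
        for event, score in detector(messages).items():
            combined[event] += score
    if crowd_vote is not None:
        # crowd_vote stands in for a human judgment: True keeps the candidate.
        combined = {e: s for e, s in combined.items() if crowd_vote(e)}
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


messages = ["tahrir is packed", "protest at tahrir", "#tahrir now", "nice coffee"]
detectors = [trending_word_detector, hashtag_detector]
print(rank_events(messages, detectors))
```

A candidate supported by more than one detector ends up with a higher combined score, which is the point of not relying on a single heuristic.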

Processing Fast Data
Big data management is well known by now
  – use MapReduce implementations
  – simple programming model, widespread adoption
But a lot of fast data is also emerging
  – 150 M tweets / day, 1 billion FB shares / day, 3 M Foursquare checkins / day
  – come into the system as very fast streams
Numerous applications over these streams
Need to process in real time
  – to answer “what is happening now?”
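To get a feel for what these daily volumes mean for a streaming system, they can be converted to average per-second rates. The short calculation below assumes arrivals spread evenly over the day, which understates peak load.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes as quoted on the slide.
daily_volumes = {
    "tweets": 150_000_000,
    "FB shares": 1_000_000_000,
    "Foursquare checkins": 3_000_000,
}

# Average events per second, assuming a uniform arrival rate.
per_second = {name: count / SECONDS_PER_DAY for name, count in daily_volumes.items()}

for name, rate in per_second.items():
    print(f"{name}: about {rate:,.0f} per second")
```

That works out to roughly 1,736 tweets, 11,574 FB shares, and 35 Foursquare checkins per second on average.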

Processing Fast Data
What we want: a platform that
  – delivers real-time processing (over multiple machines)
  – is highly scalable (as the data gets faster and faster)
  – has simple programming model
      – so developers can quickly write hundreds of apps
      – ideally like map-reduce, which developers already know
  – has real-time query and storage capability
      – apps can query content in real-time
      – distributed across multiple machines
Answer: Muppet, like Map-Reduce, but for fast data
  – see “MapReduce-Style Processing of Fast Data”, VLDB-12
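The requirements above (a map-reduce-like model, but over a stream, with state that can be queried at any time) can be illustrated with a toy single-process sketch. It is not Muppet's API: the StreamJob class, its method names, and the map and update function signatures are assumptions, and the real system distributes this work across machines.

```python
class StreamJob:
    """Toy stream processor: a map step plus a per-key update step."""

    def __init__(self, mapper, updater, initial=0):
        self._mapper = mapper
        self._updater = updater
        self._initial = initial
        self._state = {}  # per-key state, readable while the stream runs

    def process(self, event):
        """Handle one event as it arrives, updating state immediately."""
        for key, value in self._mapper(event):
            current = self._state.get(key, self._initial)
            self._state[key] = self._updater(key, value, current)

    def query(self, key):
        """Answer "what is happening now?" for one key."""
        return self._state.get(key, self._initial)


def map_checkin(event):
    """Map step: turn one raw event into zero or more (key, value) pairs."""
    venue = event.get("venue")
    if venue:
        yield venue, 1


def update_count(key, value, state):
    """Update step: fold one value into the running state for this key."""
    return state + value


job = StreamJob(map_checkin, update_count)
for event in [{"venue": "Cafe"}, {"venue": "Store 12"}, {"venue": "Cafe"}, {}]:
    job.process(event)

print(job.query("Cafe"))
```

Unlike a batch reduce, the update step never sees all values for a key at once. It folds each value into stored state as it arrives, so a query mid-stream already reflects everything processed so far.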

Using the Social Genome
Gift recommendation:
  – “I love salt!”
  – “Your friend has just tweeted about the movie SALT. Would you like to buy something related for her birthday?”

Using the Social Genome
Search query expansion
  – “Advil” → “advil headache cramp”
Personalized “Groupon” with vendors
  – “You seem to be interested in gourmet coffee. If 50 persons sign up to buy the new DeLonghi coffee maker, you can get that for a 50% discount.”
Stocking a local store
  – Lot of people in Mountain View are interested in outdoor sport
  – Stock up local Walmart store with related products
A Siri-like shopping assistant
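The query expansion example can be sketched as a lookup of related concepts for each query term. The RELATED table below is a hypothetical stand-in for links that, in the talk's framing, would come from the knowledge base; the function name and the max_extra cap are also assumptions.

```python
# Hypothetical related-concept table standing in for knowledge-base links.
RELATED = {"advil": ["headache", "cramp"]}


def expand_query(query, related=RELATED, max_extra=3):
    """Append up to `max_extra` related terms to a lowercased query."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for extra in related.get(term, []):
            if extra not in expanded and len(expanded) - len(terms) < max_extra:
                expanded.append(extra)
    return " ".join(expanded)


print(expand_query("Advil"))
```

Queries with no known related concepts pass through unchanged apart from lowercasing.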

Wrapping Up
The future of e-commerce: social, mobile, and local
Retailers must increasingly be data / Web players
Social media is important for e-commerce
Integrating social data is fundamentally much harder than integrating “traditional” data
  – lack of context
  – dynamic environment, new concepts appear quickly
  – quality issues, lots of spam
  – fast data
Must integrate “traditional” data well, then bootstrap
  – giant taxonomy critical
Crowdsourcing becomes indispensable
  – but raises interesting challenges