
Social Media, Data Integration, and Human Computation (AnHai Doan, University of Wisconsin and @WalmartLabs)


1 Social Media, Data Integration, and Human Computation
AnHai Doan, University of Wisconsin and @WalmartLabs

2 Background
– Professor at University of Wisconsin-Madison
– In 2010, took unpaid leave and joined Kosmix
  – Bay Area startup that did semantic analysis of social media
– Kosmix was acquired by Walmart in 2011 and became WalmartLabs
  – Based in San Bruno, with a local office in India; hundreds of people
– Why did Walmart buy a social-media startup?
  – Wanted to catch up with Amazon (… 35B of Amazon)
  – Major problems if it doesn't get close within 10 years (see Borders)
  – Kosmix/WalmartLabs helps in many ways:
    – Provides a core of technical people and helps attract more
    – Improves traditional e-commerce
    – Builds the e-commerce of the future: Social + Local + Mobile

3 Major R&D Groups at WalmartLabs
– Search and Products
  – Polaris
  – Giant product catalog
  – Product intelligence
– Demand Generation
  – SEO, SEM
  – Customer targeting and personalization
– Social, Mobile, and Local E-Commerce
  – Mining social data
  – Stores + Mobile
  – Build social/mobile apps (Get on the Shelf, gift recommendation, etc.)
– Special Initiatives
  – Big Fast Data
  – Large-scale Machine Learning
  – Data Extraction & Integration
  – Crowdsourcing
  – Social Genome

4
– Mine everything we can out of social data
  – From tweets, Facebook feeds, Foursquare, blogs, etc.
  – Mine users, organizations, products, sentiments, events, etc.
– Connect them to their counterparts in the traditional Web world
– Put them into a giant knowledge base
  – Big, and evolves rapidly over time
  – We call this the "social genome"
– Use the social genome to power multiple e-commerce applications
  – Search
  – Product intelligence
  – Gift recommendation
  – Personalized "Groupon"
  – Etc.

5 Social Genome
[Figure: a taxonomy rooted at "all", with people → actors (Angelina Jolie, Mel Gibson), places (Tahrir, Cairo, Egypt), and events (celebrities, sports, politics, …; e.g., Gibson car crash, Egyptian uprising). It is linked to Twitter users (@melgibson, @dsmith, …), FB users (mel-gibson, davesmith, …), and tweets ("@dsmith: Mel crashed. Maserati is gone."; "@far213: Tahrir is packed!") through edges such as the-same-as, tweet-about, related-to, located-in, and capital-of.]
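The diagram treats the social genome as a graph whose nodes are taxonomy entries, social-media accounts, and posts, joined by typed edges. A minimal sketch of such a store, assuming an in-memory set of (subject, relation, object) edges; the class and method names are hypothetical, and the example edges are one plausible reading of the diagram rather than WalmartLabs' actual data model:

```python
from collections import defaultdict


class SocialGenome:
    """A tiny in-memory store of (subject, relation, object) edges."""

    def __init__(self):
        # Index edges by (subject, relation) for quick neighbor lookups.
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        """Return every object linked from `subject` by `relation`."""
        return set(self._edges.get((subject, relation), ()))

    def follow(self, start, relations):
        """Follow a chain of relations, e.g. located-in then capital-of."""
        frontier = {start}
        for relation in relations:
            frontier = {o for s in frontier for o in self.objects(s, relation)}
        return frontier


genome = SocialGenome()
# Example edges: one plausible reading of the slide's diagram.
genome.add("@melgibson", "the-same-as", "Mel Gibson")
genome.add("mel-gibson", "the-same-as", "Mel Gibson")
genome.add("@dsmith: Mel crashed. Maserati is gone.", "tweet-about", "Gibson car crash")
genome.add("Tahrir", "located-in", "Cairo")
genome.add("Cairo", "capital-of", "Egypt")

print(genome.follow("Tahrir", ["located-in", "capital-of"]))  # {'Egypt'}
```

Indexing by (subject, relation) keeps chained lookups such as "which country is Tahrir in?" to a few dictionary hits; a production knowledge base would also need reverse indexes and persistence.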


7 Building Social Genome: Three Sample Challenges
[Figure: the social-genome diagram (taxonomy, Twitter and FB users, tweets, and the edges linking them), with three challenge areas marked 1, 2, and 3.]

8 Extraction and Disambiguation: Traditional Methods Are Ill-Suited for Social Media
– Extraction: traditionally uses rule-based / NLP / machine-learning techniques; here, extraction uses dictionaries
– Disambiguation: which "Mel" does a post mean?
  – Candidates in the taxonomy: Mel Gibson (actors) and Mel Brocks (professors)
  – Example posts: "@dsmith: mel crashed. maserati is gone." and "Mel was arrested again. What a dramatic fall since his Oscar-winning day."
  – Long-term, Web context (Mel Gibson): actor, movie, Oscar, Hollywood
  – Short-term, social context: crash, car, Maserati
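The two steps can be sketched as a dictionary lookup followed by context scoring. This is a minimal illustration, assuming a dictionary from surface names to candidate entities and a set of context words per entity; Mel Gibson's context words come from the slide, while Mel Brocks' words, the overlap-count scoring rule, and the weights (short-term context counted double) are hypothetical choices, not the method WalmartLabs used:

```python
import re

# Surface name -> candidate taxonomy entities (both "Mel"s appear on the slide).
DICTIONARY = {"mel": ["Mel Gibson", "Mel Brocks"]}

# Context words per entity. Mel Gibson's come from the slide;
# Mel Brocks' words are hypothetical placeholders.
LONG_TERM = {
    "Mel Gibson": {"actor", "movie", "oscar", "hollywood"},
    "Mel Brocks": {"professor", "university", "lecture"},
}
SHORT_TERM = {
    "Mel Gibson": {"crash", "car", "maserati"},
    "Mel Brocks": set(),
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_and_disambiguate(text, w_long=1.0, w_short=2.0):
    """Dictionary lookup for extraction, then context overlap for disambiguation."""
    tokens = tokenize(text)
    words = set(tokens)
    results = []
    for token in tokens:
        candidates = DICTIONARY.get(token)
        if not candidates:
            continue
        # Score each candidate by how many of its context words the post contains;
        # short-term (social) context is weighted higher than long-term (Web) context.
        scores = {
            e: w_long * len(LONG_TERM[e] & words) + w_short * len(SHORT_TERM[e] & words)
            for e in candidates
        }
        best = max(candidates, key=scores.get)
        # With no supporting context, leave the mention unresolved.
        results.append((token, best if scores[best] > 0 else None))
    return results


print(extract_and_disambiguate("@dsmith: mel crashed. maserati is gone."))
# [('mel', 'Mel Gibson')]
```

The first tweet resolves only through "maserati", a short-term word; the second resolves through "oscar", a long-term word. A post with neither kind of evidence stays unresolved rather than defaulting to the more famous Mel.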

9 Must Maintain a Highly Dynamic Social Genome
[Figure: taxonomy (people → actors, professors; places; events) showing Mel Gibson's long-term Web context (actor, movie, Oscar, Hollywood) and short-term social context (crash, car, Maserati), alongside Mel Brocks and the events Gibson car crash and Egyptian uprising.]
– Latency of less than 2 seconds
– Maintained using a fast-data processing system
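Keeping the short-term social context fresh means adding new words as posts arrive and forgetting old ones. A minimal sketch, assuming a time-windowed word count per entity, timestamps in seconds that arrive in order, and a hypothetical one-hour window; the slide does not say how this context was actually represented:

```python
from collections import Counter, deque


class ShortTermContext:
    """Word counts describing one entity over a sliding time window."""

    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self.mentions = deque()  # (timestamp, words), assumed to arrive in time order
        self.counts = Counter()

    def add(self, timestamp, words):
        words = list(words)
        self.mentions.append((timestamp, words))
        self.counts.update(words)
        self._expire(now=timestamp)

    def _expire(self, now):
        # Subtract the words of mentions that have fallen out of the window.
        while self.mentions and self.mentions[0][0] <= now - self.window:
            _, old_words = self.mentions.popleft()
            self.counts.subtract(old_words)
        # Unary plus drops words whose count fell to zero.
        self.counts = +self.counts

    def top(self, k=3):
        return [word for word, _ in self.counts.most_common(k)]


ctx = ShortTermContext(window_seconds=3600)
ctx.add(0, ["crash", "car", "maserati"])
ctx.add(1800, ["crash", "arrested"])
print(ctx.top(1))  # ['crash']
ctx.add(4000, ["movie"])  # the mention at t=0 expires
print("car" in ctx.counts)  # False
```

Each update touches only the new mention and any expired ones, so its cost does not grow with the entity's history, which matters when the latency budget is under 2 seconds.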

10 The Giant Traditional Taxonomy Is the Secret Weapon
– Without it, dictionary-based extraction is not possible
– Provides a framework to
  – "understand" social media, find related concepts, and "hang" social contexts
– Very hard to develop; takes years
  – Requires integrating data from multiple sources, like learning a foreign language
– Partly explains why it was hard for others to catch up
– To integrate social media, must first integrate traditional data well, then bootstrap
[Figure: taxonomy excerpt: all → people → actors (Angelina Jolie, Mel Gibson); places: Tahrir located-in Cairo, Cairo capital-of Egypt.]
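The dependency of dictionary-based extraction on the taxonomy is easy to see once the dictionary is derived from it. Every named leaf becomes a surface form pointing back to its node, the path to the root says what kind of thing was matched, and siblings give a crude notion of "related concepts". A minimal sketch, assuming a child-to-parent map over a small hypothetical slice of the slide's taxonomy (aliases, which a real taxonomy would carry, are omitted):

```python
# Child -> parent edges for a small, hypothetical slice of the taxonomy.
PARENT = {
    "people": "all",
    "places": "all",
    "actors": "people",
    "Angelina Jolie": "actors",
    "Mel Gibson": "actors",
    "Tahrir": "places",
    "Cairo": "places",
    "Egypt": "places",
}


def path_to_root(node):
    """Return the node followed by its ancestors, e.g. Mel Gibson, actors, people, all."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def build_dictionary():
    """Map lowercased names of leaf entities to taxonomy nodes for dictionary-based extraction."""
    categories = set(PARENT.values())
    return {name.lower(): name for name in PARENT if name not in categories}


def related(node):
    """Siblings under the same parent: a crude notion of 'related concepts'."""
    parent = PARENT.get(node)
    return sorted(n for n, p in PARENT.items() if p == parent and n != node)


print(path_to_root("Mel Gibson"))  # ['Mel Gibson', 'actors', 'people', 'all']
print(related("Mel Gibson"))  # ['Angelina Jolie']
```

Without the `PARENT` map, `build_dictionary` has nothing to enumerate, which mirrors the slide's point: the extraction dictionary is only as good as the taxonomy behind it.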

11 Context Is Also Absolutely Critical (Social @WalmartLabs)
– For each post: entity extraction finds "Giants", then context resolves SF Giants vs. NY Giants
– Alice tweets "Go Giants!"
  – Context: Alice lives in NYC, so: NY Giants
– Bob tweets "Go Giants!"
  – Context: Bob likes Buster Posey (an SF Giants player), so: SF Giants
– Charlie tweets "Go Giants!" on Feb 4th
  – Context: Feb 4th is the day before the Super Bowl (an event), and the Web is talking about the NY Giants, so: NY Giants
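The three cases amount to three context signals: where the user lives, what the user likes, and what the wider Web is discussing when the post is made. A minimal sketch that lets each signal cast one vote, assuming hypothetical lookup tables, a hypothetical `User` profile type, and equal weights; the slide does not describe how these signals were actually combined:

```python
from dataclasses import dataclass, field
from typing import List, Optional

CANDIDATES = ("SF Giants", "NY Giants")

# Hypothetical lookup tables standing in for the social genome.
HOME_TEAM = {"NYC": "NY Giants", "San Francisco": "SF Giants"}
PLAYER_TEAM = {"Buster Posey": "SF Giants"}


@dataclass
class User:
    city: Optional[str] = None
    likes: List[str] = field(default_factory=list)


def resolve_giants(user, trending_on_web=()):
    """Resolve a 'Go Giants!' mention by letting each context signal vote for a team."""
    votes = {team: 0 for team in CANDIDATES}
    # Signal 1: where the user lives (Alice lives in NYC).
    if user.city in HOME_TEAM:
        votes[HOME_TEAM[user.city]] += 1
    # Signal 2: what the user likes (Bob likes Buster Posey).
    for liked in user.likes:
        if liked in PLAYER_TEAM:
            votes[PLAYER_TEAM[liked]] += 1
    # Signal 3: what the Web is talking about when the post is made (Super Bowl eve).
    for topic in trending_on_web:
        if topic in votes:
            votes[topic] += 1
    first, second = sorted(CANDIDATES, key=votes.get, reverse=True)
    # A tie (including no evidence at all) leaves the mention unresolved.
    return None if votes[first] == votes[second] else first


print(resolve_giants(User(city="NYC")))  # NY Giants
print(resolve_giants(User(likes=["Buster Posey"])))  # SF Giants
print(resolve_giants(User(), trending_on_web=["NY Giants"]))  # NY Giants
```

Returning `None` on a tie is a deliberate choice: for a knowledge base that feeds recommendations, an unresolved mention is usually cheaper than a wrong one.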

12 Building Social Genome: Three Sample Challenges

13 Event Detection: Current Solutions
– Lots of current work in academia and industry
– Limitations of most current solutions:
  – Exploit just one kind of heuristic, e.g., finding hot, trending, popular words (Egypt, revolt)
  – Do not exploit crowdsourcing
  – Do not scale
[Figure: event detection over Twitter, Foursquare, Facebook, Myspace, Flickr, …, feeding the events branch of the taxonomy (celebrities, sports, politics, …; e.g., Gibson car crash, Egyptian uprising).]
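The single heuristic most current solutions rely on, finding hot and trending words, can be sketched as a burst test. It compares each word's rate in the current window with its smoothed rate in a baseline window. A minimal sketch, assuming whitespace tokenization, add-one smoothing, and hypothetical thresholds (`min_ratio`, `min_count`):

```python
from collections import Counter


def trending_words(current_posts, baseline_posts, min_ratio=3.0, min_count=2):
    """Words whose rate in the current window is at least `min_ratio` times their baseline rate."""
    current = Counter(w for post in current_posts for w in post.lower().split())
    baseline = Counter(w for post in baseline_posts for w in post.lower().split())
    n_current = sum(current.values())
    # Add-one smoothing so words never seen in the baseline do not divide by zero.
    denominator = sum(baseline.values()) + len(baseline) + 1
    scored = []
    for word, count in current.items():
        if count < min_count:
            continue
        ratio = (count / n_current) / ((baseline[word] + 1) / denominator)
        if ratio >= min_ratio:
            scored.append((ratio, word))
    return [word for _, word in sorted(scored, reverse=True)]


baseline = ["good morning", "coffee time", "good night", "morning coffee"]
current = ["egypt revolt", "egypt tahrir", "coffee time", "egypt revolt now"]
print(trending_words(current, baseline))  # ['egypt', 'revolt']
```

The output is a list of bursty words, not an event. Nothing here decides whether "egypt" and "revolt" describe one real-world event, a meme, or spam. That gap is the limitation the slide attributes to single-heuristic detectors.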

14 Event Detection: Our Solution
– Twitter and Foursquare streams feed Detector 1, Detector 2, …, Detector n
– Each detector outputs candidate events
– An event evaluator and ranker produces ranked events, with input from crowdsourcing populations 1, 2, 3, …
– Everything runs on Muppet, a platform to process fast data over multiple machines
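The multi-detector design can be sketched on a single machine as follows. Each detector proposes candidate events with a confidence, the evaluator pools the evidence, and crowd judgments, when available, can veto a candidate. The sketch assumes hypothetical detector outputs, a plain sum of confidences, and majority voting for the crowd. It is not Muppet code and does not reflect how WalmartLabs actually weighted these signals:

```python
from collections import defaultdict


def rank_events(detector_outputs, crowd_votes=None):
    """
    detector_outputs: one dict per detector, mapping event name -> confidence in [0, 1].
    crowd_votes: optional dict mapping event name -> list of booleans ("is this a real event?").
    Returns event names ordered by pooled score, with crowd-rejected events removed.
    """
    crowd_votes = crowd_votes or {}
    pooled = defaultdict(float)
    support = defaultdict(int)
    for output in detector_outputs:
        for event, confidence in output.items():
            pooled[event] += confidence
            support[event] += 1
    ranked = []
    for event, score in pooled.items():
        votes = crowd_votes.get(event)
        # Drop events that a strict majority of crowd judges reject.
        if votes and sum(votes) * 2 < len(votes):
            continue
        ranked.append((score, support[event], event))
    # Highest pooled score first; ties broken by the number of supporting detectors.
    ranked.sort(reverse=True)
    return [event for _, _, event in ranked]


detector_1 = {"Egyptian uprising": 0.9, "Gibson car crash": 0.4}
detector_2 = {"Egyptian uprising": 0.7, "spam giveaway": 0.95}
detector_3 = {"Gibson car crash": 0.6}
crowd = {"spam giveaway": [False, False, True]}
print(rank_events([detector_1, detector_2, detector_3], crowd))
# ['Egyptian uprising', 'Gibson car crash']
```

Pooling lets events confirmed by several independent detectors outrank a single over-confident detector. The crowd step catches candidates, such as spam, that look like bursts but are not events.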

15 Processing Fast Data
– Big data management is well known by now
  – Use MapReduce implementations
  – Simple programming model, widespread adoption
– But a lot of fast data is also emerging
  – 150 M tweets/day, 1 billion FB shares/day, 3 M Foursquare check-ins/day
  – Comes into the system as very fast streams
– Numerous applications over these streams
– Need to process in real time
  – To answer "what is happening now?"
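To put the slide's daily volumes on a per-second footing, divide by the 86,400 seconds in a day. The results are averages; real streams burst well above them at peak times.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes quoted on the slide.
DAILY_VOLUMES = {
    "tweets": 150_000_000,
    "Facebook shares": 1_000_000_000,
    "Foursquare check-ins": 3_000_000,
}


def per_second(per_day):
    """Average arrival rate, ignoring bursts and time-of-day peaks."""
    return per_day / SECONDS_PER_DAY


for source, per_day in DAILY_VOLUMES.items():
    print(f"{source}: about {per_second(per_day):,.0f} per second")
# tweets: about 1,736 per second
# Facebook shares: about 11,574 per second
# Foursquare check-ins: about 35 per second
```

Together that is over 13,000 items per second on average. This is why the next slide asks for processing spread across multiple machines rather than periodic batch jobs.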

16 Processing Fast Data
– What we want: a platform that
  – delivers real-time processing (over multiple machines)
  – is highly scalable (as the data gets faster and faster)
  – has a simple programming model
    – so developers can quickly write hundreds of apps
    – ideally like MapReduce, which developers already know
  – has real-time query and storage capability
    – apps can query content in real time
    – distributed across multiple machines
– Answer: Muppet, like MapReduce, but for fast data
  – See "MapReduce-Style Processing of Fast Data", VLDB-12
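The requested programming model, MapReduce-like but over unbounded streams with queryable state, can be sketched in a single process. A map function turns each incoming event into keyed events, and an update function folds each keyed event into per-key state that applications can query at any moment. The sketch is only in that spirit. It assumes hypothetical names (`map_tweet`, `update_count`, `StreamProcessor`) and a toy hashtag-counting app; it is not Muppet's API (see the VLDB-12 paper for the actual design) and ignores distribution, fault tolerance, and persistence:

```python
def map_tweet(tweet):
    """Map: emit a (hashtag, 1) pair for every hashtag in a tweet."""
    for word in tweet.split():
        if word.startswith("#") and len(word) > 1:
            yield word.lower(), 1


def update_count(state, value):
    """Update: fold one keyed event into that key's running state."""
    return state + value


class StreamProcessor:
    """Single-process map/update loop with per-key state that can be queried at any time."""

    def __init__(self, mapper, updater, initial=0):
        self.mapper = mapper
        self.updater = updater
        self.initial = initial
        self.state = {}

    def process(self, event):
        # Unlike batch MapReduce, each event is handled the moment it arrives.
        for key, value in self.mapper(event):
            self.state[key] = self.updater(self.state.get(key, self.initial), value)

    def query(self, key):
        return self.state.get(key, self.initial)


processor = StreamProcessor(map_tweet, update_count)
for tweet in ["Tahrir is packed! #Egypt", "#egypt #jan25 protests", "Go Giants! #SuperBowl"]:
    processor.process(tweet)
print(processor.query("#egypt"))  # 2
```

Developers write only the two small functions, much as they would for MapReduce. Scaling out would mean partitioning keys across machines so that each key's state lives in one place, which is the kind of problem the platform on the slide is meant to handle.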

17 Using the Social Genome
– Gift recommendation:
  – "I love salt!"
  – "Your friend has just tweeted about the movie SALT. Would you like to buy something related for her birthday?"

18 Using the Social Genome
– Search query expansion
  – "Advil" → "advil headache cramp"
– Personalized "Groupon" with vendors
  – "You seem to be interested in gourmet coffee. If 50 people sign up to buy the new DeLonghi coffee maker, you can get it at a 50% discount."
– Stocking a local store
  – Lots of people in Mountain View are interested in outdoor sports
  – Stock the local Walmart store with related products
– A Siri-like shopping assistant
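One simple way to obtain an expansion like "Advil" → "advil headache cramp" from social data is co-occurrence. Collect the posts that mention the query term and append the words that most often appear alongside it. A minimal sketch, assuming whitespace tokenization, a hypothetical stop-word list, and toy posts; the slide does not say which expansion method was used:

```python
from collections import Counter

# Hypothetical stop-word list for the toy posts below.
STOP_WORDS = {"a", "and", "bad", "for", "my", "some", "the", "took"}


def expand_query(query, posts, k=2):
    """Append the k words that most often co-occur with `query` in posts."""
    q = query.lower()
    co_counts = Counter()
    for post in posts:
        words = set(post.lower().split())
        if q in words:
            co_counts.update(w for w in words if w != q and w not in STOP_WORDS)
    return " ".join([q] + [w for w, _ in co_counts.most_common(k)])


posts = [
    "took some advil for my headache",
    "advil helps my headache",
    "bad headache advil time",
    "advil for cramp relief",
    "advil for cramp",
    "nice weather today",
]
print(expand_query("Advil", posts))  # advil headache cramp
```

Counting each word at most once per post (the `set`) keeps a single rambling post from dominating the expansion.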

19 Wrapping Up
– The future of e-commerce: social, mobile, and local
– Retailers must increasingly be data/Web players
– Social media is important for e-commerce
– Integrating social data is fundamentally much harder than integrating "traditional" data
  – Lack of context
  – Dynamic environment; new concepts appear quickly
  – Quality issues; lots of spam
  – Fast data
– Must integrate "traditional" data well, then bootstrap
  – A giant taxonomy is critical
– Crowdsourcing becomes indispensable
  – But raises interesting challenges

