1
Big Data, Big Analytics and Informed Actions
Dr. Wanli Min, Director for Data Science, Alibaba Group, 12/18/2014
2
Agenda
Introduction
Data Revolution
Use Cases of Big Data in E-Commerce
Beyond E-Commerce
3
Introduction
Dr. Wanli Min 闵万里 (山景)
PhD in Statistics from the University of Chicago, 2004
IBM T. J. Watson Research Center, New York
IBM Singapore, Singapore
Google, Mountain View, California
Alibaba, Hangzhou, China
4
Technology Drives Innovation
Intelligent + Instrumented + Interconnected
5
Booming Ecommerce
China has the largest online population
Online shopping is gaining popularity among a wide range of social groups
6
Ecommerce Leader: Alibaba
Notes:
(1) Through minority investments or joint ventures
(2) Through contractual arrangements with Ant Financial Services Group, our related company that operates Alipay
7
Ecommerce Reaches Offline Shopping
12.12 shoppers got 50% off with Alipay Wallet at offline retail stores
8
Ecommerce Big Data
Diagram labels (figure): Ecom, Offline retail, Consumer, Internet, Internet of Things
People-centric: the Internet enables ecommerce and drives big data
9
Ecommerce Big Data
Source: New York Times, August 5, 2009
10
Data Revolution is Ongoing
Social evolution leads to an inevitable Data Revolution
Timeline stages (figure): Agricultural, Industrial Revolution, Modern Physics, Semiconductor, IT tech, Big data
Timeline years (figure): 1730, 1844, 1905, 1937, 1947, 1967, 1980, Y2K, 2004
Timeline annotations (figure): IBM sold PC business; 2005, 2006: Smarter Planet, Smarter City
11
Information Explosion Drives Data Revolution
Internet drives big data: Information Explosion
Early days of the Internet: “Copyright. Do not redistribute.”
Nowadays (2014), information sharing is pervasive: “Click to share”
12
Rising Power of Data
Data Influence Exploded in the Past Decades
New Product
New Company
New Politician
New Geo-Politics
13
Data Processing Product
In 1995: processing data at megabyte (MB) scale
14
Data-driven New Business
In 2004: processing data at gigabyte (GB) scale; Google went public
15
Data Enables Politician
In 2013: processing data at terabyte (TB) scale; the U.S. President invited you to town hall meetings to discuss hot issues
16
Big Data Reshaped GeoPolitics
Democracy is a perfect case for big data usage
Obama campaign in 2008 and 2012
Data sources: Facebook, Twitter, blogs, polls…
Where are the persuadable voters?
Source: CNN, November 8, 2012
Source: UChicago News, April 17, 2013
17
IT to DT
IT -> DT (Data Technology)
August 2009, New York Times:
“The sexy job in the next 10 years will be statisticians” - Hal Varian, Google Chief Economist
“What’s ubiquitous and cheap?” “Data.” “What’s scarce is the analytical ability to utilize that data.”
In October 2008, Alibaba set its long-term strategy:
Alibaba is a data company
Alibaba empowers itself to make cloud computing available to the public as a utility
In February 2014, Alibaba announced its latest strategy: Cloud + Device (云+端), Data Technology
18
Outlook of Big Data
How big is “Big Data”?
- If the sample data covers nearly the entire probability space, then inference from that sample depends less on the specific model
Volume, Velocity, Variety, Veracity
Sufficient Statistics vs. Big Data
Ergodicity
19
Outlook of Big Data
Diagram labels (figure): Internet, Internet of Things, Big Data
Data Application vs. Data Storage/Data Warehouse
Unstructured data vs. Structured data
User behavior data, personalized offering
Online advertisement: from mass display to user targeting
Real-time bidding (RTB): from pay for traffic to pay for audience
20
Value Add by Variety of Data
Advertisement CTR lift (chart values): 1,235%, 550%, 132%
Feature types (chart labels): Environmental feature, Demographics feature, Behavioral feature
Source: Acxiom Chief Analytics Officer, Dr. Jie Cheng
21
Connect & Combine Data
Connect multiple data sources to generate collective & collaborative value
Mutually beneficial
Waze: crowdsourcing
Airline companies: code share / alliance
22
Alibaba Embraces Big Data Revolution
Data scientists are in high demand in Silicon Valley and worldwide
Alibaba Data Science Team Mission:
Improve business efficiency by enabling data-driven operation in each line of business (LOB)
Create new data products to support multiple LOBs
Utilize data to drive business innovation
23
Data Science in E-Commerce
Product Planning
Reduce JuHuaSuan manual workload
Increase Tmall “瞄一眼” (“Take a Glance”) revenue per slot
User Targeting
Sina Weibo Targeting
O2O
Xiami Music
Product-Buyer matching
24
Apps Supported by Data Science
25
Apps Supported by Data Science
Walle Model
26
A Graph-based Framework
Construct dynamic activity graph
Graph nodes (figure): buyers i, j; products u, v; sellers
Wij is a summary statistic of the links between buyers i and j, conditional on a given basket of products.
Puv is a summary statistic of the joint activity of products u and v, conditional on a given group of buyers.
Wij could be a multi-dimensional vector, and could be defined at the clustering level.
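The slide does not say which summary statistic is used for Wij and Puv. A minimal sketch, assuming plain co-occurrence counts over a log of (buyer, product) events; the function names and the toy event log are hypothetical, not from the talk:

```python
from collections import defaultdict
from itertools import combinations


def buyer_weights(events, basket):
    """W[(i, j)]: number of products in `basket` that buyers i and j both touched.

    Assumption: a co-occurrence count stands in for the unspecified statistic.
    """
    buyers_by_product = defaultdict(set)
    for buyer, product in events:
        if product in basket:
            buyers_by_product[product].add(buyer)
    weights = defaultdict(int)
    for buyers in buyers_by_product.values():
        for i, j in combinations(sorted(buyers), 2):
            weights[(i, j)] += 1
    return dict(weights)


def product_weights(events, group):
    """P[(u, v)]: number of buyers in `group` active on both products u and v."""
    products_by_buyer = defaultdict(set)
    for buyer, product in events:
        if buyer in group:
            products_by_buyer[buyer].add(product)
    weights = defaultdict(int)
    for products in products_by_buyer.values():
        for u, v in combinations(sorted(products), 2):
            weights[(u, v)] += 1
    return dict(weights)


# Hypothetical activity log: (buyer, product) pairs.
events = [
    ("ann", "perfume"), ("bob", "perfume"),
    ("ann", "lipstick"), ("bob", "lipstick"), ("cat", "lipstick"),
    ("cat", "shoes"),
]
W = buyer_weights(events, basket={"perfume", "lipstick"})
P = product_weights(events, group={"ann", "bob", "cat"})
```

Changing `basket` or `group` changes the conditioning set, which is what makes the graph "dynamic" in this reading.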
27
A Graph-based Framework
Discover potential buyers of a particular product
Graph nodes (figure): buyers i, j; products u, v; sellers
Product search: for target product u, find its top k nearest neighbors (KNN) on {Puv} and construct the activity graph conditional on those neighbors
Transform edge distance as 1 / wij
Starting from the nodes of known buyers, vote by shortest path
Approximate Dijkstra's algorithm
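The shortest-path voting step can be sketched as follows. This uses exact Dijkstra rather than the approximate variant the slide mentions, and the vote formula 1 / (1 + distance) is an assumption made for illustration; the buyer names and weights are hypothetical:

```python
import heapq


def shortest_distances(weights, source):
    """Exact Dijkstra over an undirected buyer graph with edge length 1 / w_ij."""
    adjacency = {}
    for (i, j), w in weights.items():
        if w > 0:
            adjacency.setdefault(i, []).append((j, 1.0 / w))
            adjacency.setdefault(j, []).append((i, 1.0 / w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, length in adjacency.get(node, []):
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def rank_candidates(weights, known_buyers):
    """Each known buyer votes for reachable non-buyers; closer nodes get larger votes."""
    votes = {}
    for source in known_buyers:
        for node, d in shortest_distances(weights, source).items():
            if node not in known_buyers:
                votes[node] = votes.get(node, 0.0) + 1.0 / (1.0 + d)  # assumed vote rule
    return sorted(votes, key=votes.get, reverse=True)


# Hypothetical link strengths w_ij between buyers.
weights = {("ann", "bob"): 4, ("bob", "cat"): 2, ("cat", "dan"): 1, ("ann", "dan"): 0.5}
ranking = rank_candidates(weights, known_buyers={"ann"})
```

Strong links become short edges, so buyers tightly connected to known buyers rise to the top of the ranking.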
28
Use Case: Sales Planning
Predict product sales
Lots of features (DSR, user, price, etc.)
Large scale model training
Productized (app)
29
Predict Best Selling Product
Business Flow:
Sellers register for a sales event: discount & quantity
Operators choose who will participate in the sales event
Model: sales prediction for each proposed product
Objective: help operators select best-selling products
30
Predict Best Selling Product
Gradient Boosting Decision Tree
Parallelized multiple decision trees, sample size of a billion
Who will buy this perfume? Will he/she buy perfume?
Decision tree (figure): questions: age? occupation? rating?; branch labels: young, middle-aged, senior; yes, no; excellent, good; outcomes: buy, not buy
The information requirement is low: we need to ask at most two questions to reach a conclusion
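To make the gradient boosting idea concrete, here is a minimal single-machine sketch, assuming squared-error loss and one-question trees (stumps). It is not the parallelized production model described on the slide, and the features and labels are made up for illustration:

```python
def fit_stump(xs, residuals):
    """Find the single-feature threshold split that minimizes squared error."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: no split
    for f in range(len(xs[0])):
        for threshold in sorted({x[f] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[f] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[f] > threshold]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if err < best[0]:
                best = (err, f, threshold, lmean, rmean)
    return best[1:]


def fit_gbdt(xs, ys, rounds=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals (the negative gradient)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        f, t, lv, rv = fit_stump(xs, residuals)
        stumps.append((f, t, lv, rv))
        preds = [p + learning_rate * (lv if x[f] <= t else rv)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lv if x[f] <= t else rv)
                      for f, t, lv, rv in stumps)


# Hypothetical shoppers: (age, rating) -> bought perfume (1) or not (0).
xs = [(22, 3), (25, 5), (34, 4), (41, 5), (45, 2), (58, 4), (63, 5), (30, 2)]
ys = [0, 1, 1, 1, 0, 0, 0, 0]
model = fit_gbdt(xs, ys)


def mse(predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


baseline_mse = mse([sum(ys) / len(ys)] * len(ys))
model_mse = mse([predict(model, x) for x in xs])
```

Each stump asks one question; summing many small corrections is what lets the ensemble represent rules that combine age and rating.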
31
Use Case: User Targeting
Questions from operators and offline sellers:
What is my target users’ profile?
Where are my hidden customers?
How to prioritize dissemination to different groups?
Is my brand image well aligned with targeted customers?
Do I need to reach potential homebuyers?
Do I need to reach lottery players?
Who are Nike fans?
What is the best channel to reach my targeted customers?
Where can I find the hidden elite customers?
32
Use Case: User Targeting
Traditional User Profiling (user tags):
KYC (Know Your Customer): demographics profiling
Post-event
Too many user tags, often confusing to users
Users’ intent is hard to infer
Cannot differentiate importance/relevance of different tags
33
Use Case: User Targeting
Big Data enables user targeting
CYC (Catch Your Customer)
Fuse data from multiple sources in the ecosystem
Propensity model, semi-supervised / supervised
Answers: who, when, where, what
Discriminates between different tags
34
Use Case: User Targeting
Objective: show one product’s promotion ads to buyers
Take a typical Internet ads campaign record as an example
- Post-event analysis: calculate targeted users’ likelihood and assign them to groups
Note: users are categorized by likelihood into 6 groups, corresponding to descending scores 5, 4, 3, 2, 1, 0
Source: Alibaba Group’s project in 2013
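The grouping step can be sketched as below. The slide does not say how group boundaries are chosen, so equal-size groups by rank are an assumption here, and the user IDs and likelihood values are invented:

```python
def assign_groups(likelihoods, n_groups=6):
    """Rank users by likelihood and split them into equal-size score groups.

    Assumption: rank-based, equal-size groups. Score n_groups - 1 marks the
    most likely buyers and score 0 the least likely.
    """
    ranked = sorted(likelihoods, key=likelihoods.get, reverse=True)
    groups = {}
    for rank, user in enumerate(ranked):
        bucket = rank * n_groups // len(ranked)  # 0 = most likely bucket
        groups[user] = n_groups - 1 - bucket
    return groups


# Hypothetical propensity-model outputs.
likelihoods = {"u1": 0.91, "u2": 0.75, "u3": 0.62, "u4": 0.48, "u5": 0.33, "u6": 0.12}
groups = assign_groups(likelihoods)
```

Comparing the observed response rate across the six groups after a campaign is one way to check whether the likelihood scores actually separate buyers from non-buyers.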
35
Many Challenges for Data Scientists
Help people discover products of distinctive characteristics
36
Personalization on Mobile Taobao
Upscale (高大上); Women (女性)
37
Personalized Recommendation
38
Beyond E-Commerce: HealthCare
Intelligent + Instrumented + Interconnected
Source: news.alibaba.com, March 18, 2014
39
Pandemic Risk Map Background
Background:
Pandemic disease in a highly populated city requires early detection, prompt action, and mass awareness
Problems & Challenges:
Reported cases are in isolation, with no predictive view
Medical treatment cost, loss of productivity and workforce
Solution:
Aggregate siloed data at optimal resolution
Predict future risk
Extrapolate to the city-wide area
Cross-platform visualization on a map
Business Case:
Singapore project of X-Dengue
40
Mobile: Connected HealthCare
Application areas (figure): Disease management, Medical Home, Health, Wellness, Medication Adherence
Mobile Care Management: e-prescribing, doctor/hospital directory
Remote Video Coaching
Real-time Biometric Display
Wellness, Medical Devices Integration
Data Capture
Trend / Chart
Storage, Store and Forward
Questionnaire
Alert ( , SMS)
Alarm
Video Chat
Data
Analytics
Server Interface
Reminders
41
Prevention, Prediction, Participation, Personalization
Diagram labels (figure): Patient Info; Patient matching; 4P Applications; Use; Diagnosis Support; assessment analytics; Practice Management; Resource Allocation (Patient-Physician Matching); EMR; Patient characterization; Predictive Analytics; Care management; Patient Segmentation (Utilization Patterns)
42
Big Data Has Its Limitations!
Leinweber, David J. “Stupid data miner tricks: overfitting the S&P 500.” The Journal of Investing 16.1 (2007)
The S&P 500 showed a 99% fit (R²) against a combination of:
1. Bangladesh butter production
2. American cheese production
3. Total number of sheep in the USA and Bangladesh
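The underlying pitfall is easy to reproduce. The sketch below is a simpler stand-in for Leinweber's regression exercise, not a replication of it: it searches 1,000 random walks that are unrelated to a target series by construction and reports the best absolute correlation it finds. The series lengths, candidate count, and seed are arbitrary choices:

```python
import random


def random_walk(n, rng):
    """Cumulative sum of n independent standard normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)
target = random_walk(50, rng)  # stands in for an index such as the S&P 500
candidates = [random_walk(50, rng) for _ in range(1000)]  # unrelated by construction
best = max(abs(pearson(target, c)) for c in candidates)
print(f"best |correlation| among 1000 unrelated series: {best:.2f}")
```

With enough candidate series, a strong-looking match appears by chance alone, which is why a high in-sample fit is not evidence of a real relationship.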
43
Thank you: Gracias (Spanish), Grazie (Italian), Merci (French), Danke (German), Obrigado (Brazilian Portuguese)
Also shown on the slide: Traditional Chinese, Simplified Chinese, Thai, Russian, Arabic, Japanese