Published by Carolina Volk. Modified over 2 years ago.

1
Mining Group Patterns of Mobile Users

2
Outline
- Motivation
- Problem definition
- Algorithms for mining mobile groups
- Handling untraceable time intervals
- Evaluation
- Conclusions

3
Introduction: Facts
- CRM is quite successful in online stores:
  - Real-time recommendation
  - Faster checkouts
  - Price/feature comparisons
- Why?
  - e-CRM is able to collect detailed customer data.
  - e-CRM is able to identify customers.
- But e-CRM has limitations.

4
Searching vs. Purchasing
The vast majority of search-to-purchase conversion occurs offline.

5
Where Did the Business Go?
- Client's shoppers: 10 million+ (one-month period)
- 4% of the client's shoppers were also buyers
- 96% of the client's shoppers became Visitor/Non-Buyers (VNBs)
- 69% of the client's VNBs shopped at least one competitive merchant
- 8% of the client's VNBs purchased at a competitive merchant (652,200 buyers)
- Avg. $460 per buyer = ~$300MM in lost sales
- Annual revenue potential from capturing 10% of these buyers' category sales approaches $36 million

6
Connecting Attitudes and Behavior: Why Didn't VNBs Purchase?
Which of the following reasons describe why you did not make a purchase (from major retailer A)?
Respondent base: known Visitor/Non-Buyers of retailer A, based upon observed behavior.

7
Catalogs Make a Difference
Think direct mail is irrelevant in the Internet age? Think again. As experts in the direct mail business, we teamed up with comScore and conducted a Catalog Study to track the purchasing habits of consumers who receive catalogs vs. those who do not. Our research shows that when you mail consumers a catalog, they:
- could be TWICE as likely to make an online purchase
- could spend more meaningful time at your web site, including a 75% likelihood to enter a secure session
- could be even more likely to purchase online with more frequent catalogs
What this all adds up to is a revenue lift of $21.1 million per million unique visitors.

8
Can physical stores gain the advantages of e-CRM without its limitations? We need to identify customers and collect data about their buying behavior.

9
Introduction: Trends
- The cost of location tracking/positioning technologies has been dramatically reduced, and the use of mobile devices is increasingly popular:
  - Wi-Fi / cell ID / sensor networks
  - GPS
  - RFID
- Identifying customers and their browsing/transaction behavior in physical stores will become a reality in the near future.

10
Introduction: What data can you get by tracking customers in a physical store?
- Customer identification
- Products placed in a shopping cart
- Customer aisle behavior
- Purchase profile of the customer
- Reviews and ratings

11
Introduction: What can you do by tracking customers in a physical store?
- Intelligent in-store customer service
- Fast checkout
- Price and feature comparison
- Physical location and map
- Price optimization
- Identification of shopping pals
Notes: requires tags on products; possible in the near future.

12
Introduction: Mining Mobile Groups
- There are many ways to determine the groups a user belongs to:
  - Grouping based on demographics
  - Grouping based on purchasing behavior
- Groups formed by using spatial-temporal information are useful.
  - Objects within a mobile group tend to closely influence one another.
- Potential applications:
  - Construction of social networks
  - Animal behavior study
  - Group-based pricing models or marketing strategies
- This work gives a precise definition of mobile groups and derives algorithms for efficiently identifying them.
  - Physical proximity between group members
  - Temporal proximity between group members

13
Work done so far
- Y. Wang, EP. Lim, SY. Hwang, On Mining Group Patterns of Mobile Users, DEXA 2003.
- Y. Wang, EP. Lim, SY. Hwang, Efficient Group Pattern Mining Using Data Summarization, DASFAA 2004, Korea.
- An extension of the two previous works will appear in the DKE journal.
- SY. Hwang, YH. Liu, JK. Chiu, EP. Lim, Mining Mobile Group Patterns: A Trajectory-based Approach, PAKDD 2005.
- Y. Wang, EP. Lim, SY. Hwang, Efficiently Mining Maximal Valid Groups, to appear in the VLDB journal.

14
Problem Definition

16
Definition. Given a group of users G, a maximum distance threshold max_dis, and a minimum time duration threshold min_dur, a set of consecutive time points [t, t+k] is called a valid segment of G if:
1. Every pair of users in G is no more than max_dis apart at times t, t+1, …, t+k;
2. Some users in G are more than max_dis apart at time t-1;
3. Some users in G are more than max_dis apart at time t+(k+1);
4. (k+1) >= min_dur.
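The definition above can be sketched in code. This is a minimal illustration, assuming 2-D locations sampled at every time point and Euclidean distance; the function name `valid_segments` and the `locations[user][t]` layout are hypothetical, and the start and end of the database are treated as segment boundaries.

```python
from itertools import combinations
from math import dist


def valid_segments(locations, group, max_dis, min_dur):
    """Return the valid segments [(start, end), ...] of `group`.

    locations[u][t] is the (x, y) position of user u at time point t
    (an assumed layout, not taken from the original papers).
    """
    n_points = len(locations[group[0]])
    segments, start = [], None
    # Iterate one step past the end so the last run is closed.
    for t in range(n_points + 1):
        close = t < n_points and all(
            dist(locations[a][t], locations[b][t]) <= max_dis
            for a, b in combinations(group, 2)
        )
        if close and start is None:
            start = t                      # a maximal run of close points begins
        elif not close and start is not None:
            if t - start >= min_dur:       # condition 4: (k + 1) >= min_dur
                segments.append((start, t - 1))
            start = None
    return segments
```

Because each run is extended until some pair is too far apart, conditions 2 and 3 hold by construction.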

17
Problem Definition
Example: max_dis = 10, min_dur = 3

18
Problem Definition
Definition. Let P be a mobile group with valid segments s_1, …, s_n, and let N denote the number of time points in the database. The weight of P is defined as:
$weight(P) = \frac{\sum_{i=1}^{n} |s_i|}{N}$
where |s_i| is the number of time points in s_i.

19
Problem Definition
- If the weight of a mobile group exceeds a threshold min_wei, we call it a valid group.
- For example, if min_wei = 50%, the mobile group P = {u_2, u_3, u_4} is valid, since it has valid segments {[1,3], [6,8]} and weight 6/10 > 0.5.
- The valid mobile group mining problem: given D, max_dis, min_dur, and min_wei, find all valid groups.
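The weight computation in the example can be checked with a few lines. This is a sketch with hypothetical function names; segments are inclusive (start, end) pairs, and the validity test uses >= as an assumption, since the slides say both "exceeds" and "no less than".

```python
def weight(segments, n_time_points):
    """Fraction of the N time points covered by the valid segments."""
    covered = sum(end - start + 1 for start, end in segments)
    return covered / n_time_points


def is_valid_group(segments, n_time_points, min_wei):
    # Assumption: a weight equal to min_wei counts as valid.
    return weight(segments, n_time_points) >= min_wei
```

For the slide's example, `weight([(1, 3), (6, 8)], 10)` covers 3 + 3 = 6 of 10 time points.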

20
AGP: Algorithm based on Apriori Property
[The Apriori property of valid mobile groups]: Given D, max_dis, min_dur, and min_wei, if a mobile group is valid, then each of its subsets is also valid.
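The Apriori property allows level-wise candidate generation, as in frequent itemset mining. The following is a sketch of that one step only, not the authors' AGP implementation; `generate_candidates` is a hypothetical name.

```python
from itertools import combinations


def generate_candidates(valid_k_groups):
    """Join valid k-groups into (k+1)-candidates, pruning by the Apriori property."""
    valid = {frozenset(g) for g in valid_k_groups}
    if not valid:
        return set()
    k = len(next(iter(valid)))
    candidates = set()
    for a, b in combinations(valid, 2):
        union = a | b
        # Keep the union only if it has k+1 users and every k-subset is valid.
        if len(union) == k + 1 and all(
            frozenset(sub) in valid for sub in combinations(union, k)
        ):
            candidates.add(union)
    return candidates
```

Each surviving candidate still has to be checked against the database, since the property is necessary but not sufficient for validity.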

21
AGP: Algorithm based on Apriori Property

22
VG-Growth: Algorithm based on Valid Group Graph Data Structures
Bottlenecks of AGP:
- Candidate generation
- Many iterations of database scanning
Definition. A valid group graph (or VG-graph) is a directed graph (V, E), where V is a set of vertices representing the users in the set of valid 2-groups, and E is a set of edges representing the valid 2-groups. Each edge is also associated with the valid segments of the corresponding valid 2-group pattern.
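A VG-graph can be held in a small adjacency structure. The sketch below assumes valid 2-groups are already known and that each edge points from the smaller to the larger user id; that direction convention and the name `build_vg_graph` are assumptions for illustration.

```python
from collections import defaultdict


def build_vg_graph(valid_2_groups):
    """Build (V, E) from {(u, v): [(start, end), ...]}.

    edges[u][v] holds the valid segments of the valid 2-group {u, v}.
    """
    vertices = set()
    edges = defaultdict(dict)
    for (u, v), segments in valid_2_groups.items():
        u, v = sorted((u, v))            # assumed convention: smaller id -> larger id
        vertices.update((u, v))
        edges[u][v] = list(segments)
    return vertices, dict(edges)
```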

23
VG-Growth: Algorithm based on Valid Group Graph Data Structures max_dis = 10, min_dur = 3 and min_wei = 60%

24
VG-Growth: Algorithm based on Valid Group Graph Data Structures

26
Evaluation
- Movement database generated by IBM City Simulator
- Covering a 1500m × 1000m × 1000m area with 48 roads and 72 buildings
- Time unit is 10 minutes.
- M1kN1k means 1000 users moving for 1000 time units.

27
Evaluation

29
Data Summarization
- The bottleneck of VG-Growth is the computation of valid 2-groups. Distances computed: $N \cdot \binom{M}{2}$
- To reduce the time for identifying valid 2-groups, the location data of each user is summarized.
- The locations of a user within a time window w are summarized as an instance of some summarized model (SM).

30
Summarized models: sphere, cuboid

31
Summarized Location Sphere
- The resultant location database is called a summarized database, where the number of time points becomes N/w.
- Basic ideas:
  - Find a set of candidate user pairs based on the summarized database.
  - Scan the summarized database and the original database (if necessary) to determine valid 2-groups.

32
Finding candidate 2-groups from summarized DB
- Two users u_i and u_j are possibly close at a time point t in the summarized database if the minimum possible distance between their summarized locations at t is no more than max_dis.
- A maximal set of consecutive time points [t_a, t_b] at which they are possibly close is called a possibly close segment.
- For each pair of users u_i and u_j, we compute an upper bound on its weight.
- A pair of users is a candidate if the upper bound on its weight is no less than min_wei.
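One plausible way to bound the weight of a pair from its possibly close segments is sketched below. The formula is an assumption, not taken from the slides: each summarized time point is taken to cover w original time points, N is taken to equal w times the number of summarized points, and only possibly close segments long enough to contain a valid segment are counted. The name `weight_upper_bound` is hypothetical.

```python
def weight_upper_bound(possibly_close, w, n_time_points, min_dur):
    """Upper bound on a pair's weight.

    possibly_close: one boolean per summarized time point.
    """
    total, run = 0, 0
    for flag in list(possibly_close) + [False]:   # sentinel closes the last run
        if flag:
            run += 1
        else:
            if run * w >= min_dur:    # the run could contain a valid segment
                total += run * w
            run = 0
    return total / n_time_points


def is_candidate(possibly_close, w, n_time_points, min_dur, min_wei):
    return weight_upper_bound(possibly_close, w, n_time_points, min_dur) >= min_wei
```

Every valid segment lies inside some possibly close segment, so pairs rejected by this test cannot be valid 2-groups.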

33
[Figure: Case 1, Case 2, Case 3]

34
Summarization models
Sphere location summarization method (SLS): each sphere is represented as (p_c, r), its center and radius.
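A sphere summary and the corresponding closeness test might look as follows. This is a sketch under stated assumptions: the centroid is used as the center p_c, which is simple but not the minimum enclosing sphere, and two users are treated as possibly close when the gap between their spheres is at most max_dis. Both function names are hypothetical.

```python
from math import dist


def summarize_sphere(points):
    """Return (p_c, r): a sphere that encloses all points in the window."""
    dim = len(points[0])
    center = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    radius = max(dist(center, p) for p in points)
    return center, radius


def possibly_close(sphere_a, sphere_b, max_dis):
    """True if some point of sphere_a can be within max_dis of sphere_b."""
    (ca, ra), (cb, rb) = sphere_a, sphere_b
    return dist(ca, cb) - ra - rb <= max_dis
```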

35
Summarization models
Cuboid location summarization method (CLS): each cuboid is represented as two 3-D points (v_min, v_max).
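The cuboid summary is an axis-aligned bounding box, and the smallest distance between two such boxes has a simple closed form. The sketch below is illustrative; the function names are hypothetical and the use of the minimum box-to-box distance for pruning is an assumption.

```python
def summarize_cuboid(points):
    """Return (v_min, v_max), the corners of the axis-aligned bounding box."""
    dim = len(points[0])
    v_min = tuple(min(p[i] for p in points) for i in range(dim))
    v_max = tuple(max(p[i] for p in points) for i in range(dim))
    return v_min, v_max


def min_cuboid_distance(a, b):
    """Smallest Euclidean distance between any point of a and any point of b."""
    (a_min, a_max), (b_min, b_max) = a, b
    # Per-axis gap between the boxes; 0 where their extents overlap.
    gaps = [
        max(b_min[i] - a_max[i], a_min[i] - b_max[i], 0)
        for i in range(len(a_min))
    ]
    return sum(g * g for g in gaps) ** 0.5
```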

36
Summarization models

37
Performance Evaluation

38
Pitfalls of the location model
- To maintain accurate location tracking, the frequency of sampling users' locations must be high. (Tracking 1000 users every second will result in 1GB per day.)
- In reality, moving objects may be disconnected from time to time, voluntarily or involuntarily.
- It is almost impossible to have perfectly synchronized sampling of users' locations in reality.

39
Remedies
- Use trajectories with untraceable periods to model user locations.
- The mobile group mining problem has to be modified.
- The algorithms have to be modified.

40
Trajectory model
A trajectory T is a set of piecewise linear functions, each of which maps a disjoint time interval to an n-dimensional space.

41
Trajectory-based location DB

object  reference_point  velocity  start_time  end_time
o1      (1,1)            (3,1)     0           3
        (7,-11)          (1,5)     3           5
        (10,-3)          (4,3)     6           9
o2      (2,2)            (2,1)     0           3
        (2,-13)          (2,6)     3           5
        (-4,5)           (3,2)     6           10
o3      (2,4)            (3,1)     0           3
        (17,-5)          (-2,4)    3           5
        (12,35)          (-1,-4)   5           8
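A row of this table can be read as location(t) = reference_point + velocity × t for start_time ≤ t ≤ end_time, which matches the o1 formula (1 + 3t, 1 + t) used later in the slides. The sketch below evaluates a trajectory under that reading; the tuple layout, the closed intervals, and the name `position` are assumptions.

```python
def position(pieces, t):
    """Location at time t, or None if t falls in an untraceable interval.

    pieces: list of (reference_point, velocity, start_time, end_time).
    """
    for (rx, ry), (vx, vy), start, end in pieces:
        if start <= t <= end:
            return (rx + vx * t, ry + vy * t)
    return None


# Trajectory of o1 from the table above.
O1 = [((1, 1), (3, 1), 0, 3), ((7, -11), (1, 5), 3, 5), ((10, -3), (4, 3), 6, 9)]
```

Adjacent pieces agree at their shared boundary (both give (10, 4) at t = 3), and o1 is untraceable between t = 5 and t = 6.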

42
How to convert location data into trajectories
- Algorithms are classified as either batch or online [N. Meratnia and R.A. de By, EDBT 2004].
- Top-Down (Douglas-Peucker): a batch algorithm.
- Open Window: an online algorithm that starts from a small window and keeps growing its size.

43
How to convert location data into trajectories Top-Down
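The top-down approach can be illustrated with the classic Douglas-Peucker line simplification. This is a generic sketch on 2-D points using perpendicular distance, with a hypothetical `tolerance` parameter; it ignores timestamps and is not necessarily the variant evaluated in the cited work.

```python
from math import hypot


def _perp_distance(p, a, b):
    """Distance from p to the line through a and b (to a itself if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Keep only the points needed to stay within `tolerance` of the original."""
    if len(points) < 3:
        return list(points)
    distances = [_perp_distance(p, points[0], points[-1]) for p in points[1:-1]]
    worst = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[worst - 1] <= tolerance:
        return [points[0], points[-1]]       # one linear piece is enough
    # Otherwise split at the farthest point and simplify both halves.
    left = douglas_peucker(points[: worst + 1], tolerance)
    right = douglas_peucker(points[worst:], tolerance)
    return left[:-1] + right
```

Each pair of consecutive kept points then defines one linear piece of the trajectory.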

44
How to convert location data into trajectories Open Window

45
Determining the distance of 2 objects
For the trajectories of two objects o_1 and o_2:
- Synchronize the linear pieces.
- Calculate the distance for each time segment.

46
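Determining the distance of 2 objects
- Location of object o1 at time t: (1 + 3t, 1 + t)
- Location of object o2 at time t: (2 + 2t, 2 + t)
- Euclidean distance of o1 and o2 when 0 ≤ t < 3:
  $dist(t) = \sqrt{((1 + 3t) - (2 + 2t))^2 + ((1 + t) - (2 + t))^2} = \sqrt{(t - 1)^2 + 1}$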
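For two linear pieces, the squared distance is a quadratic in t. The sketch below computes its coefficients from the reference points and velocities; the function names and the (reference_point, velocity) pair layout are illustrative assumptions.

```python
from math import sqrt


def distance_coefficients(piece_a, piece_b):
    """Return (a, b, c) such that dist(t)**2 == a*t*t + b*t + c.

    Each piece is (reference_point, velocity) with 2-D tuples.
    """
    (rax, ray), (vax, vay) = piece_a
    (rbx, rby), (vbx, vby) = piece_b
    dx, dy = rax - rbx, ray - rby          # relative reference point
    dvx, dvy = vax - vbx, vay - vby        # relative velocity
    a = dvx * dvx + dvy * dvy
    b = 2 * (dx * dvx + dy * dvy)
    c = dx * dx + dy * dy
    return a, b, c


def distance_at(coeffs, t):
    a, b, c = coeffs
    return sqrt(a * t * t + b * t + c)
```

For o1 and o2 on [0, 3) this gives (1, -2, 2), that is, dist(t)² = t² - 2t + 2 = (t - 1)² + 1.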

47
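Determining close intervals
Given a distance function dist(t) of two objects o1 and o2 within an interval I, we would like to identify the subintervals I' ⊆ I such that dist(t) ≤ max_dis for all t ∈ I'.
E.g., let max_dis = 3. For o1 and o2 on [0, 3), $\sqrt{(t - 1)^2 + 1} \le 3$ holds for $t \in [1 - 2\sqrt{2},\ 1 + 2\sqrt{2}]$, so the close interval is $[1 - 2\sqrt{2},\ 1 + 2\sqrt{2}] \cap [0, 3) = [0, 3)$.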
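When dist(t)² is a quadratic a·t² + b·t + c, as it is for two linear pieces, the close interval follows from the roots of a·t² + b·t + (c - max_dis²) = 0. This is a minimal sketch with a hypothetical function name; it returns the interval as a (start, end) pair clipped to the given time segment and treats a single touching point as not close.

```python
from math import sqrt


def close_interval(a, b, c, max_dis, start, end):
    """Part of [start, end) where a*t*t + b*t + c <= max_dis**2, or None."""
    c_shift = c - max_dis ** 2
    if a == 0:
        # Equal velocities: the distance is constant (b is 0 as well).
        return (start, end) if c_shift <= 0 else None
    disc = b * b - 4 * a * c_shift
    if disc < 0:
        return None                       # the objects never get close enough
    root = sqrt(disc)
    lo, hi = (-b - root) / (2 * a), (-b + root) / (2 * a)
    lo, hi = max(lo, start), min(hi, end)  # clip to the synchronized piece
    return (lo, hi) if lo < hi else None
```

Since a is never negative, the parabola opens upward and the close region within one piece is always a single interval.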

48
Determining the weight Accordingly, the far segments and undecided segments can be determined. The weight of a mobile group is defined based on the lengths of its close, far, and undecided segments.

49
Definitions For a user group P Geographically close, far, or undecided at a time point t. The valid close segments and valid far segments of P can be accordingly defined. The weight of P is defined as

50
The problem
- The problem is to find all valid mobile groups under such a model.
- The Apriori property still holds: if a mobile group is valid, all of its subgroups are also valid.

51
Apriori Trajectory-based Group Pattern Mining

52
Trajectory VG-Growth
- It behaves the same as VG-Growth except that each edge in the TVG graph is associated with far segments and close segments.
- The close and far segments of a conditional TVG graph have to be properly updated:
  c(o1, o2 | o3) = c(o1, o2) ∩ c(o1, o3) ∩ c(o2, o3)
  f(o1, o2 | o3) = f(o1, o2) ∪ f(o1, o3) ∪ f(o2, o3)
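The update of close and far segments can be sketched with interval-list operations. The sketch assumes the close segments of the three pairs are intersected (all pairs must be close) and the far segments are united (any far pair makes the group far); the operators are not legible in the slide text, so this reading is an assumption, as are all function names.

```python
def intersect(xs, ys):
    """Intersect two sorted lists of disjoint (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        lo, hi = max(xs[i][0], ys[j][0]), min(xs[i][1], ys[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if xs[i][1] < ys[j][1]:
            i += 1
        else:
            j += 1
    return out


def union(xs, ys):
    """Merge two lists of intervals into sorted, disjoint intervals."""
    out = []
    for lo, hi in sorted(xs + ys):
        if out and lo <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out


def conditional_segments(close, far, o1, o2, o3):
    """Close and far segments of (o1, o2) conditioned on o3."""
    pairs = [(o1, o2), (o1, o3), (o2, o3)]
    c, f = close[pairs[0]], far[pairs[0]]
    for p in pairs[1:]:
        c = intersect(c, close[p])
        f = union(f, far[p])
    return c, f
```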

53
Performance evaluation
We compare our method with two other methods for handling untraceable intervals:
- Pessimistic
- Linear
Performance metrics: precision and recall

54
Performance evaluation
- The precision of the pessimistic method is always 1.
- The recall of the pessimistic method is low.
- The proportional method has precision comparable to, and recall higher than, the linear method.

55
Ongoing/Future work
Calendar-based mobile group mining
- Consider calendar patterns in mining mobile groups, e.g., find mobile groups that are valid every Friday or Wednesday.
- We need a calendar schema, e.g., (year, month, day), and a set of calendar patterns, e.g., (*, 12, 1).
- Given a set of calendar patterns C based on a common calendar schema, a set of timestamped object movement data D, and the user-specified thresholds max_dis, min_dur, min_wei, and min_match_ratio, the calendar-based mobile group mining problem is to find all mobile group and calendar pattern pairs (g, c) such that g is valid with c, where c ∈ C.
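Matching a date against a calendar pattern such as (*, 12, 1) is straightforward. The sketch below assumes the (year, month, day) schema from the slide and represents the wildcard as the string "*"; the name `matches` is hypothetical, and how matches feed into min_match_ratio is left open here.

```python
from datetime import date


def matches(pattern, day):
    """True if `day` fits a (year, month, day) pattern; '*' matches anything."""
    fields = (day.year, day.month, day.day)
    return all(p == "*" or p == v for p, v in zip(pattern, fields))
```

For example, `matches(("*", 12, 1), date(2005, 12, 1))` holds for December 1 of any year.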

56
Ongoing/Future work Employing data correction techniques before mining mobile groups.

57
Kalman Filter

59
Ongoing/Future work
- Maximal/closed mobile group mining
- Guidelines for parameter settings
- Mobile group clustering
- Other p-CRM applications and technologies
