Mining Trajectory Profiles for Discovering User Communities Speaker : Chih-Wen Chang National Chiao Tung University, Taiwan 2009.11.03 Chih-Chieh Hung,

Presentation transcript:

Mining Trajectory Profiles for Discovering User Communities
Chih-Chieh Hung, Chih-Wen Chang, Wen-Chih Peng
National Chiao Tung University, Taiwan
Speaker: Chih-Wen Chang

Outline
Motivation
Goal
Framework
–Preprocess
–Construct User’s Profiles
–Formulate Distance Function
–Identify Community
Experiments
Conclusion

Motivation (1/2)
With the rapid development of positioning techniques, users can easily collect their trajectories
–GPS loggers, smart phones, and navigation devices

Motivation (2/2)
Many GPS community sites have been established
–Users can share their own trajectories
–Users can search trajectories
[Figure labels: My tracks, Every Trail, Query]

Goal
Mine user communities from raw trajectories
–User communities: sets of users who have similar moving behaviors
Applications
–Find new friends
–Recommendation
–Ranking of trajectories

1. Construct user’s profile
2. Formulate distance function
3. Identify user communities
[Figure labels: Profile, Measure Distance Between Users, Community 1, Community 2]

Framework
1. Preprocess
2. Construct User’s Profile
3. Measure Distance Between Users
4. Identify Community

Preprocessing
Step 1: Find frequent regions
–Input: all trajectories of users
–Output: frequent regions
–Density-based approach
Step 2: Transform trajectories into sequences of frequent region IDs
[Figure: example transformation of trajectory T1]
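The two preprocessing steps can be sketched as follows. This is only an illustration: the slides say "density-based approach" without naming an algorithm, so the sketch assumes a simple grid-density stand-in (a cell is a frequent region when it holds at least `min_points` GPS points). The names `find_frequent_regions`, `to_region_sequence`, `cell_size`, and `min_points` are hypothetical.

```python
from collections import Counter

def find_frequent_regions(trajectories, cell_size, min_points):
    """Step 1 (assumed grid-density variant): map each dense cell to a region ID."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size))
        for traj in trajectories
        for x, y in traj
    )
    dense_cells = sorted(cell for cell, n in counts.items() if n >= min_points)
    return {cell: region_id for region_id, cell in enumerate(dense_cells)}

def to_region_sequence(traj, regions, cell_size):
    """Step 2: rewrite a trajectory as a sequence of frequent region IDs.

    Points outside every frequent region are dropped, and consecutive
    repeats of the same region are collapsed (both are assumptions).
    """
    seq = []
    for x, y in traj:
        region_id = regions.get((int(x // cell_size), int(y // cell_size)))
        if region_id is not None and (not seq or seq[-1] != region_id):
            seq.append(region_id)
    return seq

# Hypothetical toy data: two users passing through the same two cells.
trajectories = [
    [(0.1, 0.1), (0.2, 0.3), (1.5, 0.5), (5.0, 5.0)],
    [(0.4, 0.4), (1.2, 0.1), (1.7, 0.9)],
]
regions = find_frequent_regions(trajectories, cell_size=1.0, min_points=2)
sequences = [to_region_sequence(t, regions, 1.0) for t in trajectories]
print(sequences)
```

A real density-based clustering method would produce regions of arbitrary shape; the grid keeps the sketch short.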

Construct User’s Profiles (1/2)
User’s profile
–Probabilistic Suffix Tree (abbreviated as PST)
 Finds and organizes trajectory patterns
 Records the probability of next movements
[Figure labels: frequently moving sequence; conditional tables (next possible movements)]

Construct User’s Profiles (2/2)
Construct PST
–Level by level
–Two operations:
 Create a child node
 –The count of the Before symbol > MinSup
 Add a symbol into the related conditional table
 –The count of the After symbol > MinSup
Example (MinSup = 0.2)
Sequences: ABE, ABA, AC, B, ADF, H, JHI, EDH
Nodes shown: root, A:0.5, B:0.375, AB:0.25
Before symbol of node B
 A : 2 → 2/3 × 0.375 = 0.25
After symbols of node B
 A : 1 → 1/2 = 0.5
 E : 1 → 1/2 = 0.5
Conditional table of node B
 SID  Count  C. Prob.
 A    1      0.5
 E    1      0.5
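The level-by-level construction can be illustrated with a small sketch that reproduces the numbers on this slide. It rests on several assumptions not stated in the slides: symbols are single characters, the support of a pattern is the fraction of sequences that contain it, and the tree is stored as a flat dictionary keyed by node label. The names `support`, `next_symbol_table`, `build_pst`, and `max_depth` are hypothetical.

```python
from collections import Counter

def support(pattern, sequences):
    """Fraction of sequences that contain the pattern as a substring."""
    return sum(pattern in s for s in sequences) / len(sequences)

def next_symbol_table(pattern, sequences, min_sup):
    """Conditional table: probability of each symbol that follows the pattern."""
    counts = Counter()
    for s in sequences:
        start = s.find(pattern)
        while start != -1:
            end = start + len(pattern)
            if end < len(s):
                counts[s[end]] += 1
            start = s.find(pattern, start + 1)
    total = sum(counts.values())
    return {sym: c / total for sym, c in counts.items() if c / total > min_sup}

def build_pst(sequences, min_sup, max_depth=3):
    """Grow nodes level by level by prepending a 'before' symbol to each label."""
    alphabet = sorted(set("".join(sequences)))
    tree, frontier = {}, [""]
    for _ in range(max_depth):
        next_frontier = []
        for label in frontier:
            for sym in alphabet:
                child = sym + label
                sup = support(child, sequences)
                if sup > min_sup:  # create a child node
                    tree[child] = {
                        "support": sup,
                        "next": next_symbol_table(child, sequences, min_sup),
                    }
                    next_frontier.append(child)
        frontier = next_frontier
    return tree

sequences = ["ABE", "ABA", "AC", "B", "ADF", "H", "JHI", "EDH"]
pst = build_pst(sequences, min_sup=0.2)
print(pst["A"]["support"], pst["B"]["support"], pst["AB"]["support"])
print(pst["B"]["next"])
```

Under these assumptions the sketch yields A:0.5, B:0.375, and AB:0.25, and a conditional table for node B with A and E at 0.5 each. It also keeps other nodes that pass MinSup (for example D, E, and H), which the slide may simply not draw.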

Formulate Distance Function (1/3)
Determine the distance between users
1. Transform the PST into a moving sequence list
 Each element of the moving sequence list is a branch of the PST together with its probability
[Figure: example moving sequence list L1[1..2]]

Formulate Distance Function (2/3)
2. Define the distance between PSTs
–Find the minimal dist(Li[1..m], Lj[1..n])
–Use three editing operations
Insertion (T1 vs. T2)
 Before: L1 = {m1:0.3, m2:0.2, m3:0.3}, L2 = {m1:0.3, m2:0.2}
 After inserting m3: L1 = {m1:0.3, m2:0.2, m3:0.3}, L2 = {m1:0.3, m2:0.2, m3:0.3}
 Cost = 0.3

Formulate Distance Function (3/3)
Deletion (T1 vs. T2)
 Before: L1 = {m1:0.2, m2:0.3}, L2 = {m1:0.2, m2:0.3, m3:0.3}
 After deleting m3: L1 = {m1:0.2, m2:0.3}, L2 = {m1:0.2, m2:0.3, ____}
 Cost = 0.3
Replacement (T1 vs. T2)
 Before: L1 = {m1:0.2, m2:0.2, m3:0.2}, L2 = {m1:0.2, m2:0.2, m4:0.3}
 After replacing m4 with m3: L1 = {m1:0.2, m2:0.2, m3:0.2}, L2 = {m1:0.2, m2:0.2, m3:0.2}
 Cost = …
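The three editing operations suggest a weighted edit distance over moving sequence lists, sketched below with dynamic programming. Inserting or deleting an element costs its probability, which matches the insertion and deletion examples (cost 0.3). The replacement cost is an assumption, because the slides do not give a readable value: the hypothetical default charges the probability difference when the sequences match and the larger probability otherwise. `edit_distance` and `replace_cost` are illustrative names.

```python
def edit_distance(list_a, list_b, replace_cost=None):
    """Weighted edit distance between two lists of (sequence, probability) pairs."""
    if replace_cost is None:
        # Hypothetical default; the slides do not state the replacement cost.
        def replace_cost(a, b):
            if a[0] == b[0]:
                return abs(a[1] - b[1])
            return max(a[1], b[1])

    m, n = len(list_a), len(list_b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + list_a[i - 1][1]   # delete everything in list_a
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + list_b[j - 1][1]   # insert everything in list_b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = list_a[i - 1], list_b[j - 1]
            d[i][j] = min(
                d[i - 1][j] + a[1],                 # deletion
                d[i][j - 1] + b[1],                 # insertion
                d[i - 1][j - 1] + replace_cost(a, b),  # replacement or match
            )
    return d[m][n]

# Insertion example from the slides.
ins = edit_distance([("m1", 0.3), ("m2", 0.2), ("m3", 0.3)],
                    [("m1", 0.3), ("m2", 0.2)])
# Deletion example from the slides.
dele = edit_distance([("m1", 0.2), ("m2", 0.3)],
                     [("m1", 0.2), ("m2", 0.3), ("m3", 0.3)])
print(ins, dele)
```

Passing a different `replace_cost` function is the place to plug in the authors' actual definition once it is known.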

Identify Community (1/4)
User community
–The same community: δ_MLS(T_i, T_j) < threshold δ
–The number of communities is minimal
Transform the relation between PSTs into a graph
–A vertex represents a user
–An edge exists between two vertices when δ_MLS(T_i, T_j) < threshold δ
[Figure: graph over users O1, O2, O3, O4, O5]

Identify Community (2/4)
Model as a minimum clique problem: cover all users with as few cliques as possible
–A clique is a set of pair-wise adjacent vertices
Example
[Figure: graph over users O1, O2, O3, O4, O5]
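Building the user graph and splitting it into cliques might look like the sketch below. Covering a graph with the fewest cliques is NP-hard in general, so this uses a simple greedy heuristic and is not claimed to be the authors' algorithm. The distances are hypothetical toy values, since the edges of the O1..O5 example are not recoverable from the transcript; `build_graph` and `greedy_clique_cover` are illustrative names.

```python
from itertools import combinations

def build_graph(users, distance, delta):
    """Connect two users when their profile distance is below the threshold."""
    adj = {u: set() for u in users}
    for u, v in combinations(users, 2):
        if distance(u, v) < delta:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def greedy_clique_cover(adj):
    """Greedy heuristic: repeatedly grow a clique from the best-connected user."""
    unassigned = set(adj)
    communities = []
    while unassigned:
        seed = max(sorted(unassigned), key=lambda u: len(adj[u] & unassigned))
        clique = {seed}
        candidates = adj[seed] & unassigned
        while candidates:
            v = max(sorted(candidates), key=lambda u: len(adj[u] & candidates))
            clique.add(v)
            candidates &= adj[v]   # keep only users adjacent to every member
        communities.append(clique)
        unassigned -= clique
    return communities

# Hypothetical distances: O1, O2, O3 are mutually close, as are O4 and O5.
close_pairs = {frozenset(p) for p in
               [("O1", "O2"), ("O1", "O3"), ("O2", "O3"), ("O4", "O5")]}

def toy_distance(u, v):
    return 0.1 if frozenset((u, v)) in close_pairs else 1.0

users = ["O1", "O2", "O3", "O4", "O5"]
graph = build_graph(users, toy_distance, delta=0.5)
communities = greedy_clique_cover(graph)
print(communities)
```

Every returned community is a clique, so all pairs inside it satisfy the distance threshold; the greedy order only affects how many communities result.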

Identify Community (3/4)
Select a representative PST for each community
–Represents all PSTs in the same community
–Advantages
 Reduce the storage overhead
 Speed up query processing
 Identify the communities that new users belong to
[Figure labels: Representative PST; Add into ?]

Identify Community (4/4)
Two factors
1. Size of the representative PST
 ▪The number of tree nodes, denoted as N(T_i)
2. Distance between the selected PST and the others in the same community
 ▪The error sum, denoted as ES: the sum of the distances between the selected PST and the others
Representative PST
–Minimize an objective over N(T_i) and ES
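Selecting a representative from the two factors can be sketched as below. The objective being minimized is not recoverable from the transcript, so the weighted sum `alpha * N + (1 - alpha) * ES` is purely an assumption, as are the names `select_representative`, `tree_size`, and `alpha` and the toy values.

```python
def select_representative(community, tree_size, distance, alpha=0.5):
    """Pick the PST with the lowest assumed score alpha*N + (1-alpha)*ES."""
    def error_sum(t):
        # ES: sum of distances from t to every other PST in the community.
        return sum(distance(t, other) for other in community if other != t)

    def score(t):
        # Hypothetical objective; the slides' actual formula is unknown.
        return alpha * tree_size[t] + (1 - alpha) * error_sum(t)

    return min(sorted(community), key=score)

# Hypothetical community of three PSTs with node counts and pairwise distances.
tree_size = {"T1": 5, "T2": 3, "T3": 4}
pair_dist = {
    frozenset(("T1", "T2")): 0.2,
    frozenset(("T1", "T3")): 0.4,
    frozenset(("T2", "T3")): 0.3,
}

def toy_distance(a, b):
    return pair_dist[frozenset((a, b))]

best = select_representative(["T1", "T2", "T3"], tree_size, toy_distance)
print(best)
```

Node counts and distances live on different scales, so a real objective would likely normalize the two terms before combining them.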

Experiments (1/4)
Simulator model
–Use real trajectories from CarWeb to simulate the group mobility of users
–Total: 2400 trajectories

Experiments (2/4)
Compare to the Generalized Sequential Pattern mining algorithm (GSP)
–Set of sequential patterns, e.g., sp_1, sp_2, ..., sp_n
–Trajectory profile of a user represented as a vector V_i
–Distance function between profiles
 Cosine similarity measurement: $\mathrm{similarity}(V_i, V_j) = \dfrac{V_i \cdot V_j}{|V_i|\,|V_j|}$
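For the GSP baseline, cosine similarity between two profile vectors can be computed as in this short sketch. The example vectors are hypothetical and assume one component per sequential pattern; returning 0.0 for an all-zero vector is a choice made here, not something the slides specify.

```python
import math

def cosine_similarity(v_i, v_j):
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_j = math.sqrt(sum(b * b for b in v_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # assumed convention for empty profiles
    return dot / (norm_i * norm_j)

# Hypothetical profiles over three sequential patterns sp_1..sp_3.
user_a = [1, 0, 1]
user_b = [1, 1, 0]
sim = cosine_similarity(user_a, user_b)
print(round(sim, 6))
```

The two toy users share one of their two patterns, so the similarity comes out at about 0.5.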

Experiments (3/4)
Impact of trajectory profiles
–GSP profiles are always larger than PST profiles, especially when MinSup is smaller than 0.15
[Figure panels: Storage, Prediction]

Experiments (4/4)
Impact of the threshold δ and MinSup
–A smaller threshold δ yields a larger number of communities
[Figure panels: Storage, Prediction]

Conclusion
Explore the problem of mining communities from trajectories
–Preprocess: find frequent regions; replace trajectories by region IDs
–Construct User’s Profile: build a probabilistic suffix tree (abbreviated as PST)
–Measure Distance Between Users: formulate the distance function
–Identify Community: cluster users by the distance function; select representative PSTs

THANK YOU!