Tokyo Research Laboratory © Copyright IBM Corporation 2009 | 2009/04/03 | SDM 09 | Travel-Time Prediction using Gaussian Process Regression


Travel-Time Prediction using Gaussian Process Regression: A Trajectory-Based Approach
Tsuyoshi Idé, IBM Tokyo Research Lab

Contents
• Problem setting
• Background
• Formulation
• Implementation
• Experiment

Problem setting: Predict travel time along an arbitrary path
• Given traffic history data, find a p.d.f. of the travel time y for an input path x
• The traffic history data D is a set of (path, travel time) pairs: { (x^(i), y^(i)) }
• We assume all the paths in D share the same origin and destination
• Terminology:
 - Link: a road segment between neighboring intersections
 - Path: a sequence of links

Background (1/2): Traditional time-series modeling is not useful for low-traffic links
• Traditional approach: time-series modeling for a particular link
 - Construct an AR model (or a variant) that gives the travel time as a function of time
• Limitation: hard to model low-traffic links
 - Time-series modeling needs a lot of data for each individual link
 - However, a path generally includes low-traffic links: many side roads carry little traffic
(Figure: traffic history on a particular link, travel time [s] vs. date)

Background (2/2): Trajectory mining is an emerging research field
• Hurricane trajectory analysis
 - Clustering and outlier detection for trajectories
• Shopping path analysis
 - Analyzing shopping paths in stores for marketing
• Travel time prediction (this work)
 - Predicting the travel time of each trajectory
References: “An exploratory look at supermarket shopping paths”, Jeffrey S. Larson et al.; “Trajectory Outlier Detection: A Partition-and-Detect Framework”, Jae-Gil Lee, Jiawei Han, Xiaolei Li, ICDE 2008.

Our problem can be thought of as a non-standard regression problem, where the input x is not a vector but a path
• Conventional regression: input = time (a real value)
• Our problem: input = path (or trajectory), which generally includes low-traffic links, so time-series modeling is hard due to lack of data
• Our solution:
 - Use a string kernel to compute the similarity between trajectories
 - Use Gaussian process regression for probabilistic prediction

(Review) Comparing standard regression with kernel regression
• Standard regression explicitly needs input vectors
 - Input = data matrix (design matrix): number of samples × dimensionality of the input space
• Kernel regression needs only similarities
 - Input = kernel matrix between all pairs of samples, i.e. only similarities matter

Formulation (1/4): Employing string kernels for similarity between paths
• Each path is represented as a sequence of symbols
 - The “symbol” can be the link ID; e.g. the 3rd sample may look like a sequence of link IDs
• A string kernel is a natural similarity measure between strings
 - We use the p-spectrum kernel [Leslie 02]: sum, over the set of subsequences u of p consecutive symbols, of N_u(x^(i)) N_u(x^(j)), where N_u(x^(i)) is the number of occurrences of subsequence u in path x^(i)

Formulation (2/4): Intuition behind the p-spectrum kernel — “split and compare”
• Step 1: Split each path into subsequences of p consecutive symbols
• Step 2: Sum up the number of co-occurrences
• Example: p = 2, alphabet = {north, south, east, west}
 - If u = (east, north), N_u(blue path) = 2 and N_u(red path) = 3, so u contributes 2 × 3 = 6 to the kernel value
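The split-and-compare computation above fits in a few lines. This is a minimal sketch; the function name and the symbol sequences in the test are illustrative, not taken from the paper.

```python
from collections import Counter

def p_spectrum_kernel(x, y, p=2):
    """p-spectrum kernel [Leslie 02]: sum over shared length-p
    subsequences u of N_u(x) * N_u(y)."""
    # Step 1: split each path into its contiguous length-p subsequences.
    nx = Counter(tuple(x[i:i + p]) for i in range(len(x) - p + 1))
    ny = Counter(tuple(y[i:i + p]) for i in range(len(y) - p + 1))
    # Step 2: sum up the co-occurrence counts.
    return sum(cnt * ny[u] for u, cnt in nx.items() if u in ny)
```

With p = 2 and a direction alphabet, two paths sharing the bigram (east, north) twice and three times respectively contribute 2 × 3 = 6 to the kernel, exactly as in the slide's example.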

Formulation (3/4): Employing Gaussian process regression (GPR). Two assumptions of GPR
• Assumption 1: Observation noise is Gaussian — the observed travel time y is a latent variable f plus Gaussian noise
• Assumption 2: The prior distribution of the latent variables is also Gaussian, with covariance K_ij given by the kernel similarity between path i and path j
 - Close points favor similar values of the latent variable, i.e. “the underlying function should be smooth”

Formulation (4/4): Employing Gaussian process regression (GPR). The predictive distribution is analytically obtained
• The predictive distribution of the travel time y for an input path x is also Gaussian, with mean m(x) and variance s²(x), where the noise variance is a hyper-parameter
• (See the paper for the derivation)
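A minimal sketch of the predictive distribution above, assuming the standard GPR formulas (m(x) = k(x)ᵀ(K + σ²I)⁻¹y, s²(x) = σ² + k(x,x) − k(x)ᵀ(K + σ²I)⁻¹k(x)) over a precomputed Gram matrix K; the function and argument names are ours, not the paper's.

```python
import numpy as np

def gpr_predict(K, y, k_star, k_starstar, sigma2):
    """Predictive mean m(x) and variance s^2(x) of GPR, given the
    N x N training kernel matrix K, training targets y, the kernel
    vector k_star between x and the training paths, k(x, x), and
    the noise-variance hyper-parameter sigma2."""
    N = K.shape[0]
    C = K + sigma2 * np.eye(N)          # covariance of noisy observations
    L = np.linalg.cholesky(C)           # Cholesky factor for stable solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    m = k_star @ alpha                  # predictive mean
    v = np.linalg.solve(L, k_star)
    s2 = sigma2 + k_starstar - v @ v    # predictive variance
    return m, s2
```

With a nearly noise-free model, predicting at a training input reproduces the training target, as expected of GP interpolation.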

Implementation (1/2): Hyper-parameters are determined from the data
• Find the hyper-parameters so that the marginal likelihood is maximized
 - Log marginal likelihood (log-evidence): log p(y | θ)
• We can derive fixed-point equations for the hyper-parameters
 - No need to use a gradient method in the 2-D hyper-parameter space
 - Alternately solve the fixed-point equations
 - A Cholesky factorization is needed at each iteration; a more efficient algorithm is future work
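The paper derives the fixed-point updates themselves, so this sketch only shows the quantity being maximized: the standard GPR log-evidence, log p(y | θ) = −½ yᵀC⁻¹y − ½ log|C| − (N/2) log 2π with C = K + σ²I, evaluated with one Cholesky factorization per candidate setting, as the slide notes. Names are ours.

```python
import numpy as np

def log_evidence(K, y, sigma2):
    """Log marginal likelihood of GPR for one hyper-parameter setting,
    via a single Cholesky factorization of C = K + sigma^2 I."""
    N = K.shape[0]
    C = K + sigma2 * np.eye(N)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # C^{-1} y
    # log|C| = 2 * sum(log diag(L)), so -1/2 log|C| is minus that sum
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * N * np.log(2 * np.pi))
```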

Implementation (2/2): Algorithm summary

Experiment (1/4): Generating traffic simulation data on an actual map
• We used the IBM Mega Traffic Simulator
 - An agent-based simulator which allows modeling the complex behavior of individual drivers
 - Generated traffic on an actual Kyoto City map
• Data generation procedure, simulating sensible drivers:
 - Pick one of the top N_0 shortest paths for a given origin–destination (OD) pair
 - Inject the car at the origin with Poisson-distributed time intervals
 - Determine the vehicle speed at every moment as a function of the legal speed limit and vehicular gaps; give a waiting time at each intersection
 - Upon arrival, compute the travel time by adding up the transit times of all the links

Experiment (2/4): We compare three different kernels
• ID kernel
 - p-spectrum kernel whose alphabet is the set of link IDs themselves; p is an input parameter
• Direction kernel
 - p-spectrum kernel whose alphabet is the direction of each link: north, south, east, west, determined from the longitude and latitude of each link
• Area kernel
 - Based on the area S enclosed between a pair of trajectories
 - Can be thought of as a counterpart of standard distances (Euclidean distance, etc.)
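The direction kernel's alphabet can be built by bucketing each link's heading into the four symbols. The slide only states that directions are determined from longitude and latitude, so the 90-degree bucketing below (and the planar treatment of coordinates) is an illustrative assumption, not the paper's exact rule.

```python
import math

def direction_symbol(lon1, lat1, lon2, lat2):
    """Map a link (start and end coordinates) to one of the four
    direction symbols {north, south, east, west}. Assumption: the
    heading is bucketed into 90-degree sectors centered on the axes."""
    angle = math.degrees(math.atan2(lat2 - lat1, lon2 - lon1)) % 360
    if 45 <= angle < 135:
        return "north"
    if 135 <= angle < 225:
        return "west"
    if 225 <= angle < 315:
        return "south"
    return "east"
```

Mapping every link of a path through this function yields the symbol sequence fed into the p-spectrum kernel.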

Experiment (3/4): Correlation coefficient as evaluation metric
• Evaluation metric r: the correlation coefficient between predicted and actual values
• We used N = 100 paths for training, and the rest for testing
 - In total, N_0 = 132 paths were generated
• We compare different intersection waiting times
• We compare different lengths of the substring
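The metric r is the standard Pearson correlation between predicted and actual travel times over the test paths, which is a one-liner (the helper name is ours):

```python
import numpy as np

def correlation(y_pred, y_true):
    """Pearson correlation coefficient r between predicted and actual
    travel times -- the slide's evaluation metric."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.corrcoef(y_pred, y_true)[0, 1])
```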

Experiment (4/4): The string kernels showed good agreement with actual travel times
• Comparing different substring lengths (ID and direction kernels)
 - p = 2 gave the best result when the intersection waiting time > 0
 - The major contribution comes from individual links, but turning patterns at intersections also matter
• Comparing different kernels
 - The ID kernel is the best, in terms of high r and small variance
 - The area kernel does not work: the “shapes” of trajectories should not be compared directly

Summary
• We formulated the task of travel-time prediction as a trajectory-mining problem
• We introduced two new ideas:
 - Use of string kernels as a similarity metric between trajectories
 - Use of Gaussian process regression for travel-time prediction
• We tested our approach using simulation data and showed good predictability

Thank You!