

Network-Wide Bike Availability Clustering Using the College Admission Algorithm: A Case Study of San Francisco Bay Area
Hesham Rakha, Ph.D., P.Eng.
Samuel Reynolds Pritchard Professor of Engineering, Charles E. Via, Jr. Dept. of Civil & Environmental Engineering
Director, Center for Sustainable Mobility, Virginia Tech Transportation Institute
Courtesy Professor, Bradley Dept. of Electrical and Computer Engineering

Center for Sustainable Mobility
Presentation Outline
Introduction; Proposed algorithm; Application; Results; Conclusion.
Image: Research (n.d.). Retrieved November 11, 2016, from http://cep-probation.org/research-note/
Hesham Rakha, Center for Sustainable Mobility

Introduction
Clustering is an unsupervised learning technique that identifies the underlying structure (natural grouping) of unlabeled data. What is a natural grouping among these objects? (Quoted from the SORAC Fall meeting presentation by Dr. Mohammed Elhenawy.)

Introduction (cont.)
Clustering is an unsupervised learning technique that identifies the underlying structure (natural grouping) of unlabeled data. What counts as a good clustering depends on the clustering criterion and on the final aim of the clustering algorithm.
[Figure: "One clustering solution" vs. "Another clustering solution" for the same objects; cluster labels: Blue cluster, Rectangular cluster, Circles cluster, Red cluster.]
(Quoted from the SORAC Fall meeting presentation by Dr. Mohammed Elhenawy.)

Introduction (cont.)
Therefore, classical clustering techniques such as K-means, fuzzy clustering, and DBSCAN are "blind." They cluster the data on a single criterion, distance, and maximize dispersion between clusters (minimize distortion within clusters). They do not consider other attributes, such as the shape or color of a point.
What is the solution? A new algorithm that maximizes dispersion while also accounting for the shape or color of each point, i.e., one that maximizes purity simultaneously.
[Figure: Cluster.1 and Cluster.2]
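The sketch below makes the two objectives on this slide concrete. It scores two clusterings of the same points on two measures. The first is total distance to the cluster centers, which is what distance-only algorithms optimize. The second is purity, taken here as the fraction of points that carry their cluster's majority label; this is a common definition, but the slides do not define purity formally. The points and colors are hypothetical toy data.

```python
import math
from collections import Counter

def purity(assignments, labels):
    """Fraction of points whose label equals the majority label of their cluster."""
    groups = {}
    for cluster, label in zip(assignments, labels):
        groups.setdefault(cluster, []).append(label)
    majority = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return majority / len(labels)

def total_distance(points, assignments, centers):
    """Sum of Euclidean distances from each point to its cluster center."""
    return sum(math.dist(p, centers[c]) for p, c in zip(points, assignments))

# Hypothetical toy data: two spatial groups, each with one blue and one red point.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
colors = ["blue", "red", "blue", "red"]

# Distance-only solution (what a "blind" algorithm would find).
by_distance = [0, 0, 1, 1]
d1 = total_distance(points, by_distance, {0: (0.5, 0), 1: (10.5, 0)})

# Color-aware solution (centers are the means of each color group).
by_color = [0, 1, 0, 1]
d2 = total_distance(points, by_color, {0: (5, 0), 1: (6, 0)})

print(d1, purity(by_distance, colors))  # 2.0 0.5
print(d2, purity(by_color, colors))     # 20.0 1.0
```

The distance-only solution wins on distance but reaches only 0.5 purity. The color-aware solution reaches purity 1.0 at a much larger distance, so an algorithm that targets both objectives has to balance them.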

Proposed algorithm
The algorithm is built from two well-known algorithms, namely the College Admission (CA) algorithm and the K-median algorithm.
CA matches colleges and applicants through a series of iterations, seeking a stable matching that satisfies both colleges and applicants.
K-median is similar to K-means but uses the median instead of the mean to compute cluster centers.
The proposed algorithm combines the advantages of supervised and unsupervised algorithms. It is multi-objective: impurity and distance within each cluster are minimized simultaneously. It matches clusters (minimizing distances) and data points (maximizing purity) until it converges.
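The difference between the K-means and K-median center updates is easiest to see with an outlier. This is a minimal sketch that assumes the coordinate-wise median, which minimizes the sum of L1 distances. The slides do not specify which median the authors use, and the toy points are hypothetical.

```python
from statistics import mean, median

def center_mean(points):
    """K-means update: coordinate-wise mean of the cluster's points."""
    return tuple(mean(axis) for axis in zip(*points))

def center_median(points):
    """K-median update: coordinate-wise median (minimizes total L1 distance)."""
    return tuple(median(axis) for axis in zip(*points))

# Hypothetical cluster with one outlier at (50, 40).
cluster = [(2, 3), (3, 3), (4, 4), (50, 40)]
print(center_mean(cluster))    # (14.75, 12.5): pulled toward the outlier
print(center_median(cluster))  # (3.5, 3.5): stays with the bulk of the points
```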

Proposed algorithm: example
[Figure: data points 1, 2, ..., n (maximize purity) are matched to Cluster 1, Cluster 2, and Cluster 3 (minimize distance). Preference list 3: Point 10, Point 3, Point 4, Point 2, Point 20, Point 13, Point 15.]
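The matching step can be sketched with the textbook college admission procedure with quotas (deferred acceptance, Gale-Shapley). Each point applies to clusters in its preference order. Each cluster tentatively keeps its best-ranked applicants up to its capacity and rejects the rest.

This is a minimal sketch under stated assumptions, not the authors' implementation:
- The preference lists, the capacities, and the points A, B, C are hypothetical.
- In the proposed algorithm, the cluster lists would presumably come from distances and the point lists from purity, alternating with K-median center updates until convergence.
- Every cluster is assumed to rank every point.

```python
def college_admission(point_prefs, cluster_prefs, capacity):
    """Point-proposing deferred acceptance with cluster quotas.

    point_prefs:   {point: [clusters, most preferred first]}
    cluster_prefs: {cluster: [points, most preferred first]}
    capacity:      {cluster: maximum number of admitted points}
    """
    rank = {c: {p: i for i, p in enumerate(prefs)} for c, prefs in cluster_prefs.items()}
    next_choice = {p: 0 for p in point_prefs}
    admitted = {c: [] for c in cluster_prefs}
    free = list(point_prefs)
    while free:
        p = free.pop()
        if next_choice[p] >= len(point_prefs[p]):
            continue  # rejected by every cluster on its list; stays unmatched
        c = point_prefs[p][next_choice[p]]
        next_choice[p] += 1
        admitted[c].append(p)
        admitted[c].sort(key=lambda q: rank[c][q])  # best-ranked applicants first
        if len(admitted[c]) > capacity[c]:
            free.append(admitted[c].pop())  # reject the least preferred applicant
    return admitted

# Hypothetical preferences: all points prefer cluster 1, which has room for one.
point_prefs = {"A": [1, 2], "B": [1, 2], "C": [1, 2]}
cluster_prefs = {1: ["B", "A", "C"], 2: ["A", "C", "B"]}
matching = college_admission(point_prefs, cluster_prefs, {1: 1, 2: 2})
print(matching)  # {1: ['B'], 2: ['A', 'C']}
```

Deferred acceptance always terminates in a stable matching. With points proposing, the result does not depend on the order in which free points are processed.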

How to find the optimal k? Consensus Clustering (CC)

Selecting the Optimal Number of Clusters
For each candidate number of clusters K, Consensus Clustering (CC) repeatedly subsamples the data set, clusters each subsample, and calculates the consensus rate between all pairs of data points. The result is a similarity (consensus) matrix recording how often two data points are assigned to the same cluster centroid k ∈ K. This matrix shows the degree of stability for each K.
One CC measure of cluster stability is the cumulative distribution function (CDF) of the consensus rate. Each curve represents one K, and the flatter the curve, the more stable that number of clusters K is.
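The consensus computation can be sketched as follows. It assumes each subsample has already been clustered with a fixed K; the clustering itself is omitted, and the three label dictionaries are hypothetical. The consensus rate of a pair is the number of subsamples in which the two points share a cluster, divided by the number of subsamples in which both were drawn. The empirical CDF of these rates is the curve plotted for each K.

```python
from bisect import bisect_right
from itertools import combinations

def consensus_matrix(n, runs):
    """Consensus rate for every pair (i, j), i < j, of the n data points.

    runs: one dict per clustered subsample, {point_index: cluster_label};
    points missing from a dict were not drawn in that subsample.
    """
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
    return {(i, j): together[i][j] / sampled[i][j]
            for i, j in combinations(range(n), 2) if sampled[i][j]}

def consensus_cdf(consensus, x):
    """Empirical CDF of the pairwise consensus values, evaluated at x."""
    values = sorted(consensus.values())
    return bisect_right(values, x) / len(values)

# Hypothetical K = 2 clusterings of subsamples of four points.
# Label values are arbitrary; only co-membership matters.
runs = [{0: 0, 1: 0, 2: 1, 3: 1},
        {0: 1, 1: 1, 2: 0, 3: 0},
        {0: 0, 1: 0, 3: 1}]
cm = consensus_matrix(4, runs)
print(consensus_cdf(cm, 0.1), consensus_cdf(cm, 0.9))  # equal: the CDF is flat
```

In this toy case every pair is either always or never clustered together. The consensus values are therefore only 0 and 1, and the CDF is flat between them, which is the stable situation the slide describes. Repeating the computation for each candidate K and comparing the curves gives the selection rule.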

A Case Study of San Francisco Bay Area Bike Sharing System
Image retrieved June 19, 2017, from http://www.cloud9living.com/san-francisco

Background
A bike sharing system (BSS) has several advantages:
- it is a last-mile transportation solution;
- it is a sustainable urban transportation system;
- it is environmentally friendly and affordable;
- it is an efficient and effective solution for traffic jams.
Image retrieved June 19, 2017, from https://www.portlandpedalpower.com/blog/2014/03/bike-share-alternatives-developing-the-future-of-cycling/

Background
There are more than 37,000 stations in 50 countries, but these systems always face a balancing problem. A number of recent studies have tried to rebalance bike stations by anticipating the bike availability at each station.
Image retrieved June 19, 2017, from http://www.shareable.net/blog/what-can-we-learn-from-the-bike-sharing-world-map

Objective & Contribution
Objective: find the network-wide availability patterns and how these patterns evolve over time, with the goal of detecting imbalances in the BSS.
Contribution: we proposed a multi-objective clustering algorithm based on two algorithms. It clusters the 15-minute entries of bike availability across the network and finds the similarity between them according to day of week and time of day. This provides an expected pattern of bike usage for each cluster. We then determined when and where the system would be imbalanced.

Data Set
Docking station data were collected from August 2013 to August 2015 in the San Francisco Bay Area (station map shown on the slide). The data cover 70 stations, i.e., 70 dimensions.
Retrieved June 19, 2017, from https://www.kaggle.com

Data Set (cont.)
The data were reduced from one-minute to 15-minute resolution, giving 48,000 entries of bike availability. Using the proposed algorithm, we find the similarity between these entries. We then cluster them according to this similarity (bike availability) and the recorded time (time of day or day of week).
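The one-minute to 15-minute reduction can be sketched as follows. The sketch assumes each 15-minute value is the mean of the one-minute readings in that window; the slides do not say whether the authors averaged or sampled. The station ID and readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def to_15_minute(records):
    """Aggregate 1-minute readings into 15-minute entries (assumed: window mean).

    records: iterable of (timestamp, station_id, bikes_available).
    Returns {interval_start: {station_id: mean bikes available}}, sorted by time.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, station, bikes in records:
        # Floor the timestamp to the start of its 15-minute window.
        start = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
        buckets[start][station].append(bikes)
    return {start: {s: mean(v) for s, v in stations.items()}
            for start, stations in sorted(buckets.items(), key=lambda kv: kv[0])}

# Hypothetical readings for station 2: 10 bikes until 08:14, then 4 bikes.
t0 = datetime(2014, 3, 3, 8, 0)
records = [(t0 + timedelta(minutes=m), 2, 10 if m < 15 else 4) for m in range(20)]
entries = to_15_minute(records)
print(entries)
```

With 70 stations, each 15-minute entry becomes a 70-dimensional availability vector, and a full day contributes at most 96 entries.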

Results: Day of week
K=3 is the optimal k.
[Figure: cumulative distribution function against consensus index value for each cluster (hours).]

Results: Day of week (cont.)
[Figure: probability of each day of the week (Tuesdays, Wednesdays, Thursdays, Mondays, Fridays, Saturdays, Sundays) being in each of the three clusters (k=3).]
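Once each 15-minute entry has a cluster label, the day-of-week probabilities can be estimated as the share of each weekday's entries that falls into each cluster. This is a minimal sketch that assumes this frequency estimate, since the slides do not give the formula. The timestamps and labels are hypothetical. Replacing the weekday key with `ts.hour` would give the corresponding time-of-day probabilities.

```python
from collections import Counter, defaultdict
from datetime import datetime

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

def cluster_share_by_weekday(entries):
    """entries: iterable of (timestamp, cluster_label), one per 15-minute entry.
    Returns {day_name: {cluster_label: share of that day's entries}}."""
    counts = defaultdict(Counter)
    for ts, cluster in entries:
        counts[DAYS[ts.weekday()]][cluster] += 1  # weekday(): Monday == 0
    return {day: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for day, cnt in counts.items()}

# Hypothetical assignments: 2015-03-02 is a Monday, 2015-03-07 a Saturday.
entries = [(datetime(2015, 3, 2, 8, 0), 1),
           (datetime(2015, 3, 2, 8, 15), 1),
           (datetime(2015, 3, 2, 8, 30), 2),
           (datetime(2015, 3, 7, 8, 0), 3)]
shares = cluster_share_by_weekday(entries)
print(shares["Monday"])    # cluster 1: 2/3 of Monday entries, cluster 2: 1/3
print(shares["Saturday"])  # {3: 1.0}
```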

Results: Day of week (cont.)
[Figure: available bikes at each station in the network for each of the three clusters.]
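A per-cluster availability profile like the one plotted here can be sketched as the median number of available bikes at each station, taken over the entries in that cluster. This assumes the plot shows the median, which matches a coordinate-wise K-median center update; the slide does not say which statistic is plotted. The vectors and labels are hypothetical.

```python
from statistics import median

def cluster_station_profiles(vectors, assignments):
    """vectors: one availability vector (bikes per station) per 15-minute entry.
    assignments: cluster label of each vector.
    Returns {cluster_label: [median bikes available at each station]}."""
    members = {}
    for vec, cluster in zip(vectors, assignments):
        members.setdefault(cluster, []).append(vec)
    # zip(*vecs) regroups the member vectors station by station.
    return {c: [median(station) for station in zip(*vecs)]
            for c, vecs in sorted(members.items(), key=lambda kv: kv[0])}

# Hypothetical: four entries over three stations, assigned to two clusters.
vectors = [[5, 0, 12], [7, 1, 10], [6, 2, 11], [1, 9, 3]]
profiles = cluster_station_profiles(vectors, [0, 0, 0, 1])
print(profiles)  # {0: [6, 1, 11], 1: [1, 9, 3]}
```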

Results: Time of day
K=2 is the optimal k.
[Figure: cumulative distribution function against consensus index value for each cluster (hours).]

Results: Time of day (cont.)
Peak hours: 8 a.m. to 5 p.m. Non-peak hours: 6 p.m. to 7 a.m.
[Figure: probability of each hour being in one of the two clusters (k=2).]

Results: Time of day (cont.)
[Figure: available bikes at each station in the network for each of the two clusters.]
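One simple way to turn an expected-availability profile into an imbalance signal is to flag two kinds of stations:
- stations expected to be nearly empty (no bikes to rent);
- stations expected to be nearly full (no docks to return bikes to).

This is a hypothetical sketch, not the authors' method. The dock counts and the thresholds `low` and `margin` are assumptions that do not appear on the slides.

```python
def flag_imbalance(profile, docks, low=2, margin=2):
    """Flag stations whose expected availability is near empty or near full.

    profile: expected bikes available per station (e.g., one cluster's profile).
    docks:   number of docks per station (hypothetical input, not on the slides).
    low, margin: hypothetical thresholds, in bikes.
    """
    nearly_empty = [s for s, bikes in enumerate(profile) if bikes <= low]
    nearly_full = [s for s, (bikes, d) in enumerate(zip(profile, docks))
                   if bikes >= d - margin]
    return {"nearly_empty": nearly_empty, "nearly_full": nearly_full}

# Hypothetical three-station profile for one cluster (e.g., peak hours).
print(flag_imbalance([6, 1, 11], docks=[15, 15, 12]))
# {'nearly_empty': [1], 'nearly_full': [2]}
```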

Conclusion
A new algorithm that combines supervised and unsupervised learning was proposed to overcome the limitations of classical clustering algorithms. It was tested on a BSS in the San Francisco Bay Area, where it was used to anticipate bike availability across the network with respect to time of day and day of week.
The results show that the days of the week can be grouped into three clusters, each with an associated pattern of bike availability. The time of day was clustered into two groups, peak and non-peak hours.
The exploratory spatial-temporal analysis shows that the BSS can be balanced with minimum cost and effort.

Questions?