
1 Network-Wide Bike Availability Clustering Using the College Admission Algorithm: A Case Study of San Francisco Bay Area
Hesham Rakha, Ph.D., P.Eng.
Samuel Reynolds Pritchard Professor of Engineering, Charles E. Via, Jr. Dept. of Civil & Environmental Engineering
Director, Center for Sustainable Mobility, Virginia Tech Transportation Institute
Courtesy Professor, Bradley Dept. of Electrical and Computer Engineering

2 Center for Sustainable Mobility
Presentation Outline: Introduction, Proposed algorithm, Application, Results, Conclusion.
Research (n.d.). Retrieved November 11, 2016, from
Hesham Rakha Center for Sustainable Mobility

3 Introduction
Clustering is an unsupervised learning technique that identifies the underlying structure (natural grouping) of unlabeled data. What is a natural grouping among these objects? (Quoted from Dr. Mohammed Elhenawy's SORAC Fall meeting presentation.)

4 Introduction (cont.)
Clustering is an unsupervised learning technique that identifies the underlying structure (natural grouping) of unlabeled data. Finding a good clustering depends on the clustering criterion and the final aim of the clustering algorithm. [Figure: two clustering solutions of the same objects, labeled "One clustering solution" and "Another clustering solution"; cluster labels: blue cluster, red cluster, rectangular cluster, circles cluster.] (Quoted from Dr. Mohammed Elhenawy's SORAC Fall meeting presentation.)

5 Introduction (cont.)
Classical clustering techniques such as K-means, fuzzy clustering, and DBSCAN are therefore blind! They cluster the data based on one parameter only: distance. They maximize dispersion between clusters (minimize distortion inside clusters), so they do not consider other attributes such as the shape or color of a point. What is the solution? Come up with a new algorithm that maximizes dispersion while also considering the shape or color of the point (i.e., maximizes purity simultaneously). [Figure: Cluster 1 and Cluster 2.]
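The trade-off between distance and purity can be made concrete with a small sketch. The toy data, the 1-D positions, and the function names `purity` and `distortion` below are illustrative assumptions, not taken from the slides; purity is computed here as the usual majority-label fraction.

```python
from collections import Counter

def purity(assignments, labels):
    """Fraction of points whose label equals the majority label of their cluster."""
    by_cluster = {}
    for cluster, label in zip(assignments, labels):
        by_cluster.setdefault(cluster, []).append(label)
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in by_cluster.values())
    return majority_total / len(labels)

def distortion(points, assignments):
    """Sum of squared distances from each 1-D point to its cluster mean."""
    by_cluster = {}
    for x, cluster in zip(points, assignments):
        by_cluster.setdefault(cluster, []).append(x)
    total = 0.0
    for members in by_cluster.values():
        mean = sum(members) / len(members)
        total += sum((x - mean) ** 2 for x in members)
    return total

# Hypothetical toy data: positions on a line plus a "color" attribute.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
colors = ["red", "blue", "red", "blue", "red", "blue"]

by_distance = [0, 0, 0, 1, 1, 1]  # grouping a distance-only method would favor
by_color = [0, 1, 0, 1, 0, 1]     # grouping by color, ignoring position

print(distortion(points, by_distance), purity(by_distance, colors))
print(distortion(points, by_color), purity(by_color, colors))
```

The distance-only grouping has low distortion but mixed colors, while the color grouping is perfectly pure but widely spread. A multi-objective method has to balance the two.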

6 Proposed algorithm
The proposed algorithm is built on two well-known algorithms: the College Admission (CA) algorithm and the K-median algorithm. CA matches colleges with applicants over a series of iterations, with the goal of finding the optimal solution that satisfies both sides. K-median is similar to K-means but uses the median instead of the mean. The proposed algorithm combines the advantages of supervised and unsupervised algorithms. It is a multi-objective algorithm in which the impurity and the distance within each cluster are minimized simultaneously. It matches clusters (which minimize distance) with data points (which maximize purity) until it converges.
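One matching round of this idea might look like the sketch below. It is not the authors' implementation: the slides do not specify how preference lists or quotas are built, so the label-first ranking for points, the distance ranking for clusters, the quotas, and all names (`deferred_acceptance`, `point_prefs`, `cluster_score`) are assumptions. The matching step is the standard applicant-proposing college admission (deferred acceptance) procedure, followed by a K-median style center update.

```python
from statistics import median

def deferred_acceptance(applicant_prefs, college_score, quota):
    """Applicant-proposing college admission with quotas.

    applicant_prefs: {applicant: [college, ...]} in order of preference.
    college_score:   {college: {applicant: score}}; a lower score is preferred.
    quota:           {college: maximum number of applicants it can hold}.
    Returns {college: [applicants it holds at the end]}.
    """
    next_choice = {a: 0 for a in applicant_prefs}
    held = {c: [] for c in quota}
    free = list(applicant_prefs)
    while free:
        a = free.pop()
        if next_choice[a] >= len(applicant_prefs[a]):
            continue  # applicant has exhausted its list and stays unmatched
        c = applicant_prefs[a][next_choice[a]]
        next_choice[a] += 1
        held[c].append(a)
        if len(held[c]) > quota[c]:
            # Over quota: the college rejects its least-preferred applicant.
            worst = max(held[c], key=lambda x: college_score[c][x])
            held[c].remove(worst)
            free.append(worst)
    return held

# Hypothetical toy data: 1-D positions with a label per point.
position = {"p1": 1.0, "p2": 2.0, "p3": 3.0, "p4": 9.0}
label = {"p1": "weekday", "p2": "weekday", "p3": "weekend", "p4": "weekend"}
center = {"A": 2.0, "B": 8.0}
dominant = {"A": "weekday", "B": "weekend"}  # assumed dominant label per cluster

# Points rank clusters by label agreement first (purity), then by distance.
point_prefs = {
    p: sorted(center, key=lambda c: (dominant[c] != label[p],
                                     abs(position[p] - center[c])))
    for p in position
}
# Clusters rank points by distance to the current center.
cluster_score = {c: {p: abs(position[p] - center[c]) for p in position}
                 for c in center}

matching = deferred_acceptance(point_prefs, cluster_score, {"A": 2, "B": 2})
# K-median style update: each center moves to the median of its members.
new_center = {c: median(position[p] for p in members)
              for c, members in matching.items()}
print(matching, new_center)
```

Here p3 is closer to cluster A but joins cluster B because its label matches, which is the purity objective overriding pure distance. A full run would recompute the preferences from the new centers and repeat until the assignment stops changing.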

7 Proposed algorithm - example
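[Figure: data points 1, 2, ..., n on one side and Cluster 1, Cluster 2, and Cluster 3 on the other, each with a preference list. Data points maximize purity; clusters minimize distance. Example preference-list entries: Point 10, Point 3, Point 4, Point 2, Point 20, Point 13, Point 15.]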

8 How to find the optimal k? Consensus Clustering (CC)

9 Selecting the Optimal Number of Clusters
Consensus Clustering (CC) subsamples the data set and calculates the consensus rate between all pairs of samples. For each candidate number of clusters k ∈ K, it creates a similarity matrix that records the number of times two data points are assigned to the same cluster, which shows the degree of stability for each k. One CC measure of cluster stability is the cumulative distribution function (CDF) of the consensus rate. Every curve represents one value of k, and the flatter the curve, the more stable that number of clusters is.
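The consensus rate and its CDF can be sketched as follows. The clustering step itself is left out: each run is given directly as a hypothetical `{point index: cluster label}` dictionary for the points in that subsample. The rate for a pair is taken as the number of runs in which both points share a cluster divided by the number of runs in which both were sampled, which is the common consensus clustering definition and an assumption about what the slides intend.

```python
from itertools import combinations

def consensus_rates(runs, n):
    """Consensus rate for every pair (i, j), i < j, that was sampled together."""
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for run in runs:
        for i, j in combinations(sorted(run), 2):
            sampled[i][j] += 1
            if run[i] == run[j]:
                together[i][j] += 1
    return {(i, j): together[i][j] / sampled[i][j]
            for i, j in combinations(range(n), 2) if sampled[i][j] > 0}

def consensus_cdf(rates, thresholds):
    """Empirical CDF: fraction of pairs whose consensus rate is <= each threshold."""
    values = list(rates.values())
    return [sum(v <= t for v in values) / len(values) for t in thresholds]

# Hypothetical subsampled runs over 4 points for one value of k.
stable_runs = [{0: 0, 1: 0, 2: 1, 3: 1}, {0: 1, 1: 1, 2: 0}, {1: 0, 2: 1, 3: 1}]
unstable_runs = [{0: 0, 1: 0, 2: 1, 3: 1}, {0: 0, 1: 1, 2: 0, 3: 1}]

stable_cdf = consensus_cdf(consensus_rates(stable_runs, 4), [0.0, 0.5, 0.99])
unstable_cdf = consensus_cdf(consensus_rates(unstable_runs, 4), [0.0, 0.5])
print(stable_cdf, unstable_cdf)
```

In the stable case every pair has a rate of exactly 0 or 1, so the CDF stays flat between those extremes. In the unstable case many pairs sit at 0.5, so the CDF rises in the middle.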

10 A Case Study of San Francisco Bay Area, Bike Sharing System
Retrieved June 19, 2017, from

11 Background
Bike Sharing System (BSS): a last-mile transportation solution; a sustainable urban transportation system; environmentally friendly; an efficient and effective solution for traffic jams; affordable.
Retrieved June 19, 2017, from

12 Background
More than 37,000 stations in 50 countries! But we always have a balancing problem! A number of recent research studies have tried to rebalance bike stations by anticipating the bike availability at each station.
Retrieved June 19, 2017, from

13 Objective & Contribution
Objective: Find the network-wide availability patterns and how these patterns evolve temporally, with the goal of detecting imbalances in the BSS.
Contribution: We proposed a multi-objective clustering algorithm based on two algorithms. The proposed algorithm clusters 15-minute entries of bike availability across the network and finds the similarity between them according to day of week and time of day. This provides an expected pattern of bike usage for each cluster. Thereafter, we identified when and where the system would be imbalanced.

14 Data Set
Docking station data were collected from August 2013 to August 2015 in the San Francisco Bay Area, as shown below. The network has 70 stations (70 dimensions).
Retrieved June 19, 2017, from

15 Data Set (Cont.)
The data were reduced from one-minute to 15-minute resolution, giving 48,000 entries of bike availability. Using the proposed algorithm, we try to find the similarity between these entries and cluster them according to this similarity (bike availability) and the recorded time (time of day or day of week).
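The reduction from one-minute to 15-minute entries could be done as in the sketch below. The slides do not say how the reduction was performed, so averaging within each 15-minute bin is an assumption, as are the record layout, the two-station toy data, and the function name `to_15_minute_entries`. Each entry also carries the day of week and the hour, the two time labels used for clustering.

```python
from collections import defaultdict
from datetime import datetime, timedelta

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def to_15_minute_entries(records):
    """Aggregate (timestamp, [bikes per station]) records into 15-minute bins.

    Returns a time-sorted list of
    (bin start, mean availability per station, day of week, hour).
    """
    bins = defaultdict(list)
    for ts, bikes in records:
        # Round the timestamp down to the start of its 15-minute bin.
        start = ts.replace(minute=ts.minute - ts.minute % 15,
                           second=0, microsecond=0)
        bins[start].append(bikes)
    entries = []
    for start in sorted(bins):
        rows = bins[start]
        mean_vec = [sum(col) / len(rows) for col in zip(*rows)]
        entries.append((start, mean_vec, DAYS[start.weekday()], start.hour))
    return entries

# Hypothetical one-minute readings for two stations over 30 minutes.
t0 = datetime(2013, 8, 29, 8, 0)
records = [(t0 + timedelta(minutes=i), [i, 10]) for i in range(30)]

entries = to_15_minute_entries(records)
for entry in entries:
    print(entry)
```

With 70 stations, each entry's availability vector would have 70 components, matching the 70 dimensions mentioned for the data set.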

16 Results: Day of week
K=3 is the optimal k. [Figure: cumulative distribution function against consensus index value for each number of clusters k (days).]

17 Results: Day of week (cont.)
[Figure: the probability of each day of the week being in one of the three clusters (k=3). Labels, in order: Tuesdays, Wednesdays, Thursdays, Mondays, Fridays, Saturdays, Sundays.]
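Probabilities of this kind can be estimated by counting how often entries with a given time label land in each cluster. The sketch below assumes the cluster assignment of each 15-minute entry is already available; the toy labels, assignments, and the function name `cluster_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict

def cluster_probabilities(labels, assignments):
    """Estimate P(cluster | label) from paired labels and cluster assignments."""
    counts = defaultdict(Counter)
    for label, cluster in zip(labels, assignments):
        counts[label][cluster] += 1
    return {label: {c: n / sum(cs.values()) for c, n in cs.items()}
            for label, cs in counts.items()}

# Hypothetical entries: day-of-week label and assigned cluster for each entry.
days = ["Mon", "Mon", "Mon", "Mon", "Sat", "Sat"]
clusters = [0, 0, 0, 1, 2, 2]

probs = cluster_probabilities(days, clusters)
print(probs)
```

The same function applies to the time-of-day analysis by passing the hour of each entry as the label.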

18 Results: Day of week (cont.)
[Figure: available bikes of the three clusters for each station in the network.]

19 Results: Time of day
K=2 is the optimal k. [Figure: cumulative distribution function against consensus index value for each number of clusters k (hours).]

20 Results: Time of day (Cont.)
Peak hours: 8 a.m. to 5 p.m. Non-peak hours: 6 p.m. to 7 a.m. [Figure: the probability of each hour being in one of the two clusters (k=2).]

21 Results: Time of day (Cont.)
[Figure: available bikes of the two clusters for each station in the network.]

22 Conclusion
A new supervised algorithm was proposed to overcome the limitations of classical clustering algorithms. It was tested on a BSS in the San Francisco Bay Area, where it was used to anticipate bike availability across the network with respect to time of day and day of week. The results show that the days of the week can be grouped into three clusters, each with an associated pattern of bike availability. The time of day was clustered into two groups: peak and non-peak hours. The exploratory spatial-temporal analysis shows that the BSS can be balanced with minimum cost and effort.

23 Questions?

