
Pattern Recognition in Multiple Bikesharing Systems for Comparability. Presented by Athiq Ahamed. Guided by Prof. Dr. Dirk C. Mattfeld and M.Sc. Jan Brinkmann.


1 Pattern Recognition in Multiple Bikesharing Systems for Comparability
Presented by Athiq Ahamed
Guided by Prof. Dr. Dirk C. Mattfeld, M.Sc. Jan Brinkmann

2 Lecture title | Semester | Chapter x | Slide 2 © Prof. Dr. Dirk C. Mattfeld Motivation for Bikesharing Systems (BSS)
- The proportion of people living in urban areas is increasing.
- The United Nations predicts that by 2050, 86 % of the world will be urbanized.
- The most important modes of transport are public and private transport.
- Urban transport faces several problems.
- Traditional urban transportation does not solve these problems.

3 BSS as a Promising Candidate (Shared Mobility)
http://bike-sharing.blogspot.de/
- BSS is a sustainable short-term bicycle rental service.
- It is a cost-effective and flexible form of transportation (user types: customer, subscriber).
- The first BSS program was deployed on July 28, 1965.
Benefits
- Health
- Free rides (saves money)
- Pollution control
- Having fun!

4 Issues and Hypotheses
Issues
- Difficulties in operating and managing BSS
- Predicting usage (open data)
- Availability of bikes and free racks
- User satisfaction
Solution
- Planning station locations properly and predicting bike usage (activity patterns)
- Redistributing bikes in advance
Hypotheses
1. Rentals and returns depend on spatial and temporal factors.
2. The user profile influences rentals and returns.
3. The user profile influences business development in the neighborhood.

5 Knowledge Discovery in Databases
- Preprocessing (data cleansing): data reduction / integration / transformation
- Data mining (knowledge discovery): clustering, classification
- Visualization: Geo BI, several other methods
- Post-processing: decision support

6 Related Work
- Froehlich, Neumann and Oliver (2009): analyzed BSS usage to infer human mobility patterns in the city (Barcelona).
- Kaltenbrunner et al. (2010): developed a short-term statistical prediction model for station occupancy (Barcelona).
- Borgnat et al. (2010): developed a statistical model for predicting station occupancy (Lyon).
- Oliver O’Brien et al. (2013): analyzed 38 bikesharing systems spread around the world (Europe, the Middle East, Asia, Australasia and the Americas).
- Vogel et al. (2011) [69]: presented a complete analysis of operational data from Vienna’s BSS (Vienna).

7 Problems with Existing Systems
- Existing systems analyzed the data with simple analysis techniques.
- None of them geocoded the data with population, household, or employment data.
- Most of them list comparability as future work.
- No in-depth comparison of systems has been done.
- None of them check the comparability of two BSS with similar features.
- Very few of them analyze cluster movements.

8 Why Comparability? (Revisiting)
- Comparability is crucial for gaining insight into activity patterns from multiple systems.
- It helps to predict or anticipate bike activity in the future.
- It supports designing a new BSS or extending an existing one.
- Locations for a new system can be planned properly.
- It serves as an input for several applications.
- As a result, bikes or free racks are available all the time.

9 Which Systems for Comparability?
- Population
- Weather
- Household ratio
- Economic aspects
- Tourism

10 Which Systems for Comparability?
- Citi Bike New York and Capital Bikeshare Washington, D.C.
- Open data from the Citi Bike and Capital Bikeshare websites (2014)
 | Citi Bike New York | Capital Bikeshare Washington
Stations | 332 station IDs with 340 stations | 356 stations
Memberships | Annual (45 minutes free), 24-hour, 7-day (30 minutes free) | Annual, 30-day, 3-day, 1-day (free for 30 minutes)
Trips | approx. 8,081,216 per year, reduced by approx. 3 % after data cleansing | approx. 2,945,512 per year, reduced by approx. 3 % after data cleansing
Equipment | Thousands of bikes, kiosks, docking stations ... | Thousands of bikes, kiosks, docking stations ...

11 Goal
- To identify patterns in BSS
- To show that the patterns are interesting, using data mining and Geo BI
- The hypotheses are used to show that the patterns are interesting
- When the patterns are interesting, the hypotheses are proved
- When the hypotheses are proved, the systems are comparable

12 Architecture of NyDc
Diagram labels: Clustering and Classification; Visualization; Decision Support
Task | Tool
DB | Postgres
Data cleansing | Postgres, SAS
Clustering | RapidMiner
Visualization | Tableau
NyDc | Tableau

13 Overview of the Process
[Figure: process overview showing the following tables]
- Trip table columns: Duration, Start time, Stop time, Start_id, Stop_id, Start name, Stop name, Start longitude, Stop longitude, Start latitude, Stop latitude, Bike_id, User type, Birth year, Gender
- Table columns: Station ID, Rental 0-1 ... Rental 23-0, Return 0-1 ... Return 23-0
- Table columns: ID, Start / Stop time, Average rentals / returns
- Table columns: Station ID, Rental 0-1 ... Rental 23-0, Return 0-1 ... Return 23-0, Clusters

14 Data Cleansing (Selection / Reduction / Integration / Transformation)
- Clustering requires meaningful (cleaned) attributes.
- Only trips with a duration greater than 60 seconds are kept.
- Only summer months are kept.
- Data from multiple sources is integrated, and the average rentals and returns per station per hour are calculated.
- After transformation, the input has 48 attributes and one ID (each hour as an attribute).
[Figure: the data is split into Weekday, Weekend, Weekday Subscriber, Weekday Casual, Weekend Subscriber and Weekend Casual subsets]
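The transformation described on this slide can be sketched in a few lines of Python. This is only an illustration, not the Postgres/SAS workflow the presentation used: the trip field names (`duration`, `start_time`, `stop_time`, `start_id`, `stop_id`), the choice of June to August as "summer", and averaging over the number of observed days are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

MIN_DURATION_S = 60        # from the slide: keep trips longer than 60 seconds
SUMMER_MONTHS = {6, 7, 8}  # assumption: "summer" means June to August


def hourly_profiles(trips):
    """Return {station_id: 48 values}: 24 hourly rental averages, then 24 return averages.

    Each trip is a dict with hypothetical keys: duration (seconds),
    start_time and stop_time (datetime), start_id and stop_id (station IDs).
    """
    rentals = defaultdict(lambda: [0] * 24)
    returns = defaultdict(lambda: [0] * 24)
    days = set()

    for trip in trips:
        if trip["duration"] <= MIN_DURATION_S:
            continue  # data cleansing: drop very short trips
        if trip["start_time"].month not in SUMMER_MONTHS:
            continue  # data reduction: summer months only
        days.add(trip["start_time"].date())
        rentals[trip["start_id"]][trip["start_time"].hour] += 1
        returns[trip["stop_id"]][trip["stop_time"].hour] += 1

    # Assumption: average over the number of distinct days seen in the data.
    n_days = max(len(days), 1)
    stations = sorted(set(rentals) | set(returns))
    return {
        s: [c / n_days for c in rentals[s]] + [c / n_days for c in returns[s]]
        for s in stations
    }
```

To reproduce the weekday/weekend and subscriber/casual subsets from the slide, one would filter `trips` before calling `hourly_profiles`, for example by `start_time.weekday()` and by user type.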

15 Citi Bike Weekday

16 Capital Bikeshare Weekday

17 Citi Bike Subscriber Weekday

18 Capital Bikeshare Subscriber Weekday

19 Citi Bike Customer Weekday

20 Capital Bikeshare Customer Weekday

21 Data Mining (Knowledge Discovery): Clustering
- Unsupervised learning: the process of grouping similar objects
- The data contains no labels
- Objects are grouped when they are similar (members or attributes)
- The idea is to find some structure or pattern in a collection of unlabeled data
- It is learning by observation, not by example (K-means and K-medoids)
- Goal: high intra-cluster similarity and low inter-cluster similarity
Areas
- Almost all research fields
- From market research to medicine
- From image processing to spatial data

22 Data Mining (Knowledge Discovery): Classification
- Classification is a supervised learning technique.
- It is the process of finding a model or function that distinguishes data by class label.
- The given data is usually divided into training data (known class labels) and test data (unknown class labels). Classifiers used: K-NN and Naive Bayes.
- Recall: a measure of completeness, TP / (TP + FN)
- Precision: a measure of exactness, TP / (TP + FP)
- Accuracy: the percentage of test set tuples that are correctly classified by the classifier
Confusion matrix
 | Class "A" | Class "Not A"
Test says "A" | True Positive | False Positive
Test says "Not A" | False Negative | True Negative
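As a small worked example of the three measures on this slide, the following Python sketch computes them from the four confusion-matrix counts. The function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made for the illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (recall, precision, accuracy) from confusion-matrix counts."""
    # Recall (completeness): share of actual "A" items that the test found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision (exactness): share of "A" predictions that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Accuracy: share of all items that were classified correctly.
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return recall, precision, accuracy
```

For example, with 9 true positives, 1 false positive, 3 false negatives and 7 true negatives, recall is 9/12, precision is 9/10 and accuracy is 16/20.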

23 K-means: Clustering Algorithm
A simple clustering algorithm that aims for high intra-cluster similarity and low inter-cluster similarity.
How it works
1) It begins by randomly selecting k data points (the initial centroids).
2) It creates k empty clusters.
3) It then assigns exactly one centroid to each cluster.
4) It iterates over all instances and assigns each data point to the cluster with the nearest centroid (mean).
5) After each iteration, it recomputes the cluster centroids based on the newly assigned data points.
6) It checks whether the clustering is good enough (no assignment changes); otherwise it returns to (4).
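The steps above can be sketched in plain Python as follows. This is a minimal illustration of standard K-means, not the RapidMiner operator used in the presentation; the function names, the `max_iter` cap and the fixed `seed` are assumptions added for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster points into k groups; return (centroids, assignment)."""
    rng = random.Random(seed)
    # Steps 1-3: pick k distinct data points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Step 4: assign every point to the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 6: stop when no point changes its cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 5: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment
```

In the setting of this presentation, each point would be one station's 48-value hourly profile and `k` would be 4. Because the initial centroids are random, different seeds can give different clusterings.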

24 Complicated Questions
How many clusters?
- Davies–Bouldin index (DBI)
- Accuracy using classification
- Experience
Why K-means?
- The Davies–Bouldin index (DBI) shows a low value
- High accuracy, precision, and recall using classification algorithms
Pseudo code for NyDc
1) Run the clustering algorithms
2) Get the accuracy using classification algorithms (choose the best one)
3) Evaluate using the Davies–Bouldin index
4) Use Geo BI to validate the analysis and prove the hypotheses
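For readers unfamiliar with the Davies–Bouldin index, the sketch below computes its standard definition in Python: for each cluster, take the worst ratio of combined within-cluster scatter to centroid distance against any other cluster, then average over clusters. Lower values mean more compact, better separated clusters. The function name and the use of Euclidean distance are assumptions for the illustration; it is not the RapidMiner implementation used in the presentation, and it does not handle clusters whose centroids coincide.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a clustering (lower is better)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Centroid of each cluster: coordinate-wise mean of its members.
    centroids = {
        c: [sum(col) / len(members) for col in zip(*members)]
        for c, members in clusters.items()
    }
    # Scatter: average distance of the members to their own centroid.
    scatter = {
        c: sum(math.dist(p, centroids[c]) for p in members) / len(members)
        for c, members in clusters.items()
    }
    # For each cluster, the worst (largest) similarity to any other cluster.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)
```

One way to use this for the "how many clusters?" question is to run the clustering for several values of k and compare the resulting index values.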

25 Clustering Accuracy Evaluation
Recall (%)
 | Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3
K-means | 83.33 | 87.5 | 99.17 | 95.08
K-medoids | 93.17 | 85.07 | 84.62 | 80
EM | 87.6 | 85.71 | 97.85 | 85.71
Precision (%)
 | Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3
K-means | 94.59 | 100 | 96.77 | 95.08
K-medoids | 92.5 | 93.44 | 82.5 | 75.36
EM | 94.64 | 85.71 | 86.67 | 92.31
Accuracy (%)
 | Naive Bayes | K-NN
K-means | 91.46 | 96.32
K-medoids | 87.33 | 87.92
EM | 91.83 | 89.56

26 Clustering Validation
To make the clusters easier to interpret, they are named:
- Commuter cluster (active daytime rentals and returns)
- Tourist or mixed cluster (late afternoon and evening)
- Leisure and utility cluster (active at night and in the early morning)
- Residential or outer-city cluster (low activity at all times)
Proof for hypothesis 1
- Sub-hypothesis 1, temporal factors: the time of day plays an important role
- Sub-hypothesis 2, spatial factors: location plays an important role

27 Sub-hypothesis 1: Temporal Validation (Citi Bike)

28 Sub-hypothesis 1: Temporal Validation (Capital Bikeshare)

29 Examples for Validation
- Commuter: 519, Grand Central Terminal (railroad terminal); Dupont station, Dupont Circle
- Tourist: 2006, Central Park; Smithsonian, National Mall, Washington
- Leisure: 293, Lafayette Street; U St and 13 St NW, U Street
- Residential: Brooklyn; Arlington County

30 Sub-hypothesis 2: Spatial Validation, Citi Bike Weekday

31 Sub-hypothesis 2: Spatial Validation, Capital Bikeshare Weekday

32 Proof for Hypothesis 2
- Subscribers are regular commuters.
- Subscribers are well educated and wealthy.
- Their pattern of morning pickups and evening returns (workers) is validated against the white-collar job map.
- 41 % of the subscribers hold a master's degree and 63 % are under 35.
- Subscribers spend more money on BSS than customers.

33 Proof for Hypothesis 2
- Nine in ten survey respondents were employed.
- US census reports show that only about seven in ten adults in Washington, D.C., are employed.
- Customers are tourists or shoppers visiting the neighborhood.
- Customers show lower average activity on weekdays and higher activity on weekends.
- Late pickups and active afternoons indicate that they are tourists, shoppers, or riders on household errands.

34 Proof for Hypothesis 3
- Riders visit spending locations more frequently.
- High average activity on weekends indicates tourists or leisure users.
- A cyclist visits a supermarket 3.2 times per week.
- A motorist visits 2.5 times per week and spends more money.
- If there are more customers or tourists, business in the neighborhood is likely to be better.
- If there are more subscribers, bike usage is likely to be high.

35 NyDc
- Since the hypotheses are proved, the patterns are interesting.
- Since the interesting patterns are similar, the systems are comparable.
- The final comparability model (NyDc) can be used for various applications.
- It could serve as a benchmark for comparability studies.
- It has several useful features.
- A prediction model prototype was developed using NyDc.
- This model can be mapped to a new location.
Table columns: Population | Location | Temporal | Hot spot | Household | ... | Cluster average
Diagram label: Location decisions

36 Conclusion
- An in-depth analysis was done by separating the data.
- Bike activity patterns were obtained for future prediction.
- The hypotheses (the patterns are interesting) were proved, which helps to solve the issues of BSS.
- Yes, the two systems are comparable to a large extent.
- The results can be mapped to other cities, either to design a BSS with similar attributes or for prediction.
- Further uses: business development, city dynamics, or location-based services.

37 Future Work
- Developing an analogy-based bikesharing information system
- Using NyDc to develop a new artificial intelligence recommendation system
- Developing better prediction algorithms (taking more features into account)
- Personalized data capture for a recommender system (automatic path calculation with temporal information)
- Comparing NyDc with a city in Asia
- Human dynamics in different parts of the world
