Pattern Recognition in Multiple Bikesharing Systems for Comparability. Presented by Athiq Ahamed. Guided by Prof. Dr. Dirk C. Mattfeld and M.Sc. Jan Brinkmann.

Presentation transcript:

Lecture title | Semester | Chapter x | Slide 2 © Prof. Dr. Dirk C. Mattfeld
Motivation for Bikesharing Systems (BSS)
- The proportion of people living in urban areas is increasing.
- The United Nations predicts that by 2050, 86 % of the world will be urbanized.
- The most important modes of transport are public and private.
- Urban areas face several transport problems.
- Traditional urban transportation does not solve these problems.

Slide 3: BSS as a Promising Candidate (Shared Mobility)
- BSS is a sustainable short-term bicycle rental service.
- Cost-effective and flexible form of transportation (user types: customer, subscriber).
- The first BSS program was deployed on July 28, 1965.
Benefits:
- Health
- Free rides (saves money)
- Pollution control
- Having fun!

Slide 4: Issues and Hypotheses
Issues:
- Difficulties in operating and managing BSS
- Predicting usage (open data)
- Availability of bikes and free racks
- User satisfaction
Solution:
- Planning station locations properly and predicting bike usage (activity patterns)
- Redistributing bikes in advance
Hypotheses:
1. Rentals and returns depend on spatial and temporal factors.
2. The profile of the users influences rentals and returns.
3. The profile of the users influences business development in the neighborhood.

Slide 5: Knowledge Discovery in Databases
- Preprocessing (data cleansing): data reduction / integration / transformation
- Data mining (knowledge discovery): clustering, classification
- Visualization: Geo BI, several other methods
- Post-processing: decision support

Slide 6: Related Work
- Froehlich, Neumann and Oliver (2009): analyzed BSS usage to infer human mobility patterns in the city (Barcelona).
- Kaltenbrunner et al. (2010): developed a short-term statistical prediction model for station occupancy (Barcelona).
- Borgnat et al. (2010): developed a statistical model for predicting station occupancy (Lyon).
- O'Brien et al. (2013): analyzed 38 bikesharing systems spread around the world (Europe, Middle East, Asia, Australasia and the Americas).
- Vogel et al. (2011) [69]: presented a complete analysis of operational data from Vienna's BSS (Vienna).

Slide 7: Problems with Existing Systems
- Existing work analyzed the data with simple analysis techniques.
- None of the existing work geocoded the data with population, household, or employment data.
- Most of it lists comparability as future work.
- No in-depth comparison of systems has been done.
- None of it checks the comparability of two BSS with similar features.
- Very few studies analyze cluster movements.

Slide 8: Why Comparability?
- Revisiting comparability is crucial for gaining insights into activity patterns from multiple systems.
- For predicting or anticipating bike activity in the future.
- For designing a new BSS or extending an existing one.
- The location for a new system can be planned properly.
- Serves as an input for several applications.
- As a result, bikes or free racks are available all the time.

Slide 9: Which Systems for Comparability?
- Population
- Weather
- Household ratio
- Economic aspects
- Tourism

Slide 10: Which Systems for Comparability?
- Citi Bike New York and Capital Bikeshare Washington, D.C.
- Open data from the Citi Bike and Capital Bikeshare websites (2014)

Citi Bike New York | Capital Bikeshare Washington
332 station IDs with 340 stations | 356 stations
Annual (45 minutes free), 24 hour, 7 (30 minutes free) | Annual, 30-day, 3-day, 1-day (free for 30 minutes)
Approx. 8,081,216 trips per year, reduced by approx. 3% after data cleansing | Approx. 2,945,512 trips per year, reduced by approx. 3% after data cleansing
Thousands of bikes, kiosks, docking stations, ... | Thousands of bikes, kiosks, docking stations, ...

Slide 11: Goal
- To identify patterns in BSS.
- To prove that the patterns are interesting using data mining and Geo BI.
- The hypotheses are used to prove that the patterns are interesting.
- When the patterns are interesting, the hypotheses are proved.
- When the hypotheses are proved, the systems are comparable.

Slide 12: Architecture of NyDc
Diagram labels: Clustering and Classification, Visualization, Decision Support

Tasks | Tools
DB | Postgres
Data Cleansing | Postgres, SAS
Clustering | RapidMiner
Visualization | Tableau
NyDc | Tableau

Slide 13: Overview of the Process
- Trip data columns: Duration, Start time, Stop time, Start_id, Stop_id, Start name, Stop name, Start longitude, Stop longitude, Start latitude, Stop latitude, Bike_id, User type, Birth year, Gender
- Station table columns: Station ID, Rental, Rental 23-0, Returns, Return 23-0
- Aggregation columns: ID, Start / Stop time, Average rentals / returns
- Clustered station table columns: Station ID, Rental, Rental 23-0, Returns, Return 23-0, Clusters

Slide 14: Data Cleansing (Selection / Reduction / Integration / Transformation)
- Clustering requires meaningful (cleaned) attributes.
- Only trips with a duration greater than 60 seconds are kept.
- Only the summer months are kept.
- Data from multiple sources is integrated, and the average rentals and returns per station per hour are calculated.
- After transformation, the input has 48 attributes and one ID (each hour is an attribute).
Data sets derived from the data (diagram labels): Weekday, Casual Weekday, Subscriber Weekday, Weekend, Casual Weekend, Subscriber Weekend
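The cleansing and transformation steps on this slide can be sketched in a few lines of Python. This is only an illustration, not the Postgres/SAS pipeline used in the presentation: the dictionary keys, the choice of June to August as "summer", and averaging over the number of observed days are all assumptions, and the split into weekday/weekend and customer/subscriber data sets is left out.

```python
from collections import defaultdict
from datetime import datetime

SUMMER_MONTHS = {6, 7, 8}  # assumption: "summer" means June to August
MIN_DURATION_S = 60        # from the slide: keep trips longer than 60 seconds


def station_profiles(trips):
    """Return {station_id: 48 values}: 24 hourly average rentals
    followed by 24 hourly average returns.

    `trips` is an iterable of dicts with the hypothetical keys
    'duration' (seconds), 'start_time' and 'stop_time' (datetime),
    'start_id' and 'stop_id'.
    """
    rentals = defaultdict(lambda: [0] * 24)
    returns = defaultdict(lambda: [0] * 24)
    days = set()

    for trip in trips:
        if trip["duration"] <= MIN_DURATION_S:
            continue  # selection: drop very short trips
        if trip["start_time"].month not in SUMMER_MONTHS:
            continue  # reduction: summer months only
        rentals[trip["start_id"]][trip["start_time"].hour] += 1
        returns[trip["stop_id"]][trip["stop_time"].hour] += 1
        days.add(trip["start_time"].date())

    # Assumption: average over the number of distinct days observed.
    n_days = max(len(days), 1)
    profiles = {}
    for station_id in set(rentals) | set(returns):
        profiles[station_id] = (
            [count / n_days for count in rentals[station_id]]
            + [count / n_days for count in returns[station_id]]
        )
    return profiles
```

Each returned row has the shape described on the slide: one station ID plus 48 hourly attributes, which is the input a clustering algorithm would then receive.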

Slide 15 (figure): Citi Bike Weekday

Slide 16 (figure): Capital Bikeshare Weekday

Slide 17 (figure): Citi Bike Subscriber Weekday

Slide 18 (figure): Capital Bikeshare Subscriber Weekday

Slide 19 (figure): Citi Bike Customer Weekday

Slide 20 (figure): Capital Bikeshare Customer Weekday

Slide 21: Data Mining (Knowledge Discovery): Clustering
- Unsupervised learning: the process of grouping similar objects.
- The data contains no labels.
- Objects in a group are the ones that are similar (in members or attributes).
- The idea is to find some structure or pattern in a collection of unlabeled data.
- It is learning by observation, not by example (K-means and K-medoids).
- Goal: high intra-cluster similarity and low inter-cluster similarity.
Areas:
- Almost all research fields
- Market research to medicine
- Image processing to spatial data

Slide 22: Data Mining (Knowledge Discovery): Classification
- Classification is a supervised learning technique.
- It is the process of finding a model or function that distinguishes data by class label.
- The given data is usually divided into training data (known class label) and test data (unknown class label) (K-NN and Naive Bayes).
- Recall: the measure of completeness, TP / (TP + FN).
- Precision: the measure of exactness, TP / (TP + FP).
- Accuracy: the percentage of test set tuples that are correctly classified by the classifier.

 | Class "A" | Class "Not A"
Test says "A" | True Positive | False Positive
Test says "Not A" | False Negative | True Negative
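As a small worked example of the three measures on this slide, the following sketch counts the four confusion-matrix cells for one class and applies the formulas. The function names and the sample labels are illustrative only and are not taken from the presentation.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN, TN for one class treated as 'positive'."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # test says "A", class is "A"
        elif p == positive:
            fp += 1  # test says "A", class is "Not A"
        elif a == positive:
            fn += 1  # test says "Not A", class is "A"
        else:
            tn += 1  # test says "Not A", class is "Not A"
    return tp, fp, fn, tn


def recall(tp, fn):
    """Completeness: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    """Exactness: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of all test tuples that were classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Illustrative labels only.
actual = ["A", "A", "A", "B", "B"]
predicted = ["A", "A", "B", "A", "B"]
tp, fp, fn, tn = confusion_counts(actual, predicted, positive="A")
print(recall(tp, fn), precision(tp, fp), accuracy(tp, fp, fn, tn))
```

In the clustering evaluation described later, the cluster labels would play the role of the class labels, with recall and precision reported per cluster.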

Slide 23: K-means: Clustering Algorithm
A simple clustering algorithm that aims for high intra-cluster similarity and low inter-cluster similarity.
Working:
1) It begins by randomly selecting k data points (initial centroids).
2) It creates k empty clusters.
3) It then assigns exactly one centroid to each cluster.
4) After assigning, it iterates over all instances and assigns each data point to the cluster with the nearest centroid (mean).
5) After each iteration, it recomputes the cluster centroids based on the newly assigned data points.
6) It checks whether the clustering is good enough (no change) or returns to (2).
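The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the RapidMiner operator used in the presentation; the squared Euclidean distance, the `seed` and `max_iter` parameters, and the handling of empty clusters are assumptions. The k clusters of steps 2 and 3 are represented implicitly by the list of centroids and the assignment list.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def k_means(points, k, max_iter=100, seed=0):
    """Return (assignment, centroids) for a list of numeric vectors."""
    rng = random.Random(seed)
    # Step 1: pick k data points at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None

    for _ in range(max_iter):
        # Step 4: assign every point to the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 6: stop once the assignment no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 5: recompute each centroid from its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = mean(members)

    return assignment, centroids
```

Applied to the station data, each point would be one 48-value station profile and k the number of activity clusters.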

Slide 24: Complicated Questions
How many clusters?
- Davies-Bouldin index (DBI)
- Accuracy using classification
- Experience
Why K-means?
- The Davies-Bouldin index (DBI) shows a low value.
- High accuracy, precision, and recall using classification algorithms.
Pseudocode for NyDc:
1. Run the clustering algorithms.
2. Get the accuracy using classification algorithms (choose the best one).
3. Evaluate using the Davies-Bouldin index.
4. Use Geo BI to validate the analysis and test the hypotheses.
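To make the role of the Davies-Bouldin index concrete, here is a minimal sketch of its standard definition: for each cluster, take the worst ratio of combined within-cluster scatter to centroid separation, then average over the clusters. Lower values mean compact, well-separated clusters. The presentation does not state how its DBI values were computed, so the Euclidean distance and the function names here are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a clustering; lower is better."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid and average member-to-centroid distance per cluster.
    centroids = {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in clusters.items()
    }
    scatter = {
        label: sum(euclidean(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    # For each cluster, the worst ratio against any other cluster.
    # Note: two identical centroids would cause a division by zero.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)
```

One way to use it for the "how many clusters" question is to run the clustering for several values of k and compare the resulting index values alongside the classification accuracy.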

Slide 25: Clustering Accuracy Evaluation
- Recall table: columns Cluster 0, Cluster 1, Cluster 2, Cluster 3; rows K-means, K-medoids, EM
- Precision table: columns Cluster 0, Cluster 1, Cluster 2, Cluster 3; rows K-means, K-medoids, EM
- Accuracy table: columns Naive Bayes, K-NN; rows K-means, K-medoids, EM
[table values missing from the transcript]

Slide 26: Clustering Validation
For clarity, the clusters are named:
- Commuter cluster (active daytime rentals and returns)
- Tourist or mixed cluster (late afternoon and evening)
- Leisure and utility cluster (active at night and in the early morning)
- Residential or outer-city cluster (low activity at all times)
Proof for hypothesis one:
- Sub-hypothesis 1, temporal factors: the time of day plays an important role.
- Sub-hypothesis 2, spatial factors: the location plays an important role.

Slide 27 (figure): Sub-hypothesis 1: Temporal Validation (Citi Bike)

Slide 28 (figure): Sub-hypothesis 1: Temporal Validation (Capital Bikeshare)

Slide 29: Examples for Validation
- Commuter: Grand Central Terminal (railroad terminal); Dupont station, Dupont Circle
- Tourist: Central Park; Smithsonian, National Mall, Washington
- Leisure: 293, Lafayette Street; U St and 13 St NW, U Street
- Residential: Brooklyn; Arlington County

Slide 30 (figure): Sub-hypothesis 2: Spatial Validation, Citi Bike Weekday

Slide 31 (figure): Sub-hypothesis 2: Spatial Validation, Capital Bikeshare Weekday

Slide 32: Proof for Hypothesis 2
- Subscribers are regular commuters.
- Subscribers are educated and well off.
- A similar pattern of morning pickups and evening returns (workers) is validated in the white-collar job map.
- 41 % of the subscribers hold a master's degree and 63 % are under 35.
- Subscribers spend more money on BSS than customers do.

Slide 33: Proof for Hypothesis 2
- Nine in ten survey respondents were employed.
- US census reports show that only about seven in ten adults in Washington, D.C., are employed.
- Customers are tourists or shoppers visiting the neighborhood.
- Customers show lower average activity on weekdays and higher activity on weekends.
- Late pickups and active afternoons indicate tourists, shoppers, or household trips.

Slide 34: Proof for Hypothesis 3
- Riders visit spending locations more frequently.
- High average activity on weekends indicates tourists or leisure users.
- A cyclist visits a supermarket 3.2 times per week.
- A motorist visits 2.5 times and spends more money.
- If there are more customers or tourists, business in the neighborhood is likely to be better.
- If there are more subscribers, bike usage is likely to be high.

Slide 35: NyDc
- Since the hypotheses are proved, the patterns are interesting.
- Since the interesting patterns are similar, the systems are comparable.
- The final comparability model (NyDc) can be used for various applications.
- It could serve as a benchmark for comparability studies.
- It offers several useful features.
- A prediction model prototype was developed using NyDc.
- This model can be mapped to a new location.
Diagram labels: Population, Location, Temporal, Hot spot, Household, ..., Cluster average, Location decisions

Slide 36: Conclusion
- An in-depth analysis is done by separating the data.
- Bike activity patterns are obtained for future prediction.
- The hypotheses (the patterns are interesting) are proved, which addresses the issues of BSS.
- Yes, the two systems are comparable to a large extent.
- The results can be mapped to other cities with similar attributes, for BSS design or prediction.
- Further uses: business development, city dynamics, or location-based services.

Slide 37: Future Work
- Developing an analogy-based bike sharing information system.
- Using NyDc to develop a new artificial intelligence recommendation system.
- Developing better prediction algorithms (using more features).
- Personalized data capture for a recommender system (automatic path calculation with temporal information).
- Comparing NyDc with a city in Asia.
- Studying human dynamics in different parts of the world.
