


1
CS548 Spring 2015 Showcase
By Yang Liu, Viseth Sean, Azharuddin Priyotomo
Showcasing work by Le, Abrahart, and Mount on "M5 Model Tree applied to modelling town centre area activities for the city of Nottingham"
Image source: bbc.co.uk

2
References
[WFH 2011] Ian H. Witten, Eibe Frank, Mark A. Hall (2011). Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition (pp. 67-68, 251-259). Burlington, MA: Morgan Kaufmann.
[LAM 2007] T. K. T. Le, R. J. Abrahart, N. J. Mount (2007). M5 Model Tree applied to modelling town centre area activities for the city of Nottingham. In Proceedings of the 9th International Conference on GeoComputation, National Centre for Geocomputation, National University of Ireland, Maynooth, September 2007.
[WW 1997] Y. Wang, I. H. Witten (1997). Induction of model trees for predicting continuous classes. In Proceedings of the European Conference on Machine Learning, Poster Papers, Prague, Czech Republic, pp. 128-137.
[Quin 1992] J. Ross Quinlan (1992). Learning with Continuous Classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Singapore, pp. 343-348.

3
Contents
❖ Regression Tree & Model Tree
❖ Model Tree Induction Algorithm
❖ Real Application

4
Regression Tree & Model Tree
❖ Regression tree: a single number (the mean target value of the training instances reaching it) at each leaf
❖ Model tree: a linear regression model at each leaf
Both figures taken from [WFH 2011]
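To make the difference between the two leaf types concrete, here is a minimal Python sketch. It assumes one numeric attribute x and made-up training values (xs, ys) that reach a single leaf; the helper names regression_leaf and model_leaf are hypothetical, and the linear leaf is fitted by ordinary least squares.

```python
from statistics import mean

# Hypothetical training instances reaching one leaf; y happens to be 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

def regression_leaf(ys):
    """Regression-tree leaf: stores one constant, the mean target value."""
    value = mean(ys)
    return lambda x: value

def model_leaf(xs, ys):
    """Model-tree leaf: stores y = a0 + a1*x fitted by least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = my - a1 * mx
    return lambda x: a0 + a1 * x

reg = regression_leaf(ys)
mod = model_leaf(xs, ys)
for x in xs:
    # Regression leaf predicts 6.0 everywhere; model leaf predicts 3.0, 5.0, 7.0, 9.0.
    print(x, reg(x), mod(x))
```

The constant leaf can only return the average of its instances, while the linear leaf follows the trend inside the leaf, which is why model trees usually need fewer leaves for the same accuracy.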

5
Model Tree Induction (M5)
❖ Grow an initial tree with an ordinary decision tree induction algorithm.
❖ Splitting criterion: standard deviation reduction (SDR) of the target values instead of entropy, based on the same rationale: the split that most reduces the variation of the target within each subset produces "purer" subsets, which tends to give shallower subtrees and shorter trees/rules.
❖ Pruning works as in ordinary decision trees, except that a subtree is replaced by a linear regression model (a regression plane) instead of a constant.
❖ Smoothing: compensate for the sharp discontinuities that occur between the linear models of neighboring leaves in the pruned tree.
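The splitting, pruning, and smoothing quantities of M5 can be sketched in Python as follows. This follows the formulas as presented in [WFH 2011] under some stated assumptions: population standard deviation (statistics.pstdev), a single numeric attribute with binary threshold splits, and an assumed smoothing constant k = 15. All function names are hypothetical, and this is not the original M5 code.

```python
from statistics import pstdev

def sdr(parent, subsets):
    """Standard deviation reduction: sd(T) - sum over i of |Ti|/|T| * sd(Ti)."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)

def best_split(xs, ys):
    """Binary split on one numeric attribute that maximizes SDR.

    Tries a threshold halfway between each pair of adjacent distinct sorted
    x values; returns (threshold, sdr), or (None, -inf) if no split exists.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = sdr(ys, [left, right])
        if score > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best

def adjusted_error(abs_errors, v):
    """Pruning estimate: mean absolute error of the n training instances at a
    node, inflated by (n + v) / (n - v), where v is the number of parameters
    in the node's linear model (requires n > v)."""
    n = len(abs_errors)
    return sum(abs_errors) / n * (n + v) / (n - v)

def smooth(p, q, n, k=15.0):
    """Smoothing: blend the prediction p passed up from the child node with
    the prediction q of the linear model at the current node; n is the number
    of training instances reaching the child, k an assumed smoothing constant."""
    return (n * p + k * q) / (n + k)

# Usage on made-up data: two clusters of target values.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 10, 10, 10]
threshold, score = best_split(xs, ys)
print(threshold, score)                 # 3.5 4.5
print(adjusted_error([1.0] * 6, v=2))   # 1.0 * 8 / 4 = 2.0
print(smooth(10.0, 4.0, n=15))          # (150 + 60) / 30 = 7.0
```

The split at 3.5 separates the two clusters perfectly, so both subsets have zero spread and the SDR equals the parent's whole standard deviation. The (n + v)/(n - v) factor penalizes linear models with many parameters relative to the number of instances, which is what lets pruning replace a subtree by a simpler model.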

6
Real Application
❖ Analyzing patterns of city activities using spatial data
❖ Spatial data is usually stored as coordinates and topology, and is data that can be mapped. It is often accessed, manipulated, or analyzed through Geographic Information Systems (GIS).
❖ Main attributes: TCPIs

7
TCPI
❖ Town Center Performance Indicator
❖ Indicators used to define the vital activities in a town center
❖ A set of 8 TCPIs was publicly agreed upon for this application

8
Town Center Performance Indicators
GIS input layers: the 8 TCPIs considered for Nottingham's town centre
Taken from [LAM 2007]

9
The Main Problems
❖ Differing perceptions of the significance of each TCPI and of their relative importance
❖ How to choose a representative sample
❖ How many linear models the tree should contain

10
Spatial Data Collection (Cool Stuff)
Survey interface, with callouts: user sprays on the map; change spray size; wipe the map; add new area; write in comments; when done, click the "send" button.

11
Model Tree Creation
❖ Data instances: 4250 instances, generated by random sampling, used as the training set
❖ Attributes: 8 TCPIs (leisure, car park, commerce, public, pedestrian, industry, population, education)
❖ Split the input space of the training set (town centre area activities) into subspaces (sub-areas)
❖ Build a linear regression model (at a leaf) for each subspace
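Building the linear model for one subspace is ordinary least-squares regression on the instances that reach that leaf. Below is a minimal, dependency-free Python sketch, assuming those instances are given as rows of numeric attribute values; the name fit_linear and the sample rows are hypothetical (they are not the Nottingham TCPI data), and a real implementation would normally use a numerical library instead.

```python
def fit_linear(rows, ys):
    """Least-squares fit of y = a0 + a1*x1 + ... + am*xm.

    Solves the normal equations (X^T X) a = X^T y with Gauss-Jordan
    elimination and partial pivoting; assumes X^T X is non-singular.
    Returns the coefficients [a0, a1, ..., am].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 = intercept
    m = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)]
         + [sum(row[i] * y for row, y in zip(X, ys))]
         for i in range(m)]
    for col in range(m):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(m):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]

# Hypothetical instances reaching one leaf, generated from y = 1 + 2*x1 - 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
coef = fit_linear(rows, ys)
print([round(a, 6) for a in coef])  # [1.0, 2.0, -3.0]
```

With 8 TCPI attributes each row would simply hold 8 values, giving 9 coefficients per leaf.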

12
An Example of the M5 Algorithm
Splitting the input space [X1, X2] of the training set using the M5 algorithm
Each model is a linear regression model $Y = a_0 + a_1 X_1 + a_2 X_2$
Taken from [LAM 2007]
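To show how a fitted model tree routes an instance to the linear model of its subspace and evaluates the leaf model Y = a0 + a1X1 + a2X2, here is a Python sketch for two inputs X1 and X2. The class names, tree shape, thresholds, and coefficients are invented for illustration and are not the tree from [LAM 2007].

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    """Leaf holding a linear model Y = a0 + a1*X1 + a2*X2."""
    a0: float
    a1: float
    a2: float

    def predict(self, x1: float, x2: float) -> float:
        return self.a0 + self.a1 * x1 + self.a2 * x2

@dataclass
class Split:
    """Internal node: go left if the chosen input is <= threshold."""
    attr: int          # 0 selects X1, 1 selects X2
    threshold: float
    left: "Node"
    right: "Node"

Node = Union[Leaf, Split]

def predict(node: Node, x1: float, x2: float) -> float:
    # Walk down the splits until reaching the leaf (subspace) for this point.
    while isinstance(node, Split):
        value = x1 if node.attr == 0 else x2
        node = node.left if value <= node.threshold else node.right
    return node.predict(x1, x2)

# Hypothetical tree with three subspaces of the [X1, X2] input space.
tree = Split(attr=0, threshold=2.0,
             left=Leaf(1.0, 0.5, 0.0),                  # model 1
             right=Split(attr=1, threshold=3.0,
                         left=Leaf(0.0, 1.0, 1.0),      # model 2
                         right=Leaf(5.0, -1.0, 2.0)))   # model 3
print(predict(tree, 1.0, 0.0))  # 1.5  (model 1)
print(predict(tree, 3.0, 1.0))  # 4.0  (model 2)
print(predict(tree, 3.0, 4.0))  # 10.0 (model 3)
```

Each split cuts the input space with an axis-parallel line, so every leaf corresponds to a rectangular sub-area with its own regression plane.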

13
The Model Tree
Model tree resulting from the 4250 instances for the eight TCPIs
Associated indicators
❖ Commerce
❖ Pedestrian
❖ Leisure
❖ Car_park
❖ Public_building
Less associated indicators
❖ Population
❖ Industry
❖ Education
Taken from [LAM 2007]

14
Why choose 14 linear models? Taken from [LAM 2007]

15
Nottingham Mental Map
A single overall public mental map of the town centre (from the web-based GIS survey)
● Target output of the model
● The darker the red, the more confidently the area is regarded as belonging to town center area activities
Taken from [LAM 2007]

16
Results in Maps
14th linear model: high density of commerce and pedestrian flow
13th linear model: high density of commerce
12th linear model: lower density of commerce, high density of leisure and pedestrian flow
Taken from [LAM 2007]

17
Results in Maps
3rd linear model: lower density of commerce, high residential use
2nd linear model: high density of industry
Taken from [LAM 2007]

18
Pros of this Model Tree
❖ Shows how significant each indicator (attribute) is for the prediction
❖ Shows to what degree each indicator explains the output (town center area activities)
❖ Is particularly well suited to the naturally temporal and complex characteristics of an urban city

19
Thank You!!!
