Predicting popular areas of a tiled Web map as a strategy for server-side caching
Sterling Quinn
Introduction
- This project presents a predictive model for popular areas of a Web map
- Model output indicates where a server-side cache of map tiles should be created
- Selectively caching based on popular map areas can save time and disk space
Project objectives
- Describe server-side caching of map tiles
- Describe the need for selective caching
- Present a predictive model for popular areas of the map
- Describe ways the model could be used and evaluated
Web map optimization and the advent of server-side caching
Organizing large maps in manageable “tiles” is not new
- Large paper map series are indexed in organized grids
- CGIS, a pioneering GIS, used “frames” to organize data (right)
- From Tomlinson, Calkins, & Marble, 1976, p. 56.
Other techniques for organizing maps in tiles or grid systems
- Pyramid technique successively generalizes rasters in groups of four cells (right)
- Quadtree structures index datasets in a hierarchy of quadrants
- From De Cola & Montagne, 1993, p
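The pyramid technique can be illustrated with a minimal sketch. It assumes a square raster whose side is a power of two, and it assumes that each coarser cell is the mean of its four child cells; the slides do not specify the generalization rule. The names `generalize` and `build_pyramid` and the sample values are hypothetical.

```python
def generalize(grid):
    """Collapse each 2x2 block of cells into one cell holding the block mean.

    Taking the mean is an assumption; any other summary rule could be used.
    """
    size = len(grid)
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4
            for c in range(0, size, 2)
        ]
        for r in range(0, size, 2)
    ]


def build_pyramid(grid):
    """Return all pyramid levels, finest first, ending at a single cell."""
    levels = [grid]
    while len(levels[-1]) > 1:
        levels.append(generalize(levels[-1]))
    return levels


# Made-up 4x4 raster used only for illustration.
raster = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [2, 2, 8, 8],
]
pyramid = build_pyramid(raster)
```

Each level holds one quarter as many cells as the level below it, which is the same four-to-one relationship a quadtree index expresses as a hierarchy of quadrants.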
Server-side caching of map tiles is new
- Tiled maps allow users to retrieve just the needed pieces of the map
- Cached map tiles give extremely fast performance
- Traditional map servers (ArcIMS, WMS) draw the image on the fly
- Early static map servers returned the entire map at once
Advent of tiled maps and server-side caching
- Microsoft Terra Server was an early deployment of massive amounts of cached imagery tiles
- Google Maps serves cached map tiles with AJAX techniques to create a “seamless” Web mapping experience
- Many sites have followed Google’s pattern
Caching options
Current caching options
- Current GIS software allows analysts to create tile caches for their own maps
  - ESRI’s ArcGIS Server
  - Mapnik
  - Microsoft MapCruncher
Caching can require enormous resources on the server
- Caches covering big areas at large scales can include millions of tiles
  - Many gigabytes, or even terabytes, of storage
  - Days, weeks, or sometimes months to generate
- Many GIS shops lack resources to maintain large caches
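A rough sense of how tile counts grow can be had from a short calculation. This is a sketch under stated assumptions, not a description of any particular product: it assumes a scheme in which level 0 is one tile and every tile splits into four at the next level, and the average tile size of 15 KB is a made-up figure to be replaced with measured values.

```python
def tiles_at_level(level):
    """Tiles covering the full map extent at one level (assumed 4**level)."""
    return 4 ** level


def total_tiles(max_level):
    """Tiles in a complete cache from level 0 through max_level."""
    return sum(tiles_at_level(level) for level in range(max_level + 1))


def estimated_storage_gb(tile_count, avg_tile_kb=15):
    """Rough storage estimate in GB; avg_tile_kb is an assumed average."""
    return tile_count * avg_tile_kb / (1024 ** 2)


full_cache = total_tiles(12)                    # 22,369,621 tiles
storage_gb = estimated_storage_gb(full_cache)   # roughly 320 GB at 15 KB per tile
```

Because each added level quadruples the count, the largest scales dominate both storage and generation time, which is why they are the natural target for selective caching.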
Selective caching as a strategy for saving resources
- Administrator can cache only the areas anticipated to be most visited
- Remaining areas can be:
  - Added to the cache “on-demand” when the first user navigates there
  - Covered with a “Data not available” tile
  - Left blank
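The three options for uncached areas can be sketched as a small tile store. This is an illustrative outline only: `SelectiveTileCache`, `render_tile`, `miss_policy`, and the placeholder bytes are hypothetical names and values, and a real map server would read and write image files rather than an in-memory dictionary.

```python
PLACEHOLDER_TILE = b"data-not-available"  # stand-in for a real placeholder image


class SelectiveTileCache:
    """Serve pre-cached tiles and apply one policy to every other request."""

    def __init__(self, render_tile, precached=None, miss_policy="on_demand"):
        if miss_policy not in ("on_demand", "placeholder", "blank"):
            raise ValueError(f"unknown miss policy: {miss_policy}")
        self.render_tile = render_tile    # hypothetical (level, row, col) -> bytes
        self.tiles = dict(precached or {})
        self.miss_policy = miss_policy

    def get(self, level, row, col):
        key = (level, row, col)
        if key in self.tiles:
            return self.tiles[key]        # already in the cache
        if self.miss_policy == "on_demand":
            # The first visitor pays the drawing cost; later visitors do not.
            self.tiles[key] = self.render_tile(level, row, col)
            return self.tiles[key]
        if self.miss_policy == "placeholder":
            return PLACEHOLDER_TILE
        return None                       # "blank": nothing is returned


render_calls = []


def fake_render(level, row, col):
    """Pretend renderer that records each call, for demonstration only."""
    render_calls.append((level, row, col))
    return f"tile-{level}-{row}-{col}".encode()


cache = SelectiveTileCache(fake_render, precached={(0, 0, 0): b"world"})
first = cache.get(3, 1, 2)    # drawn on demand and stored
second = cache.get(3, 1, 2)   # served from the cache
```

With the on-demand policy the renderer runs once per tile, so only the first user to visit an uncached area waits for the image to be drawn.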
Implications of selective caching
- Wise because some tiles (ocean, desert) will rarely, if ever, be accessed
- Requires an admission that some areas are more important than others
- Poses the challenge of predicting popular areas before the map is released
The need for a predictive model
Project presents a predictive model for where to pre-cache tiles
- “Which places are most interesting?”
- Inputs are datasets readily available to a GIS analyst
- Output vector features are a template for where to pre-cache tiles
Purpose of the model
- Help the majority of users see a fast Web map while minimizing cache creation time and storage space
Not a descriptive model
- A descriptive model would show which areas existing users have already viewed
- Microsoft Hotmap is a good example of a descriptive tool (right)
- Descriptive models are useful in deriving predictive models
- Source: Microsoft Hotmap
Advantages of a predictive model
- Doesn’t require the map to be deployed already
- Can include fixed and varying geographic phenomena
- Has applications far beyond map caching
Proposed methods
Study area and conditions
- Model will predict popular places for a general base map
- Study area of California
- May create models for thematic maps if time allows
Input datasets
- Populated / developed areas
- Road networks
- Coastlines
- Points of interest
Populated / developed areas
- Human Influence Index grid created by the Socioeconomic Data and Applications Center (SEDAC) at Columbia University
- Model selects all grid cells over a certain value
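Selecting grid cells over a cutoff might look like the following sketch. The grid values and the threshold of 30 are invented for illustration; the slides do not state the value the model uses, and `select_cells` is a hypothetical helper rather than part of any GIS package.

```python
def select_cells(grid, threshold):
    """Return the (row, col) positions whose value exceeds the threshold."""
    return {
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value > threshold
    }


# Made-up index values standing in for a small window of the grid.
sample_grid = [
    [5, 12, 40],
    [8, 35, 62],
    [2, 9, 30],
]
populated = select_cells(sample_grid, threshold=30)
```

Raising or lowering the threshold is the main lever for trading cache size against coverage of populated areas.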
Road networks
- Major roads buffered by a given distance
- All roads within national parks, monuments, historical sites, and recreation areas, buffered by a given distance
Coastlines
- All coastlines buffered by a given distance (wider buffer on inland side)
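An asymmetric coastal buffer can be approximated on a grid as a rough illustration. The sketch below is a raster stand-in for what would normally be a vector buffer in GIS software: it measures straight-line distance from each cell to the nearest coastline sample point and applies a wider limit on land than on water. The function name, the `is_land` test, and the distances are all assumptions. Passing equal distances gives an ordinary symmetric buffer of the kind described for roads.

```python
import math


def coastal_buffer(cells, coast_points, is_land, inland_dist, seaward_dist):
    """Select cells near the coast, using a wider limit on the land side."""
    selected = set()
    for cell in cells:
        nearest = min(math.dist(cell, point) for point in coast_points)
        limit = inland_dist if is_land(cell) else seaward_dist
        if nearest <= limit:
            selected.add(cell)
    return selected


# Toy layout: a straight coast along x = 0, land where x >= 0, water where x < 0.
cells = [(x, y) for x in range(-5, 6) for y in range(5)]
coast = [(0, y) for y in range(5)]
buffered = coastal_buffer(
    cells,
    coast,
    is_land=lambda cell: cell[0] >= 0,
    inland_dist=3,     # assumed distances, in cell units
    seaward_dist=1,
)
```

In this toy layout the selection reaches three cells inland but only one cell out to sea.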
Points of interest
- Set of 50 interesting points chosen by model author
  - Mountain peaks
  - Theme parks
  - Sports arenas
  - Etc.
- Represents a flexible layer that could be tailored to local needs
Deriving the output
- Merge all layers together
- Clip to California outline buffered by ½ mile
- Remove small holes and polygons
- Dissolve into one multipart feature
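The derivation steps can be mimicked on grid cells as a simplified sketch. The model itself works with vector features; here each layer is a set of `(row, col)` cells, the study area is another set standing in for the buffered outline, and a single `min_size` (in cells) serves as the cutoff for both small holes and small polygons. All names and the cutoff are assumptions made for illustration.

```python
def neighbors(cell):
    """Four edge-adjacent neighbors of a cell."""
    r, c = cell
    return ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))


def components(cells):
    """Group cells into 4-connected patches with a flood fill."""
    remaining = set(cells)
    groups = []
    while remaining:
        start = remaining.pop()
        group, stack = {start}, [start]
        while stack:
            for nb in neighbors(stack.pop()):
                if nb in remaining:
                    remaining.remove(nb)
                    group.add(nb)
                    stack.append(nb)
        groups.append(group)
    return groups


def derive_output(layers, study_area, min_size):
    merged = set().union(*layers)        # merge all layers together
    clipped = merged & study_area        # clip to the study area
    # Small holes: unselected patches below min_size that do not touch the
    # edge of the study area.
    holes = [
        gap for gap in components(study_area - clipped)
        if len(gap) < min_size
        and all(nb in study_area for cell in gap for nb in neighbors(cell))
    ]
    filled = clipped.union(*holes)
    # Small polygons: selected patches below min_size are dropped.
    kept = [patch for patch in components(filled) if len(patch) >= min_size]
    return set().union(*kept)            # one combined ("dissolved") result


study_area = {(r, c) for r in range(6) for c in range(6)}
ring = {(r, c) for r in range(1, 4) for c in range(1, 4)} - {(2, 2)}
isolated_point = {(5, 5)}
outside = {(2, 7)}
result = derive_output([ring, isolated_point, outside], study_area, min_size=2)
```

In the toy data the one-cell hole inside the ring is filled, the isolated cell is dropped, and the cell outside the study area is clipped away, leaving a solid 3 by 3 block.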
Using the model output
- Output is a vector dataset that can be used as a template for creating cached tiles
- Compare model output area with total area to understand percent coverage
- Compare model output with actual usage over time
- Refine if necessary
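The two comparisons can be expressed as simple ratios. This is a sketch of one possible way to evaluate the output, not a procedure given in the slides: `percent_coverage` assumes both areas are in the same units, `cache_hit_rate` assumes a request log reduced to tile identifiers, and the sample numbers are invented.

```python
def percent_coverage(output_area, total_area):
    """Share of the study area covered by the model output, in percent."""
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    return 100.0 * output_area / total_area


def cache_hit_rate(requested_tiles, precached_tiles):
    """Fraction of logged tile requests that fall on pre-cached tiles."""
    requested = list(requested_tiles)
    if not requested:
        raise ValueError("no requests to evaluate")
    hits = sum(1 for tile in requested if tile in precached_tiles)
    return hits / len(requested)


# Invented figures: output covers 25 of 100 area units, and 3 of 4 logged
# requests land on pre-cached tiles.
coverage = percent_coverage(25, 100)
precached = {(10, 4, 7), (10, 4, 8), (10, 5, 7)}
log = [(10, 4, 7), (10, 4, 7), (10, 5, 7), (10, 9, 9)]
hit_rate = cache_hit_rate(log, precached)
```

A low coverage percentage paired with a high hit rate would suggest the model is meeting its purpose; a low hit rate would point to areas where the inputs need refining.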
Limitations
- Models on a world or country level should account for Internet connectivity
- Input datasets have varying collection dates
- Input datasets vary in resolution and precision
- Maps with many scales might require multiple iterations and variations of the model
Questions?