Published by Pierce Allen. Modified over 8 years ago.
1
1 GeoServer in Production Andrea Aime (OpenGeo) and the GeoServer community
2
2 Executive summary Reliability: reduce failed requests Availability: make sure the service stays up in the face of difficulties Performance: make the software stack fast Real world examples INTRODUCTION
3
3 Reliability
4
4 Measures failed requests in a given amount of time Avoid errors, exert control over your service: – Simple is better – Control resources used by each request RELIABILITY
5
5 Simplify Disable all services that you won't need. For example, if you just need WMS and read only WFS: – Disable WCS – Disable WFS-T RELIABILITY
6
6 Control resources Set up service resource limits: – WMS: max memory per request, max time per request – WFS: max features returned – Web container: max number of concurrent connections allowed RELIABILITY WFS WMS
7
7 Sizing the WMS memory limits Make sure the JVM (Java Virtual Machine) has enough memory to handle the expected WMS workload. Memory used by a request: – bpp × width × height × maxfts, where bpp is the bytes per pixel (3 without transparency, 4 with transparency), width and height are the size of the image, and maxfts is the maximum number of “FeatureTypeStyle” elements found in the styles used by the request RELIABILITY
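The formula on this slide can be turned into a quick estimate. The following is a minimal sketch, not part of the original deck; the function name `wms_request_bytes` and the sample request are illustrative assumptions.

```shell
# Rough per-request raster memory, using the slide's formula:
#   bpp x width x height x maxfts
# wms_request_bytes is a hypothetical helper name, not a GeoServer tool.
wms_request_bytes() {
  local bpp=$1 width=$2 height=$3 maxfts=$4
  echo $(( bpp * width * height * maxfts ))
}

# Assumed example: 1024x768 image with transparency (4 bytes per pixel)
# and styles using at most 2 FeatureTypeStyle elements.
bytes=$(wms_request_bytes 4 1024 768 2)
echo "$bytes bytes, about $(( bytes / 1024 / 1024 )) MB"
```

Multiplying this figure by the number of requests allowed in parallel gives a first idea of how much heap the WMS workload may need.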
8
8 Memory usage examples Memory allocated to produce a certain output, varying size, transparency and number of FeatureTypeStyle elements. The actual output size is much smaller, around 300KB for the last request RELIABILITY
9
9 Resources used by WFS WFS normally works in full streaming: get a record, output it, move to the next Memory consumption: – Minimal for GML, JSON, CSV – Shape-zip output format uses disk – Excel fully in RAM, pay attention Most outputs are big, but GeoServer transparently compresses them to reduce their size (HTTP 1.1 gzip compression). RELIABILITY
10
10 Response time The bigger the output, the longer it takes to return it. Parallel users share your available bandwidth, and each gets only a fraction of it: the more concurrent users, the longer it takes to deliver responses. Plan for enough bandwidth to handle the expected peak load. Configure your server to allow no more than X requests in parallel (Tomcat allows 200 by default, which is too many)
11
11 Transfer time sampler Time it takes to deliver a 300KB response (screen sized complex WMS response, or 8000 WFS features, gzip compressed), varying the available bandwidth
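The relation between response size, bandwidth and concurrent users can be sketched with simple arithmetic. This is an idealized estimate, not the data behind the slide's chart: it assumes the bandwidth is split evenly among users, 1 KB = 1000 bytes, and no protocol overhead or latency. The helper name `transfer_ms` is hypothetical.

```shell
# Idealized transfer time in milliseconds.
#   $1 = response size in KB (assumed 1 KB = 1000 bytes)
#   $2 = total available bandwidth in kbit/s
#   $3 = concurrent users sharing that bandwidth evenly (an assumption)
transfer_ms() {
  local size_kb=$1 bandwidth_kbps=$2 users=$3
  # size in kilobits, divided by each user's share of the bandwidth
  echo $(( size_kb * 8 * 1000 * users / bandwidth_kbps ))
}

# A 300KB response over a 1 Mbit/s link, alone and with 10 parallel users
echo "1 user:   $(transfer_ms 300 1000 1) ms"
echo "10 users: $(transfer_ms 300 1000 10) ms"
```

The linear growth with the number of users is the reason to cap the number of requests served in parallel.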
12
12 Tomcat configuration sample Tomcat configures the maximum number of requests served in parallel in $TOMCAT/conf/server.xml:
...
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           maxThreads="20" minSpareThreads="20" />
...
13
13 Availability
14
14 Availability Measures uptime percentage: uptime / (uptime + downtime) Keep the service up, share the load – Actively control the service (watchdog) – Add redundancy – Remove single points of failure AVAILABILITY
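As a worked example of the uptime formula, here is a small sketch. The helper name `availability_pct` and the sample figures are assumptions for illustration; `awk` is used because shell arithmetic is integer only.

```shell
# Availability as a percentage: uptime / (uptime + downtime) * 100.
# Both arguments must use the same unit (hours, minutes, ...).
availability_pct() {
  awk -v up="$1" -v down="$2" 'BEGIN { printf "%.3f\n", 100 * up / (up + down) }'
}

# Assumed example: 9 hours of downtime over one year (8760 hours)
availability_pct 8751 9
```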
15
15 Watchdog External software that periodically checks whether the server is still up and running. [Diagram callouts: the watchdog asks a healthy GeoServer “Still there?” and gets “Of course I am!”; it asks a “So busy!” GeoServer “Still there?”, then “Die!” and “Start new one”; it asks a missing GeoServer “Anybody?”, then “Start new one”.]
16
16 Schedule periodically with cron:
#!/bin/bash
# Setup variables
PID_FILE=/var/run/catalina/catalina.pid
HTTP_URL=http://localhost:8080/geoserver/openlayers/img/west-mini.png
CATALINA_SCRIPT=/opt/tomcat-6.0/bin/catalina.sh
GEOSERVER_LOG=/var/lib/geoserver_data/default/logs/geoserver.log
CATALINA_LOG=/opt/tomcat-6.0/logs/catalina.out
LOG_COPY=/home/tomcat
PID=`cat $PID_FILE`

# Kills and restarts GS, keeping a copy of the logs
function catalinarestart() {
  $CATALINA_SCRIPT stop
  sleep 5
  kill -9 $PID
  cp $GEOSERVER_LOG $LOG_COPY
  cp $CATALINA_LOG $LOG_COPY
  $CATALINA_SCRIPT start
}

if [ -d /proc/$PID ]
then
  # The process is alive: check that it still answers HTTP requests
  # (one try, 20 second timeout)
  wget $HTTP_URL -t 1 --timeout=20 -O /dev/null &> /dev/null
  if [ $? -ne "0" ]
  then
    echo Restarting Catalina because $HTTP_URL does not respond, pid $PID
    catalinarestart
  #else
    #echo No Problems!
  fi
else
  # The process is gone: start a new one
  echo Restarting Catalina because pid $PID is dead.
  catalinarestart
fi
17
17 Load balancing Make it so a single failure won't bring down the entire service, and enable horizontal scalability OGC services are stateless: no need to have a client always talk to the same server, there is no session. No need for application server clusters, a simple load balancer will suffice AVAILABILITY OGC Server OGC Server OGC Server HTTP Load balancer Single point of failure Redundant, single failure safe
18
18 High Availability (HA) A setup with no single points of failure – GeoServer is replicated – The data sources are replicated – The load balancer itself is replicated Usually done at the load balancer level, either via hardware (enterprise level load balancer) or software AVAILABILITY
19
19 A simple example The very minimal configuration, only 2 servers to make an HA setup Only open source software: – vrrpd daemon that provides transparent fail-over over a single IP (http://off.net/~jme/vrrpd/) – balance as the software load balancer (http://www.inlab.de/balance.html) – GeoServer and PostGIS AVAILABILITY
20
20 [Diagram labels: Net; Router; 10.1.1.2 / 10.1.1.4 with vrrpd1, balance1, GeoServer1, PostGIS; 10.1.1.3 / 10.1.1.4 with vrrpd2, balance2, GeoServer2, PostGIS. Callouts: “10.1.1.4, please execute this GetMap”; “10.1.1.4, that would be me!”; “Sending requests to both GeoServer-s”.] Inspired by “the ultimate cheapskate cluster” http://siag.nu/pen/vrrpd-linux2.shtml AVAILABILITY
21
21 [Diagram labels: Net; Router; 10.1.1.2, 10.1.1.4 with vrrpd1, balance1, GeoServer1, PostGIS; 10.1.1.3, 10.1.1.4 with vrrpd2, balance2, GeoServer2, PostGIS2. Callouts: “10.1.1.4, please execute this GetMap”; “10.1.1.4? Err... I guess it's my turn... yeah, it's me!”; “Oh, GeoServer1 is not responding, better stop routing requests there”; “Wow, what's with all these requests??”.] AVAILABILITY
22
22 Performance
23
23 Performance Once the service is reliable, make it fast Performance is related to a number of items: – Hardware – Data level optimization – Software optimization – Network saturation – Caching –... PERFORMANCE
24
24 Software configuration Sun JVM is significantly faster than other alternatives in WMS tests. Use a recent JVM release. Make sure the VM is running in server mode. Check “Java server ergonomics”: http://java.sun.com/docs/hotspot/gc5.0/ergo5.html
> java -version
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) Server VM (build 14.2-b01, mixed mode)
PERFORMANCE
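The server-mode check on this slide can be automated. The sketch below is an illustration, not part of the original deck; `is_server_vm` is a hypothetical helper that simply looks for the "Server VM" string that HotSpot prints, as in the sample output on the slide.

```shell
# Reads `java -version` output on stdin and succeeds if it mentions "Server VM".
# is_server_vm is a hypothetical helper name.
is_server_vm() {
  grep -q "Server VM"
}

# java -version writes to stderr, hence the 2>&1 redirect.
# The check is skipped when no java binary is on the PATH.
if command -v java >/dev/null 2>&1; then
  if java -version 2>&1 | is_server_vm; then
    echo "JVM is running in server mode"
  else
    echo "JVM is not running in server mode, consider adding -server"
  fi
fi
```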
25
25 Detailed JVM setup If the machine is not server class, or if you want to get the last bit of performance: – -server: enables the server JIT compiler – -Xms2048m -Xmx2048m: have the JVM use two gigabytes of memory, always – -XX:+UseParallelOldGC -XX:+UseParallelGC: enables multi-threaded garbage collection, useful if you have more than two cores – -XX:NewRatio=2: informs the JVM there will be a high number of short lived objects – -XX:+AggressiveOpts: enables experimental optimizations that will be defaults in future versions of the JVM
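One way to pass these options to a Tomcat-hosted GeoServer is through the `JAVA_OPTS` environment variable, for instance in a `bin/setenv.sh` file, which Tomcat's `catalina.sh` sources when it exists. The fragment below is a sketch under that assumption. The flags are the ones listed on the slide and target the Java 6 era JVMs the deck discusses; some of them have been deprecated or removed in later JDK releases, so check them against the JVM actually in use.

```shell
# Sketch of a setenv.sh fragment (the file location is an assumption about
# the local Tomcat install; the flags come from the slide).
JAVA_OPTS="-server -Xms2048m -Xmx2048m"
JAVA_OPTS="$JAVA_OPTS -XX:+UseParallelGC -XX:+UseParallelOldGC"
JAVA_OPTS="$JAVA_OPTS -XX:NewRatio=2 -XX:+AggressiveOpts"
export JAVA_OPTS
```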
26
26 JAI and JAI Image I/O Java Advanced Imaging: raster manipulation and analysis JAI Image I/O: image encoding and decoding Both come in a pure Java version with the option to install native components to boost performance [Diagram labels: output encoding (png, jpeg); raster scaling, slicing, re-projection; raster data reading (TIFF); each step marked JAI or JAI IIO] PERFORMANCE
27
27 JDK/JAI comparison PERFORMANCE TIGER roads (3M roads), Texas area, randomized requests Same benchmark done in the “benchmark shootout”, but with simpler styling (plain black line), and run on a slower machine Compares OpenJDK6 with Sun JDK 1.6.0_16, each run with and without the help of the JAI and JAI Image I/O native extensions
28
28 One minute data optimization Vector: index the attributes you're filtering on (spatial, non spatial) Raster: add tiling and overviews to single files, create pyramids for bigger data sets Style: use scale dependencies to draw less, reserve complex styling for the higher zoom levels
29
29 Real world examples
30
30 MassGIS ArcSDE 9.2 sp5 Oracle 10.2 Balance SAMBA GeoServer Squid Watchdog GeoServer Squid Watchdog GeoServer Squid Watchdog 2.8GHz Xeon 1.5GB RAM Debian Squid HTTP accelerator 700MHz Pentium Debian Vector data Raster data Serving over 22,000 requests a day A mix of open source and proprietary GIS http://lyceum.massgis.state.ma.us/wiki
31
31 Enterprise Load balancer GeoServer Enterprise Load balancer WMS/WFS requests Tile requests Testing/pre-seeding server Ready made tile cache PostGIS Config updates via Subversion TileCache GeoServer PostGIS TileCache Production Staging Balance GeoServer PostGIS TileCache Balance Trimet A fully OS example Two enterprise load balancers in fail-over http://maps.trimet.org/
32
32 GeoServer TileCache DeeGree 52North Enterprise Load balancer 3 servers for WMS and tiles 3 servers for WFS and WPS Oracle RAC Oracle RAC Oracle RAC 5 servers Spatial database Enterprise Load balancer Two load balancers in HA configuration Each server is an Opteron 880, 2.4GHz 8GB memory, Windows Server 2003 64 bit CartoCiudad A highly available Windows Server cluster Instituto Geográfico Nacional, Spain http://www.cartociudad.es/ http://www.cartociudad.es/portal
33
33 Oracle RAC Oracle RAC GIS blade Oracle RAC Oracle RAC HA load Balancer blade Spare blade NAS server GeoServer VM * 8 PostGIS VM GIS blade: 9 VMWare Ubuntu Server LTS, 8 with GS, 1 with PostGIS Apache mod_proxy mod_balance * 5 * 2 Each blade - 2 x Intel Quad-Core Xeon E5310 / 1.6 GHz - 32 GB RAM * 5 4x - HP 2312fc DC Modular Smart Array 40 x - HP MSA2 1TB 7.2K rpm 3.5 inch SATA HDD Italian Civil Protection, GeoSDI team A “cloud in the house” approach with powerful hardware and virtualization http://www.geosdi.org/
34
34 That's all, folks! Questions?
35
35 Bonus trail! A few of the things I'd have liked to talk about, but... would not fit in 25 minutes – Requirements checklist: determine what the service is going to do – Quick tips about data and style setup – GeoWebCache considerations
36
36 REQUIREMENTS CHECKLIST
37
37 Workload What services? – WMS, WFS, WCS,... What kind of requests? – Read/write mix – Output formats used Requests size – WMS: tiled vs plotter printing – WFS: how long will a request be up Data sources – Optimized for web serving REQUIREMENTS
38
38 Availability requirements How long can the server be down? – Working hours availability, 24x7,... – Cost of each hour of downtime (in terms of money and/or image) How often can the server be down before causing major problems? REQUIREMENTS
39
39 Security requirements Any write access? Any sensitive/restricted information? What kind of applications need to access your servers, and what kind of security do they support natively? Do you control the clients, can you ask them to install extra software? REQUIREMENTS
40
40 Performance requirements Any service level agreement? Examples: – no more than 5s to generate an 800x600 map with a certain set of layers under peak load – Serve 1000 features in GML2 within 10s at peak load If none, make some reasonable assumptions, have some objectives to meet anyway REQUIREMENTS
41
41 WFS Extras
42
42 WFS response size sampler ● Size of the response to a WFS request for the tiger:tiger_roads sample layer (8000 features, simple ones) ● Transparent gzip compression reduces the amount of data sent by 90% in this test
43
43 Quick tips about data preparation
44
44 Vector data setup Choose formats designed for performance: – Spatial databases – Shapefiles Avoid interchange formats, such as raw GML, as data sources Always use spatial indexing For multi-scale rendering – use attribute indexing as well – consider using different data sources for different scales (different levels of generalization) PERFORMANCE
45
45 Setting up raster data Prepare the data for web serving Tiling and multi-resolution are the key Single file: GeoTIFF with tiling and overviews or wavelet formats (ECW, MrSID) Many files: external pyramid (file system or DBMS) GDAL utilities are a must
46
46 Styling setup Use scale limits to avoid rendering too many features at once – Draw few features when zoomed out – Draw important features at middle scales – Draw everything only when zoomed in enough that a request paints about 1000 features Be careful with heavy styling: partial transparency, labeling, halos, multiple feature type styles and multiple symbolizers per feature PERFORMANCE
47
47
48
48 Some more details about GeoWebCache
49
49 Avoid repeated work: caching Trading CPU time for disk space. A tile cache (GeoWebCache, TileCache) can perform caching on layers that – do not change over time (or change slowly) – do not require on the fly custom styling and filtering If there is a trend of repeated WFS queries, a Squid reverse proxy can come in handy as well. The speed up factor of a fully cached layer can be 10, 100 or even 1000 PERFORMANCE
50
50 GWC vs GeoServer, time PERFORMANCE 256x256 tiles 1.2 GB .shp file with over 5 million lines representing roads and rivers over Texas (TIGER 2008) Simple black line style GWC is fully pre-cached Speedup factor … between 20 and 225 Average resp. time ms/number of concurrent users
51
51 GWC vs GeoServer, space http://geowebcache.org/trac/wiki/resources How much time and space to prepare the cache for the same area (Texas)? Assumptions: – time to draw a 768x768 image = 0.1s – tile is 20KB on average – 4 CPUs
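The slide's assumptions lend themselves to a back-of-the-envelope estimate for a given number of tiles. The sketch below is an illustration only: the helper names are hypothetical, the tile count is an input rather than a figure from the slide, and it assumes that the 768x768 image is a 3x3 metatile yielding nine 256x256 tiles and that seeding scales linearly across the 4 CPUs.

```shell
# Disk space in MB for a number of tiles, at 20KB per tile on average.
cache_size_mb() {
  local tiles=$1
  echo $(( tiles * 20 / 1024 ))
}

# Seeding time in seconds: 9 tiles per 768x768 image (assumed 3x3 metatile),
# 0.1s per image, work spread over 4 CPUs -> tiles / (9 * 10 * 4).
seed_seconds() {
  local tiles=$1
  echo $(( tiles / (9 * 10 * 4) ))
}

# Assumed example: one million tiles
echo "space: $(cache_size_mb 1000000) MB, time: $(seed_seconds 1000000) s"
```

Since each zoom level typically holds about four times as many tiles as the previous one, the deepest levels dominate both figures.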
52
52 What about security? Important topic for sure. Also a very wide one, deserves its own presentation. Some possible topics: Authentication protocols and their support in client applications Authentication backends (built in, LDAP, JAAS,...) Getting SSL keys Using SSL in a clustered environment, issues with load balancing and number of keys to be acquired Built in security system vs security proxies, pros and cons The GeoXACML standard Performance considerations