Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lars Arge 1/13 Efficient Handling of Massive (Terrain) Datasets Lars Arge A A R H U S U N I V E R S I T E T Department of Computer Science.

Similar presentations

Presentation on theme: "Lars Arge 1/13 Efficient Handling of Massive (Terrain) Datasets Lars Arge A A R H U S U N I V E R S I T E T Department of Computer Science."— Presentation transcript:

1 Lars Arge 1/13 Efficient Handling of Massive (Terrain) Datasets Lars Arge A A R H U S U N I V E R S I T E T Department of Computer Science

2 Lars Arge 2/13 Massive Data Algorithmics  Massive data being acquired/used everywhere  Storage management software is billion-$ industry  Science is increasingly about mining massive data (Nature 2/06) Examples (2002):  Phone: AT&T 20TB phone call database, wireless tracking  Consumer: WalMart 70TB database, buying patterns  WEB: Google index 8 billion web pages  Geography: NASA satellites generate Terrabytes each day

3 Lars Arge 3/13 Terrain Data  New technologies: Much easier/cheaper to collect detailed data  Previous ‘manual’ or radar based methods −Often 30 meter between data points −Sometimes 10 meter data available  New laser scanning methods (LIDAR) −Less than 1 meter between data points −Centimeter accuracy (previous meter) Denmark:  ~2 million points at 30 meter (<<1GB)  ~18 billion points at 1 meter (>>1TB)  COWI (and other) now scanning DK  NC scanned after Hurricane Floyd in 1999

4 Lars Arge 4/13 Massive data = I/O-Bottleneck  I/O is often bottleneck when handling massive datasets  Disk access is 10 6 times slower than main memory access!  Disk systems try to amortize large access time transferring large contiguous blocks of data  Need to store and access data to take advantage of blocks! “The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.” (D. Comer)

5 Lars Arge 5/13 I/O-efficient Algorithms  Taking advantage of block access important  Traditionally algorithms developed without block considerations  I/O-efficient algorithms leads to large runtime improvements data size running time Normal algorithm I/O-efficient algorithm Main memory size

6 Lars Arge 6/13 RAMRAM Scalability: Hierarchical Memory  Block access not only important on disk level  Machines have complicated memory hierarchy  Levels get larger and slower  Block transfers on all levels L1L1 L2L2 data size running time

7 Lars Arge 7/13 My Research Work  Theoretically I/O- (and cache-) efficient algorithms work  Data structures, computational geometry, graph theory, …  Focus on Geographic Information Systems problems  Algorithm engineering work, e.g  TPIE system for simple, efficient, and portable implementation of I/O-efficient algorithms  Software for terrain data processing  LIDAR data handling  Terrain flow computations

8 Lars Arge 8/13 Example: Terrain Flow  Terrain water flow has many important applications  Predict location of streams, areas susceptible to floods…  Conceptually flow is modeled using two basic attributes  Flow direction: The direction water flows at a point  Flow accumulation: Amount of water flowing through a point  Flow accumulation used to compute other hydrological attributes: drainage network, topographic convergence index… 7 am 3pm

9 Lars Arge 9/13 Terrain Flow Accumulation  Collaboration with environmental researchers at Duke University  Appalachian mountains dataset:  800x800km at 100m resolution  a few Gigabytes  On ½GB machine:  ArcGIS:  Performance somewhat unpredictable  Days on few gigabytes of data  Many gigabytes of data…..  Appalachian dataset would be Terabytes sized at 1m resolution 14 days!!

10 Lars Arge 10/13 Terrain Flow Accumulation: TerraFlow  We developed theoretically I/O-optimal algorithms  TPIE implementation was very efficient  Appalachian Mountains flow accumulation in 3 hours!  Developed into comprehensive software package for flow computation on massive terrains: TerraFlow  Efficient: times faster than existing software  Scalable: >1 billion elements!  Flexible: Flexible flow modeling (direction) methods  Extension to ArcGIS

11 Lars Arge 11/13 LIDAR Terrain Data Work: TerraStream  Now TerraStream software “pipeline” for handling terrain data  Points to DEM (incl. breaklines)  DEM flow modeling (incl “flooding”, “flat” routing, “noise” reduce)  DEM flow accumulation (incl river extraction)  DEM hierarchical watershed computation  All work for both grid and TIN DEM’s  Capable of handling massive datasets  Test dataset: 400M point Neuse river basin (1/3 NC) (>17GB)

12 Lars Arge 12/13 Examples of Ongoing Terrain Work  Terrain modeling, e.g  “Raw” LIDAR to point conversion (LIDAR point classification) (incl feature, e.g. bridge, detection/removal)  Further improved flow and erosion modeling (e.g. carving)  Contour line extraction (incl. smoothing and simplification)  Terrain (and other) data fusion (incl format conversion)  Terrain analysis, e.g  Choke point, navigation, visibility, change detection,…  Major grand goal:  Construction of hierarchical (simplified) DEM where derived features (water flow, drainage, choke points) are preserved/consistent

13 Lars Arge 13/13 Thanks Lars Arge Work supported in part by  US National Science Foundation ESS grant EIA– , RI grant EIA– , CAREER grant EIA– , and ITR grant EIA–  US Army Research Office grants W911NF and DAAD  Ole Rømer Scholarship from Danish Science Research Council  NABIIT grant from the Danish Strategic Research Council  Danish National Research foundation (MADALGO center)

Download ppt "Lars Arge 1/13 Efficient Handling of Massive (Terrain) Datasets Lars Arge A A R H U S U N I V E R S I T E T Department of Computer Science."

Similar presentations

Ads by Google