
1 Efficient Handling of Massive (Terrain) Datasets
Professor Lars Arge, University of Aarhus and Duke University
AARHUS UNIVERSITET, Department of Computer Science
September 22, 2006

2 Who am I?
- PhD in Computer Science from AU, 1996
- Research career at Duke University in the US
  - Postdoc 1996 … Professor 2006
- Ole Rømer Professorship at AU, 2004
- Research:
  - Efficient algorithms and data structures for massive datasets
  - Lots of theoretical work…
  - … but also practical work, especially on GIS and terrain data
- Funded by e.g. US ARO and NSF, Danish basic and strategic research councils, private companies (COWI, DHI)

3 Massive Data Algorithmics
- Massive data is being acquired and used everywhere
- Storage management software is a billion-dollar industry
- Science is increasingly about mining massive data (Nature 2/06)
Examples (2002):
- Phone: AT&T 20 TB phone call database, wireless tracking
- Consumer: WalMart 70 TB database, buying patterns
- Web: Google index of 8 billion web pages
- Geography: NASA satellites generate terabytes each day

4 Terrain Data
- New technologies: much easier and cheaper to collect detailed data
- Previous "manual" or radar-based methods
  − Often 30 meters between data points
  − Sometimes 10 meter data available
- New laser scanning methods (LIDAR)
  − Less than 1 meter between data points
  − Centimeter accuracy (previously meter accuracy)
Denmark:
- ~2 million points at 30 meter (<<1 GB)
- ~18 billion points at 1 meter (>>1 TB)
- COWI A/S is in the process of scanning DK
- NC was scanned after Hurricane Floyd in 1999

5 Massive data = I/O-Bottleneck
- I/O is often the bottleneck when handling massive datasets
- Disk access is 10^6 times slower than main memory access!
- Disk systems try to amortize the large access time by transferring large contiguous blocks of data
- Data needs to be stored and accessed in a way that takes advantage of blocks!
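The effect of block transfers can be illustrated with a small simulation. This is a minimal sketch under a simplified model: data moves between disk and memory in fixed-size blocks, and memory holds a few blocks with least-recently-used eviction. The function name `count_ios` and all parameter values are illustrative assumptions, not taken from the talk.

```python
from collections import OrderedDict


def count_ios(accesses, block_size, cache_blocks):
    """Count block transfers needed to serve a sequence of item accesses.

    Assumed model: items are stored contiguously on disk in blocks of
    `block_size` items; memory holds `cache_blocks` blocks with LRU eviction.
    """
    cache = OrderedDict()  # block id -> True, ordered from least to most recent
    ios = 0
    for index in accesses:
        block = index // block_size
        if block in cache:
            cache.move_to_end(block)  # already in memory: no transfer
        else:
            ios += 1  # block must be read from disk
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used block
    return ios


n, block_size, cache_blocks = 10_000, 100, 4
blocks = n // block_size

# Same items, two orders: in storage order, and jumping a full block each step.
sequential = list(range(n))
strided = [block_size * (i % blocks) + i // blocks for i in range(n)]

print("sequential I/Os:", count_ios(sequential, block_size, cache_blocks))
print("strided I/Os:   ", count_ios(strided, block_size, cache_blocks))
```

Both orders touch every item exactly once, but the sequential order uses each loaded block fully before moving on, while the strided order evicts every block before it is needed again.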

6 I/O-efficient Algorithms
- Taking advantage of block access is very important
- Traditionally, algorithms were developed without block considerations
- I/O-efficient algorithms lead to large runtime improvements
[Figure: plot of running time against data size; curves labeled "Normal algorithm" and "I/O-efficient algorithm"; marker labeled "Main memory size"]
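One classic example of an algorithm designed around block access is external merge sort. The slide does not name a specific algorithm, so the following is only an illustrative sketch: the function name `external_sort`, the `memory_limit` parameter, and the use of temporary text files to stand in for disk are all assumptions.

```python
import heapq
import tempfile


def external_sort(values, memory_limit):
    """Yield integers from `values` in sorted order, holding at most
    `memory_limit` of them in memory while forming sorted runs."""
    runs = []
    chunk = []

    def flush():
        # Sort the in-memory chunk and write it sequentially as one run.
        chunk.sort()
        run = tempfile.TemporaryFile("w+")
        run.writelines(f"{v}\n" for v in chunk)
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for v in values:
        chunk.append(v)
        if len(chunk) >= memory_limit:
            flush()
    if chunk:
        flush()

    # Merge the runs; each run is read strictly front to back.
    streams = [(int(line) for line in run) for run in runs]
    try:
        yield from heapq.merge(*streams)
    finally:
        for run in runs:
            run.close()


print(list(external_sort([5, 3, 9, 1, 7, 2, 8], memory_limit=3)))
```

Every run is written and read sequentially, which is the access pattern that block transfers reward.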

7 Scalability: Hierarchical Memory
- Block access is not only important at the disk level
- Machines have a complicated memory hierarchy
- Levels get larger and slower
- Block transfers on all levels
[Figure: plot of running time against data size; labels: L1, L2, RAM]

8 My Research Work
- Theoretical work on I/O- (and cache-) efficient algorithms
  - Data structures, computational geometry, graph theory, …
  - Focus on Geographic Information Systems problems
- Algorithm engineering work, e.g.
  - TPIE system for simple, efficient, and portable implementation of I/O-efficient algorithms
  - Software for terrain data processing
    - LIDAR data handling
    - Terrain flow computations

9 Example: Terrain Flow
- Terrain water flow has many important applications
  - Predict location of streams, areas susceptible to floods…
- Conceptually, flow is modeled using two basic attributes
  - Flow direction: the direction water flows at a point
  - Flow accumulation: the amount of water flowing through a point
- Flow accumulation is used to compute other hydrological attributes: drainage network, topographic convergence index…
[Figure labels: 7 am, 3 pm]
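The two attributes can be sketched on a tiny height grid. This is a minimal in-memory illustration, not the I/O-efficient method used in the talk's software. It assumes a single-flow-direction rule in which each cell sends all its water to its steepest lower neighbor among the eight surrounding cells, and that each cell contributes one unit of water. The function names and the sample grid are hypothetical.

```python
import math

# Offsets to the eight neighbors of a grid cell.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def flow_directions(height):
    """Map each cell to its steepest strictly lower neighbor (or None)."""
    rows, cols = len(height), len(height[0])
    direction = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Drop in height divided by distance to the neighbor.
                    slope = (height[r][c] - height[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best, best_slope = (nr, nc), slope
            direction[(r, c)] = best
    return direction


def flow_accumulation(height, direction):
    """Each cell starts with one unit and passes its total downslope."""
    acc = {cell: 1 for cell in direction}
    # Visit cells from highest to lowest so that a cell's total is final
    # before it is passed on to its lower neighbor.
    by_height = sorted(direction, key=lambda rc: height[rc[0]][rc[1]], reverse=True)
    for cell in by_height:
        target = direction[cell]
        if target is not None:
            acc[target] += acc[cell]
    return acc


height = [
    [9, 8, 7],
    [8, 5, 4],
    [7, 4, 1],
]
direction = flow_directions(height)
acc = flow_accumulation(height, direction)
for r in range(3):
    print([acc[(r, c)] for c in range(3)])
```

In this sample grid all water drains toward the lowest corner, so that cell accumulates the contribution of every cell in the grid.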

10 Terrain Flow Accumulation
- Collaboration with environmental researchers at Duke University
- Appalachian Mountains dataset:
  - 800 x 800 km at 100 m resolution: a few gigabytes
- On a ½ GB machine:
  - ArcGIS:
    - Performance somewhat unpredictable
    - Days on a few gigabytes of data
    - Many gigabytes of data…
- The Appalachian dataset would be terabytes in size at 1 m resolution
[Slide callout: 14 days!!]
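The jump from gigabytes to terabytes follows from how the number of grid cells scales with resolution. A small worked calculation, assuming a square grid with one cell per resolution step in each direction (the helper name `grid_cells` is illustrative):

```python
def grid_cells(side_km, resolution_m):
    """Number of cells in a square grid covering side_km x side_km."""
    per_side = (side_km * 1000) // resolution_m
    return per_side * per_side


cells_100m = grid_cells(800, 100)
cells_1m = grid_cells(800, 1)

print(f"cells at 100 m: {cells_100m:,}")
print(f"cells at 1 m:   {cells_1m:,}")
print(f"growth factor:  {cells_1m // cells_100m:,}")
```

Refining the resolution by a factor of 100 multiplies the cell count by 100 squared, so a dataset of a few gigabytes at 100 m grows to tens of terabytes at 1 m if the storage per cell stays the same.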

11 Terrain Flow Accumulation: TerraFlow
- We developed theoretically I/O-optimal algorithms
- TPIE implementation was very efficient
  - Appalachian Mountains flow accumulation in 3 hours!
- Developed into a comprehensive software package for flow computation on massive terrains: TerraFlow
  - Efficient: 2-1000 times faster than existing software
  - Scalable: >1 billion elements!
  - Flexible: flexible flow modeling (direction) methods
- Extension to ArcGIS

12 LIDAR Terrain Data Work
- Now a "pipeline" of software for handling terrain (LIDAR) data
  - Points to DEM (incl. breaklines)
  - DEM flow modeling (incl. "flooding", "flat" routing, "noise" reduction)
  - DEM flow accumulation (incl. river extraction)
  - DEM hierarchical watershed computation
- All work for both grid and TIN DEMs
- Capable of handling massive datasets
- Test dataset: 400M point Neuse river basin (1/3 NC) (>17 GB)

13 Examples of Ongoing Terrain Work
- Terrain modeling, e.g.
  - "Raw" LIDAR to point conversion (LIDAR point classification) (incl. feature, e.g. bridge, detection/removal)
  - Further improved flow and erosion modeling (e.g. carving)
  - Contour line extraction (incl. smoothing and simplification)
  - Terrain (and other) data fusion (incl. format conversion)
- Terrain analysis, e.g.
  - Choke point, navigation, visibility, change detection, …
- Major grand goal:
  - Construction of a hierarchical (simplified) DEM where derived features (water flow, drainage, choke points) are preserved/consistent

14 Thanks
Lars Arge
large@daimi.au.dk
Work supported in part by
- US National Science Foundation ESS grant EIA–98070734, RI grant EIA–9972879, CAREER grant EIA–9984099, and ITR grant EIA–0112849
- US Army Research Office grants W911NF-04-1-0278 and DAAD19-03-1-0352
- Ole Rømer Scholarship from Danish Science Research Council
- NABIIT grant from the Danish Strategic Research Council

