An Efficient External Sorting Algorithm for Flash Memory Embedded Devices Tyler Cossentine - M.Sc. Thesis Defense.

2 Overview
Introduction
Previous work
Flash MinSort
Experimental results
Conclusions

3 Introduction
Embedded systems are devices that perform a few simple functions.
Embedded devices typically have limited power, memory and computational resources.
Many embedded systems applications involve storing and querying large datasets.
Sorting algorithms are commonly used in query processing.

4 Embedded Devices
Not designed to be general-purpose devices.
   o Wireless sensor networks, smart cards, etc.
Can communicate with other devices through wired or wireless interfaces.
Hardware constraints:
   o Battery powered
   o Low-power microcontroller
   o Limited memory (as little as 1 kB)
   o Small amount of local storage (flash or EEPROM)

5 Sensor Networks
Sensor networks are used in military, environmental, agricultural and industrial applications.
A wireless sensor node contains a microcontroller, a sensing system, local storage, a battery and a wireless radio.
Devices may process data locally or send it to a common collection point (sink) for processing.
On-device data storage and query processing have the potential to reduce communication and energy use [6][8].

6 Flash Memory
A type of EEPROM
   o Available in higher capacities
   o Organized as pages of data
   o A page must be erased before it is written
   o The erase unit is typically a block of pages
Two types: NOR and NAND
   o NOR memory supports byte-level reads
   o NAND requires an error-correcting code (ECC)
Unique performance characteristics
   o Asymmetric read and write costs (reads are 10-100 times faster)
   o Low-cost random reads
   o Memory wear
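
The page and block rules above can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not a driver for any real chip: the class `ToyFlash` and its methods are hypothetical, and the model covers only erase-before-write, block-granularity erase and per-block wear counting (no timing, no ECC).

```python
ERASED = 0xFF  # erased flash cells read back as all 1 bits


class ToyFlash:
    """Hypothetical in-memory model of page-organized flash (not a real driver)."""

    def __init__(self, num_blocks: int, pages_per_block: int, page_size: int) -> None:
        self.pages_per_block = pages_per_block
        self.page_size = page_size
        total = num_blocks * pages_per_block
        self.pages = [bytearray([ERASED] * page_size) for _ in range(total)]
        self.writable = [True] * total        # True while a page is still erased
        self.erase_counts = [0] * num_blocks  # wear accumulates per erase block

    def read_page(self, page: int) -> bytes:
        # Any page can be read directly; there is no seek penalty to model.
        return bytes(self.pages[page])

    def write_page(self, page: int, data: bytes) -> None:
        if not self.writable[page]:
            raise ValueError(f"page {page} must be erased before it is written")
        if len(data) > self.page_size:
            raise ValueError("data is larger than one page")
        self.pages[page][: len(data)] = data
        self.writable[page] = False

    def erase_block(self, block: int) -> None:
        # Erasing works on a whole block of pages, never on a single page.
        first = block * self.pages_per_block
        for page in range(first, first + self.pages_per_block):
            self.pages[page] = bytearray([ERASED] * self.page_size)
            self.writable[page] = True
        self.erase_counts[block] += 1
```

Rewriting one page forces an erase of its whole block, which is why sorting algorithms that avoid temporary writes also avoid extra wear.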

7 Flash Memory: Memory Array [1]

8 Flash Memory: Block Diagram [1]

9 Relation

10 Sorting Algorithms
Sorting is a fundamental class of algorithms because it supports efficient ordering of results, joins, grouping and aggregation.
An in-memory sort can be performed when the entire dataset fits into memory:
   o Merge sort
   o Quicksort
External sorting:
   o Uses external memory (e.g., a hard disk) to sort the dataset
   o External merge sort is the standard in databases

11 Previous Work: One Key Scan
The most memory-efficient external sorting algorithm is one key scan [2].
   o Performs D+1 scans, where D is the number of distinct sort key values.
   o Keeps track of two values: current, the sort key value being output in this scan, and split, the next smallest sort key value encountered.
   o Needs an initial scan to determine the values of current and split.
   o Requires only enough memory to store two sort key values.
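
A minimal Python sketch of one key scan, assuming the relation is modelled as a list of pages holding integer sort keys (the function name `one_key_scan` and this layout are illustrative, not taken from [2]). In this sketch the initial scan finds current, and each later scan emits the tuples equal to current while discovering split, so only two key values are kept and the relation is scanned D+1 times.

```python
from typing import Iterator, Sequence


def one_key_scan(pages: Sequence[Sequence[int]]) -> Iterator[int]:
    """Yield sort keys in ascending order, remembering only two key values.

    Hypothetical layout: `pages` is a list of pages, each a list of integer
    sort keys. A real implementation would emit whole tuples, not just keys.
    """
    # Initial scan: find the smallest key, which becomes `current`.
    current = None
    for page in pages:
        for key in page:
            if current is None or key < current:
                current = key
    # One further scan per distinct key: D + 1 scans in total.
    while current is not None:
        split = None  # smallest key seen so far that is greater than current
        for page in pages:
            for key in page:
                if key == current:
                    yield key
                elif key > current and (split is None or key < split):
                    split = key
        current = split
```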

12 Previous Work: Heap Sort
A heap sort algorithm called FAST(1) [7] uses a binary heap of N tuples to store the next smallest tuples encountered during a scan.
   o Performs T/N scans, where T is the number of tuples and N is the number of tuples that fit into memory.
   o Requires enough memory to store a tuple.
   o May be slower than one key scan if there are few distinct sort key values, the tuple size is large, or the dataset is large.
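
The sketch below illustrates the multi-pass, bounded-heap idea behind FAST(1); it is not the published FAST implementation. It assumes integer keys stored in a list of pages, `heap_sort_passes` is a hypothetical name, and ties are broken by tuple position (an assumption) so duplicate keys are neither skipped nor emitted twice.

```python
import heapq
from typing import List, Optional, Sequence, Tuple


def heap_sort_passes(pages: Sequence[Sequence[int]], n: int) -> Tuple[List[int], int]:
    """Sort with repeated scans and a heap bounded to n entries.

    Returns (sorted keys, number of scans). Each scan keeps the n smallest
    (key, position) pairs that come after the last pair already output;
    using the position as a tie-breaker keeps duplicate keys distinct.
    """
    if n < 1:
        raise ValueError("the heap must hold at least one tuple")
    out: List[int] = []
    last: Optional[Tuple[int, int]] = None
    scans = 0
    while True:
        scans += 1
        heap: List[Tuple[int, int]] = []  # max-heap stored as negated pairs
        pos = 0
        for page in pages:
            for key in page:
                item = (key, pos)
                neg = (-key, -pos)
                pos += 1
                if last is not None and item <= last:
                    continue  # already output by an earlier scan
                if len(heap) < n:
                    heapq.heappush(heap, neg)
                elif neg > heap[0]:  # item is smaller than the largest kept pair
                    heapq.heapreplace(heap, neg)
        if heap:
            batch = sorted((-k, -p) for k, p in heap)
            out.extend(key for key, _ in batch)
            last = batch[-1]
        if len(out) == pos:  # pos now equals T, the total number of tuples
            return out, scans
```

With 12 tuples and a heap of 4 or 5 entries, the sketch needs 3 scans, i.e. the number of scans grows with T/N regardless of how many distinct keys there are.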

13 Previous Work: External Merge Sort
The external merge sort [5] algorithm is the standard sorting algorithm used in databases.
   o An initial read pass constructs sorted sublists, each the size of the RAM allocated to the operator.
   o The merge phase can consist of multiple passes.
   o Each pass buffers one page from each sublist, performs a merge and writes a temporary result to flash.
   o The algorithm requires at least three pages of memory.
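
A compact sketch of the two phases, assuming the relation is a Python list of integer keys and that each intermediate list stands in for a run written to flash. `external_merge_sort`, `run_size` and `fan_in` are hypothetical names. With B page buffers the fan-in is B - 1, because one buffer holds merge output, so the three-page minimum corresponds to a fan-in of 2.

```python
import heapq
from typing import List, Tuple


def external_merge_sort(records: List[int], run_size: int, fan_in: int) -> Tuple[List[int], int]:
    """Return (sorted records, number of merge passes).

    run_size: records that fit in the operator's RAM during run generation.
    fan_in: runs merged at once; B page buffers allow B - 1 input runs plus one
    output buffer, so the three-page minimum corresponds to fan_in = 2.
    Each inner list stands in for a temporary run that would be written to flash.
    """
    if run_size < 1 or fan_in < 2:
        raise ValueError("need run_size >= 1 and fan_in >= 2")
    # Run generation: sort each memory-sized chunk and "write" it as a run.
    runs = [sorted(records[i:i + run_size]) for i in range(0, len(records), run_size)]
    merge_passes = 0
    # Merge phase: every pass merges groups of fan_in runs into longer runs.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in])) for i in range(0, len(runs), fan_in)]
        merge_passes += 1
    return (runs[0] if runs else []), merge_passes
```

Every merge pass reads and rewrites the whole relation, which is where the flash write cost and wear come from.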

14 Previous Work: Summary
External merge sort requires writes and a significant amount of memory, which makes it infeasible for some embedded applications.
Existing sorting algorithms for datasets stored in flash memory favor reads over writes.
Existing sorting algorithms do not take advantage of low-cost random reads.
Performance depends on the properties of the input dataset.
Data collected in applications such as sensor networks is often clustered spatially and temporally.

15 Flash MinSort: Overview
Flash MinSort [3] uses low-cost random reads to retrieve only the pages required during a scan of the relation.
It builds a dynamic index over the relation that stores the minimum value in each region.
A region represents one or more pages of data.
The algorithm maintains a current minimum value and a next minimum value.
During a pass, only pages in a region whose minimum value equals the current minimum are read.

16 Flash MinSort: Overview
While reading a region, the algorithm tracks the next smallest value in the region (next) and the position of the next tuple with the current value (nextIdx).
After a region has been read, its minimum value in the index is updated.
Flash MinSort adapts to the size of the input relation and caches pages when given additional memory.

17 Flash MinSort: Example
Page:  1     2     3     4     5     6     7     8     9     10    11    12
Data:  1991  9999  9899  8877  6665  4432  2121  1111  2345  6789  9898  8999
Min:   1     9     8     7     5     2     1     1     2     6     8     8
Output #1: Scan the min index and find 1 in region #1. Search page #1 and output tuple #1 (next = 9, nextIdx = 4).
Output #2: Output tuple #4. Region min set to 9.
Output #3: Find 1 in region #7. Search page #7 and output tuple #2 (next = 2, nextIdx = 4).
Output #4: Output tuple #4. Region min set to 2.
Output #5: Find 1 in region #8. Search page #8 and output tuple #1 (no larger next value, nextIdx = 2).
Output #6: Output tuple #2 (nextIdx = 3).
Output #7: Output tuple #3 (nextIdx = 4).
Output:
1 (from pg. 1, tuple 1), 1 (from pg. 1, tuple 4), 1 (from pg. 7, tuple 2), 1 (from pg. 7, tuple 4), 1 (from pg. 8, tuple 1), 1 (from pg. 8, tuple 2), 1 (from pg. 8, tuple 3), 1 (from pg. 8, tuple 4), 2 (from pg. 6, tuple 4), 2 (from pg. 7, tuple 1), 2 (from pg. 7, tuple 3), ...
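
A simplified Python sketch of Flash MinSort that reproduces the output order of the example above. Assumptions: pages are non-empty lists of integer keys, the index is a plain list of per-region minimums (None once a region is exhausted), and a region is read in full once per current value instead of stepping through tuples with nextIdx. `flash_minsort` and `pages_per_region` are hypothetical names, not the thesis code.

```python
from typing import List, Optional, Sequence, Tuple


def flash_minsort(pages: Sequence[Sequence[int]],
                  pages_per_region: int = 1) -> List[Tuple[int, int, int]]:
    """Return (key, page, tuple) triples in sorted order, 1-based as in the example.

    Simplified sketch: each region is read in full once per current value
    instead of stepping through tuples with nextIdx. Pages must be non-empty.
    """
    n = len(pages)
    regions = [range(start, min(start + pages_per_region, n))
               for start in range(0, n, pages_per_region)]
    # Build the index with one scan: the minimum key of every region.
    index: List[Optional[int]] = [min(k for p in reg for k in pages[p]) for reg in regions]
    out: List[Tuple[int, int, int]] = []
    current = min((m for m in index if m is not None), default=None)
    while current is not None:
        for r, reg in enumerate(regions):
            if index[r] != current:
                continue  # no tuple with the current key here: skip these pages
            nxt: Optional[int] = None  # smallest key above current in this region
            for p in reg:
                for t, key in enumerate(pages[p]):
                    if key == current:
                        out.append((key, p + 1, t + 1))
                    elif key > current and (nxt is None or key < nxt):
                        nxt = key
            index[r] = nxt  # None marks a region with nothing left to output
        current = min((m for m in index if m is not None), default=None)
    return out
```

Feeding it the twelve pages of the example (each digit as one sort key) yields 1 from page 1 tuples 1 and 4, page 7 tuples 2 and 4, all of page 8, then 2 from page 6 tuple 4 and page 7 tuples 1 and 3. Raising `pages_per_region` shrinks the index at the cost of reading every page of a matching region.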

18 Flash MinSort: Performance
In the ideal case, each region represents a single page. The amount of memory required to store the minimum value of each page is L_K * P, where L_K is the size of the sort key and P is the number of pages.
If there is not enough memory, each region represents two or more adjacent pages.
The minimum amount of memory required is 4 * L_K bytes, for two regions.
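
One way to turn this memory rule into a region layout, as a hypothetical sketch: assume 2 * L_K bytes hold the current and next values and every remaining L_K bytes hold one region minimum, which gives the stated minimum of 4 * L_K bytes for two regions. The function `region_layout` is illustrative. For the real dataset, if 512-byte pages hold 32 16-byte records, 10,000 records need about 313 pages, and a per-page index of 2-byte keys takes 626 bytes, within the 2 kB operator budget.

```python
import math
from typing import Tuple


def region_layout(num_pages: int, key_size: int, memory: int) -> Tuple[int, int]:
    """Return (number of regions, pages per region) for a Flash MinSort index.

    Assumed rule: 2 * key_size bytes hold the current and next values, and each
    further key_size bytes holds one region minimum. At 4 * key_size bytes this
    leaves room for exactly two regions.
    """
    if num_pages < 1:
        raise ValueError("need at least one page")
    slots = (memory - 2 * key_size) // key_size
    if slots < 2:
        raise ValueError("need at least 4 * key_size bytes of memory")
    pages_per_region = math.ceil(num_pages / min(num_pages, slots))
    # Group adjacent pages evenly; the last region may hold fewer pages.
    return math.ceil(num_pages / pages_per_region), pages_per_region
```

Under this rule, 2,048 bytes give one region per page for 313 pages, while 104 bytes force 45 regions of 7 pages each.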

19 Flash MinSort: Direct Reads
If the flash chip supports direct byte reads, Flash MinSort is even more efficient because it only needs to read the sort key values.
Performance notation:
   o P = number of pages, T = number of tuples, N_P = number of pages in a region
   o D_R = average number of distinct values in a region, R = number of regions
   o L_K = size of the key in bytes, L_T = size of a tuple in bytes

20 Flash MinSort: Comparison
Considering only page reads, Flash MinSort is:
   o At least as fast as one key sort in all cases, since D_R <= D.
   o Faster than heap sort unless the input size is only a small multiple of the memory size (e.g., 2 to 5).
   o Faster than external merge sort for a large spectrum of possible configurations, even while using less memory and performing no writes.

Algorithm | Page I/Os | Notes
Flash MinSort | P * (1 + D_R) |
One Key Sort | P * (1 + D) | Performs a scan for each distinct key
Heap Sort | P * (T * L_T) / M | Number of scans depends on the number of tuples; M is the memory available to the operator
External Merge Sort (two pass) | P * (2 + X) | X is the write-to-read ratio, as the algorithm must write as an intermediate step; two passes are unlikely for small memory sizes
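
The page I/O formulas in the table can be evaluated directly. This sketch simply transcribes them; the function name is hypothetical, M is assumed to be the operator memory in bytes, and any D_R value passed in must come from the data, since the slides do not report one.

```python
from typing import Dict


def page_io_estimates(P: int, D: int, D_R: float, T: int, L_T: int,
                      M: int, X: float) -> Dict[str, float]:
    """Page I/O estimates transcribed from the comparison table.

    M is assumed to be the operator memory in bytes; X is the write-to-read ratio.
    """
    return {
        "flash_minsort": P * (1 + D_R),
        "one_key_sort": P * (1 + D),             # one scan per distinct key, plus one
        "heap_sort": P * (T * L_T) / M,          # scans = relation size / memory
        "external_merge_two_pass": P * (2 + X),  # read, write (cost X), read again
    }
```

Because D_R can never exceed D, the Flash MinSort estimate is never larger than the one key sort estimate; how it compares with external merge sort depends on whether D_R stays below 1 + X.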

21 Experimental Evaluation
The experimental evaluation compares Flash MinSort, one key sort, heap sort and external merge sort.
2 kB of memory is available to the operators.
Sensor node hardware:
   o Atmel ATmega644p (8 MHz)
   o 4 KB SRAM
   o 2 MB Atmel AT45DB161D serial flash (512-byte page size)
   o The node design was used for field measurement of soil moisture for an automated irrigation controller [4].
Datasets:
   o Three months of live soil-sensing data, plus generated ordered and random datasets. The real dataset has 10,000 records (160 KB) and 43 distinct values.
   o Record size is 16 bytes. The sort key is a 2-byte integer.

22 Raw Device Performance
Time to read 50,000 tuples: 5.3 seconds
Time to write 50,000 tuples: 23 seconds
Write-to-read ratio: 4.7
Time to scan 50,000 sort keys: 2.1 seconds
Notes:
   o Buffering a page in processor memory is more efficient than using the on-chip buffers, due to bus communication and latency.
   o Bus speeds affect the write-to-read ratio. Even though writing is considerably slower on the chip, the difference was masked by the speed of the processor and bus.

23 Real Data
Heap sort is not shown because its time is orders of magnitude longer:
   o 100 bytes (5 tuples): 10,000 passes, 3,377 seconds
   o 1,200 bytes (74 tuples): 302 seconds
MinSortDR is a direct-read version of MinSort.
External merge sort: 1,536 bytes (3 pages): 7 passes, 76 seconds

24 Random Data
Dataset with 10,000 records and 500 distinct values (1 to 500).
Heap sort performs the same number of passes regardless of the dataset (random, real or ordered).
External merge sort took 78 seconds, as the sorting during initial run generation took slightly more time.

25 Ordered Data
Sorted real dataset with 10,000 tuples and 43 distinct values.
MinSort does not detect sorted regions but still benefits from duplicates of the same value within a region.
External merge sort took 75 seconds.

26 Results Summary
MinSort is faster than one key sort and heap sort, with or without direct byte reads from the device.
   o It is especially good for sensor data that exhibits temporal clustering.
   o MinSort is a generalization of one key sort, and the performance of both algorithms depends on the number of distinct values.
Heap sort is not competitive for small memory sizes.
   o The ratio of available RAM to dataset size is key.

27 Results Summary
External merge sort performs well, but requires at least three pages (1,536 bytes) of memory.
   o For the real dataset on this platform, external merge sort is never faster than MinSort, assuming it needs at least two passes.
   o For wireless sensing applications, managing the additional space and wear leveling complicates system design and performance.

28 Solid State Drives: Experimental Setup
Solid state drives (SSDs) have sophisticated controllers that support wear leveling, address translation and buffer management.
Test system:
   o AMD Opteron 2.1 GHz
   o 32 GB DDR3
   o Intel X25 SSD (write-to-read ratio of 1.6)
Data:
   o 5,000,000 tuples (80 MB)
   o 16-byte tuples

29 Solid State Drives: Real Data (43 distinct sort key values)

30 Solid State Drives: Random Data (500 distinct sort key values)

31 Conclusion
Flash MinSort is a sorting algorithm designed for datasets stored in flash memory on computationally constrained embedded devices.
It outperforms existing algorithms by exploiting low-cost random reads.
Depending on the properties of the dataset, Flash MinSort can outperform external merge sort on SSDs.

32 References
[1] Atmel. Atmel Flash AT45DB161D Data Sheet, 2010.
[2] N. Anciaux, L. Bouganim, and P. Pucheral. Memory Requirements for Query Execution in Highly Constrained Devices. In VLDB, pages 694–705, 2003.
[3] T. Cossentine and R. Lawrence. Fast Sorting on Flash Memory Sensor Nodes. In IDEAS 2010, pages 105–113, 2010.
[4] S. Fazackerley and R. Lawrence. Reducing Turfgrass Water Consumption Using Sensor Nodes and an Adaptive Irrigation Controller. In Sensors Applications Symposium, Limerick, Ireland, 2010.
[5] H. Garcia-Molina, J. D. Ullman, and J. Widom. Database Systems: The Complete Book. Prentice Hall Press, Upper Saddle River, NJ, USA, 1st edition, 2002.
[6] G. Mathur, P. Desnoyers, D. Ganesan, and P. Shenoy. Ultra-Low Power Data Storage for Sensor Networks. In Proceedings of the 5th International Conference on Information Processing in Sensor Networks (IPSN '06), pages 374–381, New York, NY, USA, 2006. ACM.

33 References
[7] H. Park and K. Shim. FAST: Flash-Aware External Sorting for Mobile Database Systems. Journal of Systems and Software, 82(8):1298–1312, 2009.
[8] G. J. Pottie and W. J. Kaiser. Wireless Integrated Network Sensors. Communications of the ACM, 43:51–58, May 2000.
