SciDB Array Storage Mijung Kim 2/15/13
ArrayStore [Soroush et al. 2011]: Emad Soroush, Magdalena Balazinska, and Daniel Wang. ArrayStore: A Storage Manager for Complex Parallel Array Processing. SIGMOD '11, June 12-16, 2011, Athens, Greece.
Array slicing and dicing using a large chunk
Array slicing and dicing using small chunks
Array slicing and dicing using two-level chunks
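A minimal sketch of how a slice might be resolved against two-level chunks, assuming the ArrayStore-style layout in which large regular chunks are the unit of I/O and smaller tiles inside them are the unit of processing. The function names, the 8x8 chunk size, and the 4x4 tile size are hypothetical, and the sketch assumes the tile size evenly divides the chunk size.

```python
from itertools import product


def overlapping_cells(lo, hi, size):
    """Indices of the fixed-size grid cells that intersect the box [lo, hi]."""
    ranges = [range(l // s, h // s + 1) for l, h, s in zip(lo, hi, size)]
    return list(product(*ranges))


def two_level_lookup(lo, hi, chunk, tile):
    """Map a slice to {chunk index: [tile indices to process in that chunk]}.

    Assumption: tiles are aligned to chunks (tile size divides chunk size).
    """
    plan = {}
    for t in overlapping_cells(lo, hi, tile):
        # A tile belongs to the chunk that contains the tile's origin cell.
        c = tuple((ti * ts) // cs for ti, ts, cs in zip(t, tile, chunk))
        plan.setdefault(c, []).append(t)
    return plan


# Slice rows 2..5 and columns 2..9 of an array with 8x8 chunks and 4x4 tiles.
plan = two_level_lookup(lo=(2, 2), hi=(5, 9), chunk=(8, 8), tile=(4, 4))
for chunk_index, tiles in plan.items():
    print(chunk_index, tiles)
```

Only two chunks are read, and within them only the tiles that intersect the slice need to be processed, which is the point of combining a coarse I/O unit with a fine processing unit.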
Join on Misaligned Chunks
Join on Re-partitioned Chunks
Join on Misaligned Chunks
kNN
kNN with Overlap Chunks on Two-Level Storage: reduces I/O overhead
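A rough one-dimensional sketch of why overlap helps neighborhood queries such as kNN: if each chunk also stores a band of cells from its neighbors, a search window near a chunk border can be answered from one chunk instead of several. The helper names and the numbers are illustrative assumptions, not SciDB or ArrayStore APIs.

```python
def chunk_extent(chunk_index, chunk_size, overlap, dim_start, dim_end):
    """Return (core range, stored range) of a chunk along one dimension."""
    core_lo = dim_start + chunk_index * chunk_size
    core_hi = min(core_lo + chunk_size - 1, dim_end)
    # The stored range extends the core by `overlap` cells on each side,
    # clipped to the bounds of the dimension.
    stored_lo = max(core_lo - overlap, dim_start)
    stored_hi = min(core_hi + overlap, dim_end)
    return (core_lo, core_hi), (stored_lo, stored_hi)


def window_is_local(cell, radius, chunk_size, overlap, dim_start, dim_end):
    """True if the window [cell - radius, cell + radius] fits in one stored chunk."""
    index = (cell - dim_start) // chunk_size
    _, (lo, hi) = chunk_extent(index, chunk_size, overlap, dim_start, dim_end)
    win_lo = max(cell - radius, dim_start)
    win_hi = min(cell + radius, dim_end)
    return lo <= win_lo and win_hi <= hi


# Dimension 0..15, chunk size 4. Cell 4 sits on the left border of chunk 1.
print(window_is_local(4, 1, chunk_size=4, overlap=1, dim_start=0, dim_end=15))
print(window_is_local(4, 1, chunk_size=4, overlap=0, dim_start=0, dim_end=15))
```

With an overlap of 1 the window around cell 4 is served by chunk 1 alone; with no overlap the neighboring chunk would have to be read as well.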
SciDB Storage: system catalog DB (Postgres) holding array metadata (ID, name, dimensions, attributes, etc. …); header files holding the storage header (version, segment usage, number of chunks, etc.); transaction log files; data files (segments).
SciDB Pipelined Array Processing: a chunk is materialized into memory; one chunk at a time (C11, C12, C23, …) is streamed into and out of an operation.
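The chunk-at-a-time idea can be sketched with Python generators: each operator pulls one chunk from its input, transforms it, and passes it on, so only the chunk in flight is held in memory. This is an analogy for illustration only; the operator names and chunk contents are made up and do not reflect SciDB's internal classes.

```python
def scan(chunks):
    """Source operator: yields (chunk id, values) one chunk at a time."""
    for chunk_id, values in chunks:
        yield chunk_id, values


def map_chunks(op, upstream):
    """Apply `op` to every cell of each incoming chunk, then pass the chunk on."""
    for chunk_id, values in upstream:
        yield chunk_id, [op(v) for v in values]


source = [("C11", [1, 2]), ("C12", [3, 4]), ("C23", [5, 6])]

# Two chained operations; no intermediate array is ever fully materialized.
pipeline = map_chunks(lambda v: v * 10, map_chunks(lambda v: v + 1, scan(source)))
result = list(pipeline)
print(result)
```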
Load Chunk: a query loads an in-memory chunk from the LRU-based in-memory cache (chunks can be swapped to a temp file); otherwise it looks up the DB chunk in the chunk map (array ID, chunk storage address, …), uses the chunk header's DiskPos (SegmentNo, Offset, …) to read the chunk from the segments, and adds the chunk to the cache.
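A simplified sketch of this load path, assuming a dictionary stands in for the chunk map and nested dictionaries stand in for the segment files. The class and attribute names are hypothetical, and swapping evicted chunks to a temp file is left out: evicted chunks are simply dropped.

```python
from collections import OrderedDict


class ChunkCache:
    """LRU cache in front of a chunk map and segment storage (illustrative only)."""

    def __init__(self, capacity, chunk_map, segments):
        self.capacity = capacity
        self.chunk_map = chunk_map  # (array_id, chunk_id) -> (segment_no, offset)
        self.segments = segments    # segment_no -> {offset: chunk payload}
        self.cache = OrderedDict()  # least recently used entry comes first
        self.disk_reads = 0

    def load(self, array_id, chunk_id):
        key = (array_id, chunk_id)
        if key in self.cache:
            # Cache hit: mark the chunk as most recently used.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: find the disk position, then read from the segment.
        segment_no, offset = self.chunk_map[key]
        chunk = self.segments[segment_no][offset]
        self.disk_reads += 1
        self.cache[key] = chunk
        if len(self.cache) > self.capacity:
            # Evict the least recently used chunk.
            self.cache.popitem(last=False)
        return chunk


chunk_map = {(1, "C11"): (0, 0), (1, "C12"): (0, 64), (1, "C23"): (1, 0)}
segments = {0: {0: b"c11", 64: b"c12"}, 1: {0: b"c23"}}
cache = ChunkCache(capacity=2, chunk_map=chunk_map, segments=segments)

cache.load(1, "C11")
cache.load(1, "C12")
cache.load(1, "C11")  # hit, no disk read
cache.load(1, "C23")  # miss, evicts C12
print(cache.disk_reads, list(cache.cache))
```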
Store Array: SourceChunkIterator iterates over the chunks of the source array (C11, C12, C23, …); each chunk is copied and written to the segments (C11, C12, C23, …). RLEChunkIterator; empty array (e.g., RLEBitmap).
Create Array: CREATE ARRAY array_name <attributes> [dimensions], where each dimension is written as name=start:end,chunk_size,chunk_overlap. E.g., CREATE ARRAY m4x4 <val1:double,val2:int32> [x=0:3,4,0,y=0:3,4,0];
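To make the dimension syntax concrete, here is a small sketch that parses a dimension list of the form name=start:end,chunk_size,chunk_overlap and reports how many chunks each dimension is split into. The parser is a hypothetical helper written for this note, not part of SciDB, and it handles only the simple integer form shown in the example.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Dimension:
    name: str
    start: int
    end: int
    chunk_size: int
    chunk_overlap: int

    @property
    def num_chunks(self):
        """Number of chunks needed to cover start..end (inclusive)."""
        return ceil((self.end - self.start + 1) / self.chunk_size)


def parse_dimensions(spec):
    """Parse 'x=0:3,4,0,y=0:3,4,0' into a list of Dimension records."""
    tokens = [t.strip() for t in spec.split(",")]
    if len(tokens) % 3 != 0:
        raise ValueError("expected name=start:end,chunk_size,chunk_overlap groups")
    dims = []
    for i in range(0, len(tokens), 3):
        name, bounds = tokens[i].split("=")
        start, end = bounds.split(":")
        dims.append(Dimension(name.strip(), int(start), int(end),
                              int(tokens[i + 1]), int(tokens[i + 2])))
    return dims


dims = parse_dimensions("x=0:3,4,0,y=0:3,4,0")
for d in dims:
    print(d.name, d.num_chunks)
```

For the m4x4 example each dimension has four cells and a chunk size of 4, so the whole array fits in a single chunk with no overlap.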
CP ALS
CP ALS (figure labels: A, B, C)
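For reference, a compact NumPy sketch of the textbook CP-ALS iteration for a 3-way tensor, assuming A, B, and C in the figure denote the three factor matrices. This is the standard dense formulation, not the array-database implementation the slides describe; the function names, the rank, and the iteration count are arbitrary choices for illustration.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product: (I*J) x R from an I x R and a J x R matrix."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])


def cp_als(X, rank, iters=50, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r] by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    # Mode-n unfoldings whose column order matches khatri_rao above.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(iters):
        # Update one factor at a time while the other two stay fixed.
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C


# Build a small rank-2 tensor and try to recover a rank-2 model of it.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (4, 5, 6))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)

A, B, C = cp_als(X, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(rel_err)
```

Each update is a linear least-squares solve against a matricized tensor times a Khatri-Rao product, which is the step a distributed version would need to partition across workers.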
Distributed CP ALS (figure labels: A, B, C; Worker 1, Worker 2, Worker 3)