Large Scale Data Representation Erik Goodman Daniel Kapellusch Brennen Meland Hyunjae Park Michael Rogers.


1 Large Scale Data Representation
Erik Goodman, Daniel Kapellusch, Brennen Meland, Hyunjae Park, Michael Rogers

2 What is Data Representation?
Many considerations go into the choice of a particular data representation:
● Hardware
● Communication
● Data Generation Process
● Input-Output
● Data Sparsity
● Noise

3 The Taxonomy of Data Representation
Basic Data Structures
● Hash Tables, Inverted Indices, Tables/Relations, etc.
Mathematical Structures
● Sets, Vectors, Matrices, Graphs, Metric Spaces
Derived Mathematical Structures
● Clusters, Linear Projections, Data Samples, etc.
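As an illustration of one of the basic data structures named above, here is a minimal sketch of an inverted index built on a hash table. The function name `build_inverted_index`, the whitespace tokenization, and the sample documents are assumptions made for this example; they do not come from the presentation.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted list of document ids containing it.

    `docs` is a dict of {doc_id: text}. Tokenization is plain whitespace
    splitting, which is an assumption made to keep the sketch short.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    # Convert the sets to sorted lists so lookups return a stable order.
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical sample documents, for illustration only.
docs = {
    1: "large scale data representation",
    2: "sparse data and noise",
    3: "graph representation of networks",
}
index = build_inverted_index(docs)
```

Looking up `index["data"]` returns the ids of the documents that mention the term, so a query does not need to scan every document.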

4 Design Goals
On the data acquisition side:
● Construct a structure that is sufficiently close to the data.
On the data analysis side:
● Construct a structure that has both a flexible description and is tractable algorithmically.

5 Challenge of Architecture and Algorithms
● Techniques designed for smaller-scale data analysis are not optimized to work with very large-scale data.
● A critical step moving forward is determining the best way to represent the data.
Challenge of Combining Algorithmic and Statistical Perspectives
● Incompatibility with researchers and data use.
● Correlation does not equal causation.

6 Challenge of Primitives
● Provide a framework for a broad scope of computations.
● Allow programming at a reasonable level of abstraction.
● Apply across a wide range of platforms.
Challenge of Manipulation and Integration of Heterogeneous Data
● Merging different datasets into a common representation is difficult.
● Tabular data vs. free-form text-based data and images.
● It can be difficult to create a meaningful data visualization when dealing with many variables.
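To make the idea of a common representation concrete, here is a small sketch that folds a tabular row and a free-text note about the same entity into one record. The record layout (`id`, `fields`, `tokens`), the helper names, and the sample data are all hypothetical choices for this example, not a scheme described in the presentation.

```python
def from_table_row(row, id_field="id"):
    """Turn a tabular row (a dict) into the common record form."""
    fields = {key: value for key, value in row.items() if key != id_field}
    return {"id": row[id_field], "fields": fields, "tokens": []}


def from_free_text(record_id, text):
    """Turn a free-text note into the common record form (whitespace tokens)."""
    return {"id": record_id, "fields": {}, "tokens": text.lower().split()}


def merge_records(records):
    """Combine records that share an id into a single record per id."""
    merged = {}
    for rec in records:
        target = merged.setdefault(
            rec["id"], {"id": rec["id"], "fields": {}, "tokens": []}
        )
        target["fields"].update(rec["fields"])
        target["tokens"].extend(rec["tokens"])
    return merged


# Hypothetical inputs: one table row and one text note for the same entity.
table_rows = [{"id": "u1", "age": 34, "city": "Oslo"}]
notes = [("u1", "Prefers email contact")]

records = [from_table_row(r) for r in table_rows]
records += [from_free_text(record_id, text) for record_id, text in notes]
merged = merge_records(records)
```

Even in this toy case the merge depends on both sources sharing an identifier, which is often the hard part with real heterogeneous data.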

7 Challenge of Heavy-Tailed and High-Variance Data
● Big data naturally leads to large, heavy tails at the ends of the distribution, especially for social and information networks.
● Valuable data must be extracted from amongst background noise.
● The heavy tail is often where new scientific phenomena manifest themselves.
http://math.stackexchange.com/questions/754972/probability-distribution-morphing-from-gaussian-to-heavy-tail
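One rough way to see why heavy tails matter is to measure how much of a sample's total comes from its largest 1% of values. The sketch below compares absolute Gaussian draws with Pareto draws. The `top_share` helper, the Pareto shape parameter 1.5, the sample size, and the seed are assumptions chosen for illustration, not values from the presentation.

```python
import random


def top_share(samples, fraction=0.01):
    """Return the share of the total sum contributed by the largest `fraction` of samples."""
    ordered = sorted(samples, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


rng = random.Random(0)  # fixed seed so the comparison is repeatable
n = 10_000

# Light-tailed: absolute values of standard normal draws.
light = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]
# Heavy-tailed: Pareto draws with shape 1.5 (an assumed value).
heavy = [rng.paretovariate(1.5) for _ in range(n)]

light_share = top_share(light)
heavy_share = top_share(heavy)
```

In the heavy-tailed sample the top 1% of values accounts for a much larger share of the total than in the Gaussian sample, so summaries that ignore the tail discard a large part of the data's mass.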

