LECTURE 03: DATA COLLECTION AND MODELS February 4, 2015 COMP 150-04 Topics in Visual Analytics Note: slide deck adapted from R. Chang, Fall 2010.


1 LECTURE 03: DATA COLLECTION AND MODELS February 4, 2015 COMP 150-04 Topics in Visual Analytics Note: slide deck adapted from R. Chang, Fall 2010

2 Announcements Course location has moved: Halligan 102 Assignment 1 posted on course website If you haven’t yet installed RStudio: http://www.rstudio.com/products/rstudio/download/ To download the materials for today’s demo: http://www.cs.tufts.edu/comp/150VAN/demos/Stats-with-R.Rmd

3 Outline Reminder: post on “Illuminating the Path” Recap: Keim’s VA Model Data Foundations - Basic Data Types - Dimensionality Metadata: “data about data” Structure vs. Value - Value - Derived Value - Derived Structure - Structure

4 Reminder: thoughts on “Illuminating the Path” What did you think? Who is the intended audience? (…is it us?) Do the goals make sense to you? Is anything missing? From what you see in the world, how far along this agenda have we come since 2006?

5 Recap: Keim’s Visual Analytics Model [Figure: Keim’s visual analytics model; surviving diagram labels: input, pre-process, interactions.] Image source: Keim, Daniel, et al. Visual analytics: Definition, process, and challenges. Springer Berlin Heidelberg, 2008. Today’s topics: Data types, Dimensionality, Metadata, Structure vs. Value, Statistical Models in R

6 Data: a definition A typical dataset in visualization consists of n records: $(r_1, r_2, r_3, \dots, r_n)$ Each record $r_i$ consists of $m \ge 1$ observations or variables: $(v_1, v_2, v_3, \dots, v_m)$ A variable may be either independent or dependent: - An independent variable (iv) is not controlled or affected by another variable (e.g., time in a time-series dataset) - A dependent variable (dv) is affected by a variation in one or more associated independent variables (e.g., temperature in a region) Formal definition: - $r_i = (iv_1, iv_2, iv_3, \dots, iv_{m_i}, dv_1, dv_2, dv_3, \dots, dv_{m_d})$ - where $m = m_i + m_d$
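The definition above can be sketched in Python. This is only an illustration: the `Record` class, its field names, and the temperature readings are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Record:
    """One record r_i: m_i independent values followed by m_d dependent values."""
    independent: Tuple[float, ...]  # iv_1 .. iv_{m_i}
    dependent: Tuple[float, ...]    # dv_1 .. dv_{m_d}

    @property
    def m(self) -> int:
        # m = m_i + m_d, the total number of variables in the record
        return len(self.independent) + len(self.dependent)


# Hypothetical time series: hour of day (iv) and temperature in Celsius (dv).
dataset = [
    Record((0.0,), (11.5,)),
    Record((6.0,), (9.8,)),
    Record((12.0,), (17.2,)),
]
n = len(dataset)  # number of records
```

Keeping the independent and dependent values in separate tuples makes the split between m_i and m_d explicit, which a flat tuple would hide.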

7 Basic Data Types Nominal Ordinal Scale / Quantitative Ratio Interval An unordered set of non-numeric values Examples: Categorical (finite) data -{apple, orange, pear} -{red, green, blue} Arbitrary (infinite) data -{“12 Main St. Boston MA”, “45 Wall St. New York NY”, …} -{“John Smith”, “Jane Doe”, …}

8 Basic Data Types Nominal Ordinal Scale / Quantitative Ratio Interval An ordered set (also known as a tuple) Examples: Numeric: Binary: Non-numeric:

9 Basic Data Types Nominal Ordinal Scale / Quantitative Ratio Interval A numeric range Ratios -Distance from “absolute zero” -Can be compared mathematically using division -For example: height, weight Intervals -Ordered numeric elements that can be mathematically manipulated, but cannot be compared as ratios -E.g.: date, current time

10 Basic Data Types (Formal) Nominal (N){…} Ordinal (O) Scale / Quantitative (Q)[…] Q → O [0, 100] → O → N → {C, B, F, D, A} N → O (??) {John, Mike, Bob} → {red, green, blue} → ?? O → Q (??) Hashing? Bob + John = ?? Readings in Information Visualization: Using Vision To Think. Card, Mackinlay, Shneiderman, 1999
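The Q → O → N chain above can be illustrated with letter grades. In this minimal sketch the cutoff values (60, 70, 80, 90) and the function names are assumptions made for the example; the slide gives only the range [0, 100] and the grade letters.

```python
# Ordinal: the position in this tuple carries meaning (F is lowest, A highest).
GRADE_ORDER = ("F", "D", "C", "B", "A")

# Assumed lower bounds for D, C, B and A; not taken from the slides.
CUTOFFS = (60, 70, 80, 90)


def score_to_grade(score: float) -> str:
    """Q -> O: bin a quantitative score in [0, 100] into an ordered grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    rank = sum(score >= cutoff for cutoff in CUTOFFS)  # number of cutoffs reached
    return GRADE_ORDER[rank]


def grades_as_nominal(grades):
    """O -> N: discard the ordering and keep only the distinct labels."""
    return set(grades)
```

Each step loses information: binning discards the exact score, and converting to a set discards the order. That is why the reverse directions (N → O, O → Q) are marked with question marks on the slide.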

11 Operations on Basic Data Types What are the operations that we can perform on these data types? Nominal (N) = and ≠ Ordinal (O) >, <, ≥, ≤ Scale / Quantitative (Q) everything else (+, -, *, /, etc.) Consider a distance function
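One way to approach the distance-function prompt is to see which operations each type allows. The sketch below is an assumption-laden illustration: the function names are hypothetical, and using rank difference for ordinal data treats the ranks as equally spaced, which the ordinal type itself does not guarantee.

```python
def nominal_distance(a, b) -> int:
    """Nominal data supports only = and !=, so distance is 0 (same) or 1 (different)."""
    return 0 if a == b else 1


def ordinal_distance(a, b, order) -> int:
    """Ordinal data supports comparison; here distance is the gap between ranks.

    Assumption: ranks are treated as equally spaced.
    """
    return abs(order.index(a) - order.index(b))


def quantitative_distance(a: float, b: float) -> float:
    """Quantitative data supports arithmetic, so subtraction is meaningful."""
    return abs(a - b)


# Hypothetical ordinal scale used for the example.
SIZES = ("small", "medium", "large")
```

For example, `nominal_distance("apple", "pear")` can only say that the two values differ, while `ordinal_distance("small", "large", SIZES)` can also say by how many ranks.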

12 Dimensionality Scalar: a single value (0D array) Vector: collection of scalars (1D array) Matrix: a collection of vectors (2D array) Tensor: a collection of matrices (3+D array) Think of a cube:
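The four levels above can be mimicked with nested Python lists. This is a sketch only: the `ndim` helper is hypothetical and assumes regularly shaped nesting, whereas a real analysis would more likely use an array library.

```python
def ndim(x) -> int:
    """Count nesting depth: 0 for a scalar, 1 for a vector, 2 for a matrix, and so on.

    Assumes every level is a regularly shaped list; only the first element is inspected.
    """
    depth = 0
    while isinstance(x, list):
        depth += 1
        if not x:  # an empty list still counts as one level
            break
        x = x[0]
    return depth


scalar = 3.5                                     # 0D
vector = [1, 2, 3]                               # 1D: a collection of scalars
matrix = [[1, 2], [3, 4]]                        # 2D: a collection of vectors
tensor = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]    # 3D: a collection of matrices
```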

13 Operations on Multidimensional Data Slice Selects a subset of the original nD cube Result set could be of any dimensionality Roll up (consolidate) Creates a hierarchy based on the data Same as clustering Drill down Expand a cluster Pivot Changes the orientation of the cube Combine with the 4 basic SQL commands: SELECT, UPDATE, INSERT, DELETE Adapted from Wikipedia: OLAP Cube
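The cube operations above can be sketched on a small dictionary-based cube. Everything here is hypothetical: the dimension names, the sales figures, and the choice of summation as the roll-up aggregate are assumptions for illustration, not part of the slides.

```python
from collections import defaultdict

DIMS = ("region", "product", "quarter")

# Hypothetical 3D cube: (region, product, quarter) -> units sold.
cube = {
    ("East", "apple", "Q1"): 10,
    ("East", "apple", "Q2"): 12,
    ("East", "pear", "Q1"): 7,
    ("West", "apple", "Q1"): 5,
    ("West", "pear", "Q2"): 9,
}


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a value; the result has one fewer dimension."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def roll_up(cube, keep):
    """Roll up: aggregate (here, sum) over every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in idx)] += value
    return dict(totals)


def pivot(cube, order):
    """Pivot: reorder the dimensions without changing any value."""
    idx = [DIMS.index(d) for d in order]
    return {tuple(k[i] for i in idx): v for k, v in cube.items()}
```

In this sketch, drilling down is the inverse of rolling up: moving from `roll_up(cube, ["region"])` to `roll_up(cube, ["region", "product"])` expands each regional total into its per-product parts.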

14 Examples – Roll up and Drill down

15 Metadata Defined as “data about data” Introduced by Lisa Tweedie at CHI 1997 (“Characterizing Interactive Externalizations”) Extends Bertin’s original concept of data values and data structures. Values (low-level): variables relevant to a problem Structures (high-level): relations that characterize the data as a whole (e.g., links, equations, constraints)

16 Metadata – 4 Relationships 1. Values → Derived Values 2. Values → Derived Structure 3. Structure → Derived Values 4. Structure → Derived Structure Derived Values Example: average Derived Structure Example: sorting a list of variables
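The two examples on this slide, an average as a derived value and a sorted list as a derived structure, can be written out directly. The names and heights below are hypothetical data chosen for the sketch.

```python
from statistics import mean

# Values: hypothetical heights in centimetres.
heights_cm = {"Ana": 170.0, "Ben": 182.0, "Cai": 164.0}

# Values -> Derived Value: a single number summarising the values.
average_height = mean(heights_cm.values())

# Values -> Derived Structure: an ordering imposed on the same values.
ranking = sorted(heights_cm, key=heights_cm.get)
```

The average is a new value that was not in the data, while the ranking adds no new value; it only records a relation (shorter than) among the existing ones.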

17 Values → Derived Values → Derived Structure Values: a (text) document corpus Derived values: compute the similarities between the documents Derived Structure: apply multidimensional scaling to plot the documents in a spatial view.
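The derived-values step can be sketched as a pairwise similarity matrix over a toy corpus. The slide does not name a similarity measure, so cosine similarity over word counts is an assumption here, as are the three example documents. The layout step (multidimensional scaling) is not shown.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[term] * vb[term] for term in va)  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0  # empty text: define similarity as 0


# Values: a hypothetical three-document corpus.
corpus = [
    "visual analytics combines visualization and analysis",
    "visualization supports analysis of data",
    "the cat sat on the mat",
]

# Derived values: similarity[i][j] compares document i with document j.
similarity = [[cosine_similarity(a, b) for b in corpus] for a in corpus]
```

A scaling method would take a matrix like `similarity` (converted to distances) as its input and return 2D coordinates, which is the derived structure that a spatial view plots.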

18 Values → Derived Values → Derived Structure IN-SPIRE by PNNL

19 Structure → Derived Structure → Derived Values Structure: a tabular layout of individuals’ relationships with each other Derived Structure: convert the tabular structure to a graph Derived Values: compute centrality to identify the importance of the individual in this social network
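The three steps above can be sketched on a tiny relationship table. The slide does not say which centrality measure is meant, so degree centrality is an assumption here, and the names and function names are hypothetical.

```python
from collections import defaultdict

# Structure: a hypothetical table with one row per relationship.
relationship_table = [
    ("Ann", "Bob"),
    ("Ann", "Cy"),
    ("Ann", "Dee"),
    ("Bob", "Cy"),
]


def table_to_graph(rows):
    """Derived structure: turn table rows into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in rows:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Derived values: fraction of the other nodes each node is connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


graph = table_to_graph(relationship_table)
centrality = degree_centrality(graph)
```

Here Ann is linked to every other person and so receives the highest score. Other measures, such as betweenness or closeness, could rank the same people differently.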

20 Structure → Derived Structure → Derived Values Image taken from: http://beth.typepad.com/beths_blog/2009/12

21 Questions / Comments?

22 Guest speaker Maja Milosavljevic “Statistical Analysis with R”

23 For next week Assignment 1 due before class on Monday Wednesday: Several VIPs coming in to pitch datasets for final projects Start thinking about a topic you might like to explore! Need help? Talk to Jordan

