Science Visualization

Taxonomy

Terminology
Scientific Visualization: field in computer science that encompasses user interface, data representation and processing algorithms, visual representations, and other sensory presentation such as sound or touch (McCormick, 1987).
Data Visualization: more general than scientific visualization, since it implies treatment of data sources beyond the sciences and engineering, e.g., financial, marketing, or numerical data generally. Includes application of statistical methods and other standard data analysis techniques (Rosenblum, 1994).
Information Visualization: concerned typically with more abstract, often semantic, information, e.g., hypertext documents, the WWW, text documents. From Shneiderman: "use of interactive visual representations of abstract data to amplify cognition" (Ware, 2008; Card et al., 1999).

Visualization Pipeline
"Mapping Data to Visual Form"
[Figure: visualization pipeline. Labels: Raw Information, Dataset, Visual Form, Views; Data Transformations, Mappings, View; F, F^-1; User, Task, Interaction, Perception]
Visualization stages:
Data Transformations: map raw data (idiosyncratic form) into data tables (relational descriptions including metatags).
Visual Mappings: transform data tables into visual structures that combine spatial substrates, marks, and graphical properties. This is the starting point for visualization design.
View Transformations: create views of the visual structures by specifying graphical parameters such as position, scaling, and clipping.
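The three stages can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the raw text format, the city values, the `Mark` type, and the choice of mapping population to circle size are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass

# Hypothetical raw input in an idiosyncratic form: "name latitude longitude population(millions)".
RAW = ["Rome 41.9 12.5 2.8", "Paris 48.9 2.4 2.1", "Oslo 59.9 10.8 0.7"]

@dataclass
class Mark:
    """One element of a visual structure: a labelled circle."""
    label: str
    x: float
    y: float
    radius: float

def data_transformation(raw_lines):
    """Stage 1: raw data -> data table (records with named columns)."""
    table = []
    for line in raw_lines:
        name, lat, lon, pop = line.split()
        table.append({"city": name, "lat": float(lat),
                      "lon": float(lon), "pop_millions": float(pop)})
    return table

def visual_mapping(table):
    """Stage 2: data table -> visual structure (position and size of marks)."""
    return [Mark(r["city"], r["lon"], r["lat"], r["pop_millions"] ** 0.5)
            for r in table]

def view_transformation(marks, scale=10.0, x_range=(5.0, 15.0)):
    """Stage 3: visual structure -> view (clip to x_range, then scale)."""
    lo, hi = x_range
    return [Mark(m.label, m.x * scale, m.y * scale, m.radius * scale)
            for m in marks if lo <= m.x <= hi]

view = view_transformation(visual_mapping(data_transformation(RAW)))
for mark in view:
    print(mark)
```

With the default clipping range, the mark for Paris falls outside the view and is dropped, while the remaining marks are scaled to view coordinates.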

Representing Data Graphically
Data visualization comes from the convergence of computer science, statistics, and design. The goal is to make data presentation interesting, aesthetically pleasing, and, ideally, informative. It is a marriage between science and art, between the left and right halves of the brain. Good data visualization goes further by revealing relationships in the data that might otherwise have gone unnoticed. Because it involves no hypothesis tests, it is easy to discount visualization as unscientific, but that would be a mistake.

Representing Data Graphically
There are many uses of data visualization, and the reality is that hypothesis testing can bore the audience, if not completely exceed their level of understanding. Data visualization, then, is a means to an end for statisticians who want to be better communicators. And it’s a pathway to a better understanding of the data for the designers amongst us.

Too Much Data
The internet has led to an explosion in the amount of data that is collected, stored, and easily accessible. The problem we now face is filtering the valuable data from the worthless data and determining how to use it to inform business decisions or research.
The main questions are:
What type of media will you use to make your presentation?
How long does your audience have to take in the data?
The longer the audience has, the more data-dense the visualization can and should be. The less time and autonomy your audience has to peruse the data, the simpler the visualization should be.

Selecting the Right Graph
Selecting the appropriate display can be difficult because it requires a good understanding of the nature of your data, of statistics, and of design principles. There are many possibilities for a given variable or dataset, but you need a place to start. The challenge is that for any given data set the number of visual encodings, and thus the space of possible visualization designs, is extremely large. One must determine which questions to ask, identify the appropriate data, and select effective visual encodings to map data values to graphical features such as position, size, shape, and color.

Visualization Taxonomy
Many types of taxonomy:
Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky (Stanford University), "A Tour Through the Visualization Zoo"
Christian Behrens, a list of Design Pattern categories
Edward Lee, "A Taxonomy of Data Visualization"


Commonsense taxonomy:
Chart
Map
Network
Time series
Hierarchy
Flow
Matrix
Infographic

Chart
A chart is a graphical representation of data, in which "the data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart". It can represent tabular numeric data, functions, or some kinds of qualitative structure. Charts are often used to ease understanding of large quantities of data and the relationships between parts of the data, and can usually be read more quickly than the raw data they are produced from.
The term "chart" as a graphical representation of data has multiple meanings:
A data chart is a type of diagram or graph that organizes and represents a set of numerical or qualitative data.
Maps that are adorned with extra information for some specific purpose are often known as charts, such as a nautical chart or aeronautical chart.
Other domain-specific constructs are sometimes called charts, such as the chord chart in music notation or a record chart for album popularity.
Charts are used in a wide variety of fields and can be created by hand (often on graph paper) or by computer using a charting application. Certain types of charts are more useful for presenting a given data set than others.

Chart: Main features
Charts take a large variety of forms, but there are common features that give a chart its ability to extract meaning from data. Text is generally used only to annotate the data.
Title: it usually appears above the main graphic and provides a succinct description of what the data in the graph refers to.
Axes and their scales: dimensions in the data are often displayed on axes. If a horizontal and a vertical axis are used, they are usually referred to as the x-axis and y-axis respectively. Each axis has a scale, denoted by periodic graduations and usually accompanied by numerical or categorical indications. Each axis typically also has a label displayed outside or beside it, briefly describing the dimension represented. If the scale is numerical, the label is often suffixed with the unit of that scale in parentheses.

Chart: Main features
Grid: within the graph a grid of lines may appear to aid in the visual alignment of data. The grid can be enhanced by visually emphasizing the lines at regular or significant graduations. The emphasized lines are then called major grid lines and the remainder are minor grid lines.
Textual labels: the data of a chart can appear in all manner of formats, and may include individual textual labels describing the datum associated with the indicated position in the chart.
Legend: when the data appearing in a chart contains multiple variables, the chart may include a legend. A legend contains a list of the variables appearing in the chart and an example of their appearance. This information allows the data from each variable to be identified in the chart.

Chart Types
Common types: Histogram, Bar chart, Pie chart, Line chart, Timeline chart, Organizational chart, Tree chart, Flow chart, Area chart
Common plots: Box plot, Dot plot (statistics), Probability plot, Scatterplot, Biplot

Histograms
A histogram is a graphical representation of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. A histogram represents tabulated frequencies as adjacent rectangles erected over discrete intervals (bins), each with an area equal to the frequency of the observations in the interval. The height of a rectangle is therefore equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data points. A histogram used for probability density is normalized so that its total area is 1. Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable.

Histograms
The rectangles of a histogram are drawn so that they touch each other to indicate that the original variable is continuous. If all intervals on the x-axis have length 1, each rectangle's height equals its area, so the histogram is identical to a frequency plot (or, when normalized, to a relative frequency plot). A histogram may also be normalized to display relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equaling 1. The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent, and are often chosen to be of the same size.
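The relation between frequency, bin width, and rectangle height can be checked with a short sketch. The sample values and bin edges are made up for illustration, and the half-open bin convention [left, right) is an assumption.

```python
from bisect import bisect_right

def histogram(values, edges):
    """Return (counts, densities) for half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    # Frequency density: frequency divided by the width of the interval.
    densities = [c / (edges[i + 1] - edges[i]) for i, c in enumerate(counts)]
    return counts, densities

values = [1, 2, 2, 3, 5, 6, 7, 8, 12, 18]   # hypothetical observations
edges = [0, 4, 10, 20]                      # unequal bin widths: 4, 6, 10
counts, densities = histogram(values, edges)
widths = [b - a for a, b in zip(edges, edges[1:])]

# Area of each rectangle is height * width; the areas sum to the number of points.
total_area = sum(d * w for d, w in zip(densities, widths))

# Dividing the heights by the number of points normalizes the total area to 1.
normalized = [d / len(values) for d in densities]
normalized_area = sum(d * w for d, w in zip(normalized, widths))

print(counts, densities, total_area, normalized_area)
```

Because the bins have unequal widths, the heights (densities) differ from the raw counts even though the first two bins hold the same number of points.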

Bar chart
A bar graph is a chart that uses either horizontal or vertical bars to show comparisons among categories. One axis of the chart shows the specific categories being compared, and the other axis represents a measured value.

Bar chart
Some bar graphs present bars clustered in groups or stacked; these can be used for more complex comparisons of data.
Grouped bar graphs: they show more than one group. For each categorical group there are two or more bars, color-coded to represent a particular grouping. They are preferable where groups have negative values.
Stacked bar graphs: they show the bars divided into subparts to show a cumulative effect. They stack bars that represent different groups on top of each other, so the height of the resulting bar shows the combined result of the groups. They aren’t suited to data where some groups have negative values.
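A minimal sketch of how a stacked bar graph positions its segments: each group's segment starts where the previous group's segment ended. The `sales` data and the function name `stack` are hypothetical.

```python
def stack(series):
    """Return ({group: [(bottom, top), ...]}, totals) for stacked bars."""
    n = len(next(iter(series.values())))
    bottoms = [0.0] * n
    segments = {}
    for group, heights in series.items():
        # Each segment sits on top of everything stacked so far.
        segments[group] = [(b, b + h) for b, h in zip(bottoms, heights)]
        bottoms = [b + h for b, h in zip(bottoms, heights)]
    return segments, bottoms

sales = {"north": [3, 5, 2], "south": [4, 1, 6]}   # one value per category
segments, totals = stack(sales)
print(segments)
print(totals)
```

This also shows why negative values are a problem: a negative height would make a segment extend downward into the segment below it, so the bar's total height would no longer read as the combined result.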

Pie chart
A pie chart is a circular chart divided into sectors, illustrating numerical proportion. The arc length of each sector (and consequently its central angle and area) is proportional to the quantity it represents. While it is named for its resemblance to a pie which has been sliced, there are variations on the way it can be presented. The earliest known pie chart is generally credited to William Playfair's Statistical Breviary of 1801. Pie charts are very widely used in the business world and the mass media. But many experts recommend avoiding them, pointing out that research has shown it is difficult to compare different sections of a given pie chart, or to compare data across different pie charts. Pie charts can be replaced in most cases by other plots such as the bar chart.
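The proportionality between quantity and central angle can be sketched as follows. The `shares` values are made up, and the choice to start at 0 degrees and accumulate in insertion order is an assumption.

```python
def pie_angles(quantities):
    """Return {label: (start_angle, end_angle)} in degrees."""
    total = sum(quantities.values())
    start = 0.0
    sectors = {}
    for label, q in quantities.items():
        sweep = 360.0 * q / total   # central angle proportional to the quantity
        sectors[label] = (start, start + sweep)
        start += sweep
    return sectors

shares = {"A": 50, "B": 30, "C": 20}   # hypothetical quantities
sectors = pie_angles(shares)
for label, (a0, a1) in sectors.items():
    print(f"{label}: {a0:.1f} to {a1:.1f} degrees")
```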

Pie chart
Exploded pie chart: a chart with one or more sectors separated from the rest of the disk. This effect is used either to highlight a sector or to highlight smaller segments of the chart with small proportions.
Multilevel pie chart: it is used to visualize hierarchical data, depicted by concentric circles. The circle in the center represents the root node, with the hierarchy moving outward from the center. A segment of the inner circle bears a hierarchical relationship to those segments of the outer circle which lie within the angular sweep of the parent segment.

Pie chart
Spie chart: a useful variant of the polar area chart is the spie chart designed by Feitelson. It superimposes a modified polar area chart on a normal pie chart to permit the comparison of a set of data at two different states. The base pie chart represents the first state in the usual way, with different slice sizes. The second state is represented by the superimposed polar area chart, using the same angles as the base and adjusting the radii to fit the data. This is useful, among other things, for visualizing hazards to different population groups.
3D pie chart: it is used to give the chart a 3D look. It is often used for aesthetic reasons, but the third dimension does not improve the reading of the data; on the contrary, these plots are difficult to interpret because of the distorting effect of perspective associated with the third dimension. The use of superfluous dimensions not used to display the data of interest is discouraged for charts in general, not only for pie charts.

Line chart
A line chart, also called a line graph, displays information as a series of data points connected by straight line segments. It is a basic type of chart common in many fields. It is typically drawn bordered by two perpendicular lines, called axes: typically the y-axis represents the dependent variable and the x-axis represents the independent variable. It is similar to a scatter plot except that the measurement points are ordered and joined with straight line segments. It is often used to visualize a trend in data over intervals of time (a time series), so the line is often drawn chronologically.

Timelines
A timeline is a way of displaying a list of events in chronological order, sometimes described as a project artifact. It is typically a graphic design showing a long bar labelled with dates alongside itself and (usually) events labelled on points where they would have happened.

Organizational chart
An organizational chart is a diagram that shows the structure of an organization and the relationships and relative ranks of its parts and positions/jobs. The term is also used for similar diagrams, for example ones showing the different elements of a field of knowledge or a group of languages. When an organizational chart grows too large, it can be split into smaller charts for separate subjects (e.g., departments within the organization).
The different types of organization charts include:
Hierarchical
Matrix
Flat/Horizontal
A company's organizational chart typically illustrates relations between people within an organization. Such relations might include managers to sub-workers, directors to managing directors, chief executive officer to various departments, and so forth.

Organizational chart: Limitations
If updated manually, organizational charts can very quickly become out-of-date, especially in large organizations that change their staff regularly.
They only show "formal relationships" and tell nothing of the pattern of human (social) relationships which develop. They also often do not show horizontal relationships.
They provide little information about the managerial style adopted (e.g., "autocratic", "democratic", or an intermediate style).
In some cases a different kind of diagram may be more appropriate, particularly if one wants to show non-linear, non-hierarchical relationships in an organization.

Tree chart
A tree structure is a way of representing the hierarchical nature of a structure in a graphical form. The lines connecting elements are called "branches". The tree elements are called "nodes". Nodes without children are called leaf nodes, "end-nodes", or "leaves". The root is the starting node. Every finite tree structure has a member that has no superior. But the converse is not true: infinite tree structures may or may not have a root node. A tree structure is conceptual, and appears in several forms. It is named a "tree structure" because the classic representation resembles a tree, even though the chart is generally upside down compared to an actual tree, with the "root" at the top and the "leaves" at the bottom.
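The terms root, node, and leaf can be made concrete with a small sketch. The tree is stored as a hypothetical mapping from each node to its list of children; the node names are invented.

```python
# Hypothetical finite tree: each key is a node, each value its children.
tree = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": [],
    "a1": [],
    "a2": [],
}

def find_root(tree):
    """The root is the one node that is nobody's child (it has no superior)."""
    children = {c for kids in tree.values() for c in kids}
    roots = [n for n in tree if n not in children]
    if len(roots) != 1:
        raise ValueError("a finite tree must have exactly one root")
    return roots[0]

def leaves(tree):
    """Leaves (end-nodes) are the nodes without children."""
    return [n for n, kids in tree.items() if not kids]

def depth(tree, node):
    """Number of branches on the longest path from node down to a leaf."""
    return 0 if not tree[node] else 1 + max(depth(tree, c) for c in tree[node])

print(find_root(tree), leaves(tree), depth(tree, find_root(tree)))
```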

Tree chart: Representing trees
Classical node-link diagrams
Nested sets
Layered diagrams
Radial trees

Area chart
An area chart displays quantitative data graphically. It is based on the line chart. The area between the axis and the line is commonly emphasized with colors, textures, and hatchings. An area chart is commonly used to compare two or more quantities.

Open-high-low-close chart
An OHLC chart is a type of chart typically used to illustrate movements in the price of a financial instrument over time. Each vertical line on the chart shows the price range (the highest and lowest prices) over one unit of time, e.g., one day or one hour. Tick marks project from each side of the line, indicating the opening price on the left (e.g., for a daily bar chart this would be the starting price for that day) and the closing price for that time period on the right. The bars may be shown in different hues depending on whether prices rose or fell in that period.

Box plot
A box plot is a convenient way of graphically depicting groups of numerical data through their quartiles. Box plots also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram. Outliers may be plotted as individual points. The box plot is a quick way of examining one or more sets of data graphically. Box plots are particularly useful for comparing distributions between several groups or sets of data.
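The numbers behind a box plot can be computed as sketched below. Two choices here are assumptions not stated in the slides: whiskers extend to the most extreme points within 1.5 times the interquartile range of the box (a common convention), and quartiles use the default method of Python's `statistics.quantiles`. The sample data is invented.

```python
from statistics import quantiles

def box_plot_stats(data, whisker=1.5):
    """Return quartiles, whisker ends, and outliers for one group of data."""
    values = sorted(data)
    q1, median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    # Assumed convention: points beyond whisker * IQR from the box are outliers.
    lo_fence, hi_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    outliers = [v for v in values if v < lo_fence or v > hi_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

stats = box_plot_stats([2, 4, 4, 5, 6, 7, 8, 9, 10, 30])
print(stats)
```

Computing these statistics per group and drawing the boxes side by side is what makes the comparison between distributions possible.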

Box plot Types

Dot plot
A dot plot is a statistical chart consisting of data points plotted on a fairly simple scale, typically using filled-in circles. It represents a distribution and is used for continuous, quantitative, univariate data. Data points may be labelled if there are few of them. There are two common, yet very different, versions of the dot chart. The first is described by Leland Wilkinson as a graph that has been used in hand-drawn (pre-computer era) graphs to depict distributions. The other version is described by William Cleveland as an alternative to the bar chart, in which dots are used to depict the quantitative values (e.g., counts) associated with categorical variables.

Scatter plot
A scatter plot is a type of mathematical diagram using Cartesian coordinates to display values for two variables for a set of data. The data is displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis. This kind of plot is also called a scatter chart, scattergram, scatter diagram, or scatter graph. A scatter plot is used when a variable exists that is under the control of the experimenter. A scatter plot can suggest various kinds of correlations between variables with a certain confidence interval. One of its most powerful aspects is its ability to show nonlinear relationships between variables.

Biplots
Biplots are a type of exploratory graph used in statistics, a generalization of the simple two-variable scatterplot. A biplot allows information on both samples and variables of a data matrix to be displayed graphically. Samples are displayed as points, while variables are displayed as vectors, linear axes, or nonlinear trajectories. In the case of categorical variables, category level points may be used to represent the levels of a categorical variable. A generalized biplot displays information on both continuous and categorical variables.

Probability plot
A probability plot is a graphical technique for comparing two data sets: either two sets of empirical observations, one empirical set against a theoretical set, or (more rarely) two theoretical sets against each other.
It commonly means one of:
P–P plot
Q–Q plot (the more commonly used)
Normal probability plot

Probability plot: P–P plot
"Probability-Probability" or "Percent-Percent" plot
It is a probability plot for assessing how closely two data sets agree, which plots the two cumulative distribution functions against each other.

Probability plot: Q–Q plot
"Quantile-Quantile" plot
It is a probability plot, which is a graphical method for comparing two probability distributions by plotting their quantiles against each other. It is used to compare the shapes of distributions, providing a graphical view of how properties are similar or different in the two distributions.
A normal probability plot is a Q–Q plot against the standard normal distribution.
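The points of a normal probability plot can be computed as in the sketch below. The plotting positions (i - 0.5) / n are one common convention and are an assumption here, as are the sample values.

```python
from statistics import NormalDist

def normal_qq_points(sample):
    """Pair each sorted observation with a standard normal quantile."""
    data = sorted(sample)
    n = len(data)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    theoretical = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(theoretical, data))

points = normal_qq_points([2.1, 3.4, 1.9, 5.0, 4.2])   # hypothetical sample
for t, s in points:
    print(f"theoretical {t:+.3f}  sample {s:.1f}")
```

If the sample is approximately normal, the plotted points fall close to a straight line; systematic curvature indicates a difference in shape between the two distributions.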

Maps
While maps may seem a natural way to visualize geographical data, they have a long and rich history of design. Many maps are based upon a cartographic projection: a mathematical function that maps the 3D geometry of the Earth to a 2D image. Other maps knowingly distort or abstract geographic features to tell a richer story or highlight specific data.
Types:
Flow Maps
Choropleth Maps
Graduated Symbol Maps
Cartograms

Flow Maps
By placing stroked lines on top of a geographic map, a flow map can depict the movement of a quantity in space and (implicitly) in time. Flow lines typically encode a large amount of multivariate information: path points, direction, line thickness, and color can all be used to present dimensions of information to the viewer. Many of the greatest flow maps also involve subtle uses of distortion, as geography is bent to accommodate or highlight flows.

Choropleth Maps
Data is often collected and aggregated by geographical areas such as states. A standard approach to communicating this data is to use a color encoding of the geographic area, resulting in a choropleth map. Though this is a widely used visualization technique, some care is required. One common error is to encode raw data values (such as population) rather than using normalized values to produce a density map. Another issue is that one's perception of the shaded value can also be affected by the underlying area of the geographic region.
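The difference between encoding raw values and normalized values can be sketched as follows. The region names and figures are invented, and the equal-interval classification into five shades is an assumed, simple choice among many.

```python
# Hypothetical regions: raw population and land area.
regions = {
    "Alpha": {"population": 1_000_000, "area_km2": 50_000},
    "Beta": {"population": 400_000, "area_km2": 2_000},
    "Gamma": {"population": 700_000, "area_km2": 70_000},
}

def density(region):
    """Normalized value: people per square kilometre."""
    return region["population"] / region["area_km2"]

def color_class(value, lo, hi, levels=5):
    """Equal-interval class index, 0 (lightest) to levels - 1 (darkest)."""
    if hi == lo:
        return 0
    return min(levels - 1, int((value - lo) / (hi - lo) * levels))

densities = {name: density(r) for name, r in regions.items()}
lo, hi = min(densities.values()), max(densities.values())
classes = {name: color_class(d, lo, hi) for name, d in densities.items()}

most_people = max(regions, key=lambda n: regions[n]["population"])
most_dense = max(densities, key=densities.get)
print(most_people, most_dense, classes)
```

Shading by raw population would make the largest region darkest, whereas shading by density highlights the small, crowded one.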

Graduated Symbol Maps
An alternative to choropleth maps is the graduated symbol map, which instead places symbols over an underlying map. This approach avoids confounding geographic area with data values and allows for more dimensions to be visualized (e.g., symbol size, shape, and color). In addition to simple shapes like circles, graduated symbol maps may use more complicated glyphs such as pie charts.

Cartogram
A cartogram is a map in which some thematic mapping variable is substituted for land area or distance. The geometry or space of the map is distorted in order to convey the information of this alternate variable. Cartograms have a fairly long history, with examples from the mid-1800s.
Examples of mapping variables: GNP (Gross National Product), population, etc.

Cartograms
There are two main types of cartograms:
Area cartograms
Distance cartograms
Dorling cartogram (a variant of the area cartogram)

Time Series
Time series (sets of values changing over time) are one of the most common forms of recorded data. One often needs to compare a large number of time series simultaneously and can choose from a number of visualizations to do so.
Time-varying phenomena are central to many domains:
Finance (stock prices, exchange rates)
Science (temperatures, pollution levels, electric potentials)
Public policy (crime rates)
etc.

Index Charts
With some forms of time-series data, raw values are less important than relative changes. Consider investors, who are more interested in a stock's growth rate than its specific price. Multiple stocks may have dramatically different baseline prices, but can be meaningfully compared when normalized. An index chart is an interactive line chart that shows percentage changes for a collection of time series based on a selected index point.
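The normalization behind an index chart can be sketched in a few lines. The ticker names and prices are invented; in an interactive chart, `index_point` would follow the user's selection.

```python
def index_series(prices, index_point):
    """Percentage change of every value relative to the value at index_point."""
    base = prices[index_point]
    return [100.0 * (p - base) / base for p in prices]

# Hypothetical stocks with very different baseline prices.
stocks = {"AAA": [10, 11, 12, 15], "BBB": [200, 210, 190, 220]}

indexed = {name: index_series(p, index_point=0) for name, p in stocks.items()}
for name, series in indexed.items():
    print(name, series)
```

After normalization both series start at 0 percent, so the stock priced around 10 and the stock priced around 200 can share one vertical axis.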

Candlestick chart
A candlestick chart is a style of bar chart used primarily to describe price movements of a security, derivative, or currency over time. It is a combination of a line chart and a bar chart, in that each bar represents the range of price movement over a given time interval. It is most often used in technical analysis of equity and currency price patterns. Candlestick charts appear superficially similar to box plots, but are unrelated.

Candlestick chart
Candlesticks are usually composed of the body (black or white) and an upper and a lower shadow (wick). The area between the open and the close is called the real body; price excursions above and below the real body are called shadows. The wick illustrates the highest and lowest traded prices of a security during the time interval represented. The body illustrates the opening and closing trades. If the security closed higher than it opened, the body is white or unfilled, with the opening price at the bottom of the body and the closing price at the top. If the security closed lower than it opened, the body is black, with the opening price at the top and the closing price at the bottom. A candlestick need not have either a body or a wick.
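The rules above map directly onto a small function. The `OHLC` record type and the sample prices are hypothetical; returning `None` as the fill when open equals close (no body) is an assumption.

```python
from typing import NamedTuple

class OHLC(NamedTuple):
    """Prices for one time interval."""
    open: float
    high: float
    low: float
    close: float

def candlestick(bar):
    """Return the geometry and fill of one candlestick."""
    body_bottom, body_top = sorted((bar.open, bar.close))
    if bar.close > bar.open:
        fill = "white"    # closed higher than it opened
    elif bar.close < bar.open:
        fill = "black"    # closed lower than it opened
    else:
        fill = None       # open equals close: no body to fill
    return {
        "body": (body_bottom, body_top),
        "upper_shadow": (body_top, bar.high),
        "lower_shadow": (bar.low, body_bottom),
        "fill": fill,
    }

up = candlestick(OHLC(open=10.0, high=12.5, low=9.5, close=12.0))
down = candlestick(OHLC(open=12.0, high=12.2, low=10.8, close=11.0))
print(up)
print(down)
```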

Kagi chart
A Kagi chart is used for tracking price movements and for making decisions on purchasing stock. It was originally developed in Japan during the 1870s when the Japanese stock market started trading. It differs from traditional stock charts, such as the candlestick chart, by being mostly independent of time: a change of direction occurs only when the price moves by a specific amount. This feature helps produce a chart that reduces random noise. Because it shows a clear path of price movements, it is one of the charts that investors use to make better decisions about stocks.

Sparkline
A sparkline is a very small line chart, typically drawn without axes or coordinates. It presents the general shape of the variation (typically over time) in some measurement, such as temperature or stock market price, in a simple and highly condensed way. Sparklines are small enough to be embedded in text, or several sparklines may be grouped together as elements of a small multiple. Whereas the typical chart is designed to show as much data as possible, and is set off from the flow of text, sparklines are intended to be succinct, memorable, and located where they are discussed.
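As a rough text-only approximation of the idea, the sketch below renders a series with Unicode block characters so that it can sit inline in a sentence. This is a bar-style stand-in for a true line sparkline, and the temperature values are invented.

```python
BLOCKS = "▁▂▃▄▅▆▇█"   # eight heights, lowest to highest

def sparkline(values):
    """Map each value to one block character scaled between min and max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1          # avoid division by zero for flat series
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[int((v - lo) * top / span)] for v in values)

temperatures = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical measurements
print("Temperature this week:", sparkline(temperatures))
```

No axes or labels are drawn; only the general shape of the variation survives, which is the point of a sparkline.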

Networks
One aspect of data that we may wish to explore through visualization is relationship. Mathematicians use the formal term "graph" to describe a network. A central challenge in graph visualization is computing an effective layout. Layout techniques typically seek to position closely related nodes (in terms of graph distance, such as the number of links between nodes, or other metrics) close together in the drawing; critically, unrelated nodes must also be placed far enough apart to differentiate relationships. Some techniques may seek to optimize other visual features, for example by minimizing the number of edge crossings.

Force-Directed Layout
A common and intuitive approach to network layout is to model the graph as a physical system: nodes are charged particles that repel each other, while links are damped springs that pull related nodes together. A physical simulation of these forces then determines the node positions; approximation techniques that avoid computing all pairwise forces enable the layout of large numbers of nodes. In addition, interactivity allows the user to direct the layout and jiggle nodes to disambiguate links. These force-directed layouts are a good starting point for understanding the structure of a general undirected graph.
Example: node colors depict cluster memberships computed by a community detection algorithm.
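A minimal sketch of such a simulation is given below. It is an assumption-laden illustration, not the method used for the slide's example: the force laws (repulsion k²/d, attraction d²/k), the linear cooling schedule, and the circular starting positions are choices in the spirit of the Fruchterman-Reingold approach, and all pairwise forces are computed directly, so it only suits small graphs.

```python
import math
from itertools import combinations

def force_layout(nodes, edges, iterations=200, k=1.0):
    """Return {node: [x, y]} after a simple force simulation."""
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on a unit circle.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    for step in range(iterations):
        temperature = 0.1 * (1 - step / iterations)   # max move, cools to ~0
        disp = {v: [0.0, 0.0] for v in nodes}
        # Every pair of nodes repels, like charged particles.
        for a, b in combinations(nodes, 2):
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = k * k / dist
            disp[a][0] += dx / dist * force
            disp[a][1] += dy / dist * force
            disp[b][0] -= dx / dist * force
            disp[b][1] -= dy / dist * force
        # Linked nodes attract, like springs.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node along its net force, capped by the temperature.
        for v in nodes:
            dx, dy = disp[v]
            length = max(math.hypot(dx, dy), 1e-9)
            move = min(length, temperature)
            pos[v][0] += dx / length * move
            pos[v][1] += dy / length * move
    return pos

# Hypothetical graph: two triangles joined by one bridge link.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
layout = force_layout(nodes, edges)
for v, (x, y) in layout.items():
    print(v, round(x, 2), round(y, 2))
```

The approximation techniques mentioned above would replace the all-pairs repulsion loop, which is the quadratic-cost part of this sketch.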

Arc Diagrams
An arc diagram uses a one-dimensional layout of nodes, with circular arcs to represent links. While arc diagrams may not convey the overall structure of the graph as effectively as a two-dimensional layout, with a good ordering of nodes it is easy to identify cliques and bridges. And, as with the indented tree layout, multivariate data can easily be displayed alongside nodes. The problem of sorting the nodes in a manner that reveals underlying cluster structure is formally called seriation, and has diverse applications in visualization, statistics, and even archaeology!

Questions ?
