Presentation on theme: "Ethics of data representation v2.0". Presentation transcript:
Data Visualisation Process: Collect Raw Data; Process and Filter Data; Clean Dataset; Exploratory Analysis; Generate Conclusion; Generate Visualisation.
What is Ethics when it comes to data visualisation? The figure/graph/image should show what is actually happening and not what you want to happen. Different ways of being unethical: – knowingly: deliberately showing the data in a misleading manner, choosing the ‘most representative’ image/experiment. – unknowingly: not exploring/getting to know the data well enough, misusing your chosen graphical representation.
Cheating knowingly: Choice of graph. You know what is going on. Hypothesis (what you want to see): applying a treatment will decrease the levels of a variable. You choose to plot your data like that. [Figure: Exp1, Exp2, Exp3, Exp4]
Cheating knowingly: Choice of axis/scale. You know what is going on. You want to show an increase in salary in the last term. You choose to plot your data like that.
Cheating knowingly: Choice of axis/scale Be careful with Linear vs. logarithmic scale.
Cheating knowingly: Choice of axis/scale. If you want to cheat, a bar graph using a log axis is a great tool, as it lets you either exaggerate differences between groups or minimize them. [Figure panels: Linear scale; Logarithmic scale]
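The point about log-axis bar graphs can be checked with plain arithmetic. This is a minimal sketch, not taken from the slides: the two group values and the axis starting points are made-up illustration values, and `bar_length_ratio` is a hypothetical helper. A bar on a log axis has no natural zero, so its drawn length depends on where the axis starts, and the apparent ratio between two bars changes with that choice.

```python
import math

def bar_length_ratio(a, b, baseline, log_scale):
    """Ratio of the drawn lengths of two bars that both start at `baseline`.

    Hypothetical helper, for illustration only.
    """
    if log_scale:
        # On a log axis, drawn length is proportional to log10(value / baseline).
        return math.log10(b / baseline) / math.log10(a / baseline)
    # On a linear axis, drawn length is proportional to value - baseline.
    return (b - baseline) / (a - baseline)

# Made-up group means: group B is exactly twice group A.
group_a, group_b = 100.0, 200.0

linear = bar_length_ratio(group_a, group_b, baseline=0.0, log_scale=False)
log_from_1 = bar_length_ratio(group_a, group_b, baseline=1.0, log_scale=True)
log_from_90 = bar_length_ratio(group_a, group_b, baseline=90.0, log_scale=True)

print(f"linear axis starting at 0: bar B is {linear:.2f}x as long as bar A")
print(f"log axis starting at 1:    bar B is {log_from_1:.2f}x as long as bar A")
print(f"log axis starting at 90:   bar B is {log_from_90:.2f}x as long as bar A")
```

With these made-up numbers the true two-fold difference looks smaller when the log axis starts at 1 and much larger when it starts at 90, which is the exaggerate-or-minimize effect the slide warns about.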
Cheating knowingly: Choice of axis/scale. A logarithmic axis should be used for: lognormal data; logarithmically spaced values.
Cheating knowingly: Manipulating images: Western blot. Presenting bands out of context. ‘Playing’ too much with contrast. ‘Rebuilding’ a Western blot from several cuts. [Figure panels: Original; Brightness and Contrast Adjusted; Brightness and Contrast Adjusted Too Much: Oversaturation]
Cheating unknowingly: Not exploring/getting to know the data well enough Hypothesis: increase from CondA to CondB. You run the experiment once and you choose to plot the data as a bar chart.
Cheating unknowingly: Not exploring/getting to know the data well enough. Comparisons: Treatments vs. Control. [Figure: Exp1, Exp2, Exp3, Exp4, Exp5; p-values shown: p=0.04, p=0.32, p=0.001]
Plot types – Distribution/Exploration: Histograms. Very good for exploring data. Better on big datasets. Rules: number of intervals ≈ √N and interval width ≈ Range ÷ √N. Histograms are great, but be careful with the resolution (= number of bins), as it affects the shape of the distribution.
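The square-root rule can be sketched in a few lines. This is an illustrative example only: the sample values are made up, and `sqrt_rule_bins` and `histogram_counts` are hypothetical helper names. It applies the rule, then bins the same data at three resolutions to show how the bin count changes the apparent shape.

```python
import math

def sqrt_rule_bins(values):
    """Return (number_of_bins, bin_width) from the square-root rule of thumb."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    n_bins = max(1, round(math.sqrt(n)))           # number of intervals ~ sqrt(N)
    width = (max(values) - min(values)) / math.sqrt(n)  # width ~ range / sqrt(N)
    return n_bins, width

def histogram_counts(values, n_bins):
    """Count values in `n_bins` equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything lands in the first bin.
        return [len(values)] + [0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than past it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# Made-up sample of N = 25 measurements.
values = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6,
          7, 7, 7, 7, 8, 8, 8, 9, 9, 10]

n_bins, width = sqrt_rule_bins(values)
print("suggested bins:", n_bins, "suggested width:", width)
for bins in (3, n_bins, 9):
    print(bins, "bins ->", histogram_counts(values, bins))
```

The rule is a starting point, not a guarantee: it is worth trying a few bin counts around the suggested value before settling on one.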
Plot types – Distribution/Exploration: Histograms. Be careful with the resolution … and the type of data you are dealing with. Histograms are great but be careful with discrete data.
Plot types – Distribution/Exploration: Boxplots and Bean plots. [Boxplot figure labels: Maximum; Upper Quartile (Q3), 75th percentile (3rd quartile); Median; Lower Quartile (Q1), 25th percentile (1st quartile); Minimum; Interquartile Range (IQR): 50% of the data; Cutoff = Q1 – 1.5*IQR; Outlier]
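The quantities labelled on the boxplot can be computed directly. This is a minimal sketch with made-up data; `boxplot_summary` is a hypothetical helper. The slide only shows the lower cutoff, Q1 – 1.5*IQR. The upper cutoff, Q3 + 1.5*IQR, is the usual counterpart and is added here as an assumption. Quartile conventions differ between tools; this sketch uses the "inclusive" method of Python's `statistics.quantiles`.

```python
import statistics

def boxplot_summary(values):
    """Compute the quantities drawn on a boxplot for a list of numbers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                    # the box: middle 50% of the data
    lower_cutoff = q1 - 1.5 * iqr    # cutoff shown on the slide
    upper_cutoff = q3 + 1.5 * iqr    # assumption: usual mirror-image cutoff
    outliers = [v for v in data if v < lower_cutoff or v > upper_cutoff]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "iqr": iqr, "lower_cutoff": lower_cutoff, "upper_cutoff": upper_cutoff,
        "outliers": outliers,
    }

# Made-up sample with one suspiciously low value.
sample = [1, 12, 13, 14, 15, 16, 17, 18, 19]
summary = boxplot_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```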
Plot types – Distribution/Exploration: Boxplots and Bean plots. [Figure: Bimodal, Uniform and Normal distributions] A bean = a ‘batch’ of data. Data density is mirrored by the shape of the polygon. The scatterplot shows individual data. Very good for exploring data. Better on medium-size datasets. Boxplots are great but be careful with the underlying distribution.
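The warning about the underlying distribution can be illustrated with two made-up samples that would produce identical boxplots even though one is evenly spread and the other sits in two clusters. This is a sketch under stated assumptions: the data are invented for the purpose, `five_number_summary` is a hypothetical helper, and quartiles use the "inclusive" method of `statistics.quantiles`.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot is drawn from."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])

# Made-up samples, chosen so that the summaries coincide.
spread_out = [0, 1, 2, 3, 5, 7, 8, 9, 10]      # roughly even spread
two_clusters = [0, 2, 2, 2, 5, 8, 8, 8, 10]    # values pile up near 2 and near 8

print("spread out:  ", five_number_summary(spread_out))
print("two clusters:", five_number_summary(two_clusters))
print("same boxplot:",
      five_number_summary(spread_out) == five_number_summary(two_clusters))
```

A bean plot or a stripchart of the same two samples would show the clusters that the boxplot hides.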
Plot types – Exploration/Comparison Stripcharts/Scatterplots Very good for exploring data. Better on small/medium dataset. Very informative: exploration AND comparison. Very hard to cheat with these. Stripcharts are great but they don’t work so well with big samples.
Plot types – Comparisons: Barcharts. Error bars: standard deviation, standard error, confidence interval. Star wars (cool graph!)
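The three kinds of error bar named on this slide give different bar lengths for the same data, so a graph should always say which one is drawn. This is a minimal sketch with made-up measurements; `error_bar_sizes` is a hypothetical helper. The confidence interval uses a normal approximation, which is an assumption made here for simplicity; for small samples a t-based interval is wider.

```python
import math
import statistics

def error_bar_sizes(values, confidence=0.95):
    """Return (sd, sem, ci_half_width) for a sample of numbers."""
    n = len(values)
    sd = statistics.stdev(values)      # sample standard deviation
    sem = sd / math.sqrt(n)            # standard error of the mean
    # Assumption: normal approximation (z quantile) instead of a t quantile.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    ci_half_width = z * sem            # half-width of the confidence interval
    return sd, sem, ci_half_width

# Made-up measurements from one group.
measurements = [4.1, 5.0, 5.3, 4.7, 5.9, 4.4, 5.2, 4.8, 5.6]
sd, sem, ci_half = error_bar_sizes(measurements)

print(f"mean = {statistics.mean(measurements):.2f}")
print(f"standard deviation bar: +/- {sd:.2f}")
print(f"standard error bar:     +/- {sem:.2f}")
print(f"95% CI bar (approx.):   +/- {ci_half:.2f}")
```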
Plot types – Comparisons: Barcharts. Be careful with the scale when plotting ratios. Very good for presenting results and emphasizing differences. Effectiveness: most important info with the most effective channel. Barcharts are great, but only after data exploration, and the y-axis needs to be chosen wisely.
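One reason the scale matters for ratios: on a linear axis a halving and a doubling do not look like equal and opposite changes, while on a log axis they do. The sketch below uses made-up ratios and is only an illustration of that arithmetic.

```python
import math

# Made-up treatment/control ratios; 1.0 means "no change".
ratios = {"halved": 0.5, "unchanged": 1.0, "doubled": 2.0}

offsets = {}
for label, ratio in ratios.items():
    linear_offset = ratio - 1.0        # distance from 1 on a linear axis
    log2_offset = math.log2(ratio)     # distance from 1 on a log2 axis
    offsets[label] = (linear_offset, log2_offset)
    print(f"{label:>9}: linear offset {linear_offset:+.2f}, "
          f"log2 offset {log2_offset:+.2f}")
```

On the linear axis the doubled bar sits twice as far from 1 as the halved bar, even though both are two-fold changes.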
Plot types – Relationship/Comparison: Line graphs. Very good for presenting results of matched/paired/repeated data. Except for exploration … [Figure: 5 experiments] Linecharts are great but be careful with the axes.
Plot types – Relationships Scatterplot Very good for understanding the relationship between quantitative variables.
Plot types – Relationships: Scatterplots. Scatterplots are great but big data can be tricky. Solution: smoothed densities, colour representation.
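The idea behind a density representation can be sketched by counting points per grid cell, so that each cell can be coloured by its count instead of drawing every point. This is a simplified stand-in, not the smoothing method used on the slide: it produces plain binned counts with no kernel smoothing, the data are randomly generated for illustration, and `grid_density` is a hypothetical helper.

```python
import math
import random
from collections import Counter

def grid_density(points, cell_size):
    """Count how many (x, y) points fall into each square grid cell."""
    counts = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts

# Made-up "big" dataset: 10,000 points that would overlap heavily when plotted.
random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10_000)]

density = grid_density(points, cell_size=0.5)
busiest_cell, busiest_count = density.most_common(1)[0]

print(len(points), "points summarised in", len(density), "cells")
print("busiest cell", busiest_cell, "holds", busiest_count, "points")
```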
Plot types – Relationships: Heatmaps. Great for big data sets; they allow a third quantitative value to be plotted. Colour scheme for grouping. [Figure panels: Euclidean distance; Correlation; Colour scheme] Heatmaps are great, but plot data that are changing.
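Euclidean distance and correlation, both named on this slide, can group the same rows very differently. This is a minimal sketch assuming they are used as alternative similarity measures between heatmap rows; the three profiles are made up, and `pearson` is a small helper written here for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up rows of a heatmap (one value per condition).
profile_a = [1.0, 2.0, 3.0, 2.0]
profile_b = [10.0, 20.0, 30.0, 20.0]   # same pattern as A, ten times larger
profile_c = [3.0, 2.0, 1.0, 2.0]       # similar size to A, opposite pattern

for name, other in (("B", profile_b), ("C", profile_c)):
    euclidean = math.dist(profile_a, other)
    correlation_distance = 1.0 - pearson(profile_a, other)
    print(f"A vs {name}: Euclidean = {euclidean:.2f}, "
          f"1 - correlation = {correlation_distance:.2f}")
```

With these values, Euclidean distance puts A closest to C (similar magnitude), while correlation puts A closest to B (same pattern), so the choice of measure changes which rows end up grouped together.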
A heatmap is basically a table that has colors in place of numbers. Simon’s data from simple numbers to correlation
Plot types – Composition: Stack charts/Pie charts. Stack/pie charts are great but keep an eye on the sample size.