Download presentation
Presentation is loading. Please wait.
Published byKarina Pearce Modified over 9 years ago
1
Variant View Visualizing Sequence Variants in their Gene Context Joel A. Ferstay 1*, Cydney B. Nielsen 2, Tamara Munzner 1 University of British Columbia 1, now at AeroInfo Systems/Boeing Canada* BC Cancer Agency 2 1
2
Design study Real users, real tasks, real data Find out what users are doing Support with visualization tool Reflect and present guidelines Analys t Tool 2
3
Design Study The Design Process 3
4
Collaborated with analysts at the Genome Sciences Centre Study genetic basis of leukemia Needed help interpreting their data Major problems: –What do we show? –How do we show it? 4
5
Design cycle Intervie w 5
6
Design cycle Intervie w Data and Tasks 6
7
Design cycle Intervie w Data and Tasks Create Data Sketch 7
8
Design cycle Intervie w Data and Tasks Create Data Sketch Present Data Sketch 8
9
Design cycle Intervie w Data and Tasks Create Data Sketch Present Data Sketch REPE AT 9
10
Data sketches Alternative to paper prototyping Load and show real data Beneficial when the data is complex 10 [Lloyd and Dykes, InfoVis 2011]
11
Can identify features of interest in data 11 Emphasize some features over others
12
Can identify design dead ends early 12
13
Methods and durations Semi-structured interviews –7 months –Once per week –One hour in duration Presented data sketches –8 deployed over 5 months –Implemented with D3 toolkit [Bostock et al., InfoVis 2011 ] 13 Design study methodology [Sedlmair et al., 2012]
14
Problem characterization: Data 14
15
The data Data are sequence variants –Difference between reference genome and a given genome 15
16
The data Data are sequence variants –Difference between reference genome and a given genome 16 Reference Genome DNA:ATA TGA TCA ACA CTT Sample 1 Genome DNA:ATA TGG TCA ATA CTT Sample 2 Genome DNA:ATA TGA TGA ACA CCT
17
The data Data are sequence variants –Difference between reference genome and a given genome 17 Reference Genome DNA:ATA TGA TCA ACA CTT Sample 1 Genome DNA:ATA TGG TCA ATA CTT Sample 2 Genome DNA:ATA TGA TGA ACA CCT Harmful ? Harmles s?
18
Multi-scale data 18 DNA of the organism
19
Multi-scale data 19 Genes are functional units
20
Multi-scale data 20 Exons code for proteins
21
Multi-scale data 21 Regions within proteins perform activities
22
Related work: Genome scale 22
23
Ensembl genome browser 23 Explore genome and variant data [Chen et al., BMC Genomics 2010]
24
Genome scale shown at the top 24
25
User data is stacked in horizontal tracks 25
26
User data is stacked in horizontal tracks 26 Very flexible framework
27
Problem with the genome scale 27 Features of interest become small
28
Problem with the genome scale 28 Features of interest become small Varian t
29
Problem with the genome scale 29 Features of interest become small Varian t Exo n
30
Analysts must pan and zoom 30
31
Analysts must pan and zoom 31 Heavy interaction costs with zooming
32
Analysts must pan and zoom 32 Heavy interaction costs with zooming Varian t Must know where to look
33
Raw variant data (What they looked at before) 33
34
Raw data variant-centric 34 Tabular data format
35
Raw data variant-centric 35 Tabular data format Variant row 1
36
Raw data variant-centric 36 Tabular data format Variant row 2
37
Raw data variant-centric 37 Tabular data format Variant row 2 Difficult to reason about variants without their biological context
38
Multiple biological levels/scales What to show? 38
39
Many biological levels and scales 39
40
Only some levels and scales are beneficial for variant analysis 40
41
Filter out whole genome; keep genes 41
42
Filter out non-exon regions 42
43
Left with a filtered scope 43
44
Related work: Filtered scope 44
45
The Ensembl Variation Image 45 [Chen et al., BMC Genomics 2010]
46
First filtering step: per-gene view 46 [Chen et al., BMC Genomics 2010] One gene shown
47
Second filtering step: partial removal inter-exon regions 47 [Chen et al., BMC Genomics 2010] Partial removal
48
Problem: Extends multiple screens 48 [Chen et al., BMC Genomics 2010]
49
Problem: Extends multiple screens 49 [Chen et al., BMC Genomics 2010] 1 st Screen
50
Problem: Extends multiple screens 50 [Chen et al., BMC Genomics 2010] 2 nd Screen
51
Problem: Features of interest small 51 [Chen et al., BMC Genomics 2010] Exon regions small Color coding difficult to see
52
Our Goal: Show attributes necessary for variant analysis 52
53
Use information-dense visual encoding 53
54
Use information-dense visual encoding 54
55
Use information-dense visual encoding 55
56
Use information-dense visual encoding 56
57
Use information-dense visual encoding 57
58
Use information-dense visual encoding 58
59
Use information-dense visual encoding 59
60
The tool: Variant View 60
61
Variant View 61
62
Variant View 62 Information-dense single gene view
63
Variant View 63 Information-dense single gene view No need for pan and zoom
64
Variant View 64 Sorting metrics guide gene navigation
65
Variant View 65 Sorting metrics guide gene navigation Control what shows up here
66
Variant View 66 Peripheral supporting data
67
Related work: Targeted for variant analysis 67
68
MuSiC variant visualization plot 68 [Dees et al., Genome Research 2012]
69
Side-by-side comparison 69
70
Side-by-side comparison 70 Protein regions can overlap Regions get separate lanes
71
Side-by-side comparison 71 Large bloom of repeated elements: more salient Many collocated variants
72
Driving biological tasks 72
73
Task 1: Discover genes Tool originally designed to discover genes with harmful variants –Sorting metrics guide single gene navigation –Uncover new genes affected by variants in leukemia Want to see if Variant View exposed known genes in leukemia 73
74
74
75
Sorting by derived metric revealed known leukemia genes 75
76
Highly scored gene by sorting metric 76
77
Visual inspection reveals collocation of variants 77
78
Several functional protein regions affected 78
79
Highly scored by metric and not known 79
80
Protein chemical class change evident 80
81
In contrast, low scoring gene 81
82
No collocation of variants 82
83
Mostly unaffected protein regions 83
84
Variant tasks Started with the main task of discover gene Shared tool with analysts Identified two more tasks! –Patient compare –Debug pipeline 84
85
Task 2: Patient compare Clinical setting application Compare patient data to known harmful variants The challenge –Similarity is loosely understood rather than fully characterized –What constitutes a match requires visual inspection 85
86
Adapted Variant View with minimal changes 86
87
Navigate through patient data with list 87
88
Patient data emphasized with arrows 88
89
Patient has same harmful L to P mutation 89
90
These variants probably do not match 90
91
Task 3: Debug pipeline Debug data generation pipeline –Remove errors from data before analysis takes place Analysts originally did not think they needed this support 91
92
Tool revealed errors in the data 92 The tool exposed artifacts in the data that slid past at least two rounds of quality metric filtering … this type of problem would not have been caught by our previous, automated methods. - Analyst 3
93
Conclusions 93 Designed, implemented, and deployed tool for visual variant impact assessment Originally designed for Discover Genes task –Adapted to two others with minimal changes Methods –What to show Filtering data scope –How to show it Carefully selected visual encodings Goals –Navigation-free main overview at gene level –Reveal genes of interest; accomplished by sorting by new, derived metrics
94
Acknowledgements 94 Our collaborators at the GSC –Dr Aly Karsan –Rod Docking –Dr Linda Chang –Dr Gerben Duns –Simon Chang Funding –VIVA, AeroInfo Systems/Boeing Canada, MITACS
95
Questions? 95 Joel Ferstay: joel.ferstay@gmail.com Paper Page: http://www.cs.ubc.ca/labs/imager/tr/2013/ VariantView/ Software Available as Open Source
96
Future work: Other use contexts 96 Scaling up beyond the current design target –~50 variants at once Possibly integrate Variant View with MedSavant –Tool by Fiume et al., BioVis Posters 2012 –Focus on interactive filtering
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.