Variant View Visualizing Sequence Variants in their Gene Context Joel A. Ferstay 1*, Cydney B. Nielsen 2, Tamara Munzner 1 University of British Columbia.

Slides:

Advertisements

Similar presentations

1 Radio Maria World. 2 Postazioni Transmitter locations.

Advertisements

EcoTherm Plus WGB-K 20 E 4,5 – 20 kW.

Trend for Precision Soil Testing % Zone or Grid Samples Tested compared to Total Samples.

Trend for Precision Soil Testing % Zone or Grid Samples Tested compared to Total Samples.

AGVISE Laboratories %Zone or Grid Samples – Northwood laboratory

Trend for Precision Soil Testing % Zone or Grid Samples Tested compared to Total Samples.

PDAs Accept Context-Free Languages

ALAK ROY. Assistant Professor Dept. of CSE NIT Agartala

Reflection nurulquran.com.

EuroCondens SGB E.

Slide 1Fig 26-CO, p.795. Slide 2Fig 26-1, p.796 Slide 3Fig 26-2, p.797.

Slide 1Fig 25-CO, p.762. Slide 2Fig 25-1, p.765 Slide 3Fig 25-2, p.765.

& dding ubtracting ractions.

Sequential Logic Design

Addition and Subtraction Equations

David Burdett May 11, 2004 Package Binding for WS CDL.

1 When you see… Find the zeros You think…. 2 To find the zeros...

Western Public Lands Grazing: The Real Costs Explore, enjoy and protect the planet Forest Guardians Jonathan Proctor.

EQUS Conference - Brussels, June 16, 2011 Ambros Uchtenhagen, Michael Schaub Minimum Quality Standards in the field of Drug Demand Reduction Parallel Session.

Add Governors Discretionary (1G) Grants Chapter 6.

CHAPTER 18 The Ankle and Lower Leg

ASCII stands for American Standard Code for Information Interchange

The 5S numbers game..

突破信息检索壁垒－SciFinder Scholar 介绍

A Fractional Order (Proportional and Derivative) Motion Controller Design for A Class of Second-order Systems Center for Self-Organizing Intelligent.

Numerical Analysis 1 EE, NCKU Tien-Hao Chang (Darby Chang)

Break Time Remaining 10:00.

The basics for simulations

© 2010 Concept Systems, Inc.1 Concept Mapping Methodology: An Example.

PP Test Review Sections 6-1 to 6-6

MM4A6c: Apply the law of sines and the law of cosines.

TCCI Barometer March “Establishing a reliable tool for monitoring the financial, business and social activity in the Prefecture of Thessaloniki”

1 Prediction of electrical energy by photovoltaic devices in urban situations By. R.C. Ott July 2011.

Dynamic Access Control the file server, reimagined Presented by Mark on twitter 1 contents copyright 2013 Mark Minasi.

TCCI Barometer March “Establishing a reliable tool for monitoring the financial, business and social activity in the Prefecture of Thessaloniki”

Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.

Progressive Aerobic Cardiovascular Endurance Run

Biology 2 Plant Kingdom Identification Test Review.

MaK_Full ahead loaded 1 Alarm Page Directory (F11)

Facebook Pages 101: Your Organization’s Foothold on the Social Web A Volunteer Leader Webinar Sponsored by CACO December 1, 2010 Andrew Gossen, Senior.

TCCI Barometer September “Establishing a reliable tool for monitoring the financial, business and social activity in the Prefecture of Thessaloniki”

When you see… Find the zeros You think….

2011 WINNISQUAM COMMUNITY SURVEY YOUTH RISK BEHAVIOR GRADES 9-12 STUDENTS=1021.

Before Between After.

2011 FRANKLIN COMMUNITY SURVEY YOUTH RISK BEHAVIOR GRADES 9-12 STUDENTS=332.

2.10% more children born Die 0.2 years sooner Spend 95.53% less money on health care No class divide 60.84% less electricity 84.40% less oil.

Foundation Stage Results CLL (6 or above) 79% 73.5%79.4%86.5% M (6 or above) 91%99%97%99% PSE (6 or above) 96%84%100%91.2%97.3% CLL.

Subtraction: Adding UP

Numeracy Resources for KS2

1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)

Static Equilibrium; Elasticity and Fracture

Converting a Fraction to %

Resistência dos Materiais, 5ª ed.

& dding ubtracting ractions.

Lial/Hungerford/Holcomb/Mullins: Mathematics with Applications 11e Finite Mathematics with Applications 11e Copyright ©2015 Pearson Education, Inc. All.

Biostatistics course Part 14 Analysis of binary paired data

UNDERSTANDING THE ISSUES. 22 HILLSBOROUGH IS A REALLY BIG COUNTY.

A Data Warehouse Mining Tool Stephen Turner Chris Frala

1 Dr. Scott Schaefer Least Squares Curves, Rational Representations, Splines and Continuity.

Chart Deception Main Source: How to Lie with Charts, by Gerald E. Jones Dr. Michael R. Hyman, NMSU.

1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)

Introduction Embedded Universal Tools and Online Features 2.

úkol = A 77 B 72 C 67 D = A 77 B 72 C 67 D 79.

Schutzvermerk nach DIN 34 beachten 05/04/15 Seite 1 Training EPAM and CANopen Basic Solution: Password * * Level 1 Level 2 * Level 3 Password2 IP-Adr.

Presentation transcript:

Variant View Visualizing Sequence Variants in their Gene Context Joel A. Ferstay 1*, Cydney B. Nielsen 2, Tamara Munzner 1 University of British Columbia 1, now at AeroInfo Systems/Boeing Canada* BC Cancer Agency 2 1

Design study Real users, real tasks, real data Find out what users are doing Support with visualization tool Reflect and present guidelines Analys t Tool 2

Design Study The Design Process 3

Collaborated with analysts at the Genome Sciences Centre Study genetic basis of leukemia Needed help interpreting their data Major problems: –What do we show? –How do we show it? 4

Design cycle Intervie w 5

Design cycle Intervie w Data and Tasks 6

Design cycle Intervie w Data and Tasks Create Data Sketch 7

Design cycle Intervie w Data and Tasks Create Data Sketch Present Data Sketch 8

Design cycle Intervie w Data and Tasks Create Data Sketch Present Data Sketch REPE AT 9

Data sketches Alternative to paper prototyping Load and show real data Beneficial when the data is complex 10 [Lloyd and Dykes, InfoVis 2011]

Can identify features of interest in data 11 Emphasize some features over others

Can identify design dead ends early 12

Methods and durations Semi-structured interviews –7 months –Once per week –One hour in duration Presented data sketches –8 deployed over 5 months –Implemented with D3 toolkit [Bostock et al., InfoVis 2011 ] 13 Design study methodology [Sedlmair et al., 2012]

Problem characterization: Data 14

The data Data are sequence variants –Difference between reference genome and a given genome 15

The data Data are sequence variants –Difference between reference genome and a given genome 16 Reference Genome DNA:ATA TGA TCA ACA CTT Sample 1 Genome DNA:ATA TGG TCA ATA CTT Sample 2 Genome DNA:ATA TGA TGA ACA CCT

The data Data are sequence variants –Difference between reference genome and a given genome 17 Reference Genome DNA:ATA TGA TCA ACA CTT Sample 1 Genome DNA:ATA TGG TCA ATA CTT Sample 2 Genome DNA:ATA TGA TGA ACA CCT Harmful ? Harmles s?

Multi-scale data 18 DNA of the organism

Multi-scale data 19 Genes are functional units

Multi-scale data 20 Exons code for proteins

Multi-scale data 21 Regions within proteins perform activities

Related work: Genome scale 22

Ensembl genome browser 23 Explore genome and variant data [Chen et al., BMC Genomics 2010]

Genome scale shown at the top 24

User data is stacked in horizontal tracks 25

User data is stacked in horizontal tracks 26 Very flexible framework

Problem with the genome scale 27 Features of interest become small

Problem with the genome scale 28 Features of interest become small Varian t

Problem with the genome scale 29 Features of interest become small Varian t Exo n

Analysts must pan and zoom 30

Analysts must pan and zoom 31 Heavy interaction costs with zooming

Analysts must pan and zoom 32 Heavy interaction costs with zooming Varian t Must know where to look

Raw variant data (What they looked at before) 33

Raw data variant-centric 34 Tabular data format

Raw data variant-centric 35 Tabular data format Variant row 1

Raw data variant-centric 36 Tabular data format Variant row 2

Raw data variant-centric 37 Tabular data format Variant row 2 Difficult to reason about variants without their biological context

Multiple biological levels/scales What to show? 38

Many biological levels and scales 39

Only some levels and scales are beneficial for variant analysis 40

Filter out whole genome; keep genes 41

Filter out non-exon regions 42

Left with a filtered scope 43

Related work: Filtered scope 44

The Ensembl Variation Image 45 [Chen et al., BMC Genomics 2010]

First filtering step: per-gene view 46 [Chen et al., BMC Genomics 2010] One gene shown

Second filtering step: partial removal inter-exon regions 47 [Chen et al., BMC Genomics 2010] Partial removal

Problem: Extends multiple screens 48 [Chen et al., BMC Genomics 2010]

Problem: Extends multiple screens 49 [Chen et al., BMC Genomics 2010] 1 st Screen

Problem: Extends multiple screens 50 [Chen et al., BMC Genomics 2010] 2 nd Screen

Problem: Features of interest small 51 [Chen et al., BMC Genomics 2010] Exon regions small Color coding difficult to see

Our Goal: Show attributes necessary for variant analysis 52

Use information-dense visual encoding 53

Use information-dense visual encoding 54

Use information-dense visual encoding 55

Use information-dense visual encoding 56

Use information-dense visual encoding 57

Use information-dense visual encoding 58

Use information-dense visual encoding 59

The tool: Variant View 60

Variant View 61

Variant View 62 Information-dense single gene view

Variant View 63 Information-dense single gene view No need for pan and zoom

Variant View 64 Sorting metrics guide gene navigation

Variant View 65 Sorting metrics guide gene navigation Control what shows up here

Variant View 66 Peripheral supporting data

Related work: Targeted for variant analysis 67

MuSiC variant visualization plot 68 [Dees et al., Genome Research 2012]

Side-by-side comparison 69

Side-by-side comparison 70 Protein regions can overlap Regions get separate lanes

Side-by-side comparison 71 Large bloom of repeated elements: more salient Many collocated variants

Driving biological tasks 72

Task 1: Discover genes Tool originally designed to discover genes with harmful variants –Sorting metrics guide single gene navigation –Uncover new genes affected by variants in leukemia Want to see if Variant View exposed known genes in leukemia 73

74

Sorting by derived metric revealed known leukemia genes 75

Highly scored gene by sorting metric 76

Visual inspection reveals collocation of variants 77

Several functional protein regions affected 78

Highly scored by metric and not known 79

Protein chemical class change evident 80

In contrast, low scoring gene 81

No collocation of variants 82

Mostly unaffected protein regions 83

Variant tasks Started with the main task of discover gene Shared tool with analysts Identified two more tasks! –Patient compare –Debug pipeline 84

Task 2: Patient compare Clinical setting application Compare patient data to known harmful variants The challenge –Similarity is loosely understood rather than fully characterized –What constitutes a match requires visual inspection 85

Adapted Variant View with minimal changes 86

Navigate through patient data with list 87

Patient data emphasized with arrows 88

Patient has same harmful L to P mutation 89

These variants probably do not match 90

Task 3: Debug pipeline Debug data generation pipeline –Remove errors from data before analysis takes place Analysts originally did not think they needed this support 91

Tool revealed errors in the data 92 The tool exposed artifacts in the data that slid past at least two rounds of quality metric filtering … this type of problem would not have been caught by our previous, automated methods. - Analyst 3

Conclusions 93 Designed, implemented, and deployed tool for visual variant impact assessment Originally designed for Discover Genes task –Adapted to two others with minimal changes Methods –What to show Filtering data scope –How to show it Carefully selected visual encodings Goals –Navigation-free main overview at gene level –Reveal genes of interest; accomplished by sorting by new, derived metrics

Acknowledgements 94 Our collaborators at the GSC –Dr Aly Karsan –Rod Docking –Dr Linda Chang –Dr Gerben Duns –Simon Chang Funding –VIVA, AeroInfo Systems/Boeing Canada, MITACS

Questions? 95 Joel Ferstay: Paper Page: VariantView/ Software Available as Open Source

Future work: Other use contexts 96 Scaling up beyond the current design target –~50 variants at once Possibly integrate Variant View with MedSavant –Tool by Fiume et al., BioVis Posters 2012 –Focus on interactive filtering