Variant View Visualizing Sequence Variants in their Gene Context Joel A. Ferstay 1*, Cydney B. Nielsen 2, Tamara Munzner 1 University of British Columbia.


1 Variant View: Visualizing Sequence Variants in their Gene Context. Joel A. Ferstay (1,*), Cydney B. Nielsen (2), Tamara Munzner (1). (1) University of British Columbia; (2) BC Cancer Agency; (*) now at AeroInfo Systems/Boeing Canada

2 Design study: Real users, real tasks, real data. Find out what users are doing. Support with visualization tool. Reflect and present guidelines. [Diagram labels: Analyst, Tool]

3 Design Study: The Design Process

4 Collaborated with analysts at the Genome Sciences Centre. Study genetic basis of leukemia. Needed help interpreting their data. Major problems: –What do we show? –How do we show it?

5 Design cycle: Interview

6 Design cycle: Interview, Data and Tasks

7 Design cycle: Interview, Data and Tasks, Create Data Sketch

8 Design cycle: Interview, Data and Tasks, Create Data Sketch, Present Data Sketch

9 Design cycle: Interview, Data and Tasks, Create Data Sketch, Present Data Sketch, REPEAT

10 Data sketches: Alternative to paper prototyping. Load and show real data. Beneficial when the data is complex. [Lloyd and Dykes, InfoVis 2011]

11 Can identify features of interest in data. Emphasize some features over others.

12 Can identify design dead ends early

13 Methods and durations. Semi-structured interviews: –7 months –Once per week –One hour in duration. Presented data sketches: –8 deployed over 5 months –Implemented with D3 toolkit [Bostock et al., InfoVis 2011]. Design study methodology [Sedlmair et al., 2012]

14 Problem characterization: Data

15 The data: Data are sequence variants –Difference between reference genome and a given genome

16 The data: Data are sequence variants –Difference between reference genome and a given genome. Reference Genome DNA: ATA TGA TCA ACA CTT; Sample 1 Genome DNA: ATA TGG TCA ATA CTT; Sample 2 Genome DNA: ATA TGA TGA ACA CCT

17 The data: Data are sequence variants –Difference between reference genome and a given genome. Reference Genome DNA: ATA TGA TCA ACA CTT; Sample 1 Genome DNA: ATA TGG TCA ATA CTT; Sample 2 Genome DNA: ATA TGA TGA ACA CCT. Harmful? Harmless?
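As a minimal sketch of what a sequence variant is, the Python snippet below compares the example sequences from the slide base by base. The function name `find_variants` and the 1-based position convention are assumptions made for illustration. Real variant callers also handle insertions and deletions, which this equal-length comparison does not.

```python
# Example sequences copied from the slide (spaces separate triplets).
REFERENCE = "ATA TGA TCA ACA CTT"
SAMPLES = {
    "Sample 1": "ATA TGG TCA ATA CTT",
    "Sample 2": "ATA TGA TGA ACA CCT",
}


def find_variants(reference, sample):
    """Return (position, ref_base, sample_base) for each single-base difference.

    Positions are 1-based. This hypothetical helper only handles
    substitutions between sequences of equal length.
    """
    ref = reference.replace(" ", "")
    alt = sample.replace(" ", "")
    if len(ref) != len(alt):
        raise ValueError("this sketch only handles equal-length sequences")
    return [(i + 1, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


for name, seq in SAMPLES.items():
    print(name, find_variants(REFERENCE, seq))
```

Each sample differs from the reference at two positions. Whether those differences are harmful or harmless is the question the rest of the talk addresses; the comparison alone cannot answer it.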

18 Multi-scale data: DNA of the organism

19 Multi-scale data: Genes are functional units

20 Multi-scale data: Exons code for proteins

21 Multi-scale data: Regions within proteins perform activities

22 Related work: Genome scale

23 Ensembl genome browser: Explore genome and variant data [Chen et al., BMC Genomics 2010]

24 Genome scale shown at the top

25 User data is stacked in horizontal tracks

26 User data is stacked in horizontal tracks. Very flexible framework

27 Problem with the genome scale: Features of interest become small

28 Problem with the genome scale: Features of interest become small. [Label: Variant]

29 Problem with the genome scale: Features of interest become small. [Labels: Variant, Exon]

30 Analysts must pan and zoom

31 Analysts must pan and zoom: Heavy interaction costs with zooming

32 Analysts must pan and zoom: Heavy interaction costs with zooming. Must know where to look. [Label: Variant]

33 Raw variant data (what they looked at before)

34 Raw data variant-centric: Tabular data format

35 Raw data variant-centric: Tabular data format. [Label: Variant row 1]

36 Raw data variant-centric: Tabular data format. [Label: Variant row 2]

37 Raw data variant-centric: Tabular data format. [Label: Variant row 2] Difficult to reason about variants without their biological context

38 Multiple biological levels/scales: What to show?

39 Many biological levels and scales

40 Only some levels and scales are beneficial for variant analysis

41 Filter out whole genome; keep genes

42 Filter out non-exon regions

43 Left with a filtered scope
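The two filtering steps above (keep genes, then keep only exon regions) can be sketched as a simple interval test. This is an illustrative sketch, not the Variant View implementation: the `Exon` and `Variant` types, the function `filter_to_exons`, the gene names, and all coordinates are hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    gene: str
    start: int  # inclusive genomic coordinate (hypothetical)
    end: int    # inclusive genomic coordinate (hypothetical)


@dataclass(frozen=True)
class Variant:
    position: int
    ref: str
    alt: str


def filter_to_exons(variants, exons):
    """Group variants by gene, keeping only those that fall inside an exon."""
    kept = {}
    for v in variants:
        for ex in exons:
            if ex.start <= v.position <= ex.end:
                kept.setdefault(ex.gene, []).append(v)
                break  # a variant is assigned to the first exon containing it
    return kept


# Invented data: two genes, four variants.
EXONS = [Exon("GENE_A", 100, 200), Exon("GENE_A", 500, 650), Exon("GENE_B", 2000, 2100)]
VARIANTS = [
    Variant(150, "A", "G"),   # inside first exon of GENE_A
    Variant(300, "C", "T"),   # between exons: dropped
    Variant(600, "T", "C"),   # inside second exon of GENE_A
    Variant(5000, "G", "A"),  # outside any gene: dropped
]

by_gene = filter_to_exons(VARIANTS, EXONS)
for gene, vs in by_gene.items():
    print(gene, [v.position for v in vs])
```

A linear scan over exons is enough at this size; an interval index would be the natural replacement for genome-scale data.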

44 Related work: Filtered scope

45 The Ensembl Variation Image [Chen et al., BMC Genomics 2010]

46 First filtering step: per-gene view. One gene shown [Chen et al., BMC Genomics 2010]

47 Second filtering step: partial removal of inter-exon regions [Chen et al., BMC Genomics 2010]

48 Problem: Extends across multiple screens [Chen et al., BMC Genomics 2010]

49 Problem: Extends across multiple screens. [Label: 1st screen] [Chen et al., BMC Genomics 2010]

50 Problem: Extends across multiple screens. [Label: 2nd screen] [Chen et al., BMC Genomics 2010]

51 Problem: Features of interest small. Exon regions small. Color coding difficult to see. [Chen et al., BMC Genomics 2010]

52 Our Goal: Show attributes necessary for variant analysis

53 Use information-dense visual encoding

54 Use information-dense visual encoding

55 Use information-dense visual encoding

56 Use information-dense visual encoding

57 Use information-dense visual encoding

58 Use information-dense visual encoding

59 Use information-dense visual encoding

60 The tool: Variant View

61 Variant View

62 Variant View: Information-dense single gene view

63 Variant View: Information-dense single gene view. No need for pan and zoom

64 Variant View: Sorting metrics guide gene navigation

65 Variant View: Sorting metrics guide gene navigation. Control what shows up here

66 Variant View: Peripheral supporting data

67 Related work: Targeted for variant analysis

68 MuSiC variant visualization plot [Dees et al., Genome Research 2012]

69 Side-by-side comparison

70 Side-by-side comparison: Protein regions can overlap. Regions get separate lanes

71 Side-by-side comparison: Large bloom of repeated elements: more salient. Many collocated variants

72 Driving biological tasks

73 Task 1: Discover genes. Tool originally designed to discover genes with harmful variants –Sorting metrics guide single gene navigation –Uncover new genes affected by variants in leukemia. Want to see if Variant View exposed known genes in leukemia

74 (no slide text)

75 Sorting by derived metric revealed known leukemia genes

76 Highly scored gene by sorting metric

77 Visual inspection reveals collocation of variants

78 Several functional protein regions affected

79 Highly scored by metric and not known

80 Protein chemical class change evident
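To illustrate what a "protein chemical class change" means, the sketch below maps each amino acid to a chemical class and reports when a substitution crosses classes. The grouping used here is one common textbook grouping and is an assumption: groupings differ between sources, and the slides do not say which classes Variant View uses. The names `AA_CLASS` and `class_change` are hypothetical.

```python
# One common grouping of the 20 standard amino acids by side-chain chemistry.
# ASSUMPTION: other sources group some residues (e.g. G, C, Y, H) differently.
_GROUPS = {
    "nonpolar": "GAVLIMFWP",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
AA_CLASS = {aa: cls for cls, residues in _GROUPS.items() for aa in residues}


def class_change(ref_aa, alt_aa):
    """Return (old_class, new_class) if the substitution changes class, else None."""
    old, new = AA_CLASS[ref_aa], AA_CLASS[alt_aa]
    return (old, new) if old != new else None


print(class_change("E", "K"))  # acidic residue replaced by a basic one
print(class_change("L", "I"))  # both nonpolar in this grouping
```

Under this grouping a glutamate-to-lysine substitution changes class, while leucine-to-isoleucine does not.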

81 In contrast, low scoring gene

82 No collocation of variants

83 Mostly unaffected protein regions
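The slides do not define the derived sorting metrics, so the following is only a hypothetical stand-in showing how a score could push genes with collocated variants to the top of a list. The function names, the `window` parameter, and the gene data are all assumptions for illustration; this is not the metric used by Variant View.

```python
from itertools import combinations


def collocation_score(positions, window=5):
    """Count pairs of variants whose protein positions lie within `window` residues.

    HYPOTHETICAL metric, chosen only to illustrate sorting by a derived value.
    """
    return sum(1 for a, b in combinations(sorted(positions), 2) if b - a <= window)


def rank_genes(variants_by_gene, window=5):
    """Return gene names sorted from highest to lowest collocation score."""
    return sorted(
        variants_by_gene,
        key=lambda gene: collocation_score(variants_by_gene[gene], window),
        reverse=True,
    )


# Invented variant positions (residue indices) for two made-up genes.
VARIANTS_BY_GENE = {
    "GENE_SCATTERED": [10, 120, 340, 610],
    "GENE_CLUSTERED": [250, 251, 253, 254, 400],
}

ranking = rank_genes(VARIANTS_BY_GENE)
for gene in ranking:
    print(gene, collocation_score(VARIANTS_BY_GENE[gene]))
```

Sorting by such a score mirrors the contrast in the slides: a gene whose variants cluster together ranks above one whose variants are spread out.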

84 Variant tasks: Started with the main task, Discover genes. Shared tool with analysts. Identified two more tasks! –Patient compare –Debug pipeline

85 Task 2: Patient compare. Clinical setting application. Compare patient data to known harmful variants. The challenge: –Similarity is loosely understood rather than fully characterized –What constitutes a match requires visual inspection

86 Adapted Variant View with minimal changes

87 Navigate through patient data with list

88 Patient data emphasized with arrows

89 Patient has same harmful L to P mutation

90 These variants probably do not match
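A rough sketch of the Patient compare task follows, assuming variants are described by gene, protein position, and amino acid change. Because the slides stress that similarity is only loosely understood, the sketch does not decide matches on its own beyond exact equality; it only flags same-position candidates for visual inspection. The type `ProteinVariant`, the function `compare_to_known`, and all data are hypothetical.

```python
from typing import NamedTuple


class ProteinVariant(NamedTuple):
    gene: str
    position: int  # residue index in the protein
    ref_aa: str
    alt_aa: str


def compare_to_known(patient, known):
    """Label each patient variant relative to a collection of known harmful variants."""
    results = []
    for pv in patient:
        if pv in known:
            label = "exact match"
        elif any(k.gene == pv.gene and k.position == pv.position for k in known):
            label = "same position, different change: inspect"
        else:
            label = "no match"
        results.append((pv, label))
    return results


# Invented example data.
KNOWN_HARMFUL = {
    ProteinVariant("GENE_A", 42, "L", "P"),
    ProteinVariant("GENE_A", 77, "R", "C"),
}
PATIENT = [
    ProteinVariant("GENE_A", 42, "L", "P"),  # same L to P change as a known variant
    ProteinVariant("GENE_A", 77, "R", "H"),  # same residue, different substitution
    ProteinVariant("GENE_B", 5, "A", "V"),   # nothing known at this site
]

labels = [label for _, label in compare_to_known(PATIENT, KNOWN_HARMFUL)]
for variant, label in compare_to_known(PATIENT, KNOWN_HARMFUL):
    print(variant, "->", label)
```

The middle category is where a visual tool earns its place: an automated rule can surface the candidate, but judging whether it counts as a match is left to the analyst.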

91 Task 3: Debug pipeline. Debug data generation pipeline –Remove errors from data before analysis takes place. Analysts originally did not think they needed this support

92 Tool revealed errors in the data. "The tool exposed artifacts in the data that slid past at least two rounds of quality metric filtering … this type of problem would not have been caught by our previous, automated methods." - Analyst 3

93 Conclusions: Designed, implemented, and deployed tool for visual variant impact assessment. Originally designed for Discover Genes task –Adapted to two others with minimal changes. Methods: –What to show: filtering data scope –How to show it: carefully selected visual encodings. Goals: –Navigation-free main overview at gene level –Reveal genes of interest; accomplished by sorting by new, derived metrics

94 Acknowledgements. Our collaborators at the GSC: –Dr Aly Karsan –Rod Docking –Dr Linda Chang –Dr Gerben Duns –Simon Chang. Funding: –VIVA, AeroInfo Systems/Boeing Canada, MITACS

95 Questions? Joel Ferstay: Paper Page: VariantView/ Software Available as Open Source

96 Future work: Other use contexts. Scaling up beyond the current design target –~50 variants at once. Possibly integrate Variant View with MedSavant –Tool by Fiume et al., BioVis Posters 2012 –Focus on interactive filtering

