Hepatitis C Analysis of Sequence Data from ARUP and NCBI databases By Ian Odell.

1 Hepatitis C Analysis of Sequence Data from ARUP and NCBI databases By Ian Odell

2 What information can we get from ARUP sequencing data? Data is from January 2002 – July 2004. 5’ untranslated region (UTR) of types 1 – 6: Number of unique sequences by type. Frequency of unique sequences for each type. Frequency of each base in each type, seen in a position weight matrix. Regions of high and low variation, seen in graphs of a position weight matrix.

3 Unique Sequences by type:
HCV Type   Total Sequences   Unique Sequences   Unambiguous Unique Sequences
1                    16151               1320                            750
2                     2862                585                            373
3                     2430                404                            232
4                      284                 99                             68
5                        7                  5                              4
6                       44                 20                             17
total                21778               2434                           1444
***Ambiguous bases cause unique sequences to be overrepresented.
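
The three columns of the table can be illustrated with a minimal sketch. It is not the script used for the slides: the function name, the toy data, and the choice to treat only A, C, G, T and the gap character as unambiguous are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: only A, C, G, T and the alignment gap "-" count as unambiguous.
UNAMBIGUOUS = set("ACGT-")


def summarize_sequences(sequences):
    """Return total, unique, and unambiguous-unique sequence counts."""
    # Identical strings collapse into one key; the value is the copy number.
    counts = Counter(seq.upper() for seq in sequences)
    # Keep only unique sequences made entirely of unambiguous characters.
    unambiguous = [seq for seq in counts if set(seq) <= UNAMBIGUOUS]
    return {
        "total": sum(counts.values()),
        "unique": len(counts),
        "unambiguous_unique": len(unambiguous),
    }


# Hypothetical toy data, not ARUP sequences.
example = ["ACGT", "ACGT", "ACGN", "ACGA"]
summary = summarize_sequences(example)
```

In the toy data, "ACGN" is counted as its own unique sequence even though it may be an uncertain read of "ACGT". This is how ambiguous bases inflate the unique count.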

4 Frequency of unique sequences for type 1:

5

6 Frequency of unique sequences for type 2:

7

8 Frequency of unique sequences for type 3:

9

10 Frequency of unique sequences for type 4:

11

12 Frequency of unique sequences for type 5:

13 Frequency of unique sequences for type 6:

14

15 Conclusions 1. Each type has a ‘profile’ sequence. 2. Do the log vs. log graphs give us insight into the distribution of mutations within the Hepatitis C population? NEXT: Look for variation between and within types from the unique sequences that are highly represented in the population (i.e. those that have many duplicates). Open Profiles.
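
One way to put a number on a log vs. log graph is to fit a straight line to it. The sketch below assumes the graphs plot the copy number of each unique sequence against its rank. The slides do not state the axes, so this is only a guess, and the function names are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(sequences):
    """Return (rank, copy_number) pairs, most duplicated sequence first."""
    counts = sorted(Counter(sequences).values(), reverse=True)
    return list(enumerate(counts, start=1))


def loglog_slope(points):
    """Least-squares slope of log10(copy_number) against log10(rank)."""
    if len(points) < 2:
        raise ValueError("need at least two unique sequences")
    xs = [math.log10(rank) for rank, _ in points]
    ys = [math.log10(count) for _, count in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A roughly straight line on such a plot would point to a power-law-like distribution of duplicates. Whether that says anything about mutation within the population is the open question on the slide.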

16 Stuyver et al. 1996. “Second-generation line probe assay for hepatitis C virus genotyping.” J. Clin. Microbiol. 34:2259-2266. In R5, the six selected probes were used for types 1 (line 4), 3 (line 15), 4 and 10 (line 18), and 5 (line 20), as well as for subtypes 2a/2c (line 11), 2b (line 12), and 3b (line 18).

17 Weight Matrices From Profiles, we can see areas of variation between types and their conservation within each type. Next, we want to see what these look like for all sequences in each type.

18 Example Weight Matrix: first 10 base positions of Type 2 HCV
Base           1         2         3         4         5         6         7         8         9        10
A       0.000433  0.999752  0.000124  0.000186         0         0  0.999629         0         0  0.000124
C              0         0  6.20E-05         …         …         …         0         …         …         0
G              …         0  0.000124  0.999567  0.000495         0  6.20E-05  0.000186  6.20E-05  0.000186
T       0.999257  6.20E-05         0         0  0.998824  6.20E-05  0.000248  6.20E-05  0.999629  6.20E-05
DASH    0.000186  0.000124   0.99969  0.000124  6.20E-05  0.999876  6.20E-05   0.99969  0.000124  0.999629
AMBIG   6.20E-05         …         0         …  0.000557         0         0         0  0.000124         0
This allows us to see the variation within a type at each nucleotide.
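
A weight matrix with these six rows can be computed from an alignment by counting characters column by column. The following is a minimal sketch, not the original analysis code. It assumes all sequences are already aligned to the same length, that "-" marks a gap (DASH), and that any other non-ACGT character is an ambiguity code (AMBIG). The function name is hypothetical.

```python
ROWS = ("A", "C", "G", "T", "DASH", "AMBIG")


def weight_matrix(aligned):
    """Return {row: [frequency at each alignment position]}."""
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")

    counts = {row: [0] * length for row in ROWS}
    for seq in aligned:
        for i, base in enumerate(seq.upper()):
            if base in ("A", "C", "G", "T"):
                row = base
            elif base == "-":
                row = "DASH"
            else:
                # Assumption: everything else (N, R, Y, ...) is ambiguous.
                row = "AMBIG"
            counts[row][i] += 1

    # Divide by the number of sequences so each column sums to 1.
    n = len(aligned)
    return {row: [c / n for c in counts[row]] for row in ROWS}
```

Called on a list of aligned strings, it returns one list of frequencies per row, in the same row order as the table above.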

19 Graphical Type 1 Weight Matrix [ R5 ] Sum of all points at each x-value = 1. The y-value tells us the fraction of sequences in which each base is found at that index. We are looking for a region of conservation in all types; later we can look for variation between types.
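
The search for conserved regions can also be done numerically rather than by eye. The sketch below assumes a weight matrix stored as a dictionary of per-position frequency lists keyed by "A", "C", "G" and "T". The threshold, the minimum run length, and the function name are illustrative choices, not values from the slides.

```python
def conserved_runs(matrix, threshold=0.99, min_length=3):
    """Return (start, end) index pairs (0-based, inclusive) of conserved runs.

    A position counts as conserved when its most common base reaches
    `threshold`; runs shorter than `min_length` are dropped.
    """
    bases = ("A", "C", "G", "T")
    length = len(matrix["A"])
    conserved = [
        max(matrix[b][i] for b in bases) >= threshold for i in range(length)
    ]

    runs = []
    start = None
    # The trailing False closes a run that reaches the last position.
    for i, ok in enumerate(conserved + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    return runs
```

Running this on each type's matrix and intersecting the resulting runs would give candidate regions that are conserved across all types.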

20 Graphical Type 2 Weight Matrix

21 Graphical Type 3 Weight Matrix

22 Graphical Type 4 Weight Matrix

23 Graphical Type 5 Weight Matrix

24 Graphical Type 6 Weight Matrix

25 What information can we get from NCBI data? Look at complete HCV genome publications, because BLASTing 5’ UTR primers biases towards what those primers amplify (i.e. BLAST returns the most similar hits and we want to look for variation). Are there mismatches under the ARUP primers? → Do ARUP primers bias the sequence data by not amplifying a certain group? Regions of low and high variation in the complete genome. Compare to 5’ UTR. → Alignment not good enough for an accurate analysis.

26 Graphical Weight Matrix of ARUP (5’ UTR) Amplicon [ Rev Primer ] [ For Primer ] Data is from 239 aligned complete HCV genomes downloaded from GenBank.

27 Graphical Weight Matrix: ARUP forward primer region in BLAST complete genome alignment. SNPs and insertions under ARUP Forward Primer. Figure labels: 1532 Ins17 SNPs / 239 Sequences.

28 Graphical Weight Matrix: ARUP reverse primer in BLAST complete genome alignment. SNPs and insertions under ARUP Reverse Primer. Figure labels: 3 SNPs / 239 Sequences.
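
Counting mismatches under a primer can be sketched as a column-by-column comparison against the alignment. The code below is an illustration under stated assumptions, not the method behind the slides. It assumes the primer is written in the same orientation as the alignment (a reverse primer would first need to be reverse-complemented), that its start column in the alignment is known, and that a gap counts as a mismatch. Insertions that shift the alignment are not handled. The function name and toy data are hypothetical.

```python
def primer_mismatches(aligned, primer, start):
    """Count, for each primer position, the sequences that differ from it.

    `start` is the 0-based alignment column where the primer begins.
    """
    primer = primer.upper()
    end = start + len(primer)
    mismatches = [0] * len(primer)
    for seq in aligned:
        if len(seq) < end:
            raise ValueError("sequence too short for primer window")
        window = seq[start:end].upper()
        for i, (expected, observed) in enumerate(zip(primer, window)):
            # Assumption: gaps and ambiguity codes count as mismatches.
            if observed != expected:
                mismatches[i] += 1
    return mismatches


# Hypothetical toy alignment, not GenBank data.
toy = ["GGACGTCC", "GGACTTCC", "GGA-GTCC"]
per_position = primer_mismatches(toy, "ACGT", start=2)
```

Dividing each count by the number of sequences gives the per-position mismatch rate, comparable in form to the "SNPs / 239 Sequences" labels on these slides.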

