Considerations for Analyzing Targeted NGS Data HLA

1 Considerations for Analyzing Targeted NGS Data HLA
Tim Hague, CTO

2 Introduction
Human leukocyte antigen (HLA) is the major histocompatibility complex (MHC) in humans.
It is a group of genes (a 'superregion') on chromosome 6.
It essentially encodes cell-surface antigen-presenting proteins.

3 Functions
HLA genes have functions in:
combating infectious diseases
graft/transplant rejection
autoimmunity
cancer

4 Alleles
Large number of alleles (and proteins).
Many alleles are already known.
The number of known alleles is increasing.

5 HLA Class I
Gene: A, B, C
Rows: Alleles, Proteins
HLA Class II
Gene     DRA  DRB*  DQA1  DQB1  DPA1  DPB1
Alleles  7    1260  47    176   34    155
Rows: Alleles, Proteins
HLA Class II - DRB Alleles
Gene: DRB1, DRB3, DRB4, DRB5
Rows: Alleles, Proteins

6 Analysis Challenges
HLA genes have specific analysis challenges regardless of the sequencing technology.

7 High Polymorphism
High rate of polymorphism: up to 100 times the average human mutation rate.
The HLA-DRB1 and HLA-B loci have the highest sequence variation rate within the human genome.
High degree of heterozygosity: homozygotes are the exception in this region.

9 Duplications
High level of segmental duplications.
Lots of similar genes and lots of very similar pseudogenes.
Duplicated segments can be more similar to each other within an individual than they are to the corresponding segments of the reference genome.

11 Complex Genetics
Particularly HLA-DRB*.
The DR β-chain is encoded by 4 loci; however, no more than 3 functional loci are present in a single individual, and at most 2 per chromosome.

13 Mitigating Factors
It's not all bad news:
Many HLA alleles are already well known, both in terms of sequence and of frequencies within the population.
The HLA region is fairly small, so there is a high degree of linkage disequilibrium and therefore many known haplotypes.

14 Traditional Typing
SSO: low resolution, high throughput, cheap
SSP: very fast results, low resolution
SBT: sequence-based typing, high resolution, usually done by Sanger sequencing

15 NGS Typing
High resolution, an alternative to Sanger-based SBT.
Why is it needed?

16 Sanger and HLA
Sanger data is still the gold standard in the genomic sequencing industry, even though it is very expensive compared to NGS.
The base error rate is 1 in 1,000; if both forward and reverse typing are done, the error rate drops to 1 in 1,000,000.
So why is it bad for HLA?
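
The drop from 1 in 1,000 to 1 in 1,000,000 is what you get if the forward and reverse reads fail independently, so that a base is miscalled only when both strands are wrong at the same position. A minimal sketch of that arithmetic follows; the independence assumption and the function name are ours, not stated on the slide.

```python
from fractions import Fraction

def combined_error_rate(forward_error: Fraction, reverse_error: Fraction) -> Fraction:
    """Chance that both strands are wrong at the same base.

    Assumes forward and reverse errors are independent (our assumption).
    """
    return forward_error * reverse_error

single_strand = Fraction(1, 1_000)   # per-base error rate quoted on the slide
both_strands = combined_error_rate(single_strand, single_strand)
print(f"1 in {both_strands.denominator:,}")
```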

17 Phase Resolution
2x chromosome 6
Many loci, many alleles
Lots of heterozygosity

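18 Allele Phasing problem
[Figure: a reference sequence and a consensus sequence with two heterozygous positions, T / A and G / T. The two possible assignments of these bases to Allele 1 and Allele 2 are shown side by side, separated by "OR???".]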
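
To make the ambiguity concrete, the sketch below enumerates every allele pair that is consistent with a set of unphased heterozygous calls. The two sites (T / A and G / T) are taken from the slide; the function name and the representation of each allele as a string of its bases at the heterozygous sites are illustrative assumptions.

```python
from itertools import product

def possible_phasings(het_sites):
    """List the distinct allele pairs consistent with unphased calls.

    het_sites: one (base_a, base_b) tuple per heterozygous position.
    Each allele is written as the string of its bases at those positions.
    """
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        allele1 = "".join(site[c] for site, c in zip(het_sites, choice))
        allele2 = "".join(site[1 - c] for site, c in zip(het_sites, choice))
        # The pair is unordered: (x, y) and (y, x) are the same genotype.
        pairs.add(tuple(sorted((allele1, allele2))))
    return sorted(pairs)

for pair in possible_phasings([("T", "A"), ("G", "T")]):
    print(pair)
```

With n heterozygous sites there are 2^(n-1) candidate pairs, which is why ambiguity grows quickly in a highly heterozygous region.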

19 The Problem with Sanger
There is only one signal.
High degree of heterozygosity = high degree of ambiguity.
Requires statistical techniques based on known allele frequencies, plus manual intervention by trained operators.
Ambiguity can only be resolved statistically, which can lead to wrong assignment for rare types.

21 Number of potential alleles

22 NGS Advantages
Can reduce ambiguity.
Phase resolution: two signals, but lots of short reads.
Cheaper and faster than Sanger.
Less manual intervention required.
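
One way short reads provide the second signal: a read that spans two heterozygous positions shows which bases sit on the same chromosome. The sketch below counts that support. It assumes ungapped alignments given as (0-based start, sequence) pairs, and the reads and positions are made-up toy data.

```python
from collections import Counter

def phase_support(reads, pos1, pos2):
    """Count base combinations seen on reads that cover both positions.

    reads: list of (start, sequence) with 0-based reference start,
    assumed to align without insertions or deletions.
    """
    support = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if start <= pos1 < end and start <= pos2 < end:
            support[(seq[pos1 - start], seq[pos2 - start])] += 1
    return support

toy_reads = [
    (0, "ACTCCG"),  # covers both sites: T ... G
    (1, "CTCCGA"),  # covers both sites: T ... G
    (0, "ACACCT"),  # covers both sites: A ... T
    (2, "ACCTA"),   # covers both sites: A ... T
    (3, "CCGA"),    # covers only the second site, so it is ignored
]
support = phase_support(toy_reads, pos1=2, pos2=5)
print(support)
```

Here the reads support T with G and A with T, and give no support to the alternative phasing. Sites further apart than a read (or read pair) cannot be linked this way.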

23 NGS Data - Unphased

24 NGS Data - Phased

25 NGS Approaches
HLA*IMP: chip-based imputation engine.
Reference-based alignment, followed by an HLA call based on the variants detected during alignment.
Search against a database of known alleles.

26 NGS Reference-based
Fraught with difficulties.
Very hard to align reads to this region.
The variant/HLA call is only as good as the alignment.
No coverage = no call.
Has been attempted by Broad Institute (HLA Caller) and Roche.
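
"No coverage = no call" can be checked directly from alignment coordinates. The sketch below reports the stretches of a target region that no read covers. The half-open (start, end) interval format and the coordinates are illustrative assumptions; in practice this would be read from the aligner's output.

```python
def uncovered_intervals(region_start, region_end, alignments):
    """Return half-open intervals inside the region with zero read depth.

    alignments: list of (start, end) half-open reference intervals.
    """
    depth = [0] * (region_end - region_start)
    for start, end in alignments:
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1

    gaps, gap_start = [], None
    for offset, d in enumerate(depth):
        if d == 0 and gap_start is None:
            gap_start = region_start + offset          # a gap opens here
        elif d > 0 and gap_start is not None:
            gaps.append((gap_start, region_start + offset))
            gap_start = None
    if gap_start is not None:                          # gap runs to region end
        gaps.append((gap_start, region_end))
    return gaps

# Toy exon spanning [100, 120) with three aligned reads.
print(uncovered_intervals(100, 120, [(95, 105), (103, 110), (115, 130)]))
```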

27 Alignment Efforts
RainDance provides a targeted HLA amplification kit called HLAseq.
Target: the whole MHC superregion (except for some tandem repeat regions).
Goal: align this data before doing the variant/HLA call.

28 Diverse variant “density” in the MHC superregion
Based on a single sample

29 Default BWA alignment – No coverage at an exon of HLA-DMB

30 Low coverage and orphaned reads at an HLA-DRB1 exon

31 BWA vs more permissive alignment: higher coverage = higher noise

32 Large targeted region without usable coverage

33 NGS Reference-based
Not providing enough coverage everywhere.
What about de novo?

34 De novo assembly (MIRA)
287 contigs (longest contig: 2199 bp)
Mean contig size: 268 bp
Median contig size: 209 bp
Total consensus: bp
RainDance target: ~ bp
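
Summary figures like these can be derived from the list of contig lengths an assembler reports. A minimal sketch follows, using made-up lengths rather than the MIRA output behind the slide; the function name and the dictionary keys are our own.

```python
from statistics import mean, median

def contig_summary(lengths):
    """Summarise an assembly from its contig lengths (in bp)."""
    return {
        "contigs": len(lengths),
        "longest": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "total": sum(lengths),   # total consensus length
    }

toy_lengths = [100, 200, 300, 1000]   # hypothetical contig lengths
summary = contig_summary(toy_lengths)
print(summary)
```

Comparing the total consensus length with the size of the targeted region gives a quick upper bound on how much of the target the assembly could cover.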

35 De novo assembly (MIRA)

36 NGS De Novo Alignment
Not enough contigs produced, not enough coverage of the target region.
What about a hybrid approach?

37 De novo assembly with “backbone”
First, alignment to the backbone, then de novo assembly.
Backbone: 2220 contigs from hg19 chr 6 (sum: bps) → almost the whole RainDance target
Results:
Max reads / backbone contig: 197
Max coverage: 71

38 De novo assembly with “backbone”

39 NGS Typing - Alignment Based
We tried:
Burrows-Wheeler aligner
A more sensitive seed-and-extend aligner
De novo aligner
'Hybrid' de novo aligner
The variant/HLA call is only as good as the alignment.
The alignments were not good enough.

40 NGS Database Based
Search against a 'database' of known alleles, such as the IMGT/HLA database, available from the EBI web site.
Stanford, Connexio, JSI Medical, BC Cancer Agency and Omixon have all tried this approach.
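
As a rough illustration of the database idea, the sketch below scores every pair of known alleles by how many reads it explains and returns the best pair. This is a toy, not any of the named tools' methods: the allele names and sequences are invented, reads must match exactly, and real implementations work from IMGT/HLA sequences and tolerate sequencing errors.

```python
from itertools import combinations_with_replacement

def best_allele_pair(reads, alleles):
    """Pick the allele pair that explains the most reads.

    alleles: dict mapping allele name -> sequence.
    A read counts as explained if it is an exact substring of either allele.
    Pairing an allele with itself models a homozygous sample.
    """
    def explained(pair):
        return sum(any(read in alleles[name] for name in pair) for read in reads)

    pairs = combinations_with_replacement(sorted(alleles), 2)
    return max(((pair, explained(pair)) for pair in pairs), key=lambda item: item[1])

toy_alleles = {                     # hypothetical database entries
    "toy_01": "ACGTACGTAA",
    "toy_02": "ACGTTCGTAA",
    "toy_03": "ACGTACGGAA",
}
toy_reads = ["CGTAC", "GTACG", "TACGT", "GTTCG", "TTCGT", "CGTAA"]
result = best_allele_pair(toy_reads, toy_alleles)
print(result)
```

Because only known sequences are compared, a sample carrying an allele missing from the database can at best be assigned its closest known relative, which is the novel allele difficulty.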

42 DB Based Approach
Advantages:
Fewer mapping headaches
Unambiguous results
Potential to be fast
Difficulties:
Novel allele detection
Homozygous alleles

45 Results with Exome data

46 Exon level detail

47 Detailed results - short read pileup

48 Conclusions
The DB based approach to HLA typing is new but very promising.
NGS approaches can resolve much of the ambiguity of Sanger SBT.
The DB based approach can also overcome the limitations of NGS reference-based alignment.

49 Conclusions
Available DB based HLA typing tools differ in:
Speed
Sequencers supported
Types of sequencing data supported (targeted, exome, whole genome)
Ease of use
Ambiguity of results
Degree of manual intervention required
Novel allele detection capabilities
