Considerations for Analyzing Targeted NGS Data HLA Tim Hague, CTO.

1 Considerations for Analyzing Targeted NGS Data HLA Tim Hague, CTO

2 Introduction • Human leukocyte antigen (HLA) is the major histocompatibility complex (MHC) in humans. • Group of genes ('superregion') on chromosome 6. • Essentially encodes cell-surface antigen-presenting proteins.

3 Functions HLA genes have functions in: • combating infectious diseases • graft/transplant rejection • autoimmunity • cancer

4 Alleles • Large number of alleles (and proteins). • Many alleles are already known. The number of known alleles is increasing.

5 HLA Class I genes: A, B, C. HLA Class II genes: DRA, DRB*, DQA1, DQB1, DPA1, DPB1. HLA Class II DRB genes: DRB1, DRB3, DRB4, DRB5. [Table of allele and protein counts per gene; the values are not preserved in this transcript.]

6 Analysis Challenges HLA genes have specific analysis challenges regardless of the sequencing technology.

7 High Polymorphism High rate of polymorphism – up to 100 times the average human mutation rate. • The HLA-DRB1 and HLA-B loci have the highest sequence variation rate within the human genome. • High degree of heterozygosity – homozygotes are the exception in this region.


9 Duplications • High level of segmental duplications. • Lots of similar genes and lots of very similar pseudogenes. • Duplicated segments can be more similar to each other within an individual than they are to the corresponding segments of the reference genome.


11 Complex Genetics • Particularly HLA-DRB* • The DR β-chain is encoded by 4 loci; however, no more than 3 functional loci are present in a single individual, and a maximum of 2 per chromosome.


13 Mitigating Factors It's not all bad news: • Many HLA alleles are already well known – both in terms of sequence and frequencies within the population. • The HLA region is fairly small, so there is a high degree of linkage disequilibrium, and therefore lots of known haplotypes.

14 Traditional Typing • SSO – low resolution, high throughput, cheap • SSP – very fast results, low resolution • SBT – sequence-based typing, high resolution, usually done by Sanger sequencing.

15 NGS Typing High resolution, an alternative to Sanger-based SBT. Why is it needed?

16 Sanger and HLA • Sanger data is still the gold standard in the genomic sequencing industry, even though it is very expensive compared to NGS. • The error rate is 1 in 1,000 bases; if both forward and reverse typing are done, the error rate drops to 1 in 1,000,000. So why is it bad for HLA?
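The drop from 1 in 1,000 to 1 in 1,000,000 is what you get if forward and reverse reads err independently. That independence is an assumption made here for illustration; the slide does not state it. A minimal sketch of the arithmetic:

```python
from fractions import Fraction

# Per-base error rate for a single Sanger read, as quoted on the slide.
single_read_error = Fraction(1, 1000)

# Assumption (not from the slide): forward and reverse reads make errors
# independently, so a position is wrong in both reads with probability p * p.
both_reads_error = single_read_error * single_read_error

print(both_reads_error)  # 1/1000000
```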

17 Phase Resolution • 2x chromosome 6 • Many loci, many alleles • Lots of heterozygosity

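18 Allele Phasing problem [Figure: a reference sequence and a consensus sequence with two heterozygous positions, T/A and G/T. The variant bases can be assigned to Allele 1 and Allele 2 in more than one way, and the consensus alone does not show which.]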

19 The Problem with Sanger • There is only one signal • High degree of heterozygosity = high degree of ambiguity • Requires statistical techniques based on known allele frequencies, plus manual intervention by trained operators • Ambiguity can only be resolved statistically, which can lead to wrong assignment for rare types
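To see why heterozygosity turns into ambiguity, the sketch below enumerates every allele pair consistent with a single unphased consensus. The function name and the three-base toy consensus are hypothetical, chosen only for illustration.

```python
from itertools import product

def possible_allele_pairs(consensus):
    """Enumerate unordered allele pairs consistent with an unphased consensus.

    `consensus` is a list of sites; each site is a 1-tuple (homozygous)
    or a 2-tuple (heterozygous) of bases.
    """
    pairs = set()
    for choice in product(*[range(len(site)) for site in consensus]):
        allele1 = "".join(site[i] for site, i in zip(consensus, choice))
        # The other chromosome carries the remaining base at each site.
        allele2 = "".join(site[len(site) - 1 - i] for site, i in zip(consensus, choice))
        pairs.add(frozenset((allele1, allele2)))
    return pairs

# Toy consensus: heterozygous T/A, homozygous C, heterozygous G/T.
toy_consensus = [("T", "A"), ("C",), ("G", "T")]
toy_pairs = possible_allele_pairs(toy_consensus)
for pair in sorted(sorted(p) for p in toy_pairs):
    print(pair)
```

With n heterozygous sites there are 2^(n-1) candidate pairs, so the number of possibilities doubles with each additional heterozygous position.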



22 NGS Advantages • Can reduce ambiguity • Phase resolution - two signals, but lots of short reads • Cheaper and faster than Sanger • Less manual intervention required
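A short read that spans two heterozygous positions shows which bases sit together on one chromosome, which is how NGS can resolve phase. The following is a minimal sketch that assumes gapless alignments given as (start, sequence) tuples; the function name and toy reads are hypothetical.

```python
from collections import Counter

def phase_from_reads(reads, pos_a, pos_b):
    """Count base combinations observed at two sites on the same read.

    `reads` is a list of (start, sequence) tuples in reference coordinates.
    Assumes gapless alignment, a simplification for illustration only.
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)
        # Only reads covering both positions carry phase information.
        if start <= pos_a < end and start <= pos_b < end:
            counts[(seq[pos_a - start], seq[pos_b - start])] += 1
    return counts

toy_reads = [(0, "TCG"), (0, "TCG"), (0, "ACT"), (1, "CT"), (0, "AC")]
phase_counts = phase_from_reads(toy_reads, pos_a=0, pos_b=2)
print(phase_counts)
```

Reads too short to cover both sites (the last two toy reads) contribute nothing, which is why "lots of short reads" is listed as a caveat.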

23 NGS Data - Unphased

24 NGS Data - Phased

25 NGS Approaches • HLA*IMP – chip-based imputation engine • Reference-based alignment, followed by an HLA call based on the variants detected during alignment • Search against database of known alleles

26 NGS Reference-based • Fraught with difficulties • Very hard to align reads to this region • The variant/HLA call is only as good as the alignment • No coverage = no call. Has been attempted by the Broad Institute (HLA Caller) and Roche.
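"No coverage = no call" can be checked directly: any target position with zero aligned reads cannot receive a variant or HLA call. Below is a minimal sketch that assumes alignments are given as half-open (start, end) intervals; the function name and coordinates are hypothetical.

```python
def uncovered_positions(region_start, region_end, alignments):
    """Return positions in [region_start, region_end) with zero read depth.

    `alignments` is a list of half-open (start, end) intervals.
    """
    depth = [0] * (region_end - region_start)
    for start, end in alignments:
        # Clip each alignment to the target region before counting.
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1
    return [region_start + i for i, d in enumerate(depth) if d == 0]

no_call = uncovered_positions(100, 110, [(95, 104), (102, 107)])
print(no_call)  # [107, 108, 109]
```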

27 Alignment Efforts RainDance provide a targeted HLA amplification kit called HLAseq. Target: the whole MHC superregion (except for some tandem repeat regions). Goal: align this data, before doing variant/HLA call.

28 Diverse variant “density” in the MHC superregion Based on a single sample

29 Default BWA alignment – No coverage at an exon of HLA-DMB

30 Low coverage and orphaned reads at an HLA-DRB1 exon

31 BWA vs more permissive alignment: higher coverage = higher noise

32 Large targeted region without usable coverage

33 NGS Reference-based Not providing enough coverage everywhere What about de novo?

34 De novo assembly (MIRA) 287 contigs (longest contig: 2199 bp) Mean contig size: 268 bp Median contig size: 209 bp Total consensus: bp RainDance target: ~ bp

35 De novo assembly (MIRA)

36 NGS De Novo Alignment Not enough contigs produced, not enough coverage of the target region. What about a hybrid approach?

37 De novo assembly with “backbone” First, alignment to backbone, then de novo assembly Backbone: 2220 contigs from HG19 chr 6 (sum: bps) → almost whole RainDance target Results: Max reads / backbone contig: 197 Max coverage: 71

38 De novo assembly with “backbone”

39 NGS Typing - Alignment Based We tried: • Burrows-Wheeler aligner • More sensitive, seed-and-extend aligner • De novo aligner • 'Hybrid' de novo aligner • The variant/HLA call is only as good as the alignment • The alignments were not good enough

40 NGS Database Based • Search against 'database' of known alleles • Such as the IMGT/HLA database, available from the EBI web site. Stanford, Connexio, JSI Medical, BC Cancer Agency and Omixon have all tried this approach.
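The database search idea can be illustrated with a toy scorer that picks the allele pair explaining the most reads. This is a sketch under simplifying assumptions (exact substring matching, made-up sequences and allele names) and does not represent how any of the tools named above work.

```python
from itertools import combinations_with_replacement

def best_allele_pair(reads, known_alleles):
    """Return the allele pair whose sequences contain the most reads.

    `known_alleles` maps allele name to sequence. Matching is by exact
    substring, a simplification for illustration only.
    """
    def explained(pair):
        return sum(any(read in known_alleles[name] for name in pair) for read in reads)

    # combinations_with_replacement also yields homozygous pairs (a, a).
    candidates = combinations_with_replacement(sorted(known_alleles), 2)
    return max(candidates, key=explained)

# Hypothetical toy database and reads, not real HLA sequences.
toy_db = {"toy_1": "ACGTACGTTA", "toy_2": "ACGTTCGTTA", "toy_3": "ACGAACGTCA"}
toy_sample = ["GTACG", "CGTTA", "GTTCG", "ACGT"]
typing = best_allele_pair(toy_sample, toy_db)
print(typing)  # ('toy_1', 'toy_2')
```

Note that a homozygous sample would tie with any pair containing the true allele under this scoring, and a read from an allele absent from the database matches nothing, so even the toy version runs into homozygous calls and novel alleles as hard cases.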


42 DB Based Approach Advantages • Fewer mapping headaches • Unambiguous results • Potential to be fast Difficulties • Novel allele detection • Homozygous alleles



45 Results with Exome data

46 Exon level detail

47 Detailed results - short read pileup

48 Conclusions • DB based approach to HLA typing is new but very promising • NGS approaches can resolve much of the ambiguity of Sanger SBT • DB based approach can also overcome the limitations of NGS reference-based alignment

49 Conclusions Available DB based HLA typing tools differ in: • Speed • Sequencers supported • Types of sequencing data supported (targeted, exome, whole genome) • Ease of use • Ambiguity of results • Degree of manual intervention required • Novel allele detection capabilities
