Student: Tyler J. Daddio (CSE, Mathematics)

Pairing T cell receptor α and β sequences using combinatorial pooling and min-cost flows
Student: Tyler J. Daddio (CSE, Mathematics) Advisor: Dr. Ion Măndoiu (CSE) Good morning everyone! My name is Tyler Daddio. I am pursuing a dual degree in CSE and Mathematics. My advisor is Dr. Ion Măndoiu, who is a professor in the CSE department and a great teacher. And this is my project. Don’t worry if you don’t know many of those words; we’ll hopefully get to all of them.

The αβ T Cell And this little cell is the core of my project: the alpha beta T cell. T cells are a type of white blood cell; they’re the ones that seek out and eliminate infected cells.

The αβ T Cell They do this using these marvelous protein receptors on their surface, aptly named T cell receptors. These receptors are composed of two primary parts: a variable region and a constant region. This variable region is also composed of two parts: an alpha segment and a beta segment. These are the primary elements that characterize the receptor, so they are what I focused on. So how are these receptors formed and how do they work?

How are TCRs formed? In biological terms, both the alpha and the beta are formed by a type of genetic recombination. For the rest of us, this means they are both formed by selecting genes from a few different groups and putting them together.

How are TCRs formed? ~10^15 possible unique TCRs!
It’s estimated this process can form about a quadrillion (10^15) unique TCRs. That is about 30 times the number of cells in your entire body, so obviously you only have a small portion of these possible TCRs at once. And this is reduced even more because a lot of these would recognize your own cells as dangerous.

How are T cells selected for/against?
So to ensure these cells don’t recognize your own cells as dangerous, they go through a process of maturation. This essentially just involves testing the receptor against your own body cells. If they recognize your body cells too well or they don’t recognize them at all, they are eliminated. And if they’re right in the middle, they’re perfect. T cells that pass this are then released into the body to do their job. *adapted from

How do T cells work? So once in the body, how do they work?
Cells in your body, colored purple here, create proteins which eventually wear out. When they do, they are broken down into smaller pieces called peptides. These peptides then combine with a larger molecule called MHC class I. This peptide-MHC complex then gets presented on the surface of cells.

How do T cells work? And when a T cell happens by, its receptor brushes up against this complex and interacts with it. Here you can see sort of how this works. So how exactly does this all come together to create an immune response?

What do T cells do? Let’s say this body cell is infected with a virus and this T cell recognizes this virus.

What do T cells do? When the T cell brushes up against this cell, its receptor interacts with the peptide-MHC complex.

What do T cells do? The T cell will then clonally expand, creating many copies of itself with the same receptor.

What do T cells do? The T cell will then release toxins and kill the target cell. So why is this useful?

Why is this useful? Imagine this is a cancer cell. Well, imagine if this were a cancer cell instead. Cancer is caused by mutations in the cell’s genome, so it follows that there may be peptides that aren’t recognized as ‘self’ as a consequence. If a T cell has the right receptor, it should then be able to mount an immune response against the tumor. Knowing what T cells a person has could help us in an effort to target cancer cells.

Sequencing the T cell repertoire
So understanding TCRs starts with sequencing them, which lets us figure out the exact DNA sequence coding for them. Sequencing has become incredibly cheap and fast over the last two decades, and it can be used to identify the alphas and betas individually. However, the alpha and beta function as a pair, and sequencing them individually doesn’t tell us which alpha goes with which beta. This finally leads us to the question my project investigated.

The Pairing Problem

The Pairing Problem There are two types of methods we could use: physical and computational. Physical methods require modification of the fast and cheap sequencing techniques widely available already. Computational methods allow us to take advantage of this sequencing.

The Pairing Problem As I am a CS student, I am much more partial to computational methods. This is what I did for my project. So what did I do exactly?

Experimental Design T cells are first retrieved from the patient.

Experimental Design They are then distributed uniformly across 96 wells. Each well is then sequenced to get a list of alphas and betas in each well.

Experimental Design Suppose we focus only on a single TCR.
It may be found in a multitude of wells. However, the alpha and beta may not always appear together due to sequencing errors. How often they do appear together is captured by the recovery rate. So a recovery rate of 1 means that the alpha and beta are always found together; 0 would mean never. The true rate is somewhere in the middle, more towards 1.

Experimental Design So let’s first consider the betas.
We number the wells so we can identify them. And this beta happens to appear in these six wells. We call this a well set.

Experimental Design We then abstract this well set by using a circle to represent it. These are called nodes. And we do this for every unique beta sequence in all 96 wells. This gives us some number of beta nodes.

Experimental Design And we again do this but for the alphas.
In this example, the alpha that actually pairs with the previous beta appears in only 5 wells instead of 6. Again, this is due to the recovery rate not being 100%.

Experimental Design And we do the same thing with the remainder of the unique alphas. This gives us a set of alpha nodes representing the distinct well sets alpha sequences appear in, and the same with betas. But now we actually want to match these nodes together. Doing so will tell us which alpha and beta sequences are most likely a pair. Logically, it would make sense to pair alpha and beta sequences that appear in the most wells together.

Arcs and Weights We can do this by using the Hamming distance.
This distance metric simply counts the number of mismatches between the two well sets, that is, the wells where one sequence appears and the other doesn’t. So here we see the alpha appears in all the same wells as the beta, but not in 58. So this has a Hamming distance of 1.

Arcs and Weights And the same idea with these other possible sequences. This alpha and this beta have a Hamming distance of 5 because there are 5 wells in which one of them appears without the other.

Adding the Arcs We take these pairwise Hamming distances and then add these lines between the respective nodes. We call these lines arcs. And the number on each arc is referred to as the cost.
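As a rough illustration of the cost on each arc, here is a minimal Python sketch that treats a well set as a set of well numbers and counts the wells where exactly one of the two sequences appears. The function name and the specific well numbers are made up for the example; only the "same wells except 58" situation comes from the talk.

```python
def hamming_distance(wells_a, wells_b):
    """Count the wells in which exactly one of the two sequences appears.

    This equals the Hamming distance between the two 96-entry
    present/absent vectors that the well sets stand for.
    """
    return len(set(wells_a) ^ set(wells_b))


# Hypothetical well sets: the alpha shares every well with the beta except 58.
beta_wells = {3, 17, 42, 58, 71, 90}
alpha_wells = {3, 17, 42, 71, 90}

print(hamming_distance(alpha_wells, beta_wells))  # 1
```

Using a symmetric difference keeps the sketch short; a bit-vector representation would be an equally reasonable choice.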

Adding Arcs and Finding a Matching
We could then do this for all pairs of alphas and betas. We would then want to find a match for each alpha and each beta so that the total cost (the sum over all used arcs) is minimized. This would give us the optimal pairings of alphas with betas. However, we would run into problems if we just added arcs for every pair of alpha and beta nodes. For 10^6 nodes on each side, this would mean a trillion arcs. That would require way too much time and way too many resources to do in practice. We’ll come back to this though.

Min-Cost Max-Flow So we can find a matching the previous way, but there is a very well-studied group of problems in CS called Network Flows. These have very efficient solvers available. We just have to build the network correctly, and it’ll give us the matching. They work by putting a certain amount of supply at this new node s.

Min-Cost Max-Flow Then you push as much flow as possible through the network to t. Each arc between an alpha and a beta has a cost (the Hamming distance) that is paid for the flow sent over it. Among all the ways of pushing the maximum flow, we then pick the one with the minimum total cost. This will give us our matching. But, again, using the complete graph is infeasible on modern machines. We need a way to reduce the number of arcs in the network without compromising the quality of the matching.
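To make the network idea concrete, here is a small pure-Python sketch of min-cost max-flow on a bipartite network, using successive shortest paths with Bellman-Ford. It is not the solver used in the project. The unit capacities (each alpha and each beta matched at most once), the node numbering, and the function name are all assumptions made for illustration.

```python
def min_cost_matching(costs, n_alpha, n_beta):
    """costs: {(alpha_index, beta_index): cost} for the arcs that exist.

    Returns (sorted list of matched (alpha, beta) pairs, total cost).
    Assumption: every arc has capacity 1.
    """
    s, t = 0, n_alpha + n_beta + 1          # alphas: 1..n_alpha, betas follow
    n = t + 1
    graph = [[] for _ in range(n)]          # edge = [to, capacity, cost, reverse index]

    def add_arc(u, v, cost):
        graph[u].append([v, 1, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for a in range(n_alpha):
        add_arc(s, 1 + a, 0)
    for b in range(n_beta):
        add_arc(1 + n_alpha + b, t, 0)
    for (a, b), c in costs.items():
        add_arc(1 + a, 1 + n_alpha + b, c)

    total = 0
    while True:
        # Bellman-Ford: cheapest s-t path in the residual network.
        dist = [float("inf")] * n
        prev = [None] * n
        dist[s] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        changed = True
            if not changed:
                break
        if prev[t] is None:                  # no more flow can be pushed
            break
        v = t
        while v != s:                        # push one unit along the path
            u, i = prev[v]
            graph[u][i][1] -= 1
            graph[v][graph[u][i][3]][1] += 1
            v = u
        total += dist[t]

    pairs = []
    for a in range(n_alpha):
        for v, cap, _, _ in graph[1 + a]:
            if n_alpha < v < t and cap == 0:  # saturated alpha-to-beta arc
                pairs.append((a, v - 1 - n_alpha))
    return sorted(pairs), total


# Greedily taking the cost-0 arc first would force the cost-5 arc afterwards.
example_costs = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 5}
print(min_cost_matching(example_costs, 2, 2))  # ([(0, 1), (1, 0)], 2)
```

The example shows why a flow solver is worth the trouble: picking the cheapest arc first gives a total cost of 5, while the flow finds the matching with total cost 2.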

Relative Radius Sparsification
Using this method of sparsification, arcs are only added between nodes within a Hamming distance of r* from each other. r* is just the size of the larger of the two well sets times some fraction r, which we specify. If the alpha sequence appears in 10 wells and the beta sequence appears in 15 wells, then r* would be 15 times r.
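The rule above can be sketched in a few lines of Python. The function name, the dictionary layout, and the choice of "less than or equal to r*" as the cutoff are assumptions for illustration. The all-pairs loop is only there to show the rule; at the scale discussed in the talk it would defeat the purpose, so a real pipeline would need a faster way to find nearby well sets.

```python
def relative_radius_arcs(alpha_wells, beta_wells, r):
    """Keep an arc only if its Hamming distance is within r* = r * max(|A|, |B|).

    alpha_wells / beta_wells: {sequence_id: set of well numbers}.
    Returns {(alpha_id, beta_id): hamming_distance} for the kept arcs.
    """
    arcs = {}
    for a_id, a in alpha_wells.items():
        for b_id, b in beta_wells.items():
            distance = len(a ^ b)                 # wells where only one appears
            r_star = r * max(len(a), len(b))      # relative radius for this pair
            if distance <= r_star:
                arcs[(a_id, b_id)] = distance
    return arcs


# Hypothetical data: an alpha in 10 wells, a beta in 15 wells (so r* = 15 * r),
# plus an unrelated beta that shares no wells with the alpha.
alphas = {"a1": set(range(1, 11))}
betas = {"b1": set(range(1, 16)), "b2": set(range(50, 60))}

print(relative_radius_arcs(alphas, betas, 0.5))  # {('a1', 'b1'): 5}
print(relative_radius_arcs(alphas, betas, 0.2))  # {}
```

With r = 0.5 the radius for a1 and b1 is 7.5, so their distance of 5 keeps the arc; with r = 0.2 the radius drops to 3 and the arc is discarded.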

Datasets So before we see some results, we have to discuss the datasets used. Only one of five experimental datasets was analyzed using this method thus far. It has more than 10,000 alpha and beta nodes in the network. Using a similar distribution of sequences among the 96 wells, I then simulated 11 other datasets but with different recovery rates. This allowed us to investigate the effect of different recovery rates on the matching accuracy.
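For a feel of what simulating a recovery rate could look like, here is a toy Python sketch for a single TCR. The model is an assumption, not the simulation used for the datasets above: the cell is placed in a fixed number of randomly chosen wells, and in each of those wells the alpha and the beta are each detected independently with probability equal to the recovery rate. All names and parameters are hypothetical.

```python
import random


def simulate_clone(n_wells, n_occupied, recovery_rate, rng):
    """Return (alpha_well_set, beta_well_set) for one simulated TCR.

    Assumed model: the cell lands in n_occupied distinct wells; each chain is
    then recovered in each of those wells with probability recovery_rate.
    """
    occupied = rng.sample(range(1, n_wells + 1), n_occupied)
    alpha = {w for w in occupied if rng.random() < recovery_rate}
    beta = {w for w in occupied if rng.random() < recovery_rate}
    return alpha, beta


rng = random.Random(0)  # fixed seed so the run is repeatable
alpha, beta = simulate_clone(n_wells=96, n_occupied=6, recovery_rate=0.8, rng=rng)
print(sorted(alpha), sorted(beta), len(alpha ^ beta))
```

Under this model a recovery rate of 1 makes the two well sets identical, and lower rates make them drift apart, which is what raises the Hamming distance between true partners.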

Results – How Good is the Matching?
So how good was the matching using RR sparsification? You can see the graph of the matching cost (the sum over the costs of all the arcs used), and the number of identified pairs. The most striking aspect of this is that the effectiveness of the method increases almost instantly as soon as some critical value of r is surpassed. This is evident in both graphs. You can see that once it does pass this threshold, the effectiveness of the method plateaus. It doesn’t get any better. You can also see that the critical value of r varies depending on the recovery rate.

Total Arcs in Graph And how well did the sparsification work?
*explain how well it worked* Regardless of the recovery rate, most methods maximized their effectiveness at r >= 0.9. Don’t have that graph, but we achieve near optimal matching with only 0.5% of the arcs.

Pairing Precision And because the data was simulated, we know what the answers are supposed to be. This is a plot of the precision, which is measured by the fraction of identified pairs that are actually pairs by design. As would be expected, lower recovery rates translate to lower possible precision, and vice versa. However, the critical values of r for this phase transition are very evident in this graph.

Pairing Recall

Pairing F-measure
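The three slides above are titled precision, recall, and F-measure. Assuming the standard definitions of those measures, and treating both the identified pairs and the pairs known from the simulation as sets of (alpha, beta) tuples, they could be computed roughly like this. The function name and the toy pairs are hypothetical.

```python
def pairing_metrics(predicted, truth):
    """Return (precision, recall, f_measure) for predicted vs. true pairs."""
    predicted, truth = set(predicted), set(truth)
    correct = len(predicted & truth)              # identified pairs that are real
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy data: 4 true pairs; the matching reports 3 pairs, 2 of them correct.
true_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
found_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b4")}

precision, recall, f_measure = pairing_metrics(found_pairs, true_pairs)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))  # 0.667 0.5 0.571
```

Precision asks how many of the reported pairs are right, recall asks how many of the real pairs were found, and the F-measure combines the two into one number.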

Review In review….

Future Work Here is some future work.

Questions?

Thank you!

