Introduction to Microarry Data Analysis - II BMI 730

Slides:



Advertisements
Similar presentations
Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Advertisements

Supervised and unsupervised analysis of gene expression data Bing Zhang Department of Biomedical Informatics Vanderbilt University
From the homework: Distribution of DNA fragments generated by Micrococcal nuclease digestion mean(nucs) = bp median(nucs) = 110 bp sd(nucs+ = 17.3.
Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Regression Part II One-factor ANOVA Another dummy variable coding scheme Contrasts Multiple comparisons Interactions.
Data mining with the Gene Ontology Josep Lluís Mosquera April 2005 Grup de Recerca en Estadística i Bioinformàtica GOing into Biological Meaning.
© 2010 Pearson Prentice Hall. All rights reserved Single Factor ANOVA.
Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol
Detecting Differentially Expressed Genes Pengyu Hong 09/13/2005.
Microarray Data Preprocessing and Clustering Analysis
Part I – MULTIVARIATE ANALYSIS
Gene Co-expression Network Analysis BMI 730 Kun Huang Department of Biomedical Informatics Ohio State University.
10 Hypothesis Testing. 10 Hypothesis Testing Statistical hypothesis testing The expression level of a gene in a given condition is measured several.
Gene Expression Data Analyses (3)
Differentially expressed genes
Gene Set Analysis 09/24/07. From individual gene to gene sets Finding a list of differentially expressed genes is only the starting point. Suppose we.
. Differentially Expressed Genes, Class Discovery & Classification.
Introduction to Microarry Data Analysis - II BMI 730 Kun Huang Department of Biomedical Informatics Ohio State University.
Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression.
GCB/CIS 535 Microarray Topics John Tobias November 8th, 2004.
Lecture 9: One Way ANOVA Between Subjects
Two Groups Too Many? Try Analysis of Variance (ANOVA)
1 Test of significance for small samples Javier Cabrera.
Lecture 12 One-way Analysis of Variance (Chapter 15.2)
GCB/CIS 535 Microarray Topics John Tobias November 15 th, 2004.
On Comparing Classifiers: Pitfalls to Avoid and Recommended Approach Published by Steven L. Salzberg Presented by Prakash Tilwani MACS 598 April 25 th.
Statistical Methods in Computer Science Hypothesis Testing II: Single-Factor Experiments Ido Dagan.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 14: Non-parametric tests Marshall University Genomics.
Gene Set Enrichment Analysis Petri Törönen petri(DOT)toronen(AT)helsinki.fi.
Different Expression Multiple Hypothesis Testing STAT115 Spring 2012.
Microarray Data Analysis Illumina Gene Expression Data Analysis Yun Lian.
6.1 - One Sample One Sample  Mean μ, Variance σ 2, Proportion π Two Samples Two Samples  Means, Variances, Proportions μ 1 vs. μ 2.
INFERENTIAL STATISTICS – Samples are only estimates of the population – Sample statistics will be slightly off from the true values of its population’s.
AM Recitation 2/10/11.
1Module 2: Analyzing Gene Lists Canadian Bioinformatics Workshops
Probability Distributions and Test of Hypothesis Ka-Lok Ng Dept. of Bioinformatics Asia University.
Multiple testing in high- throughput biology Petter Mostad.
Gene Set Enrichment Analysis (GSEA)
Essential Statistics in Biology: Getting the Numbers Right
Chapter 15 Data Analysis: Testing for Significant Differences.
Jesse Gillis 1 and Paul Pavlidis 2 1. Department of Psychiatry and Centre for High-Throughput Biology University of British Columbia, Vancouver, BC Canada.
CSCE555 Bioinformatics Lecture 16 Identifying Differentially Expressed Genes from microarray data Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun.
Educational Research: Competencies for Analysis and Application, 9 th edition. Gay, Mills, & Airasian © 2009 Pearson Education, Inc. All rights reserved.
ANOVA (Analysis of Variance) by Aziza Munir
Biostatistics, statistical software VII. Non-parametric tests: Wilcoxon’s signed rank test, Mann-Whitney U-test, Kruskal- Wallis test, Spearman’ rank correlation.
Microarray data analysis David A. McClellan, Ph.D. Introduction to Bioinformatics Brigham Young University Dept. Integrative Biology.
Chapter 10: Analyzing Experimental Data Inferential statistics are used to determine whether the independent variable had an effect on the dependent variance.
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
A A R H U S U N I V E R S I T E T Faculty of Agricultural Sciences Introduction to analysis of microarray data David Edwards.
Statistical analysis of expression data: Normalization, differential expression and multiple testing Jelle Goeman.
Introduction to Microarrays Dr. Özlem İLK & İbrahim ERKAN 2011, Ankara.
Analysis of Variance (ANOVA) Brian Healy, PhD BIO203.
MRNA Expression Experiment Measurement Unit Array Probe Gene Sequence n n n Clinical Sample Anatomy Ontology n 1 Patient 1 n Disease n n ProjectPlatform.
Statistics for Differential Expression Naomi Altman Oct. 06.
Inferential Statistics. The Logic of Inferential Statistics Makes inferences about a population from a sample Makes inferences about a population from.
Application of Class Discovery and Class Prediction Methods to Microarray Data Kellie J. Archer, Ph.D. Assistant Professor Department of Biostatistics.
Gene set analyses of genomic datasets Andreas Schlicker Jelle ten Hoeve Lodewyk Wessels.
Three Broad Purposes of Quantitative Research 1. Description 2. Theory Testing 3. Theory Generation.
Cluster validation Integration ICES Bioinformatics.
Comp. Genomics Recitation 10 4/7/09 Differential expression detection.
Getting the story – biological model based on microarray data Once the differentially expressed genes are identified (sometimes hundreds of them), we need.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Tutorial 8 Gene expression analysis 1. How to interpret an expression matrix Expression data DBs - GEO Clustering –Hierarchical clustering –K-means clustering.
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
Hypothesis Testing. Suppose we believe the average systolic blood pressure of healthy adults is normally distributed with mean μ = 120 and variance σ.
Jump to first page Inferring Sample Findings to the Population and Testing for Differences.
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Comparing ≥ 3 Groups Analysis of Biological Data/Biometrics
Presentation transcript:

Introduction to Microarry Data Analysis - II BMI 730 Kun Huang Department of Biomedical Informatics Ohio State University

Elements of Gene Expression Data Analysis Comparative study Clustering Review of Microarray Elements of Gene Expression Data Analysis Comparative study Clustering Introduction to Pathway and Gene Ontology Enrichment Analysis The difference between the microarrays that are currently available must be understood before proceeding in this tutorial. Microarrays can be generalised into two categories: cDNA (oligonucleotide) spotted arrays High-density synthetic oligonucleotide (Affymetrix-like) arrays Broadly speaking this catergorisation is based mainly on the way the arrays are manufactured and the way they are used. cDNA arrays are mostly used in 2-dye (also known as 2-channel) experiments. This means that 2 samples of RNA are taken from 2 seperate sources and cDNA is created from each by using a technique called RT-PCR (Reverse Transcriptase Polymerase Chain Reaction). Each cDNA sample is labeled with its own specific dye. Both cDNA samples are then washed over a single array, after which dye intensities on each spot is calculated using a fluorescence camera. cDNA arrays are typically less dense than Affymetrix-like arrays, containing approximately 20,000 sequences, each of which is about 60 to 80 nucleotides long. In contrast to the 2-dye approach, Affymetrix-like arrays can only be treated with a single sample (i.e. RNA from a single source). No RT-PCR is necessary. These arrays can potentially have a density of up to 100,000 sequences, each of which being no longer than 40 nucleotides.

How does two-channel microarray work? Printing process introduces errors and larger variance Comparative hybridization experiment

How does microarray work? Fabrication expense and frequency of error increases with the length of probe, therefore 25 oligonucleotide probes are employed. Problem: cross hybridization Solution: introduce mismatched probe with one position (central) different with the matched probe. The difference gives a more accurate reading.

How do we use microarray? Inference Clustering

Normalization Which normalization algorithm to use Inter-slide normalization Not just for Affymetrix arrays

Elements of Gene Expression Data Analysis Comparative study Clustering Review of Microarray Elements of Gene Expression Data Analysis Comparative study Clustering Introduction to Pathway and Gene Ontology Enrichment Analysis The difference between the microarrays that are currently available must be understood before proceeding in this tutorial. Microarrays can be generalised into two categories: cDNA (oligonucleotide) spotted arrays High-density synthetic oligonucleotide (Affymetrix-like) arrays Broadly speaking this catergorisation is based mainly on the way the arrays are manufactured and the way they are used. cDNA arrays are mostly used in 2-dye (also known as 2-channel) experiments. This means that 2 samples of RNA are taken from 2 seperate sources and cDNA is created from each by using a technique called RT-PCR (Reverse Transcriptase Polymerase Chain Reaction). Each cDNA sample is labeled with its own specific dye. Both cDNA samples are then washed over a single array, after which dye intensities on each spot is calculated using a fluorescence camera. cDNA arrays are typically less dense than Affymetrix-like arrays, containing approximately 20,000 sequences, each of which is about 60 to 80 nucleotides long. In contrast to the 2-dye approach, Affymetrix-like arrays can only be treated with a single sample (i.e. RNA from a single source). No RT-PCR is necessary. These arrays can potentially have a density of up to 100,000 sequences, each of which being no longer than 40 nucleotides.

Hypothesis Testing Two set of samples sampled from two distributions (N=2)

Hypothesis Testing Two set of samples sampled from two distributions (N=2) Hypothesis m1 and m2 are the means of the two distributions. Null hypothesis Alternative hypothesis

Student’s t-test

Student’s t-test p-value can be computed from t-value and number of freedom (related to number of samples) to give a bound on the probability for type-I error (claiming insignificant difference to be significant) assuming normal distributions.

Student’s t-test Dependent (paired) t-test

Permutation (t-)test T-test relies on the parametric distribution assumption (normal distribution). Permutation tests do not depend on such an assumption. Examples include the permutation t-test and Wilcoxon rank-sum test. Perform regular t-test to obtain t-value t0. The randomly permute the N1+N2 samples and designate the first N1 as group 1 with the rest being group 2. Perform t-test again and record the t-value t. For all possible permutations, count how many t-values are larger than t0 and write down the number K0.

Multiple Classes (N>2) F-test The null hypothesis is that the distribution of gene expression is the same for all classes. The alternative hypothesis is that at least one of the classes has a distribution that is different from the other classes. Which class is different cannot be determined in F-test (ANOVA). It can only be identified post hoc.

Example GEO Dataset Subgroup Effect

Gene Discovery and Multiple T-tests Controlling False Positives p-value cutoff = 0.05 (probability for false positive - type-I error) 22,000 probesets False discovery 22,000X0.05=1,100 Focus on the 1,100 genes in the second speciman. False discovery 1,100X0.05 = 55

Gene Discovery and Multiple T-tests Controlling False Positives State the set of genes explicitly before the experiments Problem: not always feasible, defeat the purpose of large scale screening, could miss important discovery Statistical tests to control the false positives

Gene Discovery and Multiple T-tests Controlling False Positives Statistical tests to control the false positives Controlling for no false positives (very stringent, e.g. Bonferroni methods) Controlling the number of false positives ( Controlling the proportion of false positives Note that in the screening stage, false positive is better than false negative as the later means missing of possibly important discovery.

Gene Discovery and Multiple T-tests Controlling False Positives Statistical tests to control the false positives Controlling for no false positives (very stringent) Bonferroni methods and multivariate permutation methods Bonferroni inequality Area of union < Sum of areas

Gene Discovery and Multiple T-tests Bonferroni methods Bonferroni adjustment If Ei is the event for false positive discovery of gene I, conservative speaking, it is almost guaranteed to have false positive for K > 19. So change the p-value cutoff line from p0 to p0/K. This is called Bonferroni adjustment. If K=20, p0=0.05, we call a gene i is significantly differentially expressed if pi<0.0025.

Gene Discovery and Multiple T-tests Bonferroni methods Bonferroni adjustment Too conservative. Excessive stringency leads to increased false negative (type II error). Has problem with metaanalysis. Variations: sequential Bonferroni test (Holm-Bonferroni test) Sort the K p-values from small to large to get p1p2…pK. So change the p-value cutoff line for the ith p-value to be p0/(K-i+1) (ie, p1p0/K, p2p0/(K-1), …, pKp0. If pjp0/(K-j+1) for all ji but pi+1>p0/(K-i+1+1), reject all the alternative hypothesis from i+1 to K, but keep the hypothesis from 1 to i.

Gene Discovery and Multiple T-tests Controlling False Positives Statistical tests to control the false positives Controlling the number of false positives Simple approach – choose a cutoff for p-values that are lower than the usual 0.05 but higher than that from Bonferroni adjustment More sophisticated way: a version of multivariate permutation.

Gene Discovery and Multiple T-tests Controlling False Positives Statistical tests to control the false positives Controlling the proportion of false positives Let g be the portion (percentage) of false positive in the total discovered genes. False positive Total positive pD is the choice. There are other ways for estimating false positives. Details can be found in Tusher et. al. PNAS 98:5116-5121.

Elements of Gene Expression Data Analysis Comparative study Clustering Review of Microarray Elements of Gene Expression Data Analysis Comparative study Clustering Introduction to Pathway and Gene Ontology Enrichment Analysis The difference between the microarrays that are currently available must be understood before proceeding in this tutorial. Microarrays can be generalised into two categories: cDNA (oligonucleotide) spotted arrays High-density synthetic oligonucleotide (Affymetrix-like) arrays Broadly speaking this catergorisation is based mainly on the way the arrays are manufactured and the way they are used. cDNA arrays are mostly used in 2-dye (also known as 2-channel) experiments. This means that 2 samples of RNA are taken from 2 seperate sources and cDNA is created from each by using a technique called RT-PCR (Reverse Transcriptase Polymerase Chain Reaction). Each cDNA sample is labeled with its own specific dye. Both cDNA samples are then washed over a single array, after which dye intensities on each spot is calculated using a fluorescence camera. cDNA arrays are typically less dense than Affymetrix-like arrays, containing approximately 20,000 sequences, each of which is about 60 to 80 nucleotides long. In contrast to the 2-dye approach, Affymetrix-like arrays can only be treated with a single sample (i.e. RNA from a single source). No RT-PCR is necessary. These arrays can potentially have a density of up to 100,000 sequences, each of which being no longer than 40 nucleotides.

How do we process microarray data (clustering)? Unsupervised Learning – Hierarchical Clustering

Distance Measure (Metric?) What do you mean by “similar”? Euclidean Uncentered correlation Pearson correlation

Distance Metric Euclidean dE(Lip1, Ap1s1) = 12883 102123_at Lip1 1596.000 2040.900 1277.000 4090.500 1357.600 1039.200 1387.300 3189.000 1321.300 2164.400 868.600 185.300 266.400 2527.800 160552_at Ap1s1 4144.400 3986.900 3083.100 6105.900 3245.800 4468.400 7295.000 5410.900 3162.100 4100.900 4603.200 6066.200 5505.800 5702.700 dE(Lip1, Ap1s1) = 12883

Distance Metric Pearson Correlation Ranges from 1 to -1. r = 1 r = -1

How do we process microarray data (clustering)? Unsupervised Learning – Hierarchical Clustering Single linkage: The linking distance is the minimum distance between two clusters.

How do we process microarray data (clustering)? Unsupervised Learning – Hierarchical Clustering Complete linkage: The linking distance is the maximum distance between two clusters.

How do we process microarray data (clustering)? Unsupervised Learning – Hierarchical Clustering Average linkage/UPGMA: The linking distance is the average of all pair-wise distances between members of the two clusters. Since all genes and samples carry equal weight, the linkage is an Unweighted Pair Group Method with Arithmetic Means (UPGMA).

Elements of Gene Expression Data Analysis Comparative study Clustering Review of Microarray Elements of Gene Expression Data Analysis Comparative study Clustering Introduction to Pathway and Gene Ontology Enrichment Analysis The difference between the microarrays that are currently available must be understood before proceeding in this tutorial. Microarrays can be generalised into two categories: cDNA (oligonucleotide) spotted arrays High-density synthetic oligonucleotide (Affymetrix-like) arrays Broadly speaking this catergorisation is based mainly on the way the arrays are manufactured and the way they are used. cDNA arrays are mostly used in 2-dye (also known as 2-channel) experiments. This means that 2 samples of RNA are taken from 2 seperate sources and cDNA is created from each by using a technique called RT-PCR (Reverse Transcriptase Polymerase Chain Reaction). Each cDNA sample is labeled with its own specific dye. Both cDNA samples are then washed over a single array, after which dye intensities on each spot is calculated using a fluorescence camera. cDNA arrays are typically less dense than Affymetrix-like arrays, containing approximately 20,000 sequences, each of which is about 60 to 80 nucleotides long. In contrast to the 2-dye approach, Affymetrix-like arrays can only be treated with a single sample (i.e. RNA from a single source). No RT-PCR is necessary. These arrays can potentially have a density of up to 100,000 sequences, each of which being no longer than 40 nucleotides.

Where do I get the gene list? Comparative study e.g., microarray experiments between two types of samples or two disease states (can also be from RT-PCA, proteomics, …) Clustering / classification of genes e.g., co-expressed genes Homologue analysis e.g., genes from BLAST Other sources The difference between the microarrays that are currently available must be understood before proceeding in this tutorial. Microarrays can be generalised into two categories: cDNA (oligonucleotide) spotted arrays High-density synthetic oligonucleotide (Affymetrix-like) arrays Broadly speaking this catergorisation is based mainly on the way the arrays are manufactured and the way they are used. cDNA arrays are mostly used in 2-dye (also known as 2-channel) experiments. This means that 2 samples of RNA are taken from 2 seperate sources and cDNA is created from each by using a technique called RT-PCR (Reverse Transcriptase Polymerase Chain Reaction). Each cDNA sample is labeled with its own specific dye. Both cDNA samples are then washed over a single array, after which dye intensities on each spot is calculated using a fluorescence camera. cDNA arrays are typically less dense than Affymetrix-like arrays, containing approximately 20,000 sequences, each of which is about 60 to 80 nucleotides long. In contrast to the 2-dye approach, Affymetrix-like arrays can only be treated with a single sample (i.e. RNA from a single source). No RT-PCR is necessary. These arrays can potentially have a density of up to 100,000 sequences, each of which being no longer than 40 nucleotides.

GO enrichment analysis What do I do with the gene list – enrichment analysis? Find commonality among the gene Common molecular functions (GO) Common biological processes (GO) Common cellular components (GO) Common pathways Interact with common genes Common sequences / molecular structures Regulated by common Transcription Factors Targeted by common microRNAs Involved in the same disease … Generate new hypothesis based on the commonality GO enrichment analysis The difference between the microarrays that are currently available must be understood before proceeding in this tutorial. Microarrays can be generalised into two categories: cDNA (oligonucleotide) spotted arrays High-density synthetic oligonucleotide (Affymetrix-like) arrays Broadly speaking this catergorisation is based mainly on the way the arrays are manufactured and the way they are used. cDNA arrays are mostly used in 2-dye (also known as 2-channel) experiments. This means that 2 samples of RNA are taken from 2 seperate sources and cDNA is created from each by using a technique called RT-PCR (Reverse Transcriptase Polymerase Chain Reaction). Each cDNA sample is labeled with its own specific dye. Both cDNA samples are then washed over a single array, after which dye intensities on each spot is calculated using a fluorescence camera. cDNA arrays are typically less dense than Affymetrix-like arrays, containing approximately 20,000 sequences, each of which is about 60 to 80 nucleotides long. In contrast to the 2-dye approach, Affymetrix-like arrays can only be treated with a single sample (i.e. RNA from a single source). No RT-PCR is necessary. These arrays can potentially have a density of up to 100,000 sequences, each of which being no longer than 40 nucleotides.

How do I find commonality from my gene list? Using a priori knowledge (e.g., gene ontology, pathway, annotation, etc.) Fisher’s exact test, hypergeometric test, Bayesian-based methods, etc. Good news – most of the time you can use software to do it How significant is the intersection? The difference between the microarrays that are currently available must be understood before proceeding in this tutorial. Microarrays can be generalised into two categories: cDNA (oligonucleotide) spotted arrays High-density synthetic oligonucleotide (Affymetrix-like) arrays Broadly speaking this catergorisation is based mainly on the way the arrays are manufactured and the way they are used. cDNA arrays are mostly used in 2-dye (also known as 2-channel) experiments. This means that 2 samples of RNA are taken from 2 seperate sources and cDNA is created from each by using a technique called RT-PCR (Reverse Transcriptase Polymerase Chain Reaction). Each cDNA sample is labeled with its own specific dye. Both cDNA samples are then washed over a single array, after which dye intensities on each spot is calculated using a fluorescence camera. cDNA arrays are typically less dense than Affymetrix-like arrays, containing approximately 20,000 sequences, each of which is about 60 to 80 nucleotides long. In contrast to the 2-dye approach, Affymetrix-like arrays can only be treated with a single sample (i.e. RNA from a single source). No RT-PCR is necessary. These arrays can potentially have a density of up to 100,000 sequences, each of which being no longer than 40 nucleotides.

What softwares are available? DAVID (http://david.abcc.ncifcrf.gov/) TOPPGene Cytoscape GOTerm BiNGO GSEA GenMapp (Free) Pathway Architect (Commercial) Pathway Studio (Commercial) Ingenuity Pathway Analysis (Commercial) Manually curated On-demand computation The difference between the microarrays that are currently available must be understood before proceeding in this tutorial. Microarrays can be generalised into two categories: cDNA (oligonucleotide) spotted arrays High-density synthetic oligonucleotide (Affymetrix-like) arrays Broadly speaking this catergorisation is based mainly on the way the arrays are manufactured and the way they are used. cDNA arrays are mostly used in 2-dye (also known as 2-channel) experiments. This means that 2 samples of RNA are taken from 2 seperate sources and cDNA is created from each by using a technique called RT-PCR (Reverse Transcriptase Polymerase Chain Reaction). Each cDNA sample is labeled with its own specific dye. Both cDNA samples are then washed over a single array, after which dye intensities on each spot is calculated using a fluorescence camera. cDNA arrays are typically less dense than Affymetrix-like arrays, containing approximately 20,000 sequences, each of which is about 60 to 80 nucleotides long. In contrast to the 2-dye approach, Affymetrix-like arrays can only be treated with a single sample (i.e. RNA from a single source). No RT-PCR is necessary. These arrays can potentially have a density of up to 100,000 sequences, each of which being no longer than 40 nucleotides.

Genes Functions, pathways and networks

Pathway – What’s out there? 240

Ingenuity Pathway Analysis (IPA)

Demo DAVID (http://david.abcc.ncifcrf.gov/) TOPPGene Ingenuity Pathway Analysis Gene List1: AURKA BIRC5 ASPM BUB1 CCNA2 CCNB2 CDC2 ACOT7 CDC20 CDC45L CDCA8 CENPE CENPF CEP55 CKS2 CHEK1 DKFZp762E1312 DLG7 DNA2L E2F8 EPR1 FANCI HMMR KIF4A LMNB1 MAD2L1 MELK NCAPG RANBP1RRM2 SPAG5 STIL TACC3 TPX2 TRIP13 TTK UBE2C UBE2S Gene List2: AI445650 CD2 CCR5 CD247 CD27 CD38 CD3D CD3E CD3G CD79A CD8A CRTAM CST7 CTSW CXCR6 DENND2D FAIM3 FMNL1 GZMA GZMB GZMH GZMK HLA-DOB IL21R IL2RB IL2RG IL7R KLRK1 LAG3 LAT LAX1 MIRN650 NKG7 NM_014792 PTPN7 RASGRP1 RUNX3 SELPLG SEPT6 SERPINB9 SH2D1A SIRPG SLAMF7 SOCS1 TBX21 TRBC1 WAS XCL1 CCL4 XCL2 ZAP70