Mining the functional genomics data III. Data integration: Gene Ontology, PPI, URLMAP. Jaak Vilo. Havana, Cuba, 21.11.2003.

Components of Expression Profiler. EPCLUST: expression data. GENOMES: sequence, function, annotation. SPEXS: discover patterns. URLMAP: provide links. PATMATCH: visualise patterns. EP:GO: GeneOntology. EP:PPI: protein-protein interactions. SEQLOGO. Inputs: expression data; external data and tools (pathways, function, etc.).

Expression Profiler: EPCLUST. [Diagram labels: DATASELECT/FILTER; FOLDER; ANALYZE A CLUSTER; URLMAP; GeneOntology; Pathways; Databases; SPEXS; Other tools.]

URLMAP. Given a cluster of genes, there are many web-based tools and databases to consult or follow up with. How to link to them? How to manage many links and many tools? Answer: centralize the linking.

URLMAP: no need to cut & paste. Example targets: KEGG, SRS/InterPro. Generates all links/forms dynamically. Maintains links in one place. Handles renaming of gene IDs by synonyms. Allows domain-specific link pages.

A Simple Metabolic Pathway Shoshanna Wodak, Jacques van Helden

Links for each item type. Yeast S. cerevisiae gene IDs (ORF name, SP ID, SGD ID, …). Pattern collections, e.g. substrings passed to profile generation by SEQLOGO. Keyword searches from web-based search engines.

Management of links. Hierarchies of link collections; one can point to any (sub)hierarchy directly. LINK = URL, title, form parameters, modifications/code. DB lookups for synonyms.
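The idea of keeping all link definitions in one place and resolving synonyms before generating links can be sketched as follows. This is a minimal illustration, not the URLMAP implementation: the template URLs, the synonym table and the function names are all hypothetical.

```python
from urllib.parse import quote

# Hypothetical link templates kept in one place; the real URLMAP link
# definitions are not given in the slides.
LINK_TEMPLATES = {
    "ExampleDB": "https://db.example.org/entry?id={id}",
    "ExampleSearch": "https://search.example.org/?q={id}",
}

# Hypothetical synonym table: alternative gene IDs -> canonical ORF name.
SYNONYMS = {"ALIAS1": "YAL036C"}


def canonical_id(gene_id):
    """Map a gene ID to its canonical form using the synonym table."""
    upper = gene_id.upper()
    return SYNONYMS.get(upper, upper)


def build_links(gene_ids, templates=LINK_TEMPLATES):
    """Return {gene: {resource name: URL}} for every gene in a cluster."""
    links = {}
    for gene in gene_ids:
        cid = canonical_id(gene)
        links[gene] = {
            name: template.format(id=quote(cid))
            for name, template in templates.items()
        }
    return links
```

Adding a new database then means adding one template entry, and every gene cluster page picks it up without any cut and paste.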

Screen scraping: doable with a little Perl programming. [Diagram: a gene list g1, g2, g356 is split into separate queries g1, g2, g356, and the results are combined into a report.]

Gene Ontology™. GO is a systematic effort for data annotation. Three independent ontologies: Molecular Function, Biological Process, Cellular Component. How to integrate that into analysis tools?

DAG structure: annotate to any level within the DAG. Example: mitosis (S.c. NNF1); mitotic chromosome condensation (S.c. BRN1, D.m. barren).
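Because genes can be annotated at any level of the DAG, a query for a term usually needs to include genes annotated to its descendants as well. The sketch below illustrates that on the two terms from the slide; the parent link between them is read off the slide, and the data structures and function names are assumptions made for illustration.

```python
# Toy fragment of the ontology: child term -> list of parent terms.
# Only the two terms shown on the slide are included.
PARENTS = {
    "mitotic chromosome condensation": ["mitosis"],
    "mitosis": [],
}

# Direct annotations as shown on the slide (S. cerevisiae genes only).
DIRECT = {
    "NNF1": {"mitosis"},
    "BRN1": {"mitotic chromosome condensation"},
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from term."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for_term(term, direct=DIRECT, parents=PARENTS):
    """Genes annotated to term itself or to any of its descendants."""
    return {
        gene
        for gene, terms in direct.items()
        if any(t == term or term in ancestors(t, parents) for t in terms)
    }
```

A term may have several parents in a DAG, which is why `PARENTS` maps each term to a list rather than to a single parent.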

GO Annotation: Data. Database object: gene or gene product. GO term ID. Reference: publication or computational method. Evidence supporting the annotation.

GO Evidence Codes
IDA - Inferred from Direct Assay
IMP - Inferred from Mutant Phenotype
IGI - Inferred from Genetic Interaction
IPI - Inferred from Physical Interaction
IEP - Inferred from Expression Pattern
TAS - Traceable Author Statement
NAS - Non-traceable Author Statement
IC - Inferred by Curator
ISS - Inferred from Sequence or structural Similarity
IEA - Inferred from Electronic Annotation
ND - Not Determined
Source categories marked on the slide: from reviews or introductions; from primary literature; automated.
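An annotation record with the four fields listed earlier (database object, GO term ID, reference, evidence) makes it easy to restrict an analysis to selected evidence codes. The following is a sketch only: the class, the choice of which codes to keep, and the example records (placeholder IDs and references) are assumptions, not data from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str       # database object: gene or gene product
    go_id: str      # GO term ID
    reference: str  # publication or computational method
    evidence: str   # evidence code, e.g. "IDA" or "IEA"


# Assumed selection for illustration: the five assay-based codes.
ASSAY_BASED = {"IDA", "IMP", "IGI", "IPI", "IEP"}


def filter_by_evidence(annotations, allowed=ASSAY_BASED):
    """Keep only annotations whose evidence code is in the allowed set."""
    return [a for a in annotations if a.evidence in allowed]


# Placeholder records; the IDs and references are made up.
EXAMPLE = [
    Annotation("geneA", "GO:placeholder1", "ref1", "IDA"),
    Annotation("geneB", "GO:placeholder1", "ref2", "IEA"),
    Annotation("geneC", "GO:placeholder2", "ref3", "IMP"),
]
```

Calling `filter_by_evidence(EXAMPLE)` drops the electronically inferred record for `geneB` and keeps the other two.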

Example (GoMiner)

EP:GO: tool for GeneOntology. Browse. Search by keywords, EC, term, etc. Get associated genes. Submit associated genes to URLMAP. Annotate gene clusters using GO terms.

URLMAP => Look up expression data EP:GO EPCLUST

Annotate Clusters (EP:GO). [Figure labels, gene sets: A,D; B,C; E; F,G,H; J; I; F,G; B,E; B,A; F,G,I; B,E,F,I.]

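Set overlap between a GO term (gene set G) and a cluster (gene set C), out of N genes in total.
A: |G ∩ C| / min(|G|, |C|)
B: P(choosing |C| genes from N, of which |G| belong to the GO term, and observing |G ∩ C| or more in the overlap)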
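Both overlap scores can be computed with the standard library alone. This is a sketch under the assumption that score B is the upper tail of the hypergeometric distribution, which is how the slide's wording reads; the function names are illustrative.

```python
from math import comb


def overlap_ratio(go_genes, cluster_genes):
    """Score A: |G ∩ C| / min(|G|, |C|) for two sets of gene IDs."""
    shared = len(go_genes & cluster_genes)
    return shared / min(len(go_genes), len(cluster_genes))


def hypergeom_tail(n_total, n_go, n_cluster, k):
    """Score B: probability of k or more GO-term genes in the cluster.

    n_total   -- N, number of genes in total
    n_go      -- |G|, genes annotated to the GO term
    n_cluster -- |C|, size of the cluster
    k         -- observed overlap |G ∩ C|
    """
    upper = min(n_go, n_cluster)
    favourable = sum(
        comb(n_go, i) * comb(n_total - n_go, n_cluster - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_cluster)
```

For a row of a cluster annotation listing such as "47 from cluster (size 98) vs 187 in this class", the call would be `hypergeom_tail(N, 187, 98, 47)`, with N set to the total number of genes, a value the slides do not state.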

Annotation of clusters
GO: Process: ribosome biogenesis and assembly (+2:15) (depth=7) [sgd:2:187]
GO: : 47 from cluster (size 98) vs 187 in this class (including subclasses)
GO: Process: rRNA processing (+3:3) (depth=8) [sgd:50:126]
GO: : 35 from cluster (size 98) vs 126 in this class (including subclasses)
GO: Process: transcription from Pol I promoter (+6:14) (depth=8) [sgd:23:155]
GO: : 38 from cluster (size 98) vs 155 in this class (including subclasses)
GO: Component: nucleolus (+10:17) (depth=6) [sgd:154:210]
GO: : 45 from cluster (size 98) vs 210 in this class (including subclasses)
GO: Function: snoRNA binding (depth=6) [sgd:23:23]
GO: : 17 from cluster (size 98) vs 23 in this class (including subclasses)
GO: Process: processing of 20S pre-rRNA (depth=9) [sgd:33:33]
GO: : 18 from cluster (size 98) vs 33 in this class (including subclasses)
GO: Component: small nucleolar ribonucleoprotein complex (depth=6) [sgd:30:30]
GO: : 16 from cluster (size 98) vs 30 in this class (including subclasses)
GO: Process: RNA processing (+7:52) (depth=7) [sgd:7:370]
GO: : 40 from cluster (size 98) vs 370 in this class (including subclasses)
…

>YAL036C chromo=1 coord=( (C)) start=-600 end=+2 seq=( ) TGTTCTTTCTTCTTCTGCTTCTCCTTTTCCTTTTTTTCCTTCTCCTTTTCCTTCTTGGACTTTAGTATAGGCTTACCATCCTTCTTCTCTTCAATAACCTTCTTTTCTTG CTTCTTCTTCGATTGCTTCAAAGTAGACATGAAGTCGCCTTCAATGGCCTCAGCACCTTCAGCACTTGCACTTGCTTCTCTGGAAGTGTCATCTGCACCTGCGCTGCTTT CTGGATTTGGAGTTGGCGTGGCACTGATTTCTTCGTTCTGGGCGGCGTCTTCTTCGAATTCCTCATCCCAGTAGTTCTGTTGGTTCTTTTTACTCTTTTTCGCCATCTTT CACTTATCTGATGTTCCTGATTGCCCTTCTTATCCCCTCAAAGTTCACCTTTGCCACTTATTCTAGTGCAAGATCTCTTGCTTTCAATGGGCTTAAAGCTTGAAAAATTT TTTCACATCACAAGCGACGAGGGCCCGTTTTTTTCATCGATGAGCTATAAGAGTTTTCCACTTTTAAGATGGGATATTACGGTGTGATGAGGGCGCAATGATAGGAAGTG TTTGAAGCTAGATGCAGTAGGTGCAAGCGTAGAGTTGTTGATTGAGCAAA_ATG_ >YAL025C chromo=1 coord=( (C)) start=-600 end=+2 seq=( ) CTTAGAAGATAAAGTAGTGAATTACAATAAATTCGATACGAACGTTCAAATAGTCAAGAATTTCATTCAAAGGGTTCAATGGTCCAAGTTTTACACTTTCAAAGTTAACC ACGAATTGCTGAGTAAGTGTGTTTATATTAGCACATTAACACAAGAAGAGATTAATGAACTATCCACATGAGGTATTGTGCCACTTTCCTCCAGTTCCCAAATTCCTCTT GTAAAAAACTTTGCATATAAAATATACAGATGGAGCATATATAGATGGAGCATACATACATGTTTTTTTTTTTTTAAAAACATGGACTCGAACAGAATAAAAGAATTTAT AATGATAGATAATGCATACTTCAATAAGAGAGAATACTTGTTTTTAAATGAGAATTGCTTTCATTAGCTCATTATGTTCAGATTATCAAAATGCAGTAGGGTAATAAACC TTTTTTTTTTTTTTTTTTTTTTTTGAAAAATTTTCCGATGAGCTTTTGAAAAAAAATGAAAAAGTGATTGGTATAGAGGCAGATATTGCATTGCTTAGTTCTTTCTTTTG ACAGTGTTCTCTTCAGTACATAACTACAACGGTTAGAATACAACGAGGAT_ATG_... 
>YBR084W chromo=2 coord=( ) start=-600 end=+2 seq=( ) CCATGTATCCAAGACCTGCTGAAGATGCTTACAATGCCAATTATATTCAAGGTCTGCCCCAGTACCAAACATCTTATTTTTCGCAGCTGTTATTATCATCACCCCAGCAT TACGAACATTCTCCACATCAAAGGAACTTTACGCCATCCAACCAATCGCATGGGAACTTTTATTAAATGTCTACATACATACATACATCTCGTACATAAATACGCATACG TATCTTCGTAGTAAGAACCGTCACAGATATGATTGAGCACGGTACAATTATGTATTAGTCAAACATTACCAGTTCTCGAACAAAACCAAAGCTACTCCTGCAACACTCTT CTATCGCACATGTATGGTTCTTATTGTTTCCCGAGTTCTTTTTTACTGACGCGCCAGAACGAGTAAGAAAGTTCTCTAGCGCCATGCTGAAATTTTTTTCACTTCAACGG ACAGCGATTTTTTTTCTTTTTCCTCCGAAATAATGTTGCAGCGGTTCTCGATGCCTCAAGAATTGCAGAAGTAAACCAGCCAATACACATCAAAAAACAACTTTCATTAC TGTGATTCTCTCAGTCTGTTCATTTGTCAGATATTTAAGGCTAAAAGGAA_ATG_ 101 Sequences relative to ORF start GATGAG.T 1:52/70 2:453/508 R: BP: e-33 G.GATGAG.T 1:39/49 2:193/222 R: BP: e-33 AAAATTTT 1:63/77 2:833/911 R: BP: e-32 TGAAAA.TTT 1:45/53 2:333/350 R: BP: e-31 TG.AAA.TTT 1:53/61 2:538/570 R: BP: e-31 TG.AAA.TTTT 1:40/43 2:254/260 R: BP: e-30 TGAAA..TTT 1:54/65 2:608/645 R: BP:1.0887e GATGAG.T TGAAA..TTT YGR128C + 100

EP:PPI: protein-protein interaction. There are high-throughput technologies for identifying hypothetical protein-protein interactions. Which of these are more likely to be true? Can these predictions help predict gene function?

PPI pairs

We have expression data

Cluster

Trust those within the same cluster

PPI are enriched within clusters (Ge, Liu, Church, Vidal: Nature Genetics, Nov. 2001)

Protein-protein interactions: which to trust more? Answer: Use the distance measure alone

Kemmeren et al., Molecular Cell, Vol. 9, 1133–1143, May 2002. [Figure: series for randomized expression data, yeast two-hybrid studies and known (literature) PPI; labelled genes: MPK1, YLR350w, SNF4, YCL046W, SNF7, YGR122W.]

Interacting pairs of proteins: A and B; C and D. Which would you trust? [Figure: each pair shown on an expression distance axis d running from 0 to 1.]

EP:PPI – combine PPI and expression
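Combining the two data types amounts to scoring each putative interaction by the distance between the expression profiles of its two proteins, then trusting the closest pairs most. The slides do not say which distance measure is used, so the sketch below assumes a correlation distance (1 minus Pearson correlation); the function names and the data layout are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_distance(x, y):
    """Assumed distance measure: 0 for identical shape, 2 for opposite."""
    return 1.0 - pearson(x, y)


def rank_interactions(pairs, profiles):
    """Sort putative interactions by expression distance, closest first.

    pairs    -- iterable of (protein_a, protein_b) tuples
    profiles -- dict mapping protein name to a list of expression values
    """
    scored = [
        (correlation_distance(profiles[a], profiles[b]), a, b)
        for a, b in pairs
    ]
    return sorted(scored)
```

Pairs at the top of the returned list are the co-expressed ones; a cut-off on the distance would have to be calibrated, for example against randomized expression data as in the Kemmeren et al. figure.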

Results Confidence in 973 out of 5342 putative two-hybrid interactions from S. cerevisiae is increased. Besides verification, integration of expression and interaction data is employed to provide functional annotation for over 300 previously uncharacterized genes. The robustness of these approaches is demonstrated by experiments that test the in silico predictions made. This study shows how integration improves the utility of different types of functional genomic data and how well this contributes to functional annotation.

Gene regulation by transcription factors. [Figure: DNA with promoter and coding regions for GENE 1 to GENE 4; transcription factors; the corresponding network of nodes G1, G2, G3, G4.]

Networks: graphical models. Directed labelled graph. Nodes: genes. Arcs/edges: relationships. Labels: types of relationships.

Graph drawing: an arc runs from a start node (gene) A to an end node (gene) B and carries a connection weight w.

Different interpretations of arcs. Edges can have different meanings, hence different networks: a binding site for A is in front of B; proteins A and B interact; deletion of gene A affects expression of B (A is somewhere in the regulation cascade); the literature mentions the genes together.
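A directed labelled graph with weighted arcs, as described on the preceding slides, can be held in a very small data structure. The sketch below is one possible representation; the field names, label strings and example arcs are assumptions chosen for illustration.

```python
from collections import namedtuple

# One arc: start node (gene), end node (gene), connection weight, label.
Arc = namedtuple("Arc", ["source", "target", "weight", "label"])

# Illustrative arcs between placeholder genes A, B, C. The same pair of
# genes can be connected by arcs with different meanings.
EXAMPLE_ARCS = [
    Arc("A", "B", 2.5, "binding site"),
    Arc("A", "B", 1.0, "protein interaction"),
    Arc("A", "C", -3.0, "deletion affects expression"),
    Arc("B", "C", 1.0, "literature co-mention"),
]


def subnetwork_by_label(arcs, label):
    """Return the network defined by one interpretation of the arcs."""
    return [arc for arc in arcs if arc.label == label]


def nodes_of(arcs):
    """Return the set of genes that appear in any of the given arcs."""
    return {arc.source for arc in arcs} | {arc.target for arc in arcs}
```

Selecting by label yields one network per interpretation, which matches the point that different meanings of an edge give different networks over the same genes.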

Deletion mutants (gene knockouts). [Figure: genes A, B, C, D and the arcs inferred from the knockouts.]

Hughes, T. R. et al.: Functional Discovery via a Compendium of Expression Profiles, Cell 102 (2000).

Green arrows: upregulation. Red arrows: downregulation. Thickness of an arrow represents certainty of direction (up/down).

A complete graph

Features and distributions that do not depend on discretisation thresholds. Visual inspection, biological interpretation. General statistics and features of the graphs: indegree/outdegree; complexity of the networks. What is the modularity? How many components? Does deletion of hot-spots break the net?

Mutation network, threshold = 4. Filter: choose a list of genes (MATING, marked in red), then filter for these genes plus neighbouring genes from the graph.

Mutation network, threshold = 2

Lac operon (figure: Thomas Schlitt). [Labels: lacI, promoter, operator, lacZ, repressor, activator, glucose, lactose, galactose, galactosidase.]

Gene regulatory networks. What formalisms to use to describe them? When does a model correspond to biological reality? How to simulate models on a computer? Is it possible to verify models by experiments? How to restore networks from raw data without knowing the structure or parameters?

Number of incoming/outgoing edges: most genes have only a few incoming or outgoing edges, but some have high numbers (>500). [Plot axes: number of outgoing edges vs. count.]
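The degree statistics behind such a plot are straightforward to compute from an arc list. A minimal sketch, assuming arcs are given as (source, target) pairs; the function names are illustrative.

```python
from collections import Counter


def degrees(arcs):
    """Return (indegree, outdegree) counters for a list of (src, dst) arcs."""
    out_degree = Counter(src for src, _ in arcs)
    in_degree = Counter(dst for _, dst in arcs)
    return in_degree, out_degree


def degree_histogram(degree):
    """Map each degree value to the number of genes that have it."""
    return Counter(degree.values())


def hubs(degree, minimum):
    """Genes whose degree is at least the given minimum, highest first."""
    return [gene for gene, d in degree.most_common() if d >= minimum]
```

`degree_histogram(out_degree)` gives the counts plotted against the number of outgoing edges, and `hubs(out_degree, 500)` would list the few genes with very high outdegree.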

[Scatter plot: rank of outdegree vs. rank of indegree. Labelled genes: ARG5,6 (108,28), SST2 (60,25), TEC1, HPT1, GCN4, ERG3 (164,15), GAS1, FUS3, ERG28, QCR2, YER083C, GLN3, SPF1, MRT4, CLB2, YHL029C.]

High outdegree: regulation. High indegree: metabolism.

Network modularity. Is there one big dominant connected component and possibly a number of small components, or several components of comparable sizes? Can the network be broken down into several components of comparable size by removing nodes of high degree (i.e., nodes with many incoming or outgoing edges)?
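Both questions can be explored by counting connected components before and after removing the highest-degree nodes. The sketch below treats arcs as undirected when forming components and ranks nodes by total degree (incoming plus outgoing); both choices are assumptions, since the slides do not specify them.

```python
from collections import defaultdict


def components(nodes, arcs):
    """Connected components (ignoring arc direction), largest first."""
    adjacency = defaultdict(set)
    for a, b in arcs:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(component)
    return sorted(result, key=len, reverse=True)


def remove_hubs(nodes, arcs, fraction):
    """Drop the given fraction of nodes with the highest total degree."""
    degree = defaultdict(int)
    for a, b in arcs:
        degree[a] += 1
        degree[b] += 1
    n_remove = int(len(nodes) * fraction)
    ranked = sorted(nodes, key=lambda n: degree[n], reverse=True)
    return set(nodes) - set(ranked[:n_remove])
```

Comparing `components(nodes, arcs)` with `components(remove_hubs(nodes, arcs, 0.05), arcs)` shows whether the largest component survives the removal of the top 5% of nodes or falls apart into pieces of comparable size.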

Network modularity: number of connected components in the networks

[Table residue, values not recovered. Columns: full network, 1% removed, 5% removed, 10% removed. Rows: largest, second, total. Threshold shown: 2.0.]

Modularity: other opinions. Wagner, Genome Research 2002: there exist many independent modules. Featherstone and Broadie, Bioessays: there is only one giant module. It all depends on the definition of a module.

Gene disruption network for Saccharomyces cerevisiae

a closer look

Mating subnetwork: this subnetwork is the result of filtering the full network at threshold = 4.0 for the core set marked in red and their next neighbours (red arcs: downregulation, green arcs: upregulation).

Mating subnetwork: this subnetwork is the result of filtering the full network at threshold = 2.0 for the core set marked in red and their next neighbours (red arcs: downregulation, green arcs: upregulation).
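The filtering step described in these captions can be sketched as below. The sketch assumes that each arc carries a signed weight, that the threshold applies to its absolute value, and that the sign encodes the direction of regulation; none of this is spelled out in the slides, and the names are illustrative.

```python
def filter_subnetwork(arcs, core, threshold):
    """Keep strong arcs that touch the core set.

    arcs      -- iterable of (source, target, weight) tuples
    core      -- set of core genes (e.g. a list of mating genes)
    threshold -- minimum absolute weight for an arc to be kept

    The result contains the core genes and their next neighbours.
    """
    return [
        (source, target, weight)
        for source, target, weight in arcs
        if abs(weight) >= threshold and (source in core or target in core)
    ]


def arc_colour(weight):
    """Colour convention from the captions: red down, green up."""
    return "green" if weight > 0 else "red"


def nodes_of(arcs):
    """Genes present in the filtered subnetwork."""
    return {s for s, _, _ in arcs} | {t for _, t, _ in arcs}
```

Lowering the threshold from 4.0 to 2.0 admits weaker arcs, so the subnetwork around the same core set becomes larger and denser.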

Conclusion: more information than in randomised networks; no optimal power-law distribution of arcs; no obvious modules; local networks make sense.

A gene network(?) [Figure labels: b1, b2, b3, F1, F2, r1, r2.]

Of transcription factors

Of transcription factors and KOs

Effectual set and regulation set. [Figure: within the set of all genes, a transcription factor t with the regulation set of t, and a disrupted gene h with the effectual set of h.]

How to estimate whether the overlap is larger than expected at random? Let G be the set of all genes, R the regulation set and E the effectual set. We assume that the elements of E are marked, and pick a set of size |R| at random. Then the size x = |R ∩ E| of the intersection is distributed according to the hypergeometric distribution. The probability of observing an intersection of size k or larger can be computed from the tail of this distribution.
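The formula from the slide is not present in this transcript. Under the setup just described (|E| marked genes among |G|, a random draw of size |R|), the standard hypergeometric tail probability would read as follows; it is given here as a reconstruction from the textbook definition, not as a copy of the original slide.

```latex
P(x \ge k) \;=\; \sum_{i=k}^{\min(|R|,\,|E|)}
  \frac{\binom{|E|}{i}\,\binom{|G|-|E|}{|R|-i}}{\binom{|G|}{|R|}}
```

Each term counts the ways to draw exactly i marked genes and |R| - i unmarked ones, divided by the number of ways to draw any |R| genes from G.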

Data. Disrupted genes: 263 disrupted genes, excluding drug treatments and haploid states (Hughes et al.). Transcription factor binding sites: 356 binding sites, of which 37 are experimentally proven (Pilpel et al., 2001).

Disrupted TF. Only 5 transcription factors from our set (of known binding sites) were disrupted in the experiments: mbp1, yap1, yaf1, swi5, gcn4. For three of them (mbp1, yap1, gcn4) the regulation and effectual sets were highly correlated. yaf1 is activated by oleate, while in an oleate-free environment disruption of Yaf1 (alias OAF1) does not have a significant effect. swi5 affects only the haploid state, while we use only the diploid state.

Effectual sets correlating with other TF binding sites. Of the 37 experimentally proven binding sites, 20 correlate with one or more effectual sets. If the effectual set of a disrupted gene correlates with the regulation set of a different gene, the correlation should be explained.

Possible explanations why disruption of gene A may correlate with regulation set of a different gene (TF) T: T belongs to the disruption set of A (cascade)

Gene regulation cascade

Possible explanations why disruption of gene A may correlate with the regulation set of a different gene (TF) T: T belongs to the disruption set of A (cascade); T is regulated by A (transcription or translation) or by a gene on the cascade of A; T is modified (e.g., phosphorylated) by A or by a cascade of A; T and A belong to the same protein complex; A and T are functionally related.

Binding site/disruption correlation summary

Conclusion: most of the binding site/disruption set correlations can be explained via regulation cascades and protein complexes (K. Palin et al., to appear in ECCB 2002, special issue of Bioinformatics).

Pharmacogenetics = new opportunities. Same symptoms, same drug, response variation… SNP example: ACCTG A CGTGG vs. ACCTG T CGTGG. SNP = single nucleotide polymorphism; 0.1% = 3 million.

SNPs make us unique (~0.1%). Goal: associate SNPs with diseases. [Figure: aligned sequences with variant positions.]

Genotyping: select a few and measure. Goal: associate SNPs with diseases, i.e. identify areas of interest.

From disease genes to drug targets. Association analysis: identifies many, if not all, contributing genes; links genes to disease pathways for optimal target selection.

EGV: process of data collection and handling. [Diagram labels: Internet; GP; DNA, plasma, storage; LIMS; informed consent; personal data; unique code; genotypes; SNPs; medical information; Data + Analysis = Value.]

Bioinformatics: where does IT stand? Data modelling, storage, access. Inference from data. Hypothesis generation and testing. Allowing novel types of questions to be asked by providing analysis methods that are able to cope with all the information that is available today.

Compute Infrastructure

Bioinformatics: challenges. Knowledge representation, data semantics. Data size and its speed of growth. New and emerging data collection technologies. Integration of different data types. Discovery of useful knowledge. Modeling living systems as a whole. Improved health care products. Medical informatics: bringing the knowledge to the doctor's bench.

References for this talk
Jaak Vilo, Misha Kapushesky, Patrick Kemmeren, Ugis Sarkans, Alvis Brazma. Expression Profiler. In Parmigiani, G., Garrett, E.S., Irizarry, R. and Zeger, S.L. (eds), The Analysis of Gene Expression Data: Methods and Software, Springer Verlag, New York, NY.
Patrick Kemmeren, Nynke L. van Berkum, Jaak Vilo, Theo Bijma, Rogier Donders, Alvis Brazma, and Frank C.P. Holstege. Protein Interaction Verification and Functional Annotation by Integrated Analysis of Genome-Scale Data. Molecular Cell 2002, May 24; 9(5), pp. 1133–1143.
Johan Rung, Thomas Schlitt, Alvis Brazma, Karlis Freivalds, Jaak Vilo. Building and analysing genome-wide gene disruption networks. Bioinformatics 2002 Oct; 18 Suppl 2. European Conference on Computational Biology (ECCB 2002).
Kimmo Palin, Esko Ukkonen, Alvis Brazma, Jaak Vilo. Correlating gene promoters and expression in gene disruption experiments. Bioinformatics 2002 Oct; 18 Suppl 2. European Conference on Computational Biology (ECCB 2002).

Acknowledgements: Alvis Brazma; Patrick Kemmeren (EBI, UMC Utrecht); Frank Holstege (UMC Utrecht); Thomas Schlitt, Johan Rung (EBI); Kimmo Palin, Esko Ukkonen (U. Helsinki); and the rest of the EBI microarray team.