Presentation is loading. Please wait.

Presentation is loading. Please wait.

Integrative data mining and visualization of genome-wide SNP profiles in childhood acute lymphoblastic leukaemia. Ahmad Aloqaily Faculty of IT University.

Similar presentations


Presentation on theme: "Integrative data mining and visualization of genome-wide SNP profiles in childhood acute lymphoblastic leukaemia. Ahmad Aloqaily Faculty of IT University."— Presentation transcript:

1 Integrative data mining and visualization of genome-wide SNP profiles in childhood acute lymphoblastic leukaemia. Ahmad Aloqaily Faculty of IT University of Technology, Sydney

2 2 Introduction Data: Biomedical data Case study: Acute Lymphoblastic Leukaemia (ALL) ALL is a heterogenous disease More than clinical data required to find the best treatment protocol.

3 3 Data Clinical data Patient outcome data Gene expression profile Domain ontology data Proteomics data Single Nucleotide Polymorphism (SNP)

4 4 Data Clinical data, such as Age, sex, height, weight, white blood cell count, risk category (whether the child died or not) etc…. Gene expression data There are around 10,000 – 20,000 attributes Several for each gene on the microarray Real-valued, log values of the ratio of the red dye to green dye.

5 5 SNP data Most of human genetic variations exist in the form of polymorphisms SNP are the simplest but most abundant type of genetics variations Some of these polymorphisms that occur within coding region produce an amino acid change Such SNPs are known to affect the functional efficiency of genes

6 6 Single Nucleotide Polymorphisms Chromosome 1: TGCATATGCAAGTAACCGTAAACC Chromosome 2: TGCATATGCAACTAACCGTAAACC Chromosome 1: TGCATATGCAAGTAACCGTATACC Chromosome 2: TGCATATGCAAGTAACCGTATACC Chromosome 1: TGCATATGCAACTAACCGTAAACC Chromosome 2: TGCATATGCAACTAACCGTATACC SNP1SNP2SNP3…….. Individual 1 BothAllele1 …… Individual 2 Allele1Allele2Both….. Individual 3 Allele2BothAllele2…….. Individual 1 Individual 2 Individual 3

7 7 Hypothesis We hypothesize that the genetic background of childhood ALL patients, as assessed by genome-wide SNP profiles, will be informative of a patient response to therapy and eventual clinical outcome.

8 8 Data mining and knowledge discovery The aims of the project are: i. Construct a model based on SNP data. How to deal with high-dimensionality problem induced by this data? ii. Integrate SNP data with other datasets in order to have a better understanding of the problem iii. Patient-to-patient comparison based on genome- wide SNP data and integrated dataset iv. Generation of knowledge – try to identify the genetic markers which correlate with poor patient response to therapy

9 9 Data mining approach Pattern recognition problems in system biology are characterized by high dimensionality and noisy data, limited sample size, etc. Affected by the curse of dimensionality. Focus on clustering and visualization of patients in the space of low-dimensional projection of the original data. Find a low dimensional (3-D) projection of the integrated datasets so that distance between points in the projection is similar to the distance in the kernel-induced feature space. Using kernel-based methods such as KPCA and Laplacian eigenmaps. Kernel methods can deal with high-dimensional data.

10 10 Approach

11 11 Acknowledgment The Australian Rotary Health Research Fund Oncology Children’s Foundation University Technology, Sydney Paul Kennedy Simeon Simoff The Children's Hospital at Westmead Daniel Catchpoole

12 12 Thank you Questions ? Ahmad Aloqaily Email : aaoqaily@it.uts.edu.au Faculty of Information Technology University of Technology, Sydneyaaoqaily@it.uts.edu.au


Download ppt "Integrative data mining and visualization of genome-wide SNP profiles in childhood acute lymphoblastic leukaemia. Ahmad Aloqaily Faculty of IT University."

Similar presentations


Ads by Google