1 Biostatistics and Statistical Bioinformatics Setia Pramana Universitas Brawijaya Malang, 7 October 2011.

1 Biostatistics and Statistical Bioinformatics Setia Pramana Universitas Brawijaya Malang, 7 October 2011


3 Who Needs Statisticians? Can a statistician only become a lecturer/teacher? NO. There are many more applied fields. My classmates work in: –Information and Communication Technology –Research and Development –Government and state institutions: Ministry of Finance, PLN, Bank Indonesia, Danareksa, etc. –Entrepreneurship –Many more... even writing. Read the book: 9 Summers 10 Autumns



6 Biostatistics The study of statistics as applied to biological areas such as biological laboratory experiments, medical research (including clinical research), and public health services research. "Biostatistics, far from being an unrelated mathematical science, is a discipline essential to modern medicine – a pillar in its edifice" (Journal of the American Medical Association, 1966).

7 Biostatistics Public Health: –Epidemiology –Modeling Infectious Diseases: HIV, HCV –Disease Mapping –Genetics: family-related diseases Bioinformatics: –Image Processing –Data Mining –Pattern Recognition –etc.

8 Biostatistics Agriculture: –Experimental Design –Genetics Biomedical Research. Evidence-based medicine. Clinical studies. Drug Development.

9 Statistical Methods? t-test, ANOVA, regression, cluster analysis, discriminant analysis, non-linear modeling, multiple comparison, linear mixed models, Bayesian methods, etc.


11 Drug Development Takes 10-15 years. Costs are often estimated at more than 1 billion USD. The aim is to ensure that only drugs that are both safe and effective can be marketed. Stages: - Drug Discovery - Pre-clinical Development - Clinical Development -> 4 Phases. Statisticians are involved in all stages (a must).

12 Stages of drug development:
–Pharmaceutical development: discovery of compound; synthesis and purification of drug substance; manufacturing procedures
–Pre-clinical (animal) studies: pharmacological profile; acute toxicity; effects of long-term usage
–Investigational New Drug application
–Phase I clinical trials: small; focus on safety
–Phase II clinical trials: medium size; focus on safety and short-term efficacy
–Phase III clinical trials: large and comparative; focus on efficacy and cost benefits
–New Drug Application
–Phase IV clinical trials: real world experience; demonstrate cost benefits; rare adverse reactions

13 International Conference on Harmonization (ICH) The international harmonization of requirements for drug research and development, so that information generated in one country or area is acceptable to other countries or areas. Regions: Europe, USA, Japan. All clinical trials must follow ICH regulations. Statistics plays an important role: Statistical Principles for Clinical Trials (ICH E9).

14 Preclinical and Clinical Development Statisticians are involved from the beginning of the study. Planning the study: –Formulating the hypothesis –Choosing the endpoint –Choosing the design and sample size. Conduct of the study: –Patient accrual –Data collection. Data quality control and data analysis. Publication of results.
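As an illustration of the "choosing the design and sample size" step, the following is a minimal sketch of the textbook normal-approximation formula for comparing two group means, n = 2 (z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2 per group. The function name and the example effect size and standard deviation are assumptions chosen for illustration; they do not come from the slides.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided comparison of two means.

    Uses the normal approximation:
        n = 2 * (z_{1 - alpha/2} + z_{power})^2 * sigma^2 / delta^2
    delta: smallest difference in means worth detecting (hypothetical input)
    sigma: common standard deviation of the endpoint (hypothetical input)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)  # round up to whole patients


# Illustrative values only: detect a difference of half a standard deviation.
n_per_group = sample_size_per_group(delta=0.5, sigma=1.0)
print(n_per_group)
```

Halving the detectable difference roughly quadruples the required sample size, which is one reason the endpoint and the design are fixed before patient accrual starts.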


16 Bioinformatics Bioinformatics is a science straddling the domains of biomedicine, informatics, mathematics and statistics. It applies computational techniques to biological data: functional genomics, proteomics, sequence analysis, phylogenetics, etc.

17 Informatics in Bioinformatics Databases: –Building, Querying –Object DB. Text String Comparison: –Text Search. Finding Patterns: –AI / Machine Learning –Clustering –Data mining, etc.

18 Central Dogma of Molecular Biology Genes contain construction information. All structure and function is made up by proteins.

19 Genomics Premise: Physiological changes -> Gene expression changes -> mRNA abundance level changes. Objective: Use gene expression levels measured via DNA microarrays to identify a set of genes that are differentially expressed across two sets of samples (e.g., in diseased cells compared to normal cells).

20 Microarray Technology DNA microarrays are a new and promising biotechnology that allows the expression of thousands of genes to be monitored simultaneously.

21 Gene Expression Analysis Overview of the process of generating high-throughput gene expression data using microarrays.

22 Preprocessed data

Genes   C1    C2    C3    T1    T2    T3
G8521   6.89  7.18  6.60  7.40  7.15  7.40
G8522   6.78  6.55  6.37  6.89  6.78  6.92
G8523   6.52  6.61  6.72  6.51  6.59  6.46
G8524   5.67  5.69  5.88  7.43  7.16  7.31
G8525   5.64  5.91  5.61  7.41  7.49  7.41
G8526   4.63  4.85  5.72  5.71  5.47  5.79
G8527   8.28  7.88  7.84  8.12  7.99  7.97
G8528   7.81  7.58  7.24  7.79  7.38  8.60
G8529   4.26  4.20  4.82  3.11  4.94  3.08
G8530   7.36  7.45  7.31  7.46  7.53  7.35
G8531   5.30  5.36  5.70  5.41  5.73  5.77
G8532   5.84  5.48  5.93  5.84  5.73  5.75
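To make the idea of differential expression concrete, here is a minimal sketch that computes a per-gene two-sample t statistic on the preprocessed values from slide 22. It assumes that C1-C3 are control arrays and T1-T3 are treated arrays, which the slide suggests but does not state, and it uses the Welch form of the statistic as one reasonable choice rather than the method used in the talk.

```python
from math import sqrt
from statistics import mean, variance

# Expression values from slide 22: (control arrays C1-C3, treated arrays T1-T3).
expression = {
    "G8521": ([6.89, 7.18, 6.60], [7.40, 7.15, 7.40]),
    "G8522": ([6.78, 6.55, 6.37], [6.89, 6.78, 6.92]),
    "G8523": ([6.52, 6.61, 6.72], [6.51, 6.59, 6.46]),
    "G8524": ([5.67, 5.69, 5.88], [7.43, 7.16, 7.31]),
    "G8525": ([5.64, 5.91, 5.61], [7.41, 7.49, 7.41]),
    "G8526": ([4.63, 4.85, 5.72], [5.71, 5.47, 5.79]),
    "G8527": ([8.28, 7.88, 7.84], [8.12, 7.99, 7.97]),
    "G8528": ([7.81, 7.58, 7.24], [7.79, 7.38, 8.60]),
    "G8529": ([4.26, 4.20, 4.82], [3.11, 4.94, 3.08]),
    "G8530": ([7.36, 7.45, 7.31], [7.46, 7.53, 7.35]),
    "G8531": ([5.30, 5.36, 5.70], [5.41, 5.73, 5.77]),
    "G8532": ([5.84, 5.48, 5.93], [5.84, 5.73, 5.75]),
}


def welch_t(control, treated):
    """Welch two-sample t statistic (treated mean minus control mean)."""
    se = sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se


t_stats = {gene: welch_t(c, t) for gene, (c, t) in expression.items()}

# Rank genes by the absolute size of the statistic, largest first.
ranked = sorted(t_stats.items(), key=lambda item: abs(item[1]), reverse=True)
for gene, t in ranked[:3]:
    print(f"{gene}: t = {t:.2f}")
```

With only three arrays per group the variance estimates are unstable, which is the motivation for the modified statistics listed under gene selection.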

23 Applications Drugs with high efficacy and few or no side effects. Personalized medicine. Genes related to disease. Biological discovery: –new and better molecular diagnostics –new molecular targets for therapy –finding and refining biological pathways. Molecular diagnosis of leukemia and breast cancer. Appropriate treatment for a genetic signature. Potential new drug targets.

24 Challenges Mega data, difficult to visualize. Too few columns (samples), usually < 100. Too many rows (genes), usually > 1,000. Testing so many genes on so few samples is likely to lead to false positives. For exploration, a large set of all relevant genes is desired. For diagnostics or identification of therapeutic targets, the smallest set of genes is needed. The model needs to be explainable to biologists.
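The false-positive problem can be made concrete: testing 1,000 genes at a 0.05 level would be expected to flag about 50 genes even if none were truly differentially expressed. One common remedy, sketched below, is the Benjamini-Hochberg adjustment, which controls the false discovery rate. The slides do not name a specific correction, so this choice and the p-values used are assumptions for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (illustrative only, not from the slides).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj_p = benjamini_hochberg(raw_p)

# Genes kept at a 5% false discovery rate.
selected = [i for i, q in enumerate(adj_p) if q <= 0.05]
print(adj_p, selected)
```

In this toy example two genes with raw p-values just below 0.05 are no longer selected after adjustment, which is the intended effect when many tests are run at once.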

25 Microarray Data Analysis Types Gene Selection: –find genes for therapeutic targets. Classification (Supervised): –identify disease (biomarker study) –predict outcome / select best treatment. Clustering (Unsupervised): –find new biological classes / refine existing ones –understand regulatory relationships/pathways –exploration

26 Gene Selection Modified t-test, Significance Analysis of Microarrays (SAM), limma (linear models for microarray data), random forest, lasso (least absolute shrinkage and selection operator), linear mixed model, elastic net, etc.
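The idea behind a modified t-test can be sketched as follows: a small positive constant s0 is added to the denominator so that genes with very small estimated variance do not dominate the ranking. The statistic below has the SAM-style form d = (mean_T - mean_C) / (s + s0). Setting s0 to the median standard error is an assumption made here for simplicity; the published SAM procedure picks s0 with its own data-driven criterion. The four genes are taken from slide 22.

```python
from math import sqrt
from statistics import mean, median

# Four genes from slide 22: (control arrays, treated arrays).
genes = {
    "G8524": ([5.67, 5.69, 5.88], [7.43, 7.16, 7.31]),
    "G8525": ([5.64, 5.91, 5.61], [7.41, 7.49, 7.41]),
    "G8527": ([8.28, 7.88, 7.84], [8.12, 7.99, 7.97]),
    "G8529": ([4.26, 4.20, 4.82], [3.11, 4.94, 3.08]),
}


def pooled_se(a, b):
    """Pooled standard error of the difference in means of two groups."""
    ss = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
    return sqrt((1 / len(a) + 1 / len(b)) * ss / (len(a) + len(b) - 2))


se = {g: pooled_se(c, t) for g, (c, t) in genes.items()}
s0 = median(se.values())  # assumed fudge constant, not the SAM selection rule

ordinary_t = {g: (mean(t) - mean(c)) / se[g] for g, (c, t) in genes.items()}
moderated_d = {g: (mean(t) - mean(c)) / (se[g] + s0) for g, (c, t) in genes.items()}

for g in genes:
    print(f"{g}: t = {ordinary_t[g]:.2f}, d = {moderated_d[g]:.2f}")
```

Because s0 is positive, every moderated value is smaller in magnitude than the ordinary t statistic, and the shrinkage is strongest for genes whose standard error is small.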

27 Visualization Dimensionality reduction: PCA (Principal Component Analysis), biplot, multidimensional scaling, etc.
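As a rough illustration of dimensionality reduction, the sketch below finds the first principal component of six genes from slide 22 using power iteration on the covariance matrix, then projects each array onto it. Treating arrays as observations and genes as variables is an assumption, and power iteration is used only to keep the example free of external libraries; in practice a statistics package would be used.

```python
from math import sqrt

samples = ["C1", "C2", "C3", "T1", "T2", "T3"]

# Six genes from slide 22, values in the order of `samples`.
genes = {
    "G8521": [6.89, 7.18, 6.60, 7.40, 7.15, 7.40],
    "G8524": [5.67, 5.69, 5.88, 7.43, 7.16, 7.31],
    "G8525": [5.64, 5.91, 5.61, 7.41, 7.49, 7.41],
    "G8526": [4.63, 4.85, 5.72, 5.71, 5.47, 5.79],
    "G8527": [8.28, 7.88, 7.84, 8.12, 7.99, 7.97],
    "G8529": [4.26, 4.20, 4.82, 3.11, 4.94, 3.08],
}
names = list(genes)
n, p = len(samples), len(names)

# Center each gene, then arrange the data as samples x genes.
centered = {g: [x - sum(v) / n for x in v] for g, v in genes.items()}
X = [[centered[g][s] for g in names] for s in range(n)]

# Sample covariance matrix between genes (p x p).
cov = [[sum(X[s][i] * X[s][j] for s in range(n)) / (n - 1) for j in range(p)]
       for i in range(p)]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def norm(v):
    return sqrt(sum(x * x for x in v))


# Power iteration: repeated multiplication converges to the leading eigenvector.
v = [1 / sqrt(p)] * p
for _ in range(500):
    w = matvec(cov, v)
    nw = norm(w)
    v = [x / nw for x in w]

eigenvalue = sum(a * b for a, b in zip(v, matvec(cov, v)))
explained = eigenvalue / sum(cov[i][i] for i in range(p))

# Score of each array on the first principal component.
scores = {s: sum(X[k][j] * v[j] for j in range(p)) for k, s in enumerate(samples)}
print(f"variance explained by PC1: {explained:.2f}")
print(scores)
```

Plotting the scores, or checking whether the C and T arrays fall on opposite sides of zero, gives a quick view of whether the main source of variation matches the experimental groups.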

28 Clustering Cluster the genes; cluster the arrays/conditions; cluster both simultaneously. Methods: K-means, hierarchical clustering, biclustering algorithms.

29 Clustering Cluster or classify genes according to tumors. Cluster tumors according to genes.
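Clustering genes can be sketched with a small K-means example on six gene profiles from slide 22. Each profile is centered first so that genes are grouped by their pattern across arrays rather than by their overall expression level. The deterministic farthest-point initialization and the choice of k = 2 are assumptions made to keep the example reproducible.

```python
from math import dist  # Python 3.8+

# Six gene profiles from slide 22 (arrays C1, C2, C3, T1, T2, T3).
profiles = {
    "G8521": [6.89, 7.18, 6.60, 7.40, 7.15, 7.40],
    "G8523": [6.52, 6.61, 6.72, 6.51, 6.59, 6.46],
    "G8524": [5.67, 5.69, 5.88, 7.43, 7.16, 7.31],
    "G8525": [5.64, 5.91, 5.61, 7.41, 7.49, 7.41],
    "G8527": [8.28, 7.88, 7.84, 8.12, 7.99, 7.97],
    "G8530": [7.36, 7.45, 7.31, 7.46, 7.53, 7.35],
}


def center(values):
    """Subtract the gene's own mean so only the pattern across arrays remains."""
    m = sum(values) / len(values)
    return [x - m for x in values]


points = {g: center(v) for g, v in profiles.items()}


def kmeans(points, k=2, iterations=20):
    names = list(points)
    # Deterministic start: first profile, then the profile farthest from chosen centroids.
    centroids = [points[names[0]]]
    while len(centroids) < k:
        far = max(names, key=lambda g: min(dist(points[g], c) for c in centroids))
        centroids.append(points[far])
    labels = {}
    for _ in range(iterations):
        # Assignment step: each gene goes to its nearest centroid.
        labels = {g: min(range(k), key=lambda j: dist(points[g], centroids[j]))
                  for g in names}
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [points[g] for g in names if labels[g] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = kmeans(points, k=2)
print(labels)
```

The same function could be applied to the arrays instead of the genes by transposing the data, which corresponds to clustering the conditions.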

30 Biclustering A biclustering method is an unsupervised learning method that looks for sub-matrices of a data matrix whose elements are highly similar. Algorithms: statistics-based, AI, machine learning. BiclustGUI: A User-Friendly Interface for Biclustering Analysis.
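One way to quantify how similar the elements of a candidate sub-matrix are is the mean squared residue used in the Cheng and Church biclustering approach: a score of zero means every row follows the same additive pattern across the chosen columns. The slides do not say which score or algorithm is meant, so the sketch below is only one possible instance, and the toy matrix is hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the sub-matrix given by row and column indices.

    residue_ij = a_ij - rowmean_i - colmean_j + overall_mean
    A score near zero indicates a coherent (additive) bicluster.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [sum(r) / len(r) for r in sub]
    col_means = [sum(c) / len(c) for c in zip(*sub)]
    overall = sum(row_means) / len(row_means)
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(len(rows))
        for j in range(len(cols))
    )
    return total / (len(rows) * len(cols))


# Hypothetical 4 x 4 data matrix: rows 0-2 and columns 0-2 share an additive
# pattern, while the last row and the last column do not.
toy = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.5],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 1.0, 5.0, 2.0],
]

coherent = mean_squared_residue(toy, rows=[0, 1, 2], cols=[0, 1, 2])
whole = mean_squared_residue(toy, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(coherent, whole)
```

A biclustering algorithm of this kind searches over row and column subsets for large sub-matrices whose score stays below a chosen threshold.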

31 Bicluster Structure

32 Software/Statistical Packages Minitab, SAS, SPSS, R, S-Plus, Matlab, Stata

33 R is growing, especially in bioinformatics: –Statistics, data analysis, machine learning –Free –High quality –Open source –Extendable (you can submit and publish your own package!) –Can be integrated with other languages (C/C++, Java, Python) –Large, active user community –Command-based (-)

34 Summary Statisticians can flexibly get involved in many fields: statistics provides the tools, and the applications range widely. Biostatisticians have many opportunities in public health services (e.g., the Centers for Disease Control and Prevention, CDC), pharmaceutical companies, research institutions, etc. Statistical bioinformatics: cutting-edge technology -> methods are growing -> many more developments in the future.

35 Thank you for your attention...
