First of all: “Darnit Jim, I’m a doctor not a bioinformatician!”

Researcher interested in gene expression
I have obtained raw RNAseq files (FASTQ) for a set of cell lines. How can I process this data and examine my gene(s) of interest?
– Do it yourself using TraIT tools: run the available NGS workflow in Galaxy
– Ask a bioinformatician

First-time experience of Galaxy

Looks like RNA expression analysis… But I have something called a FASTQ file. I don’t know about this format; where do I get such a reference?

Looks like RNA expression analysis… How do I know that the settings here are correct for my type of data? And many more options…

Instead of a BAM file I have a FASTQ file. How do I process this?
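
For orientation, a FASTQ file is plain text with four lines per read: an identifier starting with "@", the base sequence, a separator starting with "+", and one quality character per base. The sketch below is only an illustration of that layout, not part of the Galaxy workflow. It assumes unwrapped four-line records and the common Phred+33 quality encoding; the function name `parse_fastq` and the example read are made up.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, phred_scores) for each four-line FASTQ record.

    Assumes unwrapped records and Phred+33 quality encoding.
    """
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("expected four lines per read")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Phred+33: quality score = ASCII code of the character minus 33
        scores = [ord(ch) - 33 for ch in qual]
        yield header[1:], seq, scores


# Made-up single read, for illustration only
example = ["@read1\n", "GATTACA\n", "+\n", "IIIIII#\n"]
records = list(parse_fastq(example))
for read_id, seq, scores in records:
    # 'I' encodes Phred 40 (high confidence), '#' encodes Phred 2 (low)
    print(read_id, seq, scores)
```

A BAM file, by contrast, holds reads that have already been aligned to a reference genome, which is why a FASTQ file needs the alignment step first.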

Solution: readily available workflow. Other pipelines are in progress.

Gene expression: input parameters
Ideally, metadata on these parameters was provided by the original data owners and/or can be traced back (own data → known; from another person → trace back).

Trial run
For 4 colorectal cancer cell lines the FASTQ files were provided. The data owner could provide:
– platform
– adapter sequences
– library type
Wanted to compare these to the processed RNAseq data of prostate cell lines (the same experimental platform was used). Ran the workflow and obtained read counts/measure of expression for the new cell lines.

Comparison: colon and prostate
It is possible for a non- or little-informed user to run the Galaxy workflow and obtain results in a format that can be used in downstream analysis.

Further analysis…
Usually, the comparison is tumour sample vs normal sample.
– edgeR is available to perform this comparison.
Comparison of expression between groups is possible (e.g. colorectal cell lines vs prostate cell lines). However, when I have only cell lines:
– how do I answer the question: “does my gene of interest show altered expression in a particular sample compared to a reference sample?”
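
To make concrete what such a comparison measures, here is a minimal sketch of the underlying quantity: read counts scaled to counts per million (CPM), then a log2 fold change for one gene between two samples. This is not edgeR's method, which adds its own normalisation and a statistical model with replicates. The counts, the pseudocount of 1 and the name `GENE_X` are invented for illustration.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(cpm_sample, cpm_reference, gene, pseudocount=1.0):
    """log2 ratio of one gene between two samples; the pseudocount avoids log(0)."""
    return math.log2((cpm_sample[gene] + pseudocount)
                     / (cpm_reference[gene] + pseudocount))


# Toy read counts (hypothetical values, not from the trial run)
colorectal = {"AURKA": 300, "GENE_X": 700}
prostate = {"AURKA": 100, "GENE_X": 900}

cpm_colorectal = counts_per_million(colorectal)
cpm_prostate = counts_per_million(prostate)
lfc = log2_fold_change(cpm_colorectal, cpm_prostate, "AURKA")
print(f"AURKA log2 fold change: {lfc:.2f}")
```

A positive value means higher expression in the first sample. A single ratio like this says nothing about significance, which is what the open question about a reference sample comes down to.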

Issues
When not in possession of a normal/reference in the dataset (T only, cell lines), how to determine altered expression of a gene of interest?
– Use a general normal reference that needs to be provided for comparison? (standard cut-off for increased or decreased expression: xxx reads = increased exp?)
– Calculate a median expression for all genes of the platform and then compare the expression of one gene to the median expression of all genes (significant outliers?)
– Distinguish expression of a gene in diploid vs aneuploid cells → trouble: in most cases no ploidy status is known
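
The median option could look roughly like the sketch below: log-transform the expression values of one sample, take the median over all genes, and express each gene's distance from it in units of the median absolute deviation (the "modified z-score"). This is one possible reading of the idea, not an established TraIT procedure. The expression values, the `GENE_*` names and the 3.5 cut-off (a common convention for modified z-scores) are assumptions.

```python
import math
from statistics import median


def robust_z_scores(expression):
    """Modified z-score per gene: 0.6745 * (x - median) / MAD on log2(value + 1)."""
    logged = {gene: math.log2(value + 1) for gene, value in expression.items()}
    med = median(logged.values())
    mad = median(abs(x - med) for x in logged.values())
    if mad == 0:
        raise ValueError("MAD is zero; scores are undefined for this sample")
    return {gene: 0.6745 * (x - med) / mad for gene, x in logged.items()}


# Toy expression values for one cell line (hypothetical)
sample = {"GENE_A": 7, "GENE_B": 15, "GENE_C": 31, "GENE_D": 15, "AURKA": 1023}

scores = robust_z_scores(sample)
CUTOFF = 3.5  # assumed threshold, not taken from the slides
flagged = [gene for gene, z in scores.items() if abs(z) > CUTOFF]
print(flagged)
```

Note the limitation: this flags genes that are unusually high or low relative to the other genes in the same sample. It does not show that a gene is altered relative to normal tissue.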

Issues
When investigating data in the data-integration platform, a query for the gene AURKA will give certain results. If one study had T/N and the other only T, and different methods for determining altered expression were applied, can this data be compared?
– Pro: it’s processed and called data you’re comparing in this platform; trust the called data
– Con: I don’t think it’s fair to compare differently called data. If comparing such datasets, start from the beginning and treat them in the same manner → convert the T/N-analysed data to T-only or cell-line-only analysis