First of all: “Darnit Jim, I’m a doctor not a bioinformatician!”


1 First of all: “Darnit Jim, I’m a doctor not a bioinformatician!”

2 Researcher interested in gene expression
I have obtained raw RNAseq files (FASTQ) for a set of cell lines. How can I process this data and examine my gene(s) of interest?
– Do it yourself using TraIT tools: run the available NGS workflow in Galaxy
– Ask a bioinformatician

3 First time experience of Galaxy

4 Looks like RNA expression analysis… But I have something called a FASTQ file. I don’t know about this format; where do I get such a reference?

5 Looks like RNA expression analysis… How do I know that the settings here are correct for my type of data? And many more options…

6 Instead of a BAM file I have a FASTQ file. How do I process this?
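For readers who have not met the format before, the following is a minimal sketch of what a FASTQ file contains, assuming the common layout of four lines per read (an `@` header, the sequence, a `+` separator, and a quality string) and Phred+33 quality encoding. The names `FastqRecord`, `read_fastq` and `mean_phred` are hypothetical helpers for illustration; this is not part of the Galaxy workflow, which handles FASTQ input itself.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per read."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip("\r\n")
        if not header:          # tolerate stray blank lines between records
            continue
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean base quality, assuming Phred+33 encoding (an assumption)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Made-up single read, for illustration only.
example = io.StringIO("@read1\nACGT\n+\nIIII\n")
for record in read_fastq(example):
    print(record.read_id, record.sequence, mean_phred(record.quality))
```

A BAM file, by contrast, holds reads that have already been aligned to a reference, which is why a FASTQ file needs the alignment step of the workflow first.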

7 Solution: readily available workflow. Other pipelines are in progress.

8 Gene expression: input parameters Ideally, metadata on these parameters was provided by the original data owners and/or can be traced back (own data → known; from another person → trace back).

9 Trial run For 4 colorectal cancer cell lines, the FASTQ files were provided. The data owner could provide: platform, adapter sequences, library type. The aim was to compare these with the processed RNAseq data of prostate cell lines (the same experimental platform was used). Running the workflow yielded read counts (a measure of expression) for the new cell lines.

10 Comparison: colon and prostate It is possible for a user with little or no background knowledge to run the Galaxy workflow and obtain results in a format that can be used in downstream analysis.
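Raw read counts from different cell lines are not directly comparable when the runs differ in sequencing depth. A minimal sketch of one common first step, scaling to counts per million (CPM), is given below. The function name `counts_per_million` and all numbers are made up for illustration; CPM does not correct for gene length or batch effects, and this is not necessarily the measure the workflow reports.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts by library size: count / total reads * 1e6."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical counts for one colorectal and one prostate cell line.
# The prostate library is sequenced twice as deeply in this toy example.
colorectal = {"AURKA": 200, "GENE_B": 300, "GENE_C": 500}
prostate = {"AURKA": 400, "GENE_B": 600, "GENE_C": 1000}

cpm_colorectal = counts_per_million(colorectal)
cpm_prostate = counts_per_million(prostate)

# After scaling, the two-fold difference in raw AURKA counts disappears,
# because it was caused by sequencing depth alone.
print(cpm_colorectal["AURKA"], cpm_prostate["AURKA"])
```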

11 Further analysis…
Usually, the comparison is tumour sample vs normal sample.
– edgeR is available to perform this comparison.
Comparison of expression between groups is possible (e.g. colorectal cell lines vs prostate cell lines). However, when I have only cell lines:
– how do I answer the question: “does my gene of interest show altered expression in a particular sample compared to a reference sample?”

12 Issues
When there is no normal/reference sample in the dataset (tumour only, cell lines), how can altered expression of a gene of interest be determined?
– Use a general normal reference that needs to be provided for comparison? (A standard cut-off for increased or decreased expression: xxx reads = increased expression?)
– Calculate a median expression for all genes of the platform and then compare the expression of one gene to the median expression of all genes (significant outliers?)
– Distinguish expression of a gene in diploid vs aneuploid cells → trouble: in most cases the ploidy status is not known
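The median-based idea above can be sketched as follows: log-transform the expression values of one sample, take the median over all genes, and express each gene as a robust z-score using the median absolute deviation (MAD). The function name `robust_z_scores`, the pseudocount of 1 and the 1.4826 scaling constant (which makes the MAD comparable to a standard deviation under normality) are choices made for this sketch, and the input values are invented. Note that genes naturally differ in expression level, so a large score here flags a gene as unusually high or low within the sample; it does not by itself show altered expression relative to normal tissue.

```python
import math
from statistics import median


def robust_z_scores(expression: dict[str, float]) -> dict[str, float]:
    """Robust z-score of each gene against the median of all genes in one sample."""
    # log2(x + 1) keeps zero counts defined; the pseudocount is an assumption.
    logged = {gene: math.log2(value + 1.0) for gene, value in expression.items()}
    center = median(logged.values())
    mad = median(abs(x - center) for x in logged.values())
    if mad == 0:
        raise ValueError("MAD is zero; scores are undefined for this sample")
    scale = 1.4826 * mad
    return {gene: (x - center) / scale for gene, x in logged.items()}


# Invented expression values for a single cell line.
sample = {"g1": 0, "g2": 1, "g3": 3, "g4": 7, "g5": 1023}
scores = robust_z_scores(sample)
for gene, score in scores.items():
    print(gene, round(score, 2))
```

Any cut-off applied to these scores (for example, an absolute value above 3) would be a convention chosen by the analyst, not something the data dictate.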

13 Issues
When investigating data in the data-integration platform, a query for the gene AURKA will give certain results. If one study had tumour and normal samples (T/N) and the other only tumour (T), and altered expression was determined in different ways, can these data be compared?
– Pro: the platform holds processed and called data, so trust the called data
– Con: I don’t think it’s fair to compare differently called data. If comparing such datasets, start from the beginning and treat them in the same manner → re-analyse the T/N data as T-only or cell-line-only data

