1
Han Kyung University: GWAS (Practice)
Hyeon soo Jeong, December 2014
Biopop, Seoul National University
2
Contents
Basic statistics
Linux system
PLINK
Quality control
Linear regression
Manhattan plot
Kinship analysis
IBS matrix using EMMAX
Mixed linear model for dealing with inbred populations
QQ-plot comparison
3
Contents Basic Statistics
4
Basic Statistics: Regression analysis
y ~ Xβ + ε
Source: Conall M. O'Seaghdha & Caroline S. Fox, 2012, Nat Rev Nephrol
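As a rough illustration of the regression model above, the sketch below fits y = b0 + b1·x + e for a single SNP by ordinary least squares. The genotype counts and phenotype values are invented for the example, and the function name is hypothetical.

```python
# Minimal sketch: ordinary least squares for one SNP (one predictor).
# The data below are made up purely for illustration.

def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

genotypes = [0, 0, 1, 1, 2, 2]              # copies of one allele per animal
phenotypes = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]  # hypothetical trait values
intercept, slope = fit_simple_regression(genotypes, phenotypes)
print("intercept:", intercept, "slope (beta):", slope)
```

The slope plays the role of β for that SNP: it estimates how much the phenotype changes per additional copy of the allele.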
5
Basic Statistics: Hypothesis
Null hypothesis: β = 0
Alternative hypothesis: β ≠ 0 (two-tailed)
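The two-tailed test of β = 0 can be sketched as follows. This version assumes a normal approximation to the test statistic, which is reasonable only for large samples; a regression package would normally use the t distribution instead. The function name and the inputs are hypothetical.

```python
import math

def two_tailed_p(beta_hat, std_err):
    """Two-tailed p-value for H0: beta = 0 (normal approximation, an assumption)."""
    z = beta_hat / std_err
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical estimate and standard error.
print(two_tailed_p(0.8, 0.25))
```

Because the test is two-tailed, a positive and a negative estimate of the same size give the same p-value.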
6
Basic Statistics Type I and Type II error
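Type I error matters in GWAS because thousands of SNPs are tested at once. The sketch below, assuming independent tests (which linked SNPs are not), shows how quickly the chance of at least one false positive grows and how a Bonferroni-style per-test threshold is derived. The function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall rate near alpha."""
    return alpha / n_tests

print(familywise_error_rate(0.05, 100))
print(bonferroni_threshold(0.05, 5000))
```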
7
Contents Linux system
8
Linux system Ubuntu LTS Password :
9
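Linux system: Linux commands (Ubuntu)
Command    Function
cd <dir>   Change to directory <dir>
ls         List files or folders
ll         List files or folders with more detail than ls
mkdir      Create a directory
cp         Copy a file or folder
mv         Move a file or a directory (mv needs no -r option)
rm         Delete a file
rm -r      Delete a directory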
10
Data Structure: Pedigree file (PED file)
Column:  1    2    3       4       5    6          7 ~~
Field:   FID  IID  father  mother  sex  phenotype  genotype ~~
Fam1 Ind G T
Fam2 Ind G G
Fam2 Ind G T
Fam2 Ind T T
Fam3 Ind T T
Fam4 Ind G T
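To make the PED layout concrete, here is a minimal sketch that splits one PED line into its six leading fields and its allele pairs. The example line is hypothetical (the parent, sex, and phenotype values are invented), and the function name is not part of PLINK.

```python
def parse_ped_line(line):
    """Split one whitespace-delimited PED line into named fields."""
    fields = line.split()
    fid, iid, father, mother, sex, phenotype = fields[:6]
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected two alleles per SNP")
    # Pair consecutive alleles: columns 7-8 are SNP 1, 9-10 are SNP 2, ...
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return {
        "fid": fid, "iid": iid, "father": father, "mother": mother,
        "sex": sex, "phenotype": phenotype, "genotypes": genotypes,
    }

# Hypothetical line: family, individual, father, mother, sex, phenotype, alleles.
record = parse_ped_line("Fam1 Ind1 0 0 1 2.5 G T A A")
print(record["iid"], record["genotypes"])
```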
11
Data management: MAP file
Column:  1    2       3      4
Field:   chr  SNP_ID  recom  position
1 SNP
1 SNP
1 SNP
2 SNP4 0 13
2 SNP
2 SNP
12
Data management: PED & MAP
MAP:
1 SNP1 0 13435
1 SNP2 0 25256
1 SNP3 0 28242
PED:
Fam1 Ind G T A A G G
Fam2 Ind G G A A G T
Fam2 Ind G T T A G G
Fam2 Ind T T A A T T
Fam3 Ind T T T T T T
Fam4 Ind G T A A T T
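The link between the two files is positional: the n-th row of the MAP file describes the n-th allele pair of every PED row. A minimal sketch of that pairing, using the MAP rows and the allele columns shown on the slide (the function name is hypothetical, and only the genotype part of each PED row is used):

```python
from collections import Counter

map_lines = ["1 SNP1 0 13435", "1 SNP2 0 25256", "1 SNP3 0 28242"]
# Genotype columns (PED column 7 onward) for the six individuals.
genotype_columns = [
    "G T A A G G",
    "G G A A G T",
    "G T T A G G",
    "T T A A T T",
    "T T T T T T",
    "G T A A T T",
]

def count_alleles(map_lines, genotype_columns):
    """Count alleles per SNP, matching MAP rows to PED allele pairs by order."""
    snp_ids = [line.split()[1] for line in map_lines]
    counts = {snp: Counter() for snp in snp_ids}
    for row in genotype_columns:
        alleles = row.split()
        for k, snp in enumerate(snp_ids):
            counts[snp].update(alleles[2 * k: 2 * k + 2])
    return counts

allele_counts = count_alleles(map_lines, genotype_columns)
for snp, counter in allele_counts.items():
    print(snp, dict(counter))
```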
13
Contents PLINK : Whole Genome Data Analysis Toolkit
14
PLINK ( Download & install )
Program download
15
PLINK ( Download & install )
Program download $ wget
16
PLINK ( Download & install )
Installation
$ unzip plink-1.07-x86_64.zip
17
PLINK ( Download & install )
$ wget
$ unzip plink-1.07-x86_64.zip
$ cd plink-1.07-x86_64
$ sudo cp plink /usr/local/bin/
18
Data management: How to convert a PED file to a BED file?
$ plink --cow --file Testfile --make-bed --out TestData --noweb
.bed : genotype information (binary; cannot be opened as text)
.fam : family information (PED file columns 1~6)
.bim : SNP information (MAP file + allele types)
19
Data management: Various commands for transforming data
20
Contents Quality Control
21
Quality control: Hardy-Weinberg equilibrium
$ plink --noweb --cow --bfile TestData --hardy2 --out TestData_hardy
$ vi TestData_hardy.hwe
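For intuition about what a Hardy-Weinberg test measures, the sketch below implements the textbook chi-square version from genotype counts. This is not necessarily the statistic PLINK reports (PLINK documents an exact test), and the function name and example counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of observed genotype counts against HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

print(hwe_chi_square(25, 50, 25))   # counts exactly at HWE proportions
print(hwe_chi_square(50, 0, 50))    # no heterozygotes at all
```

A SNP with far fewer (or far more) heterozygotes than expected gets a small p-value, which often points to genotyping error rather than biology.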
22
Quality control: Hardy-Weinberg equilibrium filtering
$ plink --noweb --cow --bfile TestData --hwe <threshold> --make-bed --out TestData_hwe
23
Quality control: Minor allele frequency
$ plink --cow --bfile TestData --freq --out TestData_Freq --noweb
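The quantity being summarised here can be sketched directly: the minor allele frequency is the share of the rarer allele among all non-missing alleles at a SNP. The example uses the SNP1 genotypes from the PED slide; the function name is hypothetical, and "0" is treated as the missing-allele code.

```python
from collections import Counter

def minor_allele_frequency(genotypes, missing="0"):
    """Frequency of the rarer allele among non-missing alleles at one SNP."""
    counts = Counter(a for g in genotypes for a in g.split() if a != missing)
    total = sum(counts.values())
    if total == 0:
        return None          # every genotype is missing
    if len(counts) < 2:
        return 0.0           # monomorphic SNP
    return min(counts.values()) / total

snp1 = ["G T", "G G", "G T", "T T", "T T", "G T"]
print(minor_allele_frequency(snp1))
```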
24
Quality control: Minor allele frequency filtering
$ plink --cow --bfile TestData --maf <threshold> --make-bed --out TestData_maf --noweb
25
Quality control: Missing data filtering
$ plink --cow --bfile TestData --missing --out TestData_missing --noweb
Output files: .imiss (missingness per individual), .lmiss (missingness per SNP)
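The two missingness summaries can be sketched as row-wise and column-wise rates over a genotype table. The small matrix below is invented, "0 0" is taken as the missing-genotype code, and the function name is hypothetical.

```python
def missing_rates(genotype_matrix, missing="0 0"):
    """Return (per-individual, per-SNP) missing-genotype rates.

    genotype_matrix: one list per individual, one genotype string per SNP.
    """
    n_ind = len(genotype_matrix)
    n_snp = len(genotype_matrix[0])
    per_individual = [row.count(missing) / n_snp for row in genotype_matrix]
    per_snp = [
        sum(1 for row in genotype_matrix if row[j] == missing) / n_ind
        for j in range(n_snp)
    ]
    return per_individual, per_snp

# Hypothetical data: 3 individuals x 3 SNPs.
matrix = [
    ["G T", "0 0", "G G"],
    ["G G", "A A", "0 0"],
    ["0 0", "0 0", "G T"],
]
per_individual, per_snp = missing_rates(matrix)
print(per_individual, per_snp)
```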
26
Quality control: Missing data filtering
--geno : genotype (SNP) filter (missing > 0.9)
--mind : individual filter (missing > 0.9)
$ plink --cow --bfile TestData --geno <threshold> --make-bed --out TestData_geno --noweb
$ plink --cow --bfile TestData --mind <threshold> --make-bed --out TestData_mind --noweb
$ plink --cow --bfile TestData --mind <threshold> --geno <threshold> --maf <threshold> --hwe <threshold> --make-bed --out 1_QC_TestData --noweb
27
Quality control
28
Contents Linear Regression Analysis
29
Linear Regression Analysis
## Linear model
plink --bfile 1_QC_TestData --pheno Data/nBFzH.pheno --linear --ci <level> --out 2_LM_QC_TestData --noweb --cow
plink --bfile 1_QC_TestData --pheno Data/nBFzH.pheno --linear --ci <level> --dominant --out 2_LM_QC_dom_TestData --noweb --cow
plink --bfile 1_QC_TestData --pheno Data/nBFzH.pheno --linear --ci <level> --recessive --out 2_LM_QC_rec_TestData --noweb --cow
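The three runs above differ only in how each genotype is turned into a number before regression. A minimal sketch of the usual additive, dominant, and recessive codings, expressed in terms of the count of the tested allele (the function name is hypothetical and this is an illustration, not PLINK's code):

```python
def encode_genotype(allele_count, model="additive"):
    """Code a genotype (0, 1 or 2 copies of the tested allele) for regression."""
    if allele_count not in (0, 1, 2):
        raise ValueError("allele_count must be 0, 1 or 2")
    if model == "additive":
        return allele_count                    # effect per allele copy
    if model == "dominant":
        return 1 if allele_count >= 1 else 0   # one copy is enough
    if model == "recessive":
        return 1 if allele_count == 2 else 0   # two copies are required
    raise ValueError("unknown model: " + model)

for model in ("additive", "dominant", "recessive"):
    print(model, [encode_genotype(c, model) for c in (0, 1, 2)])
```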
30
Linear Regression Analysis
31
Linear Regression Analysis
Convert the space-delimited output to a tab-delimited file
#1.LM_Convert.py
#!/usr/bin/env python3
# Collapse the runs of spaces in PLINK's output into single tabs.
input_path = input("input file : ")
with open(input_path, 'r') as data, open('QQinput.txt', 'w') as outf:
    for line in data:
        fields = line.split()          # split on any run of whitespace
        outf.write('\t'.join(fields) + '\n')
32
Linear Regression Analysis
Convert the space-delimited output to a tab-delimited file
#1.LM_Convert.py
$ python 1.LM_Convert.py
33
Manhattan plot
#2.Manhattan.R
Data <- read.table("QQinput.txt", sep = '\t', header = TRUE)  # file import
library(gap)                                                  # import gap library
png("Manhattan.png", height = 6000, width = 7000, res = 900)  # save as png format
mhtdata <- with(Data, cbind(CHR, BP, P))                      # select columns
color <- rep(c("blue", "red"), 15)                            # Manhattan plot colors
par(cex = 0.5)
ops <- mht.control(colors = color, yline = 1.5, xline = 1)
mhtplot(mhtdata, ops, pch = 25)
axis(2, pos = 1, at = 1:25)
bon <- -log10(1.0E-05)                                        # significance line
abline(h = bon, col = "black")
abline(h = 0)
title("Main title", cex.main = 2.5)
dev.off()
34
Manhattan plot
#2.Manhattan.R
$ Rscript 2.Manhattan.R
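The y-axis of a Manhattan plot is -log10(P), and the horizontal line in the script sits at -log10(1.0E-05). A minimal sketch of picking out the SNPs above that line follows; the SNP names and p-values are invented, and the function name is hypothetical.

```python
import math

def significant_snps(rows, threshold=1.0e-05):
    """Return (snp, -log10 p) for SNPs whose p-value is below the threshold."""
    cutoff = -math.log10(threshold)
    hits = []
    for snp, p_value in rows:
        score = -math.log10(p_value)
        if score > cutoff:
            hits.append((snp, score))
    return hits

# Hypothetical association results: (SNP name, p-value).
results = [("SNP1", 0.2), ("SNP2", 1e-7), ("SNP3", 1e-5)]
hits = significant_snps(results)
print(hits)
```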
35
Manhattan plot
36
Contents Mixed Linear Model
37
Mixed Linear Model QQ-plot #3.QQplot.R Input: QQinput.txt
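A QQ-plot compares the observed p-values with what a uniform distribution would give if no SNP were associated. The sketch below computes the plotted points, assuming expected quantiles of (i - 0.5)/n, which is one common convention and not necessarily what 3.QQplot.R uses. The function name is hypothetical.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, sorted ascending."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Under the null, the i-th smallest p-value is expected near (i - 0.5) / n.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))

# Hypothetical p-values that match the null expectation exactly.
points = qq_points([0.875, 0.125, 0.625, 0.375])
for expected, observed in points:
    print(round(expected, 3), round(observed, 3))
```

Points that rise above the diagonal across the whole range, not just in the tail, suggest inflation from population structure or relatedness.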
38
Mixed Linear Model: QQ-plot
Are the significant loci real?
39
Mixed Linear Model Kinship Analysis
$ wget
40
Mixed Linear Model: Kinship analysis and its problem
Identical twins, Family
$ king -b 1_QC_TestData.bed --kinship
41
Mixed Linear Model EMMAX
42
Mixed Linear Model IBS Matrix using EMMAX
$ emmax-kin-intel64 -v -s -M 1 -d 10 3_Trans_QC_TestData
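As a rough picture of what an IBS (identity-by-state) matrix contains, the sketch below computes the average share of alleles two individuals have in common across SNPs. This is the simple textbook definition and may differ from EMMAX's exact computation; the genotype codes are invented and the function names are hypothetical.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared by state; genotypes are 0/1/2 or None."""
    shared = 0
    n_snps = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue                  # skip SNPs missing in either individual
        shared += 2 - abs(a - b)      # 2, 1 or 0 alleles shared at this SNP
        n_snps += 1
    if n_snps == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return shared / (2 * n_snps)

def ibs_matrix(genotypes):
    """Symmetric matrix of pairwise IBS similarities."""
    n = len(genotypes)
    return [[ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical allele counts for 3 individuals at 4 SNPs.
individuals = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, None]]
for row in ibs_matrix(individuals):
    print(row)
```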
43
Mixed Linear Model Mixed Linear Model using EMMAX
$ emmax-intel64 -v -d 10 -t 3_Trans_QC_TestData -p Data/nBFzH.pheno -k 3_Trans_QC_TestData.aIBS.kinf -o 4_MLL_QC_TestData
44
Contents QQ-plot Comparison (PLINK vs EMMAX)
45
QQ-plot Comparison PLINK vs. EMMAX
46
Contents Functional Analysis
47
Functional Analysis Coremine
48
Functional Analysis DAVID
49
THANK YOU