Converting Large NCBI Databases into SAS Rosa SJ Lin Division of Statistical Genomics Washington University in Saint Louis June 30, 2008.

1 Converting Large NCBI Databases into SAS Rosa SJ Lin Division of Statistical Genomics Washington University in Saint Louis June 30, 2008

2 NCBI (http://www.ncbi.nlm.nih.gov)
- Contains a large number of databases
- The most important are:
  - GenBank
  - PubMed
  - RefSeq
  - Online Mendelian Inheritance in Man (OMIM)
  - dbSNP

3 dbSNP Database

4 NCBI dbSNP
- Contains information about SNPs
- Each submission is given an ss number (e.g. ss52079780)
- If the data meet the criteria, a reference SNP is created, which has an rs number (e.g. rs530)

5 dbSNP Data (1) - Each record spans a varying number of lines, and each line has a varying length

6 dbSNP Data (2)

7 dbSNP Data (3)

8 Various uses of the SCAN and INDEX functions to assist in reading data (1)

    data ncbisnp;
      length rs $12;
      infile din firstobs=1 missover pad;
      input snpline $132.;                      /* read each raw line as one string */
      if index(snpline, "updated") > 0 then do; /* header line of a record */
        rs = compress(scan(snpline, 1, "|"));   /* first "|"-delimited field */
        output;
      end;
    run;

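9 Various uses of the SCAN and INDEX functions to assist in reading data (2)

    if index(snpline, "alleles=") > 0 then do;
      /* second field, blanks removed, text after "alleles=" */
      alleles = substr(compress(scan(snpline, 2, "|")), 9);
      output;
    end;
    if index(snpline, "assembly=reference") > 0 then do;
      /* third field: text after "chr=" read as a number */
      chrom = input(substr(compress(scan(snpline, 3, "|")), 5), 8.);
      /* fourth field kept as text, blanks removed */
      posc = compress(scan(snpline, 4, "|"));
      output;
    end;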
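To see what each function contributes, the short step below applies INDEX, SCAN, COMPRESS, SUBSTR and INPUT to a single line. The line itself is made up for illustration and only mimics the pipe-delimited layout that the slide code implies; it is not taken from the dbSNP file.

```sas
data _null_;
  length snpline $132 field $40;

  /* Hypothetical line, not real dbSNP data */
  snpline = "CTG | assembly=reference | chr=13 | chr-pos=31345387";

  hit   = index(snpline, "assembly=reference"); /* greater than 0 when the text is present */
  field = compress(scan(snpline, 3, "|"));       /* third field with blanks removed */
  chrom = input(substr(field, 5), 8.);           /* text after "chr=" read as a number */

  put hit= field= chrom=;                        /* write the values to the log */
run;
```

INDEX only tests whether a line is of the wanted type; SCAN then picks the field by position, so the approach depends on the field order staying fixed within each line type.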

10 Use the RETAIN statement, which causes a variable to keep its value from one iteration of the DATA step to the next.

    retain markname rs alleles;
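Putting slides 8 to 10 together, a complete DATA step might look like the sketch below. Several things here are assumptions rather than the author's code: the FILENAME path is hypothetical, the variable lengths for alleles and posc are guesses, markname is left out because the slides do not show how it is set, a single OUTPUT is placed on the reference-assembly line so that each row carries the retained rs and alleles, and the `??` modifier is added to suppress notes when the chromosome is not numeric (e.g. X or Y).

```sas
/* Sketch only: path, lengths and output layout are assumptions */
filename din "dbsnp_flatfile.txt";     /* hypothetical path */

data ncbisnp (keep=rs alleles chrom posc);
  length rs $12 alleles $20 posc $24;  /* assumed lengths */
  retain rs alleles;                   /* keep values across input lines */
  infile din missover pad;
  input snpline $132.;

  /* Header line of a record: first field holds the rs number */
  if index(snpline, "updated") > 0 then do;
    rs = compress(scan(snpline, 1, "|"));
    alleles = " ";                     /* reset so values do not leak between records */
  end;

  /* Allele line: second field, text after "alleles=" */
  else if index(snpline, "alleles=") > 0 then
    alleles = substr(compress(scan(snpline, 2, "|")), 9);

  /* Reference-assembly line: write one row per mapped position */
  else if index(snpline, "assembly=reference") > 0 then do;
    chrom = input(substr(compress(scan(snpline, 3, "|")), 5), ?? 8.);
    posc  = compress(scan(snpline, 4, "|"));
    output;
  end;
run;
```

Without RETAIN, rs and alleles would be reset to missing at the start of every iteration, so the row written at the assembly line would have lost the values read from the earlier lines of the same record.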

11 dbSNP Data (4)

12 Output SAS Dataset

13 Readings:
- Kim L Kolbe etc., SUGI 22: "Advanced Techniques for Reading Difficult and Unusual Flat Files".
- Clinton S Rickards, SUGI 24: "Reading External Files Using SAS® Software".

