Prepare data for Microdeletion


1 Prepare data for Microdeletion
Jianfang Chen

2 1. Original Data Set.

3 2. Objective Data Sets.
(1) snp_homozygosity_data.
(2) snp_location_data.
(3) Parameter file.

4 snp_homozygosity_data -- The first row is the title line. The first column is affection_status (0 for controls, 1 for cases). The remaining columns are homozygosity data at each site (0 for missing, 1 for homozygotes, 2 for heterozygotes).
Example of "snp_homozygosity_data" (with two controls, two cases, 6 SNPs), title line:
indicator v1 v2 v3 v4 v5 v6
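The layout described above can be checked with a short parser. This is a minimal sketch, assuming the file is whitespace-separated (the slide does not state the delimiter); the function name `parse_homozygosity` and the data rows in `SAMPLE` are made up for illustration and are not taken from the original example.

```python
def parse_homozygosity(text):
    """Parse snp_homozygosity_data text into (header, statuses, rows).

    Assumes whitespace-separated fields; the delimiter is not given in the slides.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, body = lines[0], lines[1:]
    statuses, rows = [], []
    for fields in body:
        values = [int(v) for v in fields]
        status, sites = values[0], values[1:]
        if status not in (0, 1):
            raise ValueError(f"affection_status must be 0 or 1, got {status}")
        if any(s not in (0, 1, 2) for s in sites):
            raise ValueError("site codes must be 0 (missing), 1 or 2")
        if len(sites) != len(header) - 1:
            raise ValueError("row length does not match the title line")
        statuses.append(status)
        rows.append(sites)
    return header, statuses, rows


# Hypothetical rows: two controls, two cases, 6 SNPs (values are invented).
SAMPLE = """indicator v1 v2 v3 v4 v5 v6
0 1 2 2 1 0 1
0 2 2 1 1 1 2
1 1 1 1 1 2 2
1 2 0 1 2 2 1
"""

header, statuses, rows = parse_homozygosity(SAMPLE)
```

Rejecting unexpected codes early keeps a malformed row from being silently counted as a homozygous or heterozygous site later on.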

5 snp_location_data -- The first row is the title line. The first column is the SNP index number. The second column is the SNP location. The locations are sorted in increasing order.
Example of "snp_location_data" (with 6 SNPs), title line:
order position
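A reader for this file might look like the sketch below. It assumes whitespace-separated fields and integer positions, and it treats "increasing" as strictly increasing; none of these details are stated in the slides. The name `parse_locations` and the positions in `SAMPLE_LOC` are invented for illustration.

```python
def parse_locations(text):
    """Parse snp_location_data text into a list of (order, position) tuples.

    Assumes whitespace-separated integer fields and strictly increasing positions.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    records = [(int(order), int(pos)) for order, pos in lines[1:]]  # skip title line
    positions = [pos for _, pos in records]
    if any(later <= earlier for earlier, later in zip(positions, positions[1:])):
        raise ValueError("positions must be sorted in increasing order")
    return records


# Hypothetical positions for 6 SNPs (values are invented).
SAMPLE_LOC = """order position
1 1050
2 2310
3 4875
4 9020
5 12004
6 15730
"""

locations = parse_locations(SAMPLE_LOC)
```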

6 Parameter file -- It needs the following inputs (one input per line):
snp_homozygosity_data_name, snp_location_data_name, output_file_name, num_cont, num_case, num_site, maximum_window_size, num_rep1
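Since the parameter file holds one input per line in a fixed order, it can be read positionally. The sketch below assumes the three names are plain strings and the remaining five inputs are integers, which the slide does not confirm; `Params`, `parse_params` and the file names in `SAMPLE_PARAMS` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Params:
    snp_homozygosity_data_name: str
    snp_location_data_name: str
    output_file_name: str
    num_cont: int
    num_case: int
    num_site: int
    maximum_window_size: int
    num_rep1: int


def parse_params(text):
    """Read the eight inputs, one per line, in the order listed on the slide."""
    values = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(values) != 8:
        raise ValueError(f"expected 8 inputs, found {len(values)}")
    # Assumption: the last five inputs are integers.
    return Params(*values[:3], *(int(v) for v in values[3:]))


# Hypothetical parameter file contents (names and numbers are invented).
SAMPLE_PARAMS = """snp_homozygosity_data.txt
snp_location_data.txt
output.txt
2
2
6
3
1000
"""

params = parse_params(SAMPLE_PARAMS)
```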

7 3. Algorithm
Sort the original data by FamilyID, Position and Marker_name.
Where two markers share the same position, remove one of them.
For each family within a marker (3 individuals), leave the child as the case.
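The sorting and duplicate-removal steps could be sketched as follows. The record layout (`Rec` with `family_id`, `position`, `marker_name`, `member`) is an assumption, since the slides do not show the original file, and keeping the alphabetically first marker name at a shared position is an arbitrary choice made here only so the result is deterministic.

```python
from collections import namedtuple

# Hypothetical record layout; the original column names are not given in the slides.
Rec = namedtuple("Rec", ["family_id", "position", "marker_name", "member"])


def sort_and_dedupe(records):
    """Drop all but one marker per position, then sort by family, position, marker."""
    kept = {}  # position -> marker name retained at that position
    for r in records:
        # Assumption: keep the alphabetically first marker name at each position.
        if r.position not in kept or r.marker_name < kept[r.position]:
            kept[r.position] = r.marker_name
    filtered = [r for r in records if kept[r.position] == r.marker_name]
    return sorted(filtered, key=lambda r: (r.family_id, r.position, r.marker_name))


# Invented records: markers rsB and rsC share position 300.
records = [
    Rec("F2", 300, "rsB", "child"),
    Rec("F1", 300, "rsC", "child"),
    Rec("F1", 100, "rsA", "child"),
    Rec("F1", 300, "rsB", "child"),
    Rec("F2", 100, "rsA", "child"),
    Rec("F2", 300, "rsC", "child"),
]

cleaned = sort_and_dedupe(records)
```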

8 Combine father and mother into one line as control, based on the following algorithm.
Suppose father (a,b), mother (c,d) and child (e,f):
if e=a and f=c then control will be (b,d)
else if e=a and f=d then control will be (b,c)
else if e=b and f=c then control will be (a,d)
else if e=b and f=d then control will be (a,c)
else if e=c and f=a then control will be (d,b)
else if e=c and f=b then control will be (d,a)
else if e=d and f=a then control will be (c,b)
else if e=d and f=b then control will be (c,a)

9 else if a=1 and b=1 and c=1 and d=1 and e=2 and f=2 then control will be (1,1)
else if a=2 and b=2 and c=2 and d=2 and e=1 and f=1 then control will be (2,2)
else if a=1 and b=1 and c=2 and d=2 and e=1 and f=1 then control will be (1,2)
else if a=2 and b=2 and c=1 and d=1 and e=1 and f=1 then control will be (1,2)
else if a=1 and b=1 and c=2 and d=2 and e=2 and f=2 then control will be (1,2)

10 else if a=2 and b=2 and c=1 and d=1 and e=2 and f=2 then control will be (1,2)
else if a=2 and b=2 and c=2 and d=2 and e=1 and f=2 then control will be (2,2)
else if a=1 and b=1 and c=1 and d=1 and e=1 and f=2 then control will be (1,1)
else control will be (0,0)
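The full rule chain can be written as a single function. The sketch below follows the rules in the order given: first the eight cases where the child's alleles match one allele from each parent (the control is the untransmitted pair), then the listed genotype combinations that match none of those cases, and finally (0,0). The names `combine_parents` and `FALLBACK` are illustrative, and alleles are assumed to be coded as integers with 0 for missing.

```python
# Listed combinations (a, b, c, d, e, f) that match none of the first eight rules.
FALLBACK = {
    (1, 1, 1, 1, 2, 2): (1, 1),
    (2, 2, 2, 2, 1, 1): (2, 2),
    (1, 1, 2, 2, 1, 1): (1, 2),
    (2, 2, 1, 1, 1, 1): (1, 2),
    (1, 1, 2, 2, 2, 2): (1, 2),
    (2, 2, 1, 1, 2, 2): (1, 2),
    (2, 2, 2, 2, 1, 2): (2, 2),
    (1, 1, 1, 1, 1, 2): (1, 1),
}


def combine_parents(father, mother, child):
    """Return the control genotype built from the parents, per the rule chain."""
    a, b = father
    c, d = mother
    e, f = child
    # First matching rule wins, in the same order as on the slides.
    rules = [
        (e == a and f == c, (b, d)),
        (e == a and f == d, (b, c)),
        (e == b and f == c, (a, d)),
        (e == b and f == d, (a, c)),
        (e == c and f == a, (d, b)),
        (e == c and f == b, (d, a)),
        (e == d and f == a, (c, b)),
        (e == d and f == b, (c, a)),
    ]
    for matched, control in rules:
        if matched:
            return control
    # Otherwise use the listed combinations, or (0, 0) if none applies.
    return FALLBACK.get((a, b, c, d, e, f), (0, 0))
```

For example, `combine_parents((1, 2), (1, 2), (1, 1))` matches the first rule and returns `(2, 2)`, the two alleles the child did not receive, while `combine_parents((1, 1), (2, 2), (1, 1))` matches none of the first eight rules and returns `(1, 2)` from the listed combinations.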

11 Recode any pair (x,y) formed from a, b, c, d as:
if x*y=0 then output 0
else if x*y=2 then output 1
else output 2
Dump out the Middle Step Output as I put on the website. For each family:
"0" + line up of all parents' recode_number got from step 4.
"1" + line up of all children's recode_number got from step 4.
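The recoding and the two output lines per family can be sketched as below. The recode rule is implemented exactly as written: a pair with a 0 maps to 0, a (1,2) pair (product 2) maps to 1, and (1,1) or (2,2) maps to 2. Joining the fields with single spaces is an assumption, as are the names `recode` and `family_lines`.

```python
def recode(pair):
    """Recode an allele pair (x, y) using the product rule from the slide."""
    x, y = pair
    product = x * y
    if product == 0:   # at least one allele is missing
        return 0
    if product == 2:   # (1, 2) or (2, 1)
        return 1
    return 2           # (1, 1) or (2, 2)


def family_lines(control_pairs, child_pairs):
    """Build the control line ("0" + codes) and the case line ("1" + codes).

    Assumption: fields are separated by single spaces.
    """
    control = " ".join(["0"] + [str(recode(p)) for p in control_pairs])
    case = " ".join(["1"] + [str(recode(p)) for p in child_pairs])
    return control, case


# Invented genotypes for one family across three markers.
control_line, case_line = family_lines(
    [(1, 2), (0, 0), (2, 2)],
    [(1, 1), (1, 2), (2, 2)],
)
```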

12 4. Data sets.
data_all.txt
data_clean.txt

