
; 1 10 20 25 ; | | | | GM01 A A A A A A A A A A A A A A A A B B B B B B B B B GM02 A A A A A A A A A A A A A A A B B B B B B B B B B GM03 A A A A A A A.


Mapping using Recombinant Inbred Lines

Genetic cross → genotyping → raw marker scores → mapping (inference of the linear order of markers using the raw scores)

MadMapper_RECBIT – quality control of genetic markers and group analysis
MadMapper_XDELTA – inference of the linear order of markers on linkage groups
CheckMatrix (py_matrix_2D_V248_RECBIT.py) – visualization and validation of genetic maps using two-dimensional heat plots and graphical genotyping

MadMapper and CheckMatrix are multi-platform Python programs that run on UNIX, Windows, and Mac OS X. Detailed analysis (quality control and clustering) can be done on a set of ~2,000 markers; map construction runs in a reasonable time with up to ~500 markers; and large images (up to 10,000 x 10,000 pixels) can display up to ~2 million pairwise scores simultaneously.

MadMapper_RECBIT input and output files:

Group Info [ *.group_info ] – one file per iteration; 16 iterations with different cutoff values
Adjacency List [ *.adj_list ] – one file per iteration; 16 iterations with different cutoff values
Recombination Distance Scores [ *.pairs_all ] – recombination distance for each pair of markers, for example:

    ...
    GM01 GM07 0.36
    GM01 GM08 0.40
    GM01 GM09 0.48
    GM01 GM10 0.52
    GM01 GM11 0.60
    GM01 GM12 0.68
    GM02 GM01 0.04
    GM02 GM02 0.00
    GM02 GM03 0.08
    GM02 GM04 0.16
    GM02 GM05 0.20
    GM02 GM06 0.24
    ...
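The REC values in *.pairs_all can be reproduced from the example locus file as the fraction of RILs in which two markers carry different alleles; GM01 and GM07, for instance, differ in 9 of 25 RILs (0.36). The sketch below assumes whitespace-separated locus rows, ';' comment lines, and that RILs with a missing score are skipped. The names parse_locus, rec_distance and pairs_all_lines are hypothetical, not MadMapper functions.

```python
from itertools import product

# Rows copied from the example locus file (25 RILs); ';' lines are comments.
LOCUS_TEXT = """\
; 1 10 20 25
GM01 A A A A A A A A A A A A A A A A B B B B B B B B B
GM02 A A A A A A A A A A A A A A A B B B B B B B B B B
GM07 A A A A A A A A A B B B B B B B B B B B B B B A A
GM08 A A A A A A A A A B B B B B B B B B B B B B A A A
"""

SCORED = {"A", "B"}  # assumption: any other symbol (e.g. '-') counts as missing


def parse_locus(text):
    """Return {marker_id: [score, ...]} from whitespace-separated locus rows."""
    markers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comment/header lines
        marker_id, *scores = line.split()
        markers[marker_id] = scores
    return markers


def rec_distance(scores_1, scores_2):
    """Fraction of jointly scored RILs in which the two markers carry different alleles."""
    pairs = [(x, y) for x, y in zip(scores_1, scores_2) if x in SCORED and y in SCORED]
    if not pairs:
        return None  # no RIL is scored for both markers
    return sum(x != y for x, y in pairs) / len(pairs)


def pairs_all_lines(markers):
    """Format every ordered marker pair (self-pairs included) as 'M1 M2 distance'."""
    lines = []
    for m1, m2 in product(markers, repeat=2):
        d = rec_distance(markers[m1], markers[m2])
        # "NA" is a hypothetical placeholder for pairs without shared scores
        lines.append(f"{m1} {m2} " + ("NA" if d is None else f"{d:.2f}"))
    return lines


markers = parse_locus(LOCUS_TEXT)
for line in pairs_all_lines(markers):
    print(line)
```

With 25 fully scored RILs every distance is a multiple of 0.04, so two decimals print the values exactly as in the *.pairs_all excerpt.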
Group Info Summary [ *.x_tree_clust ] – summary of the clustering results for all 16 iterations; distinct linkage groups can be inferred by analyzing this clustering/grouping information
Non-Redundant Marker Scores [ *.z_nr_scores.loc ] – locus file with non-redundant raw marker scores
Marker Summary [ *.z_marker_sum ] – assigns a 'quality class' to each marker, which is useful for selecting 'core' markers
Marker Scores Info [ *.x_scores_stat ] – detailed information about scores and linkage
Trio Analysis [ *.z_trio_good ] [ *.z_trio_best ] [ *.z_trios_bad ] – analysis of all trios (triplets) for the non-redundant set of markers
LOG file [ *.x_log_file ] – information about run parameters

In total, Python_MadMapper_V248_RECBIT_012.py takes one input file, the locus file with raw marker scores, and produces 82 output files.

Example locus file with raw marker scores (INPUT):

    ;    1                 10                  20        25
    ;    |                 |                   |         |
    GM01 A A A A A A A A A A A A A A A A B B B B B B B B B
    GM02 A A A A A A A A A A A A A A A B B B B B B B B B B
    GM03 A A A A A A A A A A A A A B B B B B B B B B B B B
    GM04 A A A A A A A A A A A B B B B B B B B B B B B B B
    GM05 A A A A A A A A A A B B B B B B B B B B B B B B B
    GM06 A A A A A A A A A B B B B B B B B B B B B B B B B
    GM07 A A A A A A A A A B B B B B B B B B B B B B B A A
    GM08 A A A A A A A A A B B B B B B B B B B B B B A A A
    GM09 A A A A A A A A A B B B B B B B B B B B A A A A A
    GM10 B A A A A A A A A A B B B B B B B B B A A A A A A
    GM11 B B A A A A A A A A B B B B B B B B A A A A A A A
    GM12 B B B A A A A A A A B B B B B B B A A A A A A A A

MadMapper uses a BIT scoring system as an alternative to LOD scores to quantify the confidence of linkage between markers. Comparison of Different Scoring Systems on the Arabidopsis genetic map (Dean and Lister), five linkage groups: JoinMap LOD scores, JoinMap REC scores, MadMapper BIT scores and MadMapper REC scores.

MadMapper_RECBIT Clustering: the Group Info Summary [ *.x_tree_clust ] shows which linkage group each marker belongs to.
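MadMapper_RECBIT clusters markers over 16 iterations with different cutoff values and assigns an arbitrary group ID after each round. A minimal sketch of threshold grouping, assuming single-linkage (connected components via union-find) on REC distances: the real linkage criterion (for example a BIT-score cutoff) and the cutoff schedule are not given here, and the marker names, distances and functions below are hypothetical.

```python
def group_markers(markers, distances, cutoff):
    """Markers joined directly or transitively by distance <= cutoff share a group ID."""
    parent = {m: m for m in markers}

    def find(m):
        # Union-find root lookup with path halving
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (a, b), d in distances.items():
        if d <= cutoff:
            parent[find(a)] = find(b)  # merge the two groups

    # Arbitrary group IDs, numbered in order of first appearance in `markers`
    ids = {}
    return {m: ids.setdefault(find(m), len(ids) + 1) for m in markers}


def clustering_iterations(markers, distances, cutoffs):
    """Re-run the grouping once per cutoff, as a stand-in for the clustering iterations."""
    return {cutoff: group_markers(markers, distances, cutoff) for cutoff in cutoffs}


MARKERS = ["M1", "M2", "M3", "M4"]
DISTANCES = {  # hypothetical REC distances
    ("M1", "M2"): 0.05, ("M3", "M4"): 0.04, ("M2", "M3"): 0.30,
    ("M1", "M3"): 0.35, ("M2", "M4"): 0.40, ("M1", "M4"): 0.45,
}

for cutoff, groups in clustering_iterations(MARKERS, DISTANCES, [0.01, 0.1, 0.3]).items():
    print(cutoff, groups)
```

At a strict cutoff every marker is its own group; relaxing the cutoff first yields two groups and finally merges everything, which is why a summary across all iterations is needed to pick out distinct linkage groups.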
BIN analysis distinguishes true bins from linked groups. Example (missing scores shown as '-'):

    M_1 A A A B B B A A A A B B B B A A - A A B B B B A B B A A B A A A B B B B
    M_2 A A A B B B A - A A B B B B A A A A A B B B B A B B A A B A A A B B B B
    M_3 A A A B B B A A A A B B - B A A A A A B B B B A B B A A B A A A B B B B
    M_4 A A A B B B A A A A B B A B A A A A A B B - B A B B A A B A A A B B B B

Figure labels (graph of M_1 to M_4): linked group; saturated node; diluted node. Example of a complete graph: all nodes are 'saturated'.

MadMapper_RECBIT Marker Summary [ *.z_marker_sum ] provides information about redundancy of scores, marker quality, and allele distortion.

MadMapper_RECBIT Trio (Triplet) Analysis: the number of double crossovers should be low for 'good' trios and is high for 'bad' trios. In the example trio below, X marks the RILs in which the 'middle' marker (M_M) disagrees with both flanking markers (M_1 and M_2):

    M_1 A A A B B B A A A A B B B B A A - A A B B B B A B B A A B A A A B B B B
    M_M A A A B A B A - A A B B B B A B A A A B B A B A B B A A B A A A B B A B
    M_2 A A A B B B A A A A B B - B A A A A A B B B B A B B A A B A A A B B B B
                X                     X           X                         X

Bad and good trios (example of trio output):

    MARKER_Flank_1  REC1    BIT1  D_FR1   MARKER_Middle  REC2    BIT2  D_FR2   MARKER_Flank_2  REC_Flank  BIT_F  D_FR_F  D_REC+D_REC_Flank
    COR47           0.1     336   0.6931  CAT3           0.0857  348   0.6931  G2395           0.0253     450    0.7822  5+0
    G2395           0.0857  348   0.6931  CAT3           0.1     336   0.6931  COR47           0.0253     450    0.7822  5+0
    LK141           0.1522  192   0.4554  GUT15          0.193   210   0.5644  MI238           0.1231     294    0.6436  4+0
    MI238           0.193   210   0.5644  GUT15          0.1522  192   0.4554  LK141           0.1231     294    0.6436  4+0
    MI204           0.051   528   0.9703  MI51           0.0806  312   0.6139  SGCSNP41        0.0161     360    0.6139  4+0
    SGCSNP41        0.0806  312   0.6139  MI51           0.051   528   0.9703  MI204           0.0161     360    0.6139  4+0
    M336            0.0494  438   0.802   COR15          0.0385  432   0.7723  VE018           0.0879     450    0.901   0+0
    VE018           0.0385  432   0.7723  COR15          0.0494  438   0.802   M336            0.0879     450    0.901   0+0
    ARR7            0.0115  510   0.8614  COR47          0       282   0.4653  F15571          0.0179     324    0.5545  0+0
    F15571          0       282   0.4653  COR47          0.0115  510   0.8614  ARR7            0.0179     324    0.5545  0+0
    PAP3            0.0227  504   0.8713  COR78          0.0577  276   0.5149  PDC2            0.0588     270    0.505   0+0
    PDC2            0.0577  276   0.5149  COR78          0.0227  504   0.8713  PAP3            0.0588     270    0.505   0+0
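The X marks in the example trio line up with RILs where both flanking markers agree but the middle marker does not, i.e. an apparent double crossover. A minimal sketch of counting them, assuming '-' marks a missing score and that such RILs are skipped; double_crossovers is a hypothetical helper, and the threshold MadMapper uses to call a trio 'good' or 'bad' is not given here.

```python
MISSING = "-"


def double_crossovers(flank_1, middle, flank_2):
    """1-based RIL positions where both flanking markers agree but the middle marker differs."""
    positions = []
    for i, (f1, m, f2) in enumerate(zip(flank_1, middle, flank_2), start=1):
        if MISSING in (f1, m, f2):
            continue  # skip RILs with a missing score in any of the three markers
        if f1 == f2 and m != f1:
            positions.append(i)
    return positions


# Trio from the poster example: flanking marker 1, 'middle' marker, flanking marker 2
M_1 = "A A A B B B A A A A B B B B A A - A A B B B B A B B A A B A A A B B B B".split()
M_M = "A A A B A B A - A A B B B B A B A A A B B A B A B B A A B A A A B B A B".split()
M_2 = "A A A B B B A A A A B B - B A A A A A B B B B A B B A A B A A A B B B B".split()

dco = double_crossovers(M_1, M_M, M_2)
scored = sum(MISSING not in trio for trio in zip(M_1, M_M, M_2))
print(f"{len(dco)} double crossovers in {scored} scored RILs at positions {dco}")
```

The sketch finds the four marked RILs (positions 5, 16, 22 and 35) among 33 RILs scored for all three markers.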
MadMapper_XDELTA usage: MadMapper_XDELTA takes three files as input:
1 – matrix (pairwise distances between markers)
2 – list of 'frame' markers
3 – list of markers to map

First step: find the best map for the 'frame' markers by checking all possible combinations (up to 10 markers). Optionally, an unlimited list of 'frame' markers with a fixed order can be supplied.

Best-fit extension: take one marker from the list of markers to map, insert it into the two-dimensional matrix of the current best map at every possible position, calculate 'delta' for each position, and keep the map with the lowest 'delta' value (lowest 'entropy'). Move on to the next marker until all markers are mapped. Optionally, shuffle (ripple) the map after several steps.

Visual explanation of the minimum-entropy approach to inferring linear order with MadMapper_XDELTA (CheckMatrix 2D plots): random order – high 'entropy'; partially wrong order; right order – low 'entropy'.

MadMapper_XDELTA analyzes two-dimensional matrices of all pairwise scores and finds the map with the minimum total sum of differences between adjacent cells, that is, the map with the lowest 'entropy'.
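The delta values in the MadMapper_XDELTA log for the madmapper_test_small example (for instance GM02 GM06 GM10 → 1.52) can be reproduced by reordering the distance matrix into the candidate map and summing the absolute differences between horizontally adjacent cells. This is our reading of "sum of differences between adjacent cells", checked only against that small example; MadMapper's exact definition may differ. The functions matrix_delta and best_fit_insert are hypothetical names for illustration.

```python
from itertools import product

LOCUS = {  # rows from the example locus file (25 RILs, no missing scores)
    "GM02": "A A A A A A A A A A A A A A A B B B B B B B B B B".split(),
    "GM03": "A A A A A A A A A A A A A B B B B B B B B B B B B".split(),
    "GM06": "A A A A A A A A A B B B B B B B B B B B B B B B B".split(),
    "GM10": "B A A A A A A A A A B B B B B B B B B A A A A A A".split(),
}


def rec(a, b):
    """Fraction of RILs in which two fully scored markers differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


DIST = {(m1, m2): rec(LOCUS[m1], LOCUS[m2]) for m1, m2 in product(LOCUS, repeat=2)}


def matrix_delta(order, dist=DIST):
    """Sum of |differences| between horizontally adjacent cells of the reordered matrix."""
    total = 0.0
    for row in order:
        cells = [dist[row, col] for col in order]
        total += sum(abs(left - right) for left, right in zip(cells, cells[1:]))
    return total


def best_fit_insert(order, marker, dist=DIST):
    """Try `marker` at every position of `order`; return (delta, new_order) with the lowest delta."""
    candidates = [order[:i] + [marker] + order[i:] for i in range(len(order) + 1)]
    return min(((matrix_delta(c, dist), c) for c in candidates), key=lambda t: t[0])


for order in (["GM02", "GM06", "GM10"], ["GM02", "GM10", "GM06"], ["GM06", "GM02", "GM10"]):
    d = matrix_delta(order)
    print(" ".join(order), f"*** {d:.2f} *** {d / len(order):.4f}")

delta, new_order = best_fit_insert(["GM02", "GM06", "GM10"], "GM03")
print(" ".join(new_order), f"*** {delta:.2f}")
```

The second printed value divides delta by the number of markers, which matches the second column of the log (1.52 / 3 = 0.5067). Inserting GM03 reproduces the log's choice, GM02 GM03 GM06 GM10 with delta 2.0.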
Visualization of numerical data using CheckMatrix.

MadMapper_XDELTA run example: construction of a framework map and best-fit extension for the remaining markers. The frame map (GM02, GM06, GM10) is calculated by checking all possible combinations; markers GM03, GM08 and GM09 are then inserted one at a time. Each result line lists the marker order, the total delta, the delta divided by the number of markers, and the candidate number:

    =============================================
    MATRIX (ALL PAIRS) : madmapper_test_small.out.pairs_all
    MARKERS TO MAP : madmapper_test_small.list
    FRAME MARKERS LIST : madmapper_test_small.frame
    OUTPUT MAP FILE : madmapper_test_small.xdelta
    MAX FRAME LENGTH : 12
    FIXED FRAME ORDER : FALSE
    LINKAGE GROUP ID : LG
    DUMMY DEBUG : TRUE
    =============================================
    =======
    GM02 GM06 GM10 *** 1.52 *** 0.5067 *** 1
    GM02 GM10 GM06 *** 1.92 *** 0.64 *** 2
    GM06 GM02 GM10 *** 1.68 *** 0.56 *** 3
    =======
    GM03 GM02 GM06 GM10 *** 2.16 *** 0.54 *** 1
    GM02 GM03 GM06 GM10 *** 2.0 *** 0.5 *** 2
    GM02 GM06 GM03 GM10 *** 2.64 *** 0.66 *** 3
    GM02 GM06 GM10 GM03 *** 3.2 *** 0.8 *** 4
    =======
    GM08 GM02 GM03 GM06 GM10 *** 3.64 *** 0.728 *** 1
    GM02 GM08 GM03 GM06 GM10 *** 4.32 *** 0.864 *** 2
    GM02 GM03 GM08 GM06 GM10 *** 3.28 *** 0.656 *** 3
    GM02 GM03 GM06 GM08 GM10 *** 2.56 *** 0.512 *** 4
    GM02 GM03 GM06 GM10 GM08 *** 3.16 *** 0.632 *** 5
    =======
    GM09 GM02 GM03 GM06 GM08 GM10 *** 4.8 *** 0.8 *** 1
    GM02 GM09 GM03 GM06 GM08 GM10 *** 5.92 *** 0.9867 *** 2
    GM02 GM03 GM09 GM06 GM08 GM10 *** 4.72 *** 0.7867 *** 3
    GM02 GM03 GM06 GM09 GM08 GM10 *** 3.76 *** 0.6267 *** 4
    GM02 GM03 GM06 GM08 GM09 GM10 *** 3.12 *** 0.52 *** 5
    GM02 GM03 GM06 GM08 GM10 GM09 *** 3.52 *** 0.5867 *** 6

MadMapper_XDELTA map output: a tab-delimited text file with the ordered markers and detailed information about adjacent recombination scores. For each marker B, with marker A above it and marker C below it: DST1 = distance [A-B], DST2 = distance [B-C], DST3 = distance [A-C], SUMM = [A-B] + [B-C], and DIFF = ([A-B] + [B-C]) - [A-C].

    LG  MARKER     POS  DST1    DST2    DST3    SUMM    DIFF     STATUS  CLASS
    2   G4553      0    0       NNNNNN  NNNNNN  NNNNNN  NNNNNN   NNNNN
    2   M246       1    0.043   0.0213  0.0778  0.0643  -0.0135  GOOD    __0__
    2   MI320      2    0.0213  0       0.0211  0.0213  0.0002   GOOD    __0__
    ...
    2   NGA1126    26   0.0225  0.0645  0.0702  0.087   0.0168   GOOD    __0__
    2   SGCSNP135  27   0.0645  0.0645  0.0842  0.129   0.0448   GOOD    __0__
    2   MI54       28   0.0645  0.0211  0.0968  0.0856  -0.0112  GOOD    __0__
    2   VE014      29   0.0211  0.0532  0.0745  0.0743  -0.0002  GOOD    __0__
    2   M283       30   0.0532  0.1803  0.1833  0.2335  0.0502   GOOD    __1__
    2   SGCSNP333  31   0.1803  0.1167  0.0968  0.297   0.2002   GOOD    LARGE
    2   SGCSNP210  32   0.1167  0       0.1154  0.1167  0.0013   GOOD    __0__
    2   COP1       33   0       0.0196  0.0263  0.0196  -0.0067  GOOD    __0__
    2   SPL3       34   0.0196  0.04    0.0339  0.0596  0.0257   GOOD    __0__
    2   C4H        35   0.04    0.0227  0.0303  0.0627  0.0324   GOOD    __0__
    ...
    2   M336       54   0.0519  0.0625  0       0.1144  0.1144   GOOD    __X__
    2   UBIQUE     55   0.0625  0.0526  0.0619  0.1151  0.0532   GOOD    __1__
    2   MI79A      56   0.0526  0.0781  0.0645  0.1307  0.0662   GOOD    __1__
    2   ATHB7      57   0.0781  0.1579  0.1698  0.236   0.0662   GOOD    __1__
    2   SGCSNP214  58   0.1579  0.1429  0.1667  0.3008  0.1341   GOOD    __X__
    2   SGCSNP198  59   0.1429  NNNNNN  NNNNNN  NNNNNN  NNNNNN   NNNNN

MadMapper BIT scoring (dots mark the RILs in which the two markers differ):

    #################################################################
    #
    #  EXAMPLES OF SCORING:
    #
    #  POSITIVE LINKAGE:
    #
    #  AAAAAAAAAAAAAAAAAAAA   BIT SCORE = 6*20 = 120
    #  AAAAAAAAAAAAAAAAAAAA   REC SCORE = 0 (0.0)
    #                    ..
    #  AAAAAAAAAAAAAAAAAAAA   BIT SCORE = 6*18 - 6*2 = 96
    #  AAAAAAAAAAAAAAAAAABB   REC SCORE = 2 (2/20 = 0.1)
    #
    #  AAAAAAAAAABBBBBBBBBB   BIT SCORE = 6*10 + 6*10 = 120
    #  AAAAAAAAAABBBBBBBBBB   REC SCORE = 0 (0.0)
    #           ..
    #  AAAAAAAAABABBBBBBBBB   BIT SCORE = 6*18 - 6*2 = 96
    #  AAAAAAAAAABBBBBBBBBB   REC SCORE = 2 (2/20 = 0.1)
    #
    #  NO LINKAGE:
    #            ..........
    #  AAAAAAAAAAAAAAAAAAAA   BIT SCORE = 6*10 - 6*10 = 0
    #  AAAAAAAAAABBBBBBBBBB   REC SCORE = 10 (10/20 = 0.5)
    #   . . . . . . . . . .
    #  BBBAABBAAAAAAABAABBB   BIT SCORE = 6*10 - 6*10 = 0
    #  BABBAABBABABABBBAABA   REC SCORE = 10 (10/20 = 0.5)
    #
    #  NEGATIVE LINKAGE:
    #    ..................
    #  AAAAAAAAAAAAAAAAAAAA   BIT SCORE = 6*2 - 6*18 = -96
    #  AABBBBBBBBBBBBBBBBBB   REC SCORE = 18 (18/20 = 0.9)
    #    ..................
    #  ABABABABABABABABABAB   BIT SCORE = 6*2 - 6*18 = -96
    #  ABBABABABABABABABABA   REC SCORE = 18 (18/20 = 0.9)
    #
    #################################################################
    #
    #  GENOTYPES: A – 1st parent; B – 2nd parent; C – NOT A (H or B);
    #             D – NOT B (H or A); H – A and B; "-" – missing
    #
    #  SCORING SYSTEM: BIT score (upper value) and REC score (lower
    #  value) for each pair of genotypes
    #
    #             A     B     C     D     H     -
    #  A  BIT     6    -6    -4     4    -2     0
    #     REC     0     1     1     0   0.5     0
    #  B  BIT    -6     6     4    -4    -2     0
    #     REC     1     0     0     1   0.5     0
    #  C  BIT    -4     4     4    -4     0     0
    #     REC     1     0     0     1     0     0
    #  D  BIT     4    -4    -4     4     0     0
    #     REC     0     1     1     0     0     0
    #  H  BIT    -2    -2     0     0     2     0
    #     REC   0.5   0.5     0     0     0     0
    #  -  BIT     0     0     0     0     0     0
    #     REC     0     0     0     0     0     0
    #
    #################################################################

MadMapper_RECBIT dataflow: input and output files.
Genetic map visualization using CheckMatrix: two-dimensional heat plot of recombination scores between all pairs of markers, with detection of a problematic marker.
Inference of the linear order of markers using MadMapper_XDELTA.

MadMapper_RECBIT, MadMapper_XDELTA and CheckMatrix: Python programs to infer orders of genetic markers and for visualization and validation of genetic maps and haplotypes (detailed description of dataflow)
http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
Alexander Kozik and Richard Michelmore. UC Davis Genome Center

General procedure to construct a genetic map using the MadMapper suite:
1 – grouping of markers using MadMapper_RECBIT
2 – selection of up to ten core markers per linkage group
3 – construction of a frame map from the core markers by checking all possible combinations
4 – best-fit extension for the remaining markers (the optional shuffle/ripple function can dramatically improve map quality, but it increases the time needed for map construction)
5 – visualization of the constructed map using CheckMatrix
6 – examination of the MadMapper_XDELTA text output files
7 – attempt to re-map markers (if required) that do not fit well into the major framework
8 – construction, visualization and examination of the final map

Once a large framework map has been constructed, new markers can be added relatively quickly without changing the order of the core markers: the framework map is used with a fixed order to find the best positions for the new markers.
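The BIT/REC scoring matrix above can be applied RIL by RIL and summed over a marker pair. A minimal sketch, with the matrix transcribed into dictionaries; the function name score_pair is hypothetical, and dividing the REC count by the number of RILs scored for both markers (ignoring '-') is an assumption beyond the complete-data examples in the box.

```python
GENOTYPES = "ABCDH-"

# Rows/columns follow GENOTYPES; values transcribed from the BIT/REC scoring matrix.
BIT_ROWS = {
    "A": [6, -6, -4, 4, -2, 0],
    "B": [-6, 6, 4, -4, -2, 0],
    "C": [-4, 4, 4, -4, 0, 0],
    "D": [4, -4, -4, 4, 0, 0],
    "H": [-2, -2, 0, 0, 2, 0],
    "-": [0, 0, 0, 0, 0, 0],
}
REC_ROWS = {
    "A": [0, 1, 1, 0, 0.5, 0],
    "B": [1, 0, 0, 1, 0.5, 0],
    "C": [1, 0, 0, 1, 0, 0],
    "D": [0, 1, 1, 0, 0, 0],
    "H": [0.5, 0.5, 0, 0, 0, 0],
    "-": [0, 0, 0, 0, 0, 0],
}
BIT = {(g1, g2): v for g1, row in BIT_ROWS.items() for g2, v in zip(GENOTYPES, row)}
REC = {(g1, g2): v for g1, row in REC_ROWS.items() for g2, v in zip(GENOTYPES, row)}


def score_pair(scores_1, scores_2):
    """Return (BIT score, REC count, REC fraction) for two markers scored on the same RILs."""
    bit = sum(BIT[a, b] for a, b in zip(scores_1, scores_2))
    rec = sum(REC[a, b] for a, b in zip(scores_1, scores_2))
    # Assumption: the REC fraction is divided by the RILs scored for both markers.
    n = sum(a != "-" and b != "-" for a, b in zip(scores_1, scores_2))
    return bit, rec, (round(rec / n, 4) if n else None)


print(score_pair("AAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAAAABB"))  # positive linkage
print(score_pair("AAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAABBBBBBBBBB"))  # no linkage
print(score_pair("ABABABABABABABABABAB", "ABBABABABABABABABABA"))  # negative linkage
```

The three calls reproduce the box: (96, 2, 0.1), (0, 10, 0.5) and (-96, 18, 0.9). With 87 jointly scored RILs and one recombinant the sketch returns BIT 510 and REC 0.0115, the REC1/BIT1 pair listed for ARR7/COR47 in the trio output.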
Analysis of the MadMapper_RECBIT text output files provides:
1 – assignment of markers to particular linkage groups
2 – sorting of markers into different quality groups
3 – detection and discrimination of mis-scored markers
4 – selection of high-quality markers to build the core map
5 – creation of a non-redundant set of markers for further map construction

Trio analysis helps reveal markers that were most likely mis-scored and should be dropped from further analysis.

Side-by-side comparison of scores: JoinMap LOD, JoinMap recombination, MadMapper BIT and MadMapper haplotype distances (REC).

Best-fit extension: at each iteration, the proper position for the newly added marker is the one whose two-dimensional matrix has the lowest entropy.

Building of the framework map: the number of comparisons that have to be done to check all possible orders of markers:

    # of markers   # of comparisons
    3              3
    4              12
    5              60
    6              360
    7              2,520
    8              20,160
    9              181,440
    10             1,814,400

The locus file with raw marker scores is used as the initial input for the MadMapper_RECBIT program; the input files for MadMapper_XDELTA are usually output files from MadMapper_RECBIT.

Figure annotations:
- True bin
- iterations of clustering with incremental cutoff values
- arbitrary group ID after each round of clustering
- allele composition/distortion [ excess of 'B' alleles in this particular case ]
- marker ID
- marker map position or relative order [ order in this particular case ]
- high density of markers; low density of markers
- the lowest score for the best order of markers is highlighted in red
- confidence class for the correct marker position [ LARGE is bad ]
- a small absolute difference is good, a large one is bad
- separation of markers into two distinct linkage groups
- information about framework markers
- missing scores may create some problems when defining BINs
- MadMapper BIT scoring matrix; examples of BIT scoring
- pairwise distance matrix
- … continue until all markers are inserted and ordered
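The comparison counts in the table equal n!/2: there are n! permutations of n markers, and an order and its reverse describe the same map. A minimal sketch of exhaustive frame-map search that enumerates each order once; frame_orders and best_frame are hypothetical names, the markers are assumed to be distinct and sortable, and `score` stands in for whatever criterion is used (for example an XDELTA-style delta).

```python
from itertools import permutations
from math import factorial


def frame_orders(markers):
    """Yield every linear order once; an order and its reverse describe the same map."""
    for order in permutations(markers):
        if order[0] < order[-1]:  # keep only one orientation of each order
            yield order


def best_frame(markers, score):
    """Exhaustive frame-map search: return the order with the lowest score."""
    return min(frame_orders(markers), key=score)


for n in range(3, 11):
    print(f"{n} markers: {factorial(n) // 2:,}")

# Toy score (hypothetical): total gap between adjacent numeric "positions"
gaps = lambda order: sum(abs(a - b) for a, b in zip(order, order[1:]))
print(best_frame([3, 1, 2], gaps))  # (1, 2, 3)
```

The factorial growth is why the exhaustive step is limited to a small set of core markers and the rest are added by best-fit extension.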
MadMapper_XDELTA works with a non-redundant set of scores. Figure annotations: framework markers are highlighted in red; negative linkage between markers.

Locus file with raw marker scores: each allele is scored as 'A' or 'B'.

Generation of a segregating population: a collection (set) of recombinant inbred lines is obtained after several rounds of self-pollination. Genotyping: assignment of a particular allele score to each marker.

Getting from a set of recombinant inbred lines (RILs) to a data set genotyped with a thousand or more markers is a long process, and the management, processing and genetic mapping of thousands of markers simultaneously is not a trivial task. The MadMapper suite and CheckMatrix simplify the manipulation and analysis of genetic marker data and offer some features that other genetic mapping programs may lack. MadMapper and CheckMatrix perform well on large-scale genotyping data sets, such as those derived from SFP (single feature polymorphism) microarray analysis. Only one input file is required for map construction: the locus file with raw marker scores. However, the MadMapper pipeline has several major steps and produces dozens of output files, and successful genetic mapping requires understanding the purpose of each step and output file. This poster describes the dataflow in detail.

Data source: http://elp.ucdavis.edu/
West MA, van Leeuwen H, Kozik A, Kliebenstein DJ, Doerge RW, St Clair DA, Michelmore RW. High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis. Genome Res. 2006 Jun;16(6):787-795. [ PubMed:16702412 ]

Example project: construction of a high-density genetic map of Arabidopsis thaliana linkage group 1 based on Affymetrix microarray SFP genotyping data using MadMapper.
STEPS 1-2: marker grouping and selection of framework markers.
STEPS 3-4-5: map construction and visualization with CheckMatrix.
Comparison of the inferred order with the physical location of genes on the Arabidopsis genome.
Graphical genotyping: RILs are grouped and sorted according to their haplotype patterns; framework markers are highlighted in red.
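Graphical genotyping groups RILs that share the same haplotype along the ordered markers. A minimal sketch, assuming each RIL's haplotype is the string of its scores in map order and that groups are sorted lexicographically by that string; the poster does not specify the sorting criterion, and group_rils and the toy data are hypothetical.

```python
from collections import defaultdict


def group_rils(locus, marker_order):
    """Group RILs (1-based) sharing a haplotype along `marker_order`, sorted by pattern."""
    n_rils = len(locus[marker_order[0]])
    groups = defaultdict(list)
    for ril in range(n_rils):
        haplotype = "".join(locus[m][ril] for m in marker_order)
        groups[haplotype].append(ril + 1)
    return sorted(groups.items())


# Hypothetical toy data: 3 ordered markers scored on 6 RILs
LOCUS = {
    "M1": "A B A B A B".split(),
    "M2": "A B A A A B".split(),
    "M3": "A B B A A B".split(),
}

for haplotype, rils in group_rils(LOCUS, ["M1", "M2", "M3"]):
    print(haplotype, rils)
```

RILs 1 and 5 share the haplotype AAA and RILs 2 and 6 share BBB, so each pair is displayed as a single block in the graphical genotype.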

