Alexander Kozik and Richard Michelmore, UC Davis Genome Center


1 Alexander Kozik and Richard Michelmore, UC Davis Genome Center
Suite of Python MadMapper scripts for quality control of genetic markers, group analysis and inference of linear order of markers on linkage groups. Visualization and validation of genetic maps using two-dimensional CheckMatrix heat-plots.

2 Mapping using Recombinant Inbred Lines
1. Genetic Cross
2. Genotyping
3. Raw Marker Scores (locus file)
4. Mapping: Inference of Linear Order of Markers

3 CheckMatrix (py_matrix_2D_V248_RECBIT.py)
MadMapper and CheckMatrix are Python scripts and can be used on any computer platform: UNIX, Windows, Mac OS X. Grouping can be done on a set of ~2,000 markers; map construction works in a reasonable timeframe with up to ~500 markers.
- MadMapper_RECBIT: quality control of genetic markers and group analysis
- MadMapper_XDELTA: inference of linear order of markers on linkage groups
- CheckMatrix (py_matrix_2D_V248_RECBIT.py): visualization and validation of genetic maps using two-dimensional heat-plots and graphical genotyping

4 MadMapper_RECBIT input and output files
Program: Python_MadMapper_V248_RECBIT_012.py

INPUT: one input file, the locus file with raw marker scores. Example:

    GM01 A A A A A A A A A A A A A A A A B B B B B B B B B
    GM02 A A A A A A A A A A A A A A A B B B B B B B B B B
    GM03 A A A A A A A A A A A A A B B B B B B B B B B B B
    GM04 A A A A A A A A A A A B B B B B B B B B B B B B B
    GM05 A A A A A A A A A A B B B B B B B B B B B B B B B
    GM06 A A A A A A A A A B B B B B B B B B B B B B B B B
    GM07 A A A A A A A A A B B B B B B B B B B B B B B A A
    GM08 A A A A A A A A A B B B B B B B B B B B B B A A A
    GM09 A A A A A A A A A B B B B B B B B B B B A A A A A
    GM10 B A A A A A A A A A B B B B B B B B B A A A A A A
    GM11 B B A A A A A A A A B B B B B B B B A A A A A A A
    GM12 B B B A A A A A A A B B B B B B B A A A A A A A A

OUTPUT: 82 output files
- LOG file ( *.x_log_file ): information about run parameters
- Recombination Distance Scores [ *.pairs_all ]: pairwise scores between markers (example rows pair GM01 and GM02 with other GM markers; the score values are not preserved in this transcript)
- Marker Scores Info [ *.x_scores_stat ]: detailed information about scores and linkage
- Group Info [ *.group_info ]: one file per iteration; 16 iterations with different cutoff values
- Adjacency List [ *.adj_list ]: one file per iteration; 16 iterations with different cutoff values
- Group Info Summary [ *.x_tree_clust ]: summary of clustering results for all 16 iterations. Distinct linkage groups can be inferred by analysis of this clustering / grouping information
- Marker Summary [ *.z_marker_sum ]: a 'quality class' is assigned to each marker; useful for selection of 'core' markers
- Trio Analysis [ *.z_trio_good ] [ *.z_trio_best ] [ *.z_trios_bad ]: analysis of all trios (triplets) for the non-redundant set of markers
- Non-Redundant Marker Scores [ *.z_nr_scores.loc ]: locus file with non-redundant raw marker scores
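The slides do not describe how the per-cutoff groups are computed. As a rough illustration only, the sketch below assumes single-linkage grouping: two markers are linked when their recombination fraction is at or below a cutoff, and a group is a connected component of the resulting adjacency list. The function names, the marker names X1, X2, Y1, Y2 and their scores are hypothetical, not MadMapper's.

```python
def rec_fraction(a, b):
    """Fraction of lines where two markers differ (A/B calls only)."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "AB" and y in "AB"]
    return sum(x != y for x, y in pairs) / len(pairs)


def group_markers(scores, cutoff):
    """Assumed single-linkage grouping; not taken from the MadMapper source."""
    names = list(scores)
    # Build an adjacency list: markers are linked if REC <= cutoff.
    adj = {name: set() for name in names}
    for i, m in enumerate(names):
        for n in names[i + 1:]:
            if rec_fraction(scores[m], scores[n]) <= cutoff:
                adj[m].add(n)
                adj[n].add(m)
    # Collect connected components with a depth-first walk.
    groups, seen = [], set()
    for name in names:
        if name in seen:
            continue
        seen.add(name)
        stack, component = [name], []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbour in adj[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups


# Hypothetical scores for four markers across ten lines.
scores = {
    "X1": "AAAAABBBBB",
    "X2": "AAAABBBBBB",
    "Y1": "ABABABABAB",
    "Y2": "ABABABABBA",
}
strict = group_markers(scores, 0.05)   # nothing is linked
medium = group_markers(scores, 0.25)   # two groups
loose = group_markers(scores, 0.45)    # everything merges
print(strict, medium, loose)
```

Running the same data at several cutoffs mimics the idea of 16 iterations: groups that stay together across many cutoffs are candidates for distinct linkage groups.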

5 MadMapper BIT scoring system
GENOTYPES:
A - 1st; B - 2nd
C - NOT A ( H or B )
D - NOT B ( H or A )
H - A and B

SCORING SYSTEM: a matrix of genotypes A, B, C, D, H against A, B, C, D, H, with a BIT score and a REC score in each cell (the cell values are not preserved in this transcript).

EXAMPLES OF SCORING:

POSITIVE LINKAGE:

    AAAAAAAAAAAAAAAAAAAA   BIT SCORE = 6*20 = 120
    AAAAAAAAAAAAAAAAAAAA   REC SCORE = 0 (0.0)

    AAAAAAAAAAAAAAAAAAAA   BIT SCORE = 6*18 - 6*2 = 96
    AAAAAAAAAAAAAAAAAABB   REC SCORE = 2 (2/20 = 0.1)

    AAAAAAAAAABBBBBBBBBB   BIT SCORE = 6*10 + 6*10 = 120
    AAAAAAAAAABBBBBBBBBB   REC SCORE = 0 (0.0)

    AAAAAAAAABABBBBBBBBB   BIT SCORE = 6*18 - 6*2 = 96
    AAAAAAAAAABBBBBBBBBB   REC SCORE = 2 (2/20 = 0.1)

NO LINKAGE:

    AAAAAAAAAAAAAAAAAAAA   BIT SCORE = 6*10 - 6*10 = 0
    AAAAAAAAAABBBBBBBBBB   REC SCORE = 10 (10/20 = 0.5)

    BBBAABBAAAAAAABAABBB   BIT SCORE = 6*10 - 6*10 = 0
    BABBAABBABABABBBAABA   REC SCORE = 10 (10/20 = 0.5)

NEGATIVE LINKAGE:

    AAAAAAAAAAAAAAAAAAAA   BIT SCORE = 6*2 - 6*18 = -96
    AABBBBBBBBBBBBBBBBBB   REC SCORE = 18 (18/20 = 0.9)

    ABABABABABABABABABAB   BIT SCORE = 6*2 - 6*18 = -96
    ABBABABABABABABABABA   REC SCORE = 18 (18/20 = 0.9)
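The worked examples can be reproduced with a few lines of Python. This is a minimal sketch, not the MadMapper code: the +6 / -6 weights are read off the examples (6 per matching A/A or B/B pair, minus 6 per A/B mismatch), the function name is hypothetical, and genotypes C, D and H are simply skipped because their matrix values are not available here.

```python
def bit_rec_scores(marker1, marker2, match=6, mismatch=-6):
    """Return (BIT score, REC count, REC fraction) for two A/B score strings."""
    bit = 0
    rec = 0
    compared = 0
    for x, y in zip(marker1, marker2):
        # Assumption: only plain A and B calls are scored in this sketch.
        if x not in ("A", "B") or y not in ("A", "B"):
            continue
        compared += 1
        if x == y:
            bit += match
        else:
            bit += mismatch
            rec += 1
    fraction = (rec / compared) if compared else None
    return bit, rec, fraction


positive = bit_rec_scores("A" * 20, "A" * 18 + "BB")
no_linkage = bit_rec_scores("A" * 20, "A" * 10 + "B" * 10)
negative = bit_rec_scores("A" * 20, "AA" + "B" * 18)
print(positive, no_linkage, negative)
```

A strongly positive BIT score with a REC fraction near 0 indicates linkage, a BIT score near 0 with REC near 0.5 indicates no linkage, and a strongly negative BIT score with REC near 1 indicates negative linkage.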

6 Arabidopsis Genetic Map: Comparison of Different Scoring Systems
[Figure: four panels comparing scoring systems: JoinMap LOD scores, JoinMap REC scores, MadMapper BIT scores, MadMapper REC scores]

7 MadMapper_RECBIT Clustering: Group Info Summary [ *.x_tree_clust file ]

8 MadMapper_RECBIT BIN Analysis
    M_1 A A A B B B A A A A B B B B A A - A A B B B B A B B A A B A A A B B B B
    M_2 A A A B B B A - A A B B B B A A A A A B B B B A B B A A B A A A B B B B
    M_3 A A A B B B A A A A B B - B A A A A A B B B B A B B A A B A A A B B B B
    M_4 A A A B B B A A A A B B A B A A A A A B B - B A B B A A B A A A B B B B

[Figure: graph of markers M_1, M_2, M_3 and M_4 as a Linked Group, with a Diluted Node and a Saturated Node labelled. Example of Complete Graph: all nodes are 'saturated'.]

9 MadMapper_RECBIT Marker Summary [ *.z_marker_sum file ]

10 MadMapper_RECBIT Trio (Triplet) Analysis
    M_1 A A A B B B A A A A B B B B A A - A A B B B B A B B A A B A A A B B B B   flanking marker 1
    M_M A A A B A B A - A A B B B B A B A A A B B A B A B B A A B A A A B B A B   'middle' marker
    M_2 A A A B B B A A A A B B - B A A A A A B B B B A B B A A B A A A B B B B   flanking marker 2

Four positions are marked with X on the slide.

Columns of the trio files: MARKER_Flank_1, REC1, BIT1, D_FR1, MARKER_Middle, REC2, BIT2, D_FR2, MARKER_Flank_2, REC_Flank, BIT_F, D_FR_F, D_REC, D_REC_Flank

Example rows (flattened in the transcript; some cells are missing):
COR47 0.1 336 0.6931 CAT3 0.0857 348 G2395 *** 0.0253 450 0.7822 5 + LK141 0.1522 192 0.4554 GUT15 0.193 210 0.5644 MI238 0.1231 294 0.6436 4 MI204 0.051 528 0.9703 MI51 0.0806 312 0.6139 SGCSNP41 0.0161 360 M336 0.0494 438 0.802 COR15 0.0385 432 0.7723 VE018 0.0879 0.901 ARR7 0.0115 510 0.8614 282 0.4653 F15571 0.0179 324 0.5545 PAP3 0.0227 504 0.8713 COR78 0.0577 276 0.5149 PDC2 0.0588 270 0.505

Bad Trios: the number of double crossovers is high for bad trios.
Good Trios: the number of double crossovers should be low for good trios.
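Counting double crossovers in a trio can be sketched as follows. The definition used here is an assumption, not taken from the MadMapper source: a line counts as a double crossover when both flanking markers agree and the middle marker differs, and lines with a missing score ('-') in any of the three markers are skipped. The function name is hypothetical; the scores are the M_1, M_M and M_2 rows from the slide.

```python
def count_double_crossovers(flank1, middle, flank2, missing="-"):
    """Count lines where the middle marker disagrees with two agreeing flanks."""
    count = 0
    for f1, m, f2 in zip(flank1, middle, flank2):
        if missing in (f1, m, f2):
            continue  # assumption: missing data is not counted
        if f1 == f2 and m != f1:
            count += 1
    return count


m_1 = "A A A B B B A A A A B B B B A A - A A B B B B A B B A A B A A A B B B B".split()
m_m = "A A A B A B A - A A B B B B A B A A A B B A B A B B A A B A A A B B A B".split()
m_2 = "A A A B B B A A A A B B - B A A A A A B B B B A B B A A B A A A B B B B".split()

double_crossovers = count_double_crossovers(m_1, m_m, m_2)
print(double_crossovers)
```

Under this definition the example trio has four double crossovers, which agrees with the four positions marked on the slide.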

11 CheckMatrix Usage: three input files are required
CheckMatrix (py_matrix_2D_V248_RECBIT.py) requires three input files:
- Map file (the example rows list LG and GM marker IDs; the remaining values are not preserved in this transcript)
- Matrix file (the example rows pair GM01 and GM02 with other GM markers; the score values are not preserved in this transcript)
- Locus file (raw marker scores; the example is the GM01-GM12 locus file used as MadMapper_RECBIT input)

Upon program execution, three output files are generated:
- HEAT PLOT: helps to validate the quality of the constructed genetic map and to identify markers placed in the wrong position
- GRAPHICAL GENOTYPING: visualization of haplotypes per recombinant line (suspicious double crossovers are highlighted)
- CIRCULAR GRAPH: helps to validate the genetic map and to identify markers with spurious linkage

12 Genetic Map Visualization using CheckMatrix
[ good map ]

13 Genetic Map Visualization using CheckMatrix
[ wrong map ]

14 Genetic Map Visualization using CheckMatrix
[ disordered markers ]

15 Minimum Entropy Approach to Infer Linear Order
Using the MadMapper_XDELTA program.

MadMapper_XDELTA analyzes two-dimensional matrices of all pairwise scores and finds the best map: the one with the minimal total sum of differences between adjacent cells (the map with the lowest 'entropy').

[Figure: CheckMatrix 2D plots for three marker orders: random order (high 'entropy'), partially wrong order, right order (low 'entropy'). Visualization of numerical data using CheckMatrix.]
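The 'entropy' of a candidate order can be sketched as below. This is an illustration under stated assumptions, not the MadMapper_XDELTA implementation: differences are taken as absolute values between horizontally and vertically adjacent cells, and the marker names, positions and the function name are made up for the example.

```python
def matrix_entropy(order, dist):
    """Sum of absolute differences between adjacent cells of the reordered matrix."""
    matrix = [[dist[a][b] for b in order] for a in order]
    n = len(order)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if j + 1 < n:  # neighbour to the right
                total += abs(matrix[i][j] - matrix[i][j + 1])
            if i + 1 < n:  # neighbour below
                total += abs(matrix[i][j] - matrix[i + 1][j])
    return total


# Hypothetical markers at made-up positions on one linkage group.
positions = {"M1": 0.0, "M2": 0.1, "M3": 0.2, "M4": 0.3}
dist = {a: {b: abs(positions[a] - positions[b]) for b in positions}
        for a in positions}

right_order = ["M1", "M2", "M3", "M4"]
wrong_order = ["M1", "M3", "M2", "M4"]
entropy_right = matrix_entropy(right_order, dist)
entropy_wrong = matrix_entropy(wrong_order, dist)
print(entropy_right, entropy_wrong)
```

With the right order the values change smoothly away from the diagonal, so the sum is smaller than for the order with M2 and M3 swapped.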

16 MINIMUM ENTROPY APPROACH TO INFER LINEAR ORDER OF MARKERS:
[Figure: CheckMatrix Color Scheme; two-dimensional matrix of recombination pairwise scores with adjacent cells (values) marked.]
Numerical data generated by MadMapper; visualization of numerical data using CheckMatrix.

17 MadMapper_XDELTA Usage
MadMapper_XDELTA takes three files as input:
- Matrix (pairwise distances between markers)
- List of 'frame' markers
- List of markers to map

First step: find the best map for the 'frame' markers by checking all possible combinations (up to 10 markers); optionally, an unlimited list of 'frame' markers with fixed order.

Best-Fit extension:
- Take one marker from the list of markers to map and insert it into the two-dimensional matrix of the current best map. Check all possible positions.
- Calculate 'delta' and find the map with the lowest 'delta' value (lowest 'entropy').
- Move to the next marker to map until all markers are mapped.
- Optional shuffling (ripple) after several steps.
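The two steps can be sketched in Python as follows. This is a simplified illustration, not the MadMapper_XDELTA code: 'delta' is assumed to be the sum of absolute differences between adjacent matrix cells, the optional ripple step is left out, all function names are hypothetical, and the marker positions are made up (only the marker names are borrowed from the test data set).

```python
from itertools import permutations


def delta(order, dist):
    """Assumed 'delta': sum of absolute differences between adjacent cells."""
    matrix = [[dist[a][b] for b in order] for a in order]
    n = len(order)
    total = 0
    for i in range(n):
        for j in range(n):
            if j + 1 < n:
                total += abs(matrix[i][j] - matrix[i][j + 1])
            if i + 1 < n:
                total += abs(matrix[i][j] - matrix[i + 1][j])
    return total


def best_frame(frame, dist):
    """First step: check all orders of the frame markers."""
    return list(min(permutations(frame), key=lambda o: delta(o, dist)))


def best_fit_extension(frame_order, to_map, dist):
    """Insert each marker at the position that gives the lowest delta."""
    current = list(frame_order)
    for marker in to_map:
        candidates = [current[:i] + [marker] + current[i:]
                      for i in range(len(current) + 1)]
        current = min(candidates, key=lambda o: delta(o, dist))
    return current


# Made-up positions; distances are absolute position differences.
positions = {"GM02": 2, "GM03": 3, "GM06": 6, "GM08": 8, "GM09": 9, "GM10": 10}
dist = {a: {b: abs(positions[a] - positions[b]) for b in positions}
        for a in positions}

frame = best_frame(["GM02", "GM06", "GM10"], dist)
final_map = best_fit_extension(frame, ["GM03", "GM08", "GM09"], dist)
print(frame, final_map)
```

An order and its mirror image have the same delta, so only half of the permutations are distinct; this sketch does not exploit that and simply keeps the first minimum found.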

18 Example of Best-Fit Extension
    =============================================
    MATRIX (ALL PAIRS) : madmapper_test_small.out.pairs_all
    MARKERS TO MAP : madmapper_test_small.list
    FRAME MARKERS LIST : madmapper_test_small.frame
    OUTPUT MAP FILE : madmapper_test_small.xdelta
    MAX FRAME LENGTH : 12
    FIXED FRAME ORDER : FALSE
    LINKAGE GROUP ID : LG DUMMY DEBUG : TRUE
    =======

Map calculated by checking of all possible combinations:

    GM02 GM06 GM10 *** *** *** 1
    GM02 GM10 GM06 *** *** *** 2
    GM06 GM02 GM10 *** *** *** 3

Marker GM03 was inserted:

    GM03 GM02 GM06 GM10 *** *** *** 1
    GM02 GM03 GM06 GM10 *** *** *** 2
    GM02 GM06 GM03 GM10 *** *** *** 3
    GM02 GM06 GM10 GM03 *** *** *** 4

Marker GM08 was inserted:

    GM08 GM02 GM03 GM06 GM10 *** *** *** 1
    GM02 GM08 GM03 GM06 GM10 *** *** *** 2
    GM02 GM03 GM08 GM06 GM10 *** *** *** 3
    GM02 GM03 GM06 GM08 GM10 *** *** *** 4
    GM02 GM03 GM06 GM10 GM08 *** *** *** 5

Marker GM09 was inserted:

    GM09 GM02 GM03 GM06 GM08 GM10 *** *** *** 1
    GM02 GM09 GM03 GM06 GM08 GM10 *** *** *** 2
    GM02 GM03 GM09 GM06 GM08 GM10 *** *** *** 3
    GM02 GM03 GM06 GM09 GM08 GM10 *** *** *** 4
    GM02 GM03 GM06 GM08 GM09 GM10 *** *** *** 5
    GM02 GM03 GM06 GM08 GM10 GM09 *** *** *** 6

19 MadMapper_XDELTA Map Output
For each marker B, with A the marker above and C the marker below:
- DST1: Distance [A-B]
- DST2: Distance [B-C]
- DST3: Distance [A-C]
- SUMM: [A-B] + [B-C]
- DIFF: ([A-B] + [B-C]) - [A-C]

Columns: LG, MARKER, POS, #1#, DST1, #2#, DST2, #3#, DST3, #S#, SUMM, #D#, DIFF, STATUS, CLASS

Example rows (as extracted; some cells are missing):

    2 G4553 NNNNNN NNNNN
    M246 1 0.043 0.0213 0.0778 0.0643 GOOD __0__
    MI320 0.0211 0.0002
    ..
    NGA1126 26 0.0225 0.0645 0.0702 0.087 0.0168
    SGCSNP135 27 0.0842 0.129 0.0448
    MI54 28 0.0968 0.0856
    VE014 29 0.0532 0.0745 0.0743
    M283 30 0.1803 0.1833 0.2335 0.0502 __1__
    SGCSNP333 31 0.1167 0.297 0.2002 LARGE
    SGCSNP210 32 0.1154 0.0013
    COP1 33 0.0196 0.0263
    SPL3 34 0.04 0.0339 0.0596 0.0257
    C4H 35 0.0227 0.0303 0.0627 0.0324
    M336 54 0.0519 0.0625 0.1144 __X__
    UBIQUE 55 0.0526 0.0619 0.1151
    MI79A 56 0.0781 0.1307 0.0662
    ATHB7 57 0.1579 0.1698 0.236
    SGCSNP214 58 0.1429 0.1667 0.3008 0.1341
    SGCSNP198 59
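The SUMM and DIFF columns follow directly from the three distances. A minimal sketch, using the NGA1126 row as the worked case; the function name is hypothetical, and the thresholds behind the STATUS and CLASS columns are not given on the slide, so they are not modelled here.

```python
def triplet_summary(dst_ab, dst_bc, dst_ac):
    """Return (SUMM, DIFF) for marker B with neighbours A (above) and C (below)."""
    summ = dst_ab + dst_bc   # [A-B] + [B-C]
    diff = summ - dst_ac     # ([A-B] + [B-C]) - [A-C]
    return summ, diff


# NGA1126 row: DST1 = 0.0225, DST2 = 0.0645, DST3 = 0.0702
summ, diff = triplet_summary(0.0225, 0.0645, 0.0702)
print(round(summ, 4), round(diff, 4))
```

Rounded to four decimals, the result agrees with the SUMM (0.087) and DIFF (0.0168) values listed for NGA1126. A DIFF close to zero means the two short distances add up to the long one, as expected for three markers in the right order.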

21 Physical order of markers (based on BLAST search)
Side-by-side comparison of the linear order of markers on the Arabidopsis genome inferred by three different approaches (mapping programs), and comparison with the physical order of markers (Col-0 genomic sequence): MadMapper_XDELTA (minimum entropy approach), JoinMap (maximum likelihood) and RECORD (minimum number of recombination events).

[Figure: diagonal dot-plots created using GenoPix_2D_Plotter. Axes: physical order of markers (based on BLAST search) against the order of markers inferred by the mapping programs. Panels: MadMapper, JoinMap, RECORD.]

22 Arabidopsis Genetic Map constructed by MadMapper and visualized with CheckMatrix: 2D Heat Plot
[Figure: 2D heat plot with linkage groups I, II, III, IV and V on both axes. Annotations: Main Diagonal with Linked Markers; Regions with Negative Linkage; Regions with Quasi Linkage; High Density of Markers; Low Density of Markers; Allele Composition per Marker.]

23 Arabidopsis Genetic Map constructed by MadMapper
Arabidopsis Genetic Map constructed by MadMapper and visualized with CheckMatrix: Graphical Genotyping

[Figure: graphical genotyping for linkage groups I, II, III, IV and V.]

24 REFERENCES AND DATA SOURCES:
1. Dean and Lister Arabidopsis Genetic Map and Raw Data
2. MadMapper
3. JoinMap
4. RECORD
5. GenoPix_2D_Plotter

CREDITS: This work was funded by NSF grant # to Compositae Genome Consortium

PAG-14 POSTERS WITH EXAMPLES OF MADMAPPER USAGE:
- #P751 High-Density Haplotyping With Microarray-Based Single Feature Polymorphism Markers In Arabidopsis
- #P761 Gene Expression Markers: Using Transcript Levels Obtained From Microarrays To Genotype A Segregating Population
- #P957 MadMapper And CheckMatrix - Python Scripts To Infer Orders Of Genetic Markers And For Visualization And Validation Of Genetic Maps And Haplotypes

