Winners and Losers: Ranking Crystals from Diffraction Images Angela R. Criswell Automation Scientist.


ACTOR Installations

Pharmaceutical Companies (11)
  - Abbott Laboratories (Chicago, IL)
  - Astex Technology (UK)
  - AstraZeneca (UK)
  - Aventis (Frankfurt)
  - BMS (Princeton, NJ)
  - Exelixis (San Francisco, CA)
  - Merck (West Point, PA)
  - Novartis (Basel, Switzerland)
  - Novartis (Cambridge, MA)
  - Pfizer (St. Louis, MO)
  - Schering-Plough Research Inst. (NJ)
Structural Genomics Groups (3)
  - SGC – Oxford (UK)
  - University of Georgia
  - University of Toronto
Beamlines (2)
  - Daresbury Laboratory (UK)
  - IMCA-CAT (APS)
Future Installations (4)
  - 2 additional beamlines (SLS, Diamond)
  - 1 pharmaceutical company
AGENT Installations (3)
  - ActiveSight (San Diego, CA)
  - 2 future pharmaceutical sites

High Throughput Optimization

Automate the processes
  - Crystallization robots
  - Sample mounting robots
  - Automated structure solution
Increase robustness for automated processes
  - Hardware and software improvements
  - Sample tracking methods and database management
Ever increasing complexity
  - Incorporate intelligence and examine success/failure
  - Heuristic and learning methods
  - Remote access and control of automated processes
  - VNC and mail-in crystallography
  - Diffraction improvement by controlled hydration
  - Free-mounting system (Proteros)

Crystal Ranking: An Evolution

How do Crystallographers Rank Crystals?

  - Do I have another crystal??
  - Is the crystal twinned?
  - How far does the crystal diffract?
  - Are there ice rings?
  - Do peaks have decent spot shapes?
  - Can I assign a unit cell for the sample?
  - What are the unit cell dimensions and space group?

I/sig(I) analysis is not sufficient
Single image is probably not sufficient

Crystal Ranking Efforts

d*TREK (Rigaku/MSC - Pflugrath)
  - Automatic indexing, ranking, strategy, integration, scaling
DISTL and LABELIT (SSRL & LBNL)
  - Automatic ranking and indexing, data processing
DNA (SPINE)
  - Automatic ranking and indexing
CrySis (Brookhaven – Berntson, Stojanoff, and Takai)
  - Ranking with a neural network trained with 500 diffraction images
BEST (EMBL – Popov)
  - Data collection strategy based upon statistical modeling

SpamAssassin

Email SCORE: Advertisement for SuperBowl Celebration Event
  - No. hits=3.9 Required=4.0
  - tests= HTML_60_70 HTML_FONTCOLOR_RED HTML_FONTCOLOR_UNSAFE HTML_FONT_INVISIBLE HTML_MESSAGE HTTP_ESCAPED_HOST HTTP_EXCESSIVE_ESCAPES LINES_OF_YELLING
Performs cursory header analysis: spots emails that try to mask their identities
Performs in-depth text analysis: spam mails often have a characteristic style (to put it politely)
  - Characteristic disclaimers and lots of !!!!!
  - Webpage links
Enables blacklisting: blocks email from existing blacklist sites
Adaptive: learns to recognize spam based upon user scores and amends blacklists

Strategic Ranking Goals

Incorporate image analysis tools alone
  - Diffraction limits
  - Bragg peak intensities
  - Background radiation
  - Ice ring identification – strong and diffuse
Incorporate indexing and refinement results
  - Spot shape
  - Lattice quality
  - Spot prediction analysis (discriminates twinned from non-twinned crystals)
Incorporate comparative analysis
  - Between samples (rank comparisons)
  - Images collected for same sample (different crystal orientations)
  - Automatic exposure time determination

Rules 1 and 2

Divide image into 10 resolution bins. Ignore lowest 3 bins. Analyze 7 highest resolution shells:
  - Rule 1: # reflns / shell
  - Rule 2: S:N of reflns / shell
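The binning behind Rules 1 and 2 can be sketched as follows. This is a minimal illustration, not the d*TREK implementation: the slide does not say how the 10 bins are defined, so equal steps in 1/d^3 are assumed here, and the `Spot` class, `shell_index` and `shell_stats` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    d: float          # resolution of the spot in Å
    intensity: float  # integrated intensity I
    sigma: float      # estimated standard deviation of I


def shell_index(d, d_max, d_min, n_bins=10):
    """Return the 0-based bin of a spot (0 = lowest resolution), or None if out of range.

    Assumption: bins span equal steps in 1/d^3 between d_max and d_min.
    """
    lo, hi = d_max ** -3, d_min ** -3
    x = d ** -3
    if x < lo or x > hi:
        return None
    # A spot exactly at d_min would land in bin n_bins, so clamp it into the last bin.
    return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)


def shell_stats(spots, d_max, d_min, n_bins=10, n_skip=3):
    """Spot count and mean I/sigma for the (n_bins - n_skip) highest-resolution bins."""
    acc = {i: [0, 0.0] for i in range(n_skip, n_bins)}  # bin -> [count, sum of I/sigma]
    for s in spots:
        i = shell_index(s.d, d_max, d_min, n_bins)
        if i is None or i < n_skip or s.sigma <= 0:
            continue  # ignore the lowest bins and unusable spots
        acc[i][0] += 1
        acc[i][1] += s.intensity / s.sigma
    return [
        {"shell": i, "count": n, "mean_i_over_sigma": total / n if n else 0.0}
        for i, (n, total) in acc.items()
    ]


spots = [Spot(10.0, 100, 10), Spot(2.5, 50, 5), Spot(2.5, 30, 10), Spot(2.0, 20, 4)]
result = shell_stats(spots, d_max=10.0, d_min=2.0)
```

In this example the 10 Å spot falls in an ignored low-resolution bin, so only the three higher-resolution spots contribute to the per-shell counts and signal-to-noise averages.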

Rule 3: Spot Sharpness

Calculated for every peak:

  $\mathrm{output} = \mathrm{avg}\left( 2\,\frac{A}{B} \right)$

  $A = \text{peak max position} - \text{peak center position}$

  $B = \left( \Delta x^{2} + \Delta y^{2} \right)^{1/2}$

B is the effective diameter of the peak.

[Figure: peak profile with positions x1 and x2 marked]
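A small sketch of the Rule 3 quantity is given below. Several things are assumptions rather than facts from the slide: A is taken as the Euclidean distance between the peak maximum and the peak center, Δx and Δy are taken as the extent of the peak along each detector axis, and the function names are hypothetical.

```python
import math


def spot_sharpness(peak_max, centroid, dx, dy):
    """Return 2A/B for one peak.

    peak_max, centroid: (x, y) pixel positions of the peak maximum and peak center.
    dx, dy: peak extent along x and y (assumed meaning of the slide's delta-x, delta-y).
    """
    a = math.hypot(peak_max[0] - centroid[0], peak_max[1] - centroid[1])
    b = math.hypot(dx, dy)  # effective diameter of the peak
    return 2.0 * a / b


def mean_sharpness(peaks):
    """Average 2A/B over all peaks; each peak is (peak_max, centroid, dx, dy)."""
    values = [spot_sharpness(*p) for p in peaks]
    return sum(values) / len(values)


peaks = [
    ((10.0, 10.0), (10.3, 10.4), 3.0, 4.0),  # maximum 0.5 px from center, diameter 5 px
    ((50.0, 20.0), (50.0, 20.0), 3.0, 4.0),  # maximum exactly at the center
]
value = mean_sharpness(peaks)
```

With this definition a peak whose maximum coincides with its center contributes 0, and the value grows as the maximum moves away from the center relative to the peak diameter.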

Rules 4 – 5: Ice Ring Detection

Step 1: filter out peaks from images
Step 2: bin pixels by 2θ
Step 3: for each bin, sum pixel intensities

[Example plot]
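The three steps can be sketched in plain Python as below. This is only an illustration under stated assumptions: the peak filter is represented by a precomputed boolean mask (the slide does not say how peaks are filtered), the detector is assumed flat and perpendicular to the beam, and `two_theta_profile` and its parameters are hypothetical names. How a ring is then recognized in the profile is not described on the slide and is not attempted here.

```python
import math


def two_theta_profile(image, peak_mask, beam_xy, pixel_mm, distance_mm, bin_deg=0.5):
    """Sum non-peak pixel intensities in bins of 2-theta.

    image: 2D list of pixel values, indexed image[row][col].
    peak_mask: 2D list of booleans, True where a pixel belongs to a Bragg peak.
    beam_xy: (col, row) of the direct beam on the detector.
    Returns a dict mapping bin index -> summed intensity.
    """
    sums = {}
    for row, (pixels, mask_row) in enumerate(zip(image, peak_mask)):
        for col, (value, is_peak) in enumerate(zip(pixels, mask_row)):
            if is_peak:
                continue  # step 1: filter out peaks
            r_mm = math.hypot((col - beam_xy[0]) * pixel_mm, (row - beam_xy[1]) * pixel_mm)
            two_theta = math.degrees(math.atan2(r_mm, distance_mm))
            b = int(two_theta // bin_deg)          # step 2: bin pixels by 2-theta
            sums[b] = sums.get(b, 0.0) + value     # step 3: sum intensities per bin
    return sums


# Tiny 3x3 example: uniform background of 1 with one bright peak in a corner.
image = [[100, 1, 1], [1, 1, 1], [1, 1, 1]]
mask = [[True, False, False], [False, False, False], [False, False, False]]
profile = two_theta_profile(image, mask, beam_xy=(1, 1), pixel_mm=1.0,
                            distance_mm=1.0, bin_deg=10.0)
```

An ice ring would show up in such a profile as a bin (or a few adjacent bins) whose summed intensity stands well above its neighbors.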

Lysozyme 2_05 rank = 202

Rules 6 - 11

6. Indexing: award for percentage of indexed spots
7. Refinement: penalty based upon RMS residual (mm)
8. Mosaicity: penalty based upon refined mosaicity
9. Refinement Coverage: award for percentage of accepted reflections in prediction list
10. Prediction: re-evaluate highest 7 resolution shells based upon number of found spots that match predicted reflection list
11. Refined Reflection Resolution: re-evaluate highest 7 resolution shells based upon the signal-to-noise ratio of predicted reflections

Ranking Results

Rule 1: Spot count in resolution shells (found spots)
Rule 2: I/Sigma in resolution shells (found spots)
Rule 3: Spot sharpness
Rule 4: Strong ice rings
Rule 5: Diffuse ice rings
Rule 6: Percentage of spots indexed
Rule 7: RMS residual after refinement
Rule 8: Mosaicity
Rule 9: Percentage of spots refined
Rule 10: Spot count in resolution shells (predicted and found spots)
Rule 11: I/Sigma in resolution shells (predicted and found spots)

Sample / Rules                    1    2    3    4    5    6    7    8    9   10   11  Total
L:\Images\lyso101_????.osc 1     70   60   -1  -10    0   50  -17  -20   28   70   62    292

Sample Group #1 Tests with Lysozyme crystals

Lysozyme 2_05 rank = 202
---------------------------------------------------------------------------
Category                                                      Points  Cumul
---------------------------------------------------------------------------
>=5 reflns found in 2nd shell (1.79-1.86)Å                        10     10
>=5 reflns found in 3rd shell (1.86-1.94)Å                        10     20
>=5 reflns found in 4th shell (1.94-2.04)Å                        10     30
>=5 reflns found in 5th shell (2.04-2.17)Å                        10     40
>=5 reflns found in 6th shell (2.17-2.34)Å                        10     50
>=5 reflns found in 7th shell (2.34-2.58)Å                        10     60
I/sig == 44.8 in 2nd found shell (1.79-1.86)Å                      7     67
I/sig == 56.8 in 3rd found shell (1.86-1.94)Å                      9     76
I/sig == 60.1 in 4th found shell (1.94-2.04)Å                     10     86
I/sig == 67.7 in 5th found shell (2.04-2.17)Å                     10     96
I/sig == 74.2 in 6th found shell (2.17-2.34)Å                     10    106
I/sig == 89.7 in 7th found shell (2.34-2.58)Å                     10    116
Penalty for spot sharpness of 0.06                                -1    115
Penalty for strong ring (2.82%) near resln. 3.513                -10    105
Penalty for diffuse ring (0.70%) near resln. 3.943                -5    100
Indexed 404 spots, or 75% of all spots used in indexing           74    174
Penalty for RMS residual value of 0.164                          -16    158
Penalty for Mosaicity value of 0.4                               -19    139
Refined 44 spots, or 4% of all predictions                         3    142
>=5 reflns predicted and found in 5th shell (2.04-2.17)Å          10    152
>=5 reflns predicted and found in 6th shell (2.17-2.34)Å          10    162
>=5 reflns predicted and found in 7th shell (2.34-2.58)Å          10    172
I/sig == 77.7 in 5th predicted and found shell (2.04-2.17)Å       10    182
I/sig == 80.8 in 6th predicted and found shell (2.17-2.34)Å       10    192
I/sig == 94.5 in 7th predicted and found shell (2.34-2.58)Å       10    202
---------------------------------------------------------------------------
Cumulative                                                              202

Lysozyme 2_01 rank = 179
---------------------------------------------------------------------------
Category                                                      Points  Cumul
---------------------------------------------------------------------------
>=5 reflns found in 2nd shell (1.79-1.86)Å                        10     10
>=5 reflns found in 3rd shell (1.86-1.94)Å                        10     20
>=5 reflns found in 4th shell (1.94-2.04)Å                        10     30
>=5 reflns found in 5th shell (2.04-2.17)Å                        10     40
>=5 reflns found in 6th shell (2.17-2.34)Å                        10     50
>=5 reflns found in 7th shell (2.34-2.58)Å                        10     60
I/sig == 49.8 in 2nd found shell (1.79-1.86)Å                      8     68
I/sig == 47.0 in 3rd found shell (1.86-1.94)Å                      7     75
I/sig == 52.8 in 4th found shell (1.94-2.04)Å                      8     83
I/sig == 65.7 in 5th found shell (2.04-2.17)Å                     10     93
I/sig == 69.9 in 6th found shell (2.17-2.34)Å                     10    103
I/sig == 86.8 in 7th found shell (2.34-2.58)Å                     10    113
Penalty for spot sharpness of 0.10                                -1    112
Penalty for strong ring (2.78%) near resln. 3.555                -10    102
Penalty for diffuse ring (0.55%) near resln. 3.943                -5     97
Indexed 342 spots, or 56% of all spots used in indexing           56    153
Penalty for RMS residual value of 0.182                          -18    135
Penalty for Mosaicity value of 0.3                               -15    120
Refined 24 spots, or 2% of all predictions                         2    122
>=5 reflns predicted and found in 4th shell (1.94-2.04)Å          10    132
>=5 reflns predicted and found in 5th shell (2.04-2.17)Å          10    142
>=5 reflns predicted and found in 6th shell (2.17-2.34)Å          10    152
I/sig == 44.4 in 4th predicted and found shell (1.94-2.04)Å        7    159
I/sig == 87.2 in 5th predicted and found shell (2.04-2.17)Å       10    169
I/sig == 67.0 in 6th predicted and found shell (2.17-2.34)Å       10    179
---------------------------------------------------------------------------
Cumulative                                                              179

Lysozyme 2_10 rank = 124
---------------------------------------------------------------------------
Category                                                      Points  Cumul
---------------------------------------------------------------------------
>=5 reflns found in 3rd shell (1.86-1.94)Å                        10     10
>=5 reflns found in 4th shell (1.94-2.04)Å                        10     20
>=5 reflns found in 5th shell (2.04-2.17)Å                        10     30
>=5 reflns found in 6th shell (2.17-2.34)Å                        10     40
>=5 reflns found in 7th shell (2.34-2.58)Å                        10     50
I/sig == 54.8 in 3rd found shell (1.86-1.94)Å                      9     59
I/sig == 55.3 in 4th found shell (1.94-2.04)Å                      9     68
I/sig == 64.3 in 5th found shell (2.04-2.17)Å                     10     78
I/sig == 72.1 in 6th found shell (2.17-2.34)Å                     10     88
I/sig == 86.1 in 7th found shell (2.34-2.58)Å                     10     98
Penalty for spot sharpness of 0.07                                -1     97
Penalty for strong ring (2.64%) near resln. 4.162                -10     87
Penalty for strong ring (2.05%) near resln. 3.875                -10     77
Penalty for strong ring (1.84%) near resln. 3.434                -10     67
Penalty for strong ring (6.76%) near resln. 2.139                -10     57
Penalty for strong ring (7.87%) near resln. 1.975                -10     47
Penalty for strong ring (4.78%) near resln. 1.875                -10     37
Indexed 305 spots, or 58% of all spots used in indexing           58     95
Penalty for RMS residual value of 0.121                          -12     83
Penalty for Mosaicity value of 0.4                               -18     65
>=5 reflns predicted and found in 5th shell (2.04-2.17)Å          10     75
>=5 reflns predicted and found in 6th shell (2.17-2.34)Å          10     85
>=5 reflns predicted and found in 7th shell (2.34-2.58)Å          10     95
I/sig == 57.8 in 5th predicted and found shell (2.04-2.17)Å        9    104
I/sig == 61.2 in 6th predicted and found shell (2.17-2.34)Å       10    114
I/sig == 103.3 in 7th predicted and found shell (2.34-2.58)Å      10    124
---------------------------------------------------------------------------
Cumulative                                                              124

Lysozyme 4_12 rank = 112
---------------------------------------------------------------------------
Category                                                      Points  Cumul
---------------------------------------------------------------------------
>=5 reflns found in 5th shell (2.25-2.39)Å                        10     10
>=5 reflns found in 6th shell (2.39-2.57)Å                        10     20
>=5 reflns found in 7th shell (2.57-2.83)Å                        10     30
I/sig == 15.7 in 5th found shell (2.25-2.39)Å                      2     32
I/sig == 19.5 in 6th found shell (2.39-2.57)Å                      3     35
I/sig == 22.9 in 7th found shell (2.57-2.83)Å                      3     38
Penalty for spot sharpness of 0.10                                -1     37
Penalty for strong ring (1.09%) near resln. 4.031                -10     27
Indexed 242 spots, or 57% of all spots used in indexing           57     84
Penalty for RMS residual value of 0.086                           -8     76
Penalty for Mosaicity value of 0.5                               -20     56
Refined 186 spots, or 19% of all predictions                      18     74
>=5 reflns predicted and found in 5th shell (2.25-2.39)Å          10     84
>=5 reflns predicted and found in 6th shell (2.39-2.57)Å          10     94
>=5 reflns predicted and found in 7th shell (2.57-2.83)Å          10    104
I/sig == 17.6 in 5th predicted and found shell (2.25-2.39)Å        2    106
I/sig == 19.7 in 6th predicted and found shell (2.39-2.57)Å        3    109
I/sig == 22.4 in 7th predicted and found shell (2.57-2.83)Å        3    112
---------------------------------------------------------------------------
Cumulative                                                              112
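The per-shell points in the lysozyme listings can be reproduced with a very small scoring sketch. The spot-count award (10 points for a shell with at least 5 reflections) is read directly from the listings. The I/sig mapping is an inference, not something the slides state: every I/sig line shown is consistent with floor(I/sig / 6) capped at 10, so the divisor and cap below are assumptions, and the function names are hypothetical.

```python
import math


def spot_count_points(n_reflections, threshold=5, award=10):
    """Award for a shell that contains at least `threshold` reflections."""
    return award if n_reflections >= threshold else 0


def i_over_sigma_points(i_over_sigma, divisor=6.0, cap=10):
    """Points for a shell's I/sig; divisor and cap are inferred from the listings."""
    return max(0, min(cap, math.floor(i_over_sigma / divisor)))


# I/sig of the found shells (2nd to 7th) for Lysozyme 2_05, as listed above.
found_i_over_sigma = [44.8, 56.8, 60.1, 67.7, 74.2, 89.7]
per_shell = [i_over_sigma_points(v) for v in found_i_over_sigma]
subtotal = sum(per_shell)
```

For Lysozyme 2_05 this gives the same per-shell points as the listing (7, 9, 10, 10, 10, 10), which accounts for the cumulative score rising from 60 to 116 over those six lines.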

Effect of Indexing on Rank Values

Score Variability: Rank Values vs. Exposure Time

Images / Rules        1    2    3    4    5    6    7    8    9   10   11  Total

Thaumatin – 5 sec/0.5°: Rmerge = 12.9 % (32.5 %)
thau3 501,561        60   22   -2  -20    0   56   -5   -9   19   70   23    214
thau3 501            60   22   -2  -20    0   58   -5  -12   14   60   21    196
thau3 545            50   18   -3  -20    0   55   -5   -6   24   50   16    179
thau3 590            60   28   -3  -20    0   55   -6  -10   18   60   22    204
thau3 626            50   22   -3  -20    0   59   -6   -7   20   70   26    211

Thaumatin – 10 sec/0.5°: Rmerge = 10.3 % (27.5 %)
thau3 1001,1061      70   32   -3  -20    0   57   -6  -11   20   70   30    239
thau3 1001           60   31   -3  -20    0   57   -6  -12   18   60   28    213
thau3 1045           60   26   -3  -20    0   53   -6  -11   22   70   25    216
thau3 1090           60   32   -3  -20    0   57   -6  -10   21   60   27    218
thau3 1126           70   33   -2  -20    0   55   -6  -13   17   70   33    237

Thaumatin – 30 sec/0.5°: Rmerge = 8.4 % (25.8 %)
thau3 3001,3061      70   46   -3  -20    0   53   -7  -12   21   70   42    260
thau3 3001           60   40   -3  -20    0   57   -6  -11   21   60   40    238
thau3 3045           70   45   -3  -20    0   54   -6  -10   24   70   40    264
thau3 3090           70   48   -3  -20    0   57   -6  -11   23   70   42    270
thau3 3126           70   47   -2  -20    0   56   -6  -11   20   70   44    268

Mean total of the four single-image rows: 197.5 (5 sec), 221 (10 sec), 260 (30 sec)
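As a worked check of the thaumatin numbers, the snippet below averages the single-image totals for each exposure time and also reports their range. The totals are copied from the slide; the dictionary layout and variable names are only illustrative.

```python
from statistics import mean

# Total rank of the four single-image rows at each exposure time (from the slide).
single_image_totals = {
    "5 sec": [196, 179, 204, 211],
    "10 sec": [213, 216, 218, 237],
    "30 sec": [238, 264, 270, 268],
}

# Mean rank per exposure time; these reproduce the 197.5, 221 and 260 on the slide.
mean_totals = {label: mean(values) for label, values in single_image_totals.items()}

# Range (max - min) of the rank between images of the same crystal and exposure.
spread = {label: max(values) - min(values) for label, values in single_image_totals.items()}
```

The mean rank rises with exposure time, while images of the same crystal at the same exposure still differ by roughly 24 to 32 points, which is the variability the slide is illustrating.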

Score Variability: Data sets collected with VariMax optics

Images / Rules          1    2    3    4    5    6    7    8    9   10   11  Total

VariMax-HR : Rmerge = 2.9 % (22.3 %)
LYS0503_screen 1-2     70   46   -1  -10   -5   51  -18  -13   46   70   42    278
LYS0503_screen 1       70   46   -1  -10   -5   54  -15  -11   50   70   41    289
LYS0503_screen 2       70   44   -1  -10    0   56  -18  -14   44   70   41    282
LYS0503_ 1             70   46   -1    0   -5   56  -17  -12   48   70   42    297
LYS0503_ 45            70   46   -1  -10    0   57  -15  -13   45   70   42    291
LYS0503_ 90            70   46   -1  -10   -5   57  -16  -12   47   70   42    288
LYS0503_ 116           70   46   -1  -10   -5   57  -16  -12   49   70   40    288

VariMax-HR : Rmerge = 2.8 % (15.0 %)
LYS0503_screen 1-2     70   57   -1  -10    0   56  -23  -18   39   70   57    297
LYS0503_screen 1       70   58   -1    0  -15   57  -23  -17   42   70   57    298
LYS0503_screen 2       70   57   -1  -10    0   59  -23  -17   42   70   57    304
LYS0503_ 1             70   57   -1  -10    0   58  -21  -17   43   70   56    305
LYS0503_ 45            70   58   -1  -10    0   57  -21  -17   46   70   56    308
LYS0503_ 90            70   58   -1  -10   -5   55  -22  -18   39   70   57    293
LYS0503_ 116           70   57   -1    0   -5   57  -22  -14   47   70   55    314

Mean total of data set images 1, 45, 90 and 116: 291 (first data set), 305 (second data set)

What Have We Learned?

Signal-to-noise is the predominant factor in the current d*TREK release
  - This is intentional! Should it be?
  - Each of the 11 rules has independent parameters that can be adjusted to optimize for your case
Image processing adds a domino effect to ranking
  - Better refinement, higher rank
  - Lower mosaicity, higher rank
  - Fewer twin spots, higher rank
Spot sharpness analysis is not robust
  - Incorporate graph theory
Potential Pitfalls
  - Weak diffractors: lowest 3 resolution bins should not be excluded from spot analysis
  - Image header accuracies
  - Anisotropy: need images at multiple angles; these effects become effectively ‘averaged’ across images
  - Merohedral twinning

Recent d*TREK Improvements

Don’t ignore lowest resolution bins
Image Header Accuracies
  - Command line override
Anisotropy
  - Incorporated anisotropy check and another rule
  - Rank each image, calculate average and ESD
  - Apply penalty as multiple of ESD
Data Collection Strategy improvements
  - Automatic exposure time calculation (using ‘intelligent’ algorithm)
  - Optimize detector space for diffraction resolution
  - Multiple scan strategy, if possible
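The anisotropy idea (rank each image, take the average and ESD, penalize by a multiple of the ESD) might look like the sketch below. The slide gives no formula, so several choices here are assumptions: the ESD is computed as the sample standard deviation, the penalty is subtracted from the average rank, and the multiplier `k` and the function name `penalized_rank` are hypothetical.

```python
from statistics import mean, stdev


def penalized_rank(image_ranks, k=1.0):
    """Return (penalized rank, average, ESD) for the per-image ranks of one crystal.

    Assumptions: ESD = sample standard deviation; penalty = k * ESD, subtracted
    from the average. Neither the formula nor the value of k is given on the slide.
    """
    avg = mean(image_ranks)
    esd = stdev(image_ranks) if len(image_ranks) > 1 else 0.0
    return avg - k * esd, avg, esd


# Per-image ranks for one crystal at several orientations (illustrative values).
score, avg, esd = penalized_rank([196, 179, 204, 211], k=1.0)

# A crystal whose images all rank the same is not penalized.
steady_score, _, steady_esd = penalized_rank([200, 200, 200], k=1.0)
```

Under these assumptions a crystal whose rank varies strongly with orientation ends up below an otherwise similar crystal that ranks consistently across images.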

Acknowledgements

Russ Athay, Robert Bolotovsky, Joseph D. Ferrara, Thad Niemeyer, Karen Opersteny, J.W. Pflugrath

ACTOR Acknowledgements

Rigaku/MSC (top 2 rows)
  - James Pflugrath, Angela Criswell, Joseph Ferrara, David Edwards, Russ Athay, Keith Crane, John Edwards, Kris Tesh, Thomas Hendrixson, Thaddeus Niemeyer
  - Not shown: Robert Bolotovsky, Charlie Stence, Karen Opersteny, Stephen Sherbert, John Ziegler
Oceaneering Space Systems (middle row)
  - Richard Shafer, Terry Nienaber, Kent Copeland, Bill Robertson
Abbott Laboratories (bottom row)
  - Jeff Olson, Steve Muchmore, Jonathan Greer, Ronald Jones, Jeffrey Pan
