1 Crystallization Image Analysis on the World Community Grid
Christian A. Cumbaa and Igor Jurisica
Jurisica Lab, Division of Signaling Biology, Ontario Cancer Institute, Toronto, Ontario

2 Why automate classification of protein crystallization trial images?
Hauptman-Woodward has 65,000,000 images.
  – They want 65,000,000 outcomes.
[Figure: example outcome categories: clear, phase separation, precipitate, skin, crystal, garbage, unsure]

3 Why automate classification of protein crystallization trial images?
Assist or replace human screening
Speed the search phase in protein crystallization
Improve throughput, consistency, and objectivity
Enable data mining and statistical optimization of the crystallization process
[Figure: example images: clear, precipitate, crystal]

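4 Image classification
[Figure: classification pipeline. An image (100,000s of numbers) goes through feature extraction to give 10s of numbers (feature 1, feature 2, …, feature k); classification then gives 7 numbers, one per outcome: clear, phase separation, precipitate, skin, crystal, garbage, unsure]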

5 Truth data
96 study
  – 96 proteins × 1,536 images, hand-scored by 3 experts
  – Presence/absence of 7 independent outcomes
NESG & SGPP
  – 15,000 images
  – Hand-scored by 1 expert, same scoring system
50% unanimously-scored images
  – 10 most interesting compound categories
[Figure: example images from the 96 study, SGPP (crystals), and NESG (crystals)]

6 Feature set
12,375 features computed per image
  – A few basic statistics
  – 50 microcrystal features
  – Euler number features, two variations:
      1. 11 blur levels
      2. 11 blur levels × 4 thresholds
  – Image "energy": 11 blur levels
  – 2,925 grey-level co-occurrence matrix (GLCM) features
      3 different grey-level quantizations
      13 basic functions
      25 sample distances
      ~100 directions
      – Computable from every point in the image
      – Distilled to max range, max mean, min mean
  – ~9,500 image-blob features: Radon & edge detection
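A grey-level co-occurrence matrix counts how often a pixel with quantized grey level i has a neighbour with level j at a given offset (one sample distance and one direction). The sketch below is a minimal illustration, not the authors' code: the helper names (`quantize`, `glcm`, `contrast`, `energy`), the uniform quantization, and the choice of contrast and energy as example functions are assumptions, since the slide does not name its 13 basic functions. The slide notes the features are computable from every point in the image; for simplicity this sketch computes a single whole-image matrix.

```python
# Minimal GLCM sketch (illustrative; helper names and choices are assumptions).

def quantize(image, levels):
    """Map 8-bit grey values (0..255) uniformly onto levels 0..levels-1."""
    return [[min(v * levels // 256, levels - 1) for v in row] for row in image]


def glcm(image, levels, dy, dx):
    """Normalized co-occurrence matrix P[i][j] for pixel pairs at offset (dy, dx).

    The offset encodes one sample distance and direction, e.g. (0, 1) pairs
    each pixel with its right-hand neighbour.
    """
    q = quantize(image, levels)
    rows, cols = len(q), len(q[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:  # skip pairs that fall off the image
                counts[q[y][x]][q[y2][x2]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset larger than the image")
    return [[c / total for c in row] for row in counts]


def contrast(p):
    """Probabilities weighted by squared grey-level difference: high for busy texture."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def energy(p):
    """Sum of squared probabilities: high when a few grey-level pairs dominate."""
    return sum(v * v for row in p for v in row)


# Vertical stripes: horizontal pairs cross the edge, vertical pairs never do.
stripes = [[0, 0, 255, 255] for _ in range(4)]
print(contrast(glcm(stripes, 2, 0, 1)))
print(contrast(glcm(stripes, 2, 1, 0)))
```

Repeating this over quantizations, distances, directions, and functions spans the parameter space listed on the slide. If the three distilled statistics (max range, max mean, min mean) are taken once per quantization, function, and distance, the count is 3 × 13 × 25 × 3 = 2,925, which matches the slide; that decomposition is an inference, not something the slide states.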

7 Our image analysis problem
Computing all 12,375 features takes >5 hours for a single image.
We have 165,000 images in our training set.
Features must be evaluated for quality.
The best features (10s or low 100s) must be computed for the remaining 65,000,000 images.
Massive computing resources required!
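The numbers on this slide imply the scale of the training-set computation. A back-of-envelope sketch, assuming exactly 5 CPU-hours per image (the slide says more than 5, so the result is a lower bound) and a 365.25-day CPU-year; the constant names are illustrative:

```python
# Back-of-envelope compute budget for the training set.
# From the slide: >5 CPU-hours per image, ~165,000 training images.
# Assumption: one CPU-year = 365.25 days x 24 hours = 8,766 CPU-hours.
HOURS_PER_IMAGE = 5          # lower bound ("takes >5 hours")
TRAINING_IMAGES = 165_000
HOURS_PER_CPU_YEAR = 365.25 * 24

cpu_hours = HOURS_PER_IMAGE * TRAINING_IMAGES
cpu_years = cpu_hours / HOURS_PER_CPU_YEAR
print(f"{cpu_hours:,} CPU-hours = about {cpu_years:.0f} CPU-years (lower bound)")
```

That is roughly 94 years on a single CPU. At the grid's reported average of 55 CPU-years per day (slide 9), the same work is under two days of grid time.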

8 Image analysis on the World Community Grid
http://www.worldcommunitygrid.org
  – a global, distributed-computing platform for solving large scientific computing problems with human impact
  – 377,627 volunteers contribute idle CPU time from 960,346 devices.
Our project: Help Conquer Cancer*
  – launched November 2007
HCC has two goals:
  1. To survey a wide tract of image-feature space and identify the image analysis algorithms and parameters (features) that best determine crystallization outcome.
  2. To perform the necessary image analysis on Hauptman-Woodward's archive of 65,000,000 crystallization trial images.
* Fundraising slogan of the Ontario Cancer Institute and its parent organization.

9 Image analysis on the World Community Grid
HCC has two phases:
  – Phase I: calculate 12,375 features per image on high-priority images, including 165,441 hand-scored images.
      – November 2007 to May 2008
      – Analysis of hand-scored images completed January 2008
  – Phase II: calculate the best features from Phase I on the backlog of HWI images.
Grid members have contributed 8,919 CPU-years so far to HCC, an average of 55 CPU-years per day.


12 Phase I: feature assessment

13 Measuring feature quality
Treat as random variables:
  – Image class
  – Feature value
Measure the mutual information between them (unit: bits):
  I(class; feature) = H(class) + H(feature) - H(class, feature)
  where H denotes entropy.
[Figure: feature entropy and class entropy]
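The mutual information on this slide can be computed directly from counts. A minimal sketch, assuming each continuous feature is first discretized; the slides do not say how, so the equal-width binning and all function names here are hypothetical:

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Shannon entropy, in bits, of a sequence of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def equal_width_bins(values, k):
    """Discretize continuous feature values into k equal-width bins (hypothetical choice)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # all values equal: everything lands in bin 0
    return [min(int((v - lo) / width), k - 1) for v in values]


def mutual_information(classes, feature_bins):
    """I(class; feature) = H(class) + H(feature) - H(class, feature), in bits."""
    if len(classes) != len(feature_bins):
        raise ValueError("need exactly one feature value per image")
    joint = list(zip(classes, feature_bins))
    return entropy(classes) + entropy(feature_bins) - entropy(joint)


classes = ["clear", "clear", "crystal", "crystal"]
print(mutual_information(classes, [0, 0, 1, 1]))  # feature determines class
print(mutual_information(classes, [0, 1, 0, 1]))  # feature says nothing about class
```

With two equally likely classes, a feature that perfectly separates them carries 1 bit, and a feature independent of the class carries 0 bits; real features fall in between, and the binning choice affects the estimate.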

14 Measuring feature quality
[Figure panels: clear; precipitate (no crystal); other]

15 Information density: microcrystal counts
[Figure: information density across the parameter space, for Clear, Precipitate, and Crystal]

16 Information density: GLCM maximum range
[Figure: information density across the parameter space, for Clear, Precipitate, and Crystal]

17 Information density: Radon-Sobel soft sum
[Figure: information density across the parameter space, for Clear, Precipitate, and Crystal]

18 Information density: Radon-Sobel blob metrics (means)
[Figure: information density across the parameter space, for Clear, Precipitate, and Crystal]

19 Towards Phase II: image classification

20 Building classifiers
Handpicked 74 features from peaks in the clear, precipitate, and other mutual-information plots
Two classification schemes:
  – Three-way: clear, non-crystal precipitate, other
  – Ten-way: clear, phase separation, phase + precipitate, skin, phase + crystal, precip, precip + skin, precip + crystal, crystal, garbage
Naïve Bayes model
Leave-one-out cross-validation
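The slides name a naïve Bayes model evaluated by leave-one-out cross-validation, but not the likelihood used for each feature. A minimal pure-Python sketch, assuming a Gaussian likelihood per feature and class; the function names, the variance floor, and the toy data are hypothetical:

```python
from collections import defaultdict
from math import log, pi


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per-class log prior, feature means and variances.

    Naive Bayes assumption: features are independent given the class.
    """
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        k = len(rows)
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [sum(col) / k for col in columns]
        variances = [max(sum((v - m) ** 2 for v in col) / k, var_floor)
                     for col, m in zip(columns, means)]
        model[label] = (log(k / len(X)), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * log(2 * pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


def leave_one_out(X, y):
    """Hold out each image in turn, train on the rest, and predict the held-out one."""
    predictions = []
    for i in range(len(X)):
        model = fit_gaussian_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predictions.append(predict(model, X[i]))
    return predictions


# Toy data: two features per image, two classes (values are made up).
X = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.1], [3.0, 0.0], [3.2, 0.1], [2.9, -0.1]]
y = ["clear"] * 3 + ["precipitate"] * 3
predicted = leave_one_out(X, y)
accuracy = sum(p == t for p, t in zip(predicted, y)) / len(y)
print(predicted, accuracy)
```

Each prediction comes from a model that never saw that image. Because naïve Bayes training only needs per-class means and variances, refitting once per image stays cheap. The same code serves the three-way and ten-way schemes; only the labels in `y` change.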

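21 Measuring classifier accuracy: precision and recall
[Figure: the set of true crystals vs. the set the classifier marks "I think these are crystals"; the overlap is true positives, missed crystals are false negatives, and non-crystals marked as crystals are false positives]
precision = true positives / (true positives + false positives)
recall = true positives / (true positives + false negatives)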
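Precision is TP / (TP + FP) and recall is TP / (TP + FN), so both can be read per class from a confusion matrix. A minimal sketch; the nested-dict layout (`confusion[true][predicted]`) and the counts in the example are hypothetical, not the results reported in these slides:

```python
def precision_recall(confusion, labels):
    """Per-class (precision, recall) from confusion[true_label][predicted_label] counts."""
    scores = {}
    for c in labels:
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in labels if t != c)  # machine said c, truth differs
        fn = sum(confusion[c][p] for p in labels if p != c)  # truth is c, machine said otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (precision, recall)
    return scores


# Hypothetical counts, for illustration only.
labels = ["crystal", "other"]
confusion = {
    "crystal": {"crystal": 8, "other": 2},
    "other": {"crystal": 4, "other": 86},
}
for label, (p, r) in precision_recall(confusion, labels).items():
    print(f"{label}: precision {p:.2f}, recall {r:.2f}")
```

Returning 0.0 for a class that is never predicted (or never present) is a convention choice; some tools report it as undefined instead.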

22 Three-class distribution
Clear: 24.3%
Precipitate AND NOT crystal: 52.7%
Other: 23.0%
[Figure: confusion matrix of true class vs. machine-assigned class, over clear, non-crystal precipitate, and other]

23 Recall & precision

24 10-class distribution
Clear: 33.83%
Phase separation: 7.00%
Phase separation + precipitate: 0.50%
Skin: 0.79%
Phase separation + crystal: 2.32%
Precipitate: 34.25%
Precipitate + skin: 4.95%
Precipitate + crystal: 7.53%
Crystal: 8.34%
Garbage: 0.55%

25 Confusion matrix

26 Recall & precision

27 Acknowledgements
Hauptman-Woodward Medical Research Institute: George DeTitta, Joe Luft, Eddie Snell, Mike Malkowski, Angela Lauricella, Max Thayer, Raymond Nagel, Steve Potter, and the 96-study reviewers
World Community Grid: Bill Bovermann, Viktors Berstis, Jonathan D. Armstrong, Tedi Hahn, Kevin Reed, Keith J. Uplinger, Nels Wadycki
IBM Deep Computing: Jerry Heyman
Jurisica Lab: Richard Lu
All crystallization images were generated at the High-Throughput Screening lab at the Hauptman-Woodward Institute.
Funding from NIH U54 GM074899, Genome Canada, IBM, and NSERC; earlier work was funded by NIH P50 GM62413, NSERC, and CITO.

