
Next-Generation Bioinformatics Systems Jelena Kovačević Center for Bioimage Informatics Department of Biomedical Engineering Carnegie Mellon University.




1 Next-Generation Bioinformatics Systems Jelena Kovačević Center for Bioimage Informatics Department of Biomedical Engineering Carnegie Mellon University

2 Acknowledgments. Current PhD students: Amina Chebira, Tad Merryman, Gowri Srinivasa. PhD students: Doru Cristian Balcan, Elvira Garcia Osuna, Pablo Hennings Yeomans, Jason Thornton. Collaborators: Vijay Kumar Bhagavatula, Geoff Gordon, José Moura, Markus Püschel, Marios Savvides, Bob Murphy. Undergrads: Woon Ho Jung. Funding. Lionel Coulot, Heather Kirshner.

3 Goal. Imaging in systems biology: use informatics to acquire, store, manipulate and share large bioimaging databases. Leads to automated, efficient and robust processing. Need: a host of sophisticated tools from many areas. [Diagram: Acquisition, Knowledge Extraction and Computation around the application area]

4 Application Areas. Bioimaging: the current focus in biology is mapping out the protein landscape; fluorescence microscopy is used to gather data on subcellular events. Biometrics: biosensing for providing security to the financial industry and at US borders; use a person's biometric characteristics to identify/verify.

5 Acquisition. Issues: z-stacks and time series resolution. Context-dependent: a slow-changing process needs to be acquired with coarser resolution; changes need to be detected and reacted to. Efficiency of acquisition: acquire only where and when needed (adaptivity). Sample question: How can we efficiently acquire fluorescence microscopy images?

6 Knowledge Extraction. Sample questions: How can we automatically and efficiently classify proteins based on images of their subcellular locations? How can we identify/verify a person's identity based on his/her biometric characteristics? Toolbox needed to solve the problem: signal processing/data mining; multiresolution tools allow for adaptive and efficient processing.

7 Computation. The problem: fast numerical software. Hard to write fast code. Best code is platform-dependent. Code becomes obsolete as fast as it is written. [Figure labels: reasonable implementation; vendor library or SPIRAL generated; 10x]

8 SPIRAL: Code Generation for DSP Algorithms. The solution: automatic generation and optimization of numerical software; tuning of implementation and algorithm; a new breed of intelligent SW design tools. SPIRAL: a prototype for the domain of DSP algorithms. [Diagram labels: DSP transform (user specified); Formula generator; fast algorithm as SPL formula; Formula translator; C/Fortran program; runtime on given platform; Search engine (controls the formula generator and formula translator); platform-adapted code]

9 Bioimaging. Acquisition: How can we efficiently acquire fluorescence microscopy images? Knowledge extraction: How can we automatically and efficiently classify proteins based on images of their subcellular locations? Computation: automatic code generation and optimization. [Diagram: Acquisition, Knowledge Extraction and Computation around Bioimaging]

10 Motivation. Current focus in biological sciences: system-wide research ("omics"); human genome project. Next frontier: proteomics; subcellular location is one of the major components. Grand challenge: develop an intelligent next-generation bioimaging system capable of fast, robust and accurate classification of proteins based on images of their subcellular locations.

11 MR Acquisition of Fluorescence Microscopy Images. Problem: Why acquire in areas of low fluorescence? Acquire only when and where needed. Measure of success: problem dependent; here, strive to maintain the achieved classification accuracy. Efficient acquisition leads to faster acquisition, the possibility of increasing acquisition resolution, and a possible increase in classification accuracy due to increased resolution. [Image: ER]

12 MR Acquisition of Fluorescence Microscopy Images. Approach: develop the algorithm on a data set acquired at maximum resolution; implement a microscope's scanning protocol. Algorithm: mimic the Battleship strategy; acquire around the hits. [Figures: 2D, 3D]

13 Algorithm: Details. [Flowchart: initialize probe locations; probe; intensity > T? If yes, add probe locations; probe locations left? If yes, probe again; if no, stop]
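The flowchart can be read as a queue-driven probing loop. The following is a minimal sketch of that reading, not the authors' implementation: the coarse starting grid, the `step` parameter, the 8-neighbour rule for adding probe locations, and the name `adaptive_acquire` are all assumptions made for illustration, and an already acquired 2D image stands in for the microscope.

```python
from collections import deque

def adaptive_acquire(image, threshold, step=4):
    """Probe a coarse grid; wherever intensity > threshold, also probe the neighbours."""
    rows, cols = len(image), len(image[0])
    # Initialize probe locations on a coarse grid (assumed starting pattern).
    queue = deque((r, c) for r in range(0, rows, step) for c in range(0, cols, step))
    scheduled = set(queue)
    samples = {}
    while queue:                         # "probe locations left?"
        r, c = queue.popleft()
        value = image[r][c]              # stand-in for one microscope measurement
        samples[(r, c)] = value
        if value > threshold:            # a hit: add the surrounding probe locations
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in scheduled:
                        scheduled.add((nr, nc))
                        queue.append((nr, nc))
    return samples
```

On an image that is mostly dark, `len(samples)` stays well below `rows * cols`, which is the saving in samples that the acquisition slides discuss.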

14 Trade-Offs. What will we lose? Scanning simplicity. What will we gain? A faster acquisition process: time is proportional to the savings in samples, but the time to operate the scanning unit needs to be taken into account. Higher resolution in 3D. The laser intensity can be reduced, which reduces photobleaching: some sources indicated linear relationship, some other

15 Results in 3D. [Plot: mitochondrial compression versus distortion; MSE against percent of samples kept / 100, for the MR sampling algorithm and the trivial approach. Images: MR algorithm (9.81:1), trivial approach (9:1), approximation, difference image]

16 Results in 2D. [Plot: accuracy [%] versus compression ratio]

17 Current and Future Work. Implementation issues: Can one operate galvo-mirrors fast enough to capitalize on the gain? Algorithmic issues: add knowledge from classification (feedback); build models.

18 Funding and References.
Funding: NSF, Next-Generation Bio-Molecular Imaging and Information Discovery, NSF, $2,500,000, 10/03-9/08. Co-PI.
Journal papers: T.E. Merryman and J. Kovačević, "An adaptive multirate algorithm for acquisition of fluorescence microscopy data sets," IEEE Trans. Image Proc., special issue on Molecular and Cellular Bioimaging, September 2005.
Conference papers: T.E. Merryman, J. Kovačević, E.G. Osuna and R.F. Murphy, "Adaptive multirate data acquisition of 3D cell images," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., Philadelphia, PA, March 2005.

19 Knowledge Extraction: MR Classification of Proteins. Why MR? Introduction of simple MR features produced a statistically significant jump in accuracy. Introduce adaptivity with little computational cost. [Figure: segmentation, then classification ("This is tubulin")]

20 Data Sets. 3D HeLa; 2D HeLa; 3T3. Huang & Murphy, Journal of Biomedical Optics 9(5), 893–912, 2004.

21 3D HeLa Data Set. Cells from Henrietta Lacks (d. 1951, cervical cancer). Confocal scanning laser microscope (100x). DNA stain (PI), all-protein stain (Cy5 reactive dye) and fluorescent antibody for a specific protein. sets per class. D slices per set. Resolution x x 0.2 μm. Covers all major subcellular structures. Huang & Murphy, Journal of Biomedical Optics 9(5), 893–912, 2004.

22 3D HeLa Data Set. Covers all major subcellular structures: Golgi apparatus (giantin, gpp130); cytoskeleton (actin, tubulin); endoplasmic reticulum membrane (ER); lysosomes (LAMP2); endosomes (transferrin receptor); nucleus (nucleolin); mitochondria outer membrane.

23 2D HeLa Data Set. Cells from Henrietta Lacks (d. 1951, cervical cancer). Widefield with nearest-neighbor deconvolution (100x). DNA stain and fluorescent antibody for a specific protein. sets per class. Resolution 0.23 x 0.23 μm. Boland & Murphy, Bioinformatics 17(12), 2001. [Image panels: Mitochondria, Tubulin, LAMP2, Giantin, Gpp130, Nucleolin, DNA, Actin, ER, TfR]

24 Classification: Previous System. Preprocessing: manual shifting; manual rotation. Feature computation: Subcellular Location Features (SLF), drawn from many different feature categories (texture, morphological, Gabor and wavelet); Gabor and wavelet features improved accuracy significantly (from 88% to 92%). Classification: combination of classifiers. [Pipeline: input image, preprocessing, feature extraction, classification, class]

25 MR Classification of Proteins. What do we need? Want to keep MR (based on results with Gabor and wavelet features); avoid manual processing; rotation invariance; shift invariance; adaptivity. Points to: frames; MD frames; wavelet/frame packets.

26 Does Adaptivity Help? Would like to use wavelet packets, but do not have an obvious cost measure. Line of work: find out if adaptivity helps; if it does, find a cost function to use with wavelet packets; frame packets. Challenge: same class, different story. [Images: tubulin]

27 Training Phase. Number of classes: C. Number of training images per class: N. [Pipeline diagram: clustering images, full wavelet tree, feature extraction, K-means clustering, Gaussian models; training image, Gaussian modeling, weight computation, voting, weights]

28 Full Wavelet Tree Decomposition. Grow a full tree. Depth: L levels. Total number of subbands: S.
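As a rough illustration of growing a full tree, the sketch below splits every subband at every level with a 2D Haar step (the Results slide mentions Haar decomposition). The function names and the list-of-lists image format are assumptions, and the input is assumed to have even dimensions at every level. Level l then holds 4**l subbands; whether the slide's S counts all levels or only some of them is not stated in the transcript.

```python
def haar_split(band):
    """One orthonormal 2D Haar step: average, horizontal, vertical and diagonal detail subbands."""
    avg, hor, ver, dia = [], [], [], []
    for r in range(0, len(band), 2):
        rows = ([], [], [], [])
        for c in range(0, len(band[0]), 2):
            a, b = band[r][c], band[r][c + 1]
            d, e = band[r + 1][c], band[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2)
            rows[1].append((a - b + d - e) / 2)
            rows[2].append((a + b - d - e) / 2)
            rows[3].append((a - b - d + e) / 2)
        for sub, row in zip((avg, hor, ver, dia), rows):
            sub.append(row)
    return [avg, hor, ver, dia]

def full_wavelet_tree(image, depth):
    """Split every subband at every level; levels[l] is the list of 4**l subbands at level l."""
    levels = [[image]]
    for _ in range(depth):
        levels.append([child for band in levels[-1] for child in haar_split(band)])
    return levels
```

Because each Haar step here is orthonormal, the total energy of the subbands at any level equals the energy of the input image, which is a convenient sanity check.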

29 Feature Extraction. Use Haralick texture features. One feature vector per subband s, indexed by class c, training image n, subband s.

30 K-Means Clustering. Clustering in a fixed subband. Max K clusters per class. [Figure labels: clustering images of class c; feature vector for image I from class c and subband s; cluster mean]

31 Gaussian Modeling. Model each cluster with a Gaussian pdf. Probability the training image belongs to class i. Output: single probability vector.
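The transcript does not say how the per-cluster Gaussians are combined into one probability vector, so the following is only one plausible sketch. It assumes diagonal covariances, scores each class by its best-matching cluster, and normalizes the scores to sum to one; `gaussian_pdf`, `class_probabilities` and the `models` layout are hypothetical names.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a Gaussian with diagonal covariance (an assumption of this sketch)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2 * math.pi * vi) - (xi - mi) ** 2 / (2 * vi)
    return math.exp(log_p)

def class_probabilities(x, models):
    """models[c] is a list of (mean, var) cluster models for class c.

    Returns one probability per class for the feature vector x of a single subband.
    """
    # Score each class by its best-matching cluster (assumed combination rule).
    scores = [max(gaussian_pdf(x, m, v) for m, v in clusters) for clusters in models]
    total = sum(scores)
    if total == 0.0:                     # every density underflowed: fall back to uniform
        return [1.0 / len(models)] * len(models)
    return [s / total for s in scores]
```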

32 From Feature Space to Probability Space. [Diagram: one entry per training image (Image 1 to Image N, from Class 1 to Class C) and per subband (Subband 1 to Subband S)]

33 Weight Computation: Initialization. Decision for vector t_{c,n,s}. [Diagram: images 1 to N from classes 1 to C against subbands 1 to S]

34 Weight Computation: Initialization. Initial weight for subband s: probability of correct decision. [Diagram: each per-image, per-subband decision marked correct or incorrect]
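One way to read "probability of correct decision" is as the fraction of training images that a subband classifies correctly on its own. The sketch below implements that reading; the data layout `prob_vectors[n][s]` and the function names are assumptions for illustration.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)

def initial_weights(prob_vectors, labels):
    """prob_vectors[n][s] is the class-probability vector of training image n in subband s.

    The initial weight of subband s is the fraction of training images whose
    subband-s decision (argmax of the probability vector) matches the true label.
    """
    num_subbands = len(prob_vectors[0])
    weights = []
    for s in range(num_subbands):
        correct = sum(1 for n, label in enumerate(labels)
                      if argmax(prob_vectors[n][s]) == label)
        weights.append(correct / len(labels))
    return weights
```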

35 Weight Computation. Compute the probability vector for each image. [Diagram: images 1 to N from classes 1 to C against subbands 1 to S]

36 Weight Adjustment (Voting). Make a decision. Decision correct: do nothing, take the next image. Decision incorrect: adjust the weights, take the next image. Make runs through all the images. Does the algorithm converge?

37 Testing Phase. Compute probabilities for each subband. Compute the overall probability vector. Make the decision. [Pipeline: testing image, full wavelet tree, feature extraction, probability space (using the Gaussian models), voting (using the weights), class label]
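The three testing steps can be sketched as a weighted vote. The combination rule is not given in the transcript; a weighted sum of the per-subband probability vectors is assumed here, and `vote` is a hypothetical name.

```python
def vote(prob_vectors, weights):
    """Combine per-subband class-probability vectors into one decision.

    prob_vectors[s] is the probability vector from subband s; weights[s] is the
    weight of that subband. The overall vector is their weighted sum (assumed rule).
    """
    num_classes = len(prob_vectors[0])
    overall = [sum(w * p[c] for w, p in zip(weights, prob_vectors))
               for c in range(num_classes)]
    label = max(range(num_classes), key=overall.__getitem__)
    return label, overall
```

With this rule, a subband that was unreliable during training (low weight) has little influence on the final class label.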

38 Results. C = 10 classes. N = 45 training images. T = 5 testing images. 10-fold cross validation. Training phase: 44 clustering images, 45-fold cross validation. L = 2, 3 levels of Haar wavelet decomposition. K = 10 max number of clusters per class.

39 Results. [Table: output of the classifier [%], K = 5; columns Tub, Gpp, Nuc, Gia, Mit, DNA, ER, LMP, Act, TfR, Avg %; rows Previous system, MR system]

40 Results: Accuracy vs Number of Epochs. [Plot labels: K; No MR; Acc (%)]

41 Classification Enhancement

42 Weight Adjustment: 2nd Try. Keep the previous best weight. Can do no worse than the previous system. [Table: output of the classifier [%], K = 10; columns Tub, Gpp, Nuc, Gia, Mit, DNA, ER, LMP, Act, TfR, Avg %; rows Previous system, MR system]

43 Principal Component Analysis. Using eigenspace representations for Haralick texture features. Texture classification (TC): decomposition is better than no decomposition (with or without PCA); there is information in the subbands. TC + PCA: improves accuracy (with or without decomposition). Dimensionality reduction (DR): increases accuracy slightly without much complexity.
Exp.          No MR    MR
TC            69.0%    81.0%
TC + PCA      81.8%    87.4%
TC + PCA/DR   67.0%    82.6%

44 Effect of Translation Variance. No translation: accuracy(MR frames) > accuracy(MR). Translation: MR drops; MR frames stable.
             No translation    Translation
MR           81.4%             80.8%
MR frames    83.2%             83.2%

45 Conclusions and Future Directions. Adaptivity definitely helps! Accuracy is stable with the increased number of epochs. Investigate the algorithm for convergence: K-means clustering introduces randomness; there is no notion of global, local minima; reducing K reduces randomness. Weighting: should be done for each class separately; would lead to WP trees. Find cost function. Construct frame packets.

46 References. Conference papers:
G. Srinivasa, A. Chebira, T. Merryman and J. Kovačević, "Adaptive multiresolution texture features for protein image classification," Proc. BMES Annual Fall Meeting, Baltimore, MD, September.
K. Williams, T. Merryman and J. Kovačević, "A Wavelet Subband Enhancement to Classification," Proc. Annual Biomed. Res. Conf. for Minority Students, Atlanta, GA, November. Submitted.
A. Mintos, G. Srinivasa, A. Chebira and J. Kovačević, "Combining Wavelet Features with PCA for Classification of Protein Images," Proc. Annual Biomed. Res. Conf. for Minority Students, Atlanta, GA, November. Submitted.
T. Merryman, K. Williams and J. Kovačević, "A multiresolution enhancement to generic classifiers of subcellular protein location images," Proc. IEEE Intl. Symp. Biomed. Imaging, Arlington, VA, April. In preparation.
G. Srinivasa, T. Merryman, A. Chebira, A. Mintos and J. Kovačević, "Adaptive multiresolution techniques for subcellular protein location image classification," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., Toulouse, France, May. Invited paper. In preparation.

47 Automatic Code Generation. Work in progress.

48 Biometrics. Acquisition: NIST database. Knowledge extraction: How can we identify/verify a person's identity based on his/her biometric characteristics? Computation: automatic code generation and optimization. [Diagram: Acquisition, Knowledge Extraction and Computation around Biometrics]

49 Motivation. Security to the financial industry: 89,000 cases of identity theft in 2000; losses incurred by Visa/MasterCard: $68.2 million. Security at US borders: multimodal biometric systems. Grand challenge: develop an intelligent next-generation biometric system capable of fast, robust and accurate identification and verification of human biometric characteristics.

50 Challenges. Variable conditions: different lighting, indoors/outdoors, different poses, … Small training sets: uncooperative biometrics (access to only one picture of a suspected criminal). Huge databases: computation becomes an issue; database sizes up to hundreds of thousands.

51 State of Commercial Products. NIST (National Institute of Standards and Technology): mandated by the Government to measure accuracy of biometric technologies (Patriot Act), in cooperation with FBI, State Department, DARPA, National Institute of Justice, Transportation Security Administration, United States Customs Service, Department of Energy, Drug Enforcement Administration, INS, etc.

52 Face Recognition Vendor Tests (FRVT). 121,589 images of 37,437 individuals. Indoors: 71.5% true accept at 0.01% false accept rate; 90.3% true accept at 1.0% false accept rate. Outdoors: 50% true accept at 1.0% false accept rate. Size of the database: the recognition rate decreases linearly with the logarithm of the database size (800 people, 1,600 people, 73% for 37,437 people). Challenges: poor-quality images, small training sets, database size.

53 Fingerprint Vendor Technology Evaluation (FpVTE). ,105 sets, 25,309 individuals, 393,370 distinct fingerprints. Verification results: 99.4% true accept at 0.01% false accept rate; 99.9% true accept at 1.0% false accept rate. Challenges: poor-quality images; database size.

54 Correlation-Based Biometrics System. One of the standard methods: based on correlation filters; template matching performed on the entire image. Two systems: identification ("Who is this?" "This is Ben") and verification ("Is this Ben?" Yes/No). MR system. [Diagram: template matching with match / no match outcomes]

55 Correlation Filters. Specific to one class. Produce correlation peaks when applied to their classes. Output: correlation plane. Match score: sharpness of the peak; shift-invariant; measures the goodness of the match between input and stored image.

56 Correlation Filter Design. MACE (Minimum Average Correlation Energy) filter: origin of each correlation plane constrained to 1 for in-class and 0 for out-of-class images; minimizes ACE (Average Correlation Energy). Solution: filter; minimum energy. Fitness metric: how well the correlation filter will perform. Notation: X of size n x t, FT of training images as columns; u of size t x 1, origin constraints; D of size n x n, n total number of pixels; h of size n x 1, filter values.
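The formulas for the filter and the minimum energy are not preserved in this transcript. For orientation, the closed-form MACE solution as it is usually stated in the correlation-filter literature is given below in the slide's notation. It assumes that D is the diagonal matrix holding the average power spectrum of the training images and that † denotes the conjugate transpose; whether the slide used exactly this form is an assumption.

```latex
% Constrained problem: minimize the average correlation energy
% subject to the origin constraints on the training images.
\min_{h}\; h^{\dagger} D h \quad \text{subject to} \quad X^{\dagger} h = u

% Filter (closed-form solution):
h = D^{-1} X \left( X^{\dagger} D^{-1} X \right)^{-1} u

% Minimum energy attained by this filter:
E_{\min} = h^{\dagger} D h = u^{\dagger} \left( X^{\dagger} D^{-1} X \right)^{-1} u
```

Substituting h into h†Dh cancels one factor of X†D⁻¹X against its inverse, which is how the minimum-energy expression follows from the filter expression.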

57 MR Approaches in Biometrics. MR system: introduces adaptivity; template matching performed on different space-frequency regions; builds a different decomposition for each class.

58 Training Phase: Tree Determination. Use wavelet packets to build an adaptive space-frequency decomposition. Pruning criterion.

59 Training Phase: Filter Design. Build a correlation filter for each subspace: decompose all in-class training images with the appropriate tree; compute the correlation filter. Testing Phase: match metric.

60 Data Sets. NIST 24 fingerprint database: MPEG-2 video; 10 people (5 male & 5 female); 2 fingers; 20 classes; 100 images/class; subjects instructed to roll fingers continually. Used images for training: 8 in-class and the rest out-of-class. [Images: easy class, difficult class]

61 Results. [Figures: identification results and verification results, each comparing standard correlation filters with wavelet correlation filters]

62 Shift-Invariance. The DWT is shift-varying. The amount of shift variance depends on level j. Evaluate the effects: shift the input image; compute PCEs.
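The shift variance itself is easy to see on a toy signal. The sketch below is not the experiment from the slide (which shifts fingerprint images and compares PCEs); it only applies one level of an orthonormal 1D Haar DWT to a step signal and to its one-sample circular shift, and compares the energy in the detail coefficients. The signal and function names are made up for illustration.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal 1D Haar DWT: returns (approximation, detail)."""
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)

step = [0, 0, 0, 0, 1, 1, 1, 1]
shifted = step[-1:] + step[:-1]          # circular shift by one sample

_, detail_step = haar_dwt(step)          # edges fall between pairs: no detail energy
_, detail_shifted = haar_dwt(shifted)    # edges fall inside pairs: detail energy appears
```

The total energy is the same for both inputs, but its split between approximation and detail coefficients changes with the shift, which is why a matcher working per subband can be sensitive to translation.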

63 Current and Future Work. Use frames instead of bases: takes care of shift variance. Build rotation-invariant frames: implies true 2D design. Build frame packets: issue of cost function in overlapping spaces.

64 Automatic Code Generation. Formula: uniquely represents our transform. Code generation: SPIRAL takes the formula and produces C code.

65 References.
Journal papers: P. Hennings Yeomans, J. Thornton, J. Kovačević and B.V.K.V. Kumar, "Wavelet packet correlation methods in biometrics," Applied Optics, special issue on Biometric Recognition Systems, vol. 44, no. 5, February 2005, pp.
Conference papers: J.T. Thornton, P. Hennings Yeomans, J. Kovačević and B.V.K.V. Kumar, "Wavelet packet correlation methods in biometrics," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., Philadelphia, PA, March.

66 MR Signal Representation Tools. What? Analysis and processing at different resolutions; resolution: amount of information. Why? Localization; adaptivity; computational efficiency. How? Decomposition into time-frequency atoms; divide and conquer.

67 Localization. Zoom in on singularities.

68 Adaptivity. Holy Grail of signal analysis/processing: understand the blob-like structure of the energy distribution in the time-frequency space; design a representation reflecting that. [Figure: time-frequency (t, f) tilings for the Dirac basis, FT, STFT, WT and WP; images: ER, actin]

69 How? Divide and conquer: represent a signal in terms of its building blocks. [Figure: a signal written as a combination of building blocks]

70 How? Analysis: X = analyze x. Synthesis: x = synthesize X. Together: x = synthesize (do something) analyze x.

71 MR Signal Representation Tools. We build tools responding to requirements from a specific application. Shift invariance: leads to redundant representations (frames). Adaptivity: leads to wavelet (frame) packets. MD nature of the signal: leads to nonseparable MR decompositions.

72 S 72 Frames
Redundant decompositions
  Robustness to noise
  Robustness to losses
  Freedom in design
  Shift-invariance

73 S 73 Bases versus Frames?
Bases are nonredundant
  Loss of one transform coefficient is irreplaceable
  Sensitivity to noise is great
  Space of possible solutions is restricted
Solution: frames
[Figure: Transform, Processing, Inverse Transform; labels n x n, m x n and n x m]

74 S 74 Robustness to Noise
Noise is spread over more components: easier to clean
[Figure: n inputs, Frame F (m x n), Transmission of m coefficients, Reconstr. F* (n x m), n outputs]

75 S 75 Robustness to Losses
Losses
  Modeled as erasures
  To reconstruct, inverse transform must exist
  Mathematically: any (n x n) submatrix of the frame matrix must be full rank; such a frame is maximally robust to erasures (MR)
[Figure: n inputs, Frame F (m x n), Transmission of m coefficients with some lost (X), Reconstr. F* (n x m), n outputs]
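
The full-rank condition can be checked by brute force for small frames. The sketch below assumes a real frame stored as a list of m rows of length n; the helper names and the two example frames are illustrative choices, not from the slides.

```python
from itertools import combinations
import math

def det(M):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def is_maximally_robust(F, tol=1e-12):
    """True if every choice of n rows out of the m rows of F is full rank."""
    n = len(F[0])
    return all(abs(det([F[i] for i in rows])) > tol
               for rows in combinations(range(len(F)), n))

# Three unit vectors in R^2 spaced 120 degrees apart: any two span the plane.
mb = [[0.0, 1.0],
      [-math.sqrt(3) / 2, -0.5],
      [math.sqrt(3) / 2, -0.5]]

# Not MR: the first two rows are parallel, so losing the third row is fatal.
bad = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
```

The number of submatrices grows combinatorially with m, so this exhaustive test is only practical for small frames.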

76 S 76 What are Frames?
Generating system for R^n or C^n
Usually represented by a matrix F
Frame coefficients: y = F x
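
As a small worked case of y = F x, the sketch below uses three unit vectors in R^2 (m = 3, n = 2) as the rows of F. The example frame and the function names are assumptions for illustration. Because this particular frame is tight with unit-norm rows, F^T F = (m/n) I, so the input is recovered as (n/m) F^T y; that shortcut does not hold for a general frame.

```python
import math

# Rows of F are the frame vectors: m = 3 vectors in R^2.
F = [[0.0, 1.0],
     [-math.sqrt(3) / 2, -0.5],
     [math.sqrt(3) / 2, -0.5]]

def analysis(F, x):
    """Frame coefficients y = F x (one inner product per frame vector)."""
    return [sum(f * xi for f, xi in zip(row, x)) for row in F]

def synthesis_tight(F, y):
    """Reconstruction x = (n/m) F^T y, valid for a unit-norm tight frame only."""
    m, n = len(F), len(F[0])
    return [(n / m) * sum(F[k][j] * y[k] for k in range(m)) for j in range(n)]

x = [1.0, 2.0]
y = analysis(F, x)            # 3 coefficients describe a 2-dimensional signal
x_hat = synthesis_tight(F, y)
```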

77 S 77 Frame Properties
Maximally robust (MR): any (n x n) submatrix is full rank
Tight (T): columns are orthonormal
Equal norm (EN): all rows have equal norm
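
The tight and equal-norm properties translate into two short numerical checks. In this sketch, tightness is tested up to a scale factor (columns orthogonal with a common norm), which is a slightly looser reading than "orthonormal"; that reading, the helper names and the example frames are assumptions.

```python
import math

def gram_of_columns(F):
    """G = F^T F for a real m x n frame matrix stored as a list of rows."""
    m, n = len(F), len(F[0])
    return [[sum(F[k][a] * F[k][b] for k in range(m)) for b in range(n)]
            for a in range(n)]

def is_tight(F, tol=1e-12):
    """True if F^T F = c I for some c > 0 (columns orthogonal, same norm)."""
    G = gram_of_columns(F)
    c = G[0][0]
    n = len(G)
    return c > tol and all(abs(G[a][b] - (c if a == b else 0.0)) < tol
                           for a in range(n) for b in range(n))

def is_equal_norm(F, tol=1e-12):
    """True if all rows (frame vectors) have the same Euclidean norm."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in F]
    return all(abs(nrm - norms[0]) < tol for nrm in norms)

tight_en = [[0.0, 1.0],
            [-math.sqrt(3) / 2, -0.5],
            [math.sqrt(3) / 2, -0.5]]
neither = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
```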

78 S 78 What Do We Want to Do?
We want to build frames with structure in steps
  First impose maximum robustness: MR
  Then impose tightness: tight MR
  Finally, add equal norm: tight ENMR
Construction by seeding
Tools: Polynomial algebras and transforms

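79 S 79 Invariance of Frame Properties
[Table residue: each row pairs a frame property (F, MR, UN, TF, EN) with a transformed frame that keeps the property and with conditions on the factors (A, B invertible; A, D invertible; D, U unitary; U, V unitary; nonzero a). The matrix diagrams were lost in extraction.]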

80 S 80 Building Frame Families
We impose these one by one
  MR: maximally robust to erasures
    use polynomial transforms
    then, F = P b, [1, …, N] is an MR frame
  TF: tight frames
    use orthogonal polynomials
    construct a polynomial transform
    construct the closest orthogonal polynomial transform
  EN: equal norm
    use DFT to get complex ENMR frames
    use frame invariance properties to get real ENMR frames
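
One standard way to obtain a complex frame that is equal norm, tight and MR from the DFT is to keep the first n columns of the normalized m x m DFT matrix (a harmonic frame). Whether this is exactly the construction meant on the slide is an assumption; the sketch below simply builds such a frame for m = 5, n = 3 and checks the three properties numerically.

```python
import cmath
from itertools import combinations

def harmonic_frame(m, n):
    """First n columns of the m x m DFT matrix, scaled by 1/sqrt(m)."""
    w = cmath.exp(-2j * cmath.pi / m)
    return [[w ** (k * j) / m ** 0.5 for j in range(n)] for k in range(m)]

def gram_of_columns(F):
    """G = F* F; equals the identity when the columns are orthonormal."""
    m, n = len(F), len(F[0])
    return [[sum(F[k][a].conjugate() * F[k][b] for k in range(m))
             for b in range(n)] for a in range(n)]

def det(M):
    """Determinant by Laplace expansion (works for complex entries)."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

m, n = 5, 3
F = harmonic_frame(m, n)
G = gram_of_columns(F)                                    # tightness check
row_energy = [sum(abs(v) ** 2 for v in row) for row in F]  # equal-norm check
min_abs_det = min(abs(det([F[i] for i in rows]))           # MR check
                  for rows in combinations(range(m), n))
```

Every n x n submatrix here is a scaled Vandermonde matrix in distinct roots of unity, which is why none of the determinants vanishes.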

81 S 81 Funding and References
Funding
  NSF, Frame Toolbox for Bioimaging, Biometrics and Robust Transmission, 09/05-08/08. PI.
Journal papers
  V. K Goyal, J. Kovačević and J.A. Kelner, "Quantized frame expansions with erasures," Journal of Appl. and Comput. Harmonic Analysis, vol. 10, no. 3, May 2001.
  V. K Goyal and J. Kovačević, "Generalized multiple description coding with correlated transforms," IEEE Trans. Inform. Th., vol. 47, no. 6, September 2001.
  V. K Goyal, J. A. Kelner and J. Kovačević, "Multiple description vector quantization with a coarse lattice," IEEE Trans. Inform. Th., vol. 48, no. 3, March 2002.
  J. Kovačević, P.L. Dragotti and V. K Goyal, "Filter bank frame expansions with erasures," IEEE Trans. Inform. Th., special issue in Honor of Aaron D. Wyner, vol. 48, no. 6, June 2002. Invited paper.
  P.G. Casazza and J. Kovačević, "Equal-norm tight frames with erasures," Advances in Computational Mathematics, special issue on Frames. Invited paper.

82 S 82 References (contd)
Conference papers
  V. K Goyal, J. Kovačević and M. Vetterli, "Quantized frame expansions as source-channel codes for erasure channels," Proc. Wavelets and Appl. Workshop, Ticino, Switzerland, September 1998.
  V. K Goyal, J. Kovačević and M. Vetterli, "Quantized frame expansions as source-channel codes for erasure channels," Proc. Data Compr. Conf., Snowbird, UT, March 1999.
  P. L. Dragotti, J. Kovačević and V. K Goyal, "Quantized oversampled filter banks with erasures," Proc. Data Compr. Conf., Snowbird, UT, March 2001.
  A. C. Lozano, J. Kovačević and M Andrews, "Quantized frame expansions in a wireless environment," Proc. Data Compr. Conf., Snowbird, UT, March 2002.
  A. C. Lozano, J. Kovačević and M Andrews, "Quantized frame expansions in a wireless environment," Proc. DIMACS Workshop on Source Coding and Harmonic Analysis, Rutgers, NJ, May 2002.
  M. Püschel and J. Kovačević, "Real, Tight Frames with Maximal Robustness to Erasures," Proc. Data Compr. Conf., Snowbird, UT, March 2005.
Book chapters
  P.G. Casazza, M. Fickus, J. Kovačević, M. Leon and J. Tremain, "A physical interpretation of finite tight frames," Harmonic Analysis and Applications, C. Heil, Ed., Birkhäuser, Boston, MA, 2004.

83 S 83 Wavelet Packets
First stage: full decomposition

84 S 84 Wavelet Packets
Second stage: pruning
  Test at each node: Cost(parent) >< Cost(children)?
  Case shown: Cost(parent) < Cost(children)
Examples
  Bioimaging
  Biometrics
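
The two stages (full decomposition, then pruning by comparing parent and children costs) can be sketched as a single bottom-up recursion. The Haar filters, the l1-norm cost and the function names below are assumptions chosen for brevity; the slides do not say which filters or cost function were used. The input length is assumed to be a power of two.

```python
import math

def haar_split(x):
    """Split a node into its low-pass and high-pass children (Haar filters)."""
    s = 1 / math.sqrt(2)
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def cost(coeffs):
    """Additive cost of a node: l1 norm (an assumed choice)."""
    return sum(abs(c) for c in coeffs)

def best_basis(x, depth):
    """Return (total cost, list of coefficient blocks kept as leaves)."""
    if depth == 0 or len(x) < 2:
        return cost(x), [x]
    low, high = haar_split(x)
    c_low, b_low = best_basis(low, depth - 1)
    c_high, b_high = best_basis(high, depth - 1)
    parent = cost(x)
    if parent <= c_low + c_high:
        return parent, [x]                      # prune: keep the parent node
    return c_low + c_high, b_low + b_high       # keep the cheaper children

flat_cost, flat_blocks = best_basis([1.0, 1.0, 1.0, 1.0], depth=2)
spike_cost, spike_blocks = best_basis([1.0, 0.0, 0.0, 0.0], depth=2)
```

The constant signal is cheaper to represent after splitting, so the tree is kept deep on the low-pass side; the single spike is cheapest left alone, so the whole tree is pruned back to the root. This is the adaptivity the slides refer to.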

85 S 85 References on MR
Light reading
  Wavelets: Seeing the Forest -- and the Trees, D. Mackenzie, Beyond Discovery, December 2001.
Books
  A Wavelet Tour of Signal Processing, S. Mallat, Academic Press.
  Ten Lectures on Wavelets, I. Daubechies, SIAM.
  Wavelets and Subband Coding, M. Vetterli and J. Kovačević, Prentice Hall, 1995.
  Wavelets and Filter Banks, G. Strang and T. Nguyen, Wellesley-Cambridge Press.
Bioimaging
  A Review of Wavelets in Biomedical Applications, M. Unser and A. Aldroubi, Proc. IEEE, April 1996.
  Wavelets in Temporal and Spatial Processing of Biomedical Data, A. Laine, Annu. Rev. Biomed. Eng., 2000.
  Guest Editorial: Wavelets in Medical Imaging, M. Unser, A. Aldroubi and A. Laine, IEEE Trans. on Medical Imaging, March 2003.
  Wavelets in Bioinformatics and Computational Biology: State of the Art and Perspectives, P. Lio, Bioinformatics Review, 2003.

86 S 86 References
References on MR acquisition
References on MR protein classification
References on MR biometric recognition
References on MR
References on frames

87 S 87 Current Projects
Bioimaging
  Efficient MR acquisition of fluorescence microscopy images
  MR segmentation of multi-cell images
  MR classification of proteins based on images of their subcellular locations
  Automatic code generation for MR bioimaging algorithms
Biometrics
  MR identification/verification (fingerprints, faces, irises,…)
  Automatic code generation for MR biometric algorithms
MR Tools
  Frames
  Algebraic theory of signal processing (SMART)

88 S 88 Conclusions
The dream: automated, efficient and reliable processing of large biosignal databases
Emphasis
  Introduction of MR toolbox
  Adaptivity and computational efficiency are key
[Figure: Acquisition, Knowledge Extraction, Computation; application area: Systems Biology]

89 S 89 Acknowledgments Current PhD students Amina Chebira Tad Merryman Gowri Srinivasa PhD students Doru Cristian Balcan Elvira Garcia Osuna Pablo Hennings Yeomans Jason Thornton Collaborators Vijay kumar Bhagavatula Geoff Gordon José Moura Markus Püschel Marios Savvides Bob Murphy Undergrads Woon Ho Jung Funding Lionel Coulot Heather Kirshner

90 S 90 Supplementary Material Jelena Kovačević Center for Bioimage Informatics Department of Biomedical Engineering Carnegie Mellon University

91 S 91 Contents
Bioimaging
  3T3 data set
  Segmentation
  Haralick texture features
Computation
  Spiral details

92 S 92 3T3 Data Set
Cells from mouse embryo
Spinning Disk Confocal Microscope (60x)
GFP for a specific protein

93 S 93 Segmentation

94 S 94 Haralick Texture Features

95 S 95 Specificity/Sensitivity/Error Rates
Different jargon in different communities
  Sensitivity
  Specificity
  False accept rate (FAR)
  False reject rate (FRR)
  Equal error rate (EER)
                        Disorder present   Disorder absent
  Test result positive         a                  b
  Test result negative         c                  d
                        Class own          Class other
  Class. result own            a                  b
  Class. result other          c                  d
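
The two vocabularies map onto the same four counts. The sketch below uses the usual definitions (sensitivity = a/(a+c), specificity = d/(b+d), FAR = b/(b+d), FRR = c/(a+c)), reading the columns as ground truth and the rows as the decision; that reading of the table, the function names and the sample numbers are assumptions. The EER estimate sweeps a decision threshold over match scores and reports the point where FAR and FRR are closest.

```python
def rates(a, b, c, d):
    """a: true positive, b: false positive, c: false negative, d: true negative."""
    return {
        "sensitivity": a / (a + c),   # fraction of true cases detected
        "specificity": d / (b + d),   # fraction of other cases rejected
        "FAR": b / (b + d),           # false accept rate = 1 - specificity
        "FRR": c / (a + c),           # false reject rate = 1 - sensitivity
    }

def equal_error_rate(genuine, impostor):
    """Sweep thresholds over the observed scores; accept when score >= t."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]

r = rates(a=90, b=5, c=10, d=95)
eer = equal_error_rate(genuine=[0.9, 0.8, 0.7, 0.4],
                       impostor=[0.1, 0.2, 0.3, 0.6])
```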

96 S 96 SPIRAL Code Generation for DSP Algorithms
Transform: Matrix
Rules: Decompose transform into other ones
Formula: Uniquely represents the transform
Code generation: From formula produce C code
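
A familiar example of a rule that decomposes a transform into smaller ones is the radix-2 Cooley-Tukey step, which writes a DFT of size n in terms of two DFTs of size n/2. The sketch below applies that rule recursively and compares the result with the transform defined directly as a matrix-vector product. It illustrates the idea of a rule only; it is not SPIRAL's formula language or its generated C code, and the function names are made up here. The input length is assumed to be a power of two.

```python
import cmath

def dft_direct(x):
    """The transform as a matrix: y[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft_by_rule(x):
    """Radix-2 Cooley-Tukey rule: DFT_n from two DFT_{n/2} plus twiddle factors."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = dft_by_rule(x[0::2])   # DFT of even-indexed samples
    odd = dft_by_rule(x[1::2])    # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.5, 7.0]
max_diff = max(abs(a - b) for a, b in zip(dft_direct(x), dft_by_rule(x)))
```

Both functions compute the same transform; the rule-based version does so with fewer operations, which is the kind of choice a code generator can exploit.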

