1 Improving Performance of Functional Magnetic Image Analysis by Parameter Sharing in Bayesian Networks
Mark Palatucci, School of Computer Science, Carnegie Mellon University
Abstract
Bayesian networks have proven useful for classifying cognitive states from functional magnetic resonance images (fMRI). Despite recent progress, training effective classifiers is still a difficult problem: the dimensionality of fMRI data is extremely high, and training examples are both sparse and noisy. Effective classification in this domain typically requires feature reduction or parameter sharing to reduce the number of estimated parameters in the Bayesian network. In this abstract, we present an overview of our current research on parameter reduction for fMRI data. Our goal is to show that we can improve classifier accuracy by sharing parameters across regions of the brain with highly correlated neural activity.

Introduction
Recent studies have shown that it is possible to classify cognitive states from fMRI images. Mitchell et al. (2004) have shown that it is possible to determine whether a subject is viewing a sentence or a picture merely from an fMRI snapshot of the subject's neural activity. They achieve this by training a classifier to learn the hemodynamic response for a particular stimulus. The brain responds differently to varied stimuli, and even with simple methods like Naïve Bayes, these differences can be learned and used for classification.

Example: Naïve Bayes Algorithm for Parameter Sharing
In a typical (non-shared) Gaussian Naïve Bayes classification we treat each voxel at each time point as a separate feature, and we compute the likelihood function by estimating a mean and variance for each feature for each class. To share parameters, we must first find "groups" of voxels:
(1) Smooth the data using a Gaussian kernel, choosing the bandwidth h by cross validation.
(2) For each different stimulus, compute the correlation of each voxel with each of its neighbors.
(3) Pick a correlation threshold t, 0 < t < 1, and a starting voxel. While any neighbor has a correlation value greater than the threshold, add that neighbor to the current group. When there are no more neighbors above the threshold, start a new group. Repeat until all voxels are in a group.
Building a classifier that uses the shared parameters is fairly straightforward. Rather than computing a mean and variance for each feature, we compute a mean and variance for all voxels in a group at a particular time point and class. Inference is computed as normal, but with the per-voxel parameters replaced by the shared ones. Estimating these parameters from more voxels decreases their variance and yields better estimates.
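As a concrete illustration of steps (1)-(3) and the shared-parameter classifier, here is a minimal sketch in Python/NumPy. It is an assumption-laden reconstruction, not the poster's actual implementation: the data layout (examples x voxels x time points), the neighbors adjacency list, and the precomputed correlation matrix corr are all illustrative choices.

    import numpy as np

    # Step (1), spatial smoothing (e.g. scipy.ndimage.gaussian_filter with
    # bandwidth h chosen by cross validation), is assumed done upstream.

    def grow_groups(corr, neighbors, threshold):
        """Greedy grouping, step (3): absorb any neighbor whose correlation
        with a voxel already in the current group exceeds `threshold`; when
        no such neighbor remains, start a new group. Returns a group id per
        voxel. corr is a (voxels, voxels) array from step (2)."""
        n = len(neighbors)
        group = -np.ones(n, dtype=int)
        next_id = 0
        for seed in range(n):
            if group[seed] >= 0:
                continue
            group[seed] = next_id
            frontier = [seed]
            while frontier:
                v = frontier.pop()
                for u in neighbors[v]:
                    if group[u] < 0 and corr[v, u] > threshold:
                        group[u] = next_id
                        frontier.append(u)
            next_id += 1
        return group, next_id

    def fit_shared_gnb(X, y, group, n_groups):
        """X: (examples, voxels, timepoints); y: class labels. Pools all
        voxels in a group to estimate one shared (mean, var) per group,
        time point, and class, instead of one per voxel."""
        classes = np.unique(y)
        mu = np.zeros((len(classes), n_groups, X.shape[2]))
        var = np.zeros_like(mu)
        for c, label in enumerate(classes):
            Xc = X[y == label]
            for g in range(n_groups):
                pooled = Xc[:, group == g, :]       # all voxels in group g
                mu[c, g] = pooled.mean(axis=(0, 1))
                var[c, g] = pooled.var(axis=(0, 1)) + 1e-6
        return classes, mu, var

    def predict(X, group, classes, mu, var):
        """Standard GNB inference, with each voxel's parameters replaced by
        its group's shared parameters (uniform class prior assumed)."""
        scores = np.zeros((X.shape[0], len(classes)))
        for c in range(len(classes)):
            m, v = mu[c, group, :], var[c, group, :]  # broadcast to voxels
            scores[:, c] = -0.5 * (np.log(2 * np.pi * v)
                                   + (X - m) ** 2 / v).sum(axis=(1, 2))
        return classes[np.argmax(scores, axis=1)]

Pooling every voxel of a group in fit_shared_gnb is exactly the variance-reduction effect described above: each shared (mean, variance) is estimated from many more samples than a per-voxel estimate would be.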
Correlation of Neural Activity
For each voxel, we compute its correlation with each neighboring voxel, take the maximum over those neighbors, and plot that value. In the two images below, we can see areas of the brain that exhibit high levels of correlated neural activity. Areas with highly correlated neural activity are good candidates for parameter sharing or feature reduction.
[Figure: Max correlation of each voxel with its neighbors, for two different Z-slices of the neocortex.]
[Figure: Brain activation for reading a sentence (left) and viewing a picture (right). From T. Mitchell.]
[Figure: Hemodynamic response curve. From J. Taylor.]

Preliminary Results
We have performed some initial tests of our parameter sharing algorithm using a Gaussian Naïve Bayes classifier. The classification task was to determine whether a subject was viewing a picture or a sentence. We tested our classifier on several different regions of the brain, using a cross validation step to compare different correlation thresholds. The results show that our parameter sharing algorithm was able to find a more accurate classifier (compared to standard GNB) for three of the five regions tested. Note: a correlation threshold of 0.9 is essentially a standard GNB classifier; because the threshold is higher than any correlated activity, each voxel becomes its own group.

References
[1] Niculescu, R. S. (2005). Exploiting Parameter Domain Knowledge for Learning in Bayesian Networks. Doctoral dissertation, School of Computer Science, Carnegie Mellon University.
[2] Friedman, J. H. (1997). On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery, 1(1).
[3] Mitchell, T. M., Hutchinson, R., Niculescu, R. S., Pereira, F., Wang, X., Just, M., and Newman, S. (2004). Learning to Decode Cognitive States from Brain Images. Machine Learning, 57(1-2).

2 Parameter Estimation in a Hierarchical Model for Species Occupancy
Rebecca A. Hutchinson and Thomas G. Dietterich, School of EECS, Oregon State University
[Photo © J.R. Woodward / CLO]

Abstract
In this paper, we describe a model for the relationship between the occupancy pattern of a species on a landscape and imperfect observations of its presence or absence. The structure in the observation process is incorporated generatively, and environmental inputs are incorporated discriminatively. Our experiments on synthetic data compare two methods for training this model under various regularization schemes. Our results suggest that maximizing the expected log-likelihood of the observations and the unknown true occupancy produces parameter estimates that are closer to the truth than maximizing the conditional likelihood of the observations alone.

Introduction
We consider a problem in which the quantity about which we wish to make inferences is observed imperfectly, with structure in the observation errors. We can encode our knowledge about the observation process by modeling the data generatively, with a latent variable for the quantity of interest that gives rise to observations about its value. In some situations, we also have measurements of variables affecting the latent variable and/or the observation process. We are not interested in modeling the distributions of these variables, so we encode them in the model discriminatively.

This problem occurs in species occupancy modeling, in which the goal is to discover the pattern of occupancy for a species of interest from observations of its presence or absence at randomly sampled locations over a landscape. Since the species might not be detected even when it is present, the sample sites are visited multiple times. In carefully designed studies, the visits are scheduled such that it is reasonable to assume that the species' occupancy does not change over the course of the visits, and that the visits are independent when conditioned on the true occupancy status of the species.

The data collected in these studies then contains a detection history for each site, recording the observed presence or absence of the species on each visit. This data is the result of two components: occupancy, which is the biological pattern of interest, and detection, which is a confounding factor. Each component of the model may be accompanied by a set of covariates thought to affect it. For instance, covariates of the occupancy component might include elevation and vegetation type; covariates of the detection component might include time of day and weather on the day of the visit.

In the ecology literature (e.g. [2]), several models of this type are usually fit with different sets of covariates and evaluated according to some model selection criterion. As an alternative, we propose including all covariates in a single model and using regularization to penalize excess complexity. In this paper, we use synthetic data to investigate two training algorithms for this model, and the regularization trade-offs for each.

Model and Algorithms
[Figure 1: The graphical structure of the species occupancy model. i indexes sites, and t indexes visits. Y is the observed data, Z is the latent occupancy, W contains detection covariates, and X contains occupancy covariates.]
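To make the generative observation process concrete, here is a minimal sketch of the per-site likelihood and the E-step posterior, assuming the standard logistic link for both components; the names alpha and beta and the array shapes are illustrative assumptions, not the authors' code. Marginalizing the latent occupancy Z_i gives P(Y_i) = psi_i * prod_t p_it^y_it (1 - p_it)^(1 - y_it) + (1 - psi_i) * 1[all y_it = 0], where psi_i is the occupancy probability from the X covariates and p_it the detection probability from the W covariates.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def site_log_likelihood(y_i, x_i, W_i, alpha, beta):
        """Log-likelihood of one site's detection history y_i (shape (T,),
        0/1 per visit), marginalizing the latent occupancy Z_i.
        x_i: site-level occupancy covariates; W_i: (T, d) detection
        covariates, one row per visit."""
        psi = sigmoid(x_i @ alpha)          # P(Z_i = 1 | x_i)
        p = sigmoid(W_i @ beta)             # P(Y_it = 1 | Z_i = 1, w_it)
        occ = psi * np.prod(p ** y_i * (1 - p) ** (1 - y_i))
        unocc = (1.0 - psi) * float(np.all(y_i == 0))  # no false detections
        return np.log(occ + unocc)

    def occupancy_posterior(y_i, x_i, W_i, alpha, beta):
        """P(Z_i = 1 | Y_i): the E-step quantity needed when maximizing the
        expected joint log-likelihood of the observations and the latent
        occupancy."""
        if np.any(y_i == 1):
            return 1.0                      # a detection implies occupancy
        psi = sigmoid(x_i @ alpha)
        miss = np.prod(1 - sigmoid(W_i @ beta))  # all-miss history if occupied
        return psi * miss / (psi * miss + 1 - psi)

The posterior is what an EM-style procedure [1] would compute in its E-step; maximizing the expected joint log-likelihood it defines is the training method the abstract compares against maximizing the conditional likelihood of the observations alone.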
Experiments

References
[1] Dempster, A. P., Laird, N., and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1-38.
[2] MacKenzie, D. I., et al. (2006). Occupancy Estimation and Modeling: Inferring Patterns and Dynamics of Species Occurrence. Academic Press.
[3] Elith, J., et al. (2006). Novel methods improve prediction of species' distributions from occurrence data. Ecography, 29:129-151.
[4] Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning. Springer, 2nd edition.
[5] Zou, H., and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, 67(2):301-320.

