Butte Lab Journal Club 10/25/2010. Boltzmann machines able to solve difficult combinatorial problems Estimating the density function of multivariate.

Butte Lab Journal Club 10/25/2010

Boltzmann machines are able to solve difficult combinatorial problems
Estimating the density function of multivariate binary data is typically done with mixture models or factor models
Problem: too computationally expensive for many multivariate binary density modeling problems
Solution: the authors describe a generalization of the restricted Boltzmann machine (RBM), the restricted Boltzmann forest (RBForest)
– replaces the binary hidden variables of the RBM with groups of tree-structured binary variables
– by varying the size of the trees, the number of model parameters can be increased while computation of the density function stays tractable
– basically, a “structured” binning of variables
Example application: automated diagnosis involving a large number of feature types
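The tractability point can be made concrete for a plain RBM (not the RBForest). Summing out the hidden units gives the free energy F(v) in closed form, but the partition function Z still needs a sum over all 2^H hidden configurations, so an exact log p(v) is affordable only when the number H of hidden units is small. The Python sketch below assumes the standard binary RBM energy E(v, h) = -b·v - c·h - vᵀWh; the function names are illustrative, and the tree-structured hidden groups of the RBForest are not implemented here.

```python
import itertools
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logsumexp(values):
    # Stable log(sum(exp(v) for v in values)).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def free_energy(v, W, b, c):
    # F(v) = -sum_i b_i v_i - sum_j softplus(c_j + sum_i v_i W[i][j]);
    # the binary hidden units have been summed out analytically.
    visible_term = sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = sum(
        softplus(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(c))
    )
    return -visible_term - hidden_term


def log_partition(W, b, c):
    # log Z by enumerating all 2^H hidden configurations; for each h the
    # visible units are summed out analytically, so cost is O(2^H * D * H).
    n_vis, n_hid = len(b), len(c)
    terms = []
    for h in itertools.product([0, 1], repeat=n_hid):
        hidden_bias = sum(cj * hj for cj, hj in zip(c, h))
        visible = sum(
            softplus(b[i] + sum(W[i][j] * h[j] for j in range(n_hid)))
            for i in range(n_vis)
        )
        terms.append(hidden_bias + visible)
    return logsumexp(terms)


def log_prob(v, W, b, c):
    # Exact log-density: log p(v) = -F(v) - log Z.
    return -free_energy(v, W, b, c) - log_partition(W, b, c)
```

With arbitrary weights for three visible and two hidden units, summing exp(log_prob(v, W, b, c)) over all 2^3 visible vectors should give 1 up to rounding, a quick check that Z was computed correctly. Growing H doubles the cost of log_partition per extra hidden unit, which is the trade-off the RBForest is designed to manage.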

Computational pipelines are essential, yet there is a paucity of “good” tools for designing them
eHive has many design features for robustness and scalability:
– Fault tolerance
– Agents (“bees”)
– Graph-based
– Cloud/GRID-friendly
Generic infrastructure: Perl, MySQL

Normalization scheme enables better detection of drug signals
– Less susceptible to known confounders
Figure labels: vorinostat, trichostatin A, antifungal drugs, calmodulin inhibitors, anti-neoplastic drugs, asthma drugs

Emtree = EMBASE’s MeSH equivalent; much more comprehensive in certain areas, e.g., pharmacology
Caveat: SCOPUS is not EMBASE
– SCOPUS does not support the complex Emtree queries that EMBASE supports, nor some other EMBASE features (e.g., there is no thesaurus explosion in SCOPUS)
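Thesaurus explosion, the feature SCOPUS lacks, means expanding a query term so it also matches every narrower term beneath it in the thesaurus. A minimal Python sketch, assuming a toy hierarchy stored as a dict of narrower terms; the terms are illustrative and not taken from Emtree, and `explode` and `or_query` are hypothetical helpers whose output is not EMBASE query syntax.

```python
# Toy thesaurus: term -> list of directly narrower terms (illustrative only).
NARROWER = {
    "antineoplastic agent": ["alkylating agent", "antimetabolite"],
    "alkylating agent": ["cyclophosphamide"],
    "antimetabolite": ["methotrexate", "fluorouracil"],
}


def explode(term, narrower, seen=None):
    # Depth-first collection of the term plus all terms below it; the `seen`
    # set avoids duplicates when a term sits under more than one parent.
    if seen is None:
        seen = set()
    if term in seen:
        return []
    seen.add(term)
    terms = [term]
    for child in narrower.get(term, []):
        terms.extend(explode(child, narrower, seen))
    return terms


def or_query(terms):
    # Join the exploded terms into a generic OR query string.
    return " OR ".join(f'"{t}"' for t in terms)
```

Exploding "antimetabolite" yields the term itself plus "methotrexate" and "fluorouracil"; without explosion, a search on "antimetabolite" would miss records indexed only under the specific drug names.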

CenterWatch Databases

Example reports…

Example Pipeline for Multiplying Large Numbers
Pipeline defined in 4 files:
– Start.pm splits a multiplication job into sub-tasks and creates the corresponding jobs
– PartMultiply.pm performs a partial multiplication and stores the intermediate result in a table
– AddTogether.pm waits for the partial multiplication results to be computed and adds them together into the final result
– LongMult_conf.pm, the pipeline configuration module that links the previous Runnables into one pipeline
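The division of labor among the three Runnables is ordinary long multiplication. A minimal Python sketch of the same arithmetic, assuming one sub-task per distinct decimal digit of the second factor; the function names mirror the Runnables but are hypothetical, and none of eHive's Perl/MySQL machinery is modeled.

```python
def start(a_multiplier, b_multiplier):
    # Split the job: one partial multiplication per distinct digit of b_multiplier.
    digits = sorted({int(ch) for ch in str(b_multiplier)})
    return [(a_multiplier, digit) for digit in digits]


def part_multiply(a_multiplier, digit):
    # One sub-task: multiply the long number by a single digit.
    return a_multiplier * digit


def add_together(b_multiplier, partial_products):
    # partial_products maps digit -> a_multiplier * digit; shift each partial
    # product by the decimal position of its digit in b_multiplier and sum.
    total = 0
    for position, ch in enumerate(reversed(str(b_multiplier))):
        total += partial_products[int(ch)] * 10 ** position
    return total


a, b = 9650516169, 327358788
tasks = start(a, b)
partials = {digit: part_multiply(x, digit) for x, digit in tasks}
print(add_together(b, partials) == a * b)  # True
```

Splitting by distinct digit means repeated digits (three 8s here) reuse one partial product, so only five part_multiply jobs are needed for a nine-digit multiplier.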

Features Used in Example Pipeline
– A pipeline can have multiple analyses (e.g., 'start', 'part_multiply' and 'add_together').
– A job of one analysis can create jobs of other analyses by 'flowing the data' down branches. These branches are then assigned specific analysis names in the pipeline configuration file: one 'start' job flows partial multiplication subtasks down branch #2, and a task of adding them together down branch #1.
– Execution of one analysis can be blocked until all jobs of another analysis have been successfully completed ('add_together' is blocked by 'part_multiply').
– eHive processes store intermediate and final results in a database (in this pipeline, the 'intermediate_result' and 'final_result' tables are used).
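The branching and blocking rules can be simulated with an in-memory job queue. In this hypothetical Python sketch, dicts stand in for the 'intermediate_result' and 'final_result' tables, the `dataflow` map plays the role of the configuration file's branch-to-analysis wiring, and a job of a blocked analysis is simply requeued while any job of its blocking analysis is still pending. eHive's real scheduler and configuration syntax differ.

```python
from collections import Counter, deque


def run_pipeline(a_multiplier, b_multiplier):
    # Dicts standing in for the 'intermediate_result' and 'final_result' tables.
    intermediate_result = {}
    final_result = {}
    # Hypothetical wiring: branch number -> analysis that receives the flowed data.
    dataflow = {"start": {1: "add_together", 2: "part_multiply"}}
    # 'add_together' may not run until every 'part_multiply' job has completed.
    blocked_by = {"add_together": "part_multiply"}

    queue = deque()
    pending = Counter()  # jobs created but not yet completed, per analysis
    executed = []        # order in which jobs actually ran

    def create_job(analysis, params):
        queue.append((analysis, params))
        pending[analysis] += 1

    create_job("start", {"a": a_multiplier, "b": b_multiplier})
    while queue:
        analysis, params = queue.popleft()
        blocker = blocked_by.get(analysis)
        if blocker is not None and pending[blocker] > 0:
            queue.append((analysis, params))  # still blocked: requeue and move on
            continue
        executed.append(analysis)
        if analysis == "start":
            branches = dataflow["start"]
            # Branch #1 is flowed first on purpose, so the blocking rule is exercised.
            create_job(branches[1], {"a": params["a"], "b": params["b"]})
            for digit in sorted({int(ch) for ch in str(params["b"])}):
                create_job(branches[2], {"a": params["a"], "digit": digit})
        elif analysis == "part_multiply":
            intermediate_result[params["digit"]] = params["a"] * params["digit"]
        elif analysis == "add_together":
            final_result[(params["a"], params["b"])] = sum(
                intermediate_result[int(ch)] * 10 ** position
                for position, ch in enumerate(reversed(str(params["b"])))
            )
        pending[analysis] -= 1
    return final_result, executed
```

Even though the 'add_together' job is created before any 'part_multiply' job, it only runs last, after all partial products are in place.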

Other Worthy Features
eHive performs well for jobs that run for a very short time but are repeated millions of times
– the converse of typical job scheduling systems, which have high latency
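Why short, numerous jobs favor a low-latency design is simple arithmetic: a fixed per-job scheduling overhead is paid once per job, so it grows with the job count and can dwarf the useful work. A Python sketch ignoring parallelism; the 1 s run time and the 30 s versus 0.01 s overheads are hypothetical figures chosen for illustration, not measurements of eHive or any scheduler.

```python
def total_wall_hours(n_jobs, run_time_s, overhead_s):
    # Serial wall-clock time when every job also pays a fixed scheduling overhead.
    return n_jobs * (run_time_s + overhead_s) / 3600.0


def efficiency(run_time_s, overhead_s):
    # Fraction of wall-clock time spent doing useful work.
    return run_time_s / (run_time_s + overhead_s)
```

For one million 1-second jobs, total_wall_hours(1_000_000, 1, 30) is about 8,611 hours (roughly 3% useful work), while total_wall_hours(1_000_000, 1, 0.01) is about 281 hours (roughly 99% useful work).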
