PSO for Bioinformatics
Alex Freitas and Colin Johnson, University of Kent
People involved in swarm intelligence research at Kent (1)
XPS Project
– Alex Freitas (Lecturer)
– Colin Johnson (Lecturer)
– Elon Correa (RA, will start soon)
– Mudassar Iqbal (PhD student, started Nov. 2004)
   Initially investigating a dynamic neighborhood topology
   Interested in bioinformatics (problem to be defined)
People involved in swarm intelligence research at Kent (2)
Other research students
– Terry Arnold (3rd-year PhD student)
   Supervised by Colin
   Doing research on force-based PSO
– Nick Holden (MRes student)
   Supervised by Alex
   Doing research on "A hybrid PSO/ACO algorithm for hierarchical classification of biological data (enzymes)"
– Allen Chan (MRes student)
   Supervised by Alex
   Doing research on an ACO algorithm for classification of biological data (a multi-label classification problem)
Introduction to Classification
Each record (example) belongs to a predefined class
Each example consists of two parts:
– the values of its predictor attributes
– its class
Goal: to predict the class of an example, based on the values of the predictor attributes for that example
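A minimal sketch of this setup, assuming each example is stored as a mapping from predictor attribute names to values plus a class label. The `Example` type, the attribute names, the toy data and the `smoker_rule` classifier are all hypothetical; predictive accuracy is taken here to be the fraction of examples whose class is predicted correctly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    """One record: predictor attribute values plus its class."""
    attributes: Dict[str, str]
    label: str


def accuracy(classifier: Callable[[Dict[str, str]], str], examples: List[Example]) -> float:
    """Fraction of examples whose class is predicted correctly from the predictors alone."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(classifier(ex.attributes) == ex.label for ex in examples)
    return correct / len(examples)


# Hypothetical toy data: attribute names and classes are illustrative only.
data = [
    Example({"Gender": "F", "Smoker": "no"}, "low_risk"),
    Example({"Gender": "M", "Smoker": "yes"}, "high_risk"),
    Example({"Gender": "F", "Smoker": "yes"}, "high_risk"),
    Example({"Gender": "M", "Smoker": "no"}, "high_risk"),
]


def smoker_rule(attrs: Dict[str, str]) -> str:
    """A hand-written classifier that looks at the Smoker attribute only."""
    return "high_risk" if attrs["Smoker"] == "yes" else "low_risk"


print(accuracy(smoker_rule, data))  # 0.75 (the last example is misclassified)
```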
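Hierarchical classification (1)
Hierarchical classes: Enzyme Commission (EC) codes have 4 levels, written as four dot-separated numbers (one number per level)
Example class hierarchy, from the most general classes (top) to the most specific classes (bottom):
root
– 1
   – 1.1, 1.2, 1.3
– 2
   – 2.1, 2.2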
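One way to work with this hierarchy in code is to treat a class code as a dot-separated path from the most general class to the most specific one. The sketch below (function names hypothetical) recovers the class at every level of a code and builds the parent-to-children map that a top-down classifier needs; the toy codes mirror the two-level tree on the slide, whereas real EC codes have four levels.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def levels(code: str) -> List[str]:
    """Class at each level, most general first: '1.2.3.4' -> ['1', '1.2', '1.2.3', '1.2.3.4']."""
    parts = code.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def build_children(codes: Iterable[str]) -> Dict[str, Set[str]]:
    """Map 'root' and every internal class to the set of its direct child classes."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for code in codes:
        parent = "root"
        for node in levels(code):
            children[parent].add(node)
            parent = node
    return dict(children)  # leaf classes never appear as keys


tree = build_children(["1.1", "1.2", "1.3", "2.1", "2.2"])
print(sorted(tree["root"]))  # ['1', '2']
print(sorted(tree["1"]))     # ['1.1', '1.2', '1.3']
```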
Hierarchical classification (2)
Challenges
– Several predictions must be made for each example: one predicted class at each level of the hierarchy
– As we go down the hierarchy, there are fewer examples (records) per class ("data fragmentation")
Opportunities
– Information on "class similarities" in the hierarchy
– Top-down approach: first predict the top-level class, then predict the second-level class among the children of the predicted top-level class, etc., until a leaf class is predicted
– The cost of misclassifying 1.1 as 1.2 is smaller than the cost of misclassifying 1.1 as 2.1
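The slides do not say how misclassification costs would be defined. One simple hierarchy-aware choice, shown here purely as an assumption, is to count how many levels lie below the deepest class that the true and predicted codes share; this reproduces the ordering stated above (1.1 predicted as 1.2 costs less than 1.1 predicted as 2.1). The function names are hypothetical.

```python
def shared_levels(a: str, b: str) -> int:
    """Number of leading levels two dot-separated class codes have in common."""
    n = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        n += 1
    return n


def misclassification_cost(true_class: str, predicted: str) -> int:
    """Hypothetical cost: levels below the deepest shared ancestor (0 = correct)."""
    depth = len(true_class.split("."))
    if len(predicted.split(".")) != depth:
        raise ValueError("both codes must be at the same level")
    return depth - shared_levels(true_class, predicted)


print(misclassification_cost("1.1", "1.2"))  # 1: siblings under class 1
print(misclassification_cost("1.1", "2.1"))  # 2: different top-level classes
```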
A hybrid PSO/ACO algorithm – basic ideas
Each particle represents a candidate classification rule
– Continuous (real-valued) attributes: standard PSO
– Categorical (nominal) attributes: special treatment, e.g. Gender: "F" or "M" (unordered values)
Each categorical attribute is represented by a "pheromone vector", with one element for each attribute value plus one element for "not used in rule", e.g.:
   F: 0.6   M: 0.1   "off" (not used in rule): 0.3
General motivation: ACO algorithms, using pheromone, cope well with discrete data
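For the continuous attributes, "standard PSO" usually refers to the inertia-weight velocity update, in which each particle is pulled towards its own previous best position and towards its neighbourhood's best. The sketch below assumes that form with commonly used coefficient values (w = 0.729, c1 = c2 = 1.49445), which are not taken from the slides; `pso_step` and `minimise` are illustrative names, and the demo uses a global-best neighbourhood on a toy function rather than rule quality.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]

# Assumed, commonly used coefficients (not from the slides).
W, C1, C2 = 0.729, 1.49445, 1.49445


def pso_step(x: Vector, v: Vector, pbest: Vector, lbest: Vector,
             rng: random.Random) -> Tuple[Vector, Vector]:
    """One standard PSO move: new velocity per dimension, then new position."""
    new_v = []
    for xi, vi, pi, li in zip(x, v, pbest, lbest):
        r1, r2 = rng.random(), rng.random()
        new_v.append(W * vi + C1 * r1 * (pi - xi) + C2 * r2 * (li - xi))
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v


def minimise(f: Callable[[Vector], float], dim: int, n_particles: int = 20,
             iters: int = 200, seed: int = 0) -> Tuple[Vector, float]:
    """Minimise f; every particle's 'local best' is the swarm best (global-best topology)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbests = [x[:] for x in xs]
    pvals = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pvals[i])
    for _ in range(iters):
        for i in range(n_particles):
            xs[i], vs[i] = pso_step(xs[i], vs[i], pbests[i], pbests[g], rng)
            val = f(xs[i])
            if val < pvals[i]:  # update previous best, and swarm best if improved
                pbests[i], pvals[i] = xs[i][:], val
                if val < pvals[g]:
                    g = i
    return pbests[g], pvals[g]


best_x, best_val = minimise(lambda x: sum(t * t for t in x), dim=2)
print(best_x, best_val)
```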
A hybrid PSO/ACO algorithm for predicting hierarchical enzyme classes (1)
Class attribute: the EC code (4 levels of classes)
Predictor attributes: Prosite patterns (motifs)
A particle represents a classification rule, with one pheromone vector per pattern:
   pattern 1: yes 0.3, no 0.1, off 0.6
   ...
   pattern n: yes 0.8, no 0.1, off 0.1
   class: the EC code predicted by the rule
The particle is "decoded" into a rule by choosing a value ("yes", "no" or "off") for each attribute, with probability given by its pheromone vector
Pheromone values are updated based on rule quality
The particle also moves towards its previous best and its local best
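The decoding step can be sketched as roulette-wheel sampling over each pattern's pheromone vector. The slides only say that pheromone is updated from rule quality and that the particle moves towards its previous best and local best, so `reinforce` is a hypothetical update (add each best rule's quality to the value it chose, then renormalise), not necessarily the authors' rule. Treating the predicted class as fixed for the particle, and representing a protein as a list of pattern-present flags, are also assumptions.

```python
import random
from typing import List, Sequence

VALUES = ("yes", "no", "off")  # "off" means the pattern is not used in the rule


def decode(pheromones: Sequence[Sequence[float]], rng: random.Random) -> List[str]:
    """Turn a particle into a rule: one value per pattern, sampled with
    probability proportional to that pattern's pheromone vector."""
    return [rng.choices(VALUES, weights=p)[0] for p in pheromones]


def covers(rule: Sequence[str], has_pattern: Sequence[bool]) -> bool:
    """A rule covers a protein if every condition it uses matches the protein's motifs."""
    for value, present in zip(rule, has_pattern):
        if value == "yes" and not present:
            return False
        if value == "no" and present:
            return False
    return True  # "off" conditions are ignored


def reinforce(p: Sequence[float], pbest_value: str, pbest_quality: float,
              lbest_value: str, lbest_quality: float) -> List[float]:
    """Hypothetical update: move towards the values chosen by the previous-best
    and local-best rules, weighted by their quality, then renormalise to sum 1."""
    new = list(p)
    new[VALUES.index(pbest_value)] += pbest_quality
    new[VALUES.index(lbest_value)] += lbest_quality
    total = sum(new)
    return [x / total for x in new]


# The two pheromone vectors shown on the slide (pattern 1 and pattern n).
particle = [[0.3, 0.1, 0.6], [0.8, 0.1, 0.1]]
rule = decode(particle, random.Random(1))
print(rule, covers(rule, [True, True]))
print(reinforce(particle[0], "yes", 0.5, "yes", 0.5))  # roughly [0.65, 0.05, 0.3]
```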
A hybrid PSO/ACO algorithm for predicting hierarchical enzyme classes (2)
The algorithm follows a top-down (greedy) approach:
– first discover rules predicting the 1st-level class, then discover rules predicting the 2nd-level class, etc.
– this sequential procedure is used in both training and testing
Preliminary results (varying some parameters):
– Predictive accuracy at level 1 (6 classes): 94.9-96.7%
– Predictive accuracy at level 2 (51 classes): 72.3-90.3%
Current/future work:
– Prediction of levels 3 and 4 of the EC code; other data sets
– Consider different misclassification costs
– Develop a less greedy method for top-down classification (allowing recovery from errors at higher levels)
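At test time, the top-down procedure can be sketched as a chain of local classifiers, one per internal node of the hierarchy, where each classifier only chooses among the children of the class predicted one level up. The classifier interface and the toy decision functions below are hypothetical stand-ins for the discovered rule sets; `level_accuracy` shows how an error at a higher level is carried into every level below it, which is what a less greedy method would try to avoid.

```python
from typing import Callable, List, Mapping, Sequence

Features = Sequence[bool]
# Hypothetical interface: for each parent class, a function that picks one of its children.
LocalClassifier = Callable[[Features], str]


def predict_top_down(x: Features, classifiers: Mapping[str, LocalClassifier],
                     max_depth: int) -> List[str]:
    """Greedy top-down prediction: one class per level, each chosen only
    among the children of the class predicted at the level above."""
    path: List[str] = []
    node = "root"
    for _ in range(max_depth):
        if node not in classifiers:  # reached a leaf class
            break
        node = classifiers[node](x)
        path.append(node)
    return path


def level_accuracy(paths: Sequence[List[str]], truths: Sequence[List[str]], level: int) -> float:
    """Accuracy at one level (0-based); a wrong parent makes the child wrong too."""
    hits = sum(len(p) > level and p[level] == t[level] for p, t in zip(paths, truths))
    return hits / len(truths)


# Toy two-level hierarchy: root -> {1, 2}, 1 -> {1.1, 1.2}, 2 -> {2.1, 2.2}.
toy = {
    "root": lambda x: "1" if x[0] else "2",
    "1": lambda x: "1.1" if x[1] else "1.2",
    "2": lambda x: "2.1" if x[1] else "2.2",
}
print(predict_top_down([True, False], toy, max_depth=4))  # ['1', '1.2']
```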
Force-based particle swarms
Drawing inspiration from physics
– In particular, ways of simulating fluid flow
The idea is to control the flow of particles by assigning forces between particle types, then letting the process run to completion.
We can use different force types:
– Electromagnetic forces
– Gravitational forces
– Linear distance-based forces
– Lennard-Jones potential
– ...
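Each force type can be written as a force law: a function from the distance between two particles to a signed force magnitude. The sketch assumes 2-D particles and the convention that positive values repel and negative values attract; the constants are arbitrary illustrative defaults, and the function names are hypothetical. The Lennard-Jones entry uses the force F(r) = -dV/dr derived from the standard potential V(r) = 4ε[(σ/r)^12 - (σ/r)^6].

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
ForceLaw = Callable[[float], float]  # distance -> signed magnitude (>0 repels, <0 attracts)


def gravitational(g: float = 1.0) -> ForceLaw:
    """Attractive inverse-square force."""
    return lambda r: -g / (r * r)


def linear(k: float = 1.0) -> ForceLaw:
    """Attraction growing linearly with distance (a spring with zero rest length)."""
    return lambda r: -k * r


def lennard_jones(epsilon: float = 1.0, sigma: float = 1.0) -> ForceLaw:
    """Strongly repulsive at short range, weakly attractive further out;
    zero at r = 2**(1/6) * sigma."""
    def force(r: float) -> float:
        s6 = (sigma / r) ** 6
        return 24.0 * epsilon / r * (2.0 * s6 * s6 - s6)
    return force


def net_force(p: Point, others: List[Point], law: ForceLaw) -> Point:
    """Sum of the pairwise forces that `others` exert on the particle at p."""
    fx = fy = 0.0
    for q in others:
        dx, dy = p[0] - q[0], p[1] - q[1]
        r = math.hypot(dx, dy)
        if r == 0.0:
            continue  # coincident particles: direction undefined, skip
        m = law(r)
        fx += m * dx / r  # (dx, dy) / r points from q towards p
        fy += m * dy / r
    return fx, fy


print(net_force((0.0, 0.0), [(1.0, 0.0), (0.0, 3.0)], lennard_jones()))
```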
Force-based programming language
One idea is to create a force-based programming language: we express the problem by specifying the forces that act between pairs of particle types.
Example: clustering
– Create fixed particles for the data
– Create k classes of particles for the cluster-markers
– Rules:
   All cluster-markers repel at close range
   Cluster-markers of different types always repel
   Cluster-markers are attracted to data
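A small simulation of the clustering example, under several assumptions not stated in the slides: 2-D data, one marker per marker type (k = 2), a Gaussian fall-off on the attraction to data so that each marker settles on a nearby dense region, a linear short-range repulsion between all markers, an inverse-square repulsion between markers of different types, and simple damped position updates instead of full dynamics. All names and constants are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def attraction(x: Point, data: Sequence[Point], width: float) -> Point:
    """Markers are attracted to data; the Gaussian weight (an assumption)
    makes each marker respond mostly to nearby data points."""
    fx = fy = 0.0
    for d in data:
        dx, dy = d[0] - x[0], d[1] - x[1]
        w = math.exp(-(dx * dx + dy * dy) / (2 * width * width))
        fx += w * dx
        fy += w * dy
    return fx, fy


def repulsion(x: Point, y: Point, same_type: bool, close_range: float,
              strength: float) -> Point:
    """Force on marker x from marker y: all markers repel inside close_range;
    markers of different types also repel at every distance."""
    dx, dy = x[0] - y[0], x[1] - y[1]
    r = math.hypot(dx, dy)
    if r == 0.0:
        return 0.0, 0.0
    m = max(0.0, close_range - r)  # short-range push, any types
    if not same_type:
        m += strength / (r * r)  # long-range push between different types
    return m * dx / r, m * dy / r


def simulate(data: Sequence[Point], markers: Sequence[Point], types: Sequence[int],
             steps: int = 300, eta: float = 0.1, width: float = 1.0,
             close_range: float = 0.5, strength: float = 0.5) -> List[Point]:
    """Run the system towards equilibrium with damped steps: position += eta * net force."""
    pos = list(markers)
    for _ in range(steps):
        new_pos = []
        for i, x in enumerate(pos):
            fx, fy = attraction(x, data, width)
            for j, y in enumerate(pos):
                if i != j:
                    rx, ry = repulsion(x, y, types[i] == types[j], close_range, strength)
                    fx += rx
                    fy += ry
            new_pos.append((x[0] + eta * fx, x[1] + eta * fy))
        pos = new_pos  # all markers move simultaneously
    return pos


data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
        (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2)]
final = simulate(data, markers=[(1.0, 1.0), (4.0, 4.0)], types=[0, 1])
print(final)
```

With two well-separated groups of data points, each marker should settle close to the centre of the group it starts nearest to; here the pairwise force rules alone, rather than an explicit assignment step as in k-means, produce the clustering.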
Applications
Currently applying this to classification algorithms in bioinformatics.
Data points will be fixed in the space.
Particle attraction/repulsion will be learned using a GA/GP-type strategy, which will learn:
– The forces that apply between particle types
– The shapes of the possible force profiles