Soft Computing Methods
J.A. Johnson, Dept. of Math and Computer Science
Seminar Series, February 8, 2013
Outline
◦ Fuzzy Sets
◦ Neural Nets
◦ Rough Sets
◦ Bayesian Nets
◦ Genetic Algorithms
Fuzzy sets
◦ Fuzzy set theory is a means of specifying how well an object satisfies a vague description.
◦ A fuzzy set can be defined as a set with fuzzy boundaries.
◦ Fuzzy sets were first introduced by Zadeh (1965).
To specify a fuzzy set, the membership function must first be determined.
Example: consider the proposition "Nate is tall." Is the proposition true if Nate is 5'10"? The linguistic term "tall" does not refer to a sharp demarcation of objects into two classes; there are degrees of tallness.
Fuzzy set theory treats Tall as a fuzzy predicate and says that the truth value of Tall(Nate) is a number between 0 and 1, rather than being either true or false.
Let A denote the fuzzy set of all tall employees and x a member of the universe X of all employees. What would the membership function μA(x) look like?
μA(x) = 1 if x is definitely tall
μA(x) = 0 if x is definitely not tall
0 < μA(x) < 1 for borderline cases
[Figure: membership in a classical (crisp) set vs. a fuzzy set]
Standard fuzzy set operations
Complement: cA(x) = 1 − A(x)
Intersection: (A ∩ B)(x) = min[A(x), B(x)]
Union: (A ∪ B)(x) = max[A(x), B(x)]
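These operations translate directly into code; a minimal sketch, with made-up membership values for illustration:

```python
# Sketch of the standard fuzzy set operations on membership values in [0, 1].

def complement(mu_a: float) -> float:
    """cA(x) = 1 - A(x)"""
    return 1.0 - mu_a

def intersection(mu_a: float, mu_b: float) -> float:
    """(A ∩ B)(x) = min[A(x), B(x)]"""
    return min(mu_a, mu_b)

def union(mu_a: float, mu_b: float) -> float:
    """(A ∪ B)(x) = max[A(x), B(x)]"""
    return max(mu_a, mu_b)

# Example: an employee who is tall to degree 0.7 and heavy to degree 0.4
tall, heavy = 0.7, 0.4
print(complement(tall))           # 0.3
print(intersection(tall, heavy))  # 0.4
print(union(tall, heavy))         # 0.7
```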
The range of possible values of a linguistic variable represents the universe of discourse of that variable. A linguistic variable carries with it the concept of fuzzy set qualifiers, called hedges. Hedges are terms that modify the shape of fuzzy sets.
For instance, the qualifier "very" performs concentration, creating a new subset that shrinks the original (e.g., very, extremely). The opposite operation is dilation, which expands the set (e.g., more or less, somewhat).
[Table: common hedges, their mathematical expressions, and graphical representations]
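A sketch using the textbook definitions of these two hedges (concentration squares the membership value, dilation takes its square root); the example membership value is an illustrative assumption:

```python
# Sketch of the two standard hedge operations on a membership value.

def very(mu: float) -> float:
    """Concentration: squaring shrinks the set."""
    return mu ** 2

def more_or_less(mu: float) -> float:
    """Dilation: the square root expands the set."""
    return mu ** 0.5

mu_tall = 0.8
print(very(mu_tall))          # 0.64   -> "very tall" holds to a lesser degree
print(more_or_less(mu_tall))  # ~0.894 -> "more or less tall" holds to a greater degree
```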
Fuzzy logic is not logic that is fuzzy, but logic that is used to describe fuzziness. Fuzzy logic deals with degrees of truth.
1. Specify the problem and define linguistic variables.
2. Determine fuzzy sets.
3. Elicit and construct fuzzy rules.
4. Perform fuzzy inference.
5. Evaluate and tune the system.
Fuzzy inference
Rule base: a set of fuzzy if-then rules.
Database (or dictionary): defines the membership functions used in the fuzzy rules.
Reasoning mechanism: performs the inference procedure (deriving a conclusion from facts and rules).
Defuzzification: extraction of a crisp value that best represents a fuzzy set.
Architecture of a fuzzy expert system: crisp input passes through a fuzzification interface to become fuzzy input; an inference engine applies the rules of the fuzzy rule base to produce fuzzy output; a defuzzification interface converts the fuzzy output into crisp output.
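A minimal, self-contained sketch of one pass through this pipeline, using a single invented rule "IF temperature IS hot THEN fan_speed IS high"; all membership functions and the input value are illustrative assumptions:

```python
# One pass: fuzzification -> rule inference -> centroid defuzzification.

def trapezoid(x, a, b, c, d):
    """Generic trapezoidal membership function, rising on [a, b] and
    falling on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Fuzzification: crisp input -> degree of membership in "hot"
temperature = 28.0
mu_hot = trapezoid(temperature, 20, 30, 40, 45)

# Inference: the rule's firing strength clips the consequent set "high"
def mu_high(y):
    return trapezoid(y, 50, 80, 100, 101)

def clipped(y):
    return min(mu_hot, mu_high(y))

# Defuzzification: centroid of the clipped output set, sampled on a grid
ys = [y / 10 for y in range(0, 1010)]
fan_speed = sum(y * clipped(y) for y in ys) / sum(clipped(y) for y in ys)
print(round(fan_speed, 1))  # crisp output
```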
Purpose: estimate the age group of agricultural laborers at maximum risk of cardiovascular problems due to chemical pollution.
Eight symptoms: S1 - chest pain, S2 - pain at the ribs' sides, S3 - back pain, S4 - shoulder pain, S5 - left arm and leg pain, S6 - swollen limbs, S7 - burning chest, and S8 - blood pressure (low or high B.P.).
To obtain an unbiased, uniform effect on the data so collected, the initial matrix is transformed into an Average Time Dependent Data (ATD) matrix by dividing each entry by the width of the respective class interval.
To simplify the calculations in the third stage, the average time dependent data matrix is converted to a matrix with entries eij ∈ {−1, 0, 1}. This stage requires the following statistical parameters:
Using the average μj, the standard deviation σj, and a parameter α from the interval [0, 1], a fuzzy matrix called the Refined Time Dependent Data (RTD) matrix, with entries eij ∈ {−1, 0, 1}, was formed using the following rule [1]:

if aij ≤ μj − α·σj then eij = −1
else if aij ∈ (μj − α·σj, μj + α·σj) then eij = 0
else if aij ≥ μj + α·σj then eij = 1

where the aij are the entries of the Average Time Dependent Data matrix. By varying the parameter α, any number of Refined Time Dependent Data matrices can be obtained.
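A sketch of this transformation in code; the data matrix and the value of α are made-up, and column means and standard deviations stand in for μj and σj:

```python
import numpy as np

def rtd_matrix(atd: np.ndarray, alpha: float) -> np.ndarray:
    """Map each ATD entry to -1, 0, or 1 relative to its column's
    mean +/- alpha * standard deviation."""
    mu = atd.mean(axis=0)       # column averages
    sigma = atd.std(axis=0)     # column standard deviations
    lo, hi = mu - alpha * sigma, mu + alpha * sigma
    rtd = np.zeros_like(atd, dtype=int)  # entries in the open interval stay 0
    rtd[atd <= lo] = -1
    rtd[atd >= hi] = 1
    return rtd

atd = np.array([[0.4, 1.2, 0.1],
                [0.9, 0.3, 0.8],
                [0.5, 0.7, 0.4]])
print(rtd_matrix(atd, alpha=0.5))
```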
Three such matrices, obtained for different values of α, were as follows. [Figures: three RTD matrices shown in the original slides]
Combining these three matrices gives the Combined Effect Time Dependent Data (CETD) matrix, which captures the cumulative effect of all the symptoms.
Software packages available to help with the development of fuzzy systems include the MATLAB Fuzzy Logic Toolbox, Mathematica Fuzzy Logic, SieFuzzy, fuzzyTECH, TILShell, FIDE, RT/Fuzzy, Fuzzy Knowledge Builder, and Fuzz-C. These packages provide user-friendly graphical interfaces that make the development process simple and efficient.
Limitations of fuzzy logic
◦ Verification and validation of a fuzzy knowledge-based system typically requires extensive testing on expensive hardware.
◦ Fuzzy systems cannot learn.
◦ Determining fuzzy rules and membership functions is a hard task.
◦ One cannot predict how many membership functions are required.
Examples of fuzzy expert systems
Example 1: Classifying Houses
Example 2: Representing Age
Example 3: Finding the Disjunctive Sum
Example 4: Natural Numbers
Example 5: Fuzzy Hedges
Example 6: Distance Relation
Example 7: Choosing a Job
Example 8: Digital Fuzzy Sets
Example 9: Image Processing
Further examples: http://www.ingentaconnect.com/search?option1=tka&value1=Fuzzy+expert+system
References
[1] Michael Negnevitsky, Artificial Intelligence: A Guide to Intelligent Systems, 2nd Edition.
[2] Witold Pedrycz and Fernando Gomide, An Introduction to Fuzzy Sets.
[3] George J. Klir and Bo Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications.
[4] W. B. Vasantha Kandasamy, Elementary Fuzzy Matrix Theory and Fuzzy Models for Social Scientists.
[5] Wikipedia: http://en.wikipedia.org/wiki/Fuzzy_logic
[6] Wikipedia: http://en.wikipedia.org/wiki/Fuzzy
References
http://www.softcomputing.net/fuzzy_chapter.pdf
http://www.cs.cmu.edu/Groups/AI/html/faqs/ai/fuzzy/part1/faq-doc-18.html
http://www.mv.helsinki.fi/home/niskanen/zimmermann_review.pdf
http://sawaal.ibibo.com/computers-and-technology/what-limits-fuzzy-logic-241157.html
http://my.safaribooksonline.com/book/software-engineering-and-development/9780763776473/fuzzy-logic/limitations_of_fuzzy_systems
Thanks to Ding Xu and Edwige Nounang Ngnadjo for help with researching content and preparation of overheads on Fuzzy Sets.
Artificial Neural Networks
Neuron: the basic information-processing unit of a neural network.
Single neural network [figures in the original slides]
Activation functions
The step and sign activation functions, also called hard limit functions, are mostly used in decision-making neurons.
The sigmoid function transforms an input, which can have any value between plus and minus infinity, into a reasonable value in the range between 0 and 1. Neurons with this function are used in back-propagation networks.
The linear activation function provides an output equal to the neuron's weighted input. Neurons with the linear function are often used for linear approximation.
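A minimal sketch of the four activation functions; a threshold of 0 is assumed for the two hard-limit functions:

```python
import math

def step(x: float) -> float:
    """Hard limit: 1 if the input reaches the threshold (0 here), else 0."""
    return 1.0 if x >= 0 else 0.0

def sign(x: float) -> float:
    """Hard limit returning +1 or -1."""
    return 1.0 if x >= 0 else -1.0

def sigmoid(x: float) -> float:
    """Squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def linear(x: float) -> float:
    """Output equals the weighted input."""
    return x

for f in (step, sign, sigmoid, linear):
    print(f.__name__, f(-2.0), f(0.5))
```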
How the machine learns: the perceptron (neuron + weight training).
The training algorithm for a single neuron
Step 1: Initialization. Set the initial weights w1, w2, ..., wn and the threshold to random numbers in the range [−0.5, 0.5].
Step 2: Activation. Compute the output for the current training example.
Step 3: Weight training. Update the weights in proportion to the error.
Step 4: Iteration. Increase iteration p by one, go back to Step 2, and repeat the process until convergence.
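A runnable sketch of these four steps, training a perceptron on the logical AND function; the learning rate, training data, and step activation are illustrative choices:

```python
import random

def train_perceptron(data, lr=0.1, max_epochs=100):
    n = len(data[0][0])
    # Step 1: initialization in [-0.5, 0.5]
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    theta = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        errors = 0
        for x, desired in data:
            # Step 2: activation (step function against threshold theta)
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else 0
            # Step 3: weight training (delta rule)
            e = desired - y
            if e != 0:
                errors += 1
                w = [wi + lr * e * xi for wi, xi in zip(w, x)]
                theta -= lr * e  # threshold treated as a weight on input -1
        # Step 4: iterate until an epoch passes with no errors (convergence)
        if errors == 0:
            break
    return w, theta

and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print(train_perceptron(and_data))
```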
How the machine learns [figure in the original slides]
The design of my program [figure in the original slides]
The result [figure in the original slides]
Problem [figure in the original slides]
Multilayer neural network [figure in the original slides]
References
1. http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html
2. Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 2009.
3. http://www.roguewave.com/Portals/0/products/imsl-numerical-libraries/c-library/docs/6.0/stat/default.htm?turl=multilayerfeedforwardneuralnetworks.htm
4. Lynne E. Parker, Notes on Multilayer, Feedforward Neural Networks.
5. http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html
Thanks to Hongming (Homer) Zuo and Danni Ren for help with researching content and preparation of overheads on Neural Nets.
Rough Sets
Introduced by Zdzisław Pawlak in the early 1980s. A formal framework for the automated transformation of data into knowledge. Simplifies the search for dominant attributes in an inconsistent information table, leading to the derivation of shorter if-then rules.
Inconsistent Information Table

Example  Headache  Temperature  Flu (decision)
e1       yes       normal       no
e2       yes       high         yes
e3       yes       very_high    yes
e4       no        normal       no
e5       no        high         no
e6       no        very_high    yes
e7       no        high         yes
e8       no        very_high    no
Certain rules for the examples are:
(Temperature, normal) → (Flu, no)
(Headache, yes) and (Temperature, high) → (Flu, yes)
(Headache, yes) and (Temperature, very_high) → (Flu, yes)
Uncertain (or possible) rules are:
(Headache, no) → (Flu, no)
(Temperature, high) → (Flu, yes)
(Temperature, very_high) → (Flu, yes)
Strength of a Rule: Weights
◦ Coverage = (# elements covered by the rule) / (# elements in the universe)
◦ Support = (# positive elements covered by the rule) / (# elements in the universe)
◦ Degree of certainty = (support / coverage) × 100
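As a worked example, consider the uncertain rule (Temperature, high) → (Flu, yes) over the eight examples above:

```python
# Strength of the rule (Temperature, high) -> (Flu, yes).

universe = 8    # e1 .. e8
covered = 3     # e2, e5, e7 have Temperature = high
positive = 2    # of those, e2 and e7 have Flu = yes

coverage = covered / universe           # 0.375
support = positive / universe           # 0.25
certainty = support / coverage * 100    # ~66.7% certain
print(coverage, support, certainty)
```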
Attribute Reduction
Which are the dominant attributes? How do we determine redundant attributes?
Indiscernibility Classes
An indiscernibility class, with respect to a set of attributes X, is defined as a set of examples all of whose values agree on every attribute x ∈ X. For example, the indiscernibility classes with respect to the attributes X = {Headache, Temperature} are {e1}, {e2}, {e3}, {e4}, {e5, e7}, and {e6, e8}.
A rough set is defined by a lower approximation and an upper approximation. Writing [x] for the indiscernibility class of x, the lower approximation of a set X is the union of all classes contained in X, and the upper approximation is the union of all classes that intersect X:

lower(X) = ∪ { [x] : [x] ⊆ X }
upper(X) = ∪ { [x] : [x] ∩ X ≠ ∅ }
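A sketch computing the indiscernibility classes and both approximations for the flu table above, taking X to be the set of examples with Flu = yes:

```python
from collections import defaultdict

table = {
    "e1": {"Headache": "yes", "Temperature": "normal",    "Flu": "no"},
    "e2": {"Headache": "yes", "Temperature": "high",      "Flu": "yes"},
    "e3": {"Headache": "yes", "Temperature": "very_high", "Flu": "yes"},
    "e4": {"Headache": "no",  "Temperature": "normal",    "Flu": "no"},
    "e5": {"Headache": "no",  "Temperature": "high",      "Flu": "no"},
    "e6": {"Headache": "no",  "Temperature": "very_high", "Flu": "yes"},
    "e7": {"Headache": "no",  "Temperature": "high",      "Flu": "yes"},
    "e8": {"Headache": "no",  "Temperature": "very_high", "Flu": "no"},
}

def indiscernibility_classes(attrs):
    """Group examples whose values agree on every attribute in attrs."""
    groups = defaultdict(set)
    for name, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(name)
    return list(groups.values())

classes = indiscernibility_classes(["Headache", "Temperature"])
flu_yes = {e for e, row in table.items() if row["Flu"] == "yes"}

lower = set().union(*(c for c in classes if c <= flu_yes))
upper = set().union(*(c for c in classes if c & flu_yes))
print(classes)  # {e1} {e2} {e3} {e4} {e5, e7} {e6, e8}
print(lower)    # {e2, e3}
print(upper)    # {e2, e3, e5, e6, e7, e8}
```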
[Figure: lower and upper approximations of set X. The classes {e2} and {e3} lie inside the lower approximation; {e5, e7} and {e6, e8} fall in the boundary region; e1 and e4 lie outside the upper approximation.]
If the indiscernibility classes with and without attribute A are identical, then attribute A is redundant.
Inconsistent Information Table (extended with a Pain attribute)

Example  Headache  Temperature  Pain  Flu (decision)
e1       yes       high         yes   yes
e2       yes       high         yes   yes
e3       yes       very_high    yes   yes
e4       yes       very_high    yes   yes
e5       no        high         yes   no
e6       no        very_high    no    yes
e7       no        high         yes   yes
e8       no        very_high    no    no
e9       no        high         no    yes
e10      no        high         yes   no
e11      no        very_high    no    no
e12      no        high         no    no
e13      no        high         yes   no
e14      no        very_high    no    no
e15      no        high         no    no
e16      no        high         no    no
Set X [figure in the original slides: lower and upper approximations for the extended table]
Example: Identifying Edible Mushrooms with the ILA Algorithm
Mushroom Dataset
The dataset contains 8124 entries of different mushrooms. Each entry (mushroom) has 22 different attributes.
The 22 attributes: cap-shape, cap-surface, cap-color, bruises, odor, gill-attachment, gill-spacing, gill-size, gill-color, stalk-shape, stalk-root, stalk-surface-above-ring, stalk-surface-below-ring, stalk-color-above-ring, stalk-color-below-ring, veil-type, veil-color, ring-number, ring-type, spore-print-color, population, habitat.
Values for attributes: one of the attributes chosen is odor; all its possible values are almond, anise, creosote, fishy, foul, musty, none, pungent, and spicy.
Example of the dataset (each row is one mushroom; the first field is the class, e = edible, p = poisonous, followed by the 22 attribute codes):
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
The ILA Algorithm
◦ Invented by Mehmet R. Tolun and Saleh M. Abu-Soud.
◦ Used for data mining.
◦ Runs in a stepwise forward iteration.
◦ Searches for a description that covers a relatively large number of examples.
◦ Outputs IF-THEN rules.
General requirements:
◦ Examples are listed in tabular form, where each row corresponds to an example and each column contains attribute values.
◦ A set of m training examples, each composed of k attributes and a class attribute with n possible decisions.
◦ A rule set R with an initial value of ∅.
◦ All rows in the table are initially unmarked.
ILA Algorithm Steps
Step 1: Partition the table containing m examples into n sub-tables, one for each possible value of the class attribute. (Steps 2 through 8 are repeated for each sub-table.)
Step 2: Initialize the attribute combination count j to 1.
Step 3: For the sub-table under consideration, divide the attribute list into distinct combinations, each with j distinct attributes.
Step 4: For each combination of attributes, count the number of occurrences of attribute values that appear under that combination in unmarked rows of the sub-table under consideration but do not appear under the same combination in any other sub-table. Call the first combination with the maximum number of occurrences the max-combination.
Step 5: If max-combination = ∅, increase j by 1 and go to Step 3.
Step 6: Mark all rows of the sub-table under consideration in which the values of max-combination appear as classified.
Step 7: Add a rule to R whose left-hand side comprises the attribute names of max-combination with their values, separated by AND operator(s), and whose right-hand side contains the decision attribute value associated with the sub-table.
Step 8: If all rows are marked as classified, move on to the next sub-table and go to Step 3. Otherwise (if there are still unmarked rows) go to Step 4. If no sub-tables remain, exit with the set of rules obtained so far. A simplified sketch of these steps appears below.
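The following is a simplified sketch of ILA over a toy dataset; the attribute names and values are invented, and tie-breaking and efficiency details of the published algorithm are omitted:

```python
from itertools import combinations

def ila(examples, attrs, class_attr):
    rules = []
    for cls in {ex[class_attr] for ex in examples}:   # Step 1: one sub-table per class
        sub = [ex for ex in examples if ex[class_attr] == cls]
        others = [ex for ex in examples if ex[class_attr] != cls]
        unmarked = list(sub)
        j = 1                                          # Step 2
        while unmarked and j <= len(attrs):
            best, best_rows = None, []
            for combo in combinations(attrs, j):       # Step 3
                counts = {}
                for ex in unmarked:                    # Step 4
                    key = tuple(ex[a] for a in combo)
                    # the value combination must not occur in other sub-tables
                    if not any(tuple(o[a] for a in combo) == key for o in others):
                        counts.setdefault(key, []).append(ex)
                for key, rows in counts.items():
                    if len(rows) > len(best_rows):
                        best, best_rows = (combo, key), rows
            if best is None:
                j += 1                                 # Step 5
                continue
            combo, key = best
            unmarked = [ex for ex in unmarked if ex not in best_rows]  # Step 6
            rules.append((dict(zip(combo, key)), cls))                 # Step 7
        # Step 8: all rows classified -> move on to the next sub-table
    return rules

data = [
    {"odor": "almond", "color": "white", "class": "edible"},
    {"odor": "anise",  "color": "brown", "class": "edible"},
    {"odor": "foul",   "color": "white", "class": "poisonous"},
    {"odor": "musty",  "color": "brown", "class": "poisonous"},
]
for cond, cls in ila(data, ["odor", "color"], "class"):
    print("IF", cond, "THEN", cls)
```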
Output of the ILA algorithm: 25 rules (first 12 shown), with true positives (TP), false negatives (FN), and error rate

No.  Rule                                                        TP   FN  Error
1    If stalk-color-above-ring = gray then edible.               576  0   0.0
2    If odor = almond then edible.                               400  0   0.0
3    If odor = anise then edible.                                400  0   0.0
4    If population = abundant then edible.                       384  0   0.0
5    If stalk-color-below-ring = gray then edible.               384  0   0.0
6    If habitat = waste then edible.                             192  0   0.0
7    If stalk-color-above-ring = orange then edible.             192  0   0.0
8    If population = numerous then edible.                       144  0   0.0
9    If ring-type = flaring then edible.                         48   0   0.0
10   If cap-shape = sunken then edible.                          32   0   0.0
11   If spore-print-color = black and odor = none then edible.   608  0   0.0
12   If spore-print-color = brown and odor = none then edible.   608  0   0.0
Output of the ILA algorithm: remaining 13 rules

No.  Rule                                                                         TP    FN  Error
13   If stalk-color-below-ring = brown and gill-spacing = crowded then edible.    48    0   0.0
14   If spore-print-color = white and ring-number = two then edible.              192   0   0.0
15   If odor = foul then poisonous.                                               2160  0   0.0
16   If gill-color = buff then poisonous.                                         1152  0   0.0
17   If odor = pungent then poisonous.                                            256   0   0.0
18   If odor = creosote then poisonous.                                           192   0   0.0
19   If spore-print-color = green then poisonous.                                 72    0   0.0
20   If odor = musty then poisonous.                                              36    0   0.0
21   If stalk-color-below-ring = yellow then poisonous.                           24    0   0.0
22   If cap-surface = grooves then poisonous.                                     4     0   0.0
23   If cap-shape = conical then poisonous.                                       1     0   0.0
24   If stalk-surface-above-ring = silky and gill-spacing = close then poisonous. 16    0   0.0
25   If population = clustered and cap-color = white then poisonous.              3     0   0.0
Introduction to Bayesian Networks
A probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). Nodes that are not connected represent variables that are conditionally independent of each other.
Each node is associated with a probability function that takes as input a particular set of values for the node's parent variables and gives the probability of the variable represented by the node. If the parents are m Boolean variables, then the probability function can be represented by a table of 2^m entries, one entry for each of the 2^m possible combinations of its parents being true or false.
Example: suppose there are two events that could cause grass to be wet: either the sprinkler is on or it is raining. Also suppose that rain has a direct effect on the use of the sprinkler (namely, when it rains, the sprinkler is usually not turned on). Then the situation can be modeled with a Bayesian network. All three variables have two possible values, T (true) and F (false).
The joint probability function is

P(G, S, R) = P(G | S, R) P(S | R) P(R)

where the names of the variables have been abbreviated to G = Grass wet, S = Sprinkler, and R = Rain.
The model can answer questions like "What is the probability that it is raining, given the grass is wet?" by using the conditional probability formula and summing over all nuisance variables:

P(R = T | G = T) = P(G = T, R = T) / P(G = T) = Σ_S P(G = T, S, R = T) / Σ_{S,R} P(G = T, S, R)
Example (continued) [Figure: numerical evaluation of P(R = T | G = T) from the network's conditional probability tables]
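A sketch of this query answered by brute-force enumeration; the conditional probability values below are the ones used in the Wikipedia example that this slide follows [11]:

```python
from itertools import product

P_R = {True: 0.2, False: 0.8}                  # P(Rain)
P_S_given_R = {True: 0.01, False: 0.4}         # P(Sprinkler = T | Rain)
P_G_given_SR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.8, (False, False): 0.0}  # P(Grass = T | S, R)

def joint(g, s, r):
    """P(G, S, R) = P(G | S, R) P(S | R) P(R)"""
    p_s = P_S_given_R[r] if s else 1 - P_S_given_R[r]
    p_g = P_G_given_SR[(s, r)] if g else 1 - P_G_given_SR[(s, r)]
    return p_g * p_s * P_R[r]

# Sum out the nuisance variable S (numerator) and S, R (denominator)
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
print(num / den)  # ~0.3577: it is raining with probability ~36%
```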
Applications: biology and bioinformatics (gene regulatory networks, protein structure, gene expression analysis), medicine, document classification, information retrieval, image processing, data fusion, decision support systems, engineering, gaming, and law.
References
[1] "Bayesian Probability Theory" in George F. Luger and William A. Stubblefield, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, Second Edition, The Benjamin/Cummings Publishing Company, Inc., ISBN 0-8053-4780-1.
[2] "Bayesian Reasoning" in Michael Negnevitsky, Artificial Intelligence: A Guide to Intelligent Systems, Third Edition, Pearson Education Limited, ISBN 978-1-4082-2574-5.
[3] "Bayesian Network": http://en.wikipedia.org/wiki/Bayesian_network
[4] "Probabilistic Graphical Model": http://en.wikipedia.org/wiki/Graphical_model
[5] "Random Variables": http://en.wikipedia.org/wiki/Random_variables
[6] "Conditional Independence": http://en.wikipedia.org/wiki/Conditional_independence
[7] "Directed Acyclic Graph": http://en.wikipedia.org/wiki/Directed_acyclic_graph
[8] "Inference": http://en.wikipedia.org/wiki/Inference
[9] "Machine Learning": http://en.wikipedia.org/wiki/Machine_learning
[10] "History" in http://en.wikipedia.org/wiki/Bayesian_network
[11] "Example" in http://en.wikipedia.org/wiki/Bayesian_network
[12] "Applications" in http://en.wikipedia.org/wiki/Bayesian_network
[13] "A simple Bayesian Network" figure: http://en.wikipedia.org/wiki/File:SimpleBayesNet.svg
[14] "Representation" in http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#repr
[15] "Conditional Independence in Bayes Nets" in http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#repr
[16] "Representation Example" figure in http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#repr
[17] "Conditional Independence" figure in http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#repr
[18] "Inference and Learning" in http://en.wikipedia.org/wiki/Bayesian_network
[19] "Decision Theory" in http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#repr
Thanks to Sheikh Shushmita Jahan for help with researching content and preparation of overheads on Bayesian Nets.
Genetic Algorithms
◦ Use random numbers to search for near-optimal solutions.
◦ Use a process similar to the theory of evolution by natural selection proposed by Charles Darwin in On the Origin of Species.
◦ Apply the same rules as natural selection in order to find near-optimal solutions.
An initial population of candidate solutions is generated, the fitness of each solution is evaluated, and the most-fit solutions are chosen to reproduce.
Candidate Solutions
◦ An array of bytes: 00010101 00111010 11110000
◦ May be converted to a string representation
Fitness Function
◦ May be an integer representation (or score).
◦ There should be a preset maximum or minimum score (to help with termination).
◦ One of the bigger challenges of designing a genetic algorithm.
Crossover
An operation analogous to biological reproduction, in which parts of parent solutions are combined to produce offspring solutions. Typically, a single crossover point is chosen and the data beyond it are swapped between the two children.
[Figure: single-point crossover]
Mutation
An operation aimed at introducing diversity into successive generations of solutions. A mutation takes an existing solution to a problem and alters it in some way before including it in the next generation.
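A sketch of single-point crossover and bit-flip mutation on bit-string candidates; the representation and mutation rate are illustrative assumptions:

```python
import random

def crossover(parent_a: str, parent_b: str) -> tuple[str, str]:
    """Swap everything beyond a randomly chosen crossover point."""
    point = random.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(solution: str, rate: float = 0.01) -> str:
    """Flip each bit independently with the given probability."""
    return "".join(b if random.random() > rate else str(1 - int(b))
                   for b in solution)

a, b = "00010101", "11110000"
child1, child2 = crossover(a, b)
print(child1, child2)
print(mutate(child1, rate=0.2))
```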
Using crossover points and mutation factors, offspring solutions are produced and added to the population. This procedure is repeated until a termination condition is reached (e.g., sufficient fitness or a time limit exceeded).
Initialization
The creation of an initial population of solutions; random bytes or strings are generated:

    import random
    size = 20                              # population size
    solutions = []
    for _ in range(size):
        value = random.getrandbits(24)     # a random 24-bit candidate
        solutions.append({"value": value, "fitness": 0})
Selection
Individual solutions are measured against the fitness function and marked for either reproduction or removal.
Selection (continued)

    for s in solutions:
        s["fitness"] = fitness_function(s["value"])
    # Keep the fittest individuals for the next generation. (The original
    # pseudocode never reset or removed the running "fittest" candidate,
    # so it would copy the same solution into every slot of the next
    # generation.)
    solutions.sort(key=lambda s: s["fitness"], reverse=True)
    solutions = solutions[:max_solutions_per_generation]
Overall Algorithm

    generate an initial population
    evaluate the fitness of each individual solution
    compute the average fitness of all solutions
    loop until a terminating condition is reached:
        select x solutions for reproduction
        combine pairs randomly (crossover)
        mutate
        evaluate fitness
        determine average fitness
Thanks to Devon Noel de Tilly and Tyler Chamberland for help with researching content and preparation of overheads on Genetic Algorithms.
Hybridization (FS/NN)
Fuzzy systems lack the capabilities of machine learning, as well as neural-network-type memory and pattern recognition; therefore, hybrid systems (e.g., neuro-fuzzy systems) are becoming more popular for specific applications.
Hybridization (RS/NN)
The rough sets paradigm permits a reduction of the number of inputs to a neural network and assists with the assignment of initial weights that are likely to cause the NN to converge more quickly.