1 Computational Intelligence: Methods and Applications Lecture 1 Organization and overview Source: Włodzisław Duch; Dept. of Informatics, UMK; Google: W Duch

2 What is it all about? Many engineering and scientific problems may be solved using numerical algorithms: the theory is known, the equations are formulated, and either analytical or numerical solutions are required. Any examples? Such problems require high-performance computing, but no intelligence: just press the key and wait for the answer. Other problems may be easy to formulate, but all algorithms solving them may be NP-hard, requiring an almost infinite amount of computation in complex cases. Any examples? Yet other problems have no algorithms at all! Any examples? Problems for which effective algorithms cannot be formulated require intelligence to solve.

3 What this course covers This is a list of topics covered: Computational Intelligence overview, sources of inspiration, types of adaptive (learning) systems, types of applications (~2 h). Visualization and exploratory data analysis: few variables, direct visualization, Principal Component Analysis (PCA), Multidimensional Scaling (MDS), Self-Organizing Maps (SOM), parallel coordinates and other visualization algorithms (~9 h). Theory: overview of statistical approaches to learning, bias-variance decomposition, the expectation maximization algorithm, model selection, evaluation of results, ROC curves (~5 h).

4 What this course covers (cont.) Introduction to the Yale/WEKA and GhostMiner software packages, presentation of the algorithms available in these packages (~5 h). Statistical algorithms: discriminant analysis – linear (LDA), Fisher (FDA), regularized (RDA); probabilistic data modeling; kernel methods (~5 h). Density estimation, expectation maximization, RBF and SFN networks, and rule induction (~4 h). Similarity-based methods, generation of prototypes, similarity functions, separability criteria (~2 h). Improving CI models: boosting, stacking, ensemble learning, meta-learning, using information theory and other approaches for the selection of relevant features (~6 h).

5 Personal information Name: Agus Zainal Arifin. Head of the Laboratory of Vision and Image Processing, Department of Informatics, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember (ITS) Surabaya. Detailed information, CV, papers and lecture notes, project descriptions, photos from many conferences etc. are on the WWW: Course: http://viplab.if.its.ac.id/ci Personal: http://agusza.if.its.ac.id/ This course page is linked to the VIP Lab. Email: on my webpage.

6 Recommended books The three best books covering foundations and various aspects of CI (with a strong statistical bias): R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification (2nd Edition), J. Wiley 2000. Amit Konar, Computational Intelligence: Principles, Techniques and Applications, Springer 2005 – a new book covering many advanced CI subjects. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer 2001.

7 Other useful books A. Webb, Statistical Pattern Recognition, 2nd ed., Wiley 2002. V. Kecman, Learning and Soft Computing, MIT Press 2001 – a good introductory book on neural, SVM and fuzzy subjects, with detailed explanations and many problems. D. Hand, H. Mannila, P. Smyth, Principles of Data Mining, MIT Press 2001. I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann 1999 – a WEKA introduction, but the algorithm descriptions are very sketchy. J.P. Marques de Sa, Pattern Recognition: Concepts, Methods, and Applications, Springer 2001 – a small book, but a useful overview. Tom Soukup and Ian Davidson, Visual Data Mining: Techniques and Tools for Data Visualization and Mining, 2002. C. Bishop, Pattern Recognition and Machine Learning, Springer 2006.

8 Schedule per week (3 sks)
Week 1: Lectures 1, 2, 3
Week 2: Lectures 4, 5, 6
Week 3: Lectures 7, 8, 9
Week 4: Evaluation 1
Week 5: Lectures 10, 11, 12
Week 6: Lectures 13, 14, 15
Week 7: Lectures 16, 17, 18
Week 8: Evaluation 2
Week 9: Lectures 19, 20, 21
Week 10: Lectures 22, 23, 24
Week 11: Lectures 25, 26, 27
Week 12: Evaluation 3
Week 13: Lectures 28, 29, 30
Week 14: Lectures 31, 32, 33
Week 15: Lectures 34, 35, 36
Week 16: Evaluation 4
Week 17: Lectures 37, 38, Summary
Week 18: Final Project Evaluation
Work: personal or group. Type: quiz (limited open), presentation, discussion, paper summarization and/or arrangement, final project.

9 Computational Intelligence: Methods and Applications Lecture 2 Problems requiring CI Source: Włodzisław Duch; Dept. of Informatics, UMK; Google: W Duch

10 Some non-algorithmizable problems Understanding the meaning of sentences (queries) and all problems related to natural language analysis. Perception: recognition of signals, phoneme recognition, olfactory signals – a first step in robotics. Visual perception: face recognition, object recognition and many computer vision problems. Hand-written character recognition for PDAs or security. Control and planning problems in robotics, and control of non-linear complex systems with many degrees of freedom. Medical diagnostics, interpretation of medical images and biomedical signals (EEG, ECG...), therapy planning. Playing complex games, like Go or strategic war games. Solving atypical problems, or problems requiring creativity.

11 CI definition Many bad definitions have been proposed, listing the computational methods used in CI; my working technical definition of CI is: Computational Intelligence (CI) is a branch of science dealing with problems that cannot be solved using effective computational algorithms (which does not mean that they cannot be solved using computations!). The MIT Encyclopedia of Cognitive Sciences includes it among the 6 main branches of cognitive science, aiming at understanding (natural science perspective) and creating (engineering perspective) animal-like (and human-like) cognitive systems. The cognitive sciences include philosophy, psychology, neurosciences, linguistics, CI, and sociobiology (evolutionary perspectives on culture). In this course only the engineering perspective is used.

12 Relations between CI, AI and CS Cognitive science includes both low- and high-level processes. Artificial Intelligence (AI) is the part of CI that: is based on symbolic representation of knowledge; creates expert systems that help to reason; has “knowledge engineering” as its most important branch. AI is focused on higher cognitive processes, such as language, logic, reasoning, thinking, problem solving, sequential actions. CI also includes basic sensory signal processing, low-level cognition, perception and control, senso-motoric behavior. CI methods may help to discover knowledge hidden in data. Only a few hybrid CI-AI systems exist – cognitive robotics needs them! CI and AI conferences and journals have little overlap.

13 Pattern recognition problems Classification, categorization, recognition of: images of all kinds, medical images; the phonetic structure of speech signals; insects, birds, animals by their voices; biomedical signals, such as USG, fMRI; sonar, radar and other artificial signals; hand-sign speech and signals; machine and human intentions and behavioral patterns; chemical structures, QSAR (Quantitative Structure-Activity Relationships), rational drug design; biological structures, protein types, folding; silhouettes of cars and other vehicles, planes, ships; astronomical objects, recognition of types of stars and galaxies...

14 Data mining problems Data mining, or KDD, Knowledge Discovery in Databases, aims at: understanding data, not just pattern classification; finding short explanations of data structure and association rules; discovering logical rules (classical and fuzzy) in the data; creating summaries of databases; e.g. discovering new rules in analytical chemistry, such as NMR spectroscopy rules; searching for knowledge in genomic and biological databases; automatically creating understandable models (theories) of data; creating causal models of processes, pathway databases in biological cells, or hormone interactions in the human body; using the knowledge discovered to reason in expert systems.

15 Selection of information Attention allows us to select relevant perceptual signals; without attention we could not function: finding relevant features in medical, biological, chemical, astronomical, technical, business and other data; finding the right person for the job – what is important?; filtering junk mail by automatically discovering rules (~9 G$ of losses in the US + 2.5 G$ in Europe in 2002, according to CNN); aggregation of features in a linear or non-linear way; reducing the dimensionality of the problem locally or globally; creating new, more relevant features and concepts that are useful in reasoning; separation of signals into interesting components, cleaning of signals, removing noise and artifacts, separating music from noise and pictures/video from dirt and scratches...

16 Information Retrieval IR and IE (information extraction): indexing documents, library keywords; semantic information retrieval; searching in large databases and on the Web; search engines discovering the preferences of users; automatic annotation of Web documents, going from HTML => XML => DAML to create the semantic Web; document clustering, categorization and visualization of relations between documents and keywords; constructing concept spaces from keywords; latent semantic analysis and other methods of dimensionality reduction of document search terms; NLP (Natural Language Processing) parsing + other techniques; automatic summarization of texts, dialog and chatterbot systems.

17 Decision support Intelligent decision support: support for business decisions, like pricing, upgrading, personalization of services; increasing the response rates of advertising campaigns; support for medical diagnostics, evidence-based medicine; creating causal networks to explain inferences; models of customer, shopping or service-user behavior. As Andreas Weigend (www.weigend.com) put it: “Amazon.com might be the world's largest laboratory to study human behavior and decision making.”

18 Control and planning CI is needed in: quality control in industrial processes; tuning of complex equipment, automatic camera focus; driving a car or airplane without human intervention; controlling a chemical factory; controlling mob behavior; controlling a robot with many degrees of freedom; planning robot actions, surviving in hostile environments; developing toy robots such as AIBO dogs; planning large-scale constructions, such as highways, skyscrapers, power plants; optimization of organization structure and functions...

19 Detection of regularities If we do not know what we are looking for: spontaneous, unsupervised learning; self-organization of feature detectors, automatic creation of elementary features; discovery and analysis of “interesting” clusters and structures in the data; detection of (ir)regularities in signals, for example in heart ECG; discovery of genes and other patterns in DNA strings; discovery of important active sites in proteins; use of the discovered regularities for the construction of new features and new concepts in systematic reasoning.

20 Other problems Many other problems require CI techniques for their solution: strategic games, learning from errors and successes; multi-criterion optimization problems (which have no single optimal solution); determining missing features; discovering errors and outliers in the data; learning to solve simple sub-problems and generalizing to complex problems; combining data from many sources into one model, like consumer preferences and TV-watching behavior; prediction of time series, stock behavior, solar activity, weather, human decisions and intentions; understanding the human mind, results of psychological experiments, ways of reasoning, categorization, planning, learning, recognizing emotions.

21 CI inspirations How to solve such problems? Humans and many animals are quite good at solving them. Although many specialized methods exist, some widely applicable adaptive (learning) algorithms have been discovered. Biological inspirations help to formulate initial models. Neurobiological inspirations: how do brains solve such problems? Artificial Neural Networks (ANNs) are a large field with many types of networks, from function mapping to hierarchical, self-organizing, modular dynamical systems, providing solutions to many problems. Some ANN models are close to statistics and pattern recognition. Some ANN models are inspired by specific brain structures, cf. CMAC (Cerebellar Model Arithmetic Computer), SDM (Sparse Distributed Memory), SOM (Self-Organizing Maps). Computational cognitive neurosciences aim at models that are faithful to biology; results are compared with experiments.

22 Psychological inspirations How do minds solve such problems? Although the mind is a function of the brain, here, instead of single neurons, the functions of larger brain structures (neural columns, modules, maps) are considered and used as inspiration. Connectionist models: networks of connected nodes representing concepts, processing information in a parallel distributed way; general graphical models and Bayesian causal influence networks. Learning through chunking, or hierarchical grouping of smaller units into more meaningful, complex structures replaced by a single symbol: from lines and curves, to letters, words, phrases and sentences, to meaning; contrast that with neural learning (perceptrons).

23 Bio-medical inspirations How has biology solved the problem of species survival? Evolution leads to survival of the fittest, removing the weak. It may be used for complex optimization problems: find parameters P that minimize some cost function E(P). It may help to learn the optimal parameters of a complex adaptive system (such as an ANN) – but is evolution really the most efficient approach? Swarm algorithms: find an optimal solution by creating many initial parameter sets {P_i} and moving them in parameter space, using the cost function E(P_i) to determine interactions between the “particles” {P_i} (see the sketch below). Ant algorithms: interactions of many ants and pheromone traces help to find optimal sets of parameters; useful if the cost function changes during the process (moving-target problems). Immune systems: inspirations from medicine.
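
A minimal sketch of the swarm idea just described, assuming a toy quadratic cost E(P); the attraction rule and all constants are illustrative choices for this example, not a specific published algorithm:

```python
import random

def E(P):
    # Illustrative cost function with its minimum at P = (1, -2).
    return (P[0] - 1.0)**2 + (P[1] + 2.0)**2

def swarm_minimize(E, dim=2, n_particles=20, steps=200):
    # Create many initial parameter sets {P_i} at random positions.
    particles = [[random.uniform(-10, 10) for _ in range(dim)]
                 for _ in range(n_particles)]
    best = min(particles, key=E)[:]          # best solution found so far
    for _ in range(steps):
        for P in particles:
            for d in range(dim):
                # Interaction: each particle drifts toward the swarm's
                # best point, plus random exploration noise.
                P[d] += 0.1 * (best[d] - P[d]) + random.gauss(0.0, 0.1)
        best = min(particles + [best], key=E)[:]
    return best

print(swarm_minimize(E))   # approaches [1.0, -2.0]
```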

24 Inspirations from logic Modern logic deals with uncertainty in the data. Crisp logic: yes or no only, {0,1}; good for symbolic problems. Q: failed or passed? Teenager or not? Fuzzy logic: degrees of truth in [0,1] modeled by a “membership function” (see the sketch below). Q: old or young? So-so? Perhaps 0.6 old and 0.4 young? Fuzzy = a continuous generalization of multi-valued logic. Rough logic is based on rough sets: some objects certainly belong to a set, some certainly don’t, and others may belong to a degree determined by the data. Dempster-Shafer theory of evidence represents our knowledge of the truth of sets of possibilities. Uncertainty theory is a large field...
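
A minimal sketch of a fuzzy membership function in Python; the linear ramp and the breakpoints (30 and 60 years) are assumptions made only for illustration:

```python
def fuzzy_old(age):
    """Membership degree of the fuzzy set 'old': a truth value
    in [0, 1] rather than a crisp yes/no answer."""
    if age <= 30:
        return 0.0
    if age >= 60:
        return 1.0
    return (age - 30) / 30.0   # linear ramp between 30 and 60

age = 48
mu_old = fuzzy_old(age)
print(f"old: {mu_old:.2f}, young: {1 - mu_old:.2f}")  # 0.60 old, 0.40 young
```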

25 Math and statistics Probabilistic methods are the basis for inference and decision support. An object (sample) X = {X_i} is a vector of features. Naive Bayesian classifiers treat all features as independent and are used for classification, estimating the “a posteriori probabilities” p(C|X) of assigning an object (sample) X to some category C (see the sketch below). Bayesian networks include mutual influences of different features X_i, using Bayes' formula: p(C,X) = p(C|X) p(X) = p(X|C) p(C). Multivariate statistics has devised many classification and regression (approximation) methods. Clustering and visualization methods also belong to statistics.
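
A sketch of the naive Bayesian idea, assuming hypothetical likelihood tables p(X_i|C) for two discrete features: under the independence assumption p(X|C) is a product over features, and Bayes' formula then gives p(C|X):

```python
from math import prod

# Hypothetical per-feature likelihoods p(x_i | C) for two classes.
p_feature = {
    "C1": [{"low": 0.7, "high": 0.3}, {"low": 0.4, "high": 0.6}],
    "C2": [{"low": 0.2, "high": 0.8}, {"low": 0.9, "high": 0.1}],
}
p_class = {"C1": 0.5, "C2": 0.5}   # a priori probabilities p(C)

def posterior(X):
    # Naive assumption: p(X|C) = product over features of p(X_i|C).
    joint = {C: p_class[C] * prod(p_feature[C][i][x]
                                  for i, x in enumerate(X))
             for C in p_class}
    Z = sum(joint.values())                       # p(X), the evidence
    return {C: p / Z for C, p in joint.items()}   # p(C|X)

print(posterior(["low", "high"]))   # {'C1': ~0.95, 'C2': ~0.05}
```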

26 Machine learning and pattern recognition Pattern Recognition (PR) is an engineering field that has developed many classification and approximation methods. The most popular PR methods are based on the k-NN (k Nearest Neighbors) rule (see the sketch below) or decision trees, recursively partitioning data into smaller subsets. Machine learning (ML) is a part of AI, using inductive methods to find good symbolic rules that classify or approximate data. ML methods usually deal with symbolic data only, covering samples from different classes to find the best sets of rules. AI also uses memory-based methods similar to k-NN, calling them memory-based reasoning or case-based reasoning, estimating similarity to memorized cases. We will be able to cover only a few of these methods...
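
A minimal k-NN classifier as a sketch of the rule mentioned above; the toy training points are invented for the example:

```python
from collections import Counter

def knn_classify(query, data, k=3):
    """Classify by majority vote among the k nearest training samples.
    'data' is a list of (feature_vector, class_label) pairs."""
    def dist2(a, b):
        return sum((ai - bi)**2 for ai, bi in zip(a, b))
    nearest = sorted(data, key=lambda s: dist2(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy 2-D training set (hypothetical data).
train = [((1.0, 1.1), "A"), ((0.9, 1.3), "A"),
         ((3.0, 2.9), "B"), ((3.2, 3.1), "B"), ((2.8, 3.0), "B")]
print(knn_classify((1.1, 1.0), train))   # -> "A"
```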

27 Computational Intelligence: Data + Knowledge. [Diagram: CI drawing on soft computing, neural networks, probabilistic models, machine learning, AI and expert systems, pattern recognition, fuzzy logic, evolutionary algorithms, visualization, and multivariate statistics.]

28 Computational Intelligence: Methods and Applications Lecture 3 Histograms and probabilities. Source: Włodzisław Duch; Dept. of Informatics, UMK; Google: W Duch

29 Features AI uses complex knowledge representation methods; pattern recognition is based mostly on simple feature spaces. Set of objects: physical entities (images, patients, clients, molecules, cars, signal samples, software pieces) or states of physical entities (board states, patient states, etc.). Features: measurements or evaluations of some object properties. Ex: are pixel intensities good features? No – they are not invariant to translation, scaling or rotation. Better: type of connections, type of lines, number of lines... Selecting good features, transforming the raw measurements that are collected, is very important.

30 Feature space representation Representation: mapping objects/states into vectors, {O_i} => X(O_i), with X_j(O_i) being the j-th attribute of object O_i. “Attribute” and “feature” are used as synonyms, although strictly speaking “age” is an attribute and “young” is its feature value. Types of features. Categorical: symbolic or discrete – may be nominal (unordered), like “sweet, salty, sour”, or ordinal (can be ordered), like colors or small < medium < large (drink). Continuous: numerical values. [Figure: a point x(O) in a three-dimensional feature space with axes x_1, x_2, x_3.] Vector X = (x_1, x_2, x_3, ..., x_d) is a d-dimensional point in the feature space.

31 Fishy example (Chapter 1.2, Pattern Classification (2nd ed.) by R.O. Duda, P.E. Hart and D.G. Stork, John Wiley & Sons, 2000). Singapore Fisheries automates the process of sorting salmon and sea bass coming on a conveyor belt. An optical sensor and processing software evaluate a number of features: length, lightness, width, #fins. Step 1: look at the histograms. Select the number of bins, e.g. n = 20 (discretize the data); calculate the bin size Δ = (x_max − x_min)/n; calculate N(C, r_i) = the number of samples of class C ∈ {salmon, bass} in each bin r_i = [x_min + (i−1)Δ, x_min + iΔ], i = 1...n; normalize: P(C, r_i) = N(C, r_i)/N, where N is the number of all samples. This gives the joint probability P(C, r_i) = P(r_i|C) P(C). A Python sketch of these steps follows below.
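
A sketch of these steps in Python. The lengths are synthetic stand-ins (the real sensor data is not given), and the Gaussian parameters are arbitrary:

```python
import random

random.seed(0)
# Synthetic stand-in data: (class, length in cm) pairs.
samples = ([("salmon", random.gauss(30, 4)) for _ in range(100)] +
           [("bass",   random.gauss(40, 5)) for _ in range(100)])

n = 20                                       # number of bins
xs = [x for _, x in samples]
x_min, x_max = min(xs), max(xs)
delta = (x_max - x_min) / n                  # bin size (x_max - x_min)/n

N = len(samples)
P = {(C, i): 0.0 for C in ("salmon", "bass") for i in range(n)}
for C, x in samples:
    i = min(int((x - x_min) / delta), n - 1) # index of bin r_i
    P[(C, i)] += 1.0 / N                     # P(C, r_i) = N(C, r_i)/N

# The joint probabilities sum to 1 over all classes and bins.
assert abs(sum(P.values()) - 1.0) < 1e-9
```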

32 Fishy histograms. Example of histograms for two features, length (left) and skin lightness (right); the optimal thresholds are marked. P(r_i|C) is an approximation to the class probability distribution P(x|C). How to calculate it? Discretization: replacing continuous values by discrete bins, or integration by summation – very useful when dealing with real data. Alternative: integrate, or fit some simple functions to these histograms, for example a sum of several Gaussians, optimizing their widths and centers (see the sketch below).
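
The alternative mentioned above, fitting a sum of Gaussians to a histogram, might look as follows with scipy; the two-component form, the synthetic data and the initial guesses are all assumptions made for this sketch:

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(x, a1, m1, s1, a2, m2, s2):
    # Sum of two Gaussian bumps with free amplitudes, centers, widths.
    return (a1 * np.exp(-(x - m1)**2 / (2 * s1**2)) +
            a2 * np.exp(-(x - m2)**2 / (2 * s2**2)))

# Histogram of some 1-D data (a synthetic two-class mixture here).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(30, 4, 500), rng.normal(40, 5, 500)])
counts, edges = np.histogram(data, bins=20, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Optimize the centers and widths so the curve matches the histogram.
p0 = [0.05, 28.0, 3.0, 0.05, 42.0, 4.0]     # rough initial guesses
params, _ = curve_fit(two_gaussians, centers, counts, p0=p0)
```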

33 Fishy example. Exploratory data analysis (EDA): visualize relations in the data. How are histograms created? Select the number of bins, e.g. n = 20 (discretize the data into 20 bins); calculate the bin size Δ = (x_max − x_min)/n; calculate N(C, r_i) = the number of samples from class C in each bin r_i = [x_min + (i−1)Δ, x_min + iΔ], i = 1...n. This may be converted to a joint probability P(C, r_i) that a fish of type C will be found with length in bin r_i: P(C, r_i) = N(C, r_i)/N, where N is the number of all samples. Histograms show the joint probability P(C, r_i) rescaled by N.

34 Basic probability concepts. Normalized co-occurrence (contingency) table: P(C, r_i) = N(C, r_i)/N. P(C, r_i) is a matrix: columns = bins r_i, rows = classes. P(C, r_i) is the joint probability distribution, the probability of finding class C and x ∈ r_i. P(C), the a priori class probability – before making any measurements or learning that x is in some bin – is obtained by summing the row. P(x ∈ r_i), the probability that an object from any class is found in bin r_i, is obtained by summing the column. A numerical illustration follows below.
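
A numerical illustration with a hypothetical 2-class, 4-bin table: rows sum to the a priori class probabilities P(C), columns to the bin probabilities P(x ∈ r_i):

```python
import numpy as np

# Hypothetical normalized contingency table P(C, r_i):
# rows = classes (salmon, bass), columns = bins r_1..r_4.
P = np.array([[0.10, 0.25, 0.10, 0.05],
              [0.05, 0.10, 0.20, 0.15]])

P_C = P.sum(axis=1)    # a priori class probabilities: sum each row
P_r = P.sum(axis=0)    # bin probabilities P(x in r_i): sum each column

print(P_C)             # [0.5 0.5]
print(P_r)             # [0.15 0.35 0.3 0.2]
assert np.isclose(P.sum(), 1.0)   # the joint distribution sums to 1
```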

35 PDF and conditional probability What if x is a continuous variable and there are no natural bins? Then P(C, x) is a probability density function (pdf), and for small dx: P(C, [x, x+dx]) = P(C, x) dx. Suppose now that the class C is known; what is the probability of finding x ∈ r_i, or, for continuous features, finding it in the [x, x+dx] interval? P(x ∈ r_i|C) denotes the conditional probability distribution, knowing C. The class probability distribution P_C(x) = P(x|C) is simply a rescaled joint probability – divide a single row of the P(C, x) matrix by P(C): P(x|C) = P(C, x)/P(C). The sum over all bins then gives Σ_i P(x ∈ r_i|C) = 1, and for the joint probability, therefore, the formula is P(C, x) = P(x|C) P(C).

36 Probability formulas Most probability formulas are simple summation rules! Matrix of joint probability distributions: elements of P(C, x) for discrete x, or P(C, x ∈ r_i) after discretization. Just count how many times N(C, x) is found: P(C, x) = N(C, x)/N. A row of P(C, x) sums to P(C): Σ_x P(C, x) = P(C); therefore P(x|C) = P(C, x)/P(C) sums to 1: Σ_x P(x|C) = 1. For continuous x, P(C, x) is the probability density distribution, integrating to P(C): ∫ P(C, x) dx = P(C).

37 Bayes formula Bayes' formula allows for the calculation of the conditional probability distribution P(x|C) and the posterior probability distribution P(C|x). These probabilities are renormalized elements of the joint probability P(C, r_i). They sum to 1: summing P(x ∈ r_i|C) over all bins gives 1, since an object from class C must fall in some bin; summing P(C|x) over all classes gives 1, since given x the object must be in one of the C classes (see the formulas below). Therefore Bayes' formula is quite obvious!
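
In symbols, the renormalizations described above can be written as (a LaTeX restatement of the slide's reasoning):

```latex
P(x \in r_i \mid C) = \frac{P(C,\, r_i)}{P(C)}, \qquad
P(C \mid x \in r_i) = \frac{P(C,\, r_i)}{P(x \in r_i)}
  = \frac{P(x \in r_i \mid C)\, P(C)}{\sum_{C'} P(x \in r_i \mid C')\, P(C')},
\qquad
\sum_i P(x \in r_i \mid C) = 1, \quad \sum_C P(C \mid x \in r_i) = 1 .
```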

38 Example Two types of Iris flowers: Iris Setosa and Iris Virginica. Measuring petal lengths in cm in two intervals, r_1 = [0,3] cm and r_2 = [3,6] cm, for 100 samples, we may get, for example, a distribution of counts N(C, r_i) from which the probabilities of finding the different types of Iris flowers follow. Calculate the conditional probabilities and verify all formulas given on the previous pages! Do more examples. A worked sketch with hypothetical counts follows below.
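
A worked sketch in Python. The counts below are hypothetical stand-ins (the slide's actual table is not reproduced in this transcript), chosen only so the formulas can be verified:

```python
import numpy as np

# Hypothetical counts N(C, r_i) for 100 flowers:
# rows = (Setosa, Virginica), columns = bins r_1 = [0,3], r_2 = [3,6] cm.
N = np.array([[45,  5],
              [ 5, 45]])

P   = N / N.sum()          # joint probability P(C, r_i)
P_C = P.sum(axis=1)        # a priori class probabilities P(C)
P_r = P.sum(axis=0)        # bin probabilities P(x in r_i)

P_r_given_C = P / P_C[:, None]   # conditional P(r_i | C): rows sum to 1
P_C_given_r = P / P_r[None, :]   # posterior   P(C | r_i): columns sum to 1

# Verify the formulas from the previous slides.
assert np.allclose(P_r_given_C.sum(axis=1), 1.0)
assert np.allclose(P_C_given_r.sum(axis=0), 1.0)
assert np.allclose(P_r_given_C * P_C[:, None], P)   # P(r|C) P(C) = P(C,r)
```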

39 Going to 2D histograms 2D histograms in 3D: still useful, although sometimes hard to analyze. For a single class it is still easy, but for more than 2 classes rather difficult. Many visualization software packages create such 3D plots. The joint probability P(C, x, y) is shown here, for each class in a separate drawing; for small N it may be quite different from the real distribution. A sketch of the computation follows below.
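
A sketch of building such a 2-D (joint) histogram for one class with numpy; the two features and the data are synthetic stand-ins:

```python
import numpy as np

# Synthetic 2-D feature data for one class (e.g. length vs. lightness).
rng = np.random.default_rng(0)
x = rng.normal(30, 4, 500)
y = rng.normal(5, 1, 500)

# Joint 2-D histogram: counts over a grid of (x, y) bins.
H, x_edges, y_edges = np.histogram2d(x, y, bins=20)
P_xy = H / H.sum()     # estimate of the joint probability P(x, y) per bin

# For a class-conditional view, build one such table P(C, x, y) per class.
```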

40 Examples of histograms Histograms in 2D: still useful, although sometimes hard to analyze. Made by SigmaPlot, Origin, or statistical packages like SPSS. Illustration of the influence of discretization (bin width): http://www.stat.sc.edu/~west/javahtml/Histogram.html Various chart applets: http://www.quadbase.com/espresschart/help/examples/ and http://www.stat.berkeley.edu/~stark/SticiGui/index.htm Image histograms – popular in electronic cameras. How to improve the information content of histograms? How to analyze high-dimensional data?

41 2D histograms in bioinformatics Very popular: two nominal variables (genes, samples) vs. a continuous variable (activity) normalized to [-1,+1]. Example: gene expression data in blood B-cells of 16 types. Use color instead of height: intensity = −1 => bright green (inhibition); intensity = 0 => black (normal); intensity = +1 => bright red (high expression). Gene activity(gene name, cell type). A colormap sketch follows below.
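
A sketch of such a heatmap with matplotlib, using a green-black-red colormap; the activity matrix here is random stand-in data, not real expression values:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

# Green -> black -> red colormap, as in gene expression heatmaps:
# -1 (inhibition) = bright green, 0 (normal) = black, +1 = bright red.
gbr = LinearSegmentedColormap.from_list("gbr", ["lime", "black", "red"])

# Hypothetical activity matrix: rows = genes, columns = cell types.
activity = np.random.default_rng(0).uniform(-1, 1, size=(30, 16))

plt.imshow(activity, cmap=gbr, vmin=-1, vmax=1, aspect="auto")
plt.colorbar(label="normalized expression")
plt.xlabel("cell type"); plt.ylabel("gene")
plt.show()
```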

