Published by Raven Barrick. Modified over 9 years ago.
1
Machine Learning
2
Predictive analytics Machine learning techniques
For such cases, machine learning techniques emulate human cognition and learn from training examples to predict future events.
3
Predictive analytics Machine learning techniques
A brief discussion of some of these methods used commonly for predictive analytics is provided below. A detailed study of machine learning can be found in Mitchell (1997).
4
Decentralized Autonomous Corporation - Machine learning layer
This layer runs the Artificial Intelligence algorithm that the DAC relies on to detect patterns in real-world data and model it without human intervention.
5
Machine learning Machine learning, a branch of Artificial Intelligence, concerns the construction and study of systems that can learn from data. For example, a machine learning system could be trained on messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new messages into spam and non-spam folders.
6
Machine learning The core of machine learning deals with representation and generalization. Representation of data instances and functions evaluated on these instances are part of all machine learning systems. Generalization is the property that the system will perform well on unseen data instances; the conditions under which this can be guaranteed are a key object of study in the subfield of computational learning theory.
7
Machine learning There is a wide variety of machine learning tasks and successful applications. Optical character recognition, in which printed characters are recognized automatically based on previous examples, is a classic example of machine learning.
8
Machine learning - Definition
In 1959, Arthur Samuel defined machine learning as a "Field of study that gives computers the ability to learn without being explicitly programmed".
9
Machine learning - Definition
This definition is notable for its defining machine learning in fundamentally operational rather than cognitive terms, thus following Alan Turing's proposal in Turing's paper "Computing Machinery and Intelligence" that the question "Can machines think?" be replaced with the question "Can machines do what we (as thinking entities) can do?"
10
Machine learning - Generalization
A core objective of a learner is to generalize from its experience
11
Machine learning - Machine learning and data mining
These two terms are commonly confused, as they often employ the same methods and overlap significantly. They can be roughly defined as follows:
12
Machine learning - Machine learning and data mining
Machine learning focuses on prediction, based on known properties learned from the training data.
13
Machine learning - Machine learning and data mining
Data mining focuses on the discovery of (previously) unknown properties in the data. This is the analysis step of Knowledge Discovery in Databases.
14
Machine learning - Machine learning and data mining
Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in Knowledge Discovery and Data Mining (KDD) the key task is the discovery of previously unknown knowledge
15
Machine learning - Human interaction
Some machine learning systems attempt to eliminate the need for human intuition in data analysis, while others adopt a collaborative approach between human and machine. Human intuition cannot, however, be entirely eliminated, since the system's designer must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data.
16
Machine learning - Algorithm types
Machine learning algorithms can be organized into a taxonomy based on the desired outcome of the algorithm or the type of input available during training.
17
Machine learning - Algorithm types
Supervised learning algorithms are trained on labelled examples, i.e., input where the desired output is known. The supervised learning algorithm attempts to generalise a function or mapping from inputs to outputs which can then be used to speculatively generate an output for previously unseen inputs.
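As a minimal sketch of generalising a mapping from labelled examples, the following fits a straight line to input/output pairs by ordinary least squares and then produces an output for an unseen input. The training data and the names `fit_line` and `predict` are illustrative assumptions, not something taken from the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to labelled examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping to a (possibly unseen) input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labelled examples: inputs with known desired outputs.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

model = fit_line(train_x, train_y)
prediction = predict(model, 10.0)  # an input that was not in the training set
```

The learned function is only as good as the assumption that a line describes the data; other supervised learners differ mainly in the family of functions they search.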
18
Machine learning - Algorithm types
Unsupervised learning algorithms operate on unlabelled examples, i.e., input where the desired output is unknown. Here the objective is to discover structure in the data (e.g. through a cluster analysis), not to generalise a mapping from inputs to outputs.
19
Machine learning - Algorithm types
Semi-supervised learning combines both labelled and unlabelled examples to generate an appropriate function or classifier.
20
Machine learning - Algorithm types
Transduction, or transductive inference, tries to predict new outputs on specific and fixed (test) cases from observed, specific (training) cases.
21
Machine learning - Algorithm types
Reinforcement learning is concerned with how intelligent agents ought to act in an environment to maximise some notion of reward. The agent executes actions which cause the observable state of the environment to change. Through a sequence of actions, the agent attempts to gather knowledge about how the environment responds to its actions, and attempts to synthesise a sequence of actions that maximises a cumulative reward.
22
Machine learning - Algorithm types
Learning to learn learns its own inductive bias based on previous experience.
23
Machine learning - Algorithm types
Developmental learning, elaborated for Robot learning, generates its own sequences (also called curriculum) of learning situations to cumulatively acquire repertoires of novel skills through autonomous self-exploration and social interaction with human teachers, and using guidance mechanisms such as active learning, maturation, motor synergies, and imitation.
24
Machine learning - Theory
The computational analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory. Because training sets are finite and the future is uncertain, learning theory usually does not yield guarantees of the performance of algorithms. Instead, probabilistic bounds on the performance are quite common.
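One standard example of such a probabilistic bound, given here as an illustration and not taken from the slides, is Hoeffding's inequality applied to a single fixed hypothesis $h$ with a loss bounded in $[0, 1]$. The symbols are assumptions introduced for this sketch: $m$ is the number of i.i.d. training examples, $\hat{R}_S(h)$ is the empirical error on the sample, and $R(h)$ is the true error.

```latex
\Pr\left[\, \bigl| \hat{R}_S(h) - R(h) \bigr| \ge \epsilon \,\right]
  \le 2 \exp\left( -2 m \epsilon^{2} \right)
```

Setting the right-hand side equal to $\delta$ and solving for $\epsilon$ gives the equivalent statement that, with probability at least $1 - \delta$, $\bigl| \hat{R}_S(h) - R(h) \bigr| \le \sqrt{\ln(2/\delta) / (2m)}$. The bound holds for one hypothesis chosen before seeing the data; bounds that hold uniformly over a hypothesis class need extra terms.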
25
Machine learning - Theory
In addition to performance bounds, computational learning theorists study the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time complexity results. Positive results show that a certain class of functions can be learned in polynomial time. Negative results show that certain classes cannot be learned in polynomial time.
26
Machine learning - Theory
There are many similarities between machine learning theory and statistical inference, although they use different terms.
27
Machine learning - Decision tree learning
Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value.
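A one-level decision tree (a "decision stump") is perhaps the smallest instance of this idea. The sketch below learns a single threshold on one numeric observation and maps new observations to a target value. The data, the function names, and the error-count splitting criterion are assumptions chosen for brevity; practical tree learners typically use measures such as information gain and grow deeper trees.

```python
def fit_stump(xs, labels):
    """Pick the threshold on a single numeric feature that misclassifies
    the fewest training examples (predict 1 when x > threshold)."""
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(xs) + 1
    for t in candidates:
        errors = sum((x > t) != bool(y) for x, y in zip(xs, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def predict_stump(threshold, x):
    """Map an observation to its predicted target value."""
    return 1 if x > threshold else 0


# Hypothetical observations and target values.
xs = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
threshold = fit_stump(xs, labels)
```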
28
Machine learning - Association rule learning
29
Machine learning - Association rule learning
Association rule learning is a method for discovering interesting relations between variables in large databases.
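Two quantities commonly used to judge whether a relation is "interesting" are support and confidence. The sketch below computes both for a rule on a tiny, made-up transaction set; the data and function names are assumptions for illustration, and no rule-mining algorithm is implemented here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(transactions, {"bread"}, {"milk"})  # 2 of 3 bread baskets
```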
30
Machine learning - Artificial neural networks
31
Machine learning - Artificial neural networks
An artificial neural network (ANN) learning algorithm, usually called "neural network" (NN), is a learning algorithm that is inspired by the structure and functional aspects of biological neural networks
32
Machine learning - Inductive logic programming
33
Machine learning - Inductive logic programming
Inductive logic programming (ILP) is an approach to rule learning using logic programming as a uniform representation for examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program which entails all the positive and none of the negative examples.
34
Machine learning - Support vector machines
35
Machine learning - Support vector machines
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.
36
Machine learning - Clustering
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that observations within the same cluster are similar according to some predesignated criterion or criteria, while observations drawn from different clusters are dissimilar
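One widely used way to carry out such an assignment is k-means (Lloyd's algorithm), in which the similarity criterion is squared Euclidean distance to a cluster centroid. The sketch below is a minimal version on 2-D points; the data, the fixed starting centroids, and the iteration count are assumptions made to keep the example deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points with squared Euclidean distance."""
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, assignments


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, centroids=[(0, 0), (10, 10)])
```

The result depends on the starting centroids; in practice they are usually chosen at random and the algorithm is run several times.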
37
Machine learning - Bayesian networks
A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG)
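The practical consequence of the DAG structure is that the joint distribution factorises into one conditional table per node. The sketch below uses a hypothetical three-node network (Rain, Sprinkler, WetGrass) with made-up probabilities and answers a query by brute-force enumeration; real systems use more efficient inference algorithms.

```python
from itertools import product

# Hypothetical conditional probability tables for the network
# Rain -> Sprinkler, and (Sprinkler, Rain) -> WetGrass.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}  # P(S=True | R)
P_WET_GIVEN_S_R = {  # P(W=True | S, R)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """P(R, S, W) factorised along the DAG: P(R) * P(S | R) * P(W | S, R)."""
    p_s = P_SPRINKLER_GIVEN_RAIN[rain]
    p_w = P_WET_GIVEN_S_R[(sprinkler, rain)]
    return (
        P_RAIN[rain]
        * (p_s if sprinkler else 1 - p_s)
        * (p_w if wet else 1 - p_w)
    )


# Sanity check: the factorised joint sums to one over all assignments.
total = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3))

# Query P(Rain | WetGrass = True) by summing out the sprinkler variable.
p_wet = sum(joint(r, s, True) for r, s in product([True, False], repeat=2))
p_rain_and_wet = sum(joint(True, s, True) for s in [True, False])
p_rain_given_wet = p_rain_and_wet / p_wet
```

Observing wet grass raises the probability of rain above its prior of 0.2, which is the kind of belief update the network is meant to support.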
38
Machine learning - Reinforcement learning
Reinforcement learning is concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected.
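Tabular Q-learning is one algorithm that learns such a policy from reward alone; it is chosen here for illustration and is not named in the slides. The sketch below uses a made-up five-state corridor in which only reaching the right-hand end is rewarded. The environment, the constants, and the purely random behaviour policy are all assumptions.

```python
import random

N_STATES = 5      # states 0..4 on a line; state 4 is the terminal goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
ALPHA, GAMMA = 0.5, 0.9


def step(state, action):
    """Deterministic toy environment: reward 1 only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # purely exploratory behaviour
            nxt, reward = step(state, action)
            # Move Q(s, a) toward reward + discounted best value of the next state.
            target = reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q


q = q_learning()
# Greedy policy: in each non-terminal state, take the action with the larger Q-value.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

No correct action is ever shown to the learner; the preference for moving right emerges only because reward propagates backwards through the value estimates.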
39
Machine learning - Representation learning
Several learning algorithms, mostly unsupervised learning algorithms, aim at discovering better representations of the inputs provided during training
40
Machine learning - Similarity and metric learning
In this problem, the learning machine is given pairs of examples that are considered similar and pairs of less similar objects. It then needs to learn a similarity function (or a distance metric function) that can predict if new objects are similar. It is sometimes used in Recommendation systems.
41
Machine learning - Sparse Dictionary Learning
In this method, a datum is represented as a linear combination of basis functions, and the coefficients are assumed to be sparse. Let x be a d-dimensional datum and D be a d by n matrix, where each column of D represents a basis function, and let r be the coefficient vector that represents x using D. Mathematically, sparse dictionary learning means solving x \approx D r, where r is sparse. Generally speaking, n is assumed to be larger than d to allow the freedom for a sparse representation.
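To make the representation concrete, the sketch below builds a datum from a small, hand-written dictionary and a sparse coefficient vector. The matrix and coefficients are made-up values, and only the reconstruction step is shown; actually learning D and r from data is a separate optimisation problem that this example does not attempt.

```python
def reconstruct(D, r):
    """Compute x = D r for a d-by-n dictionary D (list of rows) and coefficients r."""
    return [sum(d_ij * r_j for d_ij, r_j in zip(row, r)) for row in D]


# d = 2, n = 4: more basis functions (columns) than dimensions.
D = [
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, -1.0],
]
r = [0.0, 0.0, 2.0, 0.0]  # sparse: only one non-zero coefficient

x = reconstruct(D, r)
nonzeros = sum(1 for c in r if c != 0.0)
```

Because n is larger than d, the same x could also be written with two non-zero coefficients on the first two columns; the sparse choice uses a single column.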
42
Machine learning - Sparse Dictionary Learning
Sparse dictionary learning has been applied in several contexts
43
Machine learning - Applications
Applications for machine learning include:
44
Machine learning - Applications
In 2006, the online movie company Netflix held the first "Netflix Prize" competition to find a program to better predict user preferences and improve the accuracy on its existing Cinematch movie recommendation algorithm by at least 10%. A joint team made up of researchers from AT&T Labs-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for $1 million.
45
Machine learning - Applications
In 2010, The Wall Street Journal wrote about the money management firm Rebellion Research's use of machine learning to predict economic movements. The article discusses Rebellion Research's prediction of the financial crisis and economic recovery.
46
Machine learning - Software
Ayasdi, Angoss KnowledgeSTUDIO, Apache Mahout, Gesture Recognition Toolkit, IBM SPSS Modeler, KNIME, KXEN Modeler, LIONsolver, MATLAB, mlpy, MCMLL, OpenCV, dlib, Oracle Data Mining, Orange, Python scikit-learn, R, RapidMiner, Salford Predictive Modeler, SAS Enterprise Miner, Shogun toolbox, STATISTICA Data Miner, and Weka are software suites containing a variety of machine learning algorithms.
47
Machine learning - Journals and conferences
Journal of Machine Learning Research
48
Machine learning - Journals and conferences
Neural Computation (journal)
49
Machine learning - Journals and conferences
Journal of Intelligent Systems (journal)
50
Machine learning - Journals and conferences
Neural Information Processing Systems (NIPS) (conference)
51
Machine learning - Further reading
Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar (2012). Foundations of Machine Learning, The MIT Press. ISBN
52
Machine learning - Further reading
Ian H. Witten and Eibe Frank (2011). Data Mining: Practical machine learning tools and techniques Morgan Kaufmann, 664pp., ISBN
53
Machine learning - Further reading
Sergios Theodoridis, Konstantinos Koutroumbas (2009) "Pattern Recognition", 4th Edition, Academic Press, ISBN
54
Machine learning - Further reading
Mierswa, Ingo and Wurst, Michael and Klinkenberg, Ralf and Scholz, Martin and Euler, Timm: YALE: Rapid Prototyping for Complex Data Mining Tasks, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06), 2006.
55
Machine learning - Further reading
Bing Liu (2007), Web Data Mining: Exploring Hyperlinks, Contents and Usage Data. Springer, ISBN
56
Machine learning - Further reading
Huang T.-M., Kecman V., Kopriva I. (2006), Kernel Based Algorithms for Mining Huge Data Sets, Supervised, Semi-supervised, and Unsupervised Learning, Springer-Verlag, Berlin, Heidelberg, 260 pp. 96 illus., Hardcover, ISBN
57
Machine learning - Further reading
Ethem Alpaydın (2004) Introduction to Machine Learning (Adaptive Computation and Machine Learning), MIT Press, ISBN
58
Machine learning - Further reading
MacKay, D.J.C. (2003). Information Theory, Inference, and Learning Algorithms, Cambridge University Press. ISBN
59
Machine learning - Further reading
KECMAN Vojislav (2001), Learning and Soft Computing, Support Vector Machines, Neural Networks and Fuzzy Logic Models, The MIT Press, Cambridge, MA, 608 pp., 268 illus., ISBN
60
Machine learning - Further reading
Richard O. Duda, Peter E. Hart, David G. Stork (2001) Pattern classification (2nd edition), Wiley, New York, ISBN
61
Machine learning - Further reading
Bishop, C.M. (1995). Neural Networks for Pattern Recognition, Oxford University Press. ISBN
62
Machine learning - Further reading
Ryszard S. Michalski, George Tecuci (1994), Machine Learning: A Multistrategy Approach, Volume IV, Morgan Kaufmann, ISBN
63
Machine learning - Further reading
Sholom Weiss and Casimir Kulikowski (1991). Computer Systems That Learn, Morgan Kaufmann. ISBN
64
Machine learning - Further reading
Yves Kodratoff, Ryszard S. Michalski (1990), Machine Learning: An Artificial Intelligence Approach, Volume III, Morgan Kaufmann, ISBN
65
Machine learning - Further reading
Ryszard S. Michalski, Jaime G. Carbonell, Tom M. Mitchell (1986), Machine Learning: An Artificial Intelligence Approach, Volume II, Morgan Kaufmann, ISBN
66
Machine learning - Further reading
Ryszard S. Michalski, Jaime G. Carbonell, Tom M. Mitchell (1983), Machine Learning: An Artificial Intelligence Approach, Tioga Publishing Company, ISBN
67
Machine learning - Further reading
Ray Solomonoff, An Inductive Inference Machine, IRE Convention Record, Section on Information Theory, Part 2, pp. 56-62, 1957.
68
Machine learning - Further reading
Ray Solomonoff, "An Inductive Inference Machine" A privately circulated report from the 1956 Dartmouth Summer Research Conference on AI.
69
Natural language processing - NLP using machine learning
The paradigm of machine learning is different from that of most prior attempts at language processing
70
Natural language processing - NLP using machine learning
Many different classes of machine learning algorithms have been applied to NLP tasks
71
Natural language processing - NLP using machine learning
Systems based on machine-learning algorithms have many advantages over hand-produced rules:
72
Natural language processing - NLP using machine learning
The learning procedures used during machine learning automatically focus on the most common cases, whereas when writing rules by hand it is often not obvious at all where the effort should be directed.
73
Natural language processing - NLP using machine learning
Automatic learning procedures can make use of statistical inference algorithms to produce models that are robust to unfamiliar input (e.g. containing words or structures that have not been seen before) and to erroneous input (e.g. with misspelled words or words accidentally omitted). Generally, handling such input gracefully with hand-written rules — or more generally, creating systems of hand-written rules that make soft decisions — is extremely difficult, error-prone and time-consuming.
74
Natural language processing - NLP using machine learning
Systems based on automatically learning the rules can be made more accurate simply by supplying more input data
75
Natural language processing - NLP using machine learning
The subfield of NLP devoted to learning approaches is known as Natural Language Learning (NLL); its conference CoNLL and peak body SIGNLL are sponsored by ACL, recognizing also their links with Computational Linguistics and Language Acquisition. When the aim of computational language learning research is to understand more about human language acquisition, or psycholinguistics, NLL overlaps with the related field of Computational Psycholinguistics.
76
Functional decomposition - Machine learning
In practical scientific applications, it is almost never possible to achieve perfect functional decomposition because of the incredible complexity of the systems under study. This complexity is manifested in the presence of "noise," which is just a designation for all the unwanted and untraceable influences on our observations.
77
Functional decomposition - Machine learning
However, while perfect functional decomposition is usually impossible, the spirit lives on in a large number of statistical methods that are equipped to deal with noisy systems
78
Functional decomposition - Machine learning
As an example, Bayesian network methods attempt to decompose a joint distribution along its causal fault lines, thus "cutting nature at its seams"
79
Data compression - Machine learning
There is a close connection between machine learning and compression: a system that predicts the posterior probabilities of a sequence given its entire history can be used for optimal data compression (by using arithmetic coding on the output distribution) while an optimal compressor can be used for prediction (by finding the symbol that compresses best, given the previous history). This equivalence has been used as justification for data compression as a benchmark for general intelligence.
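The link can be made concrete by measuring how many bits an ideal coder would need when driven by a predictive model: each symbol costs -log2 of the probability the model assigned to it. The sketch below uses a very simple adaptive model (symbol counts with add-one smoothing), which is an assumption chosen for brevity, and it computes only the ideal code length without implementing an arithmetic coder.

```python
import math


def code_length_bits(sequence, alphabet):
    """Ideal code length, in bits, of `sequence` under an adaptive model that
    predicts each symbol from the counts seen so far (add-one smoothing)."""
    counts = {symbol: 1 for symbol in alphabet}
    bits = 0.0
    for symbol in sequence:
        p = counts[symbol] / sum(counts.values())
        bits += -math.log2(p)  # an arithmetic coder approaches this cost
        counts[symbol] += 1    # update the model after coding the symbol
    return bits


predictable = code_length_bits("aaaaaaaa", "ab")
mixed = code_length_bits("abababab", "ab")
```

The sequence the model predicts well costs fewer bits. A model that also tracked the previous symbol would compress the alternating sequence better, which is the sense in which better prediction means better compression.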
80
Self-modifying code - Self-referential machine learning systems
Traditional machine learning systems have a fixed, pre-programmed learning algorithm to adjust their parameters. However, since the 1980s Jürgen Schmidhuber has published several self-modifying systems with the ability to change their own learning algorithm. They avoid the danger of catastrophic self-rewrites by making sure that self-modifications will survive only if they are useful according to a user-given fitness, error or reward function.
81
Andrew Ng - Machine learning research
In 2011, Ng founded the Google Brain project at Google, which developed very large scale artificial neural networks using Google's distributed compute infrastructure.
82
Andrew Ng - Machine learning research
Among its notable results was a neural network trained using deep learning algorithms on 16,000 CPU cores, that learned to recognize higher-level concepts, such as cats, after watching only YouTube videos, and without ever having been told what a cat is.
83
Andrew Ng - Machine learning research
The project's technology is currently also used in the Android operating system's speech recognition system.
84
Pattern recognition - Classification algorithms (supervised algorithms predicting categorical labels)
Parametric: Assuming known distributional shape of feature distributions per class, such as the Gaussian shape.
85
Pattern recognition - Classification algorithms (supervised algorithms predicting categorical labels)
* Maximum entropy classifier (aka logistic regression, multinomial logistic regression): Note that logistic regression is an algorithm for classification, despite its name. (The name comes from the fact that logistic regression uses an extension of a linear regression model to model the probability of an input being in a particular class.)
86
Pattern recognition - Classification algorithms (supervised algorithms predicting categorical labels)
Nonparametric: No distributional assumption regarding shape of feature distributions per class.
87
Pattern recognition - Classification algorithms (supervised algorithms predicting categorical labels)
* Kernel estimation and K-nearest-neighbor algorithms
88
Pattern recognition - Classification algorithms (supervised algorithms predicting categorical labels)
* Neural networks (multi-layer perceptrons)
89
Pattern recognition - Classification algorithms (supervised algorithms predicting categorical labels)
* Support vector machines
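The remark in the list above that logistic regression is a classifier built on a linear model can be sketched as follows: a linear score is passed through the logistic function to give a class probability, which is then thresholded. The weights are hypothetical, already-fitted values; fitting them from data is not shown.

```python
import math


def predict_proba(weights, bias, features):
    """P(class = 1 | features): a linear score passed through the logistic function."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def classify(weights, bias, features, threshold=0.5):
    """Turn the modelled probability into a categorical label."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


weights, bias = [2.0, -1.0], 0.0  # hypothetical, already-fitted parameters

p_boundary = predict_proba(weights, bias, [1.0, 2.0])  # linear score is 0
label = classify(weights, bias, [3.0, 1.0])            # linear score is 5
```

A score of zero sits exactly on the decision boundary, so the modelled probability there is one half.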
90
List of machine learning algorithms - Supervised learning
* Artificial neural network
91
List of machine learning algorithms - Supervised learning
** Spiking neural networks
92
List of machine learning algorithms - Supervised learning
* Inductive logic programming
93
List of machine learning algorithms - Supervised learning
* Gaussian process regression
94
List of machine learning algorithms - Supervised learning
* Group method of data handling (GMDH)
95
List of machine learning algorithms - Supervised learning
* Learning Automata
96
List of machine learning algorithms - Supervised learning
* Learning Vector Quantization
97
List of machine learning algorithms - Supervised learning
* Minimum message length (decision trees, decision graphs, etc.)
98
List of machine learning algorithms - Supervised learning
* Ripple down rules, a knowledge acquisition methodology
99
List of machine learning algorithms - Supervised learning
* Subsymbolic machine learning algorithms
100
List of machine learning algorithms - Supervised learning
* Support vector machines
101
List of machine learning algorithms - Supervised learning
* Information fuzzy networks (IFN)
102
List of machine learning algorithms - Statistical classification
** Multinomial logistic regression
103
List of machine learning algorithms - Statistical classification
** Support vector machines
104
List of machine learning algorithms - Unsupervised learning
* Radial basis function network
105
List of machine learning algorithms - Unsupervised learning
* Vector Quantization
106
List of machine learning algorithms - Association rule learning
* FP-growth algorithm
107
List of machine learning algorithms - Hierarchical clustering
* Conceptual clustering
108
List of machine learning algorithms - Deep learning
* Deep Convolutional neural networks
109
Identity resolution - Machine learning
Higher accuracy can often be achieved by using various other machine learning techniques, including a single-layer perceptron. Wilson, D
110
Bootstrapping - Artificial intelligence and machine learning
Bootstrapping is a technique used to iteratively improve a classifier's performance. Seed AI is a hypothesized type of strong Artificial Intelligence capable of recursive self-improvement. Having improved itself, it would become better at improving itself, potentially leading to an exponential increase in intelligence. No such AI is known to exist, but it remains an active field of research.
111
Bootstrapping - Artificial intelligence and machine learning
Seed AI is a significant part of some theories about the technological singularity: proponents believe that the development of seed AI will rapidly yield ever-smarter intelligence (via bootstrapping) and thus a new era.
112
Monte Carlo Machine Learning Library
The Monte Carlo Machine Learning Library (MCMLL) is an open source C++ template library which already relies on some C++0x specs. MCMLL is licensed under the GNU GPL. It is developed under 64-bit Linux. MCMLL should be usable on other platforms as well, since it is based on ISO C++.
113
Monte Carlo Machine Learning Library
The philosophy behind MCMLL is to provide broad support for Monte Carlo methods for implementing machine learning applications. Since Monte Carlo methods are inherently parallelizable, the goal is to provide multi-threaded implementations of the most important methods.
114
Monte Carlo Machine Learning Library - Overview
* complete framework for vector and matrix computations
115
Monte Carlo Machine Learning Library - Overview
* multi-threaded support for generic Evolutionary algorithms (EA)
116
Monte Carlo Machine Learning Library - Overview
* support for generic Sequential Monte Carlo methods ('Particle Filtering').
117
Monte Carlo Machine Learning Library - Overview
Example applications include:
118
Monte Carlo Machine Learning Library - Overview
* support for learning Artificial Neural Networks (ANN) using EAs
119
Monte Carlo Machine Learning Library - Overview
* example programs for Sequential Monte Carlo methods ('Particle Filtering')
120
Monte Carlo Machine Learning Library - Overview
* a benchmark suite for testing and implementing Evolutionary Algorithms.
121
Monte Carlo Machine Learning Library - Supported Evolutionary Algorithms
DOI= /TEVC without history, R2DE,Onay Urfalioglu and Orhan Arikan, Randomized and Rank Based Differential Evolution, Machine Learning and Applications, Fourth International Conference on, vol
122
Monte Carlo Machine Learning Library - Supported Evolutionary Algorithms
* Covariance Matrix Adaptation Evolution Strategies (CMA-ES)
123
Monte Carlo Machine Learning Library - Supported Sequential Monte Carlo Methods
For particle filtering, the Sequential Importance Resampling (SIR) method is supported. To create an SMC application based on MCMLL, one has to define an observation distribution, a transition distribution and optionally an importance distribution to be used in the SIR operator.
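The roles of the transition and observation distributions in SIR can be sketched in a few lines of Python. This is not MCMLL's C++ API, which the slides do not show; it is a generic bootstrap filter in which the transition distribution doubles as the importance distribution. The 1-D random-walk model, the noise levels, and the particle count are all assumptions.

```python
import math
import random


def sir_filter(observations, n_particles=500, transition_std=0.5,
               obs_std=1.0, seed=0):
    """Bootstrap SIR filter for a 1-D random-walk state observed with Gaussian noise."""
    rng = random.Random(seed)
    particles = [rng.uniform(0.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Transition distribution: propagate each particle through the random walk.
        particles = [p + rng.gauss(0.0, transition_std) for p in particles]
        # Observation distribution: weight by the Gaussian likelihood of y.
        weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Posterior-mean estimate from the weighted particles.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resampling: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Hypothetical observation sequence: twenty readings of the same value.
estimates = sir_filter([5.0] * 20)
```

Supplying a separate importance distribution would change the propagation step and add a correction factor to the weights; the bootstrap variant above is the special case where that factor cancels.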
124
Online machine learning
Online machine learning is a model of induction that learns one instance at a time.
125
Online machine learning
Third, the algorithm receives the true label of the instance (Littlestone, Nick (1988), Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm, Machine Learning (2), Kluwer Academic Publishers). The third stage is the most crucial, as the algorithm can use this label feedback to update its hypothesis for future trials.
126
Online machine learning
Because on-line learning algorithms continually receive label feedback, the algorithms are able to adapt and learn in difficult situations
127
Online machine learning
Unfortunately, the main difficulty of on-line learning is also a result of the requirement for continual label feedback
128
Online machine learning - A prototypical online supervised learning algorithm
In the setting of supervised learning, or learning from examples, we are interested in learning a function f : X \to Y, where X is thought of as a space of inputs and Y as a space of outputs, that predicts well on instances that are drawn from a joint probability distribution p(x,y) on X \times Y
129
Online machine learning - A prototypical online supervised learning algorithm
In reality, the learner never knows the true distribution p(x,y) over instances
130
Online machine learning - A prototypical online supervised learning algorithm
The above paradigm is not well-suited to the online learning setting though, as it requires complete a priori knowledge of the entire training set
131
Online machine learning - The algorithm and its interpretations
Here we outline a prototypical online learning algorithm in the supervised learning setting and we discuss several interpretations of this algorithm
132
Online machine learning - The algorithm and its interpretations
The update rule is w_{t+1} \gets w_t - \gamma_t \nabla V(\langle w_t, x_t \rangle, y_t), where w_1 \gets 0, \nabla V(\langle w_t, x_t \rangle, y_t) is the gradient of the loss for the next data point (x_t, y_t) evaluated at the current linear functional w_t, and \gamma_t is a step-size parameter.
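The scheme described here (start from w_1 = 0 and take a gradient step on the loss at each new data point) can be sketched as follows. The square loss, the constant step size used in place of a schedule \gamma_t, and the data stream are assumptions made for illustration.

```python
def online_gradient_descent(stream, dim, step_size=0.05):
    """Process (x_t, y_t) pairs one at a time with the square loss
    V(<w, x>, y) = 0.5 * (<w, x> - y)**2 and a constant step size."""
    w = [0.0] * dim  # w_1 <- 0
    for x, y in stream:
        prediction = sum(wi * xi for wi, xi in zip(w, x))
        # Gradient of the loss with respect to w at the current data point.
        grad = [(prediction - y) * xi for xi in x]
        # Gradient step: w_{t+1} <- w_t - step_size * gradient.
        w = [wi - step_size * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical stream generated by y = 2 * x, revisited ten times.
stream = [([x], 2.0 * x) for x in (1.0, 2.0, 3.0)] * 10
w = online_gradient_descent(stream, dim=1)
```

Each example is used once and then discarded, so the learner never needs the whole training set in memory.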
133
Weka (machine learning)
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. Weka is free software available under the GNU General Public License.
134
Weka (machine learning) - Description
The original non-Java version of Weka was a TCL/TK front-end to (mostly third-party) modeling algorithms implemented in other programming languages, plus data preprocessing utilities in C, and a Makefile-based system for running machine learning experiments
135
Weka (machine learning) - Description
* portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform
136
Weka (machine learning) - Description
* a comprehensive collection of data preprocessing and modeling techniques
137
Weka (machine learning) - Description
* ease of use due to its graphical user interfaces
138
Weka (machine learning) - Description
Weka supports several standard data mining tasks, more specifically, data preprocessing, clustering, classification, regression, visualization, and feature selection
139
Weka (machine learning) - Description
Weka's main user interface is the Explorer, but essentially the same functionality can be accessed through the component-based Knowledge Flow interface and from the command line. There is also the Experimenter, which allows the systematic comparison of the predictive performance of Weka's machine learning algorithms on a collection of datasets.
140
Weka (machine learning) - Description
The Explorer interface features several panels providing access to the main components of the workbench:
141
Weka (machine learning) - Description
* The Preprocess panel has facilities for importing data from a database, a CSV file, etc., and for preprocessing this data using a so-called filtering algorithm. These filters can be used to transform the data (e.g., turning numeric attributes into discrete ones) and make it possible to delete instances and attributes according to specific criteria.
142
Weka (machine learning) - Description
* The Classify panel enables the user to apply classification and regression algorithms (indiscriminately called classifiers in Weka) to the resulting dataset, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions, ROC curves, etc., or the model itself (if the model is amenable to visualization like, e.g., a decision tree).
143
Weka (machine learning) - Description
* The Associate panel provides access to association rule learners that attempt to identify all important interrelationships between attributes in the data.
144
Weka (machine learning) - Description
* The Cluster panel gives access to the clustering techniques in Weka, e.g., the simple k-means algorithm. There is also an implementation of the expectation maximization algorithm for learning a mixture of normal distributions.
145
Weka (machine learning) - Description
* The Select attributes panel provides algorithms for identifying the most predictive attributes in a dataset.
146
Weka (machine learning) - Description
* The Visualize panel shows a scatter plot matrix, where individual scatter plots can be selected and enlarged, and analyzed further using various selection operators.
147
Weka (machine learning) - History
* In 1993, the University of Waikato in New Zealand started development of the original version of Weka (which became a mixture of TCL/TK, C, and Makefiles).
148
Weka (machine learning) - History
* In 1997, the decision was made to redevelop Weka from scratch in Java, including implementations of modeling algorithms.
149
Weka (machine learning) - History
* In 2006, Pentaho Corporation acquired an exclusive licence to use Weka for business intelligence. It forms the data mining and predictive analytics component of the Pentaho business intelligence suite.
150
Weka (machine learning) - History
* All-time ranking on Sourceforge.net: 243 (with 2,487,213 downloads)
151
Machine Learning (journal)
'Machine Learning' is a peer-reviewed scientific journal, published since 1986.
152
Machine Learning (journal)
In 2001, forty editors and members of the editorial board of Machine Learning resigned in order to found the Journal of Machine Learning Research (JMLR), saying that in the era of the internet, it was detrimental for researchers to continue publishing their papers in expensive journals with pay-access archives. Instead, they wrote, they supported the model of JMLR, in which authors retained copyright over their papers and archives were freely available on the internet.
153
Journal of Machine Learning Research
The 'Journal of Machine Learning Research' (usually abbreviated 'JMLR') is a scientific journal focusing on machine learning, a subfield of Artificial Intelligence. It was founded in 2000.
154
Journal of Machine Learning Research
In 2001, forty editors of Machine Learning resigned in order to support JMLR, saying that in the era of the internet, it was detrimental for researchers to continue publishing their papers in expensive journals with pay-access archives.
155
Journal of Machine Learning Research
Print editions of JMLR were published by MIT Press until 2004, and by Microtome Publishing thereafter.
156
Journal of Machine Learning Research
Since summer 2007, JMLR has also been publishing Machine Learning Open Source Software.
157
Boosting (machine learning)
Boosting is based on the question posed by Kearns (Michael Kearns, 1988, Thoughts on Hypothesis Boosting, unpublished manuscript, Machine Learning class project, December 1988): can a set of 'weak learners' create a single 'strong learner'? A weak learner is defined to be a classifier which is only slightly correlated with the true classification (it can label examples better than random guessing).
158
Boosting (machine learning)
Schapire's affirmative answer to Kearns' question has had significant ramifications in machine learning and statistics, most notably leading to the development of boosting.
159
Boosting (machine learning)
When first introduced, the hypothesis boosting problem simply referred to the process of turning a weak learner into a strong learner.
160
Boosting (machine learning) - Boosting algorithms
While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier.
161
Boosting (machine learning) - Boosting algorithms
There are many boosting algorithms. The original ones were proposed by Robert Schapire (a recursive majority gate formulation) and Yoav Freund (boost by majority) [Llew Mason, Jonathan Baxter, Peter Bartlett, and Marcus Frean (2000); Boosting Algorithms as Gradient Descent, in S…]
162
Boosting (machine learning) - Examples of boosting algorithms
The main variation between many boosting algorithms is their method of weighting training data points and hypotheses.
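As a rough illustration of iteratively reweighting training points and adding weak classifiers to a final classifier, here is a minimal sketch in the style of AdaBoost, one well-known boosting algorithm chosen here only as an example. The decision-stump weak learner, the function names, and the toy data are all assumptions made for illustration, not something taken from the text above.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: predicts `polarity` at or above the threshold, else the opposite."""
    return polarity if x >= threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump under `weights`."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Train a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n  # start from a uniform distribution over examples
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # weight of this hypothesis
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points gain weight, correctly classified points lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: no single stump separates it, but three combined stumps do.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
```

On this toy set, each individual stump misclassifies at least two of the six points, while the weighted vote of three stumps labels all of them correctly.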
163
Boosting (machine learning) - Criticism
In 2008 Phillip Long (at Google) and Rocco A. Servedio (Columbia University) published a paper at the 25th International Conference for Machine Learning suggesting that many of these algorithms are probably flawed. They conclude that convex potential boosters cannot withstand random classification noise.
164
Boosting (machine learning) - Criticism
Servedio (2010); Random Classification Noise Defeats All Convex Potential Boosters, Machine Learning 78(3), pp
165
Transduction (machine learning)
In logic, statistical inference, and supervised learning, 'transduction' or 'transductive inference' is reasoning from observed, specific (training) cases to specific (test) cases. In contrast, induction is reasoning from observed training cases to general rules, which are then applied to the test cases. The distinction is most interesting in cases where the predictions of the transductive model are not achievable by any inductive model. Note that this is caused by transductive inference on different test sets producing mutually inconsistent predictions.
172
Transduction (machine learning)
Transduction was introduced by Vladimir Vapnik in the 1990s, motivated by his view that transduction is preferable to induction since, according to him, induction requires solving a more general problem (inferring a function) before solving a more specific problem (computing outputs for new cases): When solving a problem of …
176
Transduction (machine learning)
An example of learning which is not inductive would be in the case of binary classification, where the inputs tend to cluster in two groups. A large set of test inputs may help in finding the clusters, thus providing useful information about the classification labels. The same predictions would not be obtainable from a model which induces a function based only on the training cases. Some people may call this an example of the closely related semi-supervised learning, since Vapnik's motivation is quite different. An example of an algorithm in this category is the Transductive Support Vector Machine (TSVM).
182
Transduction (machine learning)
A third possible motivation which leads to transduction arises through the need to approximate. If exact inference is computationally prohibitive, one may at least try to make sure that the approximations are good at the test inputs. In this case, the test inputs could come from an arbitrary distribution (not necessarily related to the distribution of the training inputs), which wouldn't be allowed in semi-supervised learning. An example of an algorithm falling in …
188
Transduction (machine learning) - Example Problem
The following example problem contrasts some of the unique properties of transduction against induction.
189
Transduction (machine learning) - Example Problem
A collection of points is given, such that some of the points are labeled (A, B, or C), but most of the points are unlabeled (?). The goal is to predict appropriate labels for all of the unlabeled points.
190
Transduction (machine learning) - Example Problem
The inductive approach to solving this problem is to use the labeled points to train a supervised learning algorithm, and then have it predict labels for all of the unlabeled points.
191
Transduction (machine learning) - Example Problem
Transduction has the advantage of being able to consider all of the points, not just the labeled points, while performing the labeling task. In this case, transductive algorithms would label the unlabeled points according to the clusters to which they naturally belong. The points in the middle, therefore, would most likely be labeled B, because they are packed very close to that cluster.
192
Transduction (machine learning) - Example Problem
An advantage of transduction is that it may be able to make better predictions with fewer labeled points, because it uses the natural breaks found in the unlabeled points.
193
Transduction (machine learning) - Transduction Algorithms
Transduction algorithms can be broadly divided into two categories: those that seek to assign discrete labels to unlabeled points, and those that seek to regress continuous labels for unlabeled points.
194
Transduction (machine learning) - Partitioning Transduction
Partitioning transduction can be thought of as top-down transduction. It is a semi-supervised extension of partition-based clustering. It is typically performed as follows:
195
Transduction (machine learning) - Partitioning Transduction
Of course, any reasonable partitioning technique could be used with this algorithm. Max flow min cut partitioning schemes are very popular for this purpose.
196
Transduction (machine learning) - Agglomerative Transduction
Agglomerative transduction can be thought of as bottom-up transduction. It is a semi-supervised extension of agglomerative clustering. It is typically performed as follows:
197
Transduction (machine learning) - Agglomerative Transduction
Compute the pair-wise distances, D, between all the points.
198
Transduction (machine learning) - Agglomerative Transduction
Consider each point to be a cluster of size 1.
199
Transduction (machine learning) - Agglomerative Transduction
If (a is unlabeled) or (b is unlabeled) or (a and b have the same label)
200
Transduction (machine learning) - Agglomerative Transduction
Merge the two clusters that contain a and b.
201
Transduction (machine learning) - Agglomerative Transduction
Label all points in the merged cluster with the same label.
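The steps listed on the preceding slides can be sketched roughly as follows. The slides do not say in which order the point pairs are visited; this sketch assumes they are processed from the closest pair to the farthest, as is usual for agglomerative clustering. The function names and the toy data are hypothetical.

```python
from itertools import combinations

def squared_distance(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def agglomerative_transduction(points, labels):
    """Propagate labels by merging clusters; `None` marks an unlabeled point."""
    labels = list(labels)
    cluster = list(range(len(points)))  # each point starts as its own cluster
    # Assumption: visit point pairs in ascending order of distance.
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda ab: squared_distance(points[ab[0]], points[ab[1]]))
    for a, b in pairs:
        if cluster[a] == cluster[b]:
            continue  # already in the same cluster
        if labels[a] is None or labels[b] is None or labels[a] == labels[b]:
            merged_label = labels[a] if labels[a] is not None else labels[b]
            keep, absorb = cluster[a], cluster[b]
            for i in range(len(points)):
                if cluster[i] == absorb:
                    cluster[i] = keep  # merge the two clusters
                if cluster[i] == keep:
                    labels[i] = merged_label  # one label for the merged cluster
    return labels

# Hypothetical toy data: two well-separated groups, one labeled point in each.
points = [(0,), (1,), (2,), (10,), (11,), (12,)]
labels = ["A", None, None, "B", None, None]
result = agglomerative_transduction(points, labels)
```

Clusters carrying different labels are never merged, so each unlabeled point ends up with the label of the nearest labeled group it can reach through short links.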
202
Transduction (machine learning) - Manifold Transduction
Manifold-learning-based transduction is still a very young field of research.
203
BodyMedia - Wearable device and machine learning expertise
The BodyMedia informatics group made available a large anonymised human physiology data set for the 2004 International Conference on Machine Learning, running a Machine Learning Challenge.
204
Learning curve - In machine learning
The machine learning curve is useful for many purposes including comparing different algorithms, choosing model parameters during design, adjusting optimization to improve convergence, and determining the amount of data used for training.
205
Protein structure prediction - Machine learning
Neural network methods use training sets of solved structures to identify common sequence motifs associated with particular arrangements of secondary structures.
206
Protein structure prediction - Machine learning
Support vector machines have proven particularly useful for predicting the locations of turns, which are difficult to identify with statistical methods. The requirement of relatively small training sets has also been cited as an advantage to avoid overfitting to existing structural data.
207
Protein structure prediction - Machine learning
Extensions of machine learning techniques attempt to predict more fine-grained local properties of proteins, such as backbone dihedral angles in unassigned regions. Both SVMs and neural networks have been applied to this problem. More recently, real-value torsion angles can be accurately predicted by SPINE-X and successfully employed for ab initio structure prediction.
208
Predictive Analysis - Machine learning techniques
For such cases, machine learning techniques emulate human cognition and learn from training examples to predict future events.
209
Artificial intelligence marketing - Machine Learning
Machine learning is concerned with the design and development of algorithms and techniques that allow computers to learn.
210
Artificial intelligence marketing - Machine Learning
As defined above, machine learning is one of the techniques that can be employed to enable more effective 'behavioral targeting'.
211
Bootstrap - Artificial intelligence and machine learning
Bootstrapping is a technique used to iteratively improve a classifier's performance. Seed AI is a hypothesized type of artificial intelligence capable of recursive self-improvement. Having improved itself, it would become better at improving itself, potentially leading to an exponential increase in intelligence. No such AI is known to exist, but it remains an active field of research.
212
Academic studies about Wikipedia - Machine learning
Automated semantic knowledge extraction using machine learning algorithms is used to extract machine-processable information at a relatively low complexity cost. DBpedia uses structured content extracted from infoboxes by machine learning algorithms to create a resource of linked data in a Semantic Web.
213
Concept learning - Machine learning approaches to concept learning
In machine learning, algorithms of exemplar theory are also known as instance learners or lazy learners.
214
Concept learning - Machine learning approaches to concept learning
* Data mining: using historical data to improve decisions. An example is looking at medical records and then applying one's medical knowledge to make a diagnosis.
215
Concept learning - Machine learning approaches to concept learning
* Software applications that cannot be programmed by hand: examples are autonomous driving and speech recognition.
216
Concept learning - Machine learning approaches to concept learning
* Self-customizing programs: an example is a newsreader that learns a reader's particular interests and highlights them when the reader visits the site.
217
Concept learning - Machine learning approaches to concept learning
Machine learning has an exciting future. Some potential advantages include: learning across full mixed-media data, learning across multiple internal databases (including the Internet and news feeds), learning by active experimentation, learning decisions rather than predictions, and the possibility of programming languages with embedded learning.
218
List of algorithms - Machine learning and statistical classification
* Association rule learning: discover interesting relations between variables, used in data mining
219
List of algorithms - Machine learning and statistical classification
** Eclat algorithm
220
List of algorithms - Machine learning and statistical classification
** FP-growth algorithm
221
List of algorithms - Machine learning and statistical classification
** One-attribute rule
222
List of algorithms - Machine learning and statistical classification
** Zero-attribute rule
223
List of algorithms - Machine learning and statistical classification
* Boosting (meta-algorithm): Use many weak learners to boost effectiveness
224
List of algorithms - Machine learning and statistical classification
** BrownBoost: a boosting algorithm that may be robust to noisy datasets
225
List of algorithms - Machine learning and statistical classification
* Bootstrap aggregating (bagging): technique to improve stability and classification accuracy
226
List of algorithms - Machine learning and statistical classification
** ID3 algorithm (Iterative Dichotomiser 3): Use heuristic to generate small decision trees
227
List of algorithms - Machine learning and statistical classification
* k-nearest neighbors (k-NN): a method for classifying objects based on closest training examples in the feature space
228
List of algorithms - Machine learning and statistical classification
* Linde–Buzo–Gray algorithm: a vector quantization algorithm used to derive a good codebook
229
List of algorithms - Machine learning and statistical classification
* Locality-sensitive hashing (LSH): a method of performing probabilistic dimension reduction of high-dimensional data
230
List of algorithms - Machine learning and statistical classification
** Backpropagation: A supervised learning method which requires a teacher that knows, or can calculate, the desired output for any given input
231
List of algorithms - Machine learning and statistical classification
** Hopfield net: a recurrent neural network in which all connections are symmetric
232
List of algorithms - Machine learning and statistical classification
** Perceptron: the simplest kind of feedforward neural network: a linear classifier.
233
List of algorithms - Machine learning and statistical classification
** Pulse-coupled neural networks (PCNN): neural models proposed by modeling a cat's visual cortex and developed for high-performance biomimetic image processing.
234
List of algorithms - Machine learning and statistical classification
** Radial basis function network: an artificial neural network that uses radial basis functions as activation functions
235
List of algorithms - Machine learning and statistical classification
** Self-organizing map: an unsupervised network that produces a low-dimensional representation of the input space of the training samples
236
List of algorithms - Machine learning and statistical classification
* Random forest: classify using many decision trees
237
List of algorithms - Machine learning and statistical classification
** Q-learning: learn an action-value function that gives the expected utility of taking a given action in a given state and following a fixed policy thereafter
238
List of algorithms - Machine learning and statistical classification
* Relevance Vector Machine (RVM): similar to SVM, but provides probabilistic classification
239
List of algorithms - Machine learning and statistical classification
* Support Vector Machines (SVM): a set of methods which divide multidimensional data by finding a dividing hyperplane with the maximum margin between the two sets
240
List of algorithms - Machine learning and statistical classification
** Structured SVM: allows training of a classifier for general structured output labels.
241
List of algorithms - Machine learning and statistical classification
* Winnow algorithm: related to the perceptron, but uses a multiplicative weight-update scheme
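To make the last entry more concrete, here is a minimal sketch of a Winnow-style learner for Boolean features. It shows the multiplicative update that distinguishes Winnow from the perceptron's additive update. The threshold of half the number of features, the promotion factor of 2, and the toy target concept are assumptions chosen for illustration.

```python
from itertools import product

def winnow_predict(weights, threshold, x):
    """Predict 1 when the weighted sum of active features exceeds the threshold."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > threshold else 0

def winnow_train(examples, n_features, alpha=2.0, epochs=10):
    """Train on (x, y) pairs where x is a tuple of 0/1 features and y is 0 or 1."""
    weights = [1.0] * n_features
    threshold = n_features / 2  # assumed choice of threshold
    for _ in range(epochs):
        for x, y in examples:
            if winnow_predict(weights, threshold, x) == y:
                continue  # weights change only on mistakes
            # Promote active features after a missed positive, demote after a false positive.
            factor = alpha if y == 1 else 1.0 / alpha
            weights = [w * factor if xi else w for w, xi in zip(weights, x)]
    return weights, threshold

# Hypothetical target concept: y = x0 OR x1, with two irrelevant features.
examples = [(x, 1 if (x[0] or x[1]) else 0) for x in product([0, 1], repeat=4)]
weights, threshold = winnow_train(examples, n_features=4)
```

After training, the weights of the two relevant features have grown multiplicatively while the irrelevant ones are unchanged.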
242
Torch (machine learning)
'Torch' is an open source deep learning library for the Lua programming language and a scientific computing framework with wide support for machine learning algorithms. It uses LuaJIT, a fast scripting language, and an underlying C implementation.
244
Torch (machine learning) - torch
The core package of Torch is torch.
245
Torch (machine learning) - torch
The following exemplifies using torch via its REPL interpreter:
246
Torch (machine learning) - torch
It also has a StochasticGradient class for training a neural network using stochastic gradient descent, although the Optim package provides many more options in this respect, such as momentum and weight decay regularization.
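For readers unfamiliar with the two options just mentioned, the following is a language-neutral sketch in Python of a gradient step with momentum and weight decay. It is not Torch's or Optim's API; the function name, the parameter names, and the toy objective are assumptions made purely for illustration.

```python
def sgd_step(w, grad, velocity, lr=0.1, momentum=0.9, weight_decay=0.0):
    """One gradient step with classical momentum and L2-style weight decay."""
    # Weight decay adds a pull towards zero proportional to each weight.
    g = [gi + weight_decay * wi for gi, wi in zip(grad, w)]
    # Momentum keeps a running velocity instead of following the raw gradient.
    velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
    w = [wi + vi for wi, vi in zip(w, velocity)]
    return w, velocity

def minimize(weight_decay, steps=500):
    """Minimize the toy objective (w - 3)^2 starting from w = 0."""
    w, velocity = [0.0], [0.0]
    for _ in range(steps):
        grad = [2.0 * (w[0] - 3.0)]
        w, velocity = sgd_step(w, grad, velocity, weight_decay=weight_decay)
    return w[0]

plain = minimize(weight_decay=0.0)
decayed = minimize(weight_decay=0.5)
```

Without weight decay the iterate settles at the minimizer 3; with a decay of 0.5 the stationary point moves towards zero, to 2.4, which is the regularizing effect.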
247
Torch (machine learning) - Other packages
Many packages other than the above official packages are used with Torch. These are listed in the torch cheatsheet. These extra packages provide a wide range of utilities such as parallelism, asynchronous input/output, image processing, and so on.
248
Torch (machine learning) - Applications
Torch is used by Google DeepMind, the Facebook AI Research Group, the Computational Intelligence, Learning, Vision, and Robotics Lab at NYU, MADBITS, IBM, Yandex and the Idiap Research Institute. It is used and cited in 240 research papers. For comparison, Theano, a similar library written in Python, C and CUDA, has 138 citations. Torch has been extended for use on Android and iOS. It has been used to build hardware implementations for data flows like those found in neural networks.
250
Overfitting - Machine learning
The concept of overfitting is important in machine learning.
251
Overfitting - Machine learning
As a simple example, consider a database of retail purchases that includes the item bought, the purchaser, and the date and time of purchase. It's easy to construct a model that will fit the training set perfectly by using the date and time of purchase to predict the other attributes; but this model will not generalize at all to new data, because those past times will never occur again.
252
Overfitting - Machine learning
Generally, a learning algorithm is said to overfit relative to a simpler one if it is more accurate in fitting known data (hindsight) but less accurate in predicting new data (foresight).
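The retail-purchase example can be made concrete with a small sketch. All records below are invented for illustration, and the two "models" are deliberately trivial: one memorizes the purchase time, the other always predicts the most common item.

```python
from collections import Counter

# Hypothetical purchase records: (date and time, item bought).
train = [("2024-03-01 09:15", "milk"), ("2024-03-01 12:40", "bread"),
         ("2024-03-02 08:05", "milk"), ("2024-03-02 17:30", "milk"),
         ("2024-03-03 10:20", "eggs")]
test = [("2024-03-04 09:00", "milk"), ("2024-03-04 13:10", "bread"),
        ("2024-03-05 18:45", "milk")]

memorized = dict(train)

def memorizing_model(timestamp):
    """Overfit model: looks up the exact purchase time seen during training."""
    return memorized.get(timestamp)

most_common_item = Counter(item for _, item in train).most_common(1)[0][0]

def majority_model(timestamp):
    """Simpler model: ignores the time and always predicts the most common item."""
    return most_common_item

def accuracy(model, data):
    """Fraction of records whose item the model predicts correctly."""
    return sum(model(ts) == item for ts, item in data) / len(data)

hindsight = (accuracy(memorizing_model, train), accuracy(majority_model, train))
foresight = (accuracy(memorizing_model, test), accuracy(majority_model, test))
```

The memorizing model is perfect on the known data but never right on new timestamps, while the simpler model fits the known data less well and predicts the new data better.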
253
Training set - Use in artificial intelligence, machine learning, and statistics
In artificial intelligence or machine learning, a training set consists of an input vector and an answer vector, and is used together with a supervised learning method to train a knowledge database (e.g. a neural net or a naive Bayes classifier) used by an AI machine.
254
Training set - Use in artificial intelligence, machine learning, and statistics
In statistical modeling, a training set is used to fit a model that can be used to predict a response value from one or more predictors. The fitting can include both variable selection and parameter estimation. Statistical models used for prediction are often called regression models, of which linear regression and logistic regression are two examples.
255
Training set - Use in artificial intelligence, machine learning, and statistics
In these fields, a major emphasis is placed on avoiding overfitting, so as to achieve the best possible performance on an independent test set that follows the same probability distribution as the training set.
256
Tanagra (machine learning)
'Tanagra' is a free suite of machine learning software for research and academic purposes, developed by Ricco Rakotomalala at the Lumière University Lyon 2, France.
258
Tanagra (machine learning)
Tanagra supports several standard data mining tasks such as visualization, descriptive statistics, instance selection, feature selection, feature construction, regression, factor analysis, clustering, classification and association rule learning.
259
Tanagra (machine learning)
Tanagra is an academic project.
260
Tanagra (machine learning) - History
The development of Tanagra started in June, and the first version was distributed in December. Tanagra is the successor of Sipina, another free data mining tool, which is intended only for supervised learning tasks (classification), especially the interactive and visual construction of decision trees. Sipina is still available online and is maintained.
261
Tanagra (machine learning) - History
Tanagra is an open source project: every researcher can access the source code and add their own algorithms, provided they agree and conform to the software distribution license.
262
Tanagra (machine learning) - History
The main purpose of the Tanagra project is to give researchers and students user-friendly data mining software that conforms to the present norms of software development in this domain (especially in the design of its GUI and the way it is used) and allows the analysis of either real or synthetic data.
263
Tanagra (machine learning) - History
From 2006, Ricco Rakotomalala made an important documentation effort. A large number of tutorials are published on a dedicated website. They describe the statistical and machine learning methods and their implementation with Tanagra on real case studies. The use of other free data mining tools on the same problems is also widely described. The comparison of the tools enables readers to understand the possible differences in the presentation of results.
264
Tanagra (machine learning) - Description
Each node is a statistical or machine learning technique; the connection between two nodes represents the data transfer.
265
Tanagra (machine learning) - Description
Tanagra makes a good compromise between the statistical approaches (e.g. parametric and nonparametric statistical tests), the multivariate analysis methods (e.g. factor analysis, correspondence analysis, cluster analysis, regression) and the machine learning techniques (e.g. neural network, support vector machine, decision trees, random forest).
266
Music Information Retrieval - Statistics and Machine Learning
* Computational methods for classification, clustering, and modelling — musical feature extraction for mono- and polyphonic music, similarity and pattern matching, retrieval
267
Music Information Retrieval - Statistics and Machine Learning
* Formal methods and databases — applications of automated music identification and recognition, such as score following, automatic accompaniment, routing and filtering for music and music queries, query languages, standards and other metadata or protocols for music information handling and retrieval, multi-agent systems, distributed search
268
Music Information Retrieval - Statistics and Machine Learning
* Software for music information retrieval — Semantic Web and musical digital objects, intelligent agents, collaborative software, web-based search and semantic retrieval, query by humming, acoustic fingerprinting
269
Music Information Retrieval - Statistics and Machine Learning
* Music analysis and knowledge representation — automatic summarization, citing, excerpting, downgrading, transformation, formal models of music, digital scores and representations, music indexing and metadata.
270
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
'ECML PKDD', the 'European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases', is one of the leading academic conferences on machine learning and knowledge discovery, held in Europe every year. (ECML is number 4 on the list. Both ECML and PKDD are ranked on “tier A”.)
271
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - History
ECML PKDD is a merger of two European conferences, 'European Conference on Machine Learning' ('ECML') and 'European Conference on Principles and Practice of Knowledge Discovery in Databases' ('PKDD'). ECML and PKDD have been co-located since 2001; however, each initially retained its own identity. For example, the 2007 conference was known as “the 18th European Conference on Machine Learning (ECML) and the 11th European Conference …
272
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - History
The history of ECML dates back to 1986, when the European Working Session on Learning was first held. In 1993 the name of the conference was changed to European Conference on Machine Learning.
273
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - History
Originally, PKDD stood for the European Symposium on Principles of Data Mining and Knowledge Discovery from Databases. The name European Conference on Principles and Practice of Knowledge Discovery in Databases was adopted later.
274
Feature (machine learning)
In machine learning and pattern recognition, a 'feature' is an individual measurable heuristic property of a phenomenon being observed. Choosing discriminating and independent features is key to any pattern recognition algorithm being successful in classification. Features are usually numeric, but structural features such as strings and graphs are used in syntactic pattern recognition.
275
Feature (machine learning)
The set of features of a given data instance is often grouped into a feature vector. The reason for doing this is that the vector can be treated mathematically. For example, many algorithms compute a score for classifying an instance into a particular category by linearly combining a feature vector with a vector of weights, using a linear predictor function.
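A linear predictor of this kind can be sketched in a few lines. The feature names, the weights, and the two categories below are invented for illustration; only the idea of scoring by a weighted sum of the feature vector comes from the text above.

```python
def linear_score(features, weights, bias=0.0):
    """Score an instance by linearly combining its feature vector with a weight vector."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def classify(features, weights_by_category):
    """Pick the category whose weight vector gives the highest score."""
    return max(weights_by_category,
               key=lambda category: linear_score(features, weights_by_category[category]))

# Hypothetical features: [has suspicious header, number of links, known sender]
message = [1.0, 0.0, 2.0]
weights_by_category = {
    "spam": [0.5, -1.0, 0.25],
    "non-spam": [-0.5, 0.0, 0.5],
}
score = linear_score(message, weights_by_category["spam"])
label = classify(message, weights_by_category)
```

Grouping the features into a vector is what allows the score to be written as a single dot product, regardless of how many features there are.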
276
Feature (machine learning)
The concept of feature is essentially the same as the concept of explanatory variable used in statistical techniques such as linear regression.
277
Feature (machine learning) - Classification
While different areas of pattern recognition obviously have different features, once the features are decided, they are classified by a much smaller set of algorithms. These include nearest neighbor classification in multiple dimensions, neural networks or statistical techniques such as Bayesian approaches.
278
Feature (machine learning) - Examples
In character recognition, features may include horizontal and vertical profiles, number of internal holes, stroke detection and many others.
279
Feature (machine learning) - Examples
In speech recognition, features for recognizing phonemes can include noise ratios, length of sounds, relative power, filter matches and many others.
280
Feature (machine learning) - Examples
In spam detection algorithms, features may include whether certain headers are present or absent, whether they are well formed, what language the message appears to be, the grammatical correctness of the text, Markovian frequency analysis and many others.
281
Feature (machine learning) - Examples
In all these cases, and many others, extracting features that are measurable by a computer is an art, and with the exception of some neural networking and genetic techniques that automatically intuit features, hand selection of good features forms the basis of almost all classification algorithms.
282
Regularization (mathematics) - Regularization in statistics and machine learning
The most common variants in machine learning are L1 and L2 regularization, which can be added to learning algorithms that minimize a loss function E(X, Y) by instead minimizing E(X, Y) + α‖w‖, where w is the model's weight vector, ‖·‖ is either the L1 norm or the squared L2 norm, and α is a free parameter that needs to be tuned empirically (typically by cross-validation; see hyperparameter optimization).
283
Regularization (mathematics) - Regularization in statistics and machine learning
L1 regularization is often preferred because it produces sparse models and thus performs feature selection within the learning algorithm, but since the L1 norm is not differentiable, it may require changes to learning algorithms, in particular gradient-based learners.
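One widely used change of this kind is to handle the penalty with a soft-thresholding (proximal) step instead of a plain gradient step. The slide does not say which change it has in mind, so the sketch below is only an example of the idea; the name `soft_threshold` is an assumption.

```python
def soft_threshold(w, threshold):
    """Shrink a single weight toward zero by `threshold`, clipping at zero.

    This is the proximal operator of threshold * |w|.
    """
    if w > threshold:
        return w - threshold
    if w < -threshold:
        return w + threshold
    return 0.0


weights = [2.0, 0.3, -0.2, -1.5]
print([soft_threshold(w, 0.5) for w in weights])
```

Weights whose magnitude is below the threshold become exactly zero, which is how the sparsity arises.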
284
Regularization (mathematics) - Regularization in statistics and machine learning
Bayesian learning methods make use of a prior probability that (usually) gives lower probability to more complex models. Well-known model selection techniques include the Akaike information criterion (AIC), minimum description length (MDL), and the Bayesian information criterion (BIC). Alternative methods of controlling overfitting not involving regularization include cross-validation.
285
Regularization (mathematics) - Regularization in statistics and machine learning
Regularization can be used to fine-tune model complexity using an augmented error function with cross-validation.
286
Regularization (mathematics) - Regularization in statistics and machine learning
Examples of applications of different methods of regularization to the linear model are:
287
Regularization (mathematics) - Regularization in statistics and machine learning
A linear combination of the LASSO and ridge regression methods is elastic net regularization.
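A minimal sketch of such a combined penalty follows. The parametrization (an overall strength `alpha` and a mixing weight `l1_ratio` between 0 and 1) is one common convention and is assumed here; libraries differ in the exact scaling factors they use.

```python
def elastic_net_penalty(w, alpha, l1_ratio):
    """Blend the LASSO (L1) and ridge (squared L2) penalties.

    l1_ratio = 1.0 gives pure LASSO, l1_ratio = 0.0 gives pure ridge.
    """
    l1 = sum(abs(wi) for wi in w)
    l2_squared = sum(wi * wi for wi in w)
    return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) * l2_squared)


w = [3.0, -4.0]
print(elastic_net_penalty(w, alpha=1.0, l1_ratio=0.5))
```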
288
Classification in machine learning
In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
289
Classification in machine learning
In the terminology of machine learning, classification is considered an instance of supervised learning, i.e. learning where a training set of correctly identified observations is available. The corresponding unsupervised procedure is known as clustering, and involves grouping data into categories based on some measure of inherent similarity or distance.
290
Classification in machine learning
Often, the individual observations are analyzed into a set of quantifiable properties, known variously as explanatory variables, features, etc.
291
Classification in machine learning
An algorithm that implements classification, especially in a concrete implementation, is known as a classifier. The term classifier sometimes also refers to the mathematical function, implemented by a classification algorithm, that maps input data to a category.
292
Classification in machine learning
In machine learning, the observations are often known as instances, the explanatory variables are termed features (grouped into a feature vector), and the possible categories to be predicted are classes.
293
Classification in machine learning - Relation to other problems
Classification and clustering are examples of the more general problem of pattern recognition, which is the assignment of some sort of output value to a given input value.
294
Classification in machine learning - Relation to other problems
A common subclass of classification is probabilistic classification.
295
Classification in machine learning - Relation to other problems
*A probabilistic classifier can output a confidence value associated with its choice (in general, a classifier that can do this is known as a confidence-weighted classifier).
296
Classification in machine learning - Relation to other problems
*Correspondingly, it can abstain when its confidence of choosing any particular output is too low.
297
Classification in machine learning - Relation to other problems
*Because of the probabilities which are generated, probabilistic classifiers can be more effectively incorporated into larger machine-learning tasks, in a way that partially or completely avoids the problem of error propagation.
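The first two points above can be sketched as a thin wrapper around any model that returns class probabilities. The function name `classify_with_abstention`, the dict-of-probabilities input, and the 0.7 threshold are assumptions chosen for illustration.

```python
def classify_with_abstention(probabilities, min_confidence=0.7):
    """Return (label, confidence), or (None, confidence) when abstaining.

    `probabilities` maps each class label to its predicted probability.
    """
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    if confidence < min_confidence:
        return None, confidence   # abstain: no class is confident enough
    return label, confidence


print(classify_with_abstention({"spam": 0.9, "ham": 0.1}))
print(classify_with_abstention({"spam": 0.55, "ham": 0.45}))
```

Returning the confidence even when abstaining lets a downstream component decide what to do with the uncertain case.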
298
Classification in machine learning - Frequentist procedures
Early work on statistical classification was undertaken by Fisher.
299
Classification in machine learning - Bayesian procedures
Unlike frequentist procedures, Bayesian classification procedures provide a natural way of taking into account any available information about the relative sizes of the sub-populations associated with the different groups within the overall population (Binder, D.A.).
300
Classification in machine learning - Bayesian procedures
Some Bayesian procedures involve the calculation of group membership probabilities: these can be viewed as providing a more informative outcome of a data analysis than a simple attribution of a single group-label to each new observation.
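A small sketch of how group membership probabilities follow from Bayes' rule is given below. The class names, prior proportions, and likelihood values are made up for the example, and `posterior` is a hypothetical helper name.

```python
def posterior(priors, likelihoods):
    """Combine class priors with likelihoods P(observation | class).

    Returns P(class | observation) for every class, normalized to sum to 1.
    """
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}


# Assumed sub-population sizes: the "rare" group is 10% of the population.
priors = {"rare": 0.1, "common": 0.9}
# Assumed likelihood of the observed features under each group.
likelihoods = {"rare": 0.8, "common": 0.2}

print(posterior(priors, likelihoods))
```

In this toy case the observation is four times as likely under the rare group, yet the common group still has the higher posterior because of its much larger prior.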
301
Classification in machine learning - Binary and multiclass classification
Classification can be thought of as two separate problems – binary classification and multiclass classification.
302
Classification in machine learning - Linear classifiers
A large number of algorithms for classification can be phrased in terms of a linear function that assigns a score to each possible category k by combining the feature vector of an instance with a vector of weights, using a dot product. The predicted category is the one with the highest score. This type of score function is known as a linear predictor function and has the following general form:
score(X_i, k) = β_k · X_i
303
Classification in machine learning - Linear classifiers
where X_i is the feature vector for instance i, β_k is the vector of weights corresponding to category k, and score(X_i, k) is the score associated with assigning instance i to category k. In discrete choice theory, where instances represent people and categories represent choices, the score is considered the utility associated with person i choosing category k.
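The scoring rule can be sketched directly, taking the score to be the dot product of the category's weight vector with the instance's feature vector, as the description states. The category names and weight values below are invented for the example.

```python
def score(x, beta_k):
    """Dot product of feature vector x with the weight vector for one category."""
    return sum(b * xi for b, xi in zip(beta_k, x))


def predict(x, betas):
    """Return the category whose weight vector gives the highest score."""
    return max(betas, key=lambda k: score(x, betas[k]))


# Hypothetical weight vectors, one per category.
betas = {
    "cat": [1.0, -1.0],
    "dog": [-1.0, 1.0],
    "bird": [0.5, 0.5],
}

print(predict([2.0, 0.5], betas))
print(predict([0.0, 3.0], betas))
```

How the weights are obtained is left out on purpose, since that training step is what differs between individual linear classifiers.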
304
Classification in machine learning - Linear classifiers
Algorithms with this basic setup are known as linear classifiers. What distinguishes them is the procedure for determining (training) the optimal weights/coefficients and the way that the score is interpreted.
305
Classification in machine learning - Linear classifiers
Examples of such algorithms are
306
Classification in machine learning - Linear classifiers
*Logistic regression and multinomial logit
307
Classification in machine learning - Algorithms
Examples of classification algorithms include:
308
Classification in machine learning - Algorithms
**Least squares support vector machines
309
Classification in machine learning - Algorithms
* Kernel estimation
310
Classification in machine learning - Evaluation
Classifier performance depends greatly on the characteristics of the data to be classified.
311
Classification in machine learning - Evaluation
The measures precision and recall are popular metrics used to evaluate the quality of a classification system. More recently, receiver operating characteristic (ROC) curves have been used to evaluate the tradeoff between true- and false-positive rates of classification algorithms.
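For a binary problem, precision and recall can be computed from counts of true positives, false positives, and false negatives. The sketch below assumes labels are given as two parallel lists; the function name and the sample labels are illustrative.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labeled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
print(precision_recall(y_true, y_pred))
```

Precision is the fraction of predicted positives that are correct; recall is the fraction of actual positives that were found.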
312
Classification in machine learning - Evaluation
As a performance metric, the uncertainty coefficient has the advantage over simple accuracy in that it is not affected by the relative sizes of the different classes.
313
Classification in machine learning - Evaluation
Further, it will not penalize an algorithm for simply rearranging the classes.
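The sketch below computes the uncertainty coefficient as the mutual information between true and predicted labels divided by the entropy of the true labels, which is the usual definition. The helper names are assumptions, and the function expects at least two distinct true classes (otherwise the entropy in the denominator is zero).

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in nats) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def uncertainty_coefficient(y_true, y_pred):
    """U(true | pred) = I(true; pred) / H(true), a value between 0 and 1."""
    h_true = entropy(y_true)
    h_pred = entropy(y_pred)
    h_joint = entropy(list(zip(y_true, y_pred)))
    mutual_information = h_true + h_pred - h_joint
    return mutual_information / h_true


y_true = ["a", "a", "b", "b"]
same = ["a", "a", "b", "b"]      # every label correct
swapped = ["b", "b", "a", "a"]   # class names exchanged, accuracy is zero

print(uncertainty_coefficient(y_true, same))
print(uncertainty_coefficient(y_true, swapped))
```

Both calls give the same value even though simple accuracy differs completely between them, which illustrates the point about rearranged classes.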
314
Classification in machine learning - Application domains
Classification has many applications. In some of these it is employed as a data mining procedure, while in others more detailed statistical modeling is undertaken.
315
Classification in machine learning - Application domains
* Drug discovery and development
316
Classification in machine learning - Application domains
** Quantitative structure-activity relationship
317
Classification in machine learning - Application domains
* Statistical natural language processing
318
Classification in machine learning - Application domains
* Document classification
319
Cognitive bias mitigation - Machine learning
Machine learning, a branch of artificial intelligence, has been used to investigate human learning and decision making. Source: Sutton, R. S., Barto, A. G. (1998). MIT CogNet Ebook Collection; MITCogNet 1998, Adaptive Computation and Machine Learning.
320
Cognitive bias mitigation - Machine learning
One technique particularly applicable to Cognitive Bias Mitigation is neural network learning and choice selection, an approach inspired by the imagined structure and function of actual neural networks in the human brain.
321
Cognitive bias mitigation - Machine learning
In principle, such models are capable of modeling decision making that takes account of human needs and motivations within social contexts, which suggests they should be considered in the theory and practice of Cognitive Bias Mitigation.
322
ConceptNet - Machine learning tools
The information in ConceptNet can be used as a basis for machine learning algorithms. One representation, called AnalogySpace, uses singular value decomposition to generalize and represent patterns in the knowledge in
323
ConceptNet - Machine learning tools
ConceptNet, in a way that can be used in AI applications. Its creators distribute a Python machine learning toolkit called Divisi for performing machine learning based on text corpora, structured knowledge bases such as ConceptNet, and combinations of the two.
324
Learning algorithms - Machine learning and data mining
* Machine learning focuses on prediction, based on known properties learned from the training data.
325
Learning algorithms - Machine learning and data mining
* Data mining focuses on the discovery of (previously) unknown properties in the data. This is the analysis step of Knowledge Discovery in Databases.
326
Learning algorithms - Machine learning and data mining
Much of the confusion between these two research communities (which often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in Knowledge Discovery and Data Mining (KDD) the key task is the discovery of previously unknown knowledge.
327
Classification (machine learning) - Feature vectors
Most algorithms describe an individual instance whose category is to be predicted using a feature vector of individual, measurable properties of the instance.
328
Classification (machine learning) - Feature vectors
The vector space associated with these vectors is often called the feature space. In order to reduce the dimensionality of the feature space, a number of dimensionality reduction techniques can be employed.
329
Ground truth - Statistics and Machine Learning
In machine learning, the term ground truth refers to the accuracy of the training set's classification for supervised learning techniques. This is used in statistical models to prove or disprove research hypotheses. The term ground truthing refers to the process of gathering the proper objective data for this test. Compare with gold standard (test).
330
Ground truth - Statistics and Machine Learning
Bayesian spam filtering is a common example of supervised learning. In this system, the algorithm is manually taught the differences between spam and non-spam. This depends on the ground truth of the messages used to train the algorithm; inaccuracies in that ground truth will correlate to inaccuracies in the resulting spam/non-spam verdicts.
331
For More Information, Visit:
The Art of Service