
1 An Evolutionary Method for Training Autoencoders for Deep Learning Networks
Master's Thesis Defense
Sean Lander, Master's Candidate
Advisor: Yi Shang
University of Missouri, Department of Computer Science
University of Missouri, Informatics Institute

2 Agenda
o Overview
o Background and Related Work
o Methods
o Performance and Testing
o Results
o Conclusion and Future Work

3 Agenda
o Overview
o Background and Related Work
o Methods
o Performance and Testing
o Results
o Conclusion and Future Work

4 Overview: Deep Learning classification/reconstruction
o Since 2006, Deep Learning Networks (DLNs) have changed the landscape of classification problems
o Strong ability to create and utilize abstract features
o Easily lends itself to GPU and distributed systems
o Does not require labeled data – VERY IMPORTANT
o Can be used for feature reduction and classification

5 Overview: Problem and proposed solution
o Problems with DLNs:
  o Costly to train with large data sets or high feature spaces
  o Local minima are systemic to Artificial Neural Networks
  o Hyper-parameters must be hand selected
o Proposed solutions:
  o Evolutionary-based approach with a local search phase
    o Increased chance of reaching a global minimum
    o Optimizes structure based on abstracted features
  o Data partitions based on population size (large data only)
    o Reduced training time
    o Reduced chance of overfitting

6 Agenda
o Overview
o Background and Related Work
o Methods
o Performance and Testing
o Results
o Conclusion and Future Work

7 Background: Perceptrons
o Started with the Perceptron in the 1950s
o Only capable of linearly separable classification
o Failed on XOR

8 Background: Artificial Neural Networks (ANNs)
o ANNs fell out of favor until the Multilayer Perceptron (MLP) was introduced
o Pro: non-linear classification
o Con: time consuming to train
o Advance in training: Backpropagation
  o Increased training speeds
  o Limited to shallow networks
  o Error propagation diminishes as the number of layers increases

9 Background: Backpropagation using Gradient Descent
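
The update equations on this slide are images in the original deck and are not preserved in the transcript. For reference, the standard gradient-descent weight update that backpropagation applies, with learning rate alpha (0.1 in the experiments below) and error function E, is:

```latex
% Each weight moves down the gradient of the error E,
% scaled by the learning rate \alpha.
\Delta w_{ij} = -\alpha \, \frac{\partial E}{\partial w_{ij}},
\qquad
w_{ij} \leftarrow w_{ij} + \Delta w_{ij}
```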

10 Background: Deep Learning Networks (DLNs)
o Allow for deep networks with multiple layers
o Layers are pre-trained using unlabeled data
o Layers are "stacked" and fine-tuned
o Minimizes error degradation for deep neural networks (many layers)
o Still costly to train
  o Manual selection of hyper-parameters
  o Reaches a local, not global, minimum

11 Background: Autoencoders for reconstruction
o Autoencoders (AEs) can be used for feature reduction and clustering
o "Classification error" is the ability to reconstruct the sample input
o Abstracted features – the output of the hidden layer – can be used to replace raw input for other techniques
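
The deck's figures are not preserved in this transcript, so here is a minimal NumPy sketch of the kind of single-hidden-layer autoencoder described above. It assumes sigmoid activations and omits biases for brevity; weights are drawn from N(0, 0.5) as stated on the testing slide. This is an illustration, not the thesis code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class AE:
    """Single-hidden-layer autoencoder; the hidden size varies per individual."""
    def __init__(self, n_in, n_hidden):
        self.W1 = rng.normal(0, 0.5, (n_hidden, n_in))  # encoder weights
        self.W2 = rng.normal(0, 0.5, (n_in, n_hidden))  # decoder weights

    def train(self, X, epochs=20, lr=0.1):
        """Plain batch gradient descent on squared reconstruction error."""
        for _ in range(epochs):
            H = sigmoid(self.W1 @ X.T)           # abstracted features h
            Xh = sigmoid(self.W2 @ H)            # reconstruction x'
            D2 = (Xh - X.T) * Xh * (1 - Xh)      # output-layer delta
            D1 = (self.W2.T @ D2) * H * (1 - H)  # hidden-layer delta
            self.W2 -= lr * (D2 @ H.T) / len(X)
            self.W1 -= lr * (D1 @ X) / len(X)

    def error(self, X):
        """Mean squared reconstruction error over the samples in X."""
        Xh = sigmoid(self.W2 @ sigmoid(self.W1 @ X.T))
        return float(np.mean((Xh - X.T) ** 2))
```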

12 Related Work: Evolutionary and genetic ANNs
o First use of Genetic Algorithms (GAs) for ANNs in 1989
  o Two-layer ANN on a small data set
  o Tested multiple types of chromosomal encodings and mutation types
o Late 1990s and early 2000s introduced other techniques
  o Multi-level mutations and mutation priority
  o Addition of a local search phase in each generation
  o Inclusion of hyper-parameters as part of the mutation
o Issue of competing conventions starts to appear
  o Two ANNs produce the same results by sharing the same nodes but in a permuted order

13 Related Work: Hyper-parameter selection for DLNs
o The majority of the work explored newer technologies and methods such as GPU and distributed (MapReduce) training
o Improved versions of Backpropagation, such as Conjugate Gradient or Limited-memory BFGS, were tested under different conditions
o Most conclusions pointed toward manual parameter selection via trial and error

14 Agenda
o Overview
o Background and Related Work
o Methods
o Performance and Testing
o Results
o Conclusion and Future Work

15 Method 1: Evolutionary Autoencoder (EvoAE)
o IDEA: An autoencoder's power is in its feature abstraction, the hidden-node output
o Training many AEs creates more potential abstracted features
o The best AEs will contain the best features
o Joining these features should create a better AE

16 Method 1: Evolutionary Autoencoder (EvoAE)
o A population of autoencoders (AEs) is initialized with a semi-random number of hidden nodes
o Each AE is trained for a small number of epochs using Backpropagation
o The AEs are ranked based on reconstruction error
o The top AEs are selected for crossover
o New AEs are mutated based on the mutation rate, with an even chance of gaining or losing a node
o New nodes are selected randomly from the population, not randomly initialized
o Continue until the convergence criterion is met (see the sketch below)
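
A compact, runnable sketch of this loop, reusing the AE class above. The crossover and mutation operators here (a child inherits half of each parent's hidden nodes; a gained node is copied from a donor in the population) are simplified stand-ins for the thesis's operators, and the fixed generation count stands in for the convergence check.

```python
def crossover(a, b):
    """Child inherits a random half of each parent's hidden nodes."""
    ia = rng.choice(len(a.W1), max(1, len(a.W1) // 2), replace=False)
    ib = rng.choice(len(b.W1), max(1, len(b.W1) // 2), replace=False)
    child = AE(a.W1.shape[1], 1)                 # shell; weights replaced below
    child.W1 = np.vstack([a.W1[ia], b.W1[ib]])
    child.W2 = np.hstack([a.W2[:, ia], b.W2[:, ib]])
    return child

def mutate(ae, population):
    """Even chance of losing a node or gaining one copied from the population."""
    if rng.random() < 0.5 and len(ae.W1) > 1:
        k = rng.integers(len(ae.W1))             # drop hidden node k
        ae.W1 = np.delete(ae.W1, k, axis=0)
        ae.W2 = np.delete(ae.W2, k, axis=1)
    else:
        donor = population[rng.integers(len(population))]
        k = rng.integers(len(donor.W1))          # copy node k from the donor
        ae.W1 = np.vstack([ae.W1, donor.W1[k:k + 1]])
        ae.W2 = np.hstack([ae.W2, donor.W2[:, k:k + 1]])

# Toy run on random data standing in for real samples.
X = rng.random((64, 13))
pop = [AE(13, int(rng.integers(8, 24))) for _ in range(10)]
for gen in range(5):                             # stand-in for convergence check
    for ae in pop:
        ae.train(X, epochs=20)                   # short local-search phase
    pop.sort(key=lambda ae: ae.error(X))         # rank by reconstruction error
    parents = pop[:len(pop) // 2]                # top AEs selected for crossover
    children = []
    for _ in range(len(pop) - len(parents)):
        i, j = rng.choice(len(parents), 2, replace=False)
        child = crossover(parents[i], parents[j])
        if rng.random() < 0.1:                   # mutation rate
            mutate(child, pop)
        children.append(child)
    pop = parents + children
pop.sort(key=lambda ae: ae.error(X))
print("best reconstruction error:", pop[0].error(X))
```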

17 Method 1: Evolutionary Autoencoder (EvoAE)
[Figure: EvoAE phases (Initialization, Local Search, Crossover, Mutation), showing input x reconstructed as x' and hidden nodes from parent AEs (A1–A4, B1–B3, C2) exchanged during crossover and mutation]

18 Method 1A: Distributed learning and mini-batches
o Training time of the generic EvoAE increases linearly with the size of the population
o ANN training time increases drastically with data size
o To combat this, mini-batches can be used, where each AE is trained against one batch and then updated
o Batch size << total data size

19 Method 1A: Distributed learning and mini-batches
o EvoAE lends itself to distributed systems
o Data storage and duplication across nodes become an issue
[Figure: per-batch pipeline repeated for Batch 1, Batch 2, …, Batch N: Train (forward propagation, backpropagation), Rank (calculate error, sort), GA (crossover, mutate)]

20 Method 2: EvoAE with Evo-batches
o IDEA: When the data is large, small batches can be representative
o Prevents overfitting, as the nodes being trained are almost always introduced to new data
o Scales well with large amounts of data even when parallel training is not possible
o Works well on limited-memory systems: increasing the population size reduces the data per batch
o Quick training of large populations, equivalent in cost to training a single autoencoder using traditional methods (a sketch follows below)
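
A hypothetical sketch of the Evo-batch partitioning, continuing the code above: the data is split into one partition per population member, and the assignment rotates each generation so every AE keeps seeing new data. The rotation scheme is an assumption made for illustration; the thesis may assign partitions differently.

```python
def evo_batches(X, pop_size):
    """Partition the data into pop_size batches of roughly data/pop_size each."""
    idx = rng.permutation(len(X))
    return [X[idx[i::pop_size]] for i in range(pop_size)]

# Each generation, AE i trains on batch (i + gen) % pop_size, so the nodes
# being trained are almost always introduced to new data (assumed rotation).
batches = evo_batches(X, len(pop))
for gen in range(5):
    for i, ae in enumerate(pop):
        ae.train(batches[(i + gen) % len(pop)], epochs=20)
    # ...rank, crossover, and mutate as in the EvoAE sketch above...
```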

21 Method 2: EvoAE with Evo-batches
[Figure: the original data is partitioned into Data A through Data D; the partitions rotate through the population across the Local Search, Crossover, and Mutate phases]

22 Agenda
o Overview
o Background and Related Work
o Methods
o Performance and Testing
o Results
o Conclusion and Future Work

23 Performance and Testing: Hardware and testing parameters
o Lenovo Y500 laptop
o 3rd-generation Intel i7, 2.4 GHz
o 12 GB RAM
o All weights randomly initialized to N(0, 0.5)

Parameter        Wine  Iris  Heart Disease  MNIST
Hidden Size      32    32    12             200
Hidden Std Dev   NULL  NULL  NULL           80
Hidden +/-       16    16    6              NULL
Mutation Rate    0.1   0.1   0.1            0.1

Parameter        Default
Learning Rate    0.1
Momentum         2
Weight Decay     0.003
Population Size  30
Generations      50
Epochs/Gen       20
Train/Validate   80/20

24 Performance and Testing: Baseline
o Baseline is a single AE with 30 random initializations
o Two learning rates are used to create two baseline measurements
  o Base learning rate
  o Learning rate * 0.1

25 Performance and Testing: Data partitioning
o Three data partitioning methods were used
  o Full data
  o Mini-batch
  o Evo-batch

26 Performance and Testing: Post-training configurations
o Post-training was run in the following ways
  o Full data (All)
  o Batch data (Batch)
  o None
o All result sets below use the Evo-batch configuration

27 Agenda
o Overview
o Background and Related Work
o Methods
o Performance and Testing
o Results
o Conclusion and Future Work

28 Results: Parameters review

Parameter        Wine  MNIST
Hidden Size      32    200
Hidden Std Dev   NULL  80
Hidden +/-       16    NULL
Mutation Rate    0.1   0.1

Parameter        Default
Learning Rate    0.1
Momentum         2
Weight Decay     0.003
Population Size  30
Generations      50
Epochs/Gen       20
Train/Validate   80/20

29 Results: Datasets
o UCI Wine dataset
  o 178 samples
  o 13 features
  o 3 classes
o Reduced MNIST dataset
  o 6000/1000 and 24k/6k training/testing samples
  o 784 features
  o 10 classes (digits 0–9)

30 Results: Small datasets – UCI Wine

Parameter        Wine
Hidden Size      32
Hidden Std Dev   NULL
Hidden +/-       16
Mutation Rate    0.1

31 Results: Small datasets – UCI Wine
o Best error-to-speed: Baseline 1
o Best overall error: Full data All
o Full data is fast on small-scale data
o Evo-batch and mini-batch perform poorly on small-scale data

32 Results: Small datasets – MNIST 6k/1k

Parameter        MNIST
Hidden Size      200
Hidden Std Dev   80
Hidden +/-       NULL
Mutation Rate    0.1

33 Results: Small datasets – MNIST 6k/1k
o Best error-to-time: Mini-batch None
o Best overall error: Mini-batch Batch
o Full data slows exponentially on large-scale data
o Evo-batch and mini-batch stay close to baseline speed

34 Results: Medium datasets – MNIST 24k/6k

Parameter        MNIST
Hidden Size      200
Hidden Std Dev   80
Hidden +/-       NULL
Mutation Rate    0.1

35 Results: Medium datasets – MNIST 24k/6k
o Best error-to-time: Evo-batch None
o Best overall error: Evo-batch Batch or Mini-batch Batch
o Full data was too slow to run on this dataset
o With Evo-batch, an EvoAE with population 30 trains as quickly as a single baseline AE

36 Agenda
o Overview
o Background and Related Work
o Methods
o Performance and Testing
o Results
o Conclusion and Future Work

37 Conclusions: Good for large problems
o Traditional methods are still the preferred choice for small and toy problems
o EvoAE with Evo-batch produces effective and efficient feature reduction given a large volume of data
o EvoAE is robust against poorly chosen hyper-parameters, specifically the learning rate

38 Future Work
o Immediate goals:
  o Transition to a distributed system, MapReduce-based or otherwise
  o Harness GPU technology for increased speeds (~50% in some cases)
o Long-term goals:
  o Open the system for use by novices and non-programmers
  o Make the system easy to use and transparent to the user for both modification and training purposes

39 Thank you

40 Background: Backpropagation with weight decay
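
This appendix slide's equations are images in the original. For reference, the standard gradient-descent update with an L2 weight-decay term lambda (0.003 in the defaults table above) is:

```latex
% Weight decay adds a penalty proportional to the weight itself,
% shrinking weights toward zero on every update.
w_{ij} \leftarrow w_{ij}
  - \alpha \left( \frac{\partial E}{\partial w_{ij}} + \lambda \, w_{ij} \right)
```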

41 Background: Conjugate Gradient Descent
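
Again, the slide's equations are images. A common form of the nonlinear conjugate gradient method this slide refers to chooses each search direction as a mix of the new gradient and the previous direction; the Fletcher–Reeves choice of beta is shown here as one standard variant:

```latex
% Search direction mixes the new gradient with the previous direction;
% weights then move along d_k with a line-searched step size \alpha_k.
d_k = -\nabla E(w_k) + \beta_k \, d_{k-1},
\qquad
\beta_k = \frac{\nabla E(w_k)^{\top} \nabla E(w_k)}
               {\nabla E(w_{k-1})^{\top} \nabla E(w_{k-1})},
\qquad
w_{k+1} = w_k + \alpha_k d_k
```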

42 Background: Architecture and hyper-parameters

43 Results: Small datasets – UCI Iris
o The UCI Iris dataset has 150 samples with 4 features and 3 classes
o Best error-to-speed: Baseline 1
o Best overall error: Full data None

Parameter        Iris
Hidden Size      32
Hidden Std Dev   NULL
Hidden +/-       16
Mutation Rate    0.1

44 Results: Small datasets – UCI Heart Disease
o The UCI Heart Disease dataset has 297 samples with 13 features and 5 classes
o Best error-to-time: Baseline 1
o Best overall error: Full data None

Parameter        Heart Disease
Hidden Size      12
Hidden Std Dev   NULL
Hidden +/-       6
Mutation Rate    0.1

