Document Classification using Deep Belief Nets
Lawrence McAfee
6/9/08, CS224n, Spring '08
Overview
- Corpus: Wikipedia XML Corpus
- Single-labeled data: each document falls under a single category
- Binary feature vectors: bag-of-words, where a '1' indicates the word occurred one or more times in the document
[Figure: documents (Doc#1, Doc#2, Doc#3) fed into the classifier, which assigns categories: Doc#1 → Food, Doc#2 → Brazil, Doc#3 → Presidents]
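The slides do not show the feature-extraction code; below is a minimal sketch of building the binary bag-of-words vectors described above, assuming scikit-learn's CountVectorizer. The toolkit and the document strings are illustrative, not from the original project.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative documents standing in for the Wikipedia XML Corpus.
docs = [
    "Feijoada is a traditional food in Brazil.",
    "Brazil hosted many presidents over the years.",
    "The president signed the bill.",
]

# binary=True clips counts to {0, 1}: a '1' means the word occurred
# one or more times in the document, matching the slide's description.
vectorizer = CountVectorizer(lowercase=True, binary=True)
X = vectorizer.fit_transform(docs)  # sparse (n_docs x vocab_size) 0/1 matrix

print(X.toarray())
print(vectorizer.get_feature_names_out())
```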
Background on Deep Belief Nets
[Figure: stack of RBMs; the training data feeds RBM 1, whose outputs (features/basis vectors for the training data) feed RBM 2 (higher-level features), whose outputs feed RBM 3 (very abstract features)]
- RBM: unsupervised, clustering training algorithm
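As a concrete illustration of this greedy, layer-wise stacking, here is a minimal sketch using scikit-learn's BernoulliRBM; the layer sizes, hyperparameters, and toy data are illustrative assumptions, not the values used in the project.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Toy binary bag-of-words data standing in for the real corpus.
rng = np.random.RandomState(0)
X = (rng.rand(100, 50) > 0.8).astype(np.float64)  # 100 docs, 50-word vocab

# Greedy layer-wise training: each RBM learns features of the
# previous layer's output, becoming more abstract with depth.
layer_sizes = [30, 20, 10]          # illustrative hidden-layer sizes
layers, inputs = [], X
for n_hidden in layer_sizes:
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                       n_iter=20, random_state=0)
    rbm.fit(inputs)
    inputs = rbm.transform(inputs)  # hidden activations feed the next RBM
    layers.append(rbm)

print(inputs.shape)  # (100, 10): the most abstract feature representation
```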
Inside an RBM
[Figure: bipartite graph of visible units i and hidden units j; energy landscape over configurations (v, h), with low-energy wells at the input/training data (e.g., "Golf", "Cycling")]
- The goal in training an RBM is to minimize the energy of configurations corresponding to the input data
- Train the RBM by repeatedly sampling the hidden and visible units for a given data input
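The slide does not give the energy function or update rule explicitly; below is a minimal sketch of the standard RBM energy and one step of the sampling procedure it describes (contrastive divergence, CD-1). This is the textbook update, not the project's actual implementation; sizes, learning rate, and the training vector are illustrative.

```python
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, W, a, b):
    # Standard RBM energy: E(v, h) = -a.v - b.h - v.W.h
    # Training lowers the energy of configurations seen in the data.
    return -a @ v - b @ h - v @ W @ h

# Illustrative sizes: 6 visible units (words), 4 hidden units (features).
n_vis, n_hid = 6, 4
W = 0.01 * rng.randn(n_vis, n_hid)   # weights
a = np.zeros(n_vis)                  # visible biases
b = np.zeros(n_hid)                  # hidden biases
v0 = np.array([1, 0, 1, 0, 0, 1.0])  # one binary training vector

# One CD-1 step: sample hidden units from the data, reconstruct the
# visible units, then resample the hidden units.
p_h0 = sigmoid(b + v0 @ W)
h0 = (rng.rand(n_hid) < p_h0).astype(float)
p_v1 = sigmoid(a + W @ h0)
v1 = (rng.rand(n_vis) < p_v1).astype(float)
p_h1 = sigmoid(b + v1 @ W)

# The update nudges the energy of the data configuration downward.
lr = 0.1
W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
a += lr * (v0 - v1)
b += lr * (p_h0 - p_h1)

print(energy(v0, h0, W, a, b))
```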
Depth
- The binary representation does not capture word-frequency information
- As a result, inaccurate features are learned at each level of the DBN
Training Iterations
- Accuracy increases with more training iterations
- Increasing iterations may (partially) make up for learning poor features
[Figure: two energy landscapes over configurations (v, h), showing the wells for the inputs "Lions" and "Tigers" before and after additional training iterations]
Comparison to SVM, NB
- Binary features do not provide a good starting point for learning higher-level features
- Binary features are still useful, as 22% accuracy is better than random (30 categories)
- Training time: DBN 2 h 13 min; SVM 4 s; NB 3 s
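The slides do not say which implementations were used for the baselines; here is a hedged sketch of SVM and Naive Bayes baselines on binary features, assuming scikit-learn (LinearSVC and BernoulliNB are illustrative stand-ins, and the random toy data replaces the real 30-category corpus).

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy stand-in for the 30-category Wikipedia data: X is a binary
# bag-of-words matrix, y the single category label per document.
rng = np.random.RandomState(0)
X = (rng.rand(300, 500) > 0.9).astype(float)
y = rng.randint(0, 30, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for clf in (LinearSVC(), BernoulliNB()):
    clf.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(type(clf).__name__, acc)  # random guessing would score ~1/30
```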
Lowercasing
- Lowercasing supposedly yields a richer vocabulary
- Overfitting: we don't actually need these extra words
- Other experiments show only the top 500 words are relevant
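A minimal sketch of restricting the features to the most frequent words, illustrating the "top 500" observation; the use of CountVectorizer's max_features and the toy documents are assumptions, not the project's code.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["Brazil Brazil food", "Presidents of Brazil", "Food in Brazil"]

# Keep only the most frequent words; the slide suggests ~500 suffice.
# max_features=500 is the cutoff (the toy docs have far fewer words).
vectorizer = CountVectorizer(lowercase=True, binary=True, max_features=500)
X = vectorizer.fit_transform(docs)
print(len(vectorizer.vocabulary_))
```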
Suggestions for Improvement
- Use appropriate continuous-valued neurons
  - Linear or Gaussian neurons
  - Slower to train
  - Not much documentation on using continuous-valued neurons with RBMs
- Implement backpropagation to fine-tune the weights and biases (see the sketch below)
  - Propagate error derivatives from the top-level RBM back to the inputs
  - Unsupervised training gives good initial weights, while backpropagation slightly modifies the weights/biases
  - Backpropagation cannot be used alone, as it tends to get stuck in local optima
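The slides describe backpropagation fine-tuning only at a high level; below is a minimal sketch of the idea, assuming PyTorch and the stack of BernoulliRBMs (`layers`) plus toy data (`X`, `y`) from the earlier sketches. All names and hyperparameters here are illustrative, not a definitive implementation.

```python
import torch
import torch.nn as nn

def build_finetune_net(layers, n_classes):
    modules = []
    for rbm in layers:
        # rbm.components_ has shape (n_hidden, n_visible); use the
        # pretrained weights and hidden biases to initialize each layer,
        # so backprop only slightly modifies the weights/biases.
        linear = nn.Linear(rbm.components_.shape[1], rbm.components_.shape[0])
        with torch.no_grad():
            linear.weight.copy_(torch.from_numpy(rbm.components_).float())
            linear.bias.copy_(torch.from_numpy(rbm.intercept_hidden_).float())
        modules += [linear, nn.Sigmoid()]
    modules.append(nn.Linear(layers[-1].components_.shape[0], n_classes))
    return nn.Sequential(*modules)

net = build_finetune_net(layers, n_classes=30)
opt = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

Xt = torch.from_numpy(X).float()
yt = torch.from_numpy(y).long()
for _ in range(10):                 # a few epochs of fine-tuning
    opt.zero_grad()
    loss = loss_fn(net(Xt), yt)     # error derivatives flow from the
    loss.backward()                 # top layer back to the inputs
    opt.step()
```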