Presentation on theme: "From Cortical Microcircuits to Machine Intelligence"— Presentation transcript:
1From Cortical Microcircuits to Machine Intelligence NumentaNeuro-Inspired Computing Elements Sandia National Labs 2014Jeff HawkinsCatalyst for machine intelligenceResearch Theory, AlgorithmsNuPIC Open source communityProducts for streaming analyticsBased on cortical algorithms
2“If you invent a breakthrough so computers can learn, that is worth 10 Microsofts” Matt
3"If you invent a breakthrough so computers that learn that is worth 10 Micros Machine IntelligenceWhat principles will we use to build intelligent machines?What applications will drive adoption in the near and long term?Matt
4Machine intelligence will be built on the principles of the neocortex Flexible (universal learning machine)RobustIf we knew how the neocortex worked, we would be in a race to build them.Matt
5The neocortex learns a sensory-motor model of the world What the Cortex DoesThe neocortex learns a model from sensory data- predictions anomalies actionsretinacochleasomaticdata streamThe neocortex learns a sensory-motor model of the world
6“Hierarchical Temporal Memory” (HTM) 1) Hierarchy of nearly identical regions- across species- across modalitiesretinacochleasomaticdata stream2) Regions are mostly sequence memory- inference- motor3) Feedforward: Temporal stability - inferenceFeedback: Unfold sequences- motor/prediction
7The cellular layer is unit of cortical computation. Region2.5 mm2/3456Layers2/3456Sequence memoryLayers &Mini-columnsThe cellular layer is unit of cortical computation.Each layer implements a variant of a common learning algorithm.Mini-columns play critical role in how layers learn sequences.
8L4 and L2/3: Feedforward Inference Pass throughchangesUn-predicted2/3Learns high-order sequencesA-B-C-D X-B-C-YA-B-C- ? D X-B-C- ? YStablePredicted4Learns sensory-motor transitionsCopy of motor commandsSensor/afferent data56Next higher regionLayer 4 learns changes in input due to behavior.Layer 3 learns sequences of L4 remainders.These are universal inference steps.They apply to all sensory modalities.
9L5 and L6: Feedback, Behavior and Attention Well understoodtested / commercial2/3Learns high-order sequencesFeedforward90% understoodtesting in progress4Learns sensory-motor transitions5Recalls motor sequences50% understoodFeedback6Attention10% understood
10How does a layer of neurons learn sequences? 2/3Learns high-order sequences456First:- Sparse Distributed Representations- Neurons
11Dense Representations Few bits (8 to 128)All combinations of 1’s and 0’sExample: 8 bit ASCIIBits have no inherent meaning Arbitrarily assigned by programmer= mSparse Distributed Representations (SDRs)Many bits (thousands)Few 1’s mostly 0’sExample: 2,000 bits, 2% activeEach bit has semantic meaning Learned…………01000
12SDR Properties 1) Similarity: shared bits = semantic similarity 2) Store and Compare: store indices of active bitsIndices | 40subsampling is OKIndices 1 2 | 101) 2) 3)3) Union membership:2%….10)Union20%Is this SDR a member?
13Neurons Active Dendrites Pattern detectors Each cell can recognize 100’s of unique patternsFeedforward100’s of synapses“Classic” receptive fieldModel neuronContext1,000’s of synapsesDepolarize neuron“Predicted state”
19Learning TransitionsForm connections to previously active cells.Predict future activity.
20Multiple predictions can occur at once. A-B A-C A-D Learning TransitionsMultiple predictions can occur at once.A-B A-C A-DThis is a first order sequence memory.It cannot learn A-B-C-D vs. X-B-C-Y.
21High-order Sequences Require Mini-columns A-B-C-D vs. X-B-C-YAfter trainingXB’’C’’Y’’AB’C’D’Same columns, but only one cell active per column.Before trainingXBCYABCDIF active columns, 10 cells per columnTHEN ways to represent the same input in different contexts
22Cortical Learning Algorithm (CLA) aka Cellular LayerConverts input to sparse representations in columnsLearns transitionsMakes predictions and detects anomaliesApplications 1) High-order sequence inference L2/32) Sensory-motor inference L43) Motor sequence recall L5Capabilities - On-line learning- High capacity- Simple local learning rules- Fault tolerant- No sensitive parametersBasic building block of neocortex/Machine Intelligence
23Anomaly Detection Using CLA PredictionPoint anomalyTime averageHistorical comparisonAnomaly scoreEncoderMetric(s)SDRCLAEncoderSDRPredictionPoint anomalyTime averageHistorical comparisonAnomaly scoreMetric N. . .System Anomaly Score
24“Breakthrough Science for Anomaly Detection” Grok for Amazon AWS“Breakthrough Science for Anomaly Detection”Automated model creationRanks anomalous instancesContinuously updatedContinuously learning
25Unique Value of Cortical Algorithms Technology can be applied to any kind of data, financial, manufacturing, web sales, etc.
26- = Application: CEPT Systems Apple Fruit Computer 100K “Word SDRs”Document corpus (e.g. Wikipedia)128 x 128AppleFruitComputer-MacintoshMicrosoftMacLinuxOperating system….=
27Sequences of Word SDRs Training set Word 3 Word 2 Word 1 frog eats fliescowgrainelephantleavesgoatgrasswolfrabbitcatlikesballwatersheepsalmonmiceliondogsleepcoyoterodentsquirrel---------Word 3Word 2Word 1
28Sequences of Word SDRs ? Training set “fox” eats frog eats flies cow grainelephantleavesgoatgrasswolfrabbitcatlikesballwatersheepsalmonmiceliondogsleepcoyoterodentsquirrel---------“fox”eats?
29Sequences of Word SDRs rodent Word SDRs created unsupervised Training setfrogeatsfliescowgrainelephantleavesgoatgrasswolfrabbitcatlikesballwatersheepsalmonmiceliondogsleepcoyoterodentsquirrel---------“fox”eatsrodentWord SDRs created unsupervisedSemantic generalization SDR: lexical CLA: grammaticCommercial applications Sentiment analysis Abstraction Improved text to speech Dialog, Reporting, etc.
30Cept and Grok use exact same code base “fox”eatsrodent
31NuPIC Open Source Project (created July 2013) Source code for: - Cortical Learning Algorithm - Encoders - Support librariesSingle source tree (used by GROK), GPLv3Active and growing community contributors as of 2/ mailing list subscribersEducation Resources / Hackathons
32Goals For 2014Research- Implement and test L4 sensory/motor inference - Introduce hierarchy - PublishNuPIC- Grow open source community - Support partners, e.g. IBM Almaden, CEPT Grok- Demonstrate commercial value for CLA Attract resources (people and money) Provide a “target” and market for HW - Explore new application areas
33The Limits of Software One layer, no hierarchy 2000 mini-columns 30 cells per mini-columnUp to 128 dendrite segments per cellUp to 40 active synapses per segmentThousands of potential synapses per segmentOne millionth human cortex50 msec per inference/training step (highly optimized)30128402000 columnsWe need HW for the usual reasons.SpeedPowerVolume
35Start of supplemental material A more detailed view of Ray and Murray’s ideaWalk through slowly, end on CPGs (central pattern generators) Ray Guillery and Murray Sherman’s “efference copy”. I don’t know if this is true everywhere. But I like it. It is consistent with the idea that cortex is cortex. But it begs the question “How do you interpret this?” How do we understand these connections from a theoretical point of view.It is easier to imagine M1 driving muscles, but how can mid and high level connections drive motor behavior?
39Requirements for Online learning Train on every new inputIf pattern does not repeat, forget itIf pattern repeats, reinforce itLearning is the formation of connections1connectedunconnectedConnection permanence0.2Connection strength/weight is binaryConnection permanence is a scalarTraining changes permanenceIf permanence > threshold then connected