From Cortical Microcircuits to Machine Intelligence


From Cortical Microcircuits to Machine Intelligence
Neuro-Inspired Computing Elements, Sandia National Labs, 2014
Jeff Hawkins, Numenta, jhawkins@Numenta.com
Numenta: a catalyst for machine intelligence
- Research: theory, algorithms
- NuPIC: open source community
- Products for streaming analytics, based on cortical algorithms

“If you invent a breakthrough so computers can learn, that is worth 10 Microsofts.”

"If you invent a breakthrough so computers that learn that is worth 10 Micros Machine Intelligence What principles will we use to build intelligent machines? What applications will drive adoption in the near and long term? Matt

Machine intelligence will be built on the principles of the neocortex:
- Flexible (a universal learning machine)
- Robust
If we knew how the neocortex worked, we would be in a race to build such machines.

What the Cortex Does
The neocortex learns a sensory-motor model of the world from streaming sensory data (retina, cochlea, somatic), and uses that model to make predictions, detect anomalies, and generate actions.

“Hierarchical Temporal Memory” (HTM)
1) A hierarchy of nearly identical regions, conserved across species and across modalities; sensory data streams (retina, cochlea, somatic) feed the bottom of the hierarchy.
2) Regions are mostly sequence memory, used both for inference and for motor behavior.
3) Feedforward connections provide temporal stability (inference); feedback connections unfold sequences (motor/prediction).

Layers & Mini-columns
A cortical region (about 2.5 mm thick) contains cellular layers 2/3, 4, 5, and 6.
The cellular layer is the unit of cortical computation, and each layer implements a variant of a common learning algorithm (sequence memory).
Mini-columns play a critical role in how layers learn sequences.

L4 and L2/3: Feedforward Inference
Layer 4 learns sensory-motor transitions, combining sensor/afferent data with a copy of motor commands; changes in input that are predicted from behavior become stable, while un-predicted changes pass through.
Layer 2/3 learns high-order sequences of the L4 remainders (e.g. A-B-C-D vs. X-B-C-Y, so that A-B-C predicts D while X-B-C predicts Y) and passes a stable representation to the next higher region.
These are universal inference steps; they apply to all sensory modalities.

L5 and L6: Feedback, Behavior and Attention
- L2/3: learns high-order sequences (feedforward). Well understood; tested and commercial.
- L4: learns sensory-motor transitions (feedforward). ~90% understood; testing in progress.
- L5: recalls motor sequences (feedback). ~50% understood.
- L6: attention (feedback). ~10% understood.

How does a layer of neurons (such as L2/3) learn sequences?
First, two prerequisites: Sparse Distributed Representations and the neuron model.

Dense Representations
- Few bits (8 to 128); all combinations of 1's and 0's are used.
- Example: 8-bit ASCII, 01101101 = m.
- Bits have no inherent meaning; meanings are arbitrarily assigned by the programmer.
Sparse Distributed Representations (SDRs)
- Many bits (thousands); few 1's, mostly 0's.
- Example: 2,000 bits with 2% active, e.g. 01000000000000000001000000000000000000000000000000000010000…01000.
- Each bit has semantic meaning, and that meaning is learned.
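As an illustration only (not Numenta's code), an SDR of these dimensions can be represented as the set of indices of its active bits:

    import random

    N = 2000   # total bits in the SDR
    W = 40     # active bits, i.e. 2% sparsity

    def random_sdr():
        """Return an SDR as the set of its active-bit indices."""
        return set(random.sample(range(N), W))

    dense = 0b01101101       # 8-bit dense code; the meaning 'm' is assigned by the ASCII convention
    sparse = random_sdr()    # sparse code; in a real system each active bit carries learned meaning
    print(len(sparse) / N)   # ~0.02 (2% of bits active)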

SDR Properties
1) Similarity: shared active bits indicate semantic similarity.
2) Store and compare: store only the indices of the active bits; subsampling the stored indices still works for matching.
3) Union membership: OR many SDRs together (ten 2%-sparse SDRs give a union with roughly 20% of bits active), then test whether a given SDR is a member. A sketch of all three properties follows.
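A hedged sketch of the three properties with the same set-of-indices representation; the match thresholds are illustrative choices, not values from the slide:

    import random

    N, W = 2000, 40

    def random_sdr():
        return set(random.sample(range(N), W))

    a, b = random_sdr(), random_sdr()

    # 1) Similarity: the number of shared active bits measures semantic overlap.
    similarity = len(a & b)

    # 2) Store and compare: keep only a subsample of a pattern's indices;
    #    matching against the subsample still identifies the pattern.
    stored = set(random.sample(sorted(a), 10))
    recognized = len(stored & a) >= 8

    # 3) Union membership: OR ten SDRs together (union has ~20% of bits set),
    #    then test whether a candidate's active bits are nearly all in the union.
    union = set().union(a, *(random_sdr() for _ in range(9)))
    is_member = len(a & union) >= int(0.9 * W)

    print(similarity, recognized, is_member)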

Model Neuron
- Active dendrites act as pattern detectors; each cell can recognize hundreds of unique patterns.
- Feedforward: hundreds of synapses form the "classic" receptive field.
- Context: thousands of synapses; a recognized context pattern depolarizes the neuron, putting it into a "predicted state".
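Read as code, the model neuron might look like the following; the class, its thresholds, and the way segments are stored are illustrative assumptions, not a published specification:

    class ModelNeuron:
        """Toy HTM-style neuron: one proximal synapse set plus many distal segments."""

        def __init__(self, proximal, distal_segments, ff_threshold=15, seg_threshold=12):
            self.proximal = proximal                # hundreds of feedforward synapses ("classic" receptive field)
            self.distal_segments = distal_segments  # each segment is a set of context inputs it recognizes
            self.ff_threshold = ff_threshold
            self.seg_threshold = seg_threshold

        def active(self, feedforward):
            """The cell fires when enough feedforward synapses are active."""
            return len(self.proximal & feedforward) >= self.ff_threshold

        def predicted(self, context):
            """Any sufficiently active distal segment depolarizes the cell (the "predicted state").
            With hundreds of segments, one cell can recognize hundreds of distinct patterns."""
            return any(len(seg & context) >= self.seg_threshold for seg in self.distal_segments)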

Learning Transitions
[Animation, step by step: feedforward activation; inhibition; sparse cell activation; then across Time = 1 and Time = 2 the cells form connections to previously active cells and use them to predict future activity.]

Learning Transitions
Multiple predictions can occur at once (e.g. A-B, A-C, A-D).
This is a first-order sequence memory; it cannot learn A-B-C-D vs. X-B-C-Y. The toy sketch below makes the limitation concrete.
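A toy first-order transition memory over symbols (purely illustrative; real HTM operates on SDRs, not characters):

    from collections import defaultdict

    transitions = defaultdict(set)   # each input maps to everything that has ever followed it

    def learn(sequence):
        for prev, nxt in zip(sequence, sequence[1:]):
            transitions[prev].add(nxt)

    learn("ABCD")
    learn("XBCY")

    print(transitions["A"])   # {'B'}      : multiple predictions per input are fine
    print(transitions["C"])   # {'D', 'Y'} : after ...C it cannot tell D from Y; the context was lost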

High-order Sequences Require Mini-columns
A-B-C-D vs. X-B-C-Y: before training, both sequences activate the same cells (A B C D / X B C Y). After training, the same columns are active but only one cell per column, giving A B' C' D' vs. X B'' C'' Y'', so the shared inputs B and C are represented differently in each context.
If 40 columns are active and there are 10 cells per column, there are 10^40 ways to represent the same input in different contexts.
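The capacity claim is just combinatorics, using the numbers from the slide:

    active_columns = 40
    cells_per_column = 10
    contexts = cells_per_column ** active_columns
    print(contexts)   # 10**40 ways to represent the same input in different contexts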

Cortical Learning Algorithm (CLA), aka Cellular Layer
- Converts input to sparse representations in columns
- Learns transitions
- Makes predictions and detects anomalies
Applications:
1) High-order sequence inference (L2/3)
2) Sensory-motor inference (L4)
3) Motor sequence recall (L5)
Capabilities: on-line learning, high capacity, simple local learning rules, fault tolerance, no sensitive parameters.
The CLA is the basic building block of the neocortex and of machine intelligence.

Anomaly Detection Using CLA
For each metric, an encoder converts the value stream into SDRs, the CLA makes a prediction, and the prediction is turned into an anomaly score through a point anomaly, a time average, and a historical comparison. The per-metric scores for metrics 1 through N are then combined into a system anomaly score. A simplified sketch of the per-metric stage follows.
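A simplified sketch of the per-metric scoring stage; the moving-average and historical-comparison logic below are stand-ins of my own, not Numenta's formulas:

    from collections import deque

    def point_anomaly(predicted_sdr, actual_sdr):
        """Fraction of the actual SDR's active bits that were not predicted."""
        if not actual_sdr:
            return 0.0
        return 1.0 - len(predicted_sdr & actual_sdr) / len(actual_sdr)

    class MetricAnomalyScorer:
        """Per-metric pipeline: point anomaly -> time average -> comparison with history."""

        def __init__(self, window=200):
            self.recent = deque(maxlen=10)        # short window for the time average
            self.history = deque(maxlen=window)   # longer window for historical comparison

        def score(self, predicted_sdr, actual_sdr):
            self.recent.append(point_anomaly(predicted_sdr, actual_sdr))
            averaged = sum(self.recent) / len(self.recent)
            self.history.append(averaged)
            # Historical comparison: what fraction of recent averaged scores were lower than this one?
            return sum(1 for h in self.history if h < averaged) / len(self.history)

The per-metric scores would then be combined into the system anomaly score.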

Grok for Amazon AWS: “Breakthrough Science for Anomaly Detection”
- Automated model creation
- Ranks anomalous instances
- Continuously learning, continuously updated

Unique Value of Cortical Algorithms
The technology can be applied to any kind of data: financial, manufacturing, web sales, etc.

Application: CEPT Systems
CEPT builds roughly 100K "Word SDRs" (128 x 128 bitmaps) from a document corpus (e.g. Wikipedia). SDR arithmetic then captures meaning, for example:
Apple - Fruit = Computer, Macintosh, Microsoft, Mac, Linux, Operating system, …
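The subtraction on this slide is naturally expressed as set operations on word SDRs. The tiny vocabulary below uses random bitmaps just to show the mechanics (the bitmap sparsity is an assumption); the ranking only becomes meaningful with SDRs actually learned from a corpus:

    import random

    N, W = 128 * 128, 328   # 128 x 128 bitmap, ~2% active (sparsity assumed)

    def random_word_sdr():
        return set(random.sample(range(N), W))

    vocabulary = {w: random_word_sdr() for w in
                  ["apple", "fruit", "computer", "macintosh", "linux", "banana"]}

    def subtract(a, b):
        """Remove the bits the two words share; what is left emphasizes their difference."""
        return a - b

    def nearest(query, vocab, k=3):
        """Rank vocabulary words by overlap with a query SDR."""
        return sorted(vocab, key=lambda w: len(vocab[w] & query), reverse=True)[:k]

    query = subtract(vocabulary["apple"], vocabulary["fruit"])
    print(nearest(query, vocabulary))   # with learned SDRs this would surface computer-related words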

Sequences of Word SDRs
Training set: three-word sentences (Word 1, Word 2, Word 3) such as "frog eats flies", built from words including cow, grain, elephant, leaves, goat, grass, wolf, rabbit, cat, likes, ball, water, sheep, salmon, mice, lion, dog, sleep, coyote, rodent, and squirrel.

Sequences of Word SDRs
Query: using the same training set, present "fox" eats ? and ask what word the network predicts next.

Sequences of Word SDRs
Answer: "fox" eats rodent.
- Word SDRs are created unsupervised.
- Semantic generalization: the SDRs carry lexical meaning, and the CLA captures grammatical structure.
- Commercial applications: sentiment analysis, abstraction, improved text to speech, dialog, reporting, etc.
www.Cept.at

CEPT and Grok use the exact same code base.

NuPIC Open Source Project (created July 2013)
Source code for:
- Cortical Learning Algorithm
- Encoders
- Support libraries
Single source tree (used by Grok), GPLv3.
Active and growing community:
- 80 contributors as of 2/2014
- 533 mailing list subscribers
Education resources / hackathons
www.Numenta.org

Goals For 2014
Research
- Implement and test L4 sensory/motor inference
- Introduce hierarchy
- Publish
NuPIC
- Grow open source community
- Support partners, e.g. IBM Almaden, CEPT
Grok
- Demonstrate commercial value for CLA
- Attract resources (people and money)
- Provide a "target" and market for HW
- Explore new application areas

The Limits of Software
Today's (highly optimized) software implementation: one layer, no hierarchy; 2,000 mini-columns; 30 cells per mini-column; up to 128 dendrite segments per cell; up to 40 active synapses per segment; thousands of potential synapses per segment. That is roughly one millionth of human cortex, and it takes about 50 ms per inference/training step.
We need hardware for the usual reasons: speed, power, volume. A back-of-the-envelope tally of these numbers follows.
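A quick tally of the figures on this slide (the per-cell and per-segment values are "up to" limits, so this overstates the typical case):

    columns = 2000
    cells_per_column = 30
    segments_per_cell = 128            # "up to"
    active_synapses_per_segment = 40   # "up to"; each segment also has thousands of potential synapses

    cells = columns * cells_per_column
    segments = cells * segments_per_cell
    active_synapses = segments * active_synapses_per_segment

    print(cells)             # 60,000 cells
    print(active_synapses)   # ~307 million active synapses, yet roughly one millionth of human cortex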


Start of supplemental material
A more detailed view of Ray and Murray's idea (walk through slowly; end on CPGs, central pattern generators).
Ray Guillery and Murray Sherman's "efference copy": I don't know if this is true everywhere, but I like it, and it is consistent with the idea that cortex is cortex. It begs the question of how to interpret it: how do we understand these connections from a theoretical point of view? It is easier to imagine M1 driving muscles, but how can mid- and high-level connections drive motor behavior?

Requirements for Online Learning
- Train on every new input.
- If a pattern does not repeat, forget it; if it repeats, reinforce it.
- Learning is the formation of connections.
Connection strength/weight is binary (connected or unconnected), while connection permanence is a scalar. Training changes the permanence; once permanence exceeds a threshold (0.2 in the figure), the synapse is connected. A sketch of this rule follows.
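A minimal sketch of this learning rule; the 0.2 threshold comes from the slide, while the increment and decrement sizes are assumptions of mine:

    CONNECTED_THRESHOLD = 0.2

    def train(permanences, active_inputs, inc=0.05, dec=0.03):
        """Train on every input: potential synapses to active inputs are reinforced,
        the rest decay (patterns that do not repeat are eventually forgotten).
        Weights stay binary; only the scalar permanence changes."""
        for i in permanences:
            if i in active_inputs:
                permanences[i] = min(1.0, permanences[i] + inc)
            else:
                permanences[i] = max(0.0, permanences[i] - dec)

    def connected_synapses(permanences):
        """A synapse is connected (weight 1) once its permanence crosses the threshold."""
        return {i for i, p in permanences.items() if p >= CONNECTED_THRESHOLD}

    # Example: potential synapses to inputs 0..9, all starting just below threshold.
    perms = {i: 0.15 for i in range(10)}
    for _ in range(3):
        train(perms, active_inputs={1, 4, 7})
    print(connected_synapses(perms))   # {1, 4, 7}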