
Science & Technology Centers Program
Center for Science of Information
Bryn Mawr | Howard | MIT | Princeton | Purdue | Stanford | Texas A&M | UC Berkeley | UC San Diego | UIUC

Frontiers of Science of Information
Wojciech Szpankowski
IS4IS Summit, Vienna
National Science Foundation / Science & Technology Centers Program

Science & Technology Centers Program
1. Science of Information & Center Mission
2. Post-Shannon Challenges
3. Center Team
4. Research
   – Fundamentals: Structural Information
   – Data Science: Data Models & Inference
   – Life Sciences: Protein Folding

Science & Technology Centers Program
The Information Revolution started in 1948 with the publication of Claude Shannon's A Mathematical Theory of Communication; the digital age began.
Shannon information quantifies the extent to which a recipient of data can reduce its statistical uncertainty ("semantic aspects of communication are irrelevant...").
Objective: reliably reproducing data — fundamental limits for storage and communication.
Applications enabler/driver: CD, iPod, DVD, video games, Internet, Facebook, WiFi, mobile, Google, ...
Design driver: universal data compression, voiceband modems, CDMA, multi-antenna systems, discrete denoising, space-time codes, cryptography, distortion-theory approaches to Big Data.
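For reference, the standard quantities behind "reducing statistical uncertainty" (a brief addition; they are not spelled out on the slide) are entropy and mutual information:

```latex
H(X) = -\sum_{x} p(x)\log p(x),
\qquad
I(X;Y) = H(X) - H(X\mid Y),
```

where the mutual information I(X;Y) measures exactly how much observing Y reduces the uncertainty about X.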

Science & Technology Centers Program
Claude Shannon laid the foundation of information theory, demonstrating that problems of data transmission and compression can be precisely modeled, formulated, and analyzed.
SCIENCE OF INFORMATION builds on Shannon's principles to address key challenges in understanding information that nowadays is not only communicated but also acquired, curated, organized, aggregated, managed, processed, suitably abstracted and represented, analyzed, inferred, valued, secured, and used in various scientific, engineering, and socio-economic processes.
Georg Cantor: "In re mathematica ars proponendi quaestionem pluris facienda est quam solvendi." (In mathematics the art of proposing a question must be held of higher value than solving it.)

Science & Technology Centers Program
– Extend information theory to meet new challenges in biology, economics, data & social sciences, and distributed systems.
– Understand new aspects of information embedded in structure, time, space, semantics, dynamic information, limited resources, complexity, representation-invariant information, and cooperation & dependency.
CSoI MISSION: Advance science and technology through a new quantitative understanding of the representation, communication, and processing of information in biological, physical, social, and engineering systems.

Science & Technology Centers Program
1. Science of Information & Center Mission
2. Post-Shannon Challenges
3. Center Team
4. Research
   – Fundamentals: Structural Information
   – Data Science: Data Models & Inference
   – Life Sciences: Protein Folding

Science & Technology Centers Program
Structure: Measures are needed for quantifying information embodied in structures (e.g., information in material structures, nanostructures, biomolecules, gene regulatory networks, protein networks, social networks, financial transactions). [F. Brooks, JACM, 2003]
– Szpankowski, Choi, Magner: information contained in unlabeled graphs & universal graphical compression.
– Grama & Subramaniam: quantifying the role of noise and incomplete data, identifying conserved structures, finding orthologies in biological network reconstruction.
– Neville: outlining characteristics (e.g., weak dependence) sufficient for network models to be well-defined in the limit.
– Yu & Qi: finding distributions of latent structures in social networks.
– Szpankowski, Baryshnikov, & Duda: structure of Markov fields and optimal compression.
– Szpankowski, Wolpert: Network-Constraints Networks (NCO).

Science & Technology Centers Program
Time: Classical information theory is at its weakest in dealing with problems of delay (e.g., information arriving late may be useless or has less value).
– Verdu, Polyanskiy, Kostina: major breakthrough in extending Shannon's capacity theorem to finite-blocklength information theory and to lossy data compression.
– Kumar: designing reliable scheduling policies with delay constraints for wireless networks; a new axiomatic approach to secure wireless networks.
– Weissman: real-time coding systems with lookahead, used to investigate the impact of delay on expected distortion and bit rate.
– Subramaniam: reconstructing networks from dynamic biological data.
Representation-invariance: How can we determine whether two representations of the same information are informationally equivalent?
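For context (a standard statement of these finite-blocklength results, added here as a reference rather than taken from the slide): for channel coding at blocklength n and error probability ε the maximal code size satisfies the normal approximation

```latex
\log M^*(n,\epsilon) \approx nC - \sqrt{nV}\,Q^{-1}(\epsilon),
```

where C is the capacity, V the channel dispersion, and Q^{-1} the inverse Gaussian tail function; analogous expansions hold for lossy compression with the rate-distortion function R(d) and a rate-dispersion term in place of C and V.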

Science & Technology Centers Program
Limited Resources: In many scenarios, information is limited by available computational resources (e.g., a cell phone, a living cell).
– Bialek works on the structure of molecular networks that optimize information flow, subject to constraints on the total number of molecules being used.
– Verdu and Goldsmith investigate the minimum energy per bit as a function of data length in Gaussian channels.
Semantics: Is there a way to account for the meaning or semantics of information?
– Sudan argues that the meaning of information becomes relevant whenever there is diversity across communicating parties and when the parties themselves evolve over time. New collaboration between Sudan and Tse on human communication.
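As a reference point for the energy-per-bit question (this is the classical asymptotic limit, not a result from the slide): for the additive white Gaussian noise channel,

```latex
\left.\frac{E_b}{N_0}\right|_{\min} = \ln 2 \approx 0.693 \;\;(\approx -1.59\ \text{dB}),
```

and the work cited above quantifies how much more energy per bit is required when only a finite number of information bits is transmitted.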

Science & Technology Centers Program
Learnable Information (Big Data): Data-driven science focuses on extracting information from data. How much information can actually be extracted from a given data repository? Is there an information theory of Big Data?
The big-data domain exhibits characteristic properties:
– Large (peta- and exa-scale)
– Noisy (high rate of false positives and negatives)
– Multiscale (interactions at different levels of abstraction)
– Dynamic (temporal and spatial changes)
– Heterogeneous (high variability over space and time)
– Distributed (collected and stored at distributed locations)
– Elastic (flexible data models and clustering capabilities)
– Complex dependencies (structural, long-term)
– High-dimensional
Ad hoc solutions do not work at scale! "Big data has arrived but big insights have not." (T. Harford, Financial Times)

Science & Technology Centers Program
Cooperation & Dependency: How does cooperation impact information (nodes should cooperate in their own self-interest)?
– Cover initiated a theory of cooperation and coordination in networks: the study of the achievable joint distributions among network nodes when the communication rates are given.
– Dependency and rational expectation are critical ingredients in Sims' work on modern dynamic economic theory.
– Coleman is studying statistical causality in neural systems using Granger principles.
Quantum Information: The flow of information in macroscopic and microscopic systems may possess different characteristics. Aaronson and Shor lead these investigations; Aaronson develops the computational complexity of linear systems.

Science & Technology Centers Program
Economic systems share many features with communication networks: acquiring, storing, sharing, and processing information. Yet one crucial feature that distinguishes economic networks from reliable communication is the value of information. A major challenge in economics is to formalize the notion of information value for dynamic settings involving delay constraints.
Value of Information: For a lattice (L, ≤) and a ∈ L, define a value V(a) such that:
1. V(a) ≥ 0   (flow of information?)
2. If a ≤ b then V(a) ≤ V(b)
3. V is submodular: V(a ∨ b) + V(a ∧ b) ≤ V(a) + V(b) for all a, b ∈ L

Science & Technology Centers Program
Life of a bit: a copier has finite energy; information dissipates (slowly!). A DRAM controller does a refresh every 64 ms ( read-writes / year). Study: of Google servers have DRAM errors each year (median: errs/Mbit); the majority are data-path related.
Y. Polyanskiy and Y. Wu, "Dissipation of Information."


Science & Technology Centers Program
Framing the Foundation: Data, Information, Knowledge
– Data: a set of values of qualitative variables; individual pieces of information.
– Information: a measure of distinguishability.
– Knowledge: actionable information.

Science & Technology Centers Program
1. Information: Core Principles
   – Structural Information
   – Temporal Information
   – Value of Information
2. Communication & Control: Fundamental Limits
   – Flow of Information in Dynamic/Cooperative Networks
   – Provable Security
3. Data: Framing the Foundations
   – Information-Theoretic Models
   – Precise Complexity (Small Data)
   – Structural Insights into Data
4. Modeling and Analysis: Life Sciences
   – Sequence Analysis
   – Network Inference, Modeling, and Analysis
   – Information Flow in the Human Brain
   – From Energetics to Sequence and Conformations

Science & Technology Centers Program
1. Science of Information & Center Mission
2. Post-Shannon Challenges
3. Center Team
4. Research
   – Fundamentals: Structural Information
   – Data Science: Data Models & Inference
   – Life Sciences: Protein Folding

Science & Technology Centers Program
– Purdue University (lead): W. Szpankowski (PI)
– Bryn Mawr College: D. Kumar
– Howard University: C. Liu, L. Burge
– MIT: P. Shor (co-PI), M. Sudan
– Princeton University: S. Verdu (co-PI)
– Stanford University: A. Goldsmith (co-PI)
– Texas A&M: P.R. Kumar
– University of California, Berkeley: Bin Yu (co-PI)
– University of California, San Diego: S. Subramaniam
– UIUC: O. Milenkovic
– University of Hawaii: P. Santhanam
Also: R. Aguilar, M. Atallah, S. Datta, A. Grama, A. Mathur, J. Neville, D. Ramkrishna, L. Si, V. Rego, A. Qi, M. Ward, D. Xu, C. Liu, L. Burge, S. Aaronson, N. Lynch, R. Rivest, Y. Polyanskiy, W. Bialek, S. Kulkarni, C. Sims, G. Bejerano, T. Cover, A. Ozgur, T. Weissman, V. Anantharam, J. Gallant, T. Courtade, M. Mahoney, D. Tse, T. Coleman, Y. Baryshnikov, M. Raginsky, ...

Science & Technology Centers Program  Nobel Prize (Economics): Chris Sims  National Academies (NAS/NAE) – Bialek, Cover, Datta, Lynch, Kumar, Ramkrishna, Rice, Rivest, Shor, Sims, Verdu, Yu.  Turing award winner -- Rivest.  Shannon award winners -- Cover and Verdu.  Nevanlinna Prize (outstanding contributions in Mathematical Aspects of Information Sciences) – Sudan and Shor.  Richard W. Hamming Medal – Cover and Verdu.  Humboldt Research Award – Szpankowski.  Swartz Prize in Neuroscience – Bialek. 19

Science & Technology Centers Program
RESEARCH MISSION: Create a shared intellectual space, integral to the Center's activities, providing a collaborative research environment that crosses disciplinary and institutional boundaries.
RESEARCH THRUSTS:
1. Information & Communication
2. Knowledge Extraction (Data Science)
3. Life Sciences
S. Subramaniam, T. Weissman, J. Neville, A. Grama, S. Kulkarni, David Tse

Science & Technology Centers Program
1. Science of Information & Center Mission
2. Post-Shannon Challenges
3. Center Team
4. Research
   – Fundamentals: Structural Information
   – Data Science: Data Models & Inference
   – Life Sciences: Protein Folding

Science & Technology Centers Program
[Figure: a graph over European cities — Paris, Lyon, Strasbourg (France); Berlin, Strasburg, Dortmund (Germany); Gdansk, Warsaw (Poland) — shown as a labeled structure and as an unlabeled structure.]
Structure vs. labeled structure: How much information is embedded in the structure and in the (correlated) labels? Fundamental new limits?


Science & Technology Centers Program
Information Content of Unlabeled Graphs: The structure model S of a graph G is its unlabeled version; several labeled graphs share the same structure.
Graph Entropy vs. Structural Entropy: The probability of a structure S is P(S) = N(S) · P(G), where N(S) is the number of different labeled graphs having that structure.
Y. Choi and W. Szpankowski, IEEE Trans. Information Theory, 2012.
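A minimal derivation of the entropy relation implied by P(S) = N(S) · P(G), assuming (as holds for Erdős–Rényi graphs) that all labeled graphs with the same structure are equiprobable:

```latex
H(S) = -\sum_{S} P(S)\log P(S)
     = -\sum_{S} N(S)\,P(G)\bigl[\log P(G) + \log N(S)\bigr]
     = H(G) - \mathbb{E}\bigl[\log N(S)\bigr],
```

so the structural (unlabeled) entropy equals the labeled graph entropy reduced by the expected log-number of labelings per structure.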


Science & Technology Centers Program
Theorem [Choi, Szpankowski, 2012]. Let L(S) be the code length. Then for Erdős–Rényi graphs:
(i) Lower bound: …
(ii) Upper bound: …
where c is an explicitly computable constant and the remaining term is a fluctuating function with a small amplitude or zero.

Science & Technology Centers Program
1. Science of Information & Center Mission
2. Post-Shannon Challenges
3. Center Team
4. Research
   – Fundamentals: Structural Information
   – Data Science: Data Models & Inference
   – Life Sciences: Protein Folding

Science & Technology Centers Program
Recommendation systems make suggestions based on prior information (e.g., search history stored on a server); recommendations are usually lists ranked by likelihood.
Distributed data:
– The original (large) database is compressed.
– The compressed version can be stored at several locations.
– Queries about the original database can be answered reliably from a compressed version of it.
Databases may be compressed for the purpose of answering queries of the form: "Is there an entry similar to y in the database?" Data is often processed for purposes other than reproduction of the original data — the new goal is to reliably answer queries rather than reproduce the data.
Courtade, Weissman, IEEE Trans. Information Theory, 2013.
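A toy illustration of answering a similarity query from a compressed version of a database (this is only a sketch built on random-hyperplane bit signatures; it is not the Courtade–Weissman scheme, and the dimensions, thresholds, and function names below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def signatures(X, planes):
    """Compress each database entry (row of X) to a short bit signature."""
    return (X @ planes.T) > 0                      # one boolean row per entry

def maybe_similar(sig_db, y, planes, max_mismatch):
    """Answer "is there an entry similar to y?" from the signatures alone."""
    sig_y = (planes @ y) > 0
    hamming = (sig_db != sig_y).sum(axis=1)        # signature disagreement per entry
    return bool((hamming <= max_mismatch).any())

d, k = 32, 64                                      # entry dimension, signature bits
database = rng.normal(size=(1000, d))
planes = rng.normal(size=(k, d))
sig_db = signatures(database, planes)              # the only thing kept in storage

near_duplicate = database[123] + 0.05 * rng.normal(size=d)
print(maybe_similar(sig_db, near_duplicate, planes, max_mismatch=8))     # expected: True
print(maybe_similar(sig_db, rng.normal(size=d), planes, max_mismatch=8)) # expected: False
```

The signatures occupy 64 bits per entry instead of 32 floats yet still support approximate answers to "is there an entry similar to y?"; the tradeoff between compression rate and query reliability is exactly what the identification-rate results on the next slide characterize.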

Science & Technology Centers Program
Fundamental tradeoff: What is the minimum description (compression) rate required to generate a quantifiably good set of beliefs and/or reliable answers?
General result: queries can be answered reliably if and only if the compression rate exceeds the identification rate.
Related results: quadratic similarity queries on compressed data; distributed (multiterminal) source coding under logarithmic loss.

Science & Technology Centers Program
1. Science of Information & Center Mission
2. Post-Shannon Challenges
3. Center Team
4. Research
   – Fundamentals: Structural Information
   – Data Science: Data Models & Inference
   – Life Sciences: Protein Folding

Science & Technology Centers Program
Protein Folds in Nature [Figure: probability of protein folds vs. sequence rank.]

Science & Technology Centers Program
Information-Theoretic Model: the sequence-to-fold map is treated as a protein-fold channel taking a sequence s (e.g., the HP string HPHHPPHPHHPPHHPH) to a fold f. The optimal input distribution is obtained from the Blahut–Arimoto algorithm.
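A minimal sketch of the Blahut–Arimoto iteration referenced here, computing the capacity and capacity-achieving input distribution of a discrete memoryless channel (the 2×3 channel matrix below is a made-up toy example, not the actual protein-fold channel):

```python
import numpy as np

def blahut_arimoto(W, tol=1e-10, max_iter=10_000):
    """Capacity (bits/use) and optimal input distribution for a DMC.

    W[x, y] = P(output = y | input = x); each row of W must sum to 1.
    """
    n_in = W.shape[0]
    p = np.full(n_in, 1.0 / n_in)               # start from the uniform input
    for _ in range(max_iter):
        q = p @ W                                # output distribution induced by p
        # D[x] = KL( W[x, :] || q ): how informative input symbol x is under p
        D = np.where(W > 0, W * np.log2(W / q), 0.0).sum(axis=1)
        p_new = p * np.exp2(D)
        p_new /= p_new.sum()
        if np.abs(p_new - p).max() < tol:
            p = p_new
            break
        p = p_new
    q = p @ W
    D = np.where(W > 0, W * np.log2(W / q), 0.0).sum(axis=1)
    return float(p @ D), p                       # mutual information at p, and p itself

# toy 2-input / 3-output channel (illustrative numbers only)
W = np.array([[0.80, 0.15, 0.05],
              [0.05, 0.15, 0.80]])
C, p_opt = blahut_arimoto(W)
print(f"capacity ≈ {C:.4f} bits/use, optimal input distribution {p_opt}")
```

By symmetry the toy channel's optimal input is uniform; for an asymmetric channel such as a real protein-fold channel the iteration would converge to a non-uniform distribution over sequences.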


Science & Technology Centers Program
– W_n: the set of self-avoiding walks of length n.
– E(w, s): the energy of a walk w over a sequence s.
We can prove that the capacity is governed by the free energy of this model, and that the free energy (hence the capacity) undergoes a phase transition.
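A small illustration of the objects on this slide under the standard 2D HP lattice model (a sketch under assumptions: the slide does not spell out the energy function, so the usual convention of −1 per non-consecutive H–H lattice contact is used here, and the sequence is made up):

```python
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def self_avoiding_walks(n):
    """All self-avoiding walks with n steps on Z^2, starting at the origin."""
    walks = []
    def grow(path):
        if len(path) == n + 1:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt not in path:          # O(n) membership test; fine for tiny n
                path.append(nxt)
                grow(path)
                path.pop()
    grow([(0, 0)])
    return walks

def hp_energy(walk, seq):
    """HP-model energy: -1 for every non-consecutive lattice contact between two H residues."""
    pos = {p: i for i, p in enumerate(walk)}
    energy = 0
    for i, (x, y) in enumerate(walk):
        for dx, dy in STEPS:
            j = pos.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[i] == "H" and seq[j] == "H":
                energy -= 1
    return energy

seq = "HPHHPPHPH"                                  # made-up 9-residue sequence (8 steps)
walks = self_avoiding_walks(len(seq) - 1)
best = min(walks, key=lambda w: hp_energy(w, seq))
print(len(walks), "self-avoiding walks; minimum energy =", hp_energy(best, seq))
```

For small n the full set W_n can be enumerated directly as above; the free-energy and capacity statements on the slide concern the behavior of this ensemble as n grows.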