Future directions in computer science research John Hopcroft Department of Computer Science Cornell University Heidelberg Laureate Forum Sept 27, 2013.


Time of change: The information age is a revolution that is changing all aspects of our lives. Those individuals, institutions, and nations who recognize this change and position themselves for the future will benefit enormously.

Computer Science is changing. Early years: programming languages, compilers, operating systems, algorithms, databases. Emphasis on making computers useful.

Computer Science is changing. The future years: tracking the flow of ideas in scientific literature; tracking evolution of communities in social networks; extracting information from unstructured data sources; processing massive data sets and streams; extracting signals from noise; dealing with high dimensional data and dimension reduction. The field will become much more application oriented.

Computer Science is changing. Drivers of change: merging of computing and communication; the wealth of data available in digital form; networked devices and sensors.

Implications for Theoretical Computer Science: need to develop theory to support the new directions; update computer science education.

Theory to support new directions: large graphs; spectral analysis; high dimensions and dimension reduction; clustering; collaborative filtering; extracting signal from noise; sparse vectors; learning theory.

Sparse vectors: There are a number of situations where sparse vectors are important: tracking the flow of ideas in scientific literature, biological applications, signal processing.

Sparse vectors in biology (plants). Genotype: internal code. Phenotype: observables, outward manifestation.

Digitization of medical records. Doctor: needs my entire medical record. Insurance company: needs my last doctor visit, not my entire medical record. Researcher: needs statistical information but no identifiable individual information. Relevant research: zero knowledge proofs, differential privacy.

A zero knowledge proof of a statement is a proof that the statement is true without providing you any other information.

Zero knowledge proof: graph 3-colorability. The problem is NP-hard: no polynomial time algorithm exists unless P = NP.

Zero knowledge proof
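The transcript does not preserve the protocol shown on this slide. As an illustration only, the sketch below follows the standard commit, challenge, reveal protocol for graph 3-colorability: the prover commits to a randomly relabelled coloring, the verifier asks to see the two endpoints of one edge, and the prover opens just those two commitments. The function names (`commit`, `zk_round`, `run_protocol`) and the use of SHA-256 hash commitments are assumptions made for this example, not taken from the talk.

```python
import hashlib
import random
import secrets


def commit(color: int, nonce: bytes) -> str:
    """Hash commitment to a color; the random nonce hides the committed value."""
    return hashlib.sha256(nonce + bytes([color])).hexdigest()


def zk_round(edges, coloring, rng):
    """Run one commit-challenge-reveal round; return True if the verifier accepts."""
    # Prover: relabel the three colors at random, so the two colors revealed
    # in this round say nothing about the coloring used in other rounds.
    perm = [0, 1, 2]
    rng.shuffle(perm)
    shuffled = {v: perm[c] for v, c in coloring.items()}
    nonces = {v: secrets.token_bytes(16) for v in shuffled}
    commitments = {v: commit(shuffled[v], nonces[v]) for v in shuffled}

    # Verifier: challenge a single edge chosen at random.
    # (A real verifier would use a cryptographically secure source here.)
    u, w = rng.choice(edges)

    # Prover: open only the two commitments on the challenged edge.
    opened = {v: (shuffled[v], nonces[v]) for v in (u, w)}

    # Verifier: openings must match the commitments and the colors must differ.
    for v, (color, nonce) in opened.items():
        if commit(color, nonce) != commitments[v]:
            return False
    return opened[u][0] != opened[w][0]


def run_protocol(edges, coloring, rounds, seed=0):
    """Accept only if every round is accepted."""
    rng = random.Random(seed)
    return all(zk_round(edges, coloring, rng) for _ in range(rounds))
```

A prover without a valid coloring is caught in each round with probability at least 1/|E|, so repeating the round many times drives the chance of cheating successfully toward zero, while the verifier only ever sees two distinct, randomly relabelled colors.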

Digitization of medical records is not the only system: car and road (GPS, privacy); supply chains; transportation systems.

In the past, sociologists could study groups of a few thousand individuals. Today, with social networks, we can study interaction among hundreds of millions of individuals. One important activity is studying how communities form and evolve.

Future work: consider communities with more external edges than internal edges; find small communities; track communities over time; develop appropriate definitions for communities; understand the structure of different types of social networks.

Our view of a community. Communities around me: TCS, colleagues at Cornell, classmates, family and friends. More connections outside than inside.

Structure of communities: How many communities is a person in? Small, medium, large? How many seed points are needed to uniquely specify a community a person is in? Which seeds are good seeds? Etc.

What types of communities are there? How do communities evolve over time? Are all social networks similar?

Are the underlying graphs for social networks similar or do we need different algorithms for different types of networks? G(1000,1/2) and G(1000,1/4) are similar, one is just denser than the other. G(2000,1/2) and G(1000,1/2) are similar, one is just larger than the other.


Two G(n,p) graphs are similar even though they have only 50% of edges in common. What do we mean mathematically when we say two graphs are similar?
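The 50% overlap can be checked empirically. The sketch below is a minimal illustration, assuming two independent samples of G(n, 1/2) on the same vertex set; the function names `gnp_edges` and `shared_fraction` are hypothetical.

```python
import random
from itertools import combinations


def gnp_edges(n, p, rng):
    """Sample G(n, p): include each of the n-choose-2 vertex pairs independently with probability p."""
    return {pair for pair in combinations(range(n), 2) if rng.random() < p}


def shared_fraction(edges_a, edges_b):
    """Fraction of the edges of the first graph that also appear in the second."""
    return len(edges_a & edges_b) / len(edges_a)


rng = random.Random(0)
g1 = gnp_edges(200, 0.5, rng)
g2 = gnp_edges(200, 0.5, rng)
print(f"edges in common: {shared_fraction(g1, g2):.2f}")
```

Each edge of the first graph appears in the second with probability p, so for p = 1/2 about half the edges are shared, even though both graphs have the same density and the same statistical structure.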

Theory of Large Graphs: large graphs with billions of vertices; exact edges present not critical; invariant to small changes in definition; must be able to prove basic theorems.

Erdős-Rényi: n vertices; each of the $\binom{n}{2}$ potential edges is present independently with probability p. Vertex degrees follow a binomial distribution, $\binom{N}{n} p^{n} (1-p)^{N-n}$. (Figure: number of vertices versus vertex degree.)
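The transcript does not define the symbols in the binomial expression on this slide. One consistent reading, offered here as an interpretation, takes the number of trials to be the n − 1 potential neighbours of a fixed vertex v and writes k for its degree:

```latex
\[
\Pr[\deg(v) = k] = \binom{n-1}{k}\, p^{k} (1-p)^{\,n-1-k},
\qquad
\mathbb{E}[\deg(v)] = (n-1)\,p .
\]
```

For example, under this reading a vertex of G(1000, 1/2) has expected degree 999 × 1/2 = 499.5.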

Generative models for graphs: vertices and edges are added at each unit of time; a rule determines where to place edges, either uniform probability or preferential attachment, which gives rise to power law degree distributions.

Preferential attachment gives rise to the power law degree distribution common in many graphs. (Figure: number of vertices versus vertex degree.)
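A small simulation makes the heavy tail visible. This is a sketch under simplifying assumptions not stated in the talk: the graph starts from a single edge, and each new vertex adds exactly one edge to an existing vertex chosen with probability proportional to its degree. The function name `preferential_attachment` is hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(num_vertices, rng):
    """Grow a graph one vertex at a time and return a Counter of vertex degrees."""
    # Every edge contributes both of its endpoints to this list, so a uniform
    # draw from the list picks a vertex with probability proportional to its degree.
    endpoints = [0, 1]                 # start from the single edge 0-1
    degree = Counter({0: 1, 1: 1})
    for new_vertex in range(2, num_vertices):
        target = rng.choice(endpoints)
        endpoints.extend((new_vertex, target))
        degree[new_vertex] += 1
        degree[target] += 1
    return degree


degrees = preferential_attachment(2000, random.Random(0))
histogram = Counter(degrees.values())   # degree -> number of vertices with that degree
print("largest degree:", max(degrees.values()))
print("vertices of degree 1:", histogram[1])
```

Most vertices end up with degree 1 while a few early vertices collect many edges, which is the qualitative shape of a power law degree distribution; a uniform attachment rule would not produce such hubs.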

Protein interactions: 2730 proteins in database; 3602 interactions between proteins. Science 1999 July 30; 285: Only 899 proteins in components. Where are the 1851 missing proteins?

Protein interactions: 2730 proteins in database; 3602 interactions between proteins. Science 1999 July 30; 285:

Science Base: What do we mean by science base? Example: high dimensions.

High dimension is fundamentally different from 2 or 3 dimensional space.

High dimensional data is inherently unstable. Given n random points in d-dimensional space, essentially all $\binom{n}{2}$ pairwise distances are equal.
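The concentration of distances is easy to observe numerically. The sketch below assumes the random points are drawn from a standard Gaussian in each coordinate (the slide does not say which distribution is meant), and the function name `distance_spread` is hypothetical.

```python
import math
import random
from itertools import combinations


def distance_spread(n, d, rng):
    """Return (min, max) pairwise Euclidean distance among n Gaussian points in d dimensions."""
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    distances = [math.dist(p, q) for p, q in combinations(points, 2)]
    return min(distances), max(distances)


rng = random.Random(1)
for d in (2, 1000):
    low, high = distance_spread(20, d, rng)
    print(f"d={d}: max/min distance ratio = {high / low:.2f}")
```

In two dimensions the largest pairwise distance is many times the smallest; in 1000 dimensions every pairwise distance is close to the square root of 2d, so the ratio is close to one.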

High Dimensions: Intuition from two and three dimensions is not valid for high dimensions. The volume of the unit cube is one in all dimensions. The volume of the unit sphere goes to zero as the dimension grows.
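The shrinking volume follows from the closed form for the volume of a radius-one ball in d dimensions, V(d) = π^(d/2) / Γ(d/2 + 1). The snippet below evaluates it; taking the radius to be one is an assumption, and the function name `unit_ball_volume` is hypothetical.

```python
import math


def unit_ball_volume(d):
    """Volume of the radius-1 ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


for d in (2, 3, 5, 10, 20, 100):
    print(f"d={d:3d}  volume={unit_ball_volume(d):.3e}")
```

The volume rises until d = 5 and then decreases toward zero, because the Gamma function in the denominator eventually grows faster than any power of π, while the unit cube keeps volume one throughout.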

Gaussian distribution: probability mass concentrated between dotted lines.

Gaussian in high dimensions

Two Gaussians

Distance between two random points from the same Gaussian: assuming unit variance in each of the d coordinates, points lie on a thin annulus of radius $\sqrt{d}$. Approximate by a sphere of radius $\sqrt{d}$. Average distance between two points is $\sqrt{2d}$. (Place one point at the North Pole, the other point at random. Almost surely, the second point will be near the equator.)

Expected distance between points from two Gaussians separated by δ
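The formula on this slide did not survive in the transcript. The expected squared distance can nevertheless be worked out directly, assuming (this is not stated in the transcript) two spherical Gaussians in d dimensions with unit variance in every coordinate and centers $\mu_1$, $\mu_2$ at distance δ:

```latex
\begin{align*}
x &= \mu_1 + g, \qquad y = \mu_2 + h, \qquad g, h \sim \mathcal{N}(0, I_d) \text{ independent},\\
\mathbb{E}\,|x - y|^2
  &= |\mu_1 - \mu_2|^2 + 2\,(\mu_1 - \mu_2)\cdot \mathbb{E}[g - h] + \mathbb{E}\,|g - h|^2 \\
  &= \delta^2 + 0 + 2d .
\end{align*}
```

Under these assumptions the typical distance between points from different Gaussians is about $\sqrt{\delta^2 + 2d}$, compared with about $\sqrt{2d}$ for two points from the same Gaussian (the case δ = 0).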

Can separate points from two Gaussians if

Dimension reduction: project points onto subspace containing centers of Gaussians. Reduce dimension from d to k, the number of Gaussians.
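The effect of the projection can be illustrated in the simplest case of two Gaussians. The sketch below assumes unit-variance spherical Gaussians whose centers are known in advance, with one center at the origin, so the relevant subspace is the line through the two centers; in practice the subspace would have to be estimated from the data. The function names are hypothetical.

```python
import math
import random


def sample(center, rng):
    """Draw one point from a unit-variance spherical Gaussian around `center`."""
    return [c + rng.gauss(0.0, 1.0) for c in center]


def project(point, origin, unit):
    """Signed coordinate of `point` along the unit direction `unit`, measured from `origin`."""
    return sum((x - o) * u for x, o, u in zip(point, origin, unit))


def projected_means(d, delta, count, rng):
    """Mean projected coordinate of `count` samples from each of two Gaussians delta apart."""
    center_a = [0.0] * d
    center_b = [delta / math.sqrt(d)] * d   # at Euclidean distance delta from center_a
    unit = [1.0 / math.sqrt(d)] * d         # unit vector pointing from center_a to center_b
    proj_a = [project(sample(center_a, rng), center_a, unit) for _ in range(count)]
    proj_b = [project(sample(center_b, rng), center_a, unit) for _ in range(count)]
    return sum(proj_a) / count, sum(proj_b) / count


mean_a, mean_b = projected_means(d=200, delta=6.0, count=400, rng=random.Random(2))
print(f"projected centers: {mean_a:.2f} and {mean_b:.2f}")
```

After projection the two clusters are still centered about δ apart, but each point's noise along the line has standard deviation one instead of a norm near the square root of d, which is why separation becomes possible at a δ that does not grow with the dimension.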

Centers retain separation. Average distance between points reduced by

Can separate Gaussians provided δ > some constant involving k and γ, independent of the dimension.

We have just seen what a science base for high dimensional data might look like. For what other areas do we need a science base?

Ranking is important: restaurants, movies, books, web pages. Multi-billion dollar industry. Collaborative filtering: when a customer buys a product, what else is he or she likely to buy? Dimension reduction. Extracting information from large data sources. Social networks.
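The collaborative filtering question can be made concrete with a very simple baseline: count how often pairs of items are bought together and recommend the most frequent companion. This is only an illustrative item co-occurrence sketch, not a method described in the talk; the purchase data and the function names `co_purchase_counts` and `also_bought` are hypothetical.

```python
from collections import Counter
from itertools import permutations


def co_purchase_counts(baskets):
    """For each item, count how often every other item appears in the same basket."""
    counts = {}
    for basket in baskets:
        for item, other in permutations(set(basket), 2):
            counts.setdefault(item, Counter())[other] += 1
    return counts


def also_bought(counts, item, top=1):
    """Items most often bought together with `item`, most frequent first."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top)]


# Hypothetical purchase histories, one list per customer.
baskets = [
    ["book", "lamp"],
    ["book", "lamp", "pen"],
    ["book", "pen"],
    ["lamp", "book"],
]
counts = co_purchase_counts(baskets)
print(also_bought(counts, "book"))
```

With real data the customer-by-item matrix is enormous and sparse, which is where dimension reduction enters: the raw counts are replaced by a low-dimensional representation of customers and items.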

This is an exciting time for computer science. There is a wealth of data in digital format, information from sensors, and social networks to explore. It is important to develop the science base to support these activities.

Remember that institutions, nations, and individuals who position themselves for the future will benefit immensely. Thank You!