Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University.

Slides:



Advertisements
Similar presentations
Future directions in computer science research John Hopcroft Department of Computer Science Cornell University Heidelberg Laureate Forum Sept 27, 2013.
Advertisements

The internet. Background Created in 1969, connected computers at UCLA, Stanford Research Institute, U. of Utah, and UC at Santa Barbara With an estimated.
Chapter 5: Introduction to Information Retrieval
INFO624 - Week 2 Models of Information Retrieval Dr. Xia Lin Associate Professor College of Information Science and Technology Drexel University.
Modern information retrieval Modelling. Introduction IR systems usually adopt index terms to process queries IR systems usually adopt index terms to process.
Future directions in computer science research 23rd International Symposium on Algorithms and Computation John Hopcroft Cornell University ISAAC.
Future directions in computer science research John Hopcroft Department of Computer Science Cornell University CINVESTAV-IPN Dec 2,2013.
An introduction to social networking Marga Navarrete 1 st June 2007.
Information Retrieval in Practice
Communities in Heterogeneous Networks Chapter 4 1 Chapter 4, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool,
Using Structure Indices for Efficient Approximation of Network Properties Matthew J. Rattigan, Marc Maier, and David Jensen University of Massachusetts.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Chapter 9 Graph algorithms. Sample Graph Problems Path problems. Connectedness problems. Spanning tree problems.
Placement of Integration Points in Multi-hop Community Networks Ranveer Chandra (Cornell University) Lili Qiu, Kamal Jain and Mohammad Mahdian (Microsoft.
Prénom Nom Document Analysis: Data Analysis and Clustering Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.
CSE 222 Systems Programming Graph Theory Basics Dr. Jim Holten.
SING* and ToNC * Scientific Foundations for Internet’s Next Generation Sirin Tekinay Program Director Theoretical Foundations Communication Research National.
Future Directions in Computer Science John Hopcroft Cornell University Ithaca, New York.
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Computer Vision I Instructor: Prof. Ko Nishino. Today How do we recognize objects in images?
Triangulation of network metaphors The Royal Netherlands Academy of Arts and Sciences Iina Hellsten & Andrea Scharnhorst Networked Research and Digital.
Information Retrieval
What’s The Difference??  Subject Directory  Search Engine  Deep Web Search.
Databases & Data Warehouses Chapter 3 Database Processing.
Λ14 Διαδικτυακά Κοινωνικά Δίκτυα και Μέσα
Discovery Education Streaming Overview. Log in Screen.
© Ramesh Jain Ramesh Jain CTO, PRAJA inc. and Professor Emeritus, UCSD Emergent Semantics and Experiential Computing.
CAS May 24, 2010 Creating a science base to support new directions in computer science John Hopcroft Cornell University Ithaca, New York.
Chapter 7 Web Content Mining Xxxxxx. Introduction Web-content mining techniques are used to discover useful information from content on the web – textual.
Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.
MapReduce and Graph Data Chapter 5 Based on slides from Jimmy Lin’s lecture slides ( (licensed.
Distributed Computing Rik Sarkar. Distributed Computing Old style: Use a computer for computation.
04/30/13 Last class: summary, goggles, ices Discrete Structures (CS 173) Derek Hoiem, University of Illinois 1 Image: wordpress.com/2011/11/22/lig.
“Graph theory” for the master degree program “Geographic Information Systems” Yulia Burkatovskaya Department of Computer Engineering Associate professor.
CHAPTER 1 THE READ/WRITE WEB Marquita Friend Resa Garvin October 17, 2012 EDUC 303.
Autumn Web Information retrieval (Web IR) Handout #1:Web characteristics Ali Mohammad Zareh Bidoki ECE Department, Yazd University
INTERNET. Objectives Explain the origin of the Internet and describe how the Internet works. Explain the difference between the World Wide Web and the.
Ch 14. Link Analysis Padmini Srinivasan Computer Science Department
Marina Drosou, Evaggelia Pitoura Computer Science Department
Digital Libraries Lillian N. Cassel Spring A digital library An informal definition of a digital library is a managed collection of information,
The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?
1 Pattern Recognition: Statistical and Neural Lonnie C. Ludeman Lecture 30 Nov 11, 2005 Nanjing University of Science & Technology.
1 1 COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani.
Web Information Retrieval Prof. Alessandro Agostini 1 Context in Web Search Steve Lawrence Speaker: Antonella Delmestri IEEE Data Engineering Bulletin.
V. Clustering 인공지능 연구실 이승희 Text: Text mining Page:82-93.
Web Search and Text Mining Lecture 5. Outline Review of VSM More on LSI through SVD Term relatedness Probabilistic LSI.
L&I SCI 110: Information science and information theory Instructor: Xiangming(Simon) Mu Sept. 9, 2004.
Information Retrieval and Web Search Link analysis Instructor: Rada Mihalcea (Note: This slide set was adapted from an IR course taught by Prof. Chris.
About Me Swaroop Butala  MSCS – graduating in Dec 09  Specialization: Systems and Databases  Interests:  Learning new technologies  Application of.
DM GROUP MEETING PRESENTATION PLAN Eigenvector-based Centrality Measures For Temporal Networks by D Taylor et.al. Uncovering the Small Community.
Progress in New Computer Science Research Directions John Hopcroft Cornell University Ithaca, New York CAS May 21, 2010.
“Pajek”: Large Network Analysis. 2 Agenda Introduction Network Definitions Network Data Files Network Analysis 2.
Graphs, Vectors, and Matrices Daniel A. Spielman Yale University AMS Josiah Willard Gibbs Lecture January 6, 2016.
CS798: Information Retrieval Charlie Clarke Information retrieval is concerned with representing, searching, and manipulating.
1 CS 430: Information Discovery Lecture 8 Collection-Level Metadata Vector Methods.
Presented by: Siddhant Kulkarni Spring Authors: Publication:  ICDE 2015 Type:  Research Paper 2.
Image Retrieval and Ranking using L.S.I and Cross View Learning Sumit Kumar Vivek Gupta
Crawling When the Google visit your website for the purpose of tracking, Google does this with help of machine, known as web crawler, spider, Google bot,
Data mining in web applications
Information Storage and Retrieval Fall Lecture 1: Introduction and History.
INF 103 MART Successful Learning/inf103mart.com
Rocky K. C. Chang September 4, 2017
Dynamical Statistical Shape Priors for Level Set Based Tracking
INF 103 Education for Service-- tutorialrank.com
Information Retrieval on the World Wide Web
ISI Web of Knowledge Early updates
Rocky K. C. Chang September 3, 2018
Graph and Link Mining.
Unsupervised Learning and Clustering
Presentation transcript:

Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Time of change The information age is a revolution that is changing all aspects of our lives. Those individuals, institutions, and nations who recognize this change and position themselves for the future will benefit enormously. Xiamen University

Computer Science is changing Early years Programming languages Compilers Operating systems Algorithms Data bases Emphasis on making computers useful Xiamen University

Computer Science is changing The future years Tracking the flow of ideas in scientific literature Tracking evolution of communities in social networks Extracting information from unstructured data sources Processing massive data sets and streams Extracting signals from noise Dealing with high dimensional data and dimension reduction Xiamen University

Computer Science is changing Merging of computing and communication The wealth of data available in digital form Networked devices and sensors Drivers of change Xiamen University

Implications for TCS Need to develop theory to support the new directions Update computer science education Xiamen University

A short view of the future Xiamen University

Digitization of medical records Doctor – needs my entire medical record Insurance company – needs my last doctor visit, not my entire medical record Researcher – needs statistical information but no identifiable individual information Relevant research – zero knowledge proofs, differential privacy Xiamen University

Zero knowledge proof Graph 3-colorability Problem is NP-hard - No polynomial time algorithm unless P=NP Xiamen University

Zero knowledge proof Xiamen University

Digitization of medical records is not the only system Car and road – gps – privacy Supply chains Transportation systems Xiamen University

NEXRAD Radar Binghamton, Base Reflectivity 0.50 Degree Elevation Range 124 NMI — Map of All US Radar Sites Map of All US Radar Sites Animate Map Storm TracksTotal PrecipitationShow SevereRegional RadarZoom Map Click:Zoom InZoom OutPan Map ( Full Zoom Out ) Full Zoom Out »AdvancedRadarTypesCLICK»»AdvancedRadarTypesCLICK»

Xiamen University

When will my bus arrive? Xiamen University

IN 4 TO 5 MINUTES Xiamen University

Tracking the flow of ideas in scientific literature Yookyung Jo Page rank Web Link Graph Retrieval Query Search Text Web Page Search Rank Web Chord Usage Index Probabilistic Text File Retrieve Text Index Discourse Word Centering Anaphora Xiamen University

In the past, sociologists could study group of a few thousand individuals. Today with social networks we can study interaction among millions of individuals. One important activity is how communities form and evolve. Xiamen University

Early work  Min cut – two equal size communities  Conductance – minimizes cross edges Future work  Consider communities with more external edges than internal edges  Find small communities  Track communities over time  Develop appropriate definitions for communities  Understand the structure of different types of social networks Xiamen University

Our view of a community TCS Me Colleagues at Cornell Classmates Family and friends More connections outside than inside Xiamen University

On going research on finding communities Xiamen University

Spectral clustering with K-means. Xiamen University

Spectral clustering with K-means. Xiamen University

Spectral clustering with K-means. Xiamen University

Instead of two overlapping clusters, we find three clusters. Xiamen University

Instead of clustering the rows of the singular vectors, find the minimum 0- norm vector in the space spanned by the singular vectors. The minimum 0-norm vector is, of course, the all zero vector, so we will require one component to be 1. Xiamen University

Finding the minimum 0-norm vector is NP-hard. Use the minimum 1-norm vector as a proxy. Xiamen University

Minimum 1-norm vector is not an indicator vector. By thresh-holding the components, convert it to an indicator vector for the community. Xiamen University

Random walk How long? What dimension? Xiamen University

Krylov subspace Find orthonormal basis. Update subspace in each step rather than just the probability vector Xiamen University

Find minimum 1-norm vector in Krylov subspace. Actually allow vector to be close to subspace. Xiamen University

Structure of communities How many communities is a person in? Small, medium, large How many seed points are needed to uniquely specify a community a person is in? Which seeds are good seeds? Etc. Xiamen University

What types of communities are there? How do communities evolve over time? Xiamen University

This is an exciting time for computer science. There is a wealth of data in digital format, information from sensors, and social networks to explore. It is important to develop the science base to support these activities. Xiamen University