Science & Technology Centers Program Center for Science of Information Bryn Mawr Howard MIT Princeton Purdue Stanford Texas A&M UC Berkeley UC San Diego.

Presentation transcript:

Center for Science of Information: From Information Reduction to Knowledge Extraction
Big Data Workshop, Waikiki, March 18-20
National Science Foundation / Science & Technology Centers Program / December 2012
Bryn Mawr, Howard, MIT, Princeton, Purdue, Stanford, Texas A&M, UC Berkeley, UC San Diego, UIUC

Advance science and technology through a new quantitative understanding of the representation, communication, and processing of information in biological, physical, social, and engineering systems.

Some specific Center goals:
• define core theoretical principles governing the transfer of information,
• develop metrics and methods for information,
• apply them to problems in the physical and social sciences and in engineering,
• offer a venue for multi-disciplinary, long-term collaborations,
• explore effective ways to educate students,
• train the next generation of researchers,
• broaden participation of underrepresented groups,
• transfer advances in research to education and industry.

[Diagram: Research, Education & Diversity, Knowledge Transfer]

Research Thrusts:
1. Life Sciences
2. Communication
3. Knowledge Extraction (Big/Small Data)

S. Subramaniam, A. Grama, David Tse, T. Weissman, S. Kulkarni, M. Atallah

Create a shared intellectual space, integral to the Center’s activities, providing a collaborative research environment that crosses disciplinary and institutional boundaries.

Big Data characteristics:
• Large (peta- and exa-scale)
• Noisy (high rate of false positives and negatives)
• Multiscale (interaction at vastly different levels of abstraction)
• Dynamic (temporal and spatial changes)
• Heterogeneous (high variability over space and time)
• Distributed (collected and stored at distributed locations)
• Elastic (flexibility in data model and clustering capabilities)

Small Data: a limited amount of data, often insufficient to extract information (e.g., classification of Twitter messages).

Some characteristics of analysis techniques:
• Results are typically probabilistic (formulations must quantify and optimize statistical significance; deterministic formulations on noisy data are not meaningful).
• Overfitting to noisy data is a major problem.
• Distribution-agnostic formulations (say, based on simple counts and frequencies) are not meaningful.
• Solutions must be rigorously validated.
• Dynamic and heterogeneous datasets require a significant formal basis: ad-hoc solutions do not work at scale!

We need new foundations and fundamental results.
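To make the point about raw counts concrete, here is a minimal sketch (not from the slides) that scores the same observed count against two different background rates using an exact binomial tail. The function name `binomial_tail` and the background rates are assumptions chosen purely for illustration.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# The same raw count (10 hits in 100 trials) under two hypothetical
# background rates. Only the assumed distribution differs.
rare = binomial_tail(10, 100, 0.01)    # background rate 1%: very surprising
common = binomial_tail(10, 100, 0.10)  # background rate 10%: unremarkable

print(f"p-value at 1% background:  {rare:.3g}")
print(f"p-value at 10% background: {common:.3g}")
```

The count alone is identical in both cases; only the model of the background decides whether it is significant, which is why a distribution-agnostic formulation can mislead.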

1. Data Representation and Analysis (Big Data)

• Succinct data structures (compressed structures that support operations efficiently)
  – Metrics: trade-off between storage efficiency and query cost
  – Validation on specific data (sequences, networks, structures, high-dimensional sets)
  – Streaming data (correlation and anomaly detection, semantics, association)
  – Generative models for data (dynamic model generation)
• Identifying statistically correlated graph modules (e.g., given a graph with edges weighted by node correlations and a generative model for graphs, identify the most statistically significantly correlated sub-networks)
• Information-theoretic methods for inference of causality from data and modeling of agents
• Complexity of flux models/networks
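The storage versus query-cost trade-off can be illustrated with a rank query over a bit vector, a standard building block of succinct data structures. This is a minimal sketch under stated assumptions: the class `RankBitVector` and its `block` parameter are hypothetical names, and the bits are held in a plain Python list rather than packed into machine words, so it shows the trade-off but is not itself space-efficient.

```python
class RankBitVector:
    """Bit vector with sampled prefix counts to speed up rank queries.

    Smaller `block` stores more samples (more space, faster queries);
    larger `block` stores fewer samples (less space, slower queries).
    """

    def __init__(self, bits, block=64):
        self.bits = list(bits)
        self.block = block
        # samples[j] = number of 1 bits before position j * block
        self.samples = [0]
        for start in range(0, len(self.bits), block):
            ones = sum(self.bits[start:start + block])
            self.samples.append(self.samples[-1] + ones)

    def rank1(self, i):
        """Return the number of 1 bits in positions [0, i)."""
        b = i // self.block
        # One table lookup plus a scan of at most `block` bits.
        return self.samples[b] + sum(self.bits[b * self.block:i])


bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
bv = RankBitVector(bits, block=4)
print(bv.rank1(7))  # ones among the first 7 bits
```

Each query touches one sample and scans at most `block` bits, so `block` directly sets where the structure sits between storage overhead and query time.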

2. Challenges in Life Science

• Sequence analysis
  – Reconstructing sequences for emerging nanopore sequencers
  – Optimal error correction of NGS reads
• Genome-wide associations
  – Rigorous approach to associations while accurately quantifying the prior in data
• Darwin Channel and Repetitive Channel
• Semantic and syntactic integration of data from different datasets (with the same or different abstractions)
• Construction of flux models guided by measures of the information-theoretic complexity of models
• Quantification of query response quality in the presence of noise
• Rethinking interactions and signals from fMRI
• Develop an in-vivo system to monitor the animal brain in 3D (this would make it possible to see the interaction between different cells in the cortex: flow of information, structural relations)
