Information Infrastructure for the Social Sciences in the 21st Century

Presentation transcript:

Information Infrastructure for the Social Sciences in the 21st Century A Talk in the Hubert M. Blalock, Jr. Memorial Lecture Series on Advanced Topics in Social Research at University of Michigan July 13, 1998

Emerging Computational Trends and the Quantitative Social Sciences Grid Technologies Document and Data Management Information Visualization Web Computing Scalable Computing

Behavioral and Social Sciences in the 21st Century (Philip Smith and Barbara Torrey, Science, Feb. 2, 1996): Integrate Current Data Sets; Improve the Coverage of Longitudinal Studies; Experiment with Nonlinear Dynamic Systems; Develop Comparable International Research; Integrate Quantitative and Qualitative Research to Advance New Theory

The Emerging Concept of a National Scale Information Power Grid http://science.nas.nasa.gov/Groups/Tools/IPG

The Grid Links People with Distributed Resources on a National Scale http://science.nas.nasa.gov/Groups/Tools/IPG

The National Center for Supercomputing Applications Is a Federal / State / University / Industry Funded Center Budget $50 Million/Year 500 Work at NCSA Is a Unit of the University of Illinois at Urbana-Champaign Has a Mission of Providing Access to Leading Edge Information Technologies to Universities and Industry Had Major Influence on the Creation of: The Internet The Web Scientific Visualization Computational Science, Engineering, and Knowledge Management

NCSA is the Leading Edge Site for the National Computational Science Alliance Alliance National Technology Grid www.ncsa.uiuc.edu

The Alliance Team Structure to Prototype the 21st Century Information Infrastructure Leading Edge Center Enabling Technology Parallel Computing Distributed Computing Data and Collab. Computing Partners for Advanced Computational Services Communities Training Technology Deployment Comp. Resources & Services Strategic Industrial and Technology Partners Application Technologies Cosmology Environmental Hydrology Chemical Engineering Nanomaterials Bioinformatics Scientific Instruments EOT Education Evaluation Universal Access Government

NSF vBNS and PACI - Mutually Interdependent NCSA Alliance NPACI Both NCSA Alliance and NPACI Other High Performance Connection sites Current vBNS “Backbone” sites

FY99 Qwest Nationwide Network - Backbone for Internet2 Abilene - More Links Qwest Partnering with Cisco and Nortel http://www.qwest.net/network/Mainmaps.html Source: Randy Butler, NCSA

Alliance National Technology Grid Workshop and Training Facilities Being Deployed Across the Alliance Jason Leigh and Tom DeFanti, EVL; Rick Stevens, ANL

Integrating Digital Video With the Grid Interactive Virtual Environments Application Teams Desktop Video Conferencing Internet, vBNS Individual Desktops Digital Video Server Create Digital Video Animation Concurrently with Supercomputing

Alliance Emerging Technologies Course on Streaming Video NCSA has 20 courses Alliance Goal of 100 by end of 1998 Alliance’98 Talks Were Webcast and Archived http://www.ncsa.uiuc.edu/edu/course98/lecturers/week/

High Performance Geographic Information Systems HPGIS (NCSA) Large Datasets Spatially or Temporally Use of CAVE to Render GIS Objects Parallel Computing and I/O Collaborative Interactive Investigations Drivers NSF PACI-Environmental Hydrology Digital Government (Federal Application Council) Digital Earth (Gore) NASA / Mission to Planet Earth DOE Strategic Simulation Program-Global Change Source: Doug Johnston, NCSA, UIUC

The Killer Application for the Grid - Collaborative Tele-Immersion CAVE ImmersaDesk Different Physical Implementations of the Alliance CAVE Software Libraries Image courtesy: Electronic Visualization Laboratory, UIUC

Goal-Analyze and Record Complex Data sets Using Interactive Virtual Environments Cave5d Enables Interactive Visualizations of Time-Varying, 3-Dimensional Vis5d Data Sets in CAVE Environments Donna Cox, Robert Patterson, Stuart Levy, NCSA Virtual Director Team Glenn Wheless, Cathy Lascara, Old Dominion Univ.

Avatars Show Head & Hand Pointing in Shared Virtual Space Donna Cox, Robert Patterson, Stuart Levy, NCSA Virtual Director Team

Goal-Create Shared Virtual Environment CVD -- Collaborative Virtual Director Desktop CAVE ImmersaDesk Power Wall Donna Cox, Robert Patterson, Stuart Levy, NCSA Virtual Director Team Glenn Wheless, Old Dominion Univ.

Goal-Linking the CAVE to the Desktop: Collaborative Java3D Java 3D API HPC Application: VisAD Environ. Hydrology Team, (Bill Hibbard, Wisconsin) Steve Pietrowicz, NCSA Java Team Standalone or CAVE-to-Laptop-Collaborative NASA IPG is Adding Funding To Collaborative Java3D

Coupling Data Formats to Visualization - NCSA’s Hierarchical Data Format HDF & Project Horizon Internet Access to Earth and Space Science Data Science Data Browser (SDB) To Provide Data Service for HDF & Other Formats Java-based Viewers Java-based HDF Browser Standalone and Collaborative (Habanero™) Versions General-purpose Image Viewer HDF & ASCI The Data Models and Formats (DMF) Group HDF As the Open Standard Exchange Format and I/O Library ASCI HDF Requirements Must Support Large (> a Terabyte) Datasets Must Handle ASCI Data Types, Especially Meshes Must Perform Well in Massive Parallel Environments Store Unstructured Data for Efficient Visualization http://hdf.ncsa.uiuc.edu/

Vision of the Java/Collaborative Future “Everybody Benefits” From HPC Science High-End Environments Others Researcher Workstations Office & Home Computers Win-Tel Mac Linux Others Java / Habanero® Object Sharing Web CORBA Java RMI GLOBUS Highly-Variable Available Internet Bandwidth Source: Larry Jackson, NCSA

Alliance Distance Education - Using JAVA Plug-ins to Web Browsers Source: Geoffrey Fox, NPAC/Syracuse; DoD Army CEWES

Goal-Create Collaborative Interface to Link Multiple Investigators With the Grid Status of Simulation Interactive Discussion Detailed Visualization Current parameters in solution Reactor Simulation Ken Bishop, U Kansas Using NCSA Habanero

The Grid Links Remote Sensors With Supercomputers, Controls, & Digital Archives Starburst Galaxy M82 Alliance Scientific Instrument Team Radio Astronomy and Biomedicine Collaborative Web Interface Real Time Control and Steering

The Third Wave of Net Evolution (Bruce Schatz, www.canis.uiuc.edu/interspace/ThirdWave.html)
ARPANET, Internet, Interspace
Years marked: 1965, 1975, 1985, 1995, 2000, 2010
FUNCTION: Access, Organization, Analysis
SERVICES: Distributed Files, Global Hypermedia, Distributed Objects, Global Semantics, Distributed Paths
UNITS: Packets, Files, Links, Objects, Concepts, Categories
PROTOCOLS: IP, FTP, HTTP, CORBA, CP, SMP

NCSA / UIUC Digital Library Initiative: Towards Scalable Semantic Retrieval Bruce Schatz, UIUC and Hsinchun Chen, U Arizona Automatic Indexing of Concepts Find Context of Phrases within Documents Concept Space Based on Term Frequency Useful for Interactive Searching Given a Term, Can Suggest Other Terms Concept Spaces Supports Vocabulary Switching Concept Spaces Require Supercomputing Inspec Space (400K abstracts) 1 day on 16-node SGI Challenge 575 Spaces for Compendex (4M abstracts) 3 days on 48-node HP Convex Exemplar Science: June 7, 1996 and January 17, 1997
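
The slide above describes concept spaces only at a high level (built from term frequency, used to suggest related terms for interactive searching). As a rough illustration of that idea, and not the Digital Library Initiative's actual algorithm, the following Python sketch counts how often pairs of terms appear in the same document and suggests the most frequent companions of a given term. The function names, the whitespace tokenization, and the three toy documents are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_concept_space(documents):
    """Count how often each pair of terms appears in the same document.

    Assumption: plain symmetric co-occurrence counts stand in for the
    (unspecified) weighting used in the real concept-space computation.
    """
    cooccur = defaultdict(Counter)
    for doc in documents:
        terms = set(doc.lower().split())  # naive whitespace tokenization
        for a, b in combinations(sorted(terms), 2):
            cooccur[a][b] += 1
            cooccur[b][a] += 1
    return cooccur


def suggest_terms(cooccur, term, k=3):
    """Return up to k terms that co-occur most often with `term`.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(cooccur[term.lower()].items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


# Hypothetical toy corpus, invented for illustration only.
docs = [
    "survey sampling weights",
    "survey sampling design",
    "panel survey attrition",
]
space = build_concept_space(docs)
print(suggest_terms(space, "survey", k=2))
```

Counting every pair in every document grows quickly with collection size, which is consistent with the slide's remark that concept spaces for hundreds of thousands or millions of abstracts required supercomputing.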

Visualizing Relationships Between Documents - 6500 News Stories from the WWW in 1997 SPIRIX software ThemeScapes www.thememedia.com

Visualizing Relationships Between Documents - Need Extension to Millions of Web Documents SPIRIX software Galaxies www.thememedia.com

NCSA Knowledge Management Workspaces Object and Relational Databases Distributed Object Technology Simulation Engine Optimization Collections Agents CORBA / ActiveX / RMI Scripting JavaBeans / Enterprise Objects Java Data Warehouses Optimization Tools Knowledge Discovery and Visualization Analysis CAVE Devices SGI Mineset Collaborations (Habanero, Tango) VRML/Java3D Browser AVS, VDI Automated Discovery Application Specific Browser

Knowledge Discovery Process Logical DB Selected Data Preprocessed Data Transformed Data Extracted Information Mine Transform Preprocess Select Analyze and Assimilate Feedback Assimilated Knowledge Michael Welge, Tilt Thompkins, NCSA
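
The stages named on the slide above (Select, Preprocess, Transform, Mine) can be read as a chain of functions, each consuming the previous stage's output. The Python sketch below is a minimal illustration under that assumption; the record fields, the stage implementations, and the choice of a per-region average as the "mining" step are hypothetical and are not taken from the NCSA tools. The Analyze and Assimilate step and the feedback loop are left out.

```python
from statistics import mean

# Hypothetical "logical DB": a few invented records with one missing value.
records = [
    {"id": 1, "region": "north", "income": "52000"},
    {"id": 2, "region": "north", "income": ""},
    {"id": 3, "region": "south", "income": "31000"},
    {"id": 4, "region": "south", "income": "35000"},
]


def select(rows):
    """Select: keep only the fields relevant to the question."""
    return [{"region": r["region"], "income": r["income"]} for r in rows]


def preprocess(rows):
    """Preprocess: drop records with missing income."""
    return [r for r in rows if r["income"] != ""]


def transform(rows):
    """Transform: convert income strings to numbers."""
    return [{"region": r["region"], "income": float(r["income"])} for r in rows]


def mine(rows):
    """Mine: extract a simple pattern, here the mean income per region."""
    groups = {}
    for r in rows:
        groups.setdefault(r["region"], []).append(r["income"])
    return {region: mean(values) for region, values in groups.items()}


def run_pipeline(rows):
    """Apply the stages in the order shown on the slide."""
    data = rows
    for stage in (select, preprocess, transform, mine):
        data = stage(data)
    return data


summary = run_pipeline(records)
print(summary)
```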

Automated Discovery and Learning - NCSA Techniques Automated Discovery Tools Creation of Prediction and Classification Models Link Analysis Deviation Detection Database Segmentation Automated Learning Research Topics Automatic Text Document Classification Knowledge Source Integration Parallel Algorithms for Induction Interactive Self-organizing Maps

Automated Discovery By Machine Learning Creation of Prediction & Classification Models Past Data Predicts Future Response Typical Technique: Supervised Learning Neural Nets Decision Trees Naïve Bayesian Link Analysis Discover Relations Between Records in Datasets Association Sequential Pattern Similar Time Sequence Typical Techniques: Genetic Algorithms

Automated Discovery By Machine Learning Database Segmentation Regroup Information Sets Neural Clustering Similar Characteristics, e.g., Demographic Clustering Typical Technique: Unsupervised Learning SOM (Self-organizing Maps) K-Means Deviation Detection Identify Outliers in a Data Sample Visualization Typical Techniques: Stochastic Model Analysis Probability Distribution Contrasts Statistical Model Determination
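
K-Means is named above as a typical unsupervised technique for database segmentation. As a minimal sketch of how it groups records with similar characteristics, the following pure-Python version alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members. The sample points, the function names, and the deterministic choice of the first k points as starting centroids are assumptions made for this example; production implementations usually use randomized initialization.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )


def kmeans(points, k, iterations=20):
    """Cluster `points` (tuples of floats) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    centroids = [points[i] for i in range(k)]  # assumption: first k points as seeds
    for _ in range(iterations):
        # Assignment step: group points by nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c, members in enumerate(clusters):
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical two-dimensional records forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```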

Data Mining - NCSA Industrial Partner Projects Caterpillar Effluent Quality Control Smart Selling Warranty Claims Analysis Customer Value Analysis Ford Product Compatibility Harshness, Noise, Vibration Marketing Sears Transaction Management Boeing Post-Flight Diagnostics Allstate Medical Claims Financial Impact May Be Greater Than $30 Million

NCSA Information Visualization Laboratory Databases In3D™ for C++ and Java VizIt/In3D™ Immersa Desk™ Graphics Workstations MineSet S-PLUS Cave™ Flat Panel Wall

Information Visualization - Network Traffic Robert Patterson, Donna Cox, NCSA

Sears Pioneers Massive Data Mining and Information Visualization at NCSA 1998 VLDB Survey Program Grand Prize Winner Largest Database 4.7 Terabytes of Data 10 Terabyte Total Disk Space Capacity Storage Provided by EMC Image Courtesy of Michael Welge, NCSA and Sears

Information Visualization - Insurance Process Cost Drivers Automated Discovery Using SGI MineSet Allstate Insurance, NCSA

The NCSA Information Workbench - An Architecture for Web-Based Computing. Components: User Web Browser (User Input, Output to User); Workbench Server with Format Translator, Query Engine and Program Driver (User Instructions and Queries, Results to User); Application Programs, which may have varying interfaces and be written in different languages (Instructions, Results); Information Sources, which may be of varying formats (Queries). NCSA Computational Biology Group

The NCSA Biology Workbench - Web Computing with Distributed Datasets Powered by SGI Origin Supercomputer http://biology.ncsa.uiuc.edu/

Toward a Social Sciences Workbench Potential New Project with Alliance Partner with ICPSR? Web Interface to Social Science: Programs Data

The Continuing Exponential Agent of Change 1985 Cray X-MP Cost: $8,000,000 60,000 watts of power No Built in Graphics 56 kbps NSFnet Backbone 1997 Nintendo 64 Cost: $149 5 watts of power Interactive 3D Graphics 64 kbps ISDN to Home

Growth Rate of the NSF Supercomputer Capacity is 70% Compounded Per Year! This Year 1000 x 1985. [Chart: Normalized CPU Hours (Total NU), logarithmic axis from 10,000 to 1,000,000,000, by Fiscal Year, 1986 to 2002, labeled 70% Annual Growth.] Source: Quantum Research; Lex Lane, NCSA
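
The two figures on this slide, 70% compounded annual growth and roughly 1000 times the 1985 capacity, can be checked against each other with the standard compound-growth formula, factor = (1 + rate) ** years. The short Python sketch below does that arithmetic; the function names are invented for this example, and treating "this year" as 1998 (the year of the talk) is an assumption.

```python
import math


def growth_factor(rate, years):
    """Total multiplication after `years` of compounding at `rate` per year."""
    return (1 + rate) ** years


def years_to_multiply(rate, factor):
    """Years of compounding at `rate` needed to grow by `factor`."""
    return math.log(factor) / math.log(1 + rate)


# Assumption: "this year" on the slide is 1998, i.e. 13 years after 1985.
print(growth_factor(0.70, 1998 - 1985))   # roughly 990, close to the slide's 1000x
print(years_to_multiply(0.70, 1000))      # a little over 13 years
print(years_to_multiply(0.70, 2))         # doubling time, about 1.3 years
```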

TOP500 Systems by Vendor - A Market Revolution [Chart: Number of Systems by vendor (CRI, SGI, IBM, Convex, HP, Sun, TMC, Intel, DEC, Japanese, Other) for each TOP500 list from Jun-93 to Jun-98.] TOP500 Reports: http://www.netlib.org/benchmark/top500.html

NCSA is Combining Shared Memory Programming with Massive Parallelism Doubling Every Nine Months! [Chart labels: Challenge, Power Challenge, Origin, SN1]

Proposed NCSA Silicon Graphics Cray Origin Array - 1024 Processors 6x128 3x64 2x32 Subject to NSF Approval of Funds

JP Morgan Hero Calculation HPC Strategic Business Analysis Calculations Used 128-Processor SGI Origin Two Week Period in January 1998 NCSA and SGI Doubled Memory in a Week Extended JPM's Risk Management Capabilities Hundreds of Market Scenarios Simulated NCSA, Strategic Vendor, Industrial Partner Existing Relationships Facilitated Quick Startup Win-Win-Win Result Andrew Abrahams, Jeff Saltz, JP Morgan

Challenge-How to Increase the Number of Social Scientists Using High Performance Computing? NSF Supercomputer Centers in FY97 Consider All 900 Projects Using More Than 10 CPU-Hours 7 out of 900 Projects Were Social Science Social Science Project Areas Testing Time Series Dynamic Optimization Large Scale GIS Economics Competitiveness Models and Strategies Economic Behaviour Capital Structures Stock Market Models

Computing on the University of Wisconsin Condor Pool Condor Cycles CondorView, Courtesy of Miron Livny, Todd Tannenbaum(UWisc)

NT Workstation Shipments Rapidly Surpassing UNIX Source: IDC, Wall Street Journal, 3/6/98

The University of Illinois NT Supercluster - 256 Intel Pentium II Processors “Supercomputer performance at mail-order prices”-- Jim Gray, Microsoft Andrew Chien, Computer Science UIUC Rob Pennington, NCSA 192 Hewlett Packard 300 MHz 64 Compaq 333 MHz

NCSA Symbio - A Distributed Object Framework Bringing Scalable Computing to NT Desktops Parallel Computing on NT Clusters Briand Sanderson, NCSA, Microsoft Microsoft Co-Funds Development Features Based on Microsoft DCOM Batch or Interactive Modes Application Development Wizards Current Status & Future Plans Symbio Developer Preview 2 Released Princeton University Testbed http://access.ncsa.uiuc.edu/Features/Symbio/Symbio.html

NSF / NCSA Federal Consortium Member Agencies: Bureau of Census Central Intelligence Agency Defense Technical Information Center Rural Development, Department of Agriculture Department of Education Department of Housing and Urban Development National Biological Service National Institutes of Health National Oceanic and Atmospheric Administration NASA National Science Foundation National Security Agency Nuclear Regulatory Commission Funding IT Development Security Universal Access Distance Learning Intranet Technology Staff Training Electronic Meeting Spaces http://skydive.ncsa.uiuc.edu/

How to Find Out More About the Alliance See also http://alliance.ncsa.uiuc.edu