1 Telepresence: An Umbrella Research Topic. Jim Gray, Microsoft Research
2 NSF: Nerve Center of Science. If it's not broke, don't fix it. But… • US science is the engine of progress, BUT… • The best and brightest are spending more and more time fundraising • This seems excessive to me. • The venture capital community is richer and more generous than NSF
3 Outline (ambitious!) • Microsoft Research (census) • Tele-Presentations (Gordon Bell, Jim Gemmell) • Microsoft Research initiative on Telepresence • What if you could record everything you see & hear? • The architecture revolution: processing moves to transducers
4 Microsoft Research • Founded in 1991 • Goal: pursue strategic technologies for Microsoft • Original research groups: – Natural Language Processing – Operating Systems – Programming Languages • Overall size under 20 at the end of 1992
5 Microsoft Research • 280 researchers in 25 areas – from operating systems to statistical physics • Research lab locations: – Redmond, Cambridge, San Francisco • Internationally recognized research teams – hundreds of publications and presentations – leadership roles in professional societies, journals, and conferences
6 MS Research Areas • Operating systems, languages, compilers, virtual machines, networking, wireless computing, fault tolerance, large-scale servers, security • Natural language, speech, vision, graphics, decision theory, information retrieval, UI, collaboration, statistics, signal processing • Cryptography, statistical physics, and discrete mathematics
7 Growing Fast • Grew 4x from 1994 to 1997 • Decided in 1997 to grow 3x in 3 years – 200 in FY97 => 600 in FY00, primarily in Redmond • Major impact on MS products – virtually all MS products shipped today use technology from MS Research • Critical role in MS growth – pioneering research in software that allows computers to see, hear, speak, and understand
8 Microsoft Research Philosophy • University organizational model – flat structure, critical-mass groups • Open research environment – aggressive publication of research results in the literature and on the World Wide Web – frequent visitors, daily seminars – over 70 visiting professors and interns in 1997 – over 110 visiting researchers in 1998
9 Some Key Senior Researchers • Systems – Rick Rashid, Butler Lampson, Gordon Bell – Anoop Gupta, Roger Needham, Chuck Thacker • Databases & Data Mining – David Lomet, Jim Gray, Usama Fayyad • Graphics – Jim Kajiya, Jim Blinn, Alvy Ray Smith, Michael Cohen • Speech & Language – Karen Jensen, George Heidorn, X.D. Huang, Alex Acero, Hsiao-Wuen Hon, Scott Meredith
10 Some Key Senior Researchers • UI Design, Intelligent Systems, IR – George Robertson, Linda Stone, Susan Dumais, David Heckerman, Eric Horvitz, Jack Breese • Computer Vision & Signal Processing – Steve Shafer, Rick Szeliski, P. Anandan, Rico Malvar • Cryptography & Theory – Yacov Yacobi, Jennifer Chayes, Christian Borgs, Michael Freedman • Languages & Compilers – Daniel Weise, Chris Fraser, Amitabh Srivastava, Luca Cardelli, David Hanson, Charles Simonyi, Todd Proebsting
11 Microsoft Research • 1997 BusinessWeek poll of academia: – voted #7 lab overall in computer science – voted #3 industrial research lab (after Bell Labs and IBM Research) – voted #2 most desirable lab to work at (after Stanford)
12 Outline (ambitious!) • Microsoft Research (census) • Tele-Presentations (Gordon Bell, Jim Gemmell) • Microsoft Research initiative on Telepresence • What if you could record everything you see & hear? • The architecture revolution: processing moves to transducers
13 Gordon Bell on Tele-Presentations
14 Motivation: Telepresentations • Presenter and/or audience are telepresent • NOT meeting or collaboration settings: forget the nasty social issues! • Mostly one-way
15 Telepresentation Elements → Slides → Audio • Video • Script, text comments, hyperlinks, etc.
16 Telepresentations: The Essentials • Slides and audio are a must • Add some (low-quality) video to make us feel good • Storage and transmission costs are low
17 Telepresentations: The Killer App • Increased attendance & lower travel costs • Practical and low-cost NOW • e.g. ACM97: 2,000 visitors in real space, 20,000 visitors on the Internet
18 Today's Experiment • Would you like to pause, rewind, browse? • Do you wish you could have seen this – at home? – at another time? • How much does a present speaker add? How much would you pay for real presence?
19 Outline (ambitious!) • Microsoft Research (census) • Tele-Presentations (Gordon Bell, Jim Gemmell) • Microsoft Research initiative on Telepresence • What if you could record everything you see & hear? • The architecture revolution: processing moves to transducers
20 Changing Role of Computation • Past: computers for – computing (Cray) – business data processing (IBM) – document creation (PC) • Future: computers for – understanding & learning – communicating – consuming & entertaining • Requires a new user interface to machines
22 Making Flows a Reality • Computer graphics – creating realistic-looking environments, people • Computer vision – analyzing posture, gaze, gestures • Speech input/output • Natural language – analysis, IR • Implicit requests for information
Building life-like human characters
Recognizing gestures [figure panels: live video, area of motion, H flow, V flow]
25 Generating Life-like Speech from Textual Data • Data-driven stochastic speech – natural sounding – rapid, automatic customizability • Examples – synthetic voice with transplanted speech contours
26 Artificial Singing • AT&T Voder, 1962, by Homer Dudley – Daisy (inspiration for HAL's voice in 2001: A Space Odyssey) • Microsoft Research Whistler, 1997 – Scarborough Fair
27 Analyzing Language • Language recognition shipped in Word 97 • General-purpose text critiquing, summarization, Japanese word-breaking
28 Inside The Office Grammar Checker
29 Understanding Language: MindNet • A huge language knowledge base • Automatically created from dictionaries • Words (nodes) linked by relationships • Millions of links • Recently added (Encarta) encyclopedia knowledge
MindNet: Going to the Birds [figure: a fragment of the MindNet graph around "bird", linking words such as duck, goose, hen, chicken, hawk, turtle, beak, wing, claw, feather, egg, meat, and poultry through labeled relations such as Is_a, Part, Part_of, Locn_of, Typ_subj, Typ_obj, Means, Cause, and Purpose]
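MindNet's actual schema and construction are not described in this talk. The following minimal Python sketch illustrates only the idea on slide 29, "words (nodes) linked by relationships", using two relation names visible in the figure (Is_a, Part). The class `SemanticNet` and the specific word links are hypothetical illustrations, not data extracted from MindNet.

```python
from collections import defaultdict


class SemanticNet:
    """Toy labeled graph (hypothetical): words are nodes, relations label edges."""

    def __init__(self) -> None:
        # word -> list of (relation, target word) pairs, in insertion order
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def link(self, src: str, rel: str, dst: str) -> None:
        self.edges[src].append((rel, dst))

    def related(self, word: str, rel: str) -> list[str]:
        # follow outgoing edges with one label; .get avoids inserting
        # empty entries into the defaultdict for unknown words
        return [dst for r, dst in self.edges.get(word, []) if r == rel]

    def inverse(self, word: str, rel: str) -> list[str]:
        # follow edges backwards: which words point at `word` via `rel`?
        return [src for src, out in self.edges.items() if (rel, word) in out]


net = SemanticNet()
for animal in ("duck", "goose", "hawk"):
    net.link(animal, "Is_a", "bird")
for part in ("beak", "wing", "feather"):
    net.link("bird", "Part", part)

print(net.inverse("bird", "Is_a"))  # ['duck', 'goose', 'hawk']
print(net.related("bird", "Part"))  # ['beak', 'wing', 'feather']
```

Storing only outgoing edges keeps the sketch short. A real knowledge base with millions of links would also index incoming edges rather than scanning every node in `inverse`.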
31 Changing Balance Between User & Software Systems • Yesterday: – applications were single programs running in isolation – users (more or less) understood the systems they used • Today: – componentized applications operate in concert – sophisticated users understand only a small percentage of the systems they use
32 Tomorrow's Systems and Applications • Users will not be able to predict – where computations will be performed, – when they will be performed, or – by what software components • The gap between system capabilities and user understanding will grow to the point that the only way users will be able to use a system is through assisting agents
33 Examples of User Agents & Implicit Actions • Lumiere (Office 97) – monitoring user and program events to provide user help and assistance • Implicit queries – inferring information needs from browsing • Lookout/SpamKiller – monitoring mail activity to auto-categorize it
User Modeling • Models of a user's informational goals – the user's query (when available…) – the user's background – acute and long-term search activity – acute actions with objects and documents – program data structures • Explicit and implicit information access and display
35 Outline (ambitious!) • Microsoft Research (census) • Tele-Presentations (Gordon Bell, Jim Gemmell) • Microsoft Research initiative on Telepresence • What if you could record everything you see & hear? • The architecture revolution: processing moves to transducers
36 Some Terabyte Databases (scale graphic: kilo, mega, giga, tera, peta, exa, zetta, yotta) • The Web: 1 TB of HTML • TerraServer: 1 TB of images • Several other 1 TB (file) servers • Hotmail: 7 TB of … • Sloan Digital Sky Survey: 40 TB raw, 2 TB cooked • EOS/DIS (picture of the planet each week) – 15 PB by 2007 • Federal clearing house: images of checks – 15 PB by 2006 (7-year history) • Nuclear Stockpile Stewardship Program – 10 exabytes (???!!)
37 Info Capture [scale graphic from kilo to yotta placing a letter, a novel, a movie, Library of Congress (text), LoC (image), all disks, all tapes] • You can record everything you see or hear or read. • What would you do with it? • How would you organize & analyze it? • Read or write: 8 GB (words) • Audio: 30 TB (10 KBps) • See (video): 8 PB per lifetime (10 GBph)
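The slide's lifetime figures can be reproduced with simple rate × time arithmetic. The sketch below assumes decimal units (1 KB = 10^3 bytes, 1 PB = 10^15 bytes) and about 100 years of round-the-clock recording. The 100-year figure is an assumption chosen because it reproduces the slide's ~8 PB and ~30 TB; the talk does not state it.

```python
HOURS_PER_YEAR = 365 * 24
LIFETIME_YEARS = 100  # assumption: ~100 years of continuous capture


def lifetime_bytes(bytes_per_hour: int, years: int = LIFETIME_YEARS) -> int:
    """Total bytes captured at a constant rate, around the clock."""
    return bytes_per_hour * HOURS_PER_YEAR * years


video = lifetime_bytes(10 * 10**9)         # 10 GB per hour
audio = lifetime_bytes(10 * 10**3 * 3600)  # 10 KB per second, converted to per hour

print(f"video: {video / 10**15:.2f} PB")  # video: 8.76 PB
print(f"audio: {audio / 10**12:.1f} TB")  # audio: 31.5 TB
```

Video outweighs audio by a factor of about 280 here, which is why "everything you see" dominates the storage budget.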
38 [Scale graphic from kilo to yotta placing a letter, a novel, a movie, Library of Congress (text), LoC (image), LoC (sound + cinema), all photos, all disks, all tapes, and All Information!]
39 Michael Lesk's Points • Soon everything can be recorded and kept • Most data will never be seen by humans • Precious resource: human attention • Auto-summarization and auto-search will be key enabling technologies.
40 Outline (ambitious!) • Microsoft Research (census) • Tele-Presentations (Gordon Bell, Jim Gemmell) • Microsoft Research initiative on Telepresence • What if you could record everything you see & hear? • The architecture revolution: processing moves to transducers
41 Put Everything in Future (Disk) Controllers (it's not "if", it's "when"?) Acknowledgements: Dave Patterson explained this to me a year ago; Kim Keeton, Erik Riedel, and Catharine Van Ingen helped me sharpen these arguments
42 Remember Your Roots
43 Technology Drivers: Disks • Disks on track • 100x in 10 years: 2 TB 3.5" drive • Shrink to 1" is 200 GB • Disk replaces tape? • Disk is a supercomputer!
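To make "100x in 10 years" concrete, the yearly rate and doubling time follow from a tenth root and a logarithm. This assumes smooth compound growth, which is a simplification, since real capacity gains arrive in product generations.

```python
import math


def annual_growth(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over `years`."""
    return total_factor ** (1 / years)


def doubling_time(yearly_factor: float) -> float:
    """Years needed to double at a constant yearly multiplier."""
    return math.log(2) / math.log(yearly_factor)


g = annual_growth(100, 10)
print(f"{g:.3f}x per year, doubling every {doubling_time(g):.2f} years")
# 1.585x per year, doubling every 1.51 years
```

So 100x per decade works out to roughly 58% per year, or a doubling about every 18 months.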
44 Data Gravity: Processing Moves to Transducers • Move processing to data sources • Move to where the power (and sheet metal) is • Processor in – modem – display – microphones (speech recognition) & cameras (vision) – storage: data storage and analysis
45 It's Already True of Printers: Peripheral = CyberBrick • You buy a printer • You get – several network interfaces – a PostScript engine (CPU, memory, software, a spooler (soon)) – and… a print engine.
46 Terabyte Backplane • TODAY – the disk controller is a 10 MIPS RISC engine with 2 MB DRAM – the NIC is of similar power • SOON – they will become 100 MIPS systems with 100 MB DRAM • They are nodes in a federation (can run Oracle on NT in the disk controller) • Advantages – uniform programming model – great tools – security – economics (CyberBricks) – move computation to data (minimize traffic) [Diagram: all device controllers will be Cray 1s; central processor & memory]
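The following toy model illustrates "move computation to data (minimize traffic)". It counts the bytes that cross the backplane when the host filters records versus when the controller filters them. The record layout, the 1% selectivity, and the function names are hypothetical. The model also ignores request messages and controller CPU limits.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: int
    payload: bytes


def make_disk(n_records: int, record_size: int) -> list[Record]:
    return [Record(i, bytes(record_size)) for i in range(n_records)]


def host_side_scan(disk, predicate):
    """Dumb controller: every record crosses the backplane, the host filters."""
    shipped = sum(len(r.payload) for r in disk)
    hits = [r for r in disk if predicate(r)]
    return hits, shipped


def controller_side_scan(disk, predicate):
    """Smart controller: the filter runs next to the data, only hits are shipped."""
    hits = [r for r in disk if predicate(r)]
    shipped = sum(len(r.payload) for r in hits)
    return hits, shipped


def wanted(r: Record) -> bool:
    return r.key % 100 == 0  # selects 1% of the records


disk = make_disk(10_000, 100)
host_hits, host_bytes = host_side_scan(disk, wanted)
ctrl_hits, ctrl_bytes = controller_side_scan(disk, wanted)
print(host_bytes, ctrl_bytes)  # 1000000 10000
```

Both scans return the same answer. In this model the traffic shrinks in proportion to the query's selectivity.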
47 Basic Argument for x-Disks • The future disk controller is a supercomputer – 1 BIPS processor – 128 MB DRAM – 100 GB disk plus one arm • Connects to the SAN via high-level protocols – RPC, HTTP, DCOM, Kerberos, directory services, … – commands are RPCs – management, security, … – services file/web/db/… requests – managed by a general-purpose OS with a good development environment • Apps in the disk save data movement – need a programming environment in the controller
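The slide says only that commands are RPCs carried over high-level protocols such as HTTP or DCOM. The following transport-free sketch shows one way that could look. It assumes JSON as the encoding, which is not one of the protocols named on the slide. The controller decodes each request, dispatches it through a fixed method table, and encodes the reply. The `DiskNode` class and its `read` and `grep` commands are hypothetical.

```python
import json


class DiskNode:
    """Hypothetical x-disk: stores sectors and answers JSON-encoded RPCs."""

    def __init__(self, sectors: list[str]) -> None:
        self.sectors = sectors
        # method table: the only commands this controller accepts
        self.methods = {"read": self.read, "grep": self.grep}

    def read(self, lba: int) -> str:
        return self.sectors[lba]

    def grep(self, needle: str) -> list[int]:
        # runs inside the controller; only matching sector numbers go back
        return [lba for lba, data in enumerate(self.sectors) if needle in data]

    def handle(self, request: str) -> str:
        req = json.loads(request)
        method = self.methods.get(req.get("method"))
        if method is None:
            return json.dumps({"id": req.get("id"), "error": "unknown method"})
        result = method(*req.get("params", []))
        return json.dumps({"id": req.get("id"), "result": result})


node = DiskNode(["alpha", "beta", "alphabet"])
print(node.handle(json.dumps({"id": 1, "method": "grep", "params": ["alpha"]})))
# {"id": 1, "result": [0, 2]}
```

The `grep` command is the point of the exercise: the search runs where the data lives, and only sector numbers travel back to the caller.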
48 The Slippery Slope • If you add function to the server • then you add more function to the server • Function gravitates to data. [Diagram: Nothing = Sector Server; Something = Fixed App Server; Everything = App Server]
49 Why Not a Sector Server? (Let's get physical!) • Good idea; that's what we have today. • But – cache added for performance – sector remap added for fault tolerance – error reporting and diagnostics added – SCSI commands (reserve, …) are growing – sharing is problematic (space management, security, …) • Slipping down the slope to a 2-D block server
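Even "sector remap added for fault tolerance" forces the sector server to keep state: a table from logical sector numbers to physical ones. The minimal sketch below assumes a fixed pool of spare sectors and a best-effort copy of the old contents. Real drives may be unable to read a failing sector. The class and its methods are hypothetical.

```python
class SectorServer:
    """Hypothetical sector server with a spare pool for bad-sector remapping."""

    def __init__(self, n_sectors: int, n_spares: int) -> None:
        self.n_sectors = n_sectors
        self.media = [b""] * (n_sectors + n_spares)  # physical sectors
        self.spares = list(range(n_sectors, n_sectors + n_spares))
        self.remap: dict[int, int] = {}  # logical sector -> spare sector

    def _physical(self, lba: int) -> int:
        if not 0 <= lba < self.n_sectors:
            raise IndexError(f"logical sector {lba} out of range")
        return self.remap.get(lba, lba)

    def read(self, lba: int) -> bytes:
        return self.media[self._physical(lba)]

    def write(self, lba: int, data: bytes) -> None:
        self.media[self._physical(lba)] = data

    def mark_bad(self, lba: int) -> None:
        """Retire the sector now backing `lba`; copy its contents to a spare."""
        if not self.spares:
            raise OSError("no spare sectors left")
        old = self._physical(lba)
        spare = self.spares.pop(0)
        self.media[spare] = self.media[old]
        self.remap[lba] = spare


disk = SectorServer(n_sectors=4, n_spares=2)
disk.write(1, b"hello")
disk.mark_bad(1)  # logical sector 1 now lives in physical spare sector 4
print(disk.read(1), disk.remap)  # b'hello' {1: 4}
```

The remap table is already the first step down the slippery slope: the "dumb" sector server now owns persistent metadata that clients never see.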
50 Why Not a 1-D Block Server? Put A LITTLE on the Disk Server • Tried-and-true design – HSC (VAX cluster) – EMC – IBM Sysplex (3980?) • But look inside – has a cache – has space management – has error reporting & management – has RAID 0, 1, 2, 3, 4, 5, 10, 50, … – has locking – has remote replication – has an OS – security is problematic – low-level interface moves too many bytes
51 Why Not a 2-D Block Server? Put A LITTLE on the Disk Server • Tried-and-true design – Cedar -> NFS – file server, cache, space, … – opening a file takes many fewer messages • Grows to have – directories + naming – authentication + access control – RAID 0, 1, 2, 3, 4, 5, 10, 50, … – locking – backup/restore/admin – cooperative caching with client • File servers are a BIG hit: NetWare – SNAP! is my favorite today
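A rough message count shows why opening a file takes many fewer messages. The cost model below is a hypothetical simplification: it assumes no client caching and no multi-block requests to the block server. It shows only where the messages go and does not measure any real protocol.

```python
import math


def block_server_messages(path_depth: int, file_blocks: int) -> int:
    """Client resolves names itself: per path component, one directory-block
    read plus one metadata read; then one request per data block."""
    return 2 * path_depth + file_blocks


def file_server_messages(file_blocks: int, blocks_per_read: int) -> int:
    """Server resolves the name in one open(); data returns in large reads."""
    return 1 + math.ceil(file_blocks / blocks_per_read)


print(block_server_messages(path_depth=4, file_blocks=64))       # 72
print(file_server_messages(file_blocks=64, blocks_per_read=16))  # 5
```

Most of the saving comes from moving name resolution and block mapping to the server, which is exactly the function the slide says 2-D block servers "grow to have".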
52 Why Not a File Server? Put a Little on the Disk Server • Tried-and-true design – Auspex, NetApp, ... – NetWare • Yes, but look at NetWare – the file interface gives you an app invocation interface – it became an app server: mail, DB, Web, … – NetWare had a primitive OS: hard to program, so it optimized the wrong thing
53 Why Not Everything? Allow Everything on the Disk Server (thin clients) • Tried-and-true design – mainframes, minis, ... – web servers, … – encapsulates data – minimizes data moves – scalable • It is where everyone ends up. • All the arguments against are short-term.
54 The Slippery Slope • If you add function to the server • then you add more function to the server • Function gravitates to data. [Diagram: Nothing = Sector Server; Something = Fixed App Server; Everything = App Server]
55 Disk = Node • has magnetic storage (100 GB?) • has processor & DRAM • has SAN attachment • has execution environment [Diagram of the node's software stack: OS kernel; SAN driver, disk driver; file system; RPC, ...; services; DBMS; applications]
56 Technology Drivers: System on a Chip • Integrate processing with memory on one chip – chip is 75% memory now – 1 MB cache >> 1960 supercomputers – a 256 Mb memory chip is 32 MB! – IRAM, CRAM, PIM, … projects abound • Integrate networking with processing on one chip – system bus is a kind of network – ATM, FiberChannel, Ethernet, … logic on chip – direct IO (no intermediate bus) • Functionally specialized cards shrink to a chip.
57 Technology Drivers: What if Networking Was as Cheap as Disk IO? • TCP/IP – Unix/NT: 100% CPU at 40 MBps • Disk – Unix/NT: 8% CPU at 40 MBps • Why the difference? – the host bus adapter does SCSI packetizing, checksum, … flow control, DMA – the host does TCP/IP packetizing, checksum, … flow control, small buffers
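If the two figures are read as host CPU utilization at the same 40 MBps, the per-megabyte cost gap is a one-line calculation. That reading is an interpretation of the slide, not something it states.

```python
def cpu_percent_per_mbps(cpu_percent: float, mbps: float) -> float:
    """Host CPU consumed per MBps of sustained throughput."""
    return cpu_percent / mbps


tcp = cpu_percent_per_mbps(100, 40)  # host does the protocol work
disk = cpu_percent_per_mbps(8, 40)   # host bus adapter does the protocol work
print(f"TCP/IP {tcp:.1f}%/MBps, disk {disk:.1f}%/MBps, ratio {tcp / disk:.1f}x")
# TCP/IP 2.5%/MBps, disk 0.2%/MBps, ratio 12.5x
```

Under this reading, offloading packetizing, checksums, and flow control to the adapter is worth more than an order of magnitude in host CPU.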
58 Technology Drivers: The Promise of SAN/VIA: 10x in 2 Years • Today: – wires are 10 MBps (100 Mbps Ethernet) – ~20 MBps of TCP/IP saturates 2 CPUs – round-trip latency is ~300 µs • In the lab: – wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, … – fast user-level communication: TCP/IP at ~100 MBps uses 10% of each processor; round-trip latency is 15 µs
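One simple way to see what the lab numbers buy is to model a request as a round trip plus the time to push the message through the wire. The model below assumes a 300 µs round trip over 10 MBps wires today and a 15 µs round trip at ~100 MBps in the lab. It ignores CPU and queuing effects. The function name is hypothetical.

```python
def request_time_us(msg_bytes: int, rtt_us: float, mb_per_s: float) -> float:
    """Round trip plus serialization time. With 1 MB = 10**6 bytes,
    1 MBps is exactly 1 byte per microsecond, so bytes / MBps gives µs."""
    return rtt_us + msg_bytes / mb_per_s


for size in (100, 8192):
    today = request_time_us(size, rtt_us=300, mb_per_s=10)
    lab = request_time_us(size, rtt_us=15, mb_per_s=100)
    print(f"{size:>5} bytes: today {today:7.1f} µs, lab {lab:6.1f} µs, "
          f"{today / lab:4.1f}x faster")
```

For a 100-byte message the model gives 310 µs versus 16 µs (about 19x). For an 8 KB message it gives about 1119 µs versus 97 µs (about 11.5x). So the latency improvement matters most for small messages, and the bandwidth improvement matters most for large ones.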
59 SAN: Standard Interconnect • Gbps Ethernet: 110 MBps • PCI: 70 MBps • UW SCSI: 40 MBps • FW SCSI: 20 MBps • SCSI: 5 MBps • LAN faster than memory bus? • 1 GBps links in lab • $100 port cost soon • Port is computer • RIP FDDI, RIP ATM, RIP SCI, RIP SCSI, RIP FC, RIP ?
60 Technology Drivers: 100 GBps Ethernet Replaces SCSI • Why I love SCSI – it's fast (40 MBps) – the protocol uses little processor power • Why I hate SCSI – wires must be short – cables are pricey – pins bend
61 Functionally Specialized Cards • Storage • Network • Display [Diagram: card with a P MIPS processor, M MB DRAM, and an ASIC] • Today: P = 50 MIPS, M = 2 MB • In a few years: P = 200 MIPS, M = 64 MB
62 Technology Drivers: Plug & Play Software • RPC is standardizing (DCOM, IIOP, HTTP) – gives huge TOOL LEVERAGE – solves the hard problems for you: naming, security, directory service, operations, ... • Commoditized programming environments – FreeBSD, Linux, Solaris, … + tools – NetWare + tools – WinCE, WinNT, … + tools – JavaOS + tools • Apps gravitate to data. • A general-purpose OS on the controller runs apps.
63 Basic Argument for x-Disks • The future disk controller is a supercomputer – 1 BIPS processor – 128 MB DRAM – 100 GB disk plus one arm • Connects to the SAN via high-level protocols – RPC, HTTP, DCOM, Kerberos, directory services, … – commands are RPCs – management, security, … – services file/web/db/… requests – managed by a general-purpose OS with a good development environment • Move apps to the disk to save data movement – need a programming environment in the controller
64 Summary • Microsoft Research (census) • Tele-Presentations (Gordon Bell, Jim Gemmell) • Microsoft Research initiative on Telepresence • What if you could record everything you see & hear? • The architecture revolution: processing moves to transducers