Systems to Capture Everything: Beyond cameras and desktops. www.MyLifeBits.com. Gordon Bell, Jim Gemmell, Roger Lueder.


1

2 Systems to Capture Everything: Beyond cameras and desktops Gordon Bell, Jim Gemmell, Roger Lueder

3 Summary: New Systems for Continuous Archival and Retrieval of Personal Experiences (CARPE)

For the past seven years, our MyLifeBits research has been based on the premise that in the future, a person will be able to record everything they have ever seen and heard, especially those items they have created. The initial phase encoded articles, books, correspondence (e.g. emails, letters, memos), financial documents, music, papers, photos, presentations, videos, and web pages visited.

For the last five years we have concentrated on:
1. building a system that uses a database to handle all of the different data items uniformly, so that they can be related to one another and to additional meta-data;
2. capturing telephone calls, meetings, TV programs viewed, and the user's interactivity, to understand how his time is spent; and
3. beginning to capture real-time personal health information, daily photos taken while wearing a SenseCam*, and arrays of sensors using wireless sensor networks.

In effect, our system is a transaction processing system that captures significant events in a person's life.

In the next phase, we hope to enhance the real-time capture aspects to make this vast amount of CARPE data useful through filtering, image processing, and additional meta-data, e.g. location and voice annotation. A principal goal, as in the past, is to gain an understanding of what is technically feasible and personally useful, while being economically viable at some future time.

The capture of everything raises a large number of issues that we have NOT addressed, including timely preservation, scaling, location of all the data, privacy, security, intellectual property ownership, getting access to public records, e.g. medical health, and rights to access what amounts to an individual's surrogate memory.

*SenseCam is a device built by Microsoft's Cambridge Lab that records 1000s of pictures per day, taken at intervals and on changes in temperature, light level, vibration, etc.

4 Outline
- MyLifeBits aka Memex
- How has the project evolved?
- How do we use MyLifeBits?
- How is it built?
- Shape of the database?
- CARPE: Continuous archiving and recording of personal experience
- What is the vision?
- Relevance for devices and software?

5 I am data

6 History: Telepresence Tele-presentations Tele-meetings

7 Ambience and Presence: Being there while being here Dining at home on the Orient Express

8 History: The remote worker re-discovers the PERSONAL computer

9 Oct 1998 Can we scan your books and put them online? Raj Reddy Sure! Don't worry about copyright stuff. Microsoft has lots of lawyers

10 1999 – Scanning starts in earnest: we start to scan and put content into folders & files

11 My docs and archive Self.. Biographical X- Employer Employer X-Employer Project Employer Library/file cab Active Employer Library/file cab <1980s Library/file cab Library/file cab Project Business Invests, family $s, & Legal Personal, including Medical Library/file cab

12 Now that it's in Cyberspace: How do you remember the 20,000+ file names? Or in which of 1500 folders they live? What about a tool for finding stuff?

13 Jan 2001 CACM: A Personal Digital Store
- 16 GB; +2/yr
- A good place to stop
- Began search for search engines, especially for email.
- Jim suggests that we build a system that would be easier to use and have many more capabilities.

14 2001 Capture goes beyond paper

15 Jim, I don't need no stinkin' database! Gordon, you should be using a database.

16 Re-discovery of Memex
"As We May Think", Vannevar Bush, 1945: "A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility."
- Full-text search, text & audio annotations, and hyperlinks

17 Even more capture
- Telephone calls, more video, all web pages visited, keyboard and mouse usage logging, radio, TV…

18 SenseCam

19 Feb 2005 Epiphany! Memex is a database & personal TP system

20 Demo Clips & Screens

21 747 Screen…

22 Vue de jour

23 Timeline

24 Pivoting: contact> call> t> web page

25 GPS Photo location

26 Reports

27 The Stew family tree Copyright Mark Stewart, 2004

28 Vibe report

29 Quindi Meeting Capture

30 SenseCam

31 SenseCam around Cambridge

32 MyLifeBits Software

33 Everything goes in a database
- MyLifeBits needs all the features of a database (Consistency, Indexing, Pivoting, Queries, Speed/scalability, Backup, Replication)
- If we didn't use one, we'd eventually create one!
- Files as blobs; sync with file system for legacy apps
- We are part of Jim Gray's Bay Area Research Lab
- SQL

34 MyLifeBits Software MyLifeBits store database Voice annotation tool Telephone capture tool TV capture tool TV EPG download tool Radio capture & EPG PocketPC transfer tool PocketRadio player Import files MyLifeBits Shell Browser tool Internet IM capture GPS import & Map display SenseCam Screen saver Text annotation tool MAPI interface Legacy client Outlook interface files Legacy applications VIBE logging RoomCapture

35 Common ground with WinFS: Items, Links & Meta-data Annotates Caller in Phone Call Photo of Event
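
The items, links, and meta-data idea on this slide can be illustrated with a minimal sketch. It assumes a tiny SQLite schema; the table names, column names, and link kinds (`caller_in`, `photo_of`) are hypothetical and are not the MyLifeBits or WinFS schema. It only shows how typed items plus typed links allow pivoting from one item to everything related to it.

```python
import sqlite3

def build_store():
    """Create an in-memory store with typed items and typed links (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE items (
            id      INTEGER PRIMARY KEY,
            kind    TEXT NOT NULL,   -- e.g. 'contact', 'call', 'photo'
            title   TEXT NOT NULL,
            created TEXT NOT NULL,   -- ISO 8601 timestamp; time is key meta-data
            content BLOB             -- file bytes stored as a blob
        );
        CREATE TABLE links (
            src  INTEGER NOT NULL REFERENCES items(id),
            dst  INTEGER NOT NULL REFERENCES items(id),
            kind TEXT NOT NULL       -- e.g. 'caller_in', 'photo_of', 'annotates'
        );
    """)
    return db

def add_item(db, kind, title, created, content=None):
    """Insert one item and return its id."""
    cur = db.execute(
        "INSERT INTO items (kind, title, created, content) VALUES (?, ?, ?, ?)",
        (kind, title, created, content),
    )
    return cur.lastrowid

def link(db, src, dst, kind):
    """Record a typed link from item src to item dst."""
    db.execute("INSERT INTO links (src, dst, kind) VALUES (?, ?, ?)", (src, dst, kind))

def pivot(db, item_id):
    """Return (link kind, item kind, title) for items linked to item_id in either direction."""
    rows = db.execute("""
        SELECT l.kind AS link_kind, i.kind AS item_kind, i.title AS title
          FROM links l JOIN items i ON i.id = l.src WHERE l.dst = ?
        UNION ALL
        SELECT l.kind, i.kind, i.title
          FROM links l JOIN items i ON i.id = l.dst WHERE l.src = ?
        ORDER BY title
    """, (item_id, item_id))
    return rows.fetchall()
```

Calling `pivot` on a phone-call item would list the contact who called and any photo linked to it, which is the kind of contact-to-call-to-document navigation the deck describes.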

36 PhotoFinder - Shneiderman and Kang

37 The Shape & Size of Gordon's LifeBits

38 MyLifeBits 10/31/ K items 110 GB by number of Items.

39 MyLifeBits 10/31/ GB 242 K items By Size (GB) Bell Growth: 1GB/month =1.1 TB/lifetime Size (MB) by Type
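
The growth figure on this slide (1 GB/month = 1.1 TB/lifetime) can be checked with a short calculation. This is a sketch assuming decimal units (1 TB = 1000 GB) and a constant growth rate; the function names are illustrative and not from the presentation.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes

def implied_years(gb_per_month, total_tb, gb_per_tb=GB_PER_TB):
    """Years of capture needed to reach total_tb at a constant gb_per_month."""
    return total_tb * gb_per_tb / (gb_per_month * 12)

def total_tb_after(gb_per_month, years, gb_per_tb=GB_PER_TB):
    """Total terabytes accumulated after the given number of years."""
    return gb_per_month * 12 * years / gb_per_tb

if __name__ == "__main__":
    print(f"{implied_years(1.0, 1.1):.1f} years to reach 1.1 TB at 1 GB/month")
```

Under these assumptions, 1.1 TB corresponds to roughly 92 years of capture at 1 GB/month, so the slide's "lifetime" figure is consistent with a long human life at today's rate.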

40
Year   Mpix   Manufacturer
-      -      Ricoh
1999   1      Kodak
2001   2      Canon
2002   3      Sony
2003   4      Sony
2005   5      Panasonic
15,000 photos

41 Monthly & Lifetime Storage Use ItemDaily number Total* MB|GB Month|Life 1 MB Books|reports0.13 5KB s KB Image scans MB Photos KB Web pages|docs MB Music KB/s Listened audio, speech40,0001, KB Daily photos1,0001,250 2 GB/hr TV4200,000

42 Observations about use(rs)
1. Cell phone sized device (CPSD) will be the platform!
2. On applications… think about CPSD as the platform and context
   - Search is the killer app, pretty much as Bush described.
   - Screen savers: memory refreshers that also provide ambience
   - Where did my day go?
3. Users are unwilling to spend time managing their computers or data.
   - Meta-data, classification, etc. must be automatic
   - User-input meta-data, e.g. Dublin Core: a naïve librarian's dream.
   - We have a nice scheme for classification using facets. It requires work.
4. Time is the most important meta-data. Photos: place (GPS), subject.
5. Folders are a good and bad idea.
   - Most users don't know what they are or how they work
   - If used, over time they become useless: too many, misfiled, etc.
6. Users should put every information fragment into the system, e.g. to-dos, call backs, business card numbers, attention events. It pays.
7. The same information in multiple places always becomes obsolete.

43 Experience
- Performance
   - Abstracting away SQL can lead to problems
   - We like lots of cache: 2GB RAM machines…
- Schema update
   - We got tired of losing data on every change! Schema diff tool
- Logging
   - Helps us with performance & debug
- User studies
   - Beginning of a useful application!
   - Folders are still the first class citizens
   - We need to break their power or eliminate them for anything else to have a chance

44 Capturing Everything:
- Phone calls in context of cell phone as a platform for communication and capture
- Formal Meetings
- Rooms
- Everything in daily life
- Personal health and medical monitoring
- Memex for scientists and engineers

45 BodyMedia Output

46 Real time health monitoring Polysomnogram for sleep apnea.

47 Microsoft Research SenseCam II
Sensors:
- VGA camera w/ wide-angle lens
- light level in R, G, B and white
- ambient temperature
- passive infrared for person detection
- accelerometers
- three programmable buttons, LEDs, sounder
- audio level & audio recording
- USB 2 and SD memory. 1-2 K photos/day
- No GPS

48 SenseCam University Grant Program
MSFT supplies money, software, SenseCams
- Memex vision: Notebook for engineers & scientists
- Medical & health: observations & memory recall, including diet and exercise
- Education: How do people learn? Help me learn/remember!
- Tourist, e.g. museum experience
- Plumbing
- Security
- Filtering many images, voice & location annotation

49 More real time experience capture
- Real time medical & health monitoring
- MIT. Deb Roy: home capture to understand how his children learn
- U. of Tokyo. Ubiquitous home
- Columbia U. Voice & sound record & profile
- MIT. iDAT. Electronic lab that records everything into your notebook

50 Total Recall: Deb Roy, MIT Media Lab

The basic database object is an event that is grounded in one or more channels of sensor data, and that can be layered with annotations/transcriptions generated by a combination of humans and automated algorithms. There are three sets of data: raw data (camera and mic recordings), transformed data which is never touched by humans (e.g., FFTs, motion pixels, etc.), and meta-data (person/object ID, speech transcription, etc.). The raw and transformed data live in parallel hierarchical trees, the meta-data in a database.

We have 10 megapixel omnidirectional video cameras embedded in the ceiling of each room of my single family home. We are capturing about 14 frames per second from each camera, and CD quality audio from 14 boundary layer microphones situated throughout the house.

We have installed iPAQ PDAs next to each light switch in the house, using them as touch displays for controlling recording. People in the house can pause and restart audio and/or video in each room using the PDA of that room. There is also a global pause function. When video is paused, a motorized shutter moves over the camera and conceals it; we have found this physical removal of the camera is important for social reasons (when visitors don't want to be recorded etc.). Each PDA also has an "oops" button. When pressed, a dialog box pops up and lets the user specify how many minutes of data to delete retroactively. This has turned out to be essential to remove unintended private moments from the permanent archives.

Data is moved from the house to MIT via sneakernet: LTO-3 tapes (400G/tape x 15 tapes per magazine).

For storage at MIT we are working with Zetera (www.zetera.com) on a petabyte disk array that has distributed access points enabling very high bandwidth access to the data. The planned array will have 2500 x 400G SATA drives. We have an initial batch of about 550 drives from Seagate scheduled to go online in January. In the interim we are using a bunch of Apple XRAIDs. We currently have a small cluster of CPUs connected to the disks, which is fine since we have not yet developed the serious mining algorithms that will need more horsepower. Soon, however, I will be looking to build up a large fast cluster (if you know of anyone who would like to donate a couple of hundred rack-mounted CPUs, do let me know).

We are creating a set of search, browse, and visualization tools for providing multiscale non-linear access to 3 years x 24 channels of audio-video recordings. The tools will integrate automated analysis of motion, voice activity, speaker identification, object tracking, etc. with human annotation.

One personal memory augmentation idea which we are just starting work on is to use computer vision to analyze the video and semi-automatically populate a virtual model of the house with locations of salient objects and people. A memory browser would then allow people to relive moments by moving through the virtual world in space and time and tune into the 14-track audio with location and orientation dependent playback.

This project is really at an early stage. We just started the raw recordings in late July and are putting out fires wrt the volume of data and various issues in running sensors and capture hardware 24/7. Most of the fun stuff is yet to come. I'd love to hear any suggestions you might have.
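
The storage figures quoted in this message can be worked through with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB, 1 PB = 1,000,000 GB). The audio estimate additionally assumes uncompressed 16-bit, 44.1 kHz, mono capture per microphone; that is the usual meaning of "CD quality" per channel but is not stated in the source, so treat that number as illustrative only.

```python
GB = 10**9  # assumption: decimal gigabytes

def magazine_capacity_tb(tape_gb=400, tapes_per_magazine=15):
    """Capacity of one LTO-3 tape magazine in terabytes."""
    return tape_gb * tapes_per_magazine / 1000

def array_capacity_pb(drives=2500, drive_gb=400):
    """Raw capacity of the disk array in petabytes (no RAID overhead considered)."""
    return drives * drive_gb / 1_000_000

def audio_gb_per_day(mics=14, sample_rate_hz=44_100, bytes_per_sample=2, channels_per_mic=1):
    """Uncompressed audio volume per day; sample format and mono capture are assumptions."""
    bytes_per_second = mics * sample_rate_hz * bytes_per_sample * channels_per_mic
    return bytes_per_second * 86_400 / GB

if __name__ == "__main__":
    print(f"Magazine: {magazine_capacity_tb():.0f} TB")
    print(f"Planned array: {array_capacity_pb():.2f} PB")
    print(f"Initial 550 drives: {array_capacity_pb(drives=550) * 1000:.0f} TB")
    print(f"Audio (assumed format): {audio_gb_per_day():.1f} GB/day")
```

With these assumptions, one magazine carries 6 TB, the planned 2500-drive array comes to 1 PB raw, and the initial batch of about 550 drives to roughly 220 TB, which matches the "petabyte disk array" described above.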

51 Experience Retrieval in a Ubiquitous Home (chamds, byon, yamasaki,

52 MIT iDAT Project aka notebook

53 Samsung challenge
- Going beyond plain old photography and videography
- Print, view, and file in scrapbook or shoebox
- Digitized bits offer easy, worldwide sharing
- Screensaver is useful, but is it a killer app?
- The cell phone sized device (CPSD)… one device
- Next generation platform
- Phones and messaging, e.g. SMS, mail, web, IM, blogging
- Audio, photo, video record and viewing (incl. broadcast)
- Within 5 years and with supplemental devices, will take on the PC

54 Capture, storage, retrieval, and display: the challenge is putting them together
- Capture…
- Cell phone sized devices (CPSD). The killer app!!
- Consumer… photo, video, audio… experience
- Professional
- Storage
- Capture
- Archival
- Retrieval = f(use). Archive… ambience
- Display
- Personal: Cell phone
- PC
- Wall

55

56 BONUS SLIDES

57 Challenges
- Data-types
- Quantity expanding, i.e. info explosion
- New capabilities, e.g. real time, create new data-types
- Meta-data to increase value & provide pivots
- Going beyond a PC to a distributed environment
- Network environment, including media center
- Into the cloud. Especially important for social aspects
- Periphery… smart buildings, objects
- Backup, migration, and caching for beyond a Terabyte
- Expanding network: PC > LANs > web > p2p(eer)
- Schema sharing among disparate systems
- CARPE (real time data capture)
- Rooms, phone calls, SenseCam, health transducers, etc.
- Security, privacy, forgetfulness, deniability, etc.

58 More challenges
- Dear Appy: Monitoring and automatic migration of files that are unlikely to be understood on future platforms, as well as platform migration.
- Get What I Need (GWIN): endless but evolutionary improvements in search: misspellings, stemming, synonyms
- Endless frontier of schema and extensions to them for new applications, e.g. making org charts, family relationships.
- CARPE… a whole new game!
- Versioning is essential
- Scaling: we don't know what happens at a Terabyte
- What can, should be, or will be in the cloud? Books… videos
- Will we be allowed to use such systems? Copyright laws vary, e.g. ripping CDs, copy of anything, photos, conversations
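
The "Get What I Need" point about tolerating misspellings can be illustrated with a small sketch. It uses Python's standard-library `difflib` as a stand-in; it is not how MyLifeBits search works, the function name and sample titles are hypothetical, and stemming and synonyms are not covered.

```python
import difflib

def fuzzy_find(query, titles, cutoff=0.6):
    """Return titles containing a word that closely matches a possibly misspelled query."""
    query = query.lower()
    hits = []
    for title in titles:
        words = title.lower().split()
        # get_close_matches compares the query with each word using a similarity ratio
        if difflib.get_close_matches(query, words, n=1, cutoff=cutoff):
            hits.append(title)
    return hits

if __name__ == "__main__":
    titles = ["Telephone call with Jim", "SenseCam photos Cambridge", "Memex notebook"]
    print(fuzzy_find("telefone", titles))
```

A per-word similarity threshold is the simplest option; a real system would more likely combine it with a full-text index, so this only shows the matching idea.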

59 The "Dear Appy" problem
Dear Appy, How committed are you? Please come back to me. Forever yours truly, Lost and forgotten data
- Who's responsible?
- Media or 8-track cassette, 8" floppy
- Evolving platform, file, and database
- Evolving, incompatible standards & formats for legacy data that disregard ancestors
- Evolving and/or disappearing apps

60 Is Cyberspace a safe store? Don't your physical records, e.g. paper, last forever? What about information on your CDs, tapes, hard drives, solid state devices?

61 Automatic classification problem
- XML on bills and imported content… transactions
- We need to download classifications rather than build them
- Definitions & synonyms should help find what I want
- Today it is too expensive to manually classify scanned paper. E.g. right time meta-data is critical!
- We hope the system can classify papers and other documents, e.g. bills. Ideally, build Dublin Core
- In 10 years we need all documents to appear electronically & classified with a little help from me

