Testimony to the Advisory Committee on CyberInfrastructure v2.0 Gordon Bell Microsoft Bay Area Research Center 15 February 2002 (with post-testimony reprise)


1 Testimony to the Advisory Committee on CyberInfrastructure v2.0 Gordon Bell Microsoft Bay Area Research Center 15 February 2002 (with post-testimony reprise)

2 NSF Post-Testimony Reprise 2/15/02 15:00 cgb
Teleconference via voice bridge & access grid: both very poor! Significant need for an access grid & distributed teleconferencing appliance for … (a trip to testify for 30 min. or to hold the meeting should be virtual!)
Same old concerns: “I don’t have as many flops as their lab.”
Much of the facilities should be distributed, with build-it-yourself Beowulf clusters to get extraordinary cycles and bytes.
Centers need to be re-centered; see Bell & Gray, “What’s Next in High Performance Computing,” Comm. ACM, Feb. 2002, pp. 91-95.
Cost of networking, cycles, and bytes requires rethinking how to do various kinds of science, in light of the centralized versus distributed nature of the work, e.g. instrumentation that generates lots of data. (Last mile problem is significant.)
–Fedex’d hard drive is cheap. Cost of hard drive < network cost. Net is very expensive!
–Centers’ flops and bytes are expensive. Distributed likely to be less so.
–Many sciences need to be reformulated as distributed computing/database problems
Network costs are a disgrace. $1 billion boondoggle with NGI, Internet II.
Have put too much money in tool builder hands. They are reinventing industry tools, but without cognizance of WWW technology!
Give funding to scientists in joint grants with tool builders, e.g. the WWW came from a user
Database technology is not understood by users and computer scientists
–Training, tool funding, & combined efforts, especially when large & distributed
–Equipment, problems, etc. are dramatically outstripping our capabilities!
Time for an NSF reboot!

3 Network concerns
Very high cost
–$(1 + 1) / GByte to send on the net; Fedex and 160 GByte shipments are cheaper
–DSL at home is $0.15 - $0.30
Disks cost less than $2/GByte to purchase
Low availability of fast links (last mile problem)
–Labs & universities have DS3 links at most, and they are very expensive
–Traffic: Instant messaging, music stealing
Performance at desktop is poor
–1-10 Mbps; very poor communication links
Manage: trade-in fast links for cheap links!!
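To make the last-mile point concrete, here is a minimal sketch of how long a 160 GByte shipment would take over the link speeds mentioned in this deck. The function and table names are hypothetical, the rates are nominal line rates (DS3 taken as 44.736 Mbps), and protocol overhead and contention are ignored, so real transfers would be slower.

```python
# Hypothetical helper: time to move a payload over a link at its nominal rate.
# Assumption: no protocol overhead, no sharing, decimal units (1 GB = 1e9 bytes).
LINK_RATES_MBPS = {
    "desktop, 1 Mbps": 1.0,
    "desktop, 10 Mbps": 10.0,
    "DS3, 44.736 Mbps": 44.736,
    "1 Gbps (the bet on slide 11)": 1000.0,
}


def transfer_hours(size_gb, rate_mbps):
    """Hours needed to send size_gb gigabytes at rate_mbps megabits per second."""
    bits = size_gb * 8e9
    seconds = bits / (rate_mbps * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    for name, rate in LINK_RATES_MBPS.items():
        print(f"{name:>30}: {transfer_hours(160, rate):8.1f} hours")
```

Under these assumptions a 10 Mbps desktop link needs well over a day for one disk's worth of data, which is why an overnight courier is competitive.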

4 Data delivery costs 1$/GB today
Using the Network!!!
Rent for “big” customers: 300$/megabit per second per month
Improved 3x in last 6 years (!).
That translates to 1$/GB at each end.
You can mail a 160 GB disk for 20$.
–That’s 16x cheaper
–If overnight it’s 3 MBps.
3x160 GB ~ ½ TB
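The step from 300$ per megabit per second per month to roughly 1$/GB can be checked with a few lines of arithmetic. This is a sketch only: it assumes a 30-day month, a link kept fully busy, and decimal gigabytes, and all function names are hypothetical.

```python
# Hypothetical check of the slide's cost figures.
# Assumptions: 30-day month, 100% link utilization, 1 GB = 1e9 bytes.
SECONDS_PER_MONTH = 30 * 24 * 3600


def gb_per_month(rate_mbps):
    """Gigabytes a link moves in a month if it runs flat out."""
    bytes_per_second = rate_mbps * 1e6 / 8
    return bytes_per_second * SECONDS_PER_MONTH / 1e9


def network_dollars_per_gb(rent_per_mbps_month=300.0):
    """Cost per gigabyte at one end of the connection."""
    return rent_per_mbps_month / gb_per_month(1.0)


def network_vs_mail(size_gb=160, mail_dollars=20.0, ends=2):
    """Return (network cost in dollars, ratio of network cost to mailing cost)."""
    network = size_gb * ends * network_dollars_per_gb()
    return network, network / mail_dollars


if __name__ == "__main__":
    print(f"1 Mbps moves about {gb_per_month(1.0):.0f} GB per month")
    print(f"network cost per end: {network_dollars_per_gb():.2f} $/GB")
    cost, ratio = network_vs_mail()
    print(f"160 GB over the net: {cost:.0f} $, about {ratio:.0f}x the 20 $ mailing cost")
```

With these assumptions the result lands a little under 1$/GB per end; rounding up to 1$/GB at each end gives the slide's 16x figure.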

5 Some science is hitting a wall
FTP and GREP are not adequate (Jim Gray)
You can GREP 1 MB in a second
You can GREP 1 GB in a minute
You can GREP 1 TB in 2 days
You can GREP 1 PB in 3 years.
Oh!, and 1 PB ~10,000 disks
You can FTP 1 MB in 1 sec
You can FTP 1 GB / min (= 1 $/GB)
… 2 days and 1K$
… 3 years and 1M$
At some point you need indices to limit search, and parallel data search and analysis
This is where databases can help
Goal: Make it easy to
–Publish: Record structured data
–Find: Find data anywhere in the network; get the subset you need
–Explore datasets interactively
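The GREP and FTP figures above are round numbers, and a short sketch shows what sequential rate and transfer cost they imply. The sizes and times are taken from the slide; decimal units, a 365-day year, and the helper names are assumptions made for illustration.

```python
# Hypothetical helpers that back out the rates implied by the slide's figures.
# Assumptions: decimal units (1 MB = 1e6 bytes) and a 365-day year.
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15
SECONDS_PER_DAY = 86_400

SLIDE_FIGURES = [
    ("1 MB", MB, 1),
    ("1 GB", GB, 60),
    ("1 TB", TB, 2 * SECONDS_PER_DAY),
    ("1 PB", PB, 3 * 365 * SECONDS_PER_DAY),
]


def implied_rate_mb_per_s(size_bytes, seconds):
    """Sequential scan rate implied by reading size_bytes in the given time."""
    return size_bytes / seconds / MB


def transfer_cost_dollars(size_bytes, dollars_per_gb=1.0):
    """Network cost of moving size_bytes at a flat price per gigabyte."""
    return size_bytes / GB * dollars_per_gb


if __name__ == "__main__":
    for label, size, seconds in SLIDE_FIGURES:
        rate = implied_rate_mb_per_s(size, seconds)
        cost = transfer_cost_dollars(size)
        print(f"{label}: {rate:6.1f} MB/s implied, {cost:>12,.2f} $ to move at 1 $/GB")
```

Every row beyond the first works out to somewhere between roughly 5 and 20 MB/s, so the wall is not a slow tool: a single sequential stream simply cannot keep up with terabytes, which is the case for indices and parallel search.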

6 Collaborative research sharing instrumentation, data, and programs
We’ve talked about it for decades, e.g. accelerators to telescopes and zoology
–Doer / “User, talker & meeter” = 4%.
–http://www.all-species.org/ has the problem…
Focus has been and is on ops, not bytes!
–E.g. Pittsburgh center funded with no storage
–Why have centers for computation at all? Don’t we need a datacenter?
–By having no storage, re-compute everything
–Adding indexes, i.e. databases, increases speed, lessens computation, and increases experimentation
Computation centers become data centers since everyone/anyone builds a center
Need for computational scientist database talent!

7 Recommendations (given as testimony)
The “system” is fundamentally broken that is going to move to a different level.
Give the funding to users, not VLGs to tool builders to reinvent:
–HTTP and XML etc. (starting over) using FTP
–Naming and discovery services
–Security
–Scheduling and accounting, etc.
Goal: publishing programs & data has to be as easy as publishing web pages!

8 An Example: SkyServer and SkySurvey Database; A Prototype for other sciences?
Gray, Szalay, et al.
First paper on the SkyServer
http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.pdf
http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.doc
Later, more detailed paper for database community
http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.pdf
http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.doc

9 What can be learned from Sky Server?
It’s about data, not about harvesting flops
1-2 hr. query programs versus 1 wk programs
10 minute runs versus 3 day compute & searches
Database viewpoint.
–Avoid costly re-computation and searches
–Use indices and PARALLEL I/O. Read / Write >>1. (parallelism is automatic and transparent)
–The talent to do this appears to be non-existent.
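The "use indices" point can be illustrated with a small, self-contained sketch using Python's built-in sqlite3 module. This is not the SkyServer schema or its database engine: the table `objects`, its columns, the index name, and the random data are all invented for illustration.

```python
# Illustrative only: a toy catalog, NOT the SkyServer schema.
# Table, column, and index names are hypothetical; data is random.
import random
import sqlite3

QUERY = "SELECT COUNT(*) FROM objects WHERE mag BETWEEN ? AND ?"


def build_catalog(n_rows=10_000, seed=0):
    """Create an in-memory table of fake sky objects (position and magnitude)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (id INTEGER PRIMARY KEY, ra REAL, decl REAL, mag REAL)"
    )
    rows = [
        (rng.uniform(0, 360), rng.uniform(-90, 90), rng.uniform(10, 25))
        for _ in range(n_rows)
    ]
    conn.executemany("INSERT INTO objects (ra, decl, mag) VALUES (?, ?, ?)", rows)
    return conn


def add_magnitude_index(conn):
    """Index the column the query filters on."""
    conn.execute("CREATE INDEX idx_objects_mag ON objects (mag)")


def query_plan(conn, lo=12.0, hi=12.5):
    """Return SQLite's description of how it would run the range query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (lo, hi)).fetchall()
    return " | ".join(row[-1] for row in rows)


def count_in_range(conn, lo=12.0, hi=12.5):
    """Number of objects whose magnitude falls in [lo, hi]."""
    return conn.execute(QUERY, (lo, hi)).fetchone()[0]


if __name__ == "__main__":
    conn = build_catalog()
    print("without index:", query_plan(conn))
    add_magnitude_index(conn)
    print("with index:   ", query_plan(conn))
    print("matching rows:", count_in_range(conn))
```

Before the index exists the plan is a full scan of the table, the database analogue of GREP; afterwards the planner can search the index for the requested range and touch only the matching entries. The answer is the same either way, only the work done changes.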

10 Heuristics for building communities that need to share data & programs
Always go from working to working
Do it by induction in time and space (Why version 3 is pretty good.)
Put ONE database in place that’s useful by itself in terms of UI, content, & queries
Invent and demo 10-20 instances of use
Get two working in a single location
Extend to include a second community, with an appropriate superset capability

11 Gigabit per second workstation to workstation bet. 7 March 1997
Bet against optimists & big programs!
Raj Reddy, J. Gray, & Dan Ling versus A. van Dam, J. Hennessy, Ed Lazowska, and G. Bell.
Decide 12/31/2000. (Dinner and wine... wine cost not to exceed cost of dinner.)
RR, JG, and DL bet that at least 10K workstations, located in at least 10 sites, in at least 3 states, will be able to communicate with one another over an end-to-end path operating at a rate of at least 1 Gigabit per second. (Workstation to backbone, backbone to WAN, and WAN must all operate at this rate.)
The phone bill has been several hundred million.
It remains undone (2002), independent of computer size.

