A View from the Top Al Geist February 22-23 Houston TX.

1 A View from the Top Al Geist February 22-23 Houston TX

2 Participating Organizations
   Coordinator: Al Geist
   ORNL, ANL, LBNL, PNNL, PSC, SDSC, IBM, Compaq, SNL, LANL, Ames, NCSA, SGI, Scyld, Intel, Unlimited Scale
   Main Web Site
      Recently web server experienced problems

3 Scalable Systems Software Center November 28-29 Dallas TX Review of Last Meeting Details on page 12 Main project notebook

4 Progress Reports at Nov. mtg
   Al Geist – working groups, notebooks, telecoms
   Fred Johnson – Program management is a big focus
      Projects are under the microscope
      Interactions chart – encouraged to show utility
      Wants to see software prototypes, not just API
   Working Group Leaders
      What areas their working group is addressing
      Progress report on what their group has done
      Present problems being addressed
      Next steps for the group
      Discussion items for the larger group to consider

5 Consensus and Voting
   Wire Protocol Proposal: passed straw vote, 10 for / 1 against / 1 abstain
   ClusterBIOS Proposal: passed straw vote, 10 for / 0 against / 3 abstain
   Three Data Delivery Models: discussion; refinement suggestions made to have two modes; vote deferred until revised proposal is written up
   XML Schema Formalization: discussion of issues that must be addressed; no vote requested
   SC2001 Cluster BOFs: suggestion that Scalable Systems have a BOF at SC2002

6 Scalable Systems Software Center November - February Progress Since Last Meeting

7 January 15-16 SciDAC PI Meeting
   Al Geist gave a 15-minute overview presentation on the Scalable Systems Software Project
   Paul Hargrove and Al Geist manned a poster session on the morning of January 16
      Opportunity to meet new groups who need our stuff
      NERSC, CCS, and OSC among others expressed interest
   Meeting with Fred the afternoon of January 16
      Open source issues still being worked
      CCA is the first test case

8 Five Notebooks in place and filling up
   A main notebook for general information
   And individual notebooks for each working group
   Allows groups to keep track of other groups’ progress and comment on the items of overlap
   Allows Center members and interested parties to see what is being defined and implemented
   Over 130 total pages – 56 added since last meeting
   Get to all notebooks through main web site
      Click on side bar or on “project notebooks” at bottom of page

9 Four Weekly Working Group Telecoms
   Resource management, scheduling, and accounting
      Tuesday 3:00 pm (Eastern) 1-800-664-0771 keyword “SSS mtg”
   Validation and Testing
      Wednesday 1:00 pm (Eastern) 1-877-540-9892 mtg code 999157
   Process management, system monitoring, and checkpointing
      Thursday 1:00 pm (Eastern) 1-877-252-5250 mtg code 160910
   Node build, configuration, and information service
      Friday 3:00 pm (Eastern) 1-888-469-1934 mtg code 58145

10 Working Group Mailing Lists
   Resource management working group
   Process management working group
   Node build, configuration working group
   Validation and testing working group
   Mailing lists used for notification of new notebook entries and meeting setup or changes
   Not for ideas and proposals

11 Scalable Systems Software Center November 28-29,2001 This Meeting

12 Agenda – February 22
   8:00 Breakfast
   8:30 Al Geist – “View from the Top”
   9:00 Fred Johnson – MICS report
   Working Group Reports
   9:30 Narayan Desai – Node Build, Configure
   10:30 Break
   11:00 Scott Jackson – Resource Management
   12:00 Lunch (on own but go somewhere as group)
   1:00 Paul Hargrove – Process Management
   2:00 Erik DeBenedictis – Validation and Testing
   3:00 Break
   3:30 Prototype Demos
      Rusty – process manager, job manager, service directory
      Ron – Scalable Linux monitor
      Eric – CPlant XML interface
      JP – information service
      Scott Jackson – QBank
   5:30 Adjourn
   Working groups may wish to get together in evening

13 Agenda – February 23
   8:00 Breakfast
   8:30 Discussion, proposals, straw votes
      Scott – Demo allocation manager
      Rusty – Security wire protocol 8/3/3
      Mike – monitoring method
      Eric – XML issues
      Narayan – Service directory
   10:30 Break
   11:00 Al Geist – Summary
      next steps overall and for working groups
      next meeting June 13-14 Houston
   12:00 meeting ends

14 Meeting notes
   Narayan – Service directory status
      ssslib – simple helper functions: send_message, receive_message, sd_register, sd_unregister
      future – multi-protocol support
   Node manager – functions: power up/down, boot/halt/reboot, getImage, setImage, rebuild node, configUpdate node
      ORNL is working on a prototype due by next meeting
   Information service – scalable data repository intended to store well-formed info from other components
      data management and memory leaks are still being thought about
      key-value pairs vs. DB approach (this is preferred)
      data has to be registered, and schemas need to be defined
      Discussion – why not use SQL? Are we reinventing the wheel? PSC uses a database that is completely integrated into their system.

15 Meeting notes
   Scott Jackson – Proposed component architecture diagrams
      Creation of XML marshaller/unmarshaller
      Establishment of CVS repository at Ames
   Scheduler progress – internal Resource manager API, Allocation manager API
      Initial support for checkpoint/restart – info it needs to know: what resources are tied up, when the last checkpoint was done, locality requirements
   Meta scheduler progress – support for data staging, proximity optimization
      Uses Globus RSL queries, thus has ties to Grid
   Job manager progress – initial study on PBS to determine viability of dissection possibilities and functionality enhancements
      Answer was NO: use all of PBS or use something else, don’t break up PBS
   Allocation Manager progress – draft requirements document underway
      prototype working, backend is SQL database (pages 37-38, RM notebook)
   Current issues –
   Next work – all components under CVS, design XML interface to RM
      scheduler demos by next mtg
   Discussion of “metadata” and validation of XML schema

16 Meeting notes
   Bigger questions – do we need SSS-wide CVS? Documentation? Problem tracking?
      BitKeeper? Faster than SourceForge.
   Paul – XML is now a secondary issue, prototypes will sort it out
      Refined boundaries with RM working group
      Influencing the monitor discussion
      Resurrect the job manager. Collect PM steps
   Process manager – prototype by ANL and stub scheduler to feed it
      complete set of interfaces defined
   Checkpoint Manager – a separate component w/ five entry points
      migration interface is now 2 phase
      requirements document published as tech report; hard copy handed out for comments
   Monitors – two basic query types
      NCSA studying implementation issues
      initial capabilities will be PBS mon functionality
   Data Migration – no interface work yet
      would be invoked by Job Manager, could invoke PM

17 Meeting notes
   Next steps – continue work with RM WG
      Interface work for process manager and checkpoint mgr
      begin for job manager and monitors
      Prototyping and refinement
      Implementation survey report
   Discussion of two delivery models compressed to one
      Single Method(metric, rate, threshold, extended data)
      Jose says “one more step and we are done”
   Eric – XML proposed schema structure and style (proposal tomorrow)
      Source repository – one or multiple? Copyright of source. Is it accessible to public?
      Nightly regression tests. Supported machines?
   Discussion on options for source repository
      Teleconference being set up – time is tentatively Wed 1:00 Eastern
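
The single monitoring method noted on this slide, Method(metric, rate, threshold, extended data), could be pictured roughly as below. This is only a sketch under stated assumptions: the class name `MonitorRequest`, the field defaults, and the rule that maps field combinations to "query", "streaming", and "event" behaviour are all hypothetical and are not taken from the working group's design.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MonitorRequest:
    """Hypothetical shape of the single request method (names are assumptions)."""
    metric: str                        # what to measure, e.g. "load"
    rate: float = 0.0                  # samples per second; 0 means a single reply
    threshold: Optional[float] = None  # report only at or above this value
    extended: dict = field(default_factory=dict)  # free-form extra data

    def mode(self) -> str:
        """Classify the request; this mapping is an illustrative assumption."""
        if self.rate < 0:
            raise ValueError("rate must be non-negative")
        if self.threshold is not None:
            return "event"      # report when the threshold is reached
        if self.rate > 0:
            return "streaming"  # report every sample at the given rate
        return "query"          # one-shot; polling is a client repeating this


def should_report(request: MonitorRequest, value: float) -> bool:
    """Decide whether a sampled value is sent back to the requester."""
    if request.threshold is None:
        return True
    return value >= request.threshold
```

For example, `MonitorRequest("load", rate=1.0, threshold=4.0)` would be treated here as an event request that stays silent until the sampled load reaches 4.0.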

18 Meeting notes – Demos
   Narayan – service directory started, process manager registers with SD
      Start miniScheduler, which finds PM and starts submitting jobs
      Uses basic wire protocol, ssslib, and the XML interface
      Next step – get RM group to supply the scheduler XML
   Eric – Secure Wire Protocol through XML and browser
      Dual mode: accepts XML and returns XML; accepts https request and returns HTML-wrapped XML
      OpenSSL with 128-bit encryption, certificate server; Security Plan can be written
      https web page form has boxes (name) (password) and a text box to fill in XML (a default XML form is supplied)
      Submitted a job on 2 nodes of Cplant
      Next step – browser interface good (demos, GUI to Cplant, control console)

19 Meeting notes – Demos
   Matt Sottile – Supermon: scalable cluster monitoring
      Reactive and periodic monitoring modes
      Low perturbation was a large focus
      Hierarchical architecture of supermon
      Describes the “S” (Lisp-like) API
      Shows performance graphs to 20 processors, 750 Hz sample rate
      Gives demo
      Would like to get users
   Al asks how this could be used in the Scalable Systems center
   Ron says could make a supermon XML component
      And for scaling to 5000 processors, build a k-way tree with k=50
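
The k-way tree remark can be checked with a little arithmetic. The sketch below is illustrative only (the function names are hypothetical, and it says nothing about how supermon actually arranges its hierarchy): it computes how many levels of fan-out k are needed to reach a given number of processors, and how many aggregation points each level would need if nodes are grouped k at a time.

```python
def tree_depth(num_nodes: int, fanout: int) -> int:
    """Smallest number of levels such that fanout**levels >= num_nodes."""
    if num_nodes < 1 or fanout < 2:
        raise ValueError("need num_nodes >= 1 and fanout >= 2")
    depth, reach = 0, 1
    while reach < num_nodes:
        reach *= fanout
        depth += 1
    return depth


def aggregators_per_level(num_nodes: int, fanout: int) -> list:
    """Aggregator counts from the level just above the leaves up to the root."""
    if num_nodes < 1 or fanout < 2:
        raise ValueError("need num_nodes >= 1 and fanout >= 2")
    levels = []
    count = num_nodes
    while count > 1:
        count = -(-count // fanout)  # ceiling division: groups of `fanout`
        levels.append(count)
    return levels


print(tree_depth(5000, 50))             # 3, since 50**2 = 2500 < 5000 <= 50**3
print(aggregators_per_level(5000, 50))  # [100, 2, 1]
```

With k=50, 5000 processors fit in three levels: 100 first-level collectors, 2 above them, and a single root.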

20 Meeting notes – Demos
   Scott Jackson – QBank demo – didn’t work
   JP Navarro – information service prototype
      Info service registers itself with service directory through XML
      Runs client, which finds info service through directory
      Does query to show metadata contents of info service
      Client puts in data with DeclareData(), then InsertData()
      Wants to hear from WG how to make this more useful
   Stephen describes CCS needs for user management DB
   Build/configure group stores node information
   Paul asks about access control to prevent others from deleting
   Long discussion on security and access
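
The declare-then-insert flow shown in the information service demo might look something like the following. This is a hedged sketch, not the prototype's interface: only the names DeclareData() and InsertData() come from the notes, while the `InfoService` class, the method signatures, the text-only columns, and the use of an in-memory SQLite database are assumptions made for illustration.

```python
import sqlite3


class InfoService:
    """Toy repository: data must be declared (schema) before it is inserted."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")  # assumption: SQL backend
        self._schemas = {}                      # data name -> field names

    def declare_data(self, name, fields):
        """Register a named data set and its fields (cf. DeclareData())."""
        fields = tuple(fields)
        if not fields:
            raise ValueError("at least one field is required")
        for ident in (name, *fields):
            if not ident.isidentifier():
                raise ValueError(f"invalid identifier: {ident!r}")
        if name in self._schemas:
            raise ValueError(f"{name!r} is already declared")
        columns = ", ".join(f'"{f}" TEXT' for f in fields)
        self._db.execute(f'CREATE TABLE "{name}" ({columns})')
        self._schemas[name] = fields

    def insert_data(self, name, record):
        """Store one record for a declared data set (cf. InsertData())."""
        if name not in self._schemas:
            raise KeyError(f"{name!r} has not been declared")
        fields = self._schemas[name]
        if set(record) != set(fields):
            raise ValueError("record does not match the declared fields")
        marks = ", ".join("?" for _ in fields)
        self._db.execute(f'INSERT INTO "{name}" VALUES ({marks})',
                         [str(record[f]) for f in fields])

    def metadata(self):
        """Return the declared schemas, as in the demo's metadata query."""
        return dict(self._schemas)

    def query(self, name):
        """Return all records of a declared data set as dictionaries."""
        fields = self._schemas[name]
        rows = self._db.execute(f'SELECT * FROM "{name}"').fetchall()
        return [dict(zip(fields, row)) for row in rows]
```

A client would call `declare_data("node", ["host", "state"])` once and then `insert_data("node", {"host": "n1", "state": "up"})` for each record. Access control, which the discussion raised, is deliberately left out of this sketch.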

21 Meeting notes – Day 2
   Demo
      Scott Jackson – Allocation manager basic operations
      Next step: building framework with allocations, people, machines, etc.
   Proposals
      Rusty – Basic authentication to go with basic wire protocol
         Classic shared secret key challenge/response algorithm
         Requires no machinery except Unix crypt or MD5
         Question about password management; the answer is that this proposal is just for the wire protocol. No assumptions on password management.
         Discussion of whether this is required for all components
         Some thought the proposal should be stronger, others thought it should be weaker (just basic wire protocol); some who abstained wanted to think about it a while before deciding
         Proposal that basic wire protocol is augmented by a challenge/response
         Straw vote – 8 for, 3 against, 3 abstain
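
A classic shared-secret challenge/response exchange of the kind proposed here can be sketched in a few lines. This is a generic illustration, not the proposal's wire format: the function names, the 16-byte challenge, and the choice to hash the challenge concatenated with the secret are assumptions. MD5 is used only because the notes mention it; a current design would normally use an HMAC with a stronger hash.

```python
import hashlib
import hmac
import os


def make_challenge(nbytes: int = 16) -> bytes:
    """Server side: generate a fresh random challenge for each connection."""
    return os.urandom(nbytes)


def compute_response(shared_secret: bytes, challenge: bytes) -> str:
    """Client side: prove knowledge of the secret without sending it."""
    return hashlib.md5(challenge + shared_secret).hexdigest()


def verify_response(shared_secret: bytes, challenge: bytes, response: str) -> bool:
    """Server side: recompute the digest and compare in constant time."""
    expected = compute_response(shared_secret, challenge)
    return hmac.compare_digest(expected, response)


# One round of the exchange between two components sharing a secret.
secret = b"example-shared-secret"   # hypothetical value
challenge = make_challenge()
answer = compute_response(secret, challenge)
print(verify_response(secret, challenge, answer))  # True
```

Because the challenge is random per connection, a captured response cannot simply be replayed later. How the shared secret is distributed and stored is outside the sketch, matching the note that the proposal makes no assumptions about password management.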

22 Meeting notes – Day 2 Proposals
   Mike – Monitoring
      Gives his history of monitoring at NCSA
      Gives live demo of NCSA Platinum Cluster Monitor
      Looked at scalable protocol design (tested on 550 machines)
      XML increases data volume by 4X; differential ASCII has lowest volume
      Current design in the PM notebook (diagrams)
      Single request method to cover event, streaming, polling, query
      Straw vote to take this approach for developing XML interfaces and components: 12 for, 0 against, 1 abstain
   Narayan – Service Directory
      Problem: components need to locate each other and know how to connect
      SD allows components to register, deregister, and look up
      Name conflict – returns info about all matches
      Straw vote to have service directory as part of architecture: 13/0/1
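
The service directory behaviour described on this slide (register, deregister, lookup, and returning every match on a name conflict) can be sketched as an in-memory registry. The class and field names below are hypothetical, and the real component speaks the wire protocol rather than being called in-process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    """Where a component can be reached (field names are assumptions)."""
    name: str
    host: str
    port: int
    protocol: str = "basic"


class ServiceDirectory:
    """Minimal registry: components register, deregister, and look each other up."""

    def __init__(self):
        self._entries = {}  # component name -> list of ServiceEntry

    def register(self, entry: ServiceEntry) -> None:
        matches = self._entries.setdefault(entry.name, [])
        if entry not in matches:  # ignore exact duplicates
            matches.append(entry)

    def deregister(self, entry: ServiceEntry) -> bool:
        """Remove one entry; return False if it was not registered."""
        matches = self._entries.get(entry.name, [])
        if entry not in matches:
            return False
        matches.remove(entry)
        if not matches:
            del self._entries[entry.name]
        return True

    def lookup(self, name: str) -> list:
        """Return info about all matches, so a name conflict is visible to the caller."""
        return list(self._entries.get(name, []))


sd = ServiceDirectory()
sd.register(ServiceEntry("process-manager", "node1", 7001))
sd.register(ServiceEntry("process-manager", "node2", 7001))
print(len(sd.lookup("process-manager")))  # 2
```

Returning a list from `lookup` leaves the choice among conflicting registrations to the caller, which mirrors the note that a name conflict returns info about all matches.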

23 Meeting notes – Day 2
   Jose asks: RAS considerations – what happens if a component doesn’t respond?
      Discussion follows, including high availability considerations
   Eric – XML issues
      Multiple roles for XML schema
      Global definitions in a global schema – host, ports, authentication, etc.
      XML namespaces – proposed [global, pm, sched, acct]
      Proposed type names, e.g. host-type, env-type, stdio-type, args-type
      Style consistency – decide later
      Discussion – good idea to have a global namespace; need to consider version numbers
      Straw vote: 12 for, 0 against, 2 abstain
