
1 Astronomy Applications in the TeraGrid Environment
Roy Williams, Caltech
With thanks for material to: Sandra Bittner, ANL; Sharon Brunett, Caltech; Derek Simmel, PSC; John Towns, NCSA; Nancy Wilkins-Diehr, SDSC

2 NVO Summer School Sept 2004 The TeraGrid Vision
Distributing the resources is better than putting them at one site
Build new, extensible, grid-based infrastructure to support grid-enabled scientific applications: new hardware, new networks, new software, new practices, new policies
Expand centers to support cyberinfrastructure: a distributed, coordinated operations center
Exploit unique partner expertise and resources to make the whole greater than the sum of its parts
Leverage homogeneity to make distributed computing easier and to simplify initial development and standardization: run a single job across the entire TeraGrid; move executables between sites

3 NVO Summer School Sept 2004 What is the Grid, Really?
The Grid is: a set of powerful Beowulf clusters; lots of disk storage; fast interconnection; unified account management; interesting software
The Grid is not: magic; infinite; simple; a universal panacea; the hype that you have read

4 NVO Summer School Sept 2004 Grid as Federation
TeraGrid is a federation: independent centers give flexibility; a unified interface gives power and strength. A large-state/small-state compromise.

5 NVO Summer School Sept 2004 TeraGrid Wide Area Network

6 Grid Astronomy

7 NVO Summer School Sept 2004 Quasar Science
An NVO-TeraGrid project: Penn State, CMU, Caltech
60,000 quasar spectra from the Sloan Sky Survey; each is 1 CPU-hour: submit to the grid queue
Fits a complex model (173 parameters); derives black hole mass from line widths
[diagram: NVO data services feed a manager, which uses globusrun to distribute jobs to the clusters]

8 NVO Summer School Sept 2004 N-point galaxy correlation
An NVO-TeraGrid project: Pitt, CMU
Finding the triple correlation in the 3D SDSS galaxy catalog (RA/Dec/z)
Lots of large parallel jobs; kd-tree algorithms

9 NVO Summer School Sept 2004 Palomar-Quest Survey
Caltech, NCSA, Yale; data from the P48 telescope
Transient pipeline: computing reservation at sunrise for immediate follow-up of transients (ALERT)
Synoptic survey: massive resampling (Atlasmaker) for ultrafaint detection
NCSA, Caltech, and Yale run different pipelines on the same data over TG
50 Gbyte/night; 5 Tbyte

10 NVO Summer School Sept 2004 Transient from PQ from catalog pipeline

11 NVO Summer School Sept 2004 PQ stacked images from image pipeline

12 NVO Summer School Sept 2004 Wide-area Mosaicking (Hyperatlas)
An NVO-TeraGrid project: Caltech
High quality: flux-preserving, spatially accurate
Stackable: Hyperatlas, edge-free, pyramid weight
Mining AND outreach: DPOSS 15° "Big Picture" at Griffith Observatory

13 NVO Summer School Sept 2004 2MASS Mosaicking portal An NVO-Teragrid project Caltech IPAC

14 Teragrid hardware

15 NVO Summer School Sept 2004 TeraGrid Components
Compute hardware: Intel/Linux clusters, Alpha SMP clusters, POWER4 cluster, …
Large-scale storage systems: hundreds of terabytes for secondary storage
Very high-speed network backbone: bandwidth for rich interaction and tight coupling
Grid middleware: Globus, data management, …
Next-generation applications

16 NVO Summer School Sept 2004 Overview of Distributed TeraGrid Resources
[diagram: site resources (HPSS, UniTree) and external networks at each site; NCSA/PACI 10.3 TF, 240 TB; SDSC 4.1 TF, 225 TB; Caltech; Argonne]

17 NVO Summer School Sept 2004 Compute Resources – NCSA
[cluster diagram: ~10.6 TF with 230 TB. A 2.6 TF Madison partition (256 nodes, 2p 1.3 GHz, 4 or 12 GB memory, 2x73 GB disk) plus an 8 TF Madison partition (667 nodes, 2p, 4 GB memory, 2x73 GB disk); 8 interactive/spare 4p Madison nodes for login and FTP; GbE and Myrinet fabrics; storage I/O over Myrinet and/or GbE at 250 MB/s per node; Brocade 12000 FC switches to the 230 TB store; 30 Gbps to the TeraGrid network]

18 NVO Summer School Sept 2004 Compute Resources – SDSC
[cluster diagram: ~4.3 + 1.1 TF with 500 TB. A 1.3 TF Madison partition (128 nodes, 2p 1.3 GHz, 4 GB memory, 2x73 GB disk) plus a 3 TF Madison partition (256 nodes); 6 interactive/spare 4p Madison nodes for login and FTP; GbE and Myrinet fabrics; Brocade 12000 FC switches to the 500 TB store; 30 Gbps to the TeraGrid network]

19 NVO Summer School Sept 2004 Compute Resources – Caltech
[cluster diagram: ~100 GF with 100 TB. 34 GF Madison (17 HP/Intel nodes, 2p, 6 GB memory, 73 GB scratch) plus 72 GF Madison (36 IBM/Intel nodes); an interactive 2p IBM Madison node for login and FTP; a Datawulf of 6 Opteron nodes (4p, 8 GB memory, 66 TB RAID5); 33 IA32 storage nodes (2p, 6 GB memory) serving 100 TB /pvfs; HPSS with 13 tape drives and 1.2 PB raw silo capacity; GbE and Myrinet fabrics; 30 Gbps to the TeraGrid network]

20 Using Teragrid

21 NVO Summer School Sept 2004 Wide Variety of Usage Scenarios
Tightly coupled jobs storing vast amounts of data, performing visualization remotely, and making data available through online collections (ENZO)
Thousands of independent jobs using data from a distributed data collection (NVO)
Science Gateways – "not a Unix prompt"! From a web browser, with security, or from an application, e.g. IRAF or IDL

22 NVO Summer School Sept 2004 Traditional Parallel Processing
A single executable to be run on a single remote machine
Big assumption: runtime necessities (e.g. executables, input files, shared objects) are available on the remote system!
Log in to a head node and choose a submission mechanism:
Direct, interactive execution: mpirun -np 16 ./a.out
Through a batch job manager: qsub my_script, where my_script describes the executable location, runtime duration, redirection of stdout/err, mpirun specification… (a sketch follows)
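For concreteness, a minimal sketch of such a my_script for PBS; the job name, resource request, and program are illustrative, not from the talk:

#!/bin/sh
# minimal PBS sketch: name, resources, stdout/err redirection, mpirun specification
#PBS -N myjob
#PBS -l nodes=8:ppn=2
#PBS -l walltime=0:30:00
#PBS -o myjob.out
#PBS -e myjob.err
cd $PBS_O_WORKDIR
mpirun -np 16 ./a.out

It is submitted with qsub my_script, and the same script should move unchanged among the PBS-based TeraGrid clusters.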

23 NVO Summer School Sept 2004 Traditional Parallel Processing II
Through Globus: globusrun -r [some-teragrid-head-node].teragrid.org/jobmanager -f my_rsl_script, where my_rsl_script describes the same details as in the qsub my_script! (a sketch follows)
Through Condor-G: condor_submit my_condor_script, where my_condor_script describes the same details as the Globus my_rsl_script!
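A sketch of what my_rsl_script might contain, using standard GRAM RSL attributes; the paths and limits are illustrative:

& (executable = /home/roy/a.out)
  (directory  = /home/roy/run1)
  (count      = 16)
  (jobType    = mpi)
  (maxWallTime = 30)
  (stdout     = run1.out)
  (stderr     = run1.err)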

24 NVO Summer School Sept 2004 Distributed Parallel Processing
Decompose the application over geographically distributed resources: functional or domain decomposition fits well; take advantage of load-balancing opportunities; think about the impact of latency
Improved utilization of many resources
Flexible job management

25 NVO Summer School Sept 2004 Pipelined/dataflow processing
Suited to problems that can be divided into a series of sequential tasks, where: multiple instances of the problem need executing; a series of data needs processing with multiple operations on each item; information from one processing phase can be passed to the next phase before the current phase is complete

26 NVO Summer School Sept 2004 Security: ssh with password
Too much password-typing
Not very secure: a big break-in at TG in April 2004; one compromised password is a failure for all of TG!
Caltech and Argonne no longer allow password login; SDSC does not allow password change

27 NVO Summer School Sept 2004 Security: ssh with public key – single sign-on!
Use ssh-keygen on Unix or puttygen (PuTTYgen) on Windows
You get a public key file (e.g. id_rsa.pub) AND a private key file (e.g. id_rsa) AND a passphrase
On the remote machine, put the public key into ~/.ssh/authorized_keys
On the local machine, combine the private key and the passphrase (the ATM-card model) – a sketch follows
On TG, you can put the public key on the application form: immediate login, no snailmail
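A sketch of the setup from a Unix machine; the hostname is one of the TG login nodes and the filenames are the ssh defaults:

local$ ssh-keygen -t rsa
# writes ~/.ssh/id_rsa (private) and ~/.ssh/id_rsa.pub (public); choose a passphrase
local$ scp ~/.ssh/id_rsa.pub tg-login.ncsa.teragrid.org:
local$ ssh tg-login.ncsa.teragrid.org
remote$ cat id_rsa.pub >> ~/.ssh/authorized_keys
# thereafter an agent on the local machine holds the unlocked key:
local$ eval `ssh-agent`
local$ ssh-add
# asks for the passphrase once per session; later logins need no password
local$ ssh tg-login.ncsa.teragrid.org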

28 NVO Summer School Sept 2004 Security: X.509 certificates – single sign-on!
Obtained from a Certificate Authority (e.g. Verisign, US Navy, DOE, etc.)
A certificate is: a Distinguished Name (DN), e.g. /C=US/O=National Center for Supercomputing Applications/CN=Roy Williams, AND a private file (usercert.p12) AND a passphrase
The remote machine needs an entry in its gridmap file (maps DN to account): use the gx-map command
You can create a certificate with ncsa-cert-request etc.
Certificates can be lodged in a web browser
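A sketch of the certificate in day-to-day use; the gx-map invocation varies by site, and the host and RSL file are illustrative:

% grid-cert-info -subject
# prints the DN in your certificate
% grid-proxy-init
# creates a short-lived proxy credential; asks for the certificate passphrase
% gx-map
# request that your DN be mapped to your account in the site's gridmap file
% globusrun -r tg-login.sdsc.teragrid.org/jobmanager -f my_rsl_script
# single sign-on: no further passwords while the proxy is valid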

29 NVO Summer School Sept 2004 3 Ways to Submit a Job
1. Directly to the PBS batch scheduler: simple; scripts are portable among PBS TeraGrid clusters
2. Globus: common batch script syntax; scripts are portable among other grids using Globus
3. Condor-G: a nice interface atop Globus; monitoring of all jobs submitted via Condor-G; higher-level tools like DAGMan

30 NVO Summer School Sept 2004 PBS Batch Submission
ssh tg-login.[caltech|ncsa|sdsc|uc].teragrid.org
qsub flatten.sh -v "FILE=f544"
qstat or showq
ls *.dat; pbs.out, pbs.err files

31 NVO Summer School Sept 2004 globus-job-submit
For running batch/offline jobs (a sketch of a session follows)
globus-job-submit – submit a job; same interface as globus-job-run; returns immediately
globus-job-status – check job status
globus-job-cancel – cancel a job
globus-job-get-output – get job stdout/err
globus-job-clean – clean up after a job
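A sketch of a session; the contact string returned by globus-job-submit is written here as <jobID>, and the host, jobmanager, and output are illustrative:

% globus-job-submit tg-login.sdsc.teragrid.org/jobmanager-pbs /bin/hostname
<jobID>
# returns immediately with an https:// contact string identifying the job
% globus-job-status <jobID>
DONE
% globus-job-get-output <jobID>
# prints the job's stdout/stderr
% globus-job-clean <jobID>
# removes cached files for the job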

32 NVO Summer School Sept 2004 Condor-G Job Submission
[diagram: Condor-G on the local machine (mickey.disney.edu) speaks through the Globus API to the Globus job manager on tg-login.sdsc.teragrid.org, which hands the job to PBS]
The submit description file (the globusscheduler value here is illustrative, following the diagram):
executable      = /wd/doit
universe        = globus
globusscheduler = tg-login.sdsc.teragrid.org/jobmanager-pbs
globusrsl       = (maxtime=10)
queue

33 NVO Summer School Sept 2004 Condor-G
Combines the strengths of Condor and the Globus Toolkit
Advantages when managing grid jobs: a full-featured queuing service; credential management; fault tolerance; DAGMan (== pipelines)

34 NVO Summer School Sept 2004 Condor DAGMan
Manages workflow interdependencies
Each task is a Condor description file
A DAG file controls the order in which the Condor files are run (a sketch follows)
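A sketch of a DAG file; the job names and submit-file names are illustrative. Four Condor-G description files: B and C run after A, and D after both:

# diamond.dag
JOB  A  a.sub
JOB  B  b.sub
JOB  C  c.sub
JOB  D  d.sub
PARENT A CHILD B C
PARENT B C CHILD D

It is run with condor_submit_dag diamond.dag; DAGMan then submits each Condor file once its parents have finished.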

35 NVO Summer School Sept 2004 Where's the disk?
Home directory: $TG_CLUSTER_HOME, for example /home/roy
Shared writeable global areas: $TG_CLUSTER_PFS, for example /pvfs/MCA04N009/roy
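A sketch of how the two areas are typically used; the directory and file names are illustrative (MCA04N009 is the allocation name from the example above):

% echo $TG_CLUSTER_HOME
# small home area, e.g. /home/roy: scripts, source code
% echo $TG_CLUSTER_PFS
# large shared parallel filesystem, e.g. /pvfs/MCA04N009/roy: bulk data
% mkdir -p $TG_CLUSTER_PFS/source $TG_CLUSTER_PFS/target
% cp bigimage.fits $TG_CLUSTER_PFS/source/
# stage large inputs on the parallel filesystem before submitting jobs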

36 NVO Summer School Sept 2004 GridFTP: Moving a Test File
% globus-url-copy -s "`grid-cert-info -subject`" \
    gsiftp://localhost:5678/tmp/file1 \
    file:///tmp/file2
(the -s option gives the certificate subject to match against the server; here a test server on localhost port 5678)
Also uberftp and scp

37 NVO Summer School Sept 2004 Storage Resource Broker (SRB)
A single logical namespace while accessing distributed archival storage resources
Effectively infinite storage (first to 1 TB wins a t-shirt)
Data replication
Parallel transfers
Interfaces: command line, API, web/portal (a command-line sketch follows)
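A sketch of the SRB command-line interface (the "Scommands"); collection and file names are illustrative, and Sinit assumes an ~/.srb configuration (.MdasEnv) is already in place:

% Sinit
# start an SRB session using the settings in ~/.srb
% Smkdir survey-run1
% Scd survey-run1
% Sput f544.fits
# store a local file into the logical namespace
% Sls
# list the collection, wherever the physical copies live
% Sget f544.fits copy-of-f544.fits
% Sexit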

38 NVO Summer School Sept 2004 Storage Resource Broker (SRB): Virtual Resources, Replication
[diagram: an SRB client (command line or API) on a workstation accesses virtual resources replicated across NCSA and SDSC: hpss-sdsc, sfs-tape-sdsc, hpss-caltech, …]

39 NVO Summer School Sept 2004 Allocations Policies
TG resources are allocated via the PACI allocations and review process, modeled after the NSF process
TG is considered a single resource for grid allocations
Different levels of review for different sizes of allocation request:
DAC: up to 10,000 SUs – minimal review, fast turnaround
PRAC/AAB: < 200,000 SUs/year
NRAC: 200,000+ SUs/year
Policies/procedures posted at http://www.paci.org/Allocations.html
Proposal submission through the PACI On-Line Proposal System (POPS): https://pops-submit.paci.org/

40 NVO Summer School Sept 2004 Requesting a TeraGrid Allocation http://www.paci.org

41 NVO Summer School Sept 2004 24/7 Consulting Support
help@teragrid.org: advanced ticketing system for cross-site support, staffed 24/7
866-336-2357, 9-5 Pacific Time
http://news.teragrid.org/
Extensive experience solving problems for early-access users: networking, compute resources, extensible TeraGrid resources

42 NVO Summer School Sept 2004 Links www.teragrid.org/userinfo getting an account help@teragrid.org news.teragrid.org site monitors

43 Demo: Data-intensive computing with NVO services

44 NVO Summer School Sept 2004 DPOSS flattening
2650 x 1.1 Gbyte files
Cropping borders; quadratic fit and subtract
Virtual data: Source → Target

45 NVO Summer School Sept 2004 Driving the Queues
Here is the driver that makes and submits jobs:

import os

def filetime(path):
    # modification time of a file
    return os.path.getmtime(path)

for f in os.listdir(inputDirectory):
    # if the target file exists, with the right size and age, then we keep it
    ifile = inputDirectory + "/" + f
    ofile = outputDirectory + "/" + f
    if os.path.exists(ofile):
        osize = os.path.getsize(ofile)
        if osize != 1109404800:
            print " -- wrong target size, remaking", osize
        else:
            time_tgt = filetime(ofile)
            time_src = filetime(ifile)
            if time_tgt < time_src:
                print " -- target too old, remaking"
            else:
                print " -- already have target file"
                continue
    cmd = "qsub flat.sh -v \"FILE=" + f + "\""
    print " -- submitting batch job: ", cmd
    os.system(cmd)

46 NVO Summer School Sept 2004 PBS script
A PBS script; it can be submitted with qsub script.sh -v "FILE=f345":

#!/bin/sh
#PBS -N dposs
#PBS -V
#PBS -l nodes=1
#PBS -l walltime=1:00:00
cd /home/roy/dposs-flat/flat
./flat \
  -infile /pvfs/mydata/source/${FILE}.fits \
  -outfile /pvfs/mydata/target/${FILE}.fits \
  -chop 0 0 1500 23552 \
  -chop 0 0 23552 1500 \
  -chop 0 22052 23552 23552 \
  -chop 22052 0 23552 23552 \
  -chop 18052 0 23552 4000

47 NVO Summer School Sept 2004 Atlasmaker
A service-oriented application on TeraGrid
[diagram: a VO registry and SIAP services feed federated images (wavelength, time, ...) into the Hyperatlas; downstream processing includes source detection, average/max, subtraction]

48 NVO Summer School Sept 2004 Hyperatlas
Standard naming for atlases and pages, e.g. TM-5-SIN-20, page 1589
Standard scales: scale s means 2^(20-s) arcseconds per pixel (so scale 20 is 1 arcsec/pixel)
Standard projections: SIN projection, TAN projection
Standard layouts: TM-5 layout, HV-4 layout

49 NVO Summer School Sept 2004 Hyperatlas is a Service
All pages: /getChart?atlas=TM-5-SIN-20
0     2.77777778E-4  'RA---SIN'  'DEC--SIN'    0.0  -90.0
1     2.77777778E-4  'RA---SIN'  'DEC--SIN'    0.0  -85.0
2     2.77777778E-4  'RA---SIN'  'DEC--SIN'   36.0  -85.0
...
1731  2.77777778E-4  'RA---SIN'  'DEC--SIN'  288.0   85.0
1732  2.77777778E-4  'RA---SIN'  'DEC--SIN'  324.0   85.0
1733  2.77777778E-4  'RA---SIN'  'DEC--SIN'    0.0   90.0
Best page: /getChart?atlas=TM-5-SIN-20&RA=182&Dec=62
1604  2.77777778E-4  'RA---SIN'  'DEC--SIN'  184.615386  60.0
Numbered page: /getChart?atlas=TM-5-SIN-20&page=1604
1604  2.77777778E-4  'RA---SIN'  'DEC--SIN'  184.615386  60.0
Replicated implementations:
baseURL = http://mercury.cacr.caltech.edu:8080/hyperatlas
baseURL = http://virtualsky.org/servlet

50 NVO Summer School Sept 2004 GET services from Python
This code uses a service to find the best Hyperatlas page for a given sky location:

hyperatlasURL = self.hyperatlasServer + "/getChart?atlas=" + atlas \
    + "&RA=" + str(center1) + "&Dec=" + str(center2)
stream = urllib.urlopen(hyperatlasURL)

# result is a tab-separated line, so use split() to tokenize
tokens = stream.readline().split('\t')
print "Using page ", tokens[0], " of atlas ", atlas
self.scale = float(tokens[1])
self.CTYPE1 = tokens[2]
self.CTYPE2 = tokens[3]
rval1 = float(tokens[4])
rval2 = float(tokens[5])

51 NVO Summer School Sept 2004 VOTable parser in Python
From a SIAP URL we get the XML and extract the columns that hold the image references, image format, and image RA/Dec (exception catching is needed here):

import urllib, xml.dom.minidom

stream = urllib.urlopen(SIAP_URL)
doc = xml.dom.minidom.parse(stream)

# Make a dictionary mapping column UCD to column index
col_ucd_dict = {}
col_counter = 0
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_FIELD in XML_TABLE.getElementsByTagName("FIELD"):
        col_ucd = XML_FIELD.getAttribute("ucd")
        col_ucd_dict[col_ucd] = col_counter
        col_counter += 1

urlColumn = col_ucd_dict["VOX:Image_AccessReference"]
formatColumn = col_ucd_dict["VOX:Image_Format"]
raColumn = col_ucd_dict["POS_EQ_RA_MAIN"]
deColumn = col_ucd_dict["POS_EQ_DEC_MAIN"]

52 NVO Summer School Sept 2004 VOTable parser in Python
The table is a list of rows, and each row is a list of table cells:

table = []
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_DATA in XML_TABLE.getElementsByTagName("DATA"):
        for XML_TABLEDATA in XML_DATA.getElementsByTagName("TABLEDATA"):
            for XML_TR in XML_TABLEDATA.getElementsByTagName("TR"):
                row = []
                for XML_TD in XML_TR.getElementsByTagName("TD"):
                    data = ""
                    for child in XML_TD.childNodes:
                        data += child.data
                    row.append(data)
                table.append(row)

53 NVO Summer School Sept 2004 SOAP client in Python
WCSTools (xy2sky and sky2xy) exposed as web services:

from SOAPpy import *

# get fitsheader string as FITS header
# get x1, x2 as coordinates on image
server = SOAPProxy("http://mercury.cacr.caltech.edu:9091")
wcsR = server.xy2sky(fitsheader, x1, x2)

ra = wcsR["c1"]
dec = wcsR["c2"]
status = wcsR["status"]
message = wcsR["message"]
print "Sky coordinates are:", ra, dec
print "status is: ", status
print "Message is: ", message

54 NVO Summer School Sept 2004 Future: Science Gateways

55 NVO Summer School Sept 2004 Teragrid Impediments
Learn Globus; learn MPI; learn PBS; port code to Itanium; get certificate; get logged in; wait 3 months for account; write proposal ... and now do some science....

56 NVO Summer School Sept 2004 A better way: Graduated Security for Science Gateways
Web form – anonymous: some science....
Register – logging and reporting: more science....
Authenticate X.509 – browser or command line: big-iron computing....
Write proposal – own account: power user

57 NVO Summer School Sept 2004 Secure Web services for Teragrid Access
Web form (browser has certificate)
Auto-generated client API for scripted submission (certificate in ~/.globus/)
Clarens, BOSS, PBS, GridPort, XForms: distribute jobs on the grid
Embedded in an existing client application (Root, IRAF, IDL, ...)
Embedded as part of another service (proxy agent)

58 NVO Summer School Sept 2004 Secure Web services for Teragrid Access
Shell command
List files, get files
Submit a job to the TG queue (Condor / DAGMan / globusrun)
Monitor running jobs

59 NVO Summer School Sept 2004 Teragrid Wants YOU! Your astronomy applications Your science gateway projects Teragrid has 100's of processors and 100's of terabytes

