
Overview of Wisconsin Campus Grid Dan Bradley Center for High-Throughput Computing.


1 Overview of Wisconsin Campus Grid Dan Bradley Center for High-Throughput Computing

2 Technology

3 HTCondor
pools: 5, submit nodes: 50, user groups: 106, execute nodes: 1,600, cores: 10,000
Open one port and use shared_port on submit machine.
If execute nodes are behind NAT but have outgoing net, use CCB.
Example submit file:
    executable = a.out
    RequestMemory = 1000
    output = stdout
    error = stderr
    queue 1000
[Diagram labels: submit machine, Condor Pool, firewall, flocking, CCB, NAT]
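
The submit file on slide 3 points all 1000 jobs at the same stdout and stderr file names. A minimal sketch of one common variation follows, assuming per-job file names are wanted: it renders a submit description that uses the $(Process) macro so each job gets its own files. The function name render_submit and the log file name jobs.log are hypothetical; request_memory is the documented submit-command spelling of the RequestMemory line on the slide.

```python
def render_submit(executable, memory_mb, n_jobs):
    """Return HTCondor submit-description text for n_jobs runs of one executable.

    Hypothetical helper for illustration; it is not part of HTCondor.
    """
    lines = [
        f"executable = {executable}",
        f"request_memory = {memory_mb}",   # interpreted as MiB by default
        # $(Process) expands to 0 .. n_jobs-1, giving each job its own files
        "output = stdout.$(Process)",
        "error = stderr.$(Process)",
        "log = jobs.log",                  # assumed log file name
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# Same values as the slide: a.out, 1000 MB, 1000 jobs
print(render_submit("a.out", 1000, 1000), end="")
```

The printed text can be saved to a file and passed to condor_submit.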

4 Accessing Files
No campus-wide shared FS
HTCondor file transfer for most cases:
– Send software + input files to job
– Grind, grind, …
– Send output files back to submit node
Some other cases:
– AFS: works on most of campus, but not across OSG
– httpd + SQUID(s): when xfer from submit node doesn’t scale
– CVMFS: read-only http FS (see talk tomorrow)
– HDFS: big datasets on lots of disks
– Xrootd: good for access from anywhere
  Used on top of HDFS and local FS
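
The file-transfer pattern on slide 4 (send software and inputs, run, send outputs back) maps onto a few submit commands. This is a rough sketch, not the presenter's configuration: the helper name render_transfer_settings and the file names wrapper.sh and input.dat are made up for illustration, while should_transfer_files, when_to_transfer_output and transfer_input_files are standard HTCondor submit commands.

```python
def render_transfer_settings(input_files):
    """Return submit-description lines that turn on HTCondor file transfer.

    Hypothetical helper; input_files is a list of paths on the submit node.
    """
    return [
        "should_transfer_files = YES",         # do not rely on a shared file system
        "when_to_transfer_output = ON_EXIT",   # send outputs back when the job exits
        "transfer_input_files = " + ", ".join(input_files),
    ]


# Illustrative file names only
for line in render_transfer_settings(["wrapper.sh", "input.dat"]):
    print(line)
```

These lines would go before the queue statement of a submit file.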

5 Managing Workflows
A simple submit file works for many users
– We provide an example job wrapper script to help download and set up common software packages: MATLAB, python, R
DAGMan is used by many others
– Common pattern:
  User drops files into a directory structure
  Script generates DAG from that
  Rinse, lather, repeat
Some application portals are also used
– e.g. NEOS Online Optimization Service
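
The "script generates DAG from a directory structure" pattern could look something like the sketch below. The layout is an assumption, not the one used on campus: each subdirectory of a root directory holds one job and contains a submit file called job.sub, and an optional final node (here named merge) runs after all of them. The function name, node-name prefix and file names are hypothetical. JOB ... DIR and PARENT ... CHILD are standard DAGMan statements.

```python
from pathlib import Path


def generate_dag(root, submit_name="job.sub", final_submit=None):
    """Build DAGMan input text with one node per job directory under root.

    Assumes simple directory names (no whitespace or dots) and that the DAG
    file will be submitted from root, so DIR paths are relative to it.
    """
    root = Path(root)
    job_dirs = sorted(
        d for d in root.iterdir()
        if d.is_dir() and (d / submit_name).is_file()
    )
    lines = []
    names = []
    for d in job_dirs:
        name = f"node_{d.name}"
        names.append(name)
        # DIR makes DAGMan submit the node from inside its own directory
        lines.append(f"JOB {name} {submit_name} DIR {d.name}")
    if final_submit and names:
        # Optional final step that waits for every per-directory node
        lines.append(f"JOB merge {final_submit}")
        lines.append(f"PARENT {' '.join(names)} CHILD merge")
    return "\n".join(lines) + "\n"
```

A caller would write the returned text to a file in the root directory, for example workflow.dag, and hand it to condor_submit_dag. Rerunning the script after new directories appear regenerates the DAG.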

6 Overflowing to OSG
glideinWMS
– We run a glideinWMS “frontend”
– Uses OSG glidein factories
– Appears to users as just another pool to flock to
But jobs must opt in: +WantGlidein = True
We customize glideins to make them look more like other nodes on campus: publish OS version, glibc version, CVMFS availability
[Chart: million hours used]
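
Since the opt-in is a single job attribute, it can be added to an existing submit file mechanically. The sketch below is one possible way to do that and is not taken from the talk; the helper name add_glidein_opt_in is hypothetical. It inserts the attribute before the first queue statement, because submit commands only affect the queue statements that follow them.

```python
def add_glidein_opt_in(submit_text):
    """Return submit_text with '+WantGlidein = True' placed before 'queue'.

    Hypothetical helper; leaves the text unchanged if the attribute is
    already present.
    """
    opt_in = "+WantGlidein = True"
    if "+wantglidein" in submit_text.lower():
        return submit_text
    out = []
    inserted = False
    for line in submit_text.splitlines():
        # Match "queue" or "queue N" as the first word on the line
        if not inserted and line.strip().lower().split()[:1] == ["queue"]:
            out.append(opt_in)
            inserted = True
        out.append(line)
    if not inserted:
        out.append(opt_in)  # no queue statement found; append at the end
    return "\n".join(out) + "\n"


print(add_glidein_opt_in("executable = a.out\nqueue 1000\n"), end="")
```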

7 A Clinical Health Application
Tyler Churchill: modeling cochlear implants to improve signal processing.
Used OSG + campus resources to run simulations that include important acoustic temporal fine structure, which is typically ignored due to difficulty.
“We can't do much about sound resolution given hardware limitations, but we can improve the integrated software. OSG and distributed high-throughput computing are helping us rapidly produce results that directly benefit CI wearers.”

8 Engaging Users

9 Engaging Users
Meet with individuals (PI + techs)
– Diagram workflow
– How much input, output, memory, time?
– Suitable for exporting to OSG?
– Where will the output go?
– What software is needed? Licenses?
Tech support as needed
Periodic reviews
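
The "how much input, output, memory, time?" questions reduce to simple arithmetic once per-job figures are known. The worksheet below is only an illustration of that arithmetic; the function name and every number in the example call are hypothetical and do not describe any real workload from the talk.

```python
def estimate_workload(n_jobs, hours_per_job, input_mb_per_job,
                      output_mb_per_job, concurrent_cores):
    """Summarize a batch of single-core jobs from per-job estimates.

    Hypothetical worksheet: assumes every job uses one core and that
    concurrent_cores jobs run at a time on average.
    """
    core_hours = n_jobs * hours_per_job
    return {
        "core_hours": core_hours,
        "total_input_gb": n_jobs * input_mb_per_job / 1000,
        "total_output_gb": n_jobs * output_mb_per_job / 1000,
        "wall_hours": core_hours / concurrent_cores,
    }


# Made-up example: 1000 jobs, 2 h each, 50 MB in, 10 MB out, 100 cores at once
print(estimate_workload(1000, 2, 50, 10, 100))
```

Totals like these help answer whether file transfer from the submit node will scale and whether the output fits in the available scratch space.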

10 Training Users
Workshops on campus
– New users can learn about HTCondor, OSG, etc.
– Existing groups can send new students
– Show examples of what others have done
Classes
– Scripting for scientific users: python, perl, submitting batch jobs, DAGMan

11 User Resources
Many bring only their (big) brains
– Use central or local department submit nodes
– Use only modest scratch space
Some have their own submit node
– Can attach their own storage
– Control user access
– Install system software packages

12 Submitting Big
Victor Ruotti, winner of Cycle Computing’s Big Science Challenge
Kick-started the work with a big run in EC2, now continuing on campus.
Building a database to quickly classify stem cells and identify important genes active in cell states useful for clinical applications.

13 Users with Clusters
Three flavors:
– condominium
  User provides cash, we do the rest
– neighborhood association
  User provides space, power, cooling, machines
  Configuration is standardized
– sister cities
  Independent pools that people want to share
  e.g. student computer labs

14 Laboratory for Molecular and Computational Genomics
Cluster integrated into campus grid
Combined resources can map data representing the equivalent of one human genome in 90 minutes.
Tackling challenging cases such as the important maize genome, which is difficult for traditional sequence assembly approaches.
Using whole genome single molecule optical mapping technique.

15 Reaching Further
[Chart: Research Groups by Discipline]

