Presentation is loading. Please wait.

Presentation is loading. Please wait.

Large Input in DHTC Thursday PM, Lecture 1 Lauren Michael CHTC, UW-Madison.

Similar presentations


Presentation on theme: "Large Input in DHTC Thursday PM, Lecture 1 Lauren Michael CHTC, UW-Madison."— Presentation transcript:

1 Large Input in DHTC Thursday PM, Lecture 1 Lauren Michael CHTC, UW-Madison

2 OSG User School 2016 Hardware transfer limits 2 submit server exec server HTCondor submit file executable dir/ input output (exec dir)/ executable input output exec server <10MB/file, 1GB total <1GB/file and total

3 OSG User School 2016 Reducing data needs An HTC best practice! split large input for better throughput and less per-job data eliminate unnecessary data compress and combine files 3

4 OSG User School 2016 Large input in HTC and OSG 4 file sizemethod of delivery wordswithin executable or arguments? tiny – 10MB per fileHTCondor file transfer (up to 1GB total per-job) 10MB – 1GB, shareddownload from web proxy (network-accessible server) 1GB - 10GB, unique or shared StashCache (regional replication) 10 GB - TBsshared file system (local copy, local execute servers) exec server

5 OSG User School 2016 Using a Web Proxy Place the file onto a local, proxy-configured web server Have HTCondor download via HTTP address 5 submit server exec server proxy web server

6 OSG User School 2016 Using a Web Proxy Place the file onto a proxy-configured web server Have HTCondor download via HTTP address 6 submit server exec server proxy web server file

7 OSG User School 2016 Using a Web Proxy Place the file onto a proxy-configured web server Have HTCondor download via HTTP address 7 submit server exec server proxy web server file proxy web cache

8 OSG User School 2016 proxy web cache Using a Web Proxy Place the file onto a proxy-configured web server Have HTCondor download via HTTP address 8 submit server exec server proxy web server HTCondor file

9 OSG User School 2016 proxy web cache Using a Web Proxy Place the file onto a proxy-configured web server Have HTCondor download via HTTP address 9 submit server exec server proxy web server HTCondor file

10 OSG User School 2016 proxy web cache Using a Web Proxy Place the file onto a proxy-configured web server Have HTCondor download via HTTP address 10 submit server exec server proxy web server HTCondor file exec server

11 OSG User School 2016 Downloading Proxy Files HTCondor submit file: (recommended) transfer_input_files = http://host.univ.edu/path/to/shared.tar.gz Anywhere (in-executable, or test download) wget http://host.univ.edu/path/to/shared.tar.gz  in-executable: make sure to delete after un-tar or at the end of the job!!! (HTCondor thinks it’s ‘new’) 11

12 OSG User School 2016 Web Proxy Considerations Managed per-VO Memory limited, max file size: 1 GB Local caching at OSG sites  good for shared input files, only  perfect for software and common input  need to rename changed files!!! Files are downloadable by ANYONE who has the specific HTTP address  Will work on 100% of OSG sites, though not all sites will have a local cache 12

13 OSG User School 2016 place files in /squid/user on a local submit server address : http://proxy.chtc.wisc.edu/SQUID/user/shared.tar.gz proxy web cache At UW-Madison (Ex. 3.1) 13 any HTC submit exec server proxy web server HTCondor file exec server /squid/user local server learn.chtc.wisc.edu file

14 OSG User School 2016 Large input in HTC and OSG 14 file sizemethod of delivery wordswithin executable or arguments? tiny – 10MB per fileHTCondor file transfer (up to 1GB total per-job) 10MB – 1GB, shareddownload from web proxy (network-accessible server) 1GB - 10GB, unique or shared StashCache (regional replication) 10 GB - TBsshared file system (local copy, local execute servers) exec server

15 OSG User School 2016 Using StashCache for Input regionally-cached repository managed by OSG Connect 15

16 OSG User School 2016 place files in /home/user/public on login.osgconnect.net regional cache Placing Files in StashCache 16 any OSG submit exec server “Stash” origin file exec server local server login.osgconnect.net /home/username/public

17 OSG User School 2016 regional cache updates from origin every hour regional cache Placing Files in StashCache 17 exec server “Stash” origin file exec server file any OSG submit local server login.osgconnect.net /home/username/public

18 OSG User School 2016 Use HTCondor transfer for other files regional cache Obtaining Files in StashCache 18 exec server “Stash” origin HTCondor file exec server file any OSG submit local server login.osgconnect.net /home/username/public

19 OSG User School 2016 Download using stashcp command (available as an OASIS software module) regional cache Obtaining Files in StashCache 19 exec server “Stash” origin HTCondor file exec server file stashcp any OSG submit /home/username/public local server login.osgconnect.net

20 OSG User School 2016 Require StashCashe sites in the submit file +WantsStashCache Require sites with OASIS modules (for stashcp ) Requirements = (HAS_MODULES =?= true) In the Submit File 20

21 OSG User School 2016 #!/bin/bash # setup:. /cvmfs/oasis.opensciencegrid.org/osg/modules/lmod/current/init/bash module load stashcp stashcp /user/username/public/file.tar.gz./ # END In the Job Executable 21

22 OSG User School 2016 Available at ~90% of OSG sites Regional caches on very fast networks  Max file size: 10 GB  shared OR unique data Caches are updated ~hourly  rename files to update them (safest) Currently in transition to a new method, but staschcp will stay around!! StashCache Considerations 22

23 OSG User School 2016 StashCache Speed 23

24 OSG User School 2016 Large input in HTC and OSG 24 file sizemethod of delivery wordswithin executable or arguments? tiny – 10MB per fileHTCondor file transfer (up to 1GB total per-job) 10MB – 1GB, shareddownload from web proxy (network-accessible server) 1GB - 10GB, unique or shared StashCache (regional replication) 10 GB - TBsshared file system (local copy, local execute servers) exec server

25 OSG User School 2016 Some distributed projects with LARGE, shared datasets may have project-specific repositories that exist only on certain sites  (e.g. CMS, Atlas, LIGO?, FIFE?, others?)  Jobs will require specific sites with local copies and use project- specific access methods OASIS?  Best for lots of small files per job (e.g. software)  StashCache and Proxies better for fewer larger files per job Other Options? 25

26 OSG User School 2016 For StashCache AND web proxies: make sure to delete data when you no longer need it in the origin!!! StashCache and VO-managed web proxy servers do NOT have unlimited space!  Some may regularly clean old data for you. Check with local support. Cleaning Up Old Data 26

27 OSG User School 2016 Only use these options if you MUST!!  Each comes with limitations on site accessibility and/or job performance, and extra data management concerns Other Considerations 27 file sizemethod of delivery wordswithin executable or arguments? tiny – 10MB per fileHTCondor file transfer (up to 1GB total per-job) 10MB – 1GB, shareddownload from web proxy (network-accessible server) 1GB - 10GB, unique or shared StashCache (regional replication) 10 GB - TBsshared file system (local copy, local execute servers)

28 OSG User School 2016 Exercises 3.1 Using a web proxy for shared input  place the blast database on the web proxy 3.2 StashCache for shared input  place the blast database in StashCache 3.3 StashCache for unique input  convert movie files 28

29 OSG User School 2016 Questions? Feel free to contact me:  lmichael@wisc.edu lmichael@wisc.edu Next: Exercises 3.1-3.3 Later: Large output and shared filesystems 29


Download ppt "Large Input in DHTC Thursday PM, Lecture 1 Lauren Michael CHTC, UW-Madison."

Similar presentations


Ads by Google