Thursday AM, Lecture 2 Lauren Michael CHTC, UW-Madison


Data Considerations
Thursday AM, Lecture 2
Lauren Michael, CHTC, UW-Madison

Overview – Data Handling
- Review of HTCondor Data Handling
- Data Management Tips
- What is ‘Large’ Data?
- Dealing with Large Data
- Next talks: local and OSG-wide methods for large-data handling


Review: HTCondor Data Handling
[Diagram: the submit server holds the submit file, executable, and input/output in dir/; HTCondor transfers the executable and input into an (exec dir)/ sandbox on each execute server and copies output back.]

Network bottleneck: the submit server
[Same diagram: every job's file transfers pass through the single submit server, making its network link the bottleneck.]

Overview – Data Handling
- Review of HTCondor Data Handling
- Data Management Tips
- What is ‘Large’ Data?
- Dealing with Large Data
- Next talks: local and OSG-wide methods for large-data handling

Data Management Tips
- Determine your job needs
- Determine your batch needs
- Leverage HTCondor data handling features!
- Reduce per-job data needs

Determining In-Job Needs
- “Input” includes any files transferred by HTCondor:
  - executable
  - transfer_input_files (data and software)
- “Output” includes any files copied back by HTCondor:
  - output, error
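As a sketch of where these needs appear, here is a minimal submit file; all filenames (my_program, my_data.csv, my_software.tar.gz) are hypothetical placeholders, not files from this lecture:

```
# Hypothetical submit file illustrating a job's declared data needs
executable = my_program
transfer_input_files = my_data.csv, my_software.tar.gz

# stdout and stderr, copied back by HTCondor when the job ends
output = job.out
error  = job.err
log    = job.log

queue
```

Everything on the transfer_input_files line, plus the executable, counts toward the "input" the submit server must send out for every queued job.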

Data Management Tips
- Determine your job needs
- Determine your batch needs
- Leverage HTCondor data handling features!
- Reduce per-job data needs

First! Try to reduce your data
- split large input for better throughput
- eliminate unnecessary data
- file compression and consolidation
  - job input: prior to job submission
  - job output: prior to end of job
  - moving data between your laptop and the submit server
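The compression step can be sketched as below; the filename big_input.txt is made up for illustration. The input is gzipped before submission, and the job's wrapper decompresses it in the sandbox:

```shell
#!/bin/bash
set -e
# Create a stand-in for a large, compressible input file
# (hypothetical; your real input would already exist).
printf 'some repetitive input data %.0s' {1..1000} > big_input.txt

# Before submission: compress, so less data crosses the
# submit server's network link (gzip replaces the file
# with big_input.txt.gz).
gzip big_input.txt

# Inside the job, the wrapper would reverse this step:
gunzip big_input.txt.gz
```

How well this helps depends on the data: text and numeric tables compress well, while already-compressed formats (JPEG, gzip archives) gain little.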

Overview – Data Handling
- Review of HTCondor Data Handling
- Data Management Tips
- What is ‘Large’ Data?
- Dealing with Large Data
- Next talks: local and OSG-wide methods for large-data handling

What is ‘large’ data?
- For researchers, “big data” is relative
- What is ‘big’ for you? Why?

What is ‘large’ data?
- For researchers, “big data” is relative
- What is ‘big’ for you? Why?
- Volume, velocity, variety!
  - think: a million 1-KB files, versus one 1-GB file
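The file-count point above can be sketched: transferring many small files costs far more per byte than one consolidated archive, so pack them before submission. The directory and filenames here are illustrative (1000 files standing in for a million):

```shell
#!/bin/bash
set -e
# Simulate many small input files (hypothetical names).
mkdir -p small_inputs
for i in $(seq 1 1000); do
  echo "record $i" > "small_inputs/part_$i.dat"
done

# Consolidate into a single compressed archive:
# one transfer instead of 1000.
tar -czf inputs.tar.gz small_inputs

# The job's wrapper would unpack it in the sandbox:
#   tar -xzf inputs.tar.gz
```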

Network bottleneck: the submit server
[Diagram repeated: all job file transfers pass through the single submit server.]

‘Large’ input data: The collaborator analogy
What method would you use to send data to a collaborator?

  amount      | method of delivery
  words       | email body
  tiny – 10MB | email attachment (managed transfer)
  10MB – GBs  | download from Google Drive, Drop/Box, or other web-accessible server
  TBs         | ship an external drive (local copy needed)

Large input in HTC and OSG
What methods should you use for HTC and OSG?

  amount                       | method of delivery
  words                        | within executable or arguments?
  tiny – 10MB per file         | HTCondor file transfer (up to 1GB total)
  10MB – 1GB, shared           | download from web proxy (network-accessible server)
  1GB – 10GB, unique or shared | StashCache (regional replication)
  10GB – TBs                   | shared file system (local copy, local execute servers)

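For the web-proxy row, the job's wrapper script typically fetches shared input at run time rather than shipping it via HTCondor file transfer. A minimal sketch, assuming a placeholder URL and filename (data.example.org is not a real endpoint):

```shell
#!/bin/bash
# Hypothetical wrapper step: fetch a large shared input from a
# web-accessible server (or HTTP proxy/cache) only if it is not
# already present in the job sandbox.
INPUT="reference.db"
INPUT_URL="http://data.example.org/reference.db"   # placeholder

fetch_input() {
  if [ ! -f "$INPUT" ]; then
    # --fail: exit non-zero on HTTP errors; -L: follow redirects
    curl -L --fail -o "$INPUT" "$INPUT_URL"
  fi
}
```

Because many jobs start at once, downloads like this should go through a caching proxy (as the table suggests) rather than hammering one origin server.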

Network bottleneck: the submit server
- Input transfers for many jobs will coincide
[Diagram: many execute servers pulling input from the one submit server at the same time.]

Network bottleneck: the submit server
- Input transfers for many jobs will coincide
- Output transfers are staggered
[Diagram: the same transfers, with output returning to the submit server as each job finishes.]

Output for HTC and OSG

  amount            | method of delivery
  words             | within executable or arguments?
  tiny – 1GB, total | HTCondor file transfer
  1GB+              | shared file system (local copy, local execute servers)

Output for HTC and OSG
Why are there fewer options?

  amount            | method of delivery
  words             | within executable or arguments?
  tiny – 1GB        | HTCondor file transfer
  1GB+              | shared file system (local copy, local execute servers)
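Since HTCondor file transfer is the main route for modest output, a common tactic (sketched here with made-up filenames) is to consolidate and compress results as the job's last step, so only one small file is transferred back:

```shell
#!/bin/bash
set -e
# End-of-job step: bundle per-job results (hypothetical files)
# into one compressed archive so HTCondor transfers back a
# single small file instead of many.
mkdir -p results
echo "value 1" > results/out_a.txt
echo "value 2" > results/out_b.txt

tar -czf results.tar.gz results
# Remove the originals so only the archive remains at the top
# level of the sandbox, where HTCondor picks up new files.
rm -r results
```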

Exercises
- 2.1 Understanding a job’s data needs
- 2.2 Using data compression with HTCondor file transfer
- 2.3 Splitting input (prep for large run in 3.1)

Questions?
Feel free to contact me: lmichael@wisc.edu
Next: Exercises 2.1–2.3
Later: Handling large input data

Activate modules:
. /cvmfs/oasis.opensciencegrid.org/osg/modules/lmod/current/init/bash
Load a software module:
module load modulename
List loaded modules:
module list
Unload a module (to prepare for another):
module unload modulename

Example: Check Python from login.osgconnect.net

$ module load python/2.7
$ module list
Currently Loaded Modules:
  1) python/2.7
$ which python
/cvmfs/oasis.opensciencegrid.org/osg/modules/python-2.7.7/bin/python

Example: Python Wrapper Script

#!/bin/bash
# activate modules and load python2.7:
. /cvmfs/oasis.opensciencegrid.org/osg/modules/lmod/current/init/bash
module load python/2.7
# run my python script:
python myscript.py
# END