Large, Fast, and Out of Control: Tuning Condor for Film Production Jason A. Stowe Software Engineer Lead - Condor CORE Feature Animation.


1 Large, Fast, and Out of Control: Tuning Condor for Film Production
Jason A. Stowe
Software Engineer Lead - Condor
CORE Feature Animation

2 CORE's Farm & Middleware
Diagram labels (User Facing / Back End): Submitter, Session Manager, FAMDB, Condor View, Condor_schedd, Condor_startd, starter, Condor_render
Farm: 1000 2.8 GHz processors, Linux, 4 GB RAM; 64 Mac Procs; 4 Managing Machines
Storage: 70-100 Terabytes, Several Filers
50 Million Renders so far (Vanilla Universe)

3 Goals and Software
Goals
● High Throughput & Efficiency
● Easy Condor Submission and Integration
Priority Management: Key to Throughput

4 Initial Configuration
Software/Policies
● User Priority
● Behavior Flags - STARTD
Issues
● NFS issues
● Out of Order Execution
● Priority Management
Diagram: 320 Procs, 1 Main Filer, RenderMan Schedd Server, Workstation Schedds (Sched Everything Else), Middleware, Central Mgr

5 How CG Productions Work
Traditionally, Movie scripts = Group of Sequences
Movie's Sequences ~ Play's Scenes
Sequence = Group of Shots
Assets = Sets/Characters/Props/...
Prioritize work-units instead of users?
Two Pipelines
● Assets: Design, Model, Texture, Surfacing
● Shots: Design, Layout, Animation, Lighting, Composite

6 Accounting Groups: Take 1
Software/Policies
● Contracted Wisconsin: Accounting Groups (AG)
● Job = unique AG
● Added Filers, Fixed Drivers
Issues
● Accountant Overload
● Slow Finishing...
Diagram: 360 Procs, 16 Mac Procs, Many Filers, General Schedd Server, Workstation Schedds (Sched Certain Jobs), Middleware, Central Mgr

7 Accounting Groups: Take 1
Every job got some resources, but not enough to finish fast for Production.
Moved quickly to Take 2...
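The dilution described on this slide can be illustrated with simple arithmetic. The sketch below is not from the talk: it assumes an idealized equal split of processors among active accounting groups (real Condor fair share also depends on usage history and priority factors), and the job count, frame count, and per-frame render time are hypothetical.

```python
def equal_share(total_procs: int, active_groups: int) -> float:
    """Processors per accounting group under an idealized equal split."""
    if active_groups <= 0:
        raise ValueError("need at least one active group")
    return total_procs / active_groups


def hours_to_finish(frames: int, hours_per_frame: float, procs: float) -> float:
    """Wall-clock hours for one job if its frames render in parallel on `procs`."""
    return frames * hours_per_frame / procs


TOTAL_PROCS = 360      # farm size from the Take 1 slide
ACTIVE_JOBS = 120      # hypothetical: each job is its own accounting group
FRAMES_PER_JOB = 90    # hypothetical
HOURS_PER_FRAME = 1.0  # hypothetical

procs_per_job = equal_share(TOTAL_PROCS, ACTIVE_JOBS)
finish_hours = hours_to_finish(FRAMES_PER_JOB, HOURS_PER_FRAME, procs_per_job)
print(f"{procs_per_job:.1f} procs per job, about {finish_hours:.0f} h to finish")
```

Under these assumed numbers every job makes progress, but each holds only a few processors, which is one way to read "not enough to finish fast".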

8 Accounting Groups: Take 2
Software/Policies
● Shots Get Unique AG
● Unify Schedds to fix out of order cases
Issues
● Wanted: Farm % Priority
● Classic Schedd Overload: “Claimed Idle”s
Diagram: 360 Procs, 32 Mac Procs, Many Filers, General Schedd Server, Fewer Workstation Schedds (Sched Certain Jobs), Middleware, Central Mgr
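A minimal sketch of what "Shots Get Unique AG" could look like at submission time. The slides do not show CORE's submitter, so the naming scheme (`<sequence>_<shot>`), the function names, and the executable path are all hypothetical; the only Condor-specific piece is the `+AccountingGroup` line in the submit description, which tags the job with the group it should be accounted under.

```python
def accounting_group_for_shot(sequence: str, shot: str) -> str:
    """Hypothetical naming scheme: one accounting group per shot."""
    return f"{sequence}_{shot}"


def submit_description(executable: str, arguments: str,
                       sequence: str, shot: str) -> str:
    """Render a vanilla-universe submit description tagged with the shot's group."""
    group = accounting_group_for_shot(sequence, shot)
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        f'+AccountingGroup = "{group}"',  # all jobs of the shot share this group
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical render job for sequence sq010, shot sh020.
description = submit_description("/path/to/render", "--frame 1",
                                 sequence="sq010", shot="sh020")
print(description)
```

Because every job in a shot carries the same group name, the negotiator accounts for them together rather than treating each job as a separate competitor.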

9 Accounting Groups: Final?
Software/Policies
● “Priority User” - p1 p2 p3
● Multiple Servers & Schedds
● ASAP & Department Flags
Issues
● Department “Pools”
● Preemption = Bad
Diagram: 500 Procs, 32 Mac Procs, Many Filers, 3 Schedd Servers, Middleware, Central Mgr

10 Accounting Groups: Final?
Sharing Power is a difficult task for anyone, especially users with deadlines.
Need a Quality of Service guarantee: resources will always be available without preemptive department pools...

11 Group Quotas save the day
Software/Policies
● Department Groups: g_lfx, g_mdl, g_chr, etc.
● Quality Of Service
● Nighttime Priority
Issues
● Long negotiation cycles
● Total cycle: 6 minutes
● Server loads > 6
Diagram: 1000 Procs, 64 Mac Procs, Many Filers, 3 Schedd Servers, Middleware, Central Mgr
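As a rough illustration of department group quotas, the sketch below renders the two kinds of negotiator configuration lines involved: `GROUP_NAMES` and one `GROUP_QUOTA_<name>` per group. The group names come from the slide; the quota numbers and the helper function are hypothetical, and the talk does not give CORE's actual values.

```python
def render_group_quota_config(quotas: dict[str, int], total_procs: int) -> str:
    """Render GROUP_NAMES / GROUP_QUOTA_<name> lines for the given quotas."""
    allocated = sum(quotas.values())
    if allocated > total_procs:
        raise ValueError(
            f"quotas allocate {allocated} procs but the farm has {total_procs}"
        )
    lines = ["GROUP_NAMES = " + ", ".join(quotas)]
    for name, slots in quotas.items():
        lines.append(f"GROUP_QUOTA_{name} = {slots}")
    return "\n".join(lines) + "\n"


# Group names from the slide; the quota values are made up for illustration.
HYPOTHETICAL_QUOTAS = {"g_lfx": 400, "g_mdl": 200, "g_chr": 300}
config_text = render_group_quota_config(HYPOTHETICAL_QUOTAS, total_procs=1000)
print(config_text)
```

The check against the farm size reflects the Quality of Service goal on the slide: a department's quota is only a guarantee if the quotas together do not exceed the processors that exist.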

12 Middleware Performance Optimization
Goal: Speed Negotiator
● Remove Many Groups
● Significant Attributes (SIGNIFICANT_ATTRIBUTES)
● Schedd Submit Algorithm
● Separate Middleware & Central Manager Servers
● Negotiator Cycle 20 sec delay => 3 sec (NEGOTIATOR_CYCLE_DELAY)
Diagram: 1000 Procs, 64 Mac Procs, Many Filers, 2 Schedd Servers, Central Mgr
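The Significant Attributes item relies on the idea that jobs which agree on every attribute relevant to matchmaking can be negotiated as one unit instead of one at a time. The sketch below models only that grouping step; it is not Condor's implementation, and the job records and attribute names are hypothetical.

```python
from collections import defaultdict


def auto_cluster(jobs: list[dict], significant_attributes: list[str]) -> dict:
    """Group job ids by their values for the significant attributes."""
    clusters = defaultdict(list)
    for job in jobs:
        key = tuple(job.get(attr) for attr in significant_attributes)
        clusters[key].append(job["id"])
    return dict(clusters)


# Hypothetical jobs: two frames of one render plus one unrelated job.
jobs = [
    {"id": 1, "AccountingGroup": "g_lfx", "MemoryMB": 2048, "Frame": 1},
    {"id": 2, "AccountingGroup": "g_lfx", "MemoryMB": 2048, "Frame": 2},
    {"id": 3, "AccountingGroup": "g_chr", "MemoryMB": 4096, "Frame": 1},
]

few = auto_cluster(jobs, ["AccountingGroup", "MemoryMB"])
many = auto_cluster(jobs, ["AccountingGroup", "MemoryMB", "Frame"])
print(len(few), "clusters vs", len(many), "clusters")
```

Adding an attribute that differs per job (here the hypothetical `Frame`) splits every job into its own cluster, so keeping the significant list short is what lets the negotiator do less work per cycle.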

13 Optimization Results
Performance Before => After:
● Removed Groups: 6 => 5.5 min
● Significant Attributes: 5.5 => 3 min
● Schedd Algorithm: 3 => 1.5 min
● Separate Servers: 1.5 => 0.6 min
● Cycle delay: 0.6 => 0.33 min
● Server Loads: < 1 Middleware, < 2 Central Manager
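The per-step and overall speedups implied by these figures can be worked out directly. The cycle times below are taken from the slide; the variable names are just for this sketch.

```python
# (step, minutes before, minutes after), as listed on the slide
steps = [
    ("Removed groups", 6.0, 5.5),
    ("Significant attributes", 5.5, 3.0),
    ("Schedd algorithm", 3.0, 1.5),
    ("Separate servers", 1.5, 0.6),
    ("Cycle delay", 0.6, 0.33),
]

# Speedup of each step is the ratio of cycle time before to cycle time after.
speedups = {name: before / after for name, before, after in steps}
overall = steps[0][1] / steps[-1][2]  # 6 min down to 0.33 min
biggest = max(speedups, key=speedups.get)

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
print(f"Overall: {overall:.1f}x (largest single step: {biggest})")
```

Since each step starts where the previous one ended, the per-step factors multiply to the overall factor of roughly 18x.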

14 Lessons Learned
● Remove pre-emption where possible
● Simplify Startd/Negotiator (Control) policies:
  ● Make Consistent/remove special cases
  ● Understandable farm behavior
● Keep Server Functions Simple
● Use Accounting Groups to guarantee relative percentage allocation of resources
● Use Group Quotas instead of machine-specific RANK policies for better throughput

15 Thank you
Condor Team, University of Wisconsin
CORE
Any Questions?
stowe@corefa.com

