Blackbird: Accelerated Course Archives Using Condor with Blackboard
Sam Hoover, IT Systems Architect
Matt Garrett, System Administrator



- End of Semester archives of all online courses in Blackboard since implementation in GB Oracle DB tied to a 1.3 TB Content system with over 13 million files
- Spring 2010: 4,610 active Blackboard courses, 31,372 total courses in Blackboard
- Full system backups once a week, nightly incremental backups of entire system

The Archive Problem
- Blackboard is a mission critical system
- Why is 85.5 hours for archives a problem?
- Start of new semester vs. normal operations
- Time between semesters is short and getting shorter
- Faculty have to wait to set up next semester’s courses
- End of semester processes

Why do we need course archives?

Student Add / Drop at start of semester

Loss of course content or an entire course

CRLT archive uses Grade disputes

Blackboard EoS archives

The Archive Problem

Blackboard provides a script for executing batch archives given a list of courses as input. The weekly archive process at Clemson began in Fall 2006 after an accidental deletion of many courses. It started out by splitting the course list into four equal chunks and giving each server ¼ of the total course list. All four servers usually finished within 2 hours of each other, and the total time for the batch was < 24 hours. By Fall 2008, archiving the active courses took 85.5 hours, and the servers finished at widely varying times.
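The original four-way split can be sketched roughly as below. This is only an illustration: the function name and the course IDs are made up, and the slides do not show how the list was actually divided. Note that equal chunk sizes do not guarantee equal work, since individual courses can take very different amounts of time to archive.

```python
def split_into_chunks(courses, n_servers=4):
    """Split the course list into n_servers contiguous, nearly equal chunks."""
    base, extra = divmod(len(courses), n_servers)
    chunks, start = [], 0
    for i in range(n_servers):
        # The first `extra` servers take one additional course each.
        size = base + (1 if i < extra else 0)
        chunks.append(courses[start:start + size])
        start += size
    return chunks


# Hypothetical course IDs, for illustration only.
course_list = [f"COURSE_{i:04d}" for i in range(10)]
chunks = split_into_chunks(course_list)
for server, chunk in enumerate(chunks, start=1):
    print(f"server {server}: {len(chunk)} courses")
```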

The Archive Problem

Who wants to work weekends?

Blackboard archive script
    /usr/local/blackboard/apps/content-exchange/bin/batch_ImportExport.sh
Archive/Restore: The Archive Course function creates a record of the Course including User interactions. It is most useful for recalling Student performance or interactions at a later time. The archive package is saved as a .ZIP file that can be restored to the Blackboard system at another time. In effect, Archive/Restore acts as a backup tool at the individual course level.

The Archive Problem

Potential Solutions?

Throw money at the problem?

Add more servers?

The Archive Problem

Potential solutions
- Write our own job scheduler?
- Could we take advantage of the other 3 (CPUs)?
- How do we monitor performance so end user (Blackboard) experience isn’t impacted?
- Use a DB to store and manage the queue?
- What about security?
- Has anyone else out there already done this?

Project Blackbird +

Condor to the rescue?
- Job scheduler? Check
- Multi-core capable? Check
- Manage the queue? Check
- Performance monitoring? Check
- Security? Check
- Has anyone done this before? No

Steps in the weekly archive process
- Determine what to archive (active courses, orgs)
- Build a course list
- Create Blackbird submit files
- Submit DAGMan job to Condor
- Monitor Condor queue
- Receive notification when all courses have been archived
- Look for errors and verify archive integrity
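The last step, verifying archive integrity, could look something like the minimal Python sketch below. It relies only on what the slides state, namely that each archive package is a .ZIP file. The function name and the sample file names are hypothetical, and the checks performed by the actual checker script are not shown in the slides.

```python
import os
import tempfile
import zipfile


def verify_archive(path):
    """Return None if the ZIP reads cleanly, otherwise a short error string."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the first bad name, if any.
            bad_member = zf.testzip()
    except (zipfile.BadZipFile, OSError) as exc:
        return f"unreadable: {exc}"
    return f"corrupt member: {bad_member}" if bad_member else None


# Demonstration with one valid and one invalid file (both made up).
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.zip")
    with zipfile.ZipFile(good, "w") as zf:
        zf.writestr("course.xml", "<course/>")
    bad = os.path.join(tmp, "bad.zip")
    with open(bad, "wb") as fh:
        fh.write(b"not a zip file")
    results = {os.path.basename(p): verify_archive(p) for p in (good, bad)}

for name, error in results.items():
    print(name, "OK" if error is None else error)
```

Courses whose archives fail a check like this are candidates for being resubmitted.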

Custom Condor Configuration
    DAGMAN_MAX_JOBS_IDLE = 25
    DAGMAN_MAX_JOBS_SUBMITTED = 50
    SLOTS_CONNECTED_TO_CONSOLE = 0
    SLOTS_CONNECTED_TO_KEYBOARD = 0
    ## Force Condor to use Blackboard Private Network
    NETWORK_INTERFACE = Private Blackboard Net

DAGMan example
    JOB UniqueCourseID /path/to/condor/submit/job/file/UniqueCourseID.bbCondor
    JOB UniqueCourseID2 /path/to/condor/submit/job/file/UniqueCourseID2.bbCondor
    JOB UniqueCourseID3 /path/to/condor/submit/job/file/UniqueCourseID3.bbCondor
    JOB UniqueCourseID4 /path/to/condor/submit/job/file/UniqueCourseID4.bbCondor
    JOB UniqueCourseID5 /path/to/condor/submit/job/file/UniqueCourseID5.bbCondor
    JOB UniqueCourseID6 /path/to/condor/submit/job/file/UniqueCourseID6.bbCondor
    SCRIPT POST UniqueCourseID6 /usr/local/CMSIntegration/bin/weeklyArchiveChecker.pl
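A DAGMan file in this shape is easy to generate from a course list. The sketch below is an assumption about how that could be done, not the presenters' actual generator: the function name is hypothetical, and it simply mirrors the example above by emitting one JOB line per course and attaching the POST script to the last job listed.

```python
def build_dag(course_ids, submit_dir, post_script):
    """Return DAGMan file text: one JOB line per course, POST on the last job."""
    lines = [f"JOB {cid} {submit_dir}/{cid}.bbCondor" for cid in course_ids]
    if course_ids:
        # As in the example, the checker runs as a POST script of the last job.
        lines.append(f"SCRIPT POST {course_ids[-1]} {post_script}")
    return "\n".join(lines) + "\n"


# Placeholder IDs and paths copied from the example.
dag_text = build_dag(
    ["UniqueCourseID", "UniqueCourseID2", "UniqueCourseID3"],
    "/path/to/condor/submit/job/file",
    "/usr/local/CMSIntegration/bin/weeklyArchiveChecker.pl",
)
print(dag_text, end="")
```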

Condor Submit example
    universe = vanilla
    requirements = (OpSys=="LINUX") && ((Arch=="INTEL") || (Arch=="X86_64"))
    executable = /usr/local/bin/condorSubmitArchive.pl
    arguments = shoover-S0000BKBRD_401001,/san/weeklyArchives/ /
    getenv = True
    log = /usr/local/logs/bbCondorLogs/archive log
    notification = Error
    notify_user =
    transfer_executable = False
    when_to_transfer_output = ON_EXIT
    queue 1
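Per-course submit files like this one could be produced from a template. The following is a minimal sketch under stated assumptions: the fixed lines are copied from the example, while the function name and the argument, log, and address values passed in are hypothetical placeholders, since the slide does not show them in full.

```python
SUBMIT_TEMPLATE = """\
universe = vanilla
requirements = (OpSys=="LINUX") && ((Arch=="INTEL") || (Arch=="X86_64"))
executable = /usr/local/bin/condorSubmitArchive.pl
arguments = {arguments}
getenv = True
log = {log_path}
notification = Error
notify_user = {notify_user}
transfer_executable = False
when_to_transfer_output = ON_EXIT
queue 1
"""


def render_submit(arguments, log_path, notify_user):
    """Fill the per-course fields into the fixed submit description."""
    return SUBMIT_TEMPLATE.format(
        arguments=arguments, log_path=log_path, notify_user=notify_user
    )


# All three values below are illustrative placeholders.
submit_text = render_submit(
    arguments="EXAMPLE_COURSE_ID,/path/to/archive/dir",
    log_path="/path/to/logs/archive.log",
    notify_user="admin@example.edu",
)
print(submit_text, end="")
```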

Blackbird archive solution


Blackbird Benefits
- Reduced total archive time from > 85 hrs to < 24 hrs
- Job scheduling – servers finish about the same time
- Zero impact to Blackboard performance
- Automatic suspension/resumption of archives if load reaches threshold on any core
- Notification upon completion of all archives
- Load balancing – archive jobs are distributed as cores become available
- Takes advantage of all available CPU cores instead of just one core per server
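The load-balancing benefit can be illustrated with a toy simulation. This is not a model of Condor itself: the function names are hypothetical and the per-course durations are made-up numbers. It only compares pre-splitting the list into fixed chunks against handing each job to whichever worker frees up first.

```python
import heapq
import math


def static_finish_times(durations, n_workers):
    """Each worker gets one fixed, contiguous chunk of the job list."""
    size = math.ceil(len(durations) / n_workers)
    return [sum(durations[i * size:(i + 1) * size]) for i in range(n_workers)]


def dynamic_finish_times(durations, n_workers):
    """Each job goes to the worker that becomes available earliest."""
    workers = [0] * n_workers  # min-heap of times at which workers are free
    heapq.heapify(workers)
    for d in durations:
        earliest = heapq.heappop(workers)
        heapq.heappush(workers, earliest + d)
    return sorted(workers)


# Made-up archive times in minutes: two large courses, six small ones.
durations = [60, 50, 5, 5, 5, 5, 5, 5]
static = static_finish_times(durations, 4)
dynamic = dynamic_finish_times(durations, 4)
print("static chunks :", static, "-> done at", max(static))
print("dynamic queue :", dynamic, "-> done at", max(dynamic))
```

With the static split, one worker carries both large courses while the others sit idle; with the dynamic queue, the overall finish time is bounded by the single largest job.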


What did it take to implement?
- Have one or more multi-core (CPU) machines
- A large amount of shared storage for archives
- Choose one machine as your Central Manager
- Install and configure Condor on each machine
- Automate course list creation (Query DB or Directory)
- Automate Condor submit files and Condor DAGMan file creation
- Automate the whole thing with cron
- Check log files for errors upon archive completion

Where else could I use this?
- Any system that does batch processing that can be broken up into many jobs
- Recently implemented on our MySQL server to export all of the MySQL databases
- Reduced the export time from 10 hours to 3.5 hours on a single, quad core machine

Recent updates
- 64 Bit Red Hat 5.4 OS and JVM 1.6
- Maximum (affordable) RAM per machine – 32 GB
- Web page to view Blackbird Condor Pool status
- Duplicate archives
- Error checking logs
- Redo any courses with errors or not completed
- Major Blackboard upgrade from 7.3 to 9.1 end of June

What’s next?
- New machines have 2 x Quad Core CPUs with HyperThreading so Condor sees 16 Cores
- Add out of warranty machines to the Blackboard Condor Pool (keep users off of them)
- Monitoring of queue (web page)
- Use ClassAds to specify architecture and memory requirements for large archive jobs
- Write code to query DB and find out what courses have changed, and back up any course that has changed on a daily basis
- Automate installation and configuration

Please provide feedback for this session by email. The subject of the email should be the title of this session: [Blackbird: Accelerated Course Archives Using Condor with Blackboard]