LCGAA nightlies infrastructure


1 LCGAA nightlies infrastructure
Alex Hodgkins

2 Our nightlies
- Build and test various software projects each night
- Provide a nightlies summary page that displays all the results from the previous night's build (number of build/test errors) and historical data
- Place all of the files onto AFS and make the log files accessible from the nightlies page

3 Infrastructure Tasks
We use CMT to manage the checkout, build, install and test of each software project, as well as any dependencies. The infrastructure is responsible for:
- Automating the usage of CMT to perform all build steps
- Putting results into our database
- Moving builds to AFS (for Mac and Windows)
- Deleting old build data

4 Problems with previous infrastructure
Queue
- No way of viewing the queue
- No way to add/remove a platform to be built to the running queue: the entire queue must be reset
Client
- No automated way to kill a started build
- If a platform is manually re-built on the same day it will overwrite the previous build completely
- Builds occasionally hang, and stay hung silently until they are manually killed
Scheduling
- Clients must be launched manually or the crontab edited to schedule a client to run later
- The moving of builds to AFS is currently scheduled by crontab, instead of when a build finishes
Reporting
- Once a build has been requested from the server it is marked as completed: the server doesn’t know who is building it or what stage it’s at
- The infrastructure sends up to 20 s per project per night

5 New Infrastructure
General overview: clients do the build; the server distributes what to build.

6 General client-server improvements
The client-server interaction has been redesigned to be much more flexible:
- The server now distributes builds to any idle clients
- The client updates the server with its progress
- All client calls to the server are run in a separate thread so the build won't be halted
- A job can be assigned to a specific client
- Once a project has finished, the synchronisation job is called on the server to copy Mac/Windows builds to AFS
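
The point about running server calls in a separate thread can be sketched as follows. This is a minimal illustration, not the nightlies code: `report_async` and its `send` argument are hypothetical names, and `send` stands in for whatever remote call the client makes (for example a method on an `xmlrpc.client.ServerProxy`).

```python
import threading


def report_async(send, *args):
    """Call send(*args) in a background thread so the build never waits on it.

    `send` is a stand-in for the remote call (hypothetical); any callable works.
    """
    def _call():
        try:
            send(*args)
        except Exception as exc:
            # A failed progress report must not stop the build itself.
            print(f"progress report failed: {exc}")

    thread = threading.Thread(target=_call, daemon=True)
    thread.start()
    return thread
```

The build code would call `report_async(...)` and carry on immediately; the returned thread is only needed if the caller wants to wait for the report.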

7 Client-server registration
The protocol we use for client-server interaction (XMLRPC) is stateless, so extra steps were required to keep track of connected clients:
- The server maintains a list of clients that have connected
- Each client can be in one of three states: connected, unknown or disconnected
- The server makes sure any ‘connected’ clients are still reachable by frequently checking that they respond to requests
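
The slide names the three states but not the transitions between them. A minimal sketch of such a registry, assuming that a missed ping moves a client to "unknown" and that a hypothetical limit of three consecutive misses moves it to "disconnected" (class and method names are illustrative, not taken from the nightlies scripts):

```python
from enum import Enum


class ClientState(Enum):
    CONNECTED = "connected"
    UNKNOWN = "unknown"
    DISCONNECTED = "disconnected"


class ClientRegistry:
    """Server-side record of clients, needed because XMLRPC keeps no session."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # assumed threshold, not from the slides
        self.states = {}
        self.missed = {}

    def register(self, name):
        self.states[name] = ClientState.CONNECTED
        self.missed[name] = 0

    def record_ping(self, name, responded):
        """Update a client's state from the outcome of one ping."""
        if responded:
            self.missed[name] = 0
            self.states[name] = ClientState.CONNECTED
        else:
            self.missed[name] += 1
            if self.missed[name] >= self.max_missed:
                self.states[name] = ClientState.DISCONNECTED
            else:
                self.states[name] = ClientState.UNKNOWN
        return self.states[name]

    def to_ping(self):
        """Clients still worth pinging (assumption: unknown ones are retried)."""
        return [n for n, s in self.states.items()
                if s is not ClientState.DISCONNECTED]
```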

8 The actual registration process

9 Server design
The server is split into three separate threads:
- The pinger: responsible for pinging all the connected clients to check they are still responding
- The listener: responsible for managing all XMLRPC requests, but it can only process one request at a time. For requests that will not return instantly (e.g. a request to copy a build to AFS) a new thread is created
- The dispatcher: responsible for continually checking the database for any new build requests and distributing them to the clients
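
The three roles can be illustrated without any networking. The sketch below is an assumption-laden outline, not the actual server: an in-memory `queue.Queue` stands in for the database, a plain list stands in for the set of idle clients, and all function names are hypothetical. In a real server the slow-request handler would be registered with the XMLRPC listener, and the pinger and dispatcher would each run `run_periodically` in their own thread.

```python
import queue
import threading


def run_periodically(task, stop, interval):
    """Loop body shared by the pinger and dispatcher threads."""
    while not stop.is_set():
        task()
        stop.wait(interval)  # sleeps, but wakes early if stop is set


def make_dispatcher(pending, idle_clients, start_build):
    """Return a task that hands at most one pending job to one idle client."""
    def dispatch_once():
        if not idle_clients:
            return
        try:
            job = pending.get_nowait()
        except queue.Empty:
            return
        start_build(idle_clients.pop(0), job)
    return dispatch_once


def handle_slow_request(work, *args):
    """Listener-side handler: run slow work in a new thread and return at once,
    so the single-request listener is free for the next call."""
    threading.Thread(target=work, args=args, daemon=True).start()
    return True
```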

10 Client design
- The client also has an XMLRPC server that is primarily used to get the client’s capabilities and start a build on the client
- On start-up the client decides how many projects it can build at once (based on the number of cores), and allocates the necessary number of builder slots
- When a build request is received the client starts a new build (the SlotBuilder class) in a separate thread, and stores the thread instance
- During the build the client tells the server which project and which build step it is currently on
- If the client decides a step has failed (e.g. checkout returned non-zero) it notifies the server, and both the specific project and the job are marked as failed
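
A rough sketch of the slot mechanism described above. It is not the SlotBuilder class itself: `BuildClient`, `run_build`, `run_step` and `report` are hypothetical names, one slot per core is an assumption (the slide only says the count is based on the number of cores), and the step list is taken from the CMT steps named on slide 3.

```python
import os
import threading

STEPS = ("checkout", "build", "install", "test")


def run_build(project, run_step, report):
    """Run each step in order; report progress and stop on a non-zero result."""
    for step in STEPS:
        report(project, step, "running")
        if run_step(project, step) != 0:
            report(project, step, "failed")
            return False
    report(project, "done", "finished")
    return True


class BuildClient:
    def __init__(self, slots=None):
        # Assumption: one builder slot per core.
        self.slots = slots if slots is not None else (os.cpu_count() or 1)
        self.threads = {}  # project name -> stored thread instance

    def free_slots(self):
        active = sum(1 for t in self.threads.values() if t.is_alive())
        return self.slots - active

    def start_build(self, project, run_step, report):
        """Start a build in its own thread; refuse if every slot is busy."""
        if self.free_slots() <= 0:
            return False
        thread = threading.Thread(
            target=run_build, args=(project, run_step, report), daemon=True)
        self.threads[project] = thread
        thread.start()
        return True
```

Here `run_step` would wrap the real command for each step and return its exit code, and `report` would be the client's call back to the server.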

11 Database changes
- The LCGSOFT and nightlies databases have been merged to allow releases to be managed through the same interface
- All configuration is now stored in the database, removing the redundancy of the configuration files
- The job queue is now kept in the database, allowing a decentralised queue that can be viewed from anywhere and does not rely on our server instance
- Cancelled slot configurations are kept permanently and remain linked to specific jobs, so we can easily see what was built on any given date, as well as the machine they were built on
- The server and request interface both now use Django for database interaction, which gives much more compact, flexible and readable code (47 lines vs. 275 for sending results)
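
To illustrate what keeping the job queue in the database makes possible, here is a small sketch using Python's built-in `sqlite3` as a stand-in for the real database and the Django layer. The table name, column names and status values are all hypothetical.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (and if needed create) a job table; the schema is illustrative only."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS job ("
        " id INTEGER PRIMARY KEY,"
        " platform TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " client TEXT)")
    return db


def add_job(db, platform):
    with db:  # commits on success, rolls back on error
        cur = db.execute("INSERT INTO job (platform) VALUES (?)", (platform,))
    return cur.lastrowid


def claim_next(db, client):
    """Mark the oldest queued job as running and record who is building it."""
    with db:
        row = db.execute(
            "SELECT id, platform FROM job WHERE status = 'queued'"
            " ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        db.execute(
            "UPDATE job SET status = 'running', client = ? WHERE id = ?",
            (client, row[0]))
    return row


def view_queue(db):
    """Anyone with database access can inspect the queue."""
    return db.execute(
        "SELECT id, platform, status, client FROM job ORDER BY id").fetchall()
```

Because the queue is just rows in a table, adding or removing a platform is a single insert or update rather than a reset of the whole queue, and the recorded client shows who is building each job.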

12 Code cleanup
The original nightlies scripts have been edited by many people since they were first written. This left a lot of redundant code, and many changes had been hacked in, making it difficult to integrate any major features:
- Redundant code has been removed and the majority of the remaining code rewritten; comments have also been added
- Documentation has been created to accompany the nightlies scripts, explaining how they work
- Clearer logging has been added throughout
- A unit test suite and integration tests have been added

13 Remaining tasks
- Everything I have implemented so far has been fully tested and is ready for production
- For now the new scripts place all build data into both databases, so they can be phased in slowly without disrupting any end users
There are three remaining tasks that are essential to the new scripts being placed fully into production:
- Incremental builds and non-nightly builds must be added first, or there would be no use in a request interface
- The request interface, to allow jobs to be added and job configurations to be edited
- A new summary page (to be rewritten using Django) so that all the results can be seen quickly and easily

14 Remaining jobs (contd.)
We also hope to eventually have the following:
- Ability to kill a running build
- An automated release process: releases should be done automatically by marking an existing build in the request interface to be released
- Allow the building of externals through the request interface
- Ability to shut a client down from the server

