Cloudifying Source Code Repositories: How much does it cost? LADIS 2009 Big Sky, Montana Michael Siegenthaler Hakim Weatherspoon Cornell University.

Slides:



Advertisements
Similar presentations
Week 2 DUE This Week: Safety Form and Model Release DUE Next Week: Project Timelines and Website Notebooks Lab Access SharePoint Usage Subversion Software.
Advertisements

Introduction To GIT Rob Di Marco Philly Linux Users Group July 14, 2008.
Wait-free coordination for Internet-scale systems
Intro to Version Control Have you ever …? Had an application crash and lose ALL of your work Made changes to a file for the worse and wished you could.
PROVENANCE FOR THE CLOUD (USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES(FAST `10)) Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer Harvard.
Flavio Junqueira, Mahadev Konar, Andrew Kornev, Benjamin Reed
Revision Control Systems Amin Tootoonchian Kian Mirjalali.
Version Control What it is and why you want it. What is Version Control? A system that manages changes to documents, files, or any other stored information.
Dedi Rahmawan Putra  Shared Document  Conventional Ways  Common Problems  What is TortoiseSVN  Advantages over another tools  Basic Concepts.
1 SVN – Tool for Version Control Talal Ahmed ( ) Ali Ahsan ( ) Adil Zia Khan ( ) Farid Ullah ( )
Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.
CVS II: Parallelizing Software Development Author: Brian Berliner John Tully.
Source Control Repositories for Enabling Team Working Svetlin Nakov Telerik Corporation
SubVersioN – the new Central Service at DESY by Marian Gawron.
Distributed Deadlocks and Transaction Recovery.
Version control Using Git 1Version control, using Git.
Version Control with Subversion. What is Version Control Good For? Maintaining project/file history - so you don’t have to worry about it Managing collaboration.
Introduction to Version Control with SVN & Git CSC/ECE 517, Fall 2012 Titus Barik & Ed Gehringer, with help from Gaurav.
Source Control Repositories for Team Collaboration: SVN, TFS, Git.
Subversion (SVN) Tutorial for CS421 Dan Fleck Spring 2010.
1 Introductory Notes on the Git Source Control Management Ric Holt, 8 Oct 2009.
Service Computation 2010November 21-26, Lisbon.
A Framework for Elastic Execution of Existing MPI Programs Aarthi Raveendran Tekin Bicer Gagan Agrawal 1.
Version control Using Git Version control, using Git1.
Cloudifying Source Code Repositories: How much does it cost? 1 Hadi Salimi, Distributed Systems Labaratory, School of Computer Engineering, Iran University.
2010. The Subversion Dilemma Check in buggy code and drive everyone else crazy Avoid checking it in until it’s fully debugged or.
Version Control Systems academy.zariba.com 1. Lecture Content 1.What is Software Configuration Management? 2.Version Control Systems (VCS) 3.Basic Git.
…using Git/Tortoise Git
Git workflow and basic commands By: Anuj Sharma. Why git? Git is a distributed revision control system with an emphasis on speed, data integrity, and.
Information Systems and Network Engineering Laboratory II DR. KEN COSH WEEK 1.
Object-Oriented Analysis & Design Subversion. Contents  Configuration management  The repository  Versioning  Tags  Branches  Subversion 2.
Version Control Menggunakan TortoiseSVN
Subversion (SVN) A Revision Control System Successor to CVS Carlos Armas Hervey Allen.
Copyright © 2015 – Curt Hill Version Control Systems Why use? What systems? What functions?
CS370 Spring 2007 CS 370 Database Systems Lecture 1 Overview of Database Systems.
MapReduce and GFS. Introduction r To understand Google’s file system let us look at the sort of processing that needs to be done r We will look at MapReduce.
1 Brief Introduction to Revision Control Ric Holt.
Version Control with SVN Images from TortoiseSVN documentation
Version Control Systems. Version Control Manage changes to software code – Preserve history – Facilitate multiple users / versions.
GIT.
Sabriansyah R.A Version Control. The Repository Subversion adalah sistem tersentralisasi untuk informasi sharing Repository adalah pusat penyimpanan data.
Version Control System
Introduction to Git Yonglei Tao GVSU. Version Control Systems  Also known as Source Code Management systems  Increase your productivity by allowing.
Vignesh Ravindran Sankarbala Manoharan. Infrastructure As A Service (IAAS) is a model that is used to deliver a platform virtualization environment with.
GPFS: A Shared-Disk File System for Large Computing Clusters Frank Schmuck & Roger Haskin IBM Almaden Research Center.
Subversion (SVN) Tutorial for CS421 Dan Fleck Spring 2010.
ZOOKEEPER. CONTENTS ZooKeeper Overview ZooKeeper Basics ZooKeeper Architecture Getting Started with ZooKeeper.
Information Systems and Network Engineering Laboratory I DR. KEN COSH WEEK 1.
MyUWO Portal Updates By: Emily Al Bulushi Richard Sheppard Steven Beshensky.
Source Control Repositories for Enabling Team Working Doncho Minkov Telerik Corporation
NALINI S. NAUTIYAL SYSTEM SOFTWARE DIVISION Subversion.
DIGITAL REPOSITORIES CGDD Job Description… Senior Tools Programmer – pulled August 4 th, 2011 from Gamasutra.
Source Control Dr. Scott Schaefer. Version Control Systems Allow for maintenance and archiving of multiple versions of code / other files Designed for.
4 Version control (part 1)
Information Systems and Network Engineering Laboratory II
Version Control Systems
CSS534: Parallel Programming in Grid and Cloud
Git Practice walkthrough.
Source Control Dr. Scott Schaefer.
Version control, using Git
Subversion.
Concurrent Version Control
Cloud based Open Source Backup/Restore Tool
Source Code Management
Building a Database on S3
Wait-free coordination for Internet-scale systems
Git GitHub.
ZooKeeper Justin Magnotti 9/19/18.
Erik Vollebekk Application Architect
Presentation transcript:

Cloudifying Source Code Repositories: How much does it cost? LADIS 2009 Big Sky, Montana Michael Siegenthaler Hakim Weatherspoon Cornell University

A Brief History of Cloud Computing Large scale Application-specific architectures Developed for in-house use Available for general usage Inexpensive, even for small or medium scale deployments

What is Revision Control? Repository for data (source code) –All changes are tracked by date and author –Branching and merging Why move it to the cloud? –Resilient storage –No physical server to administrate –Scale to larger communities (SourceForge)

Available Tools Subversion, revision control system –Free, open-source –Very popular –Rigid consistency model Amazon S3, cloud storage service –Eventual consistency Yahoo ZooKeeper, coordination service –Free, open-source

Various alternative solutions exist… Cloud ComputingP2P Subversion etc. Repository stored persistently in the cloud One true, consistent repository exists GIT etc. Repository stored at every client Many repository copies, converging eventually

Outline Costs of using cloud storage for revision control Architecture of a simple solution Performance evaluation

How to Measure Costs Each revision stored as two files on disk –Revision data (diff against earlier revisions) –Revision properties (author, log message…) Calculate bandwidth, per-transaction, and storage costs of pushing each revision into S3 over time

Storage Costs

Storage Trends

Outline Costs of using cloud storage for revision control Architecture of a simple solution Performance evaluation

Today’s architecture for source code revision control...

A cloud-based architecture... EC2 S3

Two simultaneous commits… EC2 S3 Rev Followed by an update… Leads to data loss!

EC2 S3

Commit Process ZooKeeper

How ZooKeeper is Utilized Acquire a lock by creating a node with an atomically increasing sequence number /s3vn/ /lock/lock- List contents of /s3vn/ /lock and wait if a node with a lower number than ours exists Store current revision number: /s3vn/ /current Delete the lock node to release the lock

Outline Costs of using cloud storage for revision control Architecture of a simple solution Performance evaluation

Usage Observations Apache Foundation –1 repository, 74 projects –Average 1.10 commits per minute –Maximum 7 commits per minute Debian community –506 repositories –Average 1.12 commits per minute (aggregate) –Maximum 6 commits per minute (aggregate)

Results Checkouts (Reads)Commits (Writes) Adding servers improves the user experience

Conclusion Storing source code repositories in the cloud is feasible… …and very inexpensive Only minor changes to existing revision control systems are necessary to robustly take advantage of cloud storage

Lock Service: ZooKeeper Open source tool developed by Yahoo! Tree namespace with storage in nodes –Sequence nodes: automatically append a sequence number –Ephemeral nodes: disappear when the session that created them is closed –Clients can watch a node for changes All clients see changes in same order

s3vn Components mkrepo: Create a repository in an S3 bucket fetchrepo: Copy a repository from S3 to the local disk updaterepo: Background process to fetch changes from S3 as they are made start-commit-hook: Acquire a global write lock when a new revision is committed post-commit-hook: Upload the new changes to S3 and release the write lock