For any development organisation, the security of the development source code is paramount. And yet, many organisations simply rely on standard backups to insure themselves against data loss or corruption. Whilst this certainly provides an element of protection, it is far from an effective solution.
During any development project, files will be edited and saved continuously as functionality is added and refined. Very few development projects contain just one file; the overwhelming majority are made up of numerous pieces of source code which interlink and reference each other extensively. A single change to a software product may involve many small edits to many source files, and if just one file is not in sync with its peers, the product will contain errors. If the developer is lucky, the error gets caught at compile time. However, for many projects, the error may only rear its head once in production.
Generally speaking, backups work by taking a snapshot of the data at a certain point in time. To improve efficiency and speed, differential backups are often used, capturing and storing only the changes to each file rather than the whole file each time. What makes backups alone a less-than-adequate solution for protecting source code is the temporal nature of the snapshots. Although it’s theoretically possible to schedule backups every few minutes, to do so would cause unnecessarily high resource usage. What’s needed is a backup that can be triggered on demand so as to provide iterative backups, storing the development project at each stage of development once changes have been completed. But even an on-demand backup would only allow one version of any particular file to exist at once. What if you have more than one developer working on a project?
This is where version control (also referred to as revision control or source control) comes into the picture. Version control records the state of each file each time it is updated, together with the date and time of each change and who made the change. This allows a project to be restored to any point in time and different revisions to be compared or merged. Some version control systems allow for branches or forks in development: parallel development processes on different versions of the same project, with the ability to re-merge the different versions at a later point in time.
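To make the idea concrete, here is a minimal sketch in Python of what "recording the state of each file, with who changed it and when" might look like. It is a toy, in-memory illustration only and not how any of the systems discussed below actually store data; the `Revision` and `FileHistory` names, the file name and the authors are all hypothetical.

```python
import difflib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Revision:
    """One recorded state of a file: what it contained, who saved it, and when."""
    number: int
    author: str
    timestamp: datetime
    content: str


@dataclass
class FileHistory:
    """Hypothetical in-memory history for a single file (illustration only)."""
    path: str
    revisions: list = field(default_factory=list)

    def commit(self, author, content):
        # Every commit appends a new revision; earlier ones are never overwritten.
        rev = Revision(len(self.revisions) + 1, author,
                       datetime.now(timezone.utc), content)
        self.revisions.append(rev)
        return rev

    def checkout(self, number):
        # Restore the file content exactly as it was at the given revision.
        if not 1 <= number <= len(self.revisions):
            raise IndexError(f"no revision {number} for {self.path}")
        return self.revisions[number - 1].content

    def diff(self, old, new):
        # Compare two revisions line by line, in unified diff format.
        return list(difflib.unified_diff(
            self.checkout(old).splitlines(),
            self.checkout(new).splitlines(),
            fromfile=f"{self.path}@r{old}",
            tofile=f"{self.path}@r{new}",
            lineterm="",
        ))


history = FileHistory("greeting.txt")
history.commit("alice", "Hello\nWorld")
history.commit("bob", "Hello\nBrave new world")

changes = history.diff(1, 2)
print("\n".join(changes))
```

Unlike a backup, which keeps whatever happened to be on disk when the snapshot ran, every commit here is a deliberate, attributed step that can be restored or compared later. Real systems add a great deal on top of this, such as branching, merging and efficient storage.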
There are numerous version control systems available, both commercial and open source. Some of the more popular include:
- Bazaar – Open source and free to use. Used by Ubuntu and MySQL.
- BitKeeper – Commercial. Formerly used for the Linux kernel.
- CVS – Open source and free to use. Maintained but no longer actively developed.
- Git – Open source and free to use. Developed by Linus Torvalds and used for the Linux kernel as well as by Facebook, Yahoo! and Ruby on Rails.
- Mercurial – Open source and free to use. Used by the Mozilla Foundation and NetBeans. Implemented by Google Code.
- Apache Subversion – Open source and free to use. Used by the Apache Software Foundation, FreeBSD, Ruby, SourceForge and PHP. Implemented by Google Code and CodePlex.
- Team Foundation Server – Commercial. Microsoft’s version control and project tracking system. Supported by Visual Studio and available either as a standalone component or integrated into Visual Studio Team System.
Some systems work on a traditional client-server model, where individual files are checked in and out as needed, whilst others employ a distributed repository, where each user’s working copy essentially represents a different branch of the development tree.
Version Control within Idea 11
When deciding upon which version control system to use internally, we at Idea 11 had to consider a number of specific requirements:
- We have a number of developers working from different locations
- We use both Windows and MacOS for development
- We develop in a number of different applications and in a number of different languages
- We have more than one project being developed at any one time
When developing solutions, both software and infrastructure, Idea 11 adhere to a set of guiding principles. We therefore adhere to the same principles when implementing a solution for our own purposes. For us, a version control system had to be:
- Cost-effective,
- Simple and easy to use,
- Based upon trusted technology, and
- Supported.
In addition, to meet our requirements, the solution had to support multiple users and be both platform- and location-independent.
A number of the more popular version control systems could be ruled out immediately. Team Foundation Server is Microsoft-centric: other than a web client, there is no support for non-Windows clients. CVS is no longer actively developed or supported. BitKeeper don’t publish their pricing structure, but some research showed it was simply too pricey to remain in contention.
For each remaining version control system, there are numerous variations. In addition to hosting one’s own repositories, numerous organisations offer version control hosting services; cloud-based version control, one might call it. Given that Idea 11 develop software for the cloud, a hosted solution was obviously something to consider.
But before deciding upon whether to use a cloud-based code host or simply host our own repositories, we needed to decide upon which version control technology to use. That decision, to some extent, would define which (if any) cloud services we could utilise.
Git
Git was originally developed by Linus Torvalds to maintain the Linux kernel source when BitKeeper suddenly changed their licensing terms. Linus Torvalds was never exactly shy about expressing his opinions of other version control systems, CVS and Subversion in particular, so when he suddenly required an alternative to BitKeeper, he created his own. What started life as a hastily-written collection of C programs and Bash scripts has now evolved into a stable technology upon which businesses have been built. GitHub, built on Rackspace’s cloud storage platform, offers free open-source repository hosting as well as private repositories starting from a few dollars per month. In a relatively short time, GitHub has become the preferred code hosting solution for many major software organisations, including Facebook and SixSignals.
Natively, Git is command-line based and relies heavily on bash scripts. Various Windows-based implementations exist, including a Git-specific clone of the Tortoise client as well as various plug-ins for IDEs such as Visual Studio and Eclipse. Git can also be run on Cygwin using POSIX emulation. However, the majority of these clients are in the early stages of development. With accurate documentation virtually non-existent, installation on Windows is particularly painful and many of the various Git clients tried were unstable or unable to perform basic tasks, like checking out a repository or resolving conflicts. Although it is possible to utilise Git without a GUI client, having to memorise arcane command lines and having to work within the Bash shell really didn’t impress our Windows developers (the author of this post included). Git was therefore ruled out of contention.
Bazaar
Bazaar is a distributed version control system that’s written in Python and part of the GNU project. It has a similar command set to CVS and Subversion and is interoperable with both, allowing for Subversion repositories to be checked out, modified and then merged back in. Git and Mercurial are also supported but in read-only mode. Bazaar is used by many top-tier open source projects, such as MySQL and Ubuntu, and is supported by SourceForge and Launchpad for open-source project hosting.
Bazaar runs on Windows, MacOS and Solaris, plus many variants of Linux. The Bazaar Explorer provides a consistent interface to the repositories across all operating systems with a Windows Explorer-style interface. The Bazaar Explorer integrates with a range of diff and merge tools, as well as a selection of bug trackers. Being a distributed system, it can work with or without a centralised server. Various third-party tools further extend the functionality, such as Loggerhead (a web interface for browsing repositories over HTTP) and various plug-ins for many supported IDEs, including Eclipse and Visual Studio 2005.
Upon evaluation, the tools looked stable and well-developed. The product was straightforward to set up and able to import existing repositories with just a few clicks. However, the lack of a plug-in for one of Idea 11’s key development environments, Visual Studio 2008, left it slightly wanting.
Mercurial
Like Git, Mercurial started life as a Linux application when BitKeeper changed its licensing terms, and it has since been ported to both Windows and MacOS. Like Git, it is also primarily command-line based, but various GUI extensions exist, notably a port of the ubiquitous Tortoise client – TortoiseHg – that provides shell integration in Windows and within the GNOME Nautilus file manager. Google Code, CodePlex and SourceForge all support Mercurial for open-source code hosting, and Mercurial can boast a number of high-profile open-source users, including the Mozilla Foundation, OpenSolaris and OpenOffice.org.
Mercurial integration into IDEs is relatively limited; Eclipse and NetBeans extensions exist but no Visual Studio plug-in seems to have been written. Mercurial’s interoperability with other version control systems is limited to importing foreign repositories, including those from CVS, Subversion and Bazaar as well as from a number of less well-known version control systems.
Apache Subversion
Apache Subversion (often referred to as SVN) was originally intended to be a successor to the then-prevalent CVS system. It is among the older of the active technologies and is used by a large number of open-source projects, including the Apache Software Foundation, FreeBSD, Ruby, Mono, PHP and SourceForge. Google Code, CodePlex and SourceForge also provide Subversion repository hosting for open source projects. A number of companies also provide commercial Subversion hosting for proprietary code. Subversion has now been accepted into the Apache Incubator, marking the beginning of the process to evolve into a top-level Apache project. Subversion handles remote and local repositories, with remote access available through HTTP/S and its own SVN protocol.
Although Subversion is essentially a framework for which a command-line interface provides basic functionality, numerous third-party tools build on it to provide GUI access, shell integration and integration into popular IDEs, as well as single-package server implementations. These include the excellent Subversion implementation of Tortoise, TortoiseSVN, the AnkhSVN plug-in for Visual Studio 2008 and VisualSVN Server. On the Mac front, less choice exists, but Versions and Cornerstone seem to be the main contenders.
Conclusion
The final decision was relatively simple. Although Bazaar came very close with its ease of use and powerful native UI, Subversion had all of those plus available plug-ins for Visual Studio 2008. Its developed nature, wide support and the plethora of available third-party tools made it the obvious choice. Although its detractors claim that other systems are faster, for the projects that Idea 11 undertake, commit, revert and check-out operations are commendably quick whether working locally or remotely, taking remote network speed into account.
With the technology decision made, the final choice was whether to use a commercial repository hosting service, such as Codesion, or to use a privately-hosted HTTP repository. In the end, we decided on a hybrid approach. We host our own Subversion repositories using the VisualSVN Server component. But we also have to protect our repositories against data loss or corruption. By using the Jungle Disk Server Edition, we securely back up our repositories to the Amazon S3 storage cloud and maintain version histories of the repositories. We can even, if only one developer is working on a project, use the Jungle Disk sync functionality to maintain both remote and local copies of repositories, increasing speed of access.
This entry was posted by DAN HALFORD on Monday, 8 February 2010 at 3:30 AM and is filed under version control and cloud computing.