Enhancing a Subversion Server

Subversion 1.4.x

When managing a Subversion server for a number of related projects, in an Open Source community or an enterprise, one needs to strike a useful balance between standardizing the development environment to the extent needed for effective collaboration while leaving enough flexibility to individual teams to work in a variety of ways. Individuals and projects will request particular features or customizations’ with some regularity. This article discusses when to customize, how to customize, and suggests a recommended approach to such requests.

The first and most important question is when to customize. Many software engineers and projects have a natural tendency to adapt, or customize, their tools and environment to their likes, dislikes, and needs. They often feel strongly about particular customizations and push for them, not necessarily properly considering or quantifying the costs for themselves, their project, or the organization or community at large. As long as a customization only impacts an individual, you could arguably assert that it is only one person’s problem and responsibility. However, when it impacts multiple people, a project, or the organization at large, you must balance the benefits against the costs.

The Cost of Customization

The obvious cost is the amount of effort needed to create and maintain the customization. It has to be written, tested, and maintained as the environment evolves. In addition, a customization increases the threshold to accessing data: customizations change the behavior of tools and practices and use of the tool, effectively making it more difficult for an outsider to understand, participate in, and contribute to a project. Also, each customization needs to be supported if users have issues with it. In other words, every customization makes it more expensive to operate and maintain the server, increases the threshold for users to collaborate, and adds to the variety of functionality that needs to be supported.

Therefore, there must be either a tangible benefit to the vast majority of projects or a significant benefit to a few projects. Note that the former is arguably not so much a customization but more a request for (and early prototyping of) a feature users want. The latter is arguably a customization that is too specific for the majority of users.

How to Customize

There is a variety of ways to extend the functionality that Subversion already provides. The options can be classified into two flavors: Client-side or server-side customizations.

Client side – Wrappers

Client-side customizations are solutions in which the server remains unchanged, and the customization is done on the client side by wrapping either the command-line client or client-side API calls. Client-side customizations keep the burden of customization on the individual or project requesting it, and forces the requestors to do a proper, honest, cost-benefit analysis as to whether the customization is really wanted or needed. Also, wrappers scale very well in terms of the number of customizations that can be handled: the number of people maintaining the wrappers scales with the customizations.

An example of a client-side customization is svnmerge.py, a Python script on top of the standard Subversion command-line client that allows users to easily merge changes from and to a branch, automatically recording which change sets have already been merged. It can display an always updated list of changes yet to be merged and prevents merge mistakes, such as merging the same change twice. The svnmerge.py script is essentially an early prototype of the merge tracking functionality that is currently being discussed, designed, and implemented for a future release of Subversion.

Server Side – Hook Scripts

Server-side customizations are solutions where the server configuration is changed. Server-side customizations scale in terms of rolling out a customization to all projects on the server. They touch the day-to-day operation of the site and increase the effort and cost of operating the service. Also, they potentially impact the security, availability, and performance of the service.

The primary example of server-side customizations are hook scripts. A hook, or hook script, is a program triggered by some repository event, such as creating a new revision or the modifying of an unversioned property. Each hook is handed enough information to tell what that event is, what target(s) it operates on, and the username of the person who triggered the event. Depending on the hook’s output or return status, the hook program may continue the action, stop it, or suspend it in some way. The Version Control with Subversion book describes this in more detail.

Subversion currently defines nine hooks:

  • The start-commit hook is invoked before a transaction is created in the process of doing a commit.
  • The pre-commit hook is invoked after a transaction has been created but before it is committed.
  • The post-commit hook is invoked after a transaction is committed.
  • The pre-revprop-change and post-revprop-change hooks are invoked before respectively after a revision property is added, modified, or deleted.
  • The pre-lock and post-lock hooks are invoked before respectively after an exclusive lock on a path is created.
  • The pre-unlock and post-unlock hooks are invoked before respectively after an exclusive lock is destroyed.

Hooks are typically used for three kinds of functionality:

  • First, to log or notify interested parties about an event. For example, sending an e-mail message per commit, summarizing key information about that commit such as the author, date, commit message, and the change set.
  • Second, to check a particular condition. For example, verify whether the code complies within coding guidelines or whether the user has the appropriate access rights to the parts of the repository that he wants to commit to.
  • Third, to block certain behavior. For example, allow a user to change a log message but not the author and date of a revision (to maintain traceability), prevent locks from being stolen, or allow locking if and only if the path has the svn:needs-lock property set.

Note that, at present, Subversion does not support a hook performing pre- or post-processing functionality, such as automatically ensuring the code complies with coding guidelines, because the server does not have a means to communicate such changes back to the client. In other words, whatever a hook does, it does not modify the transaction itself. Instead, it can check a condition and accept or reject the action.

Hooks are essentially a way of running arbitrary code on the server in response to actions by the version-control client. Moreover, a hook runs with the same permissions as the web server in general and, with that, has the ability to affect other repositories on the same server. This mechanism is very powerful but has potential implications on the security, availability, and performance of the server. A hook can easily slow down or bring down your server or, even worse, corrupt the data in the repository.

Findings and Recommendations

When managing a Subversion server for a number of projects, you need to strike a useful balance between standardizing the environment to enable effective collaboration and efficient operation, while leaving enough flexibility to projects to work in a variety of ways. Standardization can bring many benefits, such as a reduced time to learn the environment when switching projects and enabling more effective collaboration between teams. However, a one-size-fits-all is neither feasible nor desirable with today’s heterogeneity (in local culture, departmental culture, processes, and so on) in individuals and project teams.

From a technical perspective, client- and server-side customizations differ in what they can and can not do:

  • Client-side customizations are suitable when it affects only a single user or for automatic pre-processing (such as code formatting).
  • Server-side customizations are suitable when the change should be standardized across the repository and is a notification, a check, or blocking certain behavior.

From a cost-benefit perspective, try to keep customizations that are specific to only a limited set of users on the client side. This puts the burden of customization on the project, stimulating them to make a proper and honest cost-benefit analysis, as well as preventing it from impacting others. Generic customizations that are relevant to and requested by a large percentage of the projects fit better on the server side. Server-side customizations typically have a substantially higher cost, mainly because they potentially impact performance, availability, and security of the entire server. They need rigorous testing, both upon creation and with each upgrade of the server.

Especially when deploying hooks, we strongly recommend using only very commonly used hooks, both to mitigate the risks (the more a hook is used, the more it is tested) and to strike a reasonable balance between standardization and customization: hooks are popular if and only if they contribute value to many people and, with that, are worth the effort. Requests for esoteric customizations are likely not worth the effort of creating, testing, and maintaining.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s