Maintaining a CernVM-FS Repository

CernVM-FS is a versioning, snapshot-based file system. Similar to versioning systems, changes to /cvmfs/... are temporary until they are committed (cvmfs_server publish) or discarded (cvmfs_server abort). That allows you to test and verify your changes, for instance to test a newly installed release before publishing it to clients. Whenever changes are published (committed), a new file system snapshot of the current state is created. These file system snapshots can be tagged with a name, which makes them named snapshots. A named snapshot is meant to stay in the file system. You can roll back to named snapshots and, on the client side, you can decide to mount any of the named snapshots in lieu of the newest available snapshot.

Two named snapshots are managed automatically by CernVM-FS, trunk and trunk-previous. This allows for easy unpublishing of a mistake, by rolling back to the trunk-previous tag.

Check Integrity

CernVM-FS provides an integrity checker for repositories. Run the integrity checker using

cvmfs_server check

The integrity checker verifies the sanity of file catalogs and verifies that referenced data chunks are present. Ideally, run the integrity checker after every publish operation. Where this is not affordable due to the size of the repositories, run the integrity checker regularly. Optionally, cvmfs_server check can also verify the data integrity (command line flag -i) of each data object in the repository. However, this is a time-consuming process and we recommend it only for diagnostic purposes.
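The advice to check after every publish can be wrapped in a small shell function. The following is only a sketch: the function name publish_and_check is hypothetical, and it assumes that cvmfs_server is on the PATH and that a transaction is already open.

```shell
# Hypothetical wrapper (not part of CernVM-FS): publish, then check.
# Any arguments are handed through to "cvmfs_server publish".
publish_and_check() {
    # Stop early if the publish step itself fails.
    cvmfs_server publish "$@" || return 1

    # Run the integrity checker on the freshly published revision.
    if ! cvmfs_server check; then
        echo "integrity check failed after publish" >&2
        return 2
    fi
}
```

For example, publish_and_check -a release-1.0 would publish with a tag and then run the checker. For very large repositories, the check step could instead be moved into a regularly scheduled job.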

Manage Named Snapshots

At the point of publishing, the resulting snapshot can be named. To do so, use the -a option like

cvmfs_server transaction
# Changes
cvmfs_server publish -a release-1.0

As a tag name, use an identifier without spaces and special characters. You can list all named snapshots by

cvmfs_server lstags
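The naming advice above can be enforced before publishing. The sketch below uses a hypothetical helper, is_safe_tag_name. The set of accepted characters (letters, digits, dot, dash, underscore) is an assumption, since the text only rules out spaces and special characters.

```shell
# Hypothetical check (not part of CernVM-FS): succeed only if the tag name
# consists of letters, digits, dots, dashes and underscores.
# The accepted character set is an assumption made for this sketch.
is_safe_tag_name() {
    case $1 in
        # Reject the empty string and any name with a character outside the set.
        ''|*[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

A publish step could then be guarded with: is_safe_tag_name "$tag" && cvmfs_server publish -a "$tag"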

In order to remove (unpublish) a named snapshot, use the -r option like

cvmfs_server transaction
cvmfs_server publish -r release-1.0

Use named snapshots whenever you make larger modifications to the repository, for instance when you install a new software release. Only with named snapshots do you have the ability to easily undo modifications and to preserve the state of the file system for the future. Nevertheless, do not use named snapshots excessively. Start cleaning up unnecessary snapshots once you have more than ~50.

Rollbacks

You can roll back your repository to any of the named snapshots. Technically, this means that the given snapshot is re-published, while all intermediate snapshots are removed from the history. In order to roll back, do

cvmfs_server transaction
cvmfs_server rollback -t release-1.0

A rollback is, like restoring from backups, not something you would do often. Use caution. A rollback is irreversible.

Manage Nested Catalogs

CernVM-FS stores meta-data (path names, file sizes, …) in file catalogs. When a client accesses a repository, it has to download the file catalog first and then it downloads the files as they are opened. A single file catalog for an entire repository can quickly become large and impractical. Also, clients typically do not need all of the repository's meta-data at the same time. For instance, clients using software release 1.0 do not need to know about the contents of software release 2.0.

With nested catalogs, CernVM-FS has a mechanism to partition the directory tree of a repository into many catalogs. Repository maintainers are responsible for sensible cutting of the directory trees into nested catalogs. They can do so by creating and removing the magic file .cvmfscatalog. If the directory tree has some inherent structure it could be worthwhile to auto-create most of the nested catalogs using a .cvmfsdirtab file. Please see below for details.

For example, in order to create a nested catalog for software release 1.0 in the hypothetical repository experiment.cern.ch, do

cvmfs_server transaction
touch /cvmfs/experiment.cern.ch/software/1.0/.cvmfscatalog
cvmfs_server publish

If you want to merge a nested catalog with its parent catalog, remove the corresponding .cvmfscatalog file. Nested catalogs can be nested on arbitrarily many levels.
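Creating, removing, and reviewing the marker files can be scripted. The helper names below are hypothetical and the functions are only a sketch. They operate on plain directories, so in a real repository they would need to run inside an open transaction and be followed by a publish.

```shell
# Hypothetical helpers around the .cvmfscatalog marker file.

# Mark a directory as the root of a nested catalog.
add_nested_catalog() {
    touch "$1/.cvmfscatalog"
}

# Remove the marker so the catalog is merged into its parent on publish.
remove_nested_catalog() {
    rm -f "$1/.cvmfscatalog"
}

# List every directory at or below $1 that currently carries a marker.
list_nested_catalogs() {
    find "$1" -name .cvmfscatalog -type f | sed 's|/\.cvmfscatalog$||' | sort
}
```

For instance, list_nested_catalogs /cvmfs/experiment.cern.ch/software gives an overview of the current partitioning before a restructuring is attempted.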

Recommendations for Nested Catalogs

Nested catalogs should be created with an eye to which files and directories are accessed together. This is typically the case for software releases, but it can also apply at the directory level that separates platforms. For instance, for a directory layout like

/cvmfs/experiment.cern.ch
  |- /software
  |    |- /i686
  |    |    |- 1.0
  |    |    |- 2.0
  |    |    `- common
  |    `- /x86_64
  |         |- 1.0
  |         `- common
  |- /grid-certificates
  `- /scripts

it makes sense to have nested catalogs at

/cvmfs/experiment.cern.ch/software/i686
/cvmfs/experiment.cern.ch/software/x86_64
/cvmfs/experiment.cern.ch/software/i686/1.0
/cvmfs/experiment.cern.ch/software/i686/2.0
/cvmfs/experiment.cern.ch/software/x86_64/1.0 

A nested catalog at the top level of each software package release is generally the best approach because once package releases are installed they tend to never change, which reduces churn and garbage generated in the repository from old catalogs that have changed. In addition, each run only tends to access one version of any package so having a separate catalog per version avoids loading catalog information that will not be used. A nested catalog at the top level of each platform may make sense if there is a significant number of platform-specific files that aren't included in other catalogs.

It could also make sense to have a nested catalog under grid-certificates, if the certificates are updated much more frequently than the other directories. It would not make sense to create a nested catalog under /cvmfs/experiment.cern.ch/software/i686/common, because this directory needs to be accessed anyway whenever its parent directory is needed. As a rule of thumb, a single file catalog should contain more than 1000 files and directories but no more than ~200000 files.
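To see how a candidate directory compares with this rule of thumb, its entries can be counted. The two helpers below are hypothetical and only a rough sketch: the count includes both files and directories and covers the whole subtree, including any parts that already belong to nested catalogs further down. The find option -mindepth is assumed to be available (GNU and BSD find provide it).

```shell
# Hypothetical helper: count files and directories below a directory,
# not counting the directory itself.
count_entries() {
    find "$1" -mindepth 1 | wc -l | tr -d ' '
}

# Compare an entry count against the rule-of-thumb bounds from the text.
catalog_size_hint() {
    if [ "$1" -le 1000 ]; then
        echo "small: consider merging into the parent catalog"
    elif [ "$1" -gt 200000 ]; then
        echo "large: consider splitting into nested catalogs"
    else
        echo "ok"
    fi
}
```

A quick check for one release directory could look like: catalog_size_hint "$(count_entries /cvmfs/experiment.cern.ch/software/i686/1.0)"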

Auto-Creating Nested Catalogs Using a .cvmfsdirtab

Rather than managing .cvmfscatalog files by hand, a repository administrator may create a file called .cvmfsdirtab in the top directory of the repository. This file contains a list of path specifications where .cvmfscatalog files should be created automatically. Path specifications may contain shell wildcards such as asterisks (*) and question marks (?). The .cvmfsdirtab file is evaluated by `cvmfs_server publish`, which creates the .cvmfscatalog files before actually publishing the repository revision. A very good use of the patterns is to identify directories where software releases will be installed.

Additionally, one can exclude specific paths by preceding lines in the .cvmfsdirtab file with an exclamation point (!). For the directory structure explained above, a .cvmfsdirtab might look like this to create the nested catalogs mentioned before:

/software/*
/software/*/*
! */common
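Before publishing, it can help to preview which directories such a file would select. The function below is a hypothetical sketch that approximates the matching with ordinary shell globbing and case patterns. It is not the logic cvmfs_server itself uses, so the results may differ in corner cases (for example, patterns containing spaces are not handled).

```shell
# Hypothetical preview helper (not part of CernVM-FS).
# $1: repository root, e.g. /cvmfs/experiment.cern.ch
# $2: path to the .cvmfsdirtab file
preview_dirtab() {
    root=$1
    dirtab=$2

    # Exclusion patterns: lines starting with "!", with the marker stripped.
    excludes=$(sed -n 's/^![[:space:]]*//p' "$dirtab")

    # Inclusion patterns: every non-empty line that is not an exclusion.
    grep -v -e '^!' -e '^[[:space:]]*$' "$dirtab" |
    while IFS= read -r pattern; do
        # Let the shell expand the wildcards below the repository root.
        for dir in "$root"$pattern; do
            [ -d "$dir" ] || continue
            rel=${dir#"$root"}
            skip=no
            # Drop the directory if any exclusion pattern matches it.
            while IFS= read -r ex; do
                [ -n "$ex" ] || continue
                case $rel in $ex) skip=yes ;; esac
            done <<EOF
$excludes
EOF
            [ "$skip" = yes ] || printf '%s\n' "$rel"
        done
    done
}
```

Run against the example layout and the .cvmfsdirtab above, this sketch should list the platform and release directories while leaving out the common directories.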

Warning

Restructuring the repository's directory tree is an expensive operation in CernVM-FS. Moreover, it can easily break client applications when they switch to a restructured file system snapshot. Therefore, your software directory tree layout should be relatively stable before you start filling the CernVM-FS repository.
