Maintaining a CernVM-FS Repository
CernVM-FS is a versioning, snapshot-based file system. As in version control systems, changes to /cvmfs/... are temporary until they are committed (cvmfs_server publish) or discarded (cvmfs_server abort). This allows you to test and verify your changes, for instance to test a newly installed release before publishing it to clients. Whenever changes are published (committed), a new file system snapshot of the current state is created. These file system snapshots can be tagged with a name, which makes them named snapshots.
Two named snapshots are managed automatically by CernVM-FS: trunk and trunk-previous. This makes it easy to unpublish a mistake by rolling back to the trunk-previous tag.
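The transaction workflow described above can be scripted so that nothing reaches clients unless a verification step succeeds. The following is a minimal bash sketch, not part of CernVM-FS: the helper name `publish_if_ok`, its arguments, and the `CVMFS_SERVER` variable (which lets a stub stand in for the real command during a dry run) are hypothetical. It also assumes that `cvmfs_server abort -f` discards the open transaction without asking for confirmation; check the usage output of your `cvmfs_server` version.

```shell
# Command used to talk to the repository; defaults to the real cvmfs_server.
CVMFS_SERVER="${CVMFS_SERVER:-cvmfs_server}"

# Hypothetical helper: open a transaction, run an install step and a
# verification step (both given as command or function names), then publish
# on success or abort (discarding all changes) on failure.
publish_if_ok() {
  local repo="$1" install_step="$2" verify_step="$3"
  "$CVMFS_SERVER" transaction "$repo" || return 1
  # Changes under /cvmfs/$repo stay invisible to clients until published.
  if "$install_step" && "$verify_step"; then
    "$CVMFS_SERVER" publish "$repo"
  else
    echo "verification failed, discarding changes to $repo" >&2
    # Assumption: -f skips the interactive confirmation prompt.
    "$CVMFS_SERVER" abort -f "$repo"
    return 1
  fi
}
```

For example, `publish_if_ok experiment.cern.ch install_release run_smoke_tests` would publish only if both of your own (hypothetical) functions `install_release` and `run_smoke_tests` succeed.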
CernVM-FS provides an integrity checker for repositories. Run the integrity checker using cvmfs_server check.
The integrity checker verifies the consistency of the file catalogs and checks that all referenced data chunks are present. Ideally, run the integrity checker after every publish operation. Where this is not affordable due to the size of the repositories, run it at regular intervals. Optionally, cvmfs_server check can also verify the data integrity of each data object in the repository (command line flag -i). However, this is a time-consuming process and we recommend it only for diagnostic purposes.
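A minimal bash sketch of the two checking modes, suitable for calling after each publish or from a periodic job. The function name `check_repo` and the `CVMFS_SERVER` override are hypothetical; only `cvmfs_server check` and its `-i` flag come from the text above.

```shell
# Command used to talk to the repository; defaults to the real cvmfs_server.
CVMFS_SERVER="${CVMFS_SERVER:-cvmfs_server}"

# Hypothetical helper. "quick" (default) verifies the file catalogs and that
# every referenced data chunk is present; "full" additionally verifies the
# content of each data object with -i, which can take a long time.
check_repo() {
  local repo="$1" mode="${2:-quick}"
  case "$mode" in
    quick) "$CVMFS_SERVER" check "$repo" ;;
    full)  "$CVMFS_SERVER" check -i "$repo" ;;
    *)     echo "unknown mode: $mode (use quick or full)" >&2; return 2 ;;
  esac
}
```

Calling `check_repo experiment.cern.ch` after every publish keeps the routine check cheap, while `check_repo experiment.cern.ch full` is reserved for diagnosing a suspected problem.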
Manage Named Snapshots
At the point of publishing, the resulting snapshot can be named. To do so, use the -a option as follows:
    cvmfs_server transaction
    # Changes
    cvmfs_server publish -a release-1.0
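A minimal bash sketch that checks a tag name against the naming rule in this section before calling `publish -a`. The accepted character set (letters, digits, `.`, `_`, `-`) is our assumption of what "without spaces and special characters" allows, and rejecting `trunk` and `trunk-previous` follows from CernVM-FS managing those two names itself. The function names and the `CVMFS_SERVER` override are hypothetical.

```shell
# Command used to talk to the repository; defaults to the real cvmfs_server.
CVMFS_SERVER="${CVMFS_SERVER:-cvmfs_server}"

# Hypothetical check: reject the automatically managed tags, then accept only
# non-empty names made of letters, digits, '.', '_' and '-' (assumed rule).
valid_tag_name() {
  case "$1" in
    trunk|trunk-previous) return 1 ;;
  esac
  [[ "$1" =~ ^[A-Za-z0-9._-]+$ ]]
}

# Hypothetical helper: publish the open transaction as a named snapshot.
publish_named() {
  local repo="$1" tag="$2"
  if ! valid_tag_name "$tag"; then
    echo "invalid tag name: '$tag'" >&2
    return 2
  fi
  "$CVMFS_SERVER" publish -a "$tag" "$repo"
}
```

Validating before publishing matters because an invalid name would otherwise only be noticed after the transaction has been prepared.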
As a tag name, use an identifier without spaces and special characters. You can list all named snapshots by
To remove (unpublish) a named snapshot, use the -r option as follows:
    cvmfs_server transaction
    cvmfs_server publish -r release-1.0
Use named snapshots whenever you make larger modifications to the repository, for instance when you install a new software release. Only named snapshots give you the ability to easily undo modifications and to preserve the state of the file system for the future. Nevertheless, do not use named snapshots excessively: start cleaning up unnecessary snapshots once you have more than ~50.
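Deciding which snapshots to clean up can be scripted. A minimal bash sketch, assuming you already have the tag names as a plain list, one per line, oldest first; this input format is our assumption, not a `cvmfs_server` output format, so produce the list however your `cvmfs_server` version allows. The hypothetical `tags_to_prune` prints the oldest names so that at most a given number remain (default 50, following the guideline above) and never prints the automatically managed `trunk` and `trunk-previous`.

```shell
# Hypothetical helper: read tag names (oldest first, one per line) on stdin
# and print the oldest ones that exceed the limit given as $1 (default 50).
tags_to_prune() {
  local keep="${1:-50}"
  local -a tags=()
  local t i excess
  while IFS= read -r t; do
    # Skip blank lines and the tags CernVM-FS manages by itself.
    case "$t" in
      ''|trunk|trunk-previous) continue ;;
    esac
    tags+=("$t")
  done
  excess=$(( ${#tags[@]} - keep ))
  for (( i = 0; i < excess; i++ )); do
    printf '%s\n' "${tags[i]}"
  done
}
```

Review the printed list before acting on it; each name can then be removed with the transaction and publish -r sequence shown earlier.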
You can roll back your repository to any of the named snapshots. Technically, this means that the given snapshot is re-published, while all intermediate snapshots are removed from the history. To roll back, run:
    cvmfs_server transaction
    cvmfs_server rollback -t release-1.0
Like restoring from backups, a rollback is not something you should do often. Use caution: a rollback is irreversible.
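Because a rollback cannot be undone, a wrapper that demands explicit confirmation can prevent accidents. A minimal bash sketch; `rollback_to`, its `--yes` argument, and the `CVMFS_SERVER` override are hypothetical, while the command sequence itself is the one given in the text.

```shell
# Command used to talk to the repository; defaults to the real cvmfs_server.
CVMFS_SERVER="${CVMFS_SERVER:-cvmfs_server}"

# Hypothetical helper: roll REPO back to the named snapshot TAG. Because a
# rollback is irreversible, the caller must pass --yes explicitly.
rollback_to() {
  local repo="$1" tag="$2" confirm="${3:-}"
  if [ "$confirm" != "--yes" ]; then
    echo "refusing to roll back $repo to '$tag' without --yes" >&2
    return 2
  fi
  # Same sequence as in the text: open a transaction, then roll back.
  "$CVMFS_SERVER" transaction "$repo" || return 1
  "$CVMFS_SERVER" rollback -t "$tag" "$repo"
}
```

For instance, `rollback_to experiment.cern.ch trunk-previous --yes` unpublishes the most recent change, using the automatically managed trunk-previous tag.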
Manage Nested Catalogs
CernVM-FS stores meta-data (path names, file sizes, …) in file catalogs. When a client accesses a repository, it first downloads the file catalog and then downloads individual files as they are opened. A single file catalog for an entire repository can quickly become large and impractical. Also, clients typically do not need all of the repository's meta-data at the same time. For instance, clients using software release 1.0 do not need to know about the contents of software release 2.0.
With nested catalogs, CernVM-FS has a mechanism to partition the directory tree of a repository into many catalogs. Repository maintainers are responsible for partitioning the directory tree into nested catalogs sensibly. They do so by creating and removing the magic file .cvmfscatalog. If the directory tree has some inherent structure, it can be worthwhile to auto-create most of the nested catalogs using a .cvmfsdirtab file, as described below.
For example, to create a nested catalog for software release 1.0 in the hypothetical repository experiment.cern.ch, run:
    cvmfs_server transaction
    touch /cvmfs/experiment.cern.ch/software/1.0/.cvmfscatalog
    cvmfs_server publish
To merge a nested catalog with its parent catalog, remove the corresponding .cvmfscatalog file, again within a transaction. Nested catalogs can be nested on arbitrarily many levels.
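The procedure above can be applied to several directories in one transaction. A minimal bash sketch; `add_nested_catalogs` and the `CVMFS_SERVER` and `CVMFS_ROOT` overrides (which allow a dry run against a stub and a scratch directory) are hypothetical, and `abort -f` is assumed to discard the transaction without prompting.

```shell
# Command and mount root; default to the real cvmfs_server and /cvmfs.
CVMFS_SERVER="${CVMFS_SERVER:-cvmfs_server}"
CVMFS_ROOT="${CVMFS_ROOT:-/cvmfs}"

# Hypothetical helper: turn each given directory (relative to the repository
# root) into a nested catalog in a single transaction, then publish.
add_nested_catalogs() {
  local repo="$1"
  shift
  "$CVMFS_SERVER" transaction "$repo" || return 1
  local dir
  for dir in "$@"; do
    if [ ! -d "$CVMFS_ROOT/$repo/$dir" ]; then
      echo "no such directory: $dir, discarding the transaction" >&2
      # Assumption: -f skips the interactive confirmation prompt.
      "$CVMFS_SERVER" abort -f "$repo"
      return 1
    fi
    # The empty magic file marks the directory as a nested catalog root.
    touch "$CVMFS_ROOT/$repo/$dir/.cvmfscatalog"
  done
  "$CVMFS_SERVER" publish "$repo"
}
```

For example, `add_nested_catalogs experiment.cern.ch software/1.0 software/2.0` creates both catalogs with a single publish. Merging works the same way with rm instead of touch.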
Recommendations for Nested Catalogs
Nested catalogs should be created with an eye to which files and directories are accessed together. This is typically the case for software releases, but it can also apply at the directory level that separates platforms. For instance, for a directory layout like
    /cvmfs/experiment.cern.ch
      |- /software
      |   |- /i686
      |   |   |- 1.0
      |   |   |- 2.0
      |   |   `- common
      |   `- /x86_64
      |       |- 1.0
      |       `- common
      |- /grid-certificates
      `- /scripts
it makes sense to have nested catalogs at
    /cvmfs/experiment.cern.ch/software/i686
    /cvmfs/experiment.cern.ch/software/x86_64
    /cvmfs/experiment.cern.ch/software/i686/1.0
    /cvmfs/experiment.cern.ch/software/i686/2.0
    /cvmfs/experiment.cern.ch/software/x86_64/1.0
A nested catalog at the top level of each software package release is generally the best approach: once installed, package releases tend to never change, which reduces churn and the garbage that old, modified catalogs would otherwise leave in the repository. In addition, each run tends to access only one version of any package, so a separate catalog per version avoids loading catalog information that will not be used. A nested catalog at the top level of each platform may make sense if there is a significant number of platform-specific files that are not included in other catalogs.
It could also make sense to have a nested catalog under grid-certificates, if the certificates are updated much more frequently than the other directories. It would
Auto-Creating Nested Catalogs Using a .cvmfsdirtab
Rather than managing .cvmfscatalog files by hand, a repository administrator may create a file called .cvmfsdirtab in the top directory of the repository. This file contains a list of path specifications where .cvmfscatalog files should be created automatically. The path specifications may contain shell wildcards such as asterisks (*) and question marks (?). The .cvmfsdirtab is evaluated by `cvmfs_server publish`, which creates the .cvmfscatalog files before actually publishing the repository revision. A good use of the patterns is to identify the directories where software releases will be installed.
Additionally, specific paths can be excluded by preceding their lines in the .cvmfsdirtab file with an exclamation point (!). For the directory structure shown above, a .cvmfsdirtab like the following creates the nested catalogs mentioned before:
    /software/*
    /software/*/*
    ! */common
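To preview which directories a .cvmfsdirtab would turn into nested catalogs, here is a bash approximation. It is not the CernVM-FS implementation, and its matching rules are assumptions: include patterns are bash globs anchored at the repository root, only existing directories match, `!` patterns are compared against the whole path relative to the root using bash `[[ == ]]` matching (where `*` also matches `/`), and blank lines and lines starting with `#` are ignored. `dirtab_matches` is a hypothetical name.

```shell
# Hypothetical preview of .cvmfsdirtab evaluation (an approximation, not the
# CernVM-FS implementation). Usage: dirtab_matches REPO_ROOT DIRTAB_FILE
# Prints matching directories relative to REPO_ROOT, one per line.
# The body runs in a subshell so that enabling nullglob does not leak.
dirtab_matches() (
  root="$1" dirtab="$2"
  includes=() excludes=()
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line#"${line%%[![:space:]]*}"}"   # trim leading whitespace
    line="${line%"${line##*[![:space:]]}"}"   # trim trailing whitespace
    case "$line" in
      '' | '#'*) ;;                           # blank line or comment (assumed)
      '!'*)                                   # exclusion pattern
        line="${line#!}"
        excludes+=("${line#"${line%%[![:space:]]*}"}") ;;
      *) includes+=("$line") ;;               # inclusion pattern
    esac
  done < "$dirtab"

  shopt -s nullglob                           # unmatched globs expand to nothing
  for pat in "${includes[@]}"; do
    # The quoted root is taken literally; the unquoted pattern is globbed.
    for path in "$root"$pat; do
      [ -d "$path" ] || continue
      rel="${path#"$root"}"
      excluded=0
      for ex in "${excludes[@]}"; do
        if [[ "$rel" == $ex ]]; then excluded=1; break; fi
      done
      if [ "$excluded" -eq 0 ]; then printf '%s\n' "$rel"; fi
    done
  done
)
```

Run against the example layout with the .cvmfsdirtab above, this sketch prints the same five software directories listed earlier as nested catalog locations, with both common directories excluded.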
Restructuring the repository's directory tree is an expensive operation in CernVM-FS. Moreover, it can easily break client applications when they switch to a restructured file system snapshot. Therefore, your software directory tree layout should be relatively stable before you start filling the CernVM-FS repository.