CernVM-FS Network Path Selection
(Refers to CernVM-FS 2.2)
At any given point in time, there is only one combination of web proxy and web host that all new requests use. We call this combination of proxy and host the "network path". The network path is chosen from the collection of web proxies and hosts in the CernVM-FS configuration according to the following rules.
The hosts are specified as an ordered list. CernVM-FS always starts with the first host and fails over one by one to the next hosts in the list.
Web proxies are treated as an ordered list of load-balance groups. Like the hosts, load-balance groups will be probed one after another. Within a load-balance group, a proxy is chosen at random.
(Proxies within the same load-balance group are separated by a pipe in the CernVM-FS configuration, whereas proxy groups are separated by a semicolon. For instance, your CVMFS_HTTP_PROXY parameter could contain
On download failures, CernVM-FS tries to figure out if the failure is caused by the host or by the proxy.
- Failures of host name resolution, HTTP 5XX and 404 return codes, and, if no proxy is in use, any connection/timeout error, partial file transfer, or non 2XX return code are classified as host failures.
- If a proxy server is in use, failures of proxy name resolution and any connection/timeout error, partial file transfer, or non 2XX return code (except 5XX and 404) are classified as proxy failures.
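The two classification rules above can be sketched as a small function. This is only an illustration under stated assumptions: the `Failure` enum, the `classify_failure` function, and the `kind` labels are hypothetical names chosen for this example and are not part of CernVM-FS.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    HOST = "host"
    PROXY = "proxy"


def classify_failure(kind: str, using_proxy: bool,
                     http_status: Optional[int] = None) -> Failure:
    """Classify a download failure following the two rules above.

    `kind` is a hypothetical label for the error: "host_dns", "proxy_dns",
    "connection", "timeout", "partial", or "http" (non 2XX return code).
    """
    if kind == "host_dns":
        return Failure.HOST
    if kind == "proxy_dns":
        return Failure.PROXY
    if kind == "http" and http_status is not None:
        # HTTP 5XX and 404 are always attributed to the host.
        if 500 <= http_status <= 599 or http_status == 404:
            return Failure.HOST
    # Connection/timeout errors, partial transfers, and the remaining
    # non 2XX codes are attributed to the proxy if one is in use,
    # otherwise to the host.
    return Failure.PROXY if using_proxy else Failure.HOST
```

For example, `classify_failure("http", using_proxy=True, http_status=403)` yields a proxy failure, while the same call with `http_status=503` yields a host failure.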
If CernVM-FS detects a host failure, it will fail over to the next host in the list while keeping the proxy server untouched. If it detects a proxy failure, it will fail over to another proxy while keeping the host untouched. CernVM-FS will try all proxies of the current load-balance group in random order before trying proxies from the next load-balance group.
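The order in which proxies are probed can be sketched as follows. The separators (pipe within a group, semicolon between groups) are taken from the description above; the function names and the proxy URLs in the example are hypothetical.

```python
import random
from typing import List, Optional


def parse_proxy_groups(setting: str) -> List[List[str]]:
    """Split a proxy setting into an ordered list of load-balance groups."""
    groups = []
    for group in setting.split(";"):
        proxies = [p.strip() for p in group.split("|") if p.strip()]
        if proxies:
            groups.append(proxies)
    return groups


def probe_order(groups: List[List[str]],
                rng: Optional[random.Random] = None) -> List[str]:
    """Groups are tried in order; proxies within a group in random order."""
    rng = rng or random.Random()
    order = []
    for group in groups:
        shuffled = list(group)
        rng.shuffle(shuffled)
        order.extend(shuffled)
    return order


# Hypothetical setting: two load-balanced proxies, then one backup proxy.
example = ("http://proxy-a.example.org:3128|http://proxy-b.example.org:3128;"
           "http://backup.example.org:3128")
groups = parse_proxy_groups(example)
order = probe_order(groups)
```

In this sketch the backup proxy is always probed last, while the relative order of the first two proxies varies from run to run.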
The change of host or proxy is a global change affecting all subsequent requests. In order to avoid concurrent requests changing the global network path at the same time, the actual change of path is only performed if the global host/proxy is equal to the currently used host/proxy of the request. Otherwise, the request assumes that another request already performed the fail-over and only the request's fail-over counter is increased.
In order to avoid endless loops, every request carries a host fail-over counter and a proxy fail-over counter. Once a counter reaches the number of hosts or proxies, respectively, CernVM-FS gives up and returns a failure.
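The guarded switch and the per-request counter can be illustrated for the host list as follows. This is a minimal sketch, not the CernVM-FS implementation: the `Request` and `HostChain` classes are hypothetical, and the use of a lock and of wrap-around when advancing the host index are assumptions made for the example.

```python
import threading
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    host: str                # host this request last tried
    host_failovers: int = 0  # per-request host fail-over counter


class HostChain:
    """Hypothetical holder of the global host selection."""

    def __init__(self, hosts: List[str]):
        self.hosts = hosts
        self.current = 0
        self._lock = threading.Lock()

    def fail_over(self, request: Request) -> bool:
        """Return True if the request may retry, False if it must give up."""
        with self._lock:
            # Switch the global host only if nobody else has done so already.
            if self.hosts[self.current] == request.host:
                self.current = (self.current + 1) % len(self.hosts)
            # The request's counter is increased in either case.
            request.host_failovers += 1
            if request.host_failovers >= len(self.hosts):
                return False
            request.host = self.hosts[self.current]
            return True
```

If two requests fail on the same host at about the same time, the first call advances the global host and the second call only picks up the new host, so the list is not skipped ahead twice.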
The failure classification can mistakenly take a host failure for a proxy failure. Therefore, after all proxies have been probed, a connection/timeout error, partial file transfer, or non 2XX return code is treated like a host failure in any case, and both the proxy server and the proxy failure counter of the request at hand are reset. This way, eventually all possible network paths are examined.
Network Path Reset Rules
On host or proxy fail-over, CernVM-FS remembers the timestamp of the fail-over. The first request after a given grace period (see Default Values) resets the proxy to a random proxy of the first load-balance group or the host to the first host, respectively. If the default proxy/host is still unavailable, the fail-over routines again switch to a working network path.
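The reset rule can be sketched as a timestamp comparison. The grace periods below are the defaults stated in this document (5 minutes for proxies, 30 minutes for hosts); the constant and function names are hypothetical.

```python
import random
import time
from typing import List, Optional, Tuple

PROXY_RESET_AFTER = 5 * 60   # seconds, default grace period for proxies
HOST_RESET_AFTER = 30 * 60   # seconds, default grace period for hosts


def should_reset(failover_timestamp: Optional[float], grace_period: float,
                 now: Optional[float] = None) -> bool:
    """True if a fail-over happened and its grace period has elapsed."""
    if failover_timestamp is None:
        return False  # no fail-over recorded, nothing to reset
    now = time.time() if now is None else now
    return now - failover_timestamp > grace_period


def reset_targets(hosts: List[str], proxy_groups: List[List[str]],
                  rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Default path: first host, random proxy of the first group."""
    rng = rng or random.Random()
    return hosts[0], rng.choice(proxy_groups[0])
```

Host and proxy would each keep their own fail-over timestamp, so a proxy can return to its default well before the host does.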
Retry and Backoff
On connection and timeout errors, CernVM-FS retries a fixed, limited number of times on the same network path before performing a fail-over. Retrying involves an exponential backoff with a minimum and maximum waiting time.
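A retry loop with exponential backoff might look like the sketch below. The defaults (1 retry, 2 seconds minimum, 10 seconds maximum) are the default values given in this document; the doubling factor, the absence of random jitter, and the function names are assumptions made for illustration only.

```python
import time
from typing import Callable, Iterator


def backoff_delays(max_retries: int = 1, min_wait: float = 2.0,
                   max_wait: float = 10.0) -> Iterator[float]:
    """Yield one waiting time per retry, doubling up to max_wait."""
    delay = min_wait
    for _ in range(max_retries):
        yield min(delay, max_wait)
        delay *= 2


def fetch_with_retries(attempt: Callable[[], bool], max_retries: int = 1,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True on success, False when a fail-over is due."""
    if attempt():
        return True
    for delay in backoff_delays(max_retries):
        sleep(delay)  # wait before retrying on the same network path
        if attempt():
            return True
    return False
```

With the default of a single retry, a failing download is attempted twice on the same network path before the fail-over logic takes over.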
Default Values
- Network timeout for connections using a proxy: 5 seconds (adjustable by
- Network timeout for connections without a proxy: 10 seconds (adjustable by
- Grace period for proxy reset after fail-over: 5 minutes (adjustable by
- Grace period for host reset after fail-over: 30 minutes (adjustable by
- Maximum number of retries on the same network path: 1 (adjustable by
- Minimum waiting time on a retry: 2 seconds (adjustable by
- Maximum waiting time on a retry: 10 seconds (adjustable by
- Minimum/Maximum DNS name cache: 1 minute / 1 day
Note: a continuous transfer rate below 1kB/s is treated like a network timeout.
DNS Round-Robin Names for Proxies
DNS proxy names that resolve to multiple IP addresses are automatically transformed into a proxy load-balance group. In order to limit the number of proxy servers used from a round-robin DNS entry, set CVMFS_MAX_IPADDR_PER_PROXY. This also limits the perceived "hang duration" while CernVM-FS performs fail-overs.
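The expansion of a round-robin name into a load-balance group can be sketched as follows. The function name is hypothetical, and the addresses come from the reserved documentation ranges. How CernVM-FS chooses which addresses to keep when the limit applies is not stated here; this sketch simply keeps the first ones.

```python
from typing import List, Optional


def expand_round_robin(addresses: List[str], port: int,
                       max_ipaddr_per_proxy: Optional[int] = None) -> List[str]:
    """Turn the addresses of one proxy name into a load-balance group."""
    if max_ipaddr_per_proxy is not None and max_ipaddr_per_proxy > 0:
        # Assumption: keep only the first N resolved addresses.
        addresses = addresses[:max_ipaddr_per_proxy]
    group = []
    for addr in addresses:
        # IPv6 literals need brackets inside a URL.
        host = f"[{addr}]" if ":" in addr else addr
        group.append(f"http://{host}:{port}")
    return group


resolved = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
group = expand_round_robin(resolved, 3128, max_ipaddr_per_proxy=2)
```

A smaller group means fewer proxies to probe before moving on to the next load-balance group, which is why the limit shortens the perceived hang.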
IPv4 / IPv6 Selection
CernVM-FS will use the system default settings when connecting directly to a host. When connecting to a proxy, by default it will try the IPv4 address unless the proxy only has IPv6 addresses configured. The CVMFS_IPFAMILY_PREFER=[4|6] parameter can be used to select the preferred IP protocol for dual-stack proxies.
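The address choice for a proxy can be sketched with the standard `ipaddress` module. The function name is hypothetical; the `prefer` argument plays the role of CVMFS_IPFAMILY_PREFER, with IPv4 as the default described above.

```python
import ipaddress
from typing import List


def pick_proxy_address(addresses: List[str], prefer: int = 4) -> str:
    """Pick an address of the preferred IP family, else fall back."""
    if not addresses:
        raise ValueError("proxy has no addresses")
    preferred = [a for a in addresses
                 if ipaddress.ip_address(a).version == prefer]
    # Fall back to the other family if the preferred one is not available.
    return preferred[0] if preferred else addresses[0]


dual_stack = ["2001:db8::1", "192.0.2.1"]
default_choice = pick_proxy_address(dual_stack)
ipv6_choice = pick_proxy_address(dual_stack, prefer=6)
```

For a proxy that only has IPv6 addresses, the fallback branch returns an IPv6 address even when IPv4 is preferred.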