Cloud Contextualization

Introduction

A context is a small (up to 16 kB), usually human-readable snippet that is used to apply a role to a virtual machine. A context makes it possible to use a single virtual machine image for many different virtual machine instances, because each instance can adapt to various cloud infrastructures and use cases depending on its context. In the process of contextualization, the cloud infrastructure makes the context available to the virtual machine, and the virtual machine interprets it. On contextualization, the virtual machine can, for instance, start certain services, create users, or set configuration parameters.

For contextualization, we distinguish between so-called meta-data and user-data. The meta-data is provided by the cloud infrastructure and cannot be modified by the user. For example, meta-data includes the instance ID as issued by the cloud infrastructure and the virtual machine's assigned network parameters. The user-data is provided by the user on creation of the virtual machine.

Meta-data and user-data are typically accessible through an HTTP server on a link-local IP address such as 169.254.169.254. The cloud infrastructure keeps the user-data of different VMs separate from each other, so that user-data can be used to deliver credentials to a virtual machine.
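
From inside a running instance, meta-data and user-data can usually be inspected with plain HTTP requests. The following commands are only a sketch assuming an EC2-compatible meta-data service; the exact paths can differ between cloud infrastructures.

# Query the instance ID assigned by the cloud infrastructure
curl http://169.254.169.254/latest/meta-data/instance-id
# Retrieve the user-data passed at instance creation
curl http://169.254.169.254/latest/user-data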

In CernVM, there are three entities that interpret the user-data. Each of them typically reads "what it understands" while ignoring the rest. The µCernVM bootloader interprets a very limited set of key-value pairs that are used to initialize the virtual hardware and to select the CernVM-FS operating system repository. In a later boot phase, amiconfig and cloud-init are used to contextualize the virtual machine. The amiconfig system was initially provided by rPath but is now maintained by us. It provides a very simple, key-value based contextualization that is processed by a number of amiconfig plugins. The cloud-init system is maintained by Red Hat and Ubuntu and provides a more sophisticated but also slightly more complex contextualization mechanism.

Contextualization of the µCernVM Boot Loader

The µCernVM bootloader can process EC2, OpenStack, and vSphere user data. Within the user data, everything is ignored except a block of the form

[ucernvm-begin]
key1=value1
key2=value2
...
[ucernvm-end]

The following key-value pairs are recognized; a complete example block is shown after the list:

resize_rootfs
Can be on or off. When turned on, the entire hard disk is used as the root partition instead of only the first 20 GB
cvmfs_http_proxy
HTTP proxy in CernVM-FS notation
cvmfs_pac_urls
WPAD proxy autoconfig URLs separated by ';'
cvmfs_server
List of Stratum 1 servers, e.g. cvmfs-stratum-one.cern.ch,another.com
cvmfs_tag
The snapshot name, useful for long-term data preservation
cernvm_inject
Base64-encoded .tar.gz archive, which gets extracted into the root tree
useglideinWMS
Can be on or off, defaults to on. When turned off, glideinWMS auto-detection is disabled
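
Put together, a complete µCernVM contextualization block might look like the following; all values are illustrative and need to be adapted:

[ucernvm-begin]
resize_rootfs=on
cvmfs_http_proxy=DIRECT
cvmfs_server=cvmfs-stratum-one.cern.ch
cvmfs_tag=trunk
[ucernvm-end]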

Contextualization with amiconfig

The amiconfig contextualization executes at boot time, parses the user data, and looks for Python-style configuration blocks. If a match is found, the corresponding plugin processes the options and executes configuration steps if needed. By default, rootsshkeys is the only enabled plugin (additional plugins can be enabled in the configuration file).

Default plugins:

rootsshkeys           - allow injection of root ssh keys

Available plugins:

amildap               - setup LDAP connection
cernvm                - configure various CernVM options
condor                - setup Condor batch system
disablesshpasswdauth  - if activated, disables ssh password authentication
dnsupdate             - update DNS server with current host IP
ganglia               - configure gmond (ganglia monitoring)
hostname              - set hostname
noip                  - register IP address with NOIP dynamic DNS service
nss                   - /etc/nsswitch.conf configuration
puppet                - set parameters for puppet configuration management
squid                 - configure squid for use with CernVM-FS

Common amiconfig options:

[amiconfig]
plugins = <list of plugins to enable>
disabled_plugins = <list of plugins to disable>

Specific plugin options (a filled-in example follows the listing):

[cernvm]
# list of ',' separated organisations/experiments (lowercase)
organisations = <list>
# list of ',' separated repositories (lowercase)
repositories = <list>
# list of ',' separated user accounts to create <user:group:[password]>
users = <list>
# CernVM user shell </bin/bash|/bin/tcsh>
shell = <shell>
# CVMFS HTTP proxy
proxy = http://<host>:<port>;DIRECT
#----------------------------------------------------------
# url from where to retrieve initial CernVM configuration
config_url = <url>
# list of ',' separated scripts to be executed as given user: <user>:/path/to/script.sh
contextualization_command = <list>
# list of ',' separated services to start
services = <list>
# extra environment variables to define
environment = VAR1=<value>,VAR2=<value>
[condor]
# host name
hostname = <FQDN>
# master host name
condor_master = <FQDN>
# shared secret key
condor_secret = <string>
#------------------------
# collector name
collector_name = <string>
# condor user
condor_user = <string>
# condor group
condor_group = <string>
# condor directory
condor_dir = <path>
# condor admin
condor_admin = <path>
highport = 9700
lowport = 9600
uid_domain =
filesystem_domain =
allow_write = *.$uid_domain
extra_vars =
use_ips =
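
As an illustration, a filled-in [cernvm] block could look like the following; all values are hypothetical and need to be adapted to the experiment and infrastructure at hand:

[cernvm]
organisations = atlas
repositories = atlas,atlas-condb
users = alice:alice:secret
shell = /bin/bash
proxy = http://squid.example.org:3128;DIRECT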

Contextualization scripts

If the user data string starts with a line beginning with #!, it will be interpreted as a bash script and executed. The same user data string may also contain amiconfig contextualization options, but they must be placed after the contextualization script, which must end with an exit statement. The interpreter can be /bin/sh, /bin/sh.before, or /bin/sh.after, depending on whether the script is to be executed before or after the amiconfig contextualization. A script for the /bin/sh interpreter is executed after the amiconfig contextualization.
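
The following user data is a sketch of such a combination; the script body and the amiconfig options are purely illustrative:

#!/bin/sh.before
# hypothetical step executed before the amiconfig contextualization
echo "preparing instance" >> /var/log/context.log
exit 0

[amiconfig]
plugins = cernvm

[cernvm]
shell = /bin/bash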

Contextualization with cloud-init

As an alternative to amiconfig, CernVM 3 supports cloud-init contextualization. Some of the contextualization tasks done by amiconfig can be done by cloud-init as well due to the native cloud-init modules for cvmfs, ganglia, and condor.
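
A minimal cloud-init context uses the cloud-config format. The sketch below relies only on standard cloud-init modules; the options of the CernVM-specific cvmfs, ganglia, and condor modules are not shown here:

#cloud-config
# create an unprivileged user and leave a marker in the log (illustrative only)
users:
  - name: demo
runcmd:
  - echo "contextualized with cloud-init" >> /var/log/context.log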

Mixing user-data for µCernVM, amiconfig, and cloud-init

The user-data for cloud-init and for amiconfig can be mixed. The cloud-init syntax supports user data divided into multiple MIME parts. One of these MIME parts can contain amiconfig or µCernVM formatted user-data. All contextualization agents (µCernVM, amiconfig, cloud-init) parse the user data and each one interprets what it understands.

The following example illustrates how to mix amiconfig and cloud-init. We have an amiconfig context amiconfig-user-data that starts a catalog server for use with Makeflow:

[amiconfig]
plugins = workqueue
[workqueue]

We also have a cloud-init context cloud-init-user-data that creates an interactive user "cloudy" with the password "password":

users:
  - name: cloudy
    lock-passwd: false
    passwd: $6$XYWYJCb.$OYPPN5AohCixcG3IqcmXK7.yJ/wr.TwEu23gaVqZZpfdgtFo8X/Z3u0NbBkXa4tuwu3OhCxBD/XtcSUbcvXBn1

The following helper script creates our combined user data with multiple MIME parts:

amiconfig-mime cloud-init-user-data:cloud-config amiconfig-user-data:amiconfig-user-data > mixed-user-data

In the same way, the µCernVM contextualization block can be another MIME part in a mixed context with MIME type ucernvm.
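
For instance, assuming the µCernVM block is stored in a file named ucernvm-user-data (a hypothetical file name), the helper invocation above can be extended with a third part:

amiconfig-mime cloud-init-user-data:cloud-config amiconfig-user-data:amiconfig-user-data ucernvm-user-data:ucernvm > mixed-user-data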

glideinWMS User Data

By default, CernVM automatically detects user data from glideinWMS and, if detected, activates the glideinWMS VM agent. CernVM recognizes user data as glideinWMS user data if it consists of no more than two lines and contains the pattern ...#### -cluster 0123 -subcluster 4567####... . It automatically extracts the CernVM-FS proxy configuration (proxy and PAC URLs) from the user data. In order to disable autodetection, set useglideinWMS=false in the µCernVM contextualization.
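
Disabling the autodetection thus only requires the corresponding key in the µCernVM block:

[ucernvm-begin]
useglideinWMS=false
[ucernvm-end]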

Extra Contextualization

In addition to the normal user data, we have experimental support for "extra user data", which can serve as a last resort when the normal user data is already occupied by the infrastructure. For instance, glideinWMS seems to specify user data exclusively, making it necessary to modify the image for additional contextualization. Extra user data is injected into the image under /cernvm/extra-user-data and is internally appended to the normal user data. This does not yet work with cloud-init, though; only with amiconfig and the µCernVM bootloader.
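
One possible way to place the file into the image is to mount the image locally with the libguestfs tools. The following is only a sketch; it assumes, hypothetically, that the partition backing /cernvm is the first partition of an image file named ucernvm.img:

# mount the partition that backs /cernvm (assumed to be /dev/sda1 here)
guestmount -a ucernvm.img -m /dev/sda1 /mnt
# the file appears as /cernvm/extra-user-data inside the virtual machine
cp extra-user-data /mnt/extra-user-data
guestunmount /mnt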

Cluster Contextualization

When you are creating a cluster, CernVM offers support for automatic distribution of the master IP address to the worker machines in the pre-contextualization phase. You generate a PIN with the Cluster Pairing Service and put it into the appropriate sections of the context files. For further information and a usage example, please refer to the documentation.

Examples

Create a Condor cluster on CERN OpenStack

CernVM 4 supports Cluster Contextualization via a centrally managed service. Currently it allows you to distribute the Master node IP address to the Worker nodes. This example shows a cluster setup on CERN OpenStack, but the approach is similar for other cloud providers.

First you need to upload a CernVM 4 image to your personal OpenStack image store. Instructions can be found in the Running on CERN OpenStack tutorial.

After you have uploaded the CernVM 4 image, you need to go to the Cluster Pairing Service and generate a new cluster PIN. Login is required in order to complete this action (you can log in with your CERN account).

Then you need to create two context files: one for your master node and one for your worker nodes. Remember to replace the cluster PIN field with your own PIN. Notice the '###MASTER_IP_PLACEHOLDER###' placeholder, which is automatically replaced during the initial boot on the worker nodes.

Master context file

From nobody Fri Oct 14 13:55:48 2016
Content-Type: multipart/mixed; boundary="===============2227770197833174079=="
MIME-Version: 1.0

--===============2227770197833174079==
MIME-Version: 1.0 
Content-Type: text/cloud-config; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="master.context.cloudinit"

  users:
    - name: condor-submit

--===============2227770197833174079==
MIME-Version: 1.0 
Content-Type: text/amiconfig-user-data; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="master.context.amiconfig"

[amiconfig]
plugins=cernvm condor

[condor]
use_ips=true
lowport=41000
highport=42000
condor_user=condor
condor_group=condor
condor_secret=secret
uid_domain=*

[ucernvm-begin]
cvm_cluster_master=true
cvm_cluster_pin=<Your Generated Cluster Pin>
cvmfs_branch=cernvm-sl7.cern.ch
cvmfs_server=hepvm.cern.ch
cvmfs_path=cvm4
[ucernvm-end]
--===============2227770197833174079==--

Slave context file

From nobody Fri Oct 14 13:55:48 2016
Content-Type: multipart/mixed; boundary="===============2227770197833174079=="
MIME-Version: 1.0

--===============2227770197833174079==
MIME-Version: 1.0 
Content-Type: text/amiconfig-user-data; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="slave.context.amiconfig"

[amiconfig]
plugins=cernvm condor

[condor]
condor_master=###MASTER_IP_PLACEHOLDER###
use_ips=true
lowport=41000
highport=42000
condor_user=condor
condor_group=condor
condor_secret=secret
uid_domain=*

[ucernvm-begin]
cvm_cluster_pin=<Your Generated Cluster Pin>
cvmfs_branch=cernvm-sl7.cern.ch
cvmfs_server=hepvm.cern.ch
cvmfs_path=cvm4
[ucernvm-end]
--===============2227770197833174079==--

Log in to lxplus (if you do not have the Nova tools installed and properly configured on your machine) and save these context files as master.context and slave.context. Next we are going to create our machines. You need to have a generated key pair in CERN OpenStack (check the Access & Security tab in the OpenStack web interface). The given image name has to be the same as the one you uploaded earlier. Again, don't forget to replace the key name and node name.

Create a Master node

nova boot <my-cluster-master-name> --image 'CernVM_4' --flavor m2.small \
         --key-name <my_cloud_key> --user-data master.context

Create Worker nodes

nova boot <my-cluster-slave-name-1> --image 'CernVM_4' --flavor m2.small \
          --key-name <my_cloud_key> --user-data slave.context
nova boot <my-cluster-slave-name-2> --image 'CernVM_4' --flavor m2.small \
          --key-name <my_cloud_key> --user-data slave.context

You need to wait approximately 15 minutes before the DNS information has propagated through the CERN network. If you do not want to set up the DNS records, you can pass the '--meta cern-services=false' argument to the nova command. In that case, you can log into the machines straight away by using their IP addresses.

After your master machine is up and running, you can log in and check the status of the condor cluster.

ssh -i <my_cloud_key> root@<master-machine>
condor_status

After the worker nodes have completed their boot, you should see them listed in the condor_status output. You can create and execute a test job to make sure the condor cluster works.

Cluster test job file: hello_world.job

Universe   = vanilla
Executable = /bin/hostname
Output = hello_world.stdout
Error = hello_world.stderr
Log = hello_world.log
Queue

Running the test job

# Switch to the test user
su - condor-submit
# Submit the job
condor_submit hello_world.job

After a moment you should see the job output in the hello_world.stdout file, which should contain the hostname of the worker node that executed the job.

Congratulations! Your cluster is now set up.

Create a Makeflow cluster

CernVM 3 supports the Makeflow workflow engine. Makeflow provides an easy way to define and run distributed computing workflows. The contextualization is similar to condor. There are three parameters:

catalog_server=hostname or IP address
workqueue_project=project name, defaults to "cernvm" (similar to the shared secret in condor)
workqueue_user=user name, defaults to workqueue (the user is created on the fly if necessary)

In order to contextualize the master node, include an empty workqueue section, like

[amiconfig]
plugins=workqueue
[workqueue]

In order to start the work queue workers, specify the location of the catalog server, like

[amiconfig]
plugins=workqueue

[workqueue]
catalog_server=128.142.142.107
workqueue_project=foobar

The plugin will start one worker for every available CPU. Once the ensemble is up and running, makeflow can make use of the workqueue resources like so:

makeflow -T wq -a -N foobar -d all -C 128.142.142.107:9097 makeflow.example

Note that your cloud infrastructure needs to provide access to UDP and TCP port 9097 on your virtual machines.
