Chef is configuration management platform written in Ruby. Configuration management is a large topic that most systems administrators and IT management are just now starting to gain experience with. Historically, infrastructures have been maintained either by hand, with structured scripting, by imagine cloning, or a combination of those. Chef’s usage model rejects the idea of cloning and maintaining “golden images”. Instead, the idea is to start with an embryonic image and grow it into it’s desired state. This works much better as infrastructure complexity increases, and eliminates the problem of image sprawl. The convergent nature of the tool allows you to change the infrastructure over time without much fuss. Chef allows you to express your infrastructure as code, which lets you store it in version control.
“A Can of Condensed Chef Documentation” is available here
Actually you can use any SCM, but git is the most widely adopted in the Chef community. All Chef Git repos begin their lives as clones of the Opscode chef-repo, found here: https://github.com/opscode/chef-repo There is a nice overview of the repo structure (cookbooks, databags, roles, etc) in the README.
chef-server up and running at a known IP or FQDN.
This is easily installed from packages by following the instructions on the opscode wiki. The process amounts to “add a package repository, install the packages, and turn it on” Alternatively, you could use the Opscode Platform and go dancing with space robots.
Knife installed on your local system
Chef git repo checked out on local file system
Client certificate creation
A “client” in chef parlance is an SSL certificate used to access the chef-server API. If the client’s CN name is marked “admin” in chef-server, the client can perform restricted operations such as creating and deleting nodes. This is the kind of client needed by knife to manipulate the infrastructure, and normally correspond to actual human being, but by no means has to. Nodes have non-admin client certificates, and can only manipulate their own node objects. To create a client certificate, you’ll need to log into the chef-server webui, click on “clients”, think of a name for it (I use someara), and paste the displayed private key into a local file.
Copy the validation key
The validation key is a special key that is shared by all freshly bootstrapped nodes. It has the ability to create new client certificates and nodes objects through the API.
Edit configuration files
For more details on this section, please visit http://wiki.opscode.com/display/chef/Chef+Configuration+Settings
.chef/client.rb - This file is copied onto the nodes that are bootstrapped with knife, and needs to be configured to point to the IP or FQDN of your chef server
1 2 3 4 5 6 7 8 9 10 11 12
.chef/knife.rb - This file also needs to be configured to point to your chef-server, and also to the client private key that was created earlier.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
Role, recipes, and run lists
As mentioned earlier, run lists are made up from role trees. Here is an example of how you would create a demo server with a correct clock, managed users, and metrics and monitoring capabilities. In this example, six recipes are executed per run, and an unknown number of resources are managed. (To figure that out, you’d have to read the recipes)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
That’s quite a bit of cooking for a beginner tutorial, so we’re just going to focus on a single node running an NTP client for now. Roles can be written either as .rb files or .json files. I prefer to use the .rb format because they’re easier to read and write. Some people prefer to deal with the JSON formatted version directly, since thats the way they’re dumped with knife. At the end of the day, it doesn’t really matter, so do which ever makes you happy.
Step One : Creating a demo role file
1 2 3 4 5
Step Two : Installing the role on chef-server
A machine’s NTP client is simple to install and configure. Every systems administrator is already familiar with it, which makes it a great example.
Most software available as a native package in a given linux distribution can be managed with a “package, template, service” design pattern.
Each of those words refers to a Chef resource, which we pass arguments to.
Step One : Creating an ntp cookbook
This creates a directory structure for the ntp cookbook. You can check it out with ls:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Step Two : Deciding what to name the recipe
Recipe names are related to cookbook structure. Putting recipe[foo::bar] in a node’s run list results in cookbooks/foo/recipes/bar.rb being downloaded from chef-server and executed.
There is a special recipe in every cookbook called default.rb. It is executed by every recipe in the cookbook. Specifying recipe[foo::bar] actually results in cookbooks/foo/recipes/default.rb, as well as cookbooks/foo/recipes/bar.rb being executed.
Default.rb is a good place to put common stuff when writing cookbooks with multiple recipes, but we’re going to keep it simple and just use default.rb for everything.
Step Three : Creating a recipe
This is where all the fun stuff happens. When using resources, you’re writing things in a declarative fashion. Declarative means you can concentrate on the WHAT without having to worry about the HOW. Chef will take care of that for you with something called a resource provider. When installing a package, it will check to see what your operating system is and use the appropriate methodology. For example, on Debian based systems, it will use apt-get, and on Redhat based systems, it will use yum.
1 2 3 4 5 6 7 8 9 10 11 12
Chef recipes are evaluated top down (like a normal ruby program), with each resource being ran in the order it appears. Order is important. In the above example, if we were to reverse the order of those three resources, it would first fail to start the service (as the software is not installed yet), then write the configuration file, then finally clobber the file it just wrote by installing the package.
Step Four : Creating the ntp.conf.erb template
1 2 3 4 5 6 7 8 9
Step Five : uploading the cookbook to chef-server
The chef-client needs to somehow get itself installed and running on managed nodes. This process is known as bootstrapping and is accomplished with shell scripting. The method of bootstrap will vary depending on how you go about provisioning your server, and the script will depend on the platform.
Cloud providers like AWS and Rackspace will let you make an API request, then return the IP of your compute resource.
In this example, knife uses the ruby fog library to talk to ec2 and request a server with an argument of the desired AMI. Knife then uses net-ssh-multi to ssh into the machine and execute a bootstrapping script. There are a number of other arguments that can be passed to knife, such as ec2 region, machine size, what ssh key to use. You can read all about them on the Opscode wiki.
If your method of provisioning servers is “ask your VMware administrator” or “fill out these forms”, then you’ll probably bootstrap via an IP address.
Cobbler / FAI / pxe_dust / Jumpstart / etc
In these provisioning scenarios, you can skip knife completely and put the contents of a bootstrap script kickstart or equivalent.
Customizing the bootstrap
By default (with no arguments), Chef attempts a gem based installation meant to work on Ubuntu. If you’re not using Ubuntu, or are uncomfortable installing gems directly from rubygems.org, you’ll have to change the script to suite your taste. It works by specifying a template name with the -d flag, SSH’ing into the machine and running the rendered script. When using knife to SSH, make sure you have the correct key loaded into your ssh-agent.
ends up running this
What I do in my boot scripts:
- Correctly set the hostname to value of -N argument. (By correctly, I mean that
hostname -fhas to work properly)
- Configure the package repositories
- Install Chef. I like packages using the native package manager
- Copy the validation key
- Write /etc/chef/client.rb (points to server)
- Write a json file with the contents of the -r argument
- chef-client -j bootstrap.json
After the script is ran, chef-client does the following
- Client registration: SSL CN is FQDN from ohai
- Node creation: Node name is also FQDN from ohai, run lists are from JSON
- Expands run list
- Downloads needed cookbooks
- Starts executing recipes
There is an example of a custom bootstrap script here
At this point, you should have an ntp client installed, configured, and running.
(It’s actually a little bit more complicated than that. For more information about chef-client runs, see http://wiki.opscode.com/display/chef/Anatomy+of+a+Chef+Run )
Databag Driven Recipes
Data driven infrastructures are all the rage these days. This allows you to do things like change the NTP server all your nodes use by editing a single JSON value in chef-server. You can get really creative with this, so let your imagination run wild.
Step One : Create an ntp data bag
1 2 3 4 5 6 7
Step Two : Upload data bag to chef-server
Step Three : Modify the recipe to take advantage of it
1 2 3 4 5 6 7 8 9 10 11 12 13 14
You can also access data bag data through the search() interface, which you can read about on the opscode wiki.
Step Four : uploading the cookbook to chef-server
Understanding Idempotence and Convergence
We’re not quite done yet. Let’s SSH into our shiny new NTP enabled machine and go poking about.
Wait a sec, isn’t that supposed to be “us.pool.ntp.org”? Not yet. We haven’t enabled our convergence mechanism yet! If we manually run chef-client on the node, we will indeed see that the file has changed.
1 2 3
That file just converged into the correct state. Lets edit the file again, this time filling it with complete garbage.
1 2 3 4
Again, the file converged into the correct state. Awesome. Running chef-client by hand on a large cluster of nodes would be a real pain, so it makes sense to set it up automatically. Indeed, often found in a “role[base]” is a “recipe[chef-client]” that configures it to run as a daemon, or from a cron.
It is safe to run the recipes on the nodes time and time again because resources are written to be idempotent. You may remember from math class that a function f is idempotent if, for all values of x, f(f(x))=f(x). That means you can run a function over a resource a bajillion times and it will behave as if it was only done once.
This is implemented under the hood as “If it ain’t broke, don’t fix it.” In a file resource, checksums are calculated and compared. In a package resource, the rpm or dpkg databases are consulted to see if the package is installed. The effect of this is that most chef-client runs do absolutely nothing to resources. That is, until you change the function by altering the inputs to the resource providers.
Notifications and Subscriptions
Further examination reveals that the ntpd service is still talking to “time.nist.gov”. This is because during the chef-client run, the resource named “ntpd” ran it’s idempotency check, and found that it was, in fact, running. It therefore did nothing. It we want ntpd to restart when the contents /etc/ntp.conf are altered, we have to modify our recipe to set up that relation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Alternatively, we could have set up the “service[ntpd]” resource to subscribe to the “template[/etc/ntp.conf]” resource.
Upload the modified ntp cookbook to chef-server and re-run the client on your demo server to check your work.
1 2 3
Bulk Loading data into chef-server
To save yourself from writing crazy for loops on command line like
Or even worse,
1 2 3 4 5 6
… somebody was nice enough to write some rake tasks. List them with rake -T, and then install your repo in chef-server with “rake install”
Viewing your Infrastructure
There are two ways to view your infrastructure. The first is through the management console, and the other is from knife. Here is a list of handy commands to get you started.
1 2 3 4
Deleting Clients, Nodes, and Machines
Remember that nodes, their client certificates, and the machines they’re associated with are three separate entities.
This just deletes the node object from chef-server. The next time the machine runs chef-client, the node object will be recreated in chef-server. This node object will have an empty run list what will have to be repopulated before chef-client actually does anything.
This deletes a node object’s associated public key from chef-server. The next time the machine runs chef-client, it will get a permission denied error. If this is done on accident, ssh into the machine, delete it’s client key at /etc/chef/client.pem and re-run chef-client.
Deleting a machine will be specific to how it was provisioned. On AWS, it would look like “knife ec2 server delete i-DEAFBEEF”. On a VMware cluster, it could be by clicking buttons in a GUI. I once deleted a hardware machine by throwing it off a balcony. YMMV.
I like to keep a special directory called “infrastructures” that contain sub-directories and nodes.sh files. A nodes.sh contains a list of knife commands that can be thought of as the highest level view of the infrastructure. for example:
1 2 3 4 5 6 7 8
This file can eventually be used to bring up entire infrastructures, but during development, lines are typically pasted into a terminal individually.
This is as close as I’ve gotten to replacing myself with a very small shell script so far. Many a sysadmin has been pursuing this for a long time now. It is here. The journey has just begun.