Saturday, May 20, 2006

Parallelizing command execution with vxargs

If you need to maintain multiple hosts, you know how boring it is to repeat the exact same task on all of them. I'm currently using PlanetLab as part of a class assignment and I'm facing this problem because I need to set up around 10 machines and execute the same commands on all of them.

vxargs is a nice Python script that eases this task. It lets you run a command parametrizing it with a given set of strings (e.g. host names), similar to what find's -exec flag does. First of all, you construct a file with the list of host names you need to control and then feed it to the script alongside the command you need to execute. For example, to upload a dist.tgz file to all the servers:

vxargs -a hosts.list -o /tmp/result scp dist.tgz {}:

The utility replaces the {} substring with each line in hosts.list and executes the resulting command. The nice thing is that vxargs runs all tasks in parallel, maximizing efficiency. During execution, its curses-based interface shows the progress of each command. When all jobs are over, you will find their output (stdout and stderr) as well as their exit codes in the /tmp/result directory. Fairly useful.
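To make the idea concrete, here is a minimal Python sketch of the same pattern: substitute each item for {}, run the commands in parallel, and save each job's stdout, stderr and exit code. This is not how vxargs itself is implemented. The function names (run_one, run_parallel) and the per-item file names (.out, .err, .status) are my own assumptions for illustration, and the demo uses a harmless stand-in command instead of scp.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_one(template, item, out_dir):
    """Substitute item for every {} in the command arguments and run it."""
    argv = [arg.replace("{}", item) for arg in template]
    proc = subprocess.run(argv, capture_output=True, text=True)
    # Hypothetical layout: one .out, .err and .status file per item.
    (out_dir / f"{item}.out").write_text(proc.stdout)
    (out_dir / f"{item}.err").write_text(proc.stderr)
    (out_dir / f"{item}.status").write_text(f"{proc.returncode}\n")
    return item, proc.returncode


def run_parallel(template, items, out_dir, max_workers=10):
    """Run the command once per item in parallel; return {item: exit code}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda item: run_one(template, item, out_dir), items)
        return dict(results)


if __name__ == "__main__":
    hosts = ["host1", "host2", "host3"]  # stand-in for the lines of hosts.list
    # Stand-in for "scp dist.tgz {}:" so the demo needs no network access.
    command = [sys.executable, "-c",
               "import sys; print('would copy to', sys.argv[1])", "{}:"]
    with tempfile.TemporaryDirectory() as result_dir:
        print(run_parallel(command, hosts, result_dir))
```

Threads are enough here because each worker just waits on an external process. The real tool adds the progress display on top of this basic loop.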

Although installing vxargs manually is easy, there is now a vxargs package in pkgsrc.

2 comments:

  1. Really nice! Thanks for the pointer.

  2. The dsh program from clusterit (clusterit.sf.net or pkgsrc/parallel/clusterit) is similar to this; it runs a given command on a bunch of hosts. The clusterit package contains more useful commands to manipulate multiple hosts together.

    And for real automated multiple-machines-administration, there is cfengine (cfengine.org, pkgsrc/sysutils/cfengine2), which is rather complex but extremely powerful.
