Bidirectional full-cone NAT (bleah)

Life is unfair and shit happens, so let's consider the situation in the diagram:

Site A and site B both use the same IP range (192.168.10.0/24), and now for whatever reason the two sites need to talk to each other (for example through a VPN). Connections can be in either direction, and potentially any host at any site should be able to talk to any host or hosts at the other site.

Of course, renumbering is out of the question, so the poor sysadmin has to come up with some klu^Wbrilliant solution to solve the routing problem and save the day.

One may think of doing some sort of 1:1 NAT, also known as full cone NAT (here even without port number translation). The idea is: to each site, the other site will appear as if it is using some other, fake, IP range. So if our host 192.168.10.44 at site A wants to talk to host 192.168.10.211 at site B, it will pretend it wants to talk to host 192.168.200.211 instead. More generally, a host at site A wanting to talk to host 192.168.10.x at site B, will instead use a destination address of 192.168.200.x (where x is the same in both addresses).

Assuming we're somehow able to deliver those packets to site B (see later), something has to happen in between so the host at site B thinks the packets are destined to it, and they should look like they're coming from an IP range different from its own (the fake range assigned to site A, 192.168.100.0/24). More generally, packets coming from site A's host 192.168.10.y will appear at site B as if they're coming from host 192.168.100.y (where y is the same in both addresses).

The diagram below illustrates what needs to be done.

In the upper part of the diagram, host 192.168.10.44 sends its packet with source 192.168.10.44 and destination 192.168.200.211. Gateway gwa, before forwarding the packet to the VPN link, rewrites the source address so it looks like the packet is coming from 192.168.100.44 instead (192.168.100.0/24 is the fake IP range assigned to site A). When gateway gwb receives it, it has to change the destination address to the real one of the receiving host, that is, 192.168.10.211. Once that is done, the target host receives the packet and thinks it's from 192.168.100.44.
In the other direction (lower part of the diagram), the same tricks are applied in reverse so at the end the host at site A thinks it's really talking to 192.168.200.211, and the host at site B thinks it's really talking to 192.168.100.44.

All the logic is implemented in the gateways. At site A, the gateway rewrites source addresses of the form 192.168.10.x to 192.168.100.x for outgoing traffic, and does the reverse translation on destination addresses (from .100.x to .10.x) on incoming traffic. At site B, something similar happens, except that the translation is from source .10.x to .200.x on outgoing traffic, and from destination .200.x to .10.x on incoming traffic.
That way, everyone is fooled and happy, and things (mostly) work.

Since the NAT is 1:1, new machines can be added at each site and the gateways need no change in configuration.

All this machinery is made possible by a neat iptables target called NETMAP, which can automagically perform the translations described above. NETMAP only works on the nat table, and depending on whether it's used in the POSTROUTING or the PREROUTING chain, it does the right thing and rewrites the right address (source or destination).

NETMAP
This target allows you to statically map a whole network of addresses onto another network of addresses. It can only be used from rules in the nat table.

--to address[/mask]
Network address to map to. The resulting address will be constructed in the following way: All 'one' bits in the mask are filled in from the new 'address'. All bits that are zero in the mask are filled in from the original address.

Obviously, given the way it works, it is important that the range being used for rewriting have the same netmask as the range being rewritten.
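
For example, here is how a rule with --to 192.168.100.0/24 rewrites the address 192.168.10.44, following the description above (the /24 mask decides which bits come from where):

original address  192.168.10.44
new address       192.168.100.0/24
                  -> 'one' bits of the mask (the first 24): 192.168.100
                  -> 'zero' bits of the mask (the last 8):  .44
result            192.168.100.44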

So here are the steps to follow. Each gateway needs to have a route to the other site's fake IP range pointing to the VPN peer:

gwa# ip route
10.0.0.0/24 dev tun0  proto kernel  scope link  src 10.0.0.2
192.168.200.0/24 via 10.0.0.1 dev tun0
192.168.10.0/24 dev eth0  proto kernel  scope link  src 192.168.10.254
...
gwb# ip route
192.168.100.0/24 via 10.0.0.2 dev tun0
10.0.0.0/24 dev tun0  proto kernel  scope link  src 10.0.0.1
192.168.10.0/24 dev eth0  proto kernel  scope link  src 192.168.10.254
...

If using OpenVPN for the VPN, the routes can easily be pushed/pulled through OpenVPN's configuration; alternatively they can be manually created upon connection establishment.
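
For example, assuming gwb runs the OpenVPN server and gwa connects as a client over a simple point-to-point tun link, the relevant directives in gwb's OpenVPN configuration might look like the sketch below (in a multi-client server setup an iroute would also be needed):

# push the route to site B's fake range down to the client (gwa)
push "route 192.168.200.0 255.255.255.0"
# route site A's fake range through the tunnel
route 192.168.100.0 255.255.255.0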

Then, suitable iptables rules need to be created (these can be created before or after the VPN is established, although the VPN needs to be up for them to work):

# rewrite destination address on incoming traffic
gwa# iptables -t nat -A PREROUTING -s 192.168.200.0/24 -d 192.168.100.0/24 -i tun0 -j NETMAP --to 192.168.10.0/24

# rewrite source address on outgoing traffic
gwa# iptables -t nat -A POSTROUTING -s 192.168.10.0/24 -d 192.168.200.0/24 -o tun0 -j NETMAP --to 192.168.100.0/24
# same thing, on gwb
gwb# iptables -t nat -A PREROUTING -s 192.168.100.0/24 -d 192.168.200.0/24 -i tun0 -j NETMAP --to 192.168.10.0/24
gwb# iptables -t nat -A POSTROUTING -s 192.168.10.0/24 -d 192.168.100.0/24 -o tun0 -j NETMAP --to 192.168.200.0/24

Let's check that it works:

anyhost-siteA$ ping 192.168.200.211
PING 192.168.200.211 (192.168.200.211) 56(84) bytes of data.
64 bytes from 192.168.200.211: icmp_req=1 ttl=64 time=20.7 ms
64 bytes from 192.168.200.211: icmp_req=2 ttl=64 time=20.4 ms
64 bytes from 192.168.200.211: icmp_req=3 ttl=64 time=21.1 ms
64 bytes from 192.168.200.211: icmp_req=4 ttl=64 time=20.5 ms

Tools like tcpdump or wireshark can be used at each interface along the path to verify that addresses are being rewritten as they should.
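
For example, on gwa something like this should show the source address being rewritten from 192.168.10.44 to 192.168.100.44 as the packets move from the LAN to the tunnel:

# LAN side: source addresses still belong to 192.168.10.0/24
gwa# tcpdump -ni eth0 host 192.168.200.211
# tunnel side: sources now appear as 192.168.100.x
gwa# tcpdump -ni tun0 host 192.168.200.211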

Of course, if using names to communicate, appropriate DNS entries pointing to the other site's fake IPs have to be configured at each site.

Conclusion

This is a quick and dirty setup (to say the least). As such, in the grand tradition of how things usually work in this field, once it's in place it will never be touched anymore (that is, as long as it works), and the poor sysadmin who wanted to renumber one of the sites will never be able to do so. That's life.

One downside of this setup (besides being a stinking kludge, which should be enough to keep people from using it) is that the gateways cannot talk to the hosts at the other site; if that is really needed, it can certainly be done, thus making the whole thing even more kludgy. To accomplish it, all that is needed is that each gateway be configured to use its internal IP as source address when talking to the other site's fake IPs. Implementing this is left as an exercise for the reader. (Hint: it's as easy as using the "src" argument to "ip route").
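
For the impatient, here is a sketch of what the hint boils down to on gwa, with the addresses used in this example: force the gateway's own traffic towards site B's fake range to originate from its LAN address, so that it matches the POSTROUTING NETMAP rule like everybody else's traffic.

gwa# ip route replace 192.168.200.0/24 via 10.0.0.1 dev tun0 src 192.168.10.254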

Look for multiple patterns in files

This is a really trite one, but seems to be an evergreen. How do I look for multiple patterns? This usually means one of two things: select the lines that contain /PAT1/ or /PAT2/ or /PAT3/, or select the lines that contain /PAT1/ and /PAT2/ and /PAT3/ (in that order or not).
These two tasks can be solved in many ways; let's see the most common ones.

/PAT1/ or /PAT2/ or /PAT3/

This matches lines that contain any one of a number of patterns (three in this example). It can be done with many tools.

grep
grep -E 'PAT1|PAT2|PAT3' file.txt
grep -e 'PAT1' -e 'PAT2' -e 'PAT3' file.txt

When using -E, the patterns are in ERE syntax; in the second form, without -E, grep defaults to BRE syntax (-e by itself just supplies one more pattern, and can be combined with -E as well).
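
If the patterns are many, it may be handier to keep them in a file, one per line, and have grep read them from there:

grep -f patterns.txt file.txt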

awk
awk '/PAT1|PAT2|PAT3/' file.txt
awk '/PAT1/ || /PAT2/ || /PAT3/' file.txt

Awk uses ERE syntax for its patterns.

sed

One way with sed:

sed '/PAT1/b; /PAT2/b; /PAT3/b; d' file.txt

Sed normally uses BRE syntax.

perl
perl -ne 'print if /PAT1/ or /PAT2/ or /PAT3/' file.txt

Perl uses its own pattern syntax, the one PCRE emulates (basically, a much richer superset of ERE).

/PAT1/ and /PAT2/ and /PAT3/

This selects lines that match all the patterns. Order may or may not matter. Again, it can be done with many tools.

grep

If order matters, one can simply do:

grep 'PAT1.*PAT2.*PAT3' file.txt

(or whatever order one wants).
If order does not matter, then with grep all the possible combinations have to be considered:

grep -e 'PAT1.*PAT2.*PAT3' -e 'PAT1.*PAT3.*PAT2' -e 'PAT2.*PAT1.*PAT3' -e 'PAT2.*PAT3.*PAT1' -e 'PAT3.*PAT1.*PAT2' -e 'PAT3.*PAT2.*PAT1' file.txt

Obviously, this method is not very efficient and does not scale well. This can be improved a little at the price of using multiple processes:

grep 'PAT1' file.txt | grep 'PAT2' | grep 'PAT3'

awk

Awk scales better:

awk '/PAT1/ && /PAT2/ && /PAT3/' file.txt   # order does not matter

This also finds lines where the matches overlap, which the single-pattern grep above would miss.

sed
sed '/PAT1/!d; /PAT2/!d; /PAT3/!d' file.txt

This looks a bit weird, but the logic is simple: if the line fails to match even one of the patterns, we delete it. The lines that "survive" to the end, and are thus printed, can only be those that match all the patterns.

perl

This is basically the same as the awk solution:

perl -ne 'print if /PAT1/ and /PAT2/ and /PAT3/' file.txt
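
As an aside, if the number of patterns is not known in advance, the awk approach generalizes nicely. Here is a sketch that takes the patterns as arguments before the file name:

awk 'BEGIN {
       # all arguments except the last one are patterns; blank them out
       # so awk does not try to open them as input files
       n = ARGC - 2
       for (i = 1; i <= n; i++) { pats[i] = ARGV[i]; ARGV[i] = "" }
     }
     {
       for (i = 1; i <= n; i++)
         if ($0 !~ pats[i]) next    # one pattern does not match: skip the line
       print
     }' PAT1 PAT2 PAT3 file.txt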

SSH port forwarding loop

Warning: this is totally silly and useless. Don't do this on production machines.

Let's try something silly for once.

tokyo$ ssh -L 5555:127.0.0.1:5555 user@moscow
moscow$ ssh -L 5555:127.0.0.1:5555 user@berlin
berlin$ ssh -L 5555:127.0.0.1:5555 user@newyork
newyork$ ssh -L 5555:127.0.0.1:5555 user@tokyo
tokyo$ echo a | netcat 127.0.0.1 5555
# after a while...
channel 1017: open failed: administratively prohibited: open failed

Use tcpdump on any of the hosts to watch your "a" go round the world endlessly (well, almost: just until all available file descriptors are eaten up). It works even without piping the "a" into netcat.

If you don't have machines around the world, a simpler (but admittedly less dramatic) way of doing the same thing with a single machine is:

host1# ssh -L 5555:127.0.0.1:5555 user@127.0.0.1
host1# echo a | netcat 127.0.0.1 5555

and of course any number of machines can be chained this way, as long as it's possible to ssh from the last into the first.

Yes, I did say it was silly.

Remove duplicates, keeping only the last occurrence

Again on IRC, somebody asked how to remove duplicates from a file, keeping only the last occurrence of each item. The classic awk idiom

awk '!a[$0]++'

prints only the first instance of every line. So if the input is, for example

foo
bar
baz
foo
xxx
yyy
bar

the "normal" output (ie using the classic idiom) would be

foo
bar
baz
xxx
yyy

whereas in this particular formulation of the task we want instead

baz
foo
xxx
yyy
bar

Of course, one may check a specific field rather than $0 (which is probably more useful), but the general technique is the same.

Turns out that the problem is not as simple as it may seem. Let's start by seeing how we can find out where the last occurrence of a key is in the file:

{pos[$0] = NR}

After reading the whole file, pos["foo"] for example will contain the record number where "foo" was last seen, that is, its last occurrence. (If we were looking for a specific field rather than $0 and we wanted to print the whole line, we would have to save it - this will be shown below after the example with $0 is complete; it doesn't really change the logic).

Now that we have the pos[] array populated, we have to print it in ascending order of its values, which aren't known a priori (and we can only traverse the array using its keys).

At this point, one may think of doing some kind of sorting, but let's see whether it's possible to avoid that. For example, we can (using another common awk idiom) swap the keys and the values:

END {
  for(key in pos) reverse[pos[key]] = key
  ...

Now the array reverse[] uses record numbers as keys, and keys as values. We still don't know what those record numbers are, but now that they are used as indices, we can easily check whether a specific record number is present, so all we need is

  ...
  for(nr=1;nr<=NR;nr++)
    if(nr in reverse) print reverse[nr]
}

to print them in ascending order of the indices (ie, record numbers).

So the resulting awk code is

{pos[$0] = NR}
END {
  for(key in pos) reverse[pos[key]] = key
  for(nr=1;nr<=NR;nr++)
    if(nr in reverse) print reverse[nr]
}

Now the last detail: what if we wanted to check for duplicates on a specific field rather than the whole line? The code just needs to be changed slightly to remember the lines we need:

# for example, using $3 as a key
{pos[$3] = NR; lines[$3] = $0}
END {
  for(key in pos) reverse[pos[key]] = key
  for(nr=1;nr<=NR;nr++)
    if(nr in reverse) print lines[reverse[nr]]
}

and there we have it.
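
As a final note, on systems that have GNU tac the whole thing can be reduced to reversing the input, applying the classic idiom, and reversing the result again:

tac file.txt | awk '!a[$0]++' | tac

since keeping the first occurrence in the reversed file is the same as keeping the last one in the original.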

Buildbot in 5 minutes

(Ok, maybe 10.)

Buildbot is really an excellent piece of software; however, it can be a bit confusing for a newcomer (like me when I first started looking at it). Typically, at first sight it looks like a bunch of complicated concepts that make no sense and whose relationships with each other are unclear. After some time and some rereading, it all slowly starts to be more and more meaningful, until you finally say "oh!" and things start to make sense. Once you get there, you realize that the documentation is great, but only if you already know what it's about.

This is what happened to me, at least. Here I'm going to (try to) explain things in a way that would have helped me more as a newcomer. The approach I'm taking is more or less the reverse of that used by the documentation, that is, I'm going to start from the components that do the actual work (the builders) and go up the chain from there to the change sources. I hope purists will forgive this unorthodoxy.
Here I'm trying to clarify the concepts only, and will not go into the details of each object or property; the documentation explains those quite well.

Installation

I won't cover the installation; both buildbot master and slave are available as packages for the major distributions, and in any case the instructions in the official documentation are fine. This document will refer to buildbot 0.8.5, which was current at the time of writing, but hopefully the concepts are not too different in other versions.
All the code shown is of course python code, and has to be included in the master.cfg master configuration file.
We won't cover the basic things such as how to define the slaves, project names, or other administrative information that is contained in that file; for that, again the official documentation is fine.
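
Just to fix the context: all the snippets below assume that the c configuration dictionary and the administrative part of master.cfg are already in place, along the lines of this sketch (slave names and passwords are made up):

from buildbot.buildslave import BuildSlave

c = BuildmasterConfig = {}
c['slaves'] = [ BuildSlave("slave1", "pass1"),
                BuildSlave("slave2", "pass2"),
                BuildSlave("slave3", "pass3") ]
c['slavePortnum'] = 9989
c['status'] = []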

Builders: the workhorses

Since buildbot is a tool whose goal is the automation of software builds, it makes sense to me to start from where we tell buildbot how to build our software: the builder (or builders, since there can be more than one).
Simply put, a builder is an element that is in charge of performing some action or sequence of actions, normally something related to building software (for example, checking out the source, or "make all"), but it can also run arbitrary commands.
A builder is configured with a list of slaves that it can use to carry out its task. The other fundamental piece of information that a builder needs is, of course, the list of things it has to do (which will normally run on the chosen slave). In buildbot, this list of things is represented as a BuildFactory object, which is essentially a sequence of steps, each one defining a certain operation or command.
Enough talk, let's see an example. For this example, we are going to assume that our super software project can be built using a simple "make all", and there is another target "make packages" that creates rpm, deb and tgz packages of the binaries. In the real world things are usually more complex (for example there may be a "configure" step, or multiple targets), but the concepts are the same; it will just be a matter of adding more steps to a builder, or creating multiple builders, although sometimes the resulting builders can be quite complex.

So to perform a manual build of our project we would type this from the command line (assuming we are at the root of the local copy of the repository):

$ make clean    # clean remnants of previous builds
...
$ svn update
...
$ make all
...
$ make packages
...
# optional but included in the example: copy packages to some central machine
$ scp packages/*.rpm packages/*.deb packages/*.tgz someuser@somehost:/repository
...

Here we're assuming the repository is SVN, but again the concepts are the same with git, mercurial or any other VCS.

Now, to automate this, we create a builder where each step is one of the commands we typed above. A step can be a shell command object, or a dedicated object that checks out the source code (there are various types for different repositories, see the docs for more info), or yet something else.

from buildbot.process.factory import BuildFactory
from buildbot.steps.source import SVN
from buildbot.steps.shell import ShellCommand
from buildbot.config import BuilderConfig
 
# first, let's create the individual step objects
 
# step 1: make clean; this fails if the slave has no local copy, but
# is harmless and will only happen the first time
makeclean = ShellCommand(name = "make clean", 
                         command = ["make", "clean"], 
                         description = "make clean")
 
# step 2: svn update (here updates trunk, see the docs for more
# on how to update a branch, or make it more generic).
checkout = SVN(baseURL = 'svn://myrepo/projects/coolproject/trunk',
               mode = "update",
               username = "foo",
               password = "bar",
               haltOnFailure = True )
 
# step 3: make all
makeall = ShellCommand(name = "make all",
                       command = ["make", "all"], 
                       haltOnFailure = True, 
                       description = "make all")
 
# step 4: make packages
makepackages = ShellCommand(name = "make packages",
                            command = ["make", "packages"],
                            haltOnFailure = True,
                            description = "make packages")
 
# step 5: upload packages to central server. This needs passwordless ssh
# from the slave to the server (set it up in advance as part of slave setup)
uploadpackages = ShellCommand(name = "upload packages", 
                              description = "upload packages", 
                              command = "scp packages/*.rpm packages/*.deb packages/*.tgz someuser@somehost:/repository",
                              haltOnFailure = True)
 
# create the build factory and add the steps to it
f_simplebuild = BuildFactory()
f_simplebuild.addStep(makeclean)
f_simplebuild.addStep(checkout)
f_simplebuild.addStep(makeall)
f_simplebuild.addStep(makepackages)
f_simplebuild.addStep(uploadpackages)
 
# finally, declare the list of builders. In this case, we only have one builder
c['builders'] = [
    BuilderConfig(name = "simplebuild", slavenames = ['slave1', 'slave2', 'slave3'], factory = f_simplebuild)
]

So our builder is called "simplebuild" and can run on any of slave1, slave2 and slave3.
If our repository has other branches besides trunk, we could create one or more additional builders to build them; in the example, only the checkout step would be different, in that it would need to check out the specific branch. Depending on how exactly those branches have to be built, the shell commands may be recycled, or new ones would have to be created if they are different in the branch. You get the idea. The important thing is that all the builders be named differently and all be added to the c['builders'] value (as can be seen above, it is a list of BuilderConfig objects).
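
For example, a builder for a hypothetical 7.2 branch could reuse all the shell command steps defined above and only swap the checkout step (the builder names here match the scheduler examples further down):

# checkout step for the branch; everything else is recycled
checkout72 = SVN(baseURL = 'svn://myrepo/projects/coolproject/branches/7.2',
                 mode = "update",
                 username = "foo",
                 password = "bar",
                 haltOnFailure = True)

f_simplebuild72 = BuildFactory()
for step in [ makeclean, checkout72, makeall, makepackages, uploadpackages ]:
    f_simplebuild72.addStep(step)

c['builders'] = [
    BuilderConfig(name = "simplebuild-trunk", slavenames = ['slave1', 'slave2', 'slave3'], factory = f_simplebuild),
    BuilderConfig(name = "simplebuild-72", slavenames = ['slave1', 'slave2', 'slave3'], factory = f_simplebuild72)
]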

Of course the type and number of steps will vary depending on the goal; for example, to just check that a commit doesn't break the build, we could include just up to the "make all" step. Or we could have a builder that performs a more thorough test by also doing "make test" or other targets. You get the idea. Note that at each step except the very first we use haltOnFailure = True because it would not make sense to execute a step if the previous one failed (ok, it wouldn't be needed for the last step, but it's harmless and protects us if one day we add another step after it).

Schedulers

Now this is all nice and dandy, but who tells the builder (or builders) to run, and when? This is the job of the scheduler, which is a fancy name for an element that waits for some event to happen, and when it does, based on that information decides whether and when to run a builder (and which one or ones). There can be more than one scheduler.
I'm being purposely vague here because the possibilities are almost endless and highly dependent on the actual setup, build purposes, source repository layout and other elements.

So a scheduler needs to be configured with two main pieces of information: on one hand, which events to react to, and on the other hand, which builder or builders to trigger when those events are detected. (It's more complex than that, but if you understand this, you can get the rest of the details from the docs).

A simple type of scheduler may be a periodic scheduler: when a configurable amount of time has passed, run a certain builder (or builders). In our example, that's how we would trigger a build every hour:

from buildbot.schedulers.timed import Periodic
 
# define the periodic scheduler
hourlyscheduler = Periodic(name = "hourly",
                           builderNames = ["simplebuild"],
                           periodicBuildTimer = 3600)
 
# define the available schedulers
c['schedulers'] = [ hourlyscheduler ]

That's it. Every hour this "hourly" scheduler will run the "simplebuild" builder. If we have more than one builder that we want to run every hour, we can just add them to the builderNames list when defining the scheduler and they will all be run.
Or, since multiple schedulers are allowed, other schedulers can be defined and added to c['schedulers'] in the same way.
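
For instance, a sketch of a nightly build at 3:00 added next to the hourly one (check the manual for the exact arguments of each scheduler type):

from buildbot.schedulers.timed import Nightly

nightlyscheduler = Nightly(name = "nightly",
                           builderNames = ["simplebuild"],
                           branch = None,
                           hour = 3,
                           minute = 0)

c['schedulers'] = [ hourlyscheduler, nightlyscheduler ]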

Other types of schedulers exist; in particular, there are schedulers that can be more dynamic than the periodic one. The typical dynamic scheduler is one that learns about changes in a source repository (generally because some developer checks in some change), and triggers one or more builders in response to those changes. Let's assume for now that the scheduler "magically" learns about changes in the repository (more about this later); here's how we would define it:

from buildbot.schedulers.basic import SingleBranchScheduler
from buildbot.changes import filter
 
# define the dynamic scheduler
trunkchanged = SingleBranchScheduler(name = "trunkchanged",
                                     change_filter = filter.ChangeFilter(branch = None),
                                     treeStableTimer = 300,
                                     builderNames = ["simplebuild"])
 
# define the available schedulers
c['schedulers'] = [ trunkchanged ]

This scheduler receives changes happening to the repository, and among all of them, pays attention to those happening in "trunk" (that's what branch = None means). In other words, it filters the changes to react only to those it's interested in. When such changes are detected, and the tree has been quiet for 5 minutes (300 seconds), it runs the "simplebuild" builder. The treeStableTimer helps in those situations where commits tend to happen in bursts, which would otherwise result in multiple build requests queuing up.

What if we want to act on two branches (say, trunk and 7.2)? First we create two builders, one for each branch (see the builders paragraph above), then we create two dynamic schedulers:

from buildbot.schedulers.basic import SingleBranchScheduler
from buildbot.changes import filter
 
# define the dynamic scheduler for trunk
trunkchanged = SingleBranchScheduler(name = "trunkchanged",
                                     change_filter = filter.ChangeFilter(branch = None),
                                     treeStableTimer = 300,
                                     builderNames = ["simplebuild-trunk"])
 
# define the dynamic scheduler for the 7.2 branch
branch72changed = SingleBranchScheduler(name = "branch72changed",
                                        change_filter = filter.ChangeFilter(branch = 'branches/7.2'),
                                        treeStableTimer = 300,
                                        builderNames = ["simplebuild-72"])
 
# define the available schedulers
c['schedulers'] = [ trunkchanged, branch72changed ]

The syntax of the change filter is VCS-dependent (above is for SVN), but again once the idea is clear, the documentation has all the details. Another feature of the scheduler is that it can be told which changes, within those it's paying attention to, are important and which are not. For example, there may be a documentation directory in the branch the scheduler is watching, but changes under that directory should not trigger a build of the binary. This finer filtering is implemented by means of the fileIsImportant argument to the scheduler (full details in the docs and - alas - in the sources).
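
A sketch of what such a fileIsImportant callable might look like (the scheduler passes it the change object, whose files attribute lists the files touched by the change):

# ignore changes that only touch files under docs/
def important(change):
    for name in change.files:
        if not name.startswith("docs/"):
            return True
    return False

trunkchanged = SingleBranchScheduler(name = "trunkchanged",
                                     change_filter = filter.ChangeFilter(branch = None),
                                     treeStableTimer = 300,
                                     fileIsImportant = important,
                                     builderNames = ["simplebuild"])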

Change sources

Earlier we said that a dynamic scheduler "magically" learns about changes; the final piece of the puzzle is the change source, which is precisely the element in buildbot whose task is to detect changes in the repository and communicate them to the schedulers. Note that periodic schedulers don't need a change source, since they only depend on elapsed time; dynamic schedulers, on the other hand, do need a change source.

A change source is generally configured with information about a source repository (which is where changes happen); a change source can watch changes at different levels in the hierarchy of the repository, so for example it is possible to watch the whole repository or a subset of it, or just a single branch. This determines the extent of the information that is passed down to the schedulers.

There are many ways a change source can learn about changes; it can periodically poll the repository for changes, or the VCS can be configured (for example through hook scripts triggered by commits) to push changes into the change source. While these two methods are probably the most common, they are not the only possibilities; it is possible for example to have a change source detect changes by parsing some email sent to a mailing list when a commit happens, and yet other methods exist. The manual again has the details.

To complete our example, here's a change source that polls an SVN repository every 2 minutes:

from buildbot.changes.svnpoller import SVNPoller, split_file_branches
 
svnpoller = SVNPoller(svnurl = "svn://myrepo/projects/coolproject",
                      svnuser = "foo",
                      svnpasswd = "bar",
                      pollinterval = 120,
                      split_file = split_file_branches)
 
c['change_source'] = svnpoller

This poller watches the whole "coolproject" section of the repository, so it will detect changes in all the branches. We could have said svnurl = "svn://myrepo/projects/coolproject/trunk" or svnurl = "svn://myrepo/projects/coolproject/branches/7.2" to watch only a specific branch, or conversely we could have said svnurl = "svn://myrepo/projects" to watch all the projects. Of course depending on what the change source watches, the number and configuration of schedulers and builders down the line will have to change accordingly (but see the update below). And of course this is all SVN-specific, but there are pollers for all the popular VCSs.
Since we're watching more than one branch, we need a method to tell in which branch a change occurred when we detect one. This is what the split_file argument does: it takes a callable that buildbot will call to do the job. The split_file_branches function, which comes with buildbot, is designed for exactly this purpose, so that's what the example above uses.
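
To give an idea, this is roughly what split_file_branches returns for typical paths in the layout above (a (branch, filename) tuple, where None means trunk):

split_file_branches("trunk/src/main.c")         # -> (None, "src/main.c")
split_file_branches("branches/7.2/src/main.c")  # -> ("branches/7.2", "src/main.c")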

Status targets

Now that the basics are in place, let's go back to the builders, which is where the real work happens. Status targets are simply the means buildbot uses to inform the world about what's happening, that is, how the builders are doing. There are many status targets: a web interface, a mail notifier, an IRC notifier, and others. They are described fairly well in the manual.
One thing I've found useful is the ability to pass a domain name as the lookup argument to a MailNotifier, which makes it possible to take an unqualified username as it appears in the SVN change and build a valid email address by appending the given domain name to it:

from buildbot.status import mail
 
# if jsmith commits a change, mail for the build is sent to jsmith@example.org
notifier = mail.MailNotifier(fromaddr = "buildbot@example.org",
                             sendToInterestedUsers = True,
                             lookup = "example.org")
c['status'].append(notifier)

The mail notifier can be customized at will by means of the messageFormatter argument, which is a function that buildbot calls to format the body of the email, and to which it makes available lots of information about the build. The documentation has all the details.
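
A sketch of such a function, using the 0.8.x signature (it has to return a dictionary with at least the body of the message; the exact contents below are just an example):

from buildbot.status.builder import Results

def message_formatter(mode, name, build, results, master_status):
    body = "Build %s finished: %s\n" % (name, Results[results])
    body += "Reason: %s\n" % build.getReason()
    return { 'body': body, 'type': 'plain' }

notifier = mail.MailNotifier(fromaddr = "buildbot@example.org",
                             sendToInterestedUsers = True,
                             lookup = "example.org",
                             messageFormatter = message_formatter)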

Conclusion

Please note that this article has just scratched the surface; given the complexity of the task of build automation, the possibilities are almost endless. So there's much, much more to say about buildbot. However, hopefully this is a preparation step before reading the official manual. Had I found an explanation like the one above when I was approaching buildbot, I'd have had to read the manual just once, rather than multiple times. Hope this can help someone else.

Update 11/10/2011: it looks like what I said about the change source is not entirely correct. It's true that the repository can be watched at different heights in the hierarchy, but within a single project. To watch multiple projects, at least with SVNPoller, multiple change sources have to be defined, each watching a project. For example, with the repository mentioned above:

from buildbot.changes.svnpoller import SVNPoller, split_file_branches
 
svnpoller_coolproject = SVNPoller(svnurl = "svn://myrepo/projects/coolproject",
                                  project = "coolproject",
                                  svnuser = "foo",
                                  svnpasswd = "bar",
                                  pollinterval = 120,
                                  split_file = split_file_branches)
 
svnpoller_superproject = SVNPoller(svnurl = "svn://myrepo/projects/superproject",
                                   project = "superproject",
                                   svnuser = "foo",
                                   svnpasswd = "bar",
                                   pollinterval = 120,
                                   split_file = split_file_branches)
 
 
# this can be a list
c['change_source'] = [ svnpoller_coolproject, svnpoller_superproject ]

This means that changes to both projects (along with their branch etc.) will be passed down to the schedulers, which will thus need to filter on project in addition to branch:

from buildbot.schedulers.basic import SingleBranchScheduler
from buildbot.changes import filter
 
# define the dynamic scheduler for project coolproject, trunk
coolproject_trunkchanged = SingleBranchScheduler(name = "coolproject-trunkchanged",
                                                 change_filter = filter.ChangeFilter(project = "coolproject", branch = None),
                                                 treeStableTimer = 300,
                                                 builderNames = ["simplebuild-coolproject-trunk"])
 
# define the dynamic scheduler for project coolproject, the 7.2 branch
coolproject_branch72changed = SingleBranchScheduler(name = "coolproject-branch72changed",
                                                    change_filter = filter.ChangeFilter(project = "coolproject", branch = 'branches/7.2'),
                                                    treeStableTimer = 300,
                                                    builderNames = ["simplebuild-coolproject-72"])
 
# define the dynamic scheduler for project superproject, trunk
superproject_trunkchanged = SingleBranchScheduler(name = "superproject-trunkchanged",
                                                 change_filter = filter.ChangeFilter(project = "superproject", branch = None),
                                                 treeStableTimer = 300,
                                                 builderNames = ["simplebuild-superproject-trunk"])
 
# define the dynamic scheduler for project superproject, the 7.1 branch
superproject_branch71changed = SingleBranchScheduler(name = "superproject-branch71changed",
                                                    change_filter = filter.ChangeFilter(project = "superproject", branch = 'branches/7.1'),
                                                    treeStableTimer = 300,
                                                    builderNames = ["simplebuild-superproject-71"])
 
# define the available schedulers
c['schedulers'] = [ coolproject_trunkchanged, coolproject_branch72changed, superproject_trunkchanged, superproject_branch71changed ]

And of course the above also implies that the appropriate builders must be created (not shown, but basically each builder will have to include the correct checkout step to update from the appropriate project/branch).

With many projects, branches and builders it probably pays to not hardcode all the schedulers and builders in the configuration, but to generate them dynamically, starting from lists of all projects, branches, targets etc. and using loops to generate all the possible combinations (or only the needed ones, depending on the specific setup), as explained in the documentation chapter about Programmatic Configuration Generation. A sketch of the idea is shown below.
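
Here make_factory stands for a hypothetical helper that builds a BuildFactory with the right checkout step for the given project and branch; the builder names produced match those used above:

combos = [ ("coolproject",  [ None, "branches/7.2" ]),
           ("superproject", [ None, "branches/7.1" ]) ]

c['builders'] = []
c['schedulers'] = []
for project, branches in combos:
    for branch in branches:
        # None means trunk; turn "branches/7.2" into the tag "72"
        tag = "trunk" if branch is None else branch.split("/")[-1].replace(".", "")
        name = "simplebuild-%s-%s" % (project, tag)
        c['builders'].append(
            BuilderConfig(name = name,
                          slavenames = [ 'slave1', 'slave2', 'slave3' ],
                          factory = make_factory(project, branch)))
        c['schedulers'].append(
            SingleBranchScheduler(name = name + "-changed",
                                  change_filter = filter.ChangeFilter(project = project, branch = branch),
                                  treeStableTimer = 300,
                                  builderNames = [ name ]))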