SSH port forwarding loop

Warning: this is totally silly and useless. Don't do this on production machines.

Let's try something silly for once.

tokyo$ ssh -L 5555:127.0.0.1:5555 user@moscow
moscow$ ssh -L 5555:127.0.0.1:5555 user@berlin
berlin$ ssh -L 5555:127.0.0.1:5555 user@newyork
newyork$ ssh -L 5555:127.0.0.1:5555 user@tokyo
tokyo$ echo a | netcat 127.0.0.1 5555
# after a while...
channel 1017: open failed: administratively prohibited: open failed

Use tcpdump on any of the hosts to watch your "a" go round the world endlessly (well, almost: just until all available file descriptors are eaten up). It works even without piping the "a" into netcat.

If you don't have machines around the world, a simpler (but admittedly less
dramatic) way of doing the same thing with a single machine is:

host1# ssh -L 5555:127.0.0.1:5555 user@127.0.0.1
host1# echo a | netcat 127.0.0.1 5555

and of course any number of machines can be chained this way, as long as it's possible to ssh from the last into the first.
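To see why the "a" never arrives anywhere, here is a toy model of the loop in Python. It is only a sketch: the `forward_to` map mirrors the four ssh commands above, while `max_channels` is a made-up stand-in for whatever resource limit (file descriptors, channels) gets hit first on a real machine.

```python
def follow_forwards(forward_to, start, max_channels):
    """Follow the chain of port forwards from `start` until the budget runs out.

    Each hop stands for one new forwarded connection, which in real life
    consumes file descriptors on the hosts involved.
    """
    path = []
    host = start
    for _ in range(max_channels):
        host = forward_to[host]   # the listener on this host forwards to the next one
        path.append(host)
    return path


# Each host forwards its local port 5555 to port 5555 on the next host.
forward_to = {
    "tokyo": "moscow",
    "moscow": "berlin",
    "berlin": "newyork",
    "newyork": "tokyo",
}

hops = follow_forwards(forward_to, "tokyo", max_channels=10)
print(" -> ".join(["tokyo"] + hops))
```

There is no host in the map that actually consumes the data, so the only thing that ends the walk is the budget.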

Yes, I did say it was silly.

Remove duplicates, but keeping only the last occurrence

Again on IRC, somebody asked how to remove duplicates from a file, but keeping only the last occurrence of each item. The classic awk idiom

awk '!a[$0]++'

prints only the first instance of every line. So if the input is, for example

foo
bar
baz
foo
xxx
yyy
bar

the "normal" output (i.e. using the classic idiom) would be

foo
bar
baz
xxx
yyy

whereas in this particular formulation of the task we want instead

baz
foo
xxx
yyy
bar

Of course, one may check a specific field rather than $0 (which is probably more useful), but the general technique is the same.
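For readers who don't speak awk, this is roughly what the classic keep-the-first idiom does, written as a Python sketch (the function name `keep_first` is mine, not part of any library):

```python
def keep_first(lines):
    """Return lines with duplicates removed, keeping the first occurrence of each."""
    seen = set()
    result = []
    for line in lines:
        if line not in seen:      # same test as awk's !a[$0]++
            seen.add(line)
            result.append(line)
    return result


sample = ["foo", "bar", "baz", "foo", "xxx", "yyy", "bar"]
print(keep_first(sample))
# ['foo', 'bar', 'baz', 'xxx', 'yyy']
```

The decision to print can be taken as soon as a line is read, which is exactly what is not possible when the last occurrence is wanted.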

Turns out that the problem is not as simple as it may seem. Let's start by seeing how we can find out where the last occurrence of a key is in the file:

{pos[$0] = NR}

After reading the whole file, pos["foo"] for example will contain the record number where "foo" was last seen, that is, its last occurrence. (If we were looking for a specific field rather than $0 and we wanted to print the whole line, we would have to save it - this will be shown below after the example with $0 is complete; it doesn't really change the logic).

Now that we have the pos[] array populated, we have to print it in ascending order of its values, which aren't known a priori (and we can only traverse the array using its keys).

At this point, one may think of doing some kind of sorting, but let's see whether it's possible to avoid that. For example, we can (using another common awk idiom) swap the keys and the values:

END {
  for(key in pos) reverse[pos[key]] = key
  ...

Now the array reverse[] uses record numbers as keys, and keys as values. We still don't know what those record numbers are, but now that they are used as indices, we can easily check whether a specific record number is present, so all we need is

  ...
  for(nr=1;nr<=NR;nr++)
    if(nr in reverse) print reverse[nr]
}

to print them in ascending order of the indices (i.e. record numbers).

So the resulting awk code is

{pos[$0] = NR}
END {
  for(key in pos) reverse[pos[key]] = key
  for(nr=1;nr<=NR;nr++)
    if(nr in reverse) print reverse[nr]
}
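The same two-pass logic can be written in Python, which may make it easier to follow. This is just a transliteration sketch of the awk code above; `keep_last` is a name I made up, and the variable names mirror the awk arrays.

```python
def keep_last(lines):
    """Return lines with duplicates removed, keeping the last occurrence of each."""
    # pos[line] = record number where the line was last seen
    pos = {}
    for nr, line in enumerate(lines, start=1):
        pos[line] = nr

    # swap keys and values: reverse[record number] = line
    reverse = {nr: line for line, nr in pos.items()}

    # walk all record numbers in order, printing only those that survived
    return [reverse[nr] for nr in range(1, len(lines) + 1) if nr in reverse]


sample = ["foo", "bar", "baz", "foo", "xxx", "yyy", "bar"]
print(keep_last(sample))
# ['baz', 'foo', 'xxx', 'yyy', 'bar']
```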

Now the last detail: what if we wanted to check for duplicates on a specific field rather than the whole line? The code just needs to be changed slightly to remember the lines we need:

# for example, using $3 as a key
{pos[$3] = NR; lines[$3] = $0}
END {
  for(key in pos) reverse[pos[key]] = key
  for(nr=1;nr<=NR;nr++)
    if(nr in reverse) print lines[reverse[nr]]
}

and there we have it.
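Here is the field-based variant as a Python sketch, under the assumption that fields are separated by whitespace as in awk's default mode. `keep_last_by_field` and `field_index` are names I made up; note that `field_index` is zero-based, so awk's $3 is index 2.

```python
def keep_last_by_field(lines, field_index):
    """Keep, for each key found in the given field, only the last line carrying it."""
    pos = {}     # key -> record number of its last occurrence
    saved = {}   # key -> the whole line seen at that record number
    for nr, line in enumerate(lines, start=1):
        fields = line.split()
        # like awk, a missing field is treated as the empty string
        key = fields[field_index] if field_index < len(fields) else ""
        pos[key] = nr
        saved[key] = line

    reverse = {nr: key for key, nr in pos.items()}
    return [saved[reverse[nr]] for nr in range(1, len(lines) + 1) if nr in reverse]


records = ["a 1 x", "b 2 y", "c 3 x"]
print(keep_last_by_field(records, 2))
# ['b 2 y', 'c 3 x']
```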

Buildbot in 5 minutes

(Ok, maybe 10.)

Buildbot is really an excellent piece of software; however, it can be a bit confusing for a newcomer (like me when I first started looking at it). Typically, at first sight it looks like a bunch of complicated concepts that make no sense and whose relationships with each other are unclear. After some time and some rereading, it all slowly starts to be more and more meaningful, until you finally say "oh!" and things start to make sense. Once you get there, you realize that the documentation is great, but only if you already know what it's about.

This is what happened to me, at least. Here I'm going to (try to) explain things in a way that would have helped me more as a newcomer. The approach I'm taking is more or less the reverse of that used by the documentation, that is, I'm going to start from the components that do the actual work (the builders) and go up the chain from there up to change sources. I hope purists will forgive this unorthodoxy.
Here I'm trying to clarify the concepts only, and will not go into the details of each object or property; the documentation explains those quite well.

Installation

I won't cover the installation; both buildbot master and slave are available as packages for the major distributions, and in any case the instructions in the official documentation are fine. This document will refer to buildbot 0.8.5 which was current at the time of writing, but hopefully the concepts are not too different in other versions.
All the code shown is of course python code, and has to be included in the master.cfg master configuration file.
We won't cover the basic things such as how to define the slaves, project names, or other administrative information that is contained in that file; for that, again the official documentation is fine.

Builders: the workhorses

Since buildbot is a tool whose goal is the automation of software builds, it makes sense to me to start from where we tell buildbot how to build our software: the builder (or builders, since there can be more than one).
Simply put, a builder is an element that is in charge of performing some action or sequence of actions, normally something related to building software (for example, checking out the source, or "make all"), but it can also run arbitrary commands.
A builder is configured with a list of slaves that it can use to carry out its task. The other fundamental piece of information that a builder needs is, of course, the list of things it has to do (which will normally run on the chosen slave). In buildbot, this list of things is represented as a BuildFactory object, which is essentially a sequence of steps, each one defining a certain operation or command.
Enough talk, let's see an example. For this example, we are going to assume that our super software project can be built using a simple "make all", and there is another target "make packages" that creates rpm, deb and tgz packages of the binaries. In the real world things are usually more complex (for example there may be a "configure" step, or multiple targets), but the concepts are the same; it will just be a matter of adding more steps to a builder, or creating multiple builders, although sometimes the resulting builders can be quite complex.

So to perform a manual build of our project we would type this from the command line (assuming we are at the root of the local copy of the repository):

$ make clean    # clean remnants of previous builds
...
$ svn update
...
$ make all
...
$ make packages
...
# optional but included in the example: copy packages to some central machine
$ scp packages/*.rpm packages/*.deb packages/*.tgz someuser@somehost:/repository
...

Here we're assuming the repository is SVN, but again the concepts are the same with git, mercurial or any other VCS.

Now, to automate this, we create a builder where each step is one of the commands we typed above. A step can be a shell command object, or a dedicated object that checks out the source code (there are various types for different repositories, see the docs for more info), or yet something else.

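from buildbot.process.factory import BuildFactory
from buildbot.steps.source import SVN
from buildbot.steps.shell import ShellCommand
# BuilderConfig is used at the end, when declaring the list of builders
from buildbot.config import BuilderConfig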

# first, let's create the individual step objects

# step 1: make clean; this fails if the slave has no local copy, but
# is harmless and will only happen the first time
makeclean = ShellCommand(name = "make clean",
                         command = ["make", "clean"],
                         description = "make clean")

# step 2: svn update (here updates trunk, see the docs for more
# on how to update a branch, or make it more generic).
checkout = SVN(baseURL = 'svn://myrepo/projects/coolproject/trunk',
               mode = "update",
               username = "foo",
               password = "bar",
               haltOnFailure = True )

# step 3: make all
makeall = ShellCommand(name = "make all",
                       command = ["make", "all"],
                       haltOnFailure = True,
                       description = "make all")

# step 4: make packages
makepackages = ShellCommand(name = "make packages",
                            command = ["make", "packages"],
                            haltOnFailure = True,
                            description = "make packages")

# step 5: upload packages to central server. This needs passwordless ssh
# from the slave to the server (set it up in advance as part of slave setup)
uploadpackages = ShellCommand(name = "upload packages",
                              description = "upload packages",
                              command = "scp packages/*.rpm packages/*.deb packages/*.tgz someuser@somehost:/repository",
                              haltOnFailure = True)

# create the build factory and add the steps to it
f_simplebuild = BuildFactory()
f_simplebuild.addStep(makeclean)
f_simplebuild.addStep(checkout)
f_simplebuild.addStep(makeall)
f_simplebuild.addStep(makepackages)
f_simplebuild.addStep(uploadpackages)

# finally, declare the list of builders. In this case, we only have one builder
c['builders'] = [
    BuilderConfig(name = "simplebuild", slavenames = ['slave1', 'slave2', 'slave3'], factory = f_simplebuild)
]

So our builder is called "simplebuild" and can run on any of slave1, slave2 and slave3.
If our repository has other branches besides trunk, we could create one or more additional builders to build them; in the example, only the checkout step would be different, in that it would need to check out the specific branch. Depending on how exactly those branches have to be built, the shell commands may be recycled, or new ones would have to be created if they are different in the branch. You get the idea. The important thing is that all the builders be named differently and all be added to the c['builders'] value (as can be seen above, it is a list of BuilderConfig objects).

Of course the type and number of steps will vary depending on the goal; for example, to just check that a commit doesn't break the build, we could include just up to the "make all" step. Or we could have a builder that performs a more thorough test by also doing "make test" or other targets. You get the idea. Note that at each step except the very first we use haltOnFailure = True because it would not make sense to execute a step if the previous one failed (ok, it wouldn't be needed for the last step, but it's harmless and protects us if one day we add another step after it).
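To make the effect of haltOnFailure concrete, here is a toy model in plain Python. It is not how buildbot implements it (real steps have more knobs and run on slaves); the `run_steps` function and the lambdas standing in for commands are made up for illustration only.

```python
def run_steps(steps):
    """Run steps in order; stop after a failed step that has halt_on_failure set.

    `steps` is a list of (name, action, halt_on_failure) tuples, where
    `action` is a callable returning True on success and False on failure.
    """
    results = []
    for name, action, halt_on_failure in steps:
        ok = action()
        results.append((name, ok))
        if not ok and halt_on_failure:
            break                      # later steps are never attempted
    return results


# Pretend "make clean" fails (no local copy yet) and "make all" fails too.
steps = [
    ("make clean",    lambda: False, False),  # failure tolerated
    ("svn update",    lambda: True,  True),
    ("make all",      lambda: False, True),   # failure stops the build here
    ("make packages", lambda: True,  True),
]

results = run_steps(steps)
for name, ok in results:
    print(name, "ok" if ok else "FAILED")
```

In this run "make packages" is never attempted, while the failed "make clean" did not prevent the checkout from happening.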

Schedulers

Now this is all nice and dandy, but who tells the builder (or builders) to run, and when? This is the job of the scheduler, which is a fancy name for an element that waits for some event to happen, and when it does, based on that information decides whether and when to run a builder (and which one or ones). There can be more than one scheduler.
I'm being purposely vague here because the possibilities are almost endless and highly dependent on the actual setup, build purposes, source repository layout and other elements.

So a scheduler needs to be configured with two main pieces of information: on one hand, which events to react to, and on the other hand, which builder or builders to trigger when those events are detected. (It's more complex than that, but if you understand this, you can get the rest of the details from the docs).

A simple type of scheduler may be a periodic scheduler: when a configurable amount of time has passed, run a certain builder (or builders). In our example, that's how we would trigger a build every hour:

from buildbot.schedulers.timed import Periodic

# define the periodic scheduler
hourlyscheduler = Periodic(name = "hourly",
                           builderNames = ["simplebuild"],
                           periodicBuildTimer = 3600)

# define the available schedulers
c['schedulers'] = [ hourlyscheduler ]

That's it. Every hour this "hourly" scheduler will run the "simplebuild" builder. If we have more than one builder that we want to run every hour, we can just add them to the builderNames list when defining the scheduler and they will all be run.
Or, since multiple schedulers are allowed, other schedulers can be defined and added to c['schedulers'] in the same way.

Other types of schedulers exist; in particular, there are schedulers that can be more dynamic than the periodic one. The typical dynamic scheduler is one that learns about changes in a source repository (generally because some developer checks in some change), and triggers one or more builders in response to those changes. Let's assume for now that the scheduler "magically" learns about changes in the repository (more about this later); here's how we would define it:

from buildbot.schedulers.basic import SingleBranchScheduler
from buildbot.changes import filter

# define the dynamic scheduler
trunkchanged = SingleBranchScheduler(name = "trunkchanged",
                                     change_filter = filter.ChangeFilter(branch = None),
                                     treeStableTimer = 300,
                                     builderNames = ["simplebuild"])

# define the available schedulers
c['schedulers'] = [ trunkchanged ]

This scheduler receives changes happening to the repository, and among all of them, pays attention to those happening in "trunk" (that's what branch = None means). In other words, it filters the changes to react only to those it's interested in. When such changes are detected, and the tree has been quiet for 5 minutes (300 seconds), it runs the "simplebuild" builder. The treeStableTimer helps in those situations where commits tend to happen in bursts, which would otherwise result in multiple build requests queuing up.
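The effect of treeStableTimer on a burst of commits can be illustrated with a small simulation. This is a simplified sketch of the behaviour described above, not buildbot code: `build_times` is a made-up function, times are plain seconds, and it ignores everything else a real scheduler has to deal with (build duration, filters, restarts).

```python
def build_times(change_times, tree_stable_timer):
    """Return the times at which a build would be requested.

    A build is requested once no new change has arrived for
    `tree_stable_timer` seconds after the most recent change.
    """
    fire_times = []
    changes = sorted(change_times)
    for i, t in enumerate(changes):
        is_last = i == len(changes) - 1
        # the timer started by this change expires only if the next
        # change (if any) arrives late enough
        if is_last or changes[i + 1] - t >= tree_stable_timer:
            fire_times.append(t + tree_stable_timer)
    return fire_times


# three commits in a burst, then a lone commit much later
print(build_times([0, 100, 200, 1000], 300))
# [500, 1300]
```

The three commits in the burst result in a single build request rather than three.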

What if we want to act on two branches (say, trunk and 7.2)? First we create two builders, one for each branch (see the builders paragraph above), then we create two dynamic schedulers:

from buildbot.schedulers.basic import SingleBranchScheduler
from buildbot.changes import filter

# define the dynamic scheduler for trunk
trunkchanged = SingleBranchScheduler(name = "trunkchanged",
                                     change_filter = filter.ChangeFilter(branch = None),
                                     treeStableTimer = 300,
                                     builderNames = ["simplebuild-trunk"])

# define the dynamic scheduler for the 7.2 branch
branch72changed = SingleBranchScheduler(name = "branch72changed",
                                        change_filter = filter.ChangeFilter(branch = 'branches/7.2'),
                                        treeStableTimer = 300,
                                        builderNames = ["simplebuild-72"])

# define the available schedulers
c['schedulers'] = [ trunkchanged, branch72changed ]

The syntax of the change filter is VCS-dependent (above is for SVN), but again once the idea is clear, the documentation has all the details. Another feature of the scheduler is that it can be told which changes, within those it's paying attention to, are important and which are not. For example, there may be a documentation directory in the branch the scheduler is watching, but changes under that directory should not trigger a build of the binary. This finer filtering is implemented by means of the fileIsImportant argument to the scheduler (full details in the docs and - alas - in the sources).
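As a rough idea of what such a function may look like, here is a sketch. The assumptions are mine: that the callable receives a change object exposing the list of touched files as a `files` attribute, and that the documentation lives under a directory called "docs/". The `FakeChange` type exists only so the example can run without buildbot.

```python
from collections import namedtuple

# stand-in for buildbot's change object, for demonstration only
FakeChange = namedtuple("FakeChange", ["files"])


def docs_are_unimportant(change):
    """Return True if the change touches at least one file outside docs/."""
    for path in change.files:
        if not path.startswith("docs/"):
            return True
    return False


print(docs_are_unimportant(FakeChange(files=["docs/manual.txt"])))
print(docs_are_unimportant(FakeChange(files=["docs/manual.txt", "src/main.c"])))
```

A function like this would then be passed as fileIsImportant = docs_are_unimportant when creating the scheduler.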

Change sources

Earlier we said that a dynamic scheduler "magically" learns about changes; the final piece of the puzzle is change sources, which are precisely the elements in buildbot whose task is to detect changes in the repository and communicate them to the schedulers. Note that periodic schedulers don't need a change source, since they only depend on elapsed time; dynamic schedulers, on the other hand, do need a change source.

A change source is generally configured with information about a source repository (which is where changes happen); a change source can watch changes at different levels in the hierarchy of the repository, so for example it is possible to watch the whole repository or a subset of it, or just a single branch. This determines the extent of the information that is passed down to the schedulers.

There are many ways a change source can learn about changes; it can periodically poll the repository for changes, or the VCS can be configured (for example through hook scripts triggered by commits) to push changes into the change source. While these two methods are probably the most common, they are not the only possibilities; it is possible for example to have a change source detect changes by parsing some email sent to a mailing list when a commit happens, and yet other methods exist. The manual again has the details.

To complete our example, here's a change source that polls a SVN repository every 2 minutes:

from buildbot.changes.svnpoller import SVNPoller, split_file_branches

svnpoller = SVNPoller(svnurl = "svn://myrepo/projects/coolproject",
                      svnuser = "foo",
                      svnpasswd = "bar",
                      pollinterval = 120,
                      split_file = split_file_branches)

c['change_source'] = svnpoller

This poller watches the whole "coolproject" section of the repository, so it will detect changes in all the branches. We could have said svnurl = "svn://myrepo/projects/coolproject/trunk" or svnurl = "svn://myrepo/projects/coolproject/branches/7.2" to watch only a specific branch, or conversely we could have said svnurl = "svn://myrepo/projects" to watch all the projects. Of course depending on what the change source watches, the number and configuration of schedulers and builders down the line will have to change accordingly (but see the update below). And of course this is all SVN-specific, but there are pollers for all the popular VCSs.
Since we're watching more than one branch, we need a method to tell in which branch the change occurred when we detect one. This is what the split_file argument does: it takes a callable that buildbot will call to do the job. The split_file_branches function, which comes with buildbot, is designed for exactly this purpose, so that's what the example above uses.
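To give an idea of what such a callable does, here is a sketch of my own (not buildbot's source). It assumes the usual trunk/branches layout, that the callable is handed a path relative to the svnurl being watched, and that it should return a (branch, file) pair, with None as the branch for trunk, or None altogether for paths it doesn't care about. The name `split_trunk_and_branches` is made up.

```python
def split_trunk_and_branches(path):
    """Split a repository path into (branch, path within the branch)."""
    pieces = path.split("/")
    if pieces[0] == "trunk":
        # trunk is reported as branch None
        return (None, "/".join(pieces[1:]))
    if pieces[0] == "branches" and len(pieces) > 2:
        # branches/7.2/src/main.c -> ("branches/7.2", "src/main.c")
        return ("/".join(pieces[:2]), "/".join(pieces[2:]))
    # anything else (tags, for example) is ignored
    return None


print(split_trunk_and_branches("trunk/src/main.c"))
print(split_trunk_and_branches("branches/7.2/src/main.c"))
print(split_trunk_and_branches("tags/1.0/src/main.c"))
```

The branch values produced this way are the same ones the change filters in the schedulers test against (None and 'branches/7.2').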

Status targets

Now that the basics are in place, let's go back to the builders, which is where the real work happens. Status targets are simply the means buildbot uses to inform the world about what's happening, that is, how builders are doing. There are many status targets: a web interface, a mail notifier, an IRC notifier, and others. They are described fairly well in the manual.
One thing I've found useful is the ability to pass a domain name as the lookup argument to a MailNotifier, which makes it possible to take an unqualified username as it appears in the SVN change and create a valid email address by appending the given domain name to it:

from buildbot.status import mail

# if jsmith commits a change, mail for the build is sent to jsmith@example.org
notifier = mail.MailNotifier(fromaddr = "buildbot@example.org",
                             sendToInterestedUsers = True,
                             lookup = "example.org")
c['status'].append(notifier)

The mail notifier can be customized at will by means of the messageFormatter argument, which is a function that buildbot calls to format the body of the email, and to which it makes available lots of information about the build. The manual has all the details.

Conclusion

Please note that this article has just scratched the surface; given the complexity of the task of build automation, the possibilities are almost endless. So there's much, much more to say about buildbot. However, hopefully this is a preparation step before reading the official manual. Had I found an explanation like the one above when I was approaching buildbot, I'd have had to read the manual just once, rather than multiple times. Hope this can help someone else.

Update 11/10/2011: it looks like what I said about the change source is not entirely correct. It's true that the repository can be watched at different heights in the hierarchy, but within a single project. To watch multiple projects, at least with SVNPoller, multiple change sources have to be defined, each watching a project. For example, with the repository mentioned above:

from buildbot.changes.svnpoller import SVNPoller, split_file_branches

svnpoller_coolproject = SVNPoller(svnurl = "svn://myrepo/projects/coolproject",
                                  project = "coolproject",
                                  svnuser = "foo",
                                  svnpasswd = "bar",
                                  pollinterval = 120,
                                  split_file = split_file_branches)

svnpoller_superproject = SVNPoller(svnurl = "svn://myrepo/projects/superproject",
                                   project = "superproject",
                                   svnuser = "foo",
                                   svnpasswd = "bar",
                                   pollinterval = 120,
                                   split_file = split_file_branches)

# this can be a list
c['change_source'] = [ svnpoller_coolproject, svnpoller_superproject ]

This means that changes to both projects (along with their branch etc.) will be passed down to the schedulers, which will thus need to filter on project in addition to branch:

from buildbot.schedulers.basic import SingleBranchScheduler
from buildbot.changes import filter

# define the dynamic scheduler for project coolproject, trunk
coolproject_trunkchanged = SingleBranchScheduler(name = "coolproject-trunkchanged",
                                                 change_filter = filter.ChangeFilter(project = "coolproject", branch = None),
                                                 treeStableTimer = 300,
                                                 builderNames = ["simplebuild-coolproject-trunk"])

# define the dynamic scheduler for project coolproject, the 7.2 branch
coolproject_branch72changed = SingleBranchScheduler(name = "coolproject-branch72changed",
                                                    change_filter = filter.ChangeFilter(project = "coolproject", branch = 'branches/7.2'),
                                                    treeStableTimer = 300,
                                                    builderNames = ["simplebuild-coolproject-72"])

# define the dynamic scheduler for project superproject, trunk
superproject_trunkchanged = SingleBranchScheduler(name = "superproject-trunkchanged",
                                                  change_filter = filter.ChangeFilter(project = "superproject", branch = None),
                                                  treeStableTimer = 300,
                                                  builderNames = ["simplebuild-superproject-trunk"])

# define the dynamic scheduler for project superproject, the 7.1 branch
superproject_branch71changed = SingleBranchScheduler(name = "superproject-branch71changed",
                                                     change_filter = filter.ChangeFilter(project = "superproject", branch = 'branches/7.1'),
                                                     treeStableTimer = 300,
                                                     builderNames = ["simplebuild-superproject-71"])

# define the available schedulers
c['schedulers'] = [ coolproject_trunkchanged, coolproject_branch72changed, superproject_trunkchanged, superproject_branch71changed ]

And of course the above also implies that the appropriate builders must be created (not shown, but basically each builder will have to include the correct checkout step to update from the appropriate project/branch).

With many projects, branches and builders it probably pays not to hardcode all the schedulers and builders in the configuration, but to generate them dynamically starting from a list of all projects, branches, targets etc. and using loops to generate all possible combinations (or only the needed ones, depending on the specific setup), as explained in the documentation chapter about Programmatic Configuration Generation.
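As a sketch of the idea, the loop below generates the parameters for the four schedulers shown above from a single dictionary. It deliberately stops short of creating buildbot objects so that it runs on its own; the `projects` layout, the helper names and the dictionary keys are all my own choices, not something buildbot prescribes.

```python
# project name -> branches to watch (None means trunk)
projects = {
    "coolproject": [None, "branches/7.2"],
    "superproject": [None, "branches/7.1"],
}


def branch_label(branch):
    """Turn None into 'trunk' and 'branches/7.2' into '72'."""
    if branch is None:
        return "trunk"
    return branch.split("/")[-1].replace(".", "")


def scheduler_specs(projects, tree_stable_timer=300):
    """Build one dictionary of scheduler parameters per project/branch pair."""
    specs = []
    for project in sorted(projects):
        for branch in projects[project]:
            label = branch_label(branch)
            prefix = "" if branch is None else "branch"
            specs.append({
                "name": "{0}-{1}{2}changed".format(project, prefix, label),
                "project": project,
                "branch": branch,
                "treeStableTimer": tree_stable_timer,
                "builderNames": ["simplebuild-{0}-{1}".format(project, label)],
            })
    return specs


for spec in scheduler_specs(projects):
    print(spec["name"], spec["builderNames"])
```

In master.cfg each dictionary would then be turned into a SingleBranchScheduler (with a ChangeFilter built from "project" and "branch"), and a similar loop would create the matching builders.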

Some tips on RPM conditional macros

Admittedly, I hadn't been messing around with spec files too much, but recently I had to and I found some things that bit me and made me spend some time trying to figure them out. Here I'm talking about RPM version 4.x, which is still the default in most (all?) distros.

Specifically, my troubles were with the %if conditional macro, whose exact behavior and syntax are sparsely documented and may appear to do strange things at times, so I'll just put my findings here to remember them (and hopefully they may be useful to others).

It all started when I naively tried to use this conditional to check whether a macro was defined (as most documentation that can be found on the net seems to suggest):

%if %{mymacro}
# ... do something
%endif

When the macro was defined, that worked; when it was not defined, I was getting the dreaded error

error: parse error in expression
error: /usr/src/foo/foo.spec:25: parseExpressionBoolean returns -1

A similar situation happened with this test for equality:

%if %{mymacro} == 100
# ... do something
%else
# ... do something else
%endif

When the macro was undefined, I was expecting either the whole %if/%else block to be skipped, or at least the %else branch to be executed (since something that's undefined is clearly not equal to 100). What was happening instead, was again the same "parse error in expression" error as before.

Trying to make it work, I did this:

%if "%{mymacro}" == "100"
# ... do something
%else
# ... do something else
%endif

Which appeared to work indeed - but as we'll see it wasn't working for the reason I thought, and could even produce false matches under some special circumstances (see below).

It still wasn't satisfactory, because I felt I wasn't really understanding what was going on, so I had that feeling that it all was working by chance. And still, I hadn't found a way to check something as simple as whether a macro is defined or not.

Finally, it occurred to me to add the following line at the very beginning of the %prep section in the spec file to see what was going on:

echo "the value of mymacro is --%{mymacro}--"

That helped a lot, because I discovered that if the macro is defined the expansion is

the value of mymacro is --100--

but when the macro is undefined, this is what happens:

the value of mymacro is --%{mymacro}--

that is, it is left untouched (rather than removed). So this is the reason why adding double quotes to the test seems to work: when the macro is undefined, the test executed is

"%{mymacro}" == "100"

(double quotes included) which is false. However, should one some day need to test against the literal string %{mymacro}, the test would actually succeed when the macro is undefined! Let's confirm this:

# spec file
%if "%{mymacro}" == "%%{mymacro}"
echo "equality test succeeded"
%endif

%% is used to put a literal percent sign.

If we invoke the spec file without defining %{mymacro}, the test succeeds:

$ rpmbuild -ba foo.spec
...
+ echo 'mymacro is --%{mymacro}--'
mymacro is --%{mymacro}--
+ echo 'equality test succeeded'
equality test succeeded
...

Granted, this is not very likely to happen in practice, but it still feels somewhat unclean.

What's needed, then, is a real way to check whether a macro is defined or not. We can't rely on the equality test we used above to determine that either, because the macro may have been defined, and have the literal value %{mymacro} assigned to it. Again this is very unlikely, but it's always better to do things in a clean way when possible.

The solution: conditional expansion

(I don't know if that is the right definition for this feature). There's a useful tool that the macro language used in the spec files has, and it's the %{?mymacro:value1} and %{!?mymacro:value2} syntax. The result of the expansion of those macros is value1 if %{mymacro} is defined, and value2 if %{mymacro} is undefined, respectively. If the condition they check is false, they expand to nothing.

Since the macro processor is recursive, this allows for the conditional definition of macros, for example:

%{?onemacro:%define anothermacro 100}

That defines the macro %{anothermacro} only if %{onemacro} is defined. But let's focus on the task at hand: determining whether a macro is defined (and after that, possibly do further tests on its value).

If we only need to check whether a macro is defined or not, these idioms seem to work:

%if %{?mymacro:1}%{!?mymacro:0}
# ... %{mymacro} is defined
%else
# ... %{mymacro} is not defined
%endif

The above expands either to 1 or 0 depending on whether %{mymacro} is defined or not.
This also seems to work:

%if 0%{?mymacro:1}
# ... %{mymacro} is defined
%else
# ... %{mymacro} is not defined
%endif

That expands to either 01 or 0, again corresponding to true or false for the %if.
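The expansion rules observed so far can be captured in a toy model, which is handy for checking what an %if line will end up seeing. This is only a sketch based on the behaviour described in this article, not on rpm's source: it handles a single, non-nested level of %{name}, %{?name}, %{?name:value} and %{!?name:value}, ignores %% and everything else, and the names `expand` and `macros` are mine.

```python
import re

# optional "?" or "!?" test, a macro name, and an optional ":value" part
_MACRO = re.compile(r"%\{(!?\?)?(\w+)(?::([^{}]*))?\}")


def expand(text, macros):
    """Expand simple macro references in text using the dict `macros`."""
    def replace(match):
        test, name, value = match.group(1), match.group(2), match.group(3)
        defined = name in macros
        if test is None:
            # plain %{name}: an undefined macro is left untouched
            return macros[name] if defined else match.group(0)
        if test == "?":
            if not defined:
                return ""
            # %{?name} alone expands to the macro's own value
            return macros[name] if value is None else value
        # test == "!?": expands to value only when the macro is undefined
        if defined or value is None:
            return ""
        return value

    return _MACRO.sub(replace, text)


for macros in ({}, {"mymacro": "100"}):
    print(expand("%{?mymacro:1}%{!?mymacro:0}", macros),
          expand("0%{?mymacro:1}", macros),
          expand("--%{mymacro}--", macros))
```

With the macro undefined, both tests collapse to 0 while the plain reference survives unexpanded, which is what trips up the naive %if.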

If instead, we need to check that a macro has a specific value, this should work:

%if "%{?mymacro:%{mymacro}}%{!?mymacro:0}" == "somevalue"
# ... %{mymacro} is defined and has value "somevalue"
%endif

That expands to either the actual value of the macro, or 0 (in double quotes), either of which can safely be compared to "somevalue". If there can be no spaces in the values, the double quotes can be omitted. Obviously, the value returned when the macro is not defined doesn't have to be 0; it can be any value that's not somevalue.

Putting it all together, if we need to differentiate between the macro being unset or being set to a specific value, we can do this:

%if 0%{?mymacro:1}
# ... %{mymacro} is defined
%if "%{mymacro}" == "somevalue"
# ... %{mymacro} is defined and has value somevalue
%else
# ... %{mymacro} is defined, but has a value != somevalue
%endif
%else
# ... %{mymacro} is not defined
%endif

Again, if values can have no spaces, the double quotes can be omitted.

More elaborate variations of conditional expansion can be found in the official rpm documentation. For example, they use some sort of "function-like" macros as follows:

# Check if symbol is defined.
# Example usage: %if %{defined with_foo} && %{undefined with_bar} ...
%defined()      %{expand:%%{?%{1}:1}%%{!?%{1}:0}}
%undefined()    %{expand:%%{?%{1}:0}%%{!?%{1}:1}}

%{1} expands to the first "argument" passed to the macro. So when doing %{defined with_foo} what is done is actually

%{expand:%{?with_foo:1}%{!?with_foo:0}}

"expand" is like "eval", so the above effectively re-evaluates the part after expand: and eventually expands to either 1 or 0 depending on whether %{with_foo} was defined or not.

Update 27/09/2011: after some tests it looks like in conditional expansion the macro expands to its value by default if it's defined and to nothing if it's not, so the idiom

%{?mymacro:%{mymacro}}

can also be written just as

%{?mymacro}

Running local script remotely (with arguments)

So, we have a script we want or need to run on loads of remote servers, but we don't want to copy it to every server, to avoid a maintenance nightmare. (Let's not focus on why we may find ourselves in that situation; it may be because there is no choice, or whatever reason. Agreed it's undesirable. That's life.)
Another use case is as follows: we need to develop a script that will run on a machine where the editing tools aren't as good as those we are used to; so we want to develop on our familiar local machine, but once in a while, while developing it, we need to run it on the target machine (presumably to test it).

If the code to run is short and simple, we can just put it inline:

$ ssh user@remote 'our code here'

However this quickly becomes difficult to type, prone to errors, and proper quoting can be a nightmare too. So let's assume that the script is stored in a file.

Method 1: stdin

Well, ssh runs a shell on the remote host, so why not do

$ ssh user@remote < local.sh

Sure, that works and looks easy, right? But things start to change if we need to pass arguments to the script.

Let's use the following example script (real ones, of course, will be much more complex):

# local.sh
printf 'Argument is __%s__\n' "$@"

This code is representative of the task, because it lets us check that arguments are seen correctly by the script even when it runs remotely. This is the critical part; if that works, we don't have to worry about the rest of the code; that will mostly "just work" (with some caveats, noted at the end).

So, since the remote shell is reading stdin anyway, this should also work:

$ ssh user@remote 'bash' < local.sh
Argument is ____

And so should this (but doesn't):

$ ssh user@remote 'bash /dev/stdin' < local.sh
bash: /dev/stdin: No such device or address

What's happening here? Let's try to find out:

$ ssh user@remote 'ls -l /dev/stdin' < local.sh
lrwxrwxrwx 1 root root 15 2011-07-08 10:45 /dev/stdin -> /proc/self/fd/0
$ ssh user@remote 'ls -l /proc/self/fd/0' < local.sh
lrwx------ 1 user users 64 2011-08-10 11:04 /proc/self/fd/0 -> socket:[28463861]

So stdin exists but it's connected to a UNIX socket (this is part of how ssh sets things up when connecting). Why is it failing?

$ ssh user@remote 'strace bash /dev/stdin' < local.sh
...
open("/dev/stdin", O_RDONLY)            = -1 ENXIO (No such device or address)
...

Doing some research, it appears that Linux disallows open()ing a socket, and returns ENXIO when that is attempted. (And yes, "bash < /dev/stdin" fails equally). Can we work around that? Let's see if cat works:

$ ssh user@remote cat < local.sh
# local.sh
printf 'Argument is __%s__\n' "$@"

Predictably, it works since cat just reads its stdin (which is set up before it is run) without explicitly attempting to open() it. So we can use this to accomplish our goal:

$ ssh user@remote 'cat | bash /dev/stdin' < local.sh
Argument is ____

This trick turns bash's stdin into a pipe (rather than a socket), which it can thus open successfully.
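To see what a process has on its standard input without reaching for ls or strace, something like the following can be used. It is a minimal sketch: stdin_kind is a made-up helper name, and it relies on bash's test builtin checking file descriptor 0 when given /dev/stdin.

```shell
#!/bin/bash
# stdin_kind: hypothetical helper that reports what file descriptor 0 is.
stdin_kind() {
  if   [ -S /dev/stdin ]; then echo socket
  elif [ -p /dev/stdin ]; then echo pipe
  elif [ -f /dev/stdin ]; then echo file
  elif [ -c /dev/stdin ]; then echo chardev   # e.g. a terminal or /dev/null
  else echo other
  fi
}

# Local checks: a pipeline gives a pipe, /dev/null is a character device.
echo ignored | stdin_kind
stdin_kind < /dev/null
```

Run on the far side of an ssh connection (for example by sending the function definition along with "declare -f stdin_kind"), it should show whether the remote command got a socket or, behind "cat |", a pipe.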

Now, this may just look like a fancy way of rewriting the original attempt, but it has an important advantage: since /dev/stdin looks like the name of the script to run to the remote shell, it allows us to specify arguments after it, as follows:

$ ssh user@remote 'cat | bash /dev/stdin arg1 arg2 arg3' < local.sh
Argument is __arg1__
Argument is __arg2__
Argument is __arg3__

(Thanks to Stéphane Chazelas and Marcel Bruinsma for suggesting the above ideas during an old discussion on comp.unix.shell).
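As an aside, and not the approach followed in the rest of this article, bash's -s option also reads the script from standard input while accepting positional arguments, without ever open()ing /dev/stdin. A local sketch, where the pipe stands in for the ssh connection:

```shell
#!/bin/bash
# The script text arrives on stdin; "-s" makes bash read commands from
# stdin, and everything after "--" becomes the positional parameters.
script='printf "Argument is __%s__\n" "$@"'
out=$(printf '%s\n' "$script" | bash -s -- arg1 arg2 arg3)
printf '%s\n' "$out"
```

Over ssh this would presumably be spelled ssh user@remote 'bash -s -- arg1 arg2 arg3' < local.sh. Unlike the "cat |" form, it only works for interpreters that offer such an option.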

We're almost done. So far, we are hardcoding the arguments in the single-quoted string; it would be nice to have a way of putting variables there. We can create a wrapper script that does the hard work for us:

#!/bin/bash
# runremote.sh
# usage: runremote.sh remoteuser remotehost arg1 arg2 ...

realscript=local.sh
user=$1
host=$2
shift 2

ssh "$user@$host" 'cat | bash /dev/stdin' "$@" < "$realscript"

The expansion of "$@" is replaced by the actual arguments. Let's run it:

$ runremote.sh user remote arg1 arg2 arg3
Argument is __arg1__
Argument is __arg2__
Argument is __arg3__
$ runremote.sh user remote arg1 "arg2 with spaces" arg3
Argument is __arg1__
Argument is __arg2__
Argument is __with__
Argument is __spaces__
Argument is __arg3__

Ok, so it's not perfect yet. The problem is that with ssh, the supplied command string is (re)evaluated by the remote shell, and that turns what is meant to be a single argument "arg2 with spaces" into three separate arguments. For the same reason, there may also be problems with other characters that are special to the shell like globbing characters, escapes and quotes. So, the wrapper script needs to escape the arguments it's given before putting them into the command string for the remote ssh. Since we want the wrapper to be transparent, and want to be able to supply arbitrarily complex arguments, the task can rapidly become an escaping nightmare, which is one of the things we wanted to avoid in the first place.
Fortunately, bash has just the right feature for this: the builtin printf command supports the %q specifier:

%q     causes printf to output the corresponding argument in a format that can be reused as shell input.

Let's try it:

$ printf '%q\n' "argument with space"
argument\ with\ space
$ printf '%q\n' "argument with 'single quotes'"
argument\ with\ \'single\ quotes\'
$ printf '%q\n' 'argument with "double quotes"'
argument\ with\ \"double\ quotes\"
$ printf '%q\n' 'argument with *? glob and $ other ` special { chars'
argument\ with\ \*\?\ glob\ and\ \$\ other\ \`\ special\ \{\ chars
$ foo=$(printf '%q\n' 'argument with *? glob and $ other ` special { chars')
$ echo "$foo"
argument\ with\ \*\?\ glob\ and\ \$\ other\ \`\ special\ \{\ chars
$ eval echo "$foo"
argument with *? glob and $ other ` special { chars

Looks good. Since the positional parameters can't be assigned to individually, we can use an array to store the escaped versions, then use the special "${array[@]}" construct to pass them (which behaves the same as "$@"). We can also generalize the wrapper to accept the name of the local script to run remotely, so:

#!/bin/bash
# runremote.sh
# usage: runremote.sh localscript remoteuser remotehost arg1 arg2 ...

realscript=$1
user=$2
host=$3
shift 3

declare -a args

count=0
for arg in "$@"; do
  args[count]=$(printf '%q' "$arg")
  count=$((count+1))
done

ssh "$user@$host" 'cat | bash /dev/stdin' "${args[@]}" < "$realscript"

Let's try it:

$ runremote.sh local.sh user remote 'arg1 with spaces and "quotes"' 'arg2 with *? glob and $ other ` special { chars'
Argument is __arg1 with spaces and "quotes"__
Argument is __arg2 with *? glob and $ other ` special { chars__

Now runremote.sh can be used to run a local script remotely with arbitrary arguments. Of course, quoting and/or escaping must still be done correctly locally if needed, so that the script sees the intended number of arguments.
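Both the word-splitting problem and the %q fix can be reproduced without a remote host. The sketch below uses a made-up fake_ssh function as a stand-in for ssh; it assumes, as described above, that ssh joins its command arguments with spaces and hands the resulting string to a shell on the other side.

```shell
#!/bin/bash
# fake_ssh: hypothetical local stand-in for "ssh user@host command...".
# It joins its arguments with spaces and lets a new shell re-parse them.
fake_ssh() {
  shift              # drop the user@host argument
  bash -c "$*"
}

script='printf "Argument is __%s__\n" "$@"'

# Unescaped: the second argument is split into three words.
raw=$(printf '%s\n' "$script" |
  fake_ssh user@remote 'cat | bash /dev/stdin' arg1 'arg2 with spaces')

# Escaped with %q: the argument survives the second round of parsing.
escaped=$(printf '%s\n' "$script" |
  fake_ssh user@remote 'cat | bash /dev/stdin' arg1 "$(printf '%q' 'arg2 with spaces')")

printf 'raw:\n%s\nescaped:\n%s\n' "$raw" "$escaped"
```

The raw run reports four arguments, the escaped one reports two.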

Method 2: stdin, revisited

As a variation of the previous method, we could have the wrapper prepend some code to the local script so it magically finds its arguments already set, for example something like this:

#!/bin/bash
# runremote.sh
# usage: runremote.sh localscript remoteuser remotehost arg1 arg2 ...

realscript=$1
user=$2
host=$3
shift 3

# escape the arguments
declare -a args

count=0
for arg in "$@"; do
  args[count]=$(printf '%q' "$arg")
  count=$((count+1))
done

{
  printf '%s\n' "set -- ${args[*]}"
  cat "$realscript"
} | ssh "$user@$host" "cat | bash /dev/stdin"

Note the "${args[*]}" expansion. It is normally best avoided, but here it is exactly what we need: inside the double-quoted string it joins all the elements into a single argument, whereas "${args[@]}" would expand to multiple arguments, and printf would apply its format to each of them separately (not what we want here).
Again, this works:

$ runremote.sh local.sh user remote 'arg1 with spaces and "quotes"' 'arg2 with *? glob and $ other ` special { chars'
Argument is __arg1 with spaces and "quotes"__
Argument is __arg2 with *? glob and $ other ` special { chars__
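The prepended "set --" line can be tried locally by piping the generated text into a plain bash, which here plays the role of the remote shell. This is only an illustrative sketch; the sample arguments and the one-line script are made up.

```shell
#!/bin/bash
# Escape some sample arguments the same way the wrapper does.
args=()
for arg in 'arg1 with spaces and "quotes"' 'arg2 with *? glob'; do
  args+=("$(printf '%q' "$arg")")
done

# "${args[*]}" yields one line of text; "${args[@]}" would yield one line
# per element, because printf reuses its format for every extra argument.
star_lines=$(printf '%s\n' "set -- ${args[*]}" | wc -l)
at_lines=$(printf '%s\n' "set -- ${args[@]}" | wc -l)

# Prologue plus script body, fed to bash on stdin.
out=$({
  printf '%s\n' "set -- ${args[*]}"
  printf '%s\n' 'printf "Argument is __%s__\n" "$@"'
} | bash)
printf '%s\n' "$out"
```

Here star_lines is 1 and at_lines is 2, which is why the wrapper needs the [*] form.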

Method 3: copy-and-execute

This uses a different approach; the wrapper just copies the file to the remote machine, and runs it:

#!/bin/bash
# runremote.sh
# usage: runremote.sh localscript remoteuser remotehost arg1 arg2 ...

realscript=$1
user=$2
host=$3
shift 3

# escape the arguments
declare -a args

count=0
for arg in "$@"; do
  args[count]=$(printf '%q' "$arg")
  count=$((count+1))
done

scp -q "$realscript" "$user@$host:/some/where"
# use only the file name remotely, in case $realscript contains a local path
ssh "$user@$host" bash "/some/where/${realscript##*/}" "${args[@]}"

This does work; however, in my opinion it is less preferable because it leaves the file around on the remote machine. The wrapper could be changed to remove it after it's run, but it still looks less clean than the other methods: it needs a place to save the file remotely, and it makes multiple ssh connections every time (one to copy the file, one to run it, and optionally another to delete it).
However, it does have the advantage that the script's standard input remains available (see Caveats below).
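A variant that removes the remote copy afterwards might look like the following. It is a sketch under assumptions, not a tested replacement for the wrapper above: run_copied is a made-up name, the remote directory is passed as an extra argument, must already exist, and is assumed to contain no characters that are special to the shell.

```shell
#!/bin/bash
# run_copied: hypothetical helper, not part of the wrapper shown above.
# usage: run_copied localscript remoteuser remotehost remotedir arg1 arg2 ...
run_copied() {
  local script=$1 user=$2 host=$3 dir=$4
  shift 4
  local remote="$dir/${script##*/}"   # remote path: directory + file name
  local args=() arg rc

  # escape the arguments for the remote shell
  for arg in "$@"; do
    args+=("$(printf '%q' "$arg")")
  done

  scp -q "$script" "$user@$host:$dir/" || return
  ssh "$user@$host" bash "$(printf '%q' "$remote")" "${args[@]}"
  rc=$?                               # remember the script's exit status
  ssh "$user@$host" rm -f "$(printf '%q' "$remote")"
  return "$rc"
}
```

It would be called as, for instance, run_copied local.sh user remote /some/where arg1 'arg2 with spaces'. It still needs three connections per run, so the objections above remain.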

Caveats

The first thing to note is that the scripts that we are going to run, even if they reside on the local machine, need to behave correctly on the remote machine, so all the paths, commands invoked, temporary files and other references have to be valid on the remote machine, not the local one. Also, the features that a script uses must be supported by the remote shell that you invoke. This may seem obvious, but it is easily overlooked, especially if the script is being developed on the local machine.

The second thing to consider is that, if the script is run through the remote shell's standard input, it can't use commands that read from unredirected standard input; if it did, those commands would swallow part or all of the script themselves. So ensure that all such commands have their stdin appropriately redirected, or alternatively, use method three above ("copy-and-execute").

Update 26/08/2011: the methods that use stdin work fine with Perl too (mostly, the same caveats apply). So now the runremote.sh script can be made even more general by accepting an argument that specifies the command interpreter to run on the remote machine:

#!/bin/bash
# runremote.sh
# usage: runremote.sh localscript interpreter remoteuser remotehost arg1 arg2 ...

realscript=$1
interpreter=$2
user=$3
host=$4
shift 4

declare -a args

count=0
for arg in "$@"; do
  args[count]=$(printf '%q' "$arg")
  count=$((count+1))
done

ssh "$user@$host" "cat | ${interpreter} /dev/stdin" "${args[@]}" < "$realscript"