Port mirroring with Linux bridges

Many commercial switches allow replication of traffic from one or more ports to one designated port (usually chosen by the user) for monitoring and analysis purposes. Some models offer the option to choose whether to replicate only incoming or outgoing traffic (or both, of course).
Typical use cases for this are traffic analysis systems like IDS/IPS, but it can also be used for troubleshooting.

This feature goes by many names, among which are "SPAN", "port mirroring", "port monitoring", "monitor mode", "roving" and surely others. Although the actual setup procedures vary from vendor to vendor (or even from model to model), what they do in the end is the same.
There can be differences, however, in the way tagged (ie VLAN) packets are mirrored; in some cases, VLAN tags are stripped from the mirrored copy.

Since Linux implements at least two types of bridging (nowadays used mostly to create virtual networks to connect virtual machines), one may wonder whether port mirroring is possible. The answer is yes, although the procedures may be a bit tricky. So let's see how to set up port mirroring under Linux with the two prevailing bridging implementations (Openvswitch and in-kernel bridging), plus another kludge at the end.

Openvswitch

Let's start with Openvswitch, the (by now not so) new, multiplatform, all-singing, all-dancing bridge implementation.

Extremely simplified, openvswitch uses a kernel module to manage the data path (ie, the actual forwarding of frames), and keeps everything else in user space. A daemon (ovs-vswitchd) manages the switch operations (a single instance can manage multiple bridges, so only one daemon needs to run), and another daemon (ovsdb-server) manages the database which contains the various tables that make up the configuration(s) for all the bridges managed by ovs-vswitchd.

Each of these two functions is driven by a corresponding protocol: OpenFlow for the management of flows and data paths (not mandatory), and OVSDB for the management of the switch itself (to add/remove ports, interfaces, bridges etc., and for configuration in general).

In fact, a basic installation of openvswitch runs a local OVSDB daemon, and all the various ovs-vsctl management commands (including those shown below) connect to this local OVSDB instance via a UNIX socket, asking it to carry out the tasks.

So we have our bridge ovsbr0, with three VMs connected respectively to vnet0, vnet1 and vnet2 (of course, everything remains perfectly valid and applicable if we have real physical interfaces instead).

# ovs-vsctl show
...
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "vnet2"
            Interface "vnet2"
        Port "vnet1"
            Interface "vnet1"
        Port "vnet0"
            Interface "vnet0"
...
# ovs-vsctl list bridge ovsbr0
_uuid               : 0141452d-efc1-47f8-a3b4-24f0c2bc1c36
controller          : []
datapath_id         : "00002e454101f847"
datapath_type       : ""
external_ids        : {}
fail_mode           : []
flood_vlans         : []
flow_tables         : {}
ipfix               : []
mirrors             : []
name                : "ovsbr0"
netflow             : []
other_config        : {}
ports               : [1d1da575-73ac-4bac-8e81-1042da415103, a8333e72-cb12-4777-bf55-e339ff41ece1, ccd87251-f61f-47ff-84f3-9e8864e6c2d8, f66298f8-02e8-48cc-a2c8-92181bea2c56]
protocols           : []
sflow               : []
status              : {}
stp_enable          : false

A thing to note (besides the awkward command names, that is) is that in openvswitch, absolutely everything that can be referenced has a UUID (this is by design).
In this case, we see that the switch has three ports (plus the "internal" port that is created by default), whose UUIDs are as shown in the ports field (which is a list of values).
(Each port, in turn, may be and usually is composed of one or more interfaces, which are also objects and have their own UUIDs, but that's not relevant here).

Just to get an idea, to get the actual UUIDs of our ports we can use this command:

# for p in vnet{0..2}; do echo "$p: $(ovs-vsctl get port "$p" _uuid)"; done
vnet0: f66298f8-02e8-48cc-a2c8-92181bea2c56
vnet1: ccd87251-f61f-47ff-84f3-9e8864e6c2d8
vnet2: a8333e72-cb12-4777-bf55-e339ff41ece1

To do mirroring with openvswitch, the first thing to do is to create and add a mirror (doh!) to the bridge.

# ovs-vsctl -- --id=@m create mirror name=mymirror -- add bridge ovsbr0 mirrors @m
cd94ea72-bb7f-4a26-816f-983a085a4bfd

The syntax may look a bit awkward, but it's not complicated (and it's well explained in the ovs-vsctl man page). We're running two commands at once; each command is introduced by --. The first command creates a mirror named mymirror and, thanks to the --id=@m part, saves its UUID in the "variable" @m, which remains available for later commands. And we use it indeed in the second command, which associates the newly-created mirror mymirror with the bridge ovsbr0.

As said, everything has a UUID, and mirrors are no exception: the UUID of the new mirror is output as a result of the (successful) command. Let's check:

# ovs-vsctl list bridge ovsbr0
_uuid               : 0141452d-efc1-47f8-a3b4-24f0c2bc1c36
controller          : []
datapath_id         : "00002e454101f847"
datapath_type       : ""
external_ids        : {}
fail_mode           : []
flood_vlans         : []
flow_tables         : {}
ipfix               : []
mirrors             : [cd94ea72-bb7f-4a26-816f-983a085a4bfd]
name                : "ovsbr0"
netflow             : []
other_config        : {}
ports               : [1d1da575-73ac-4bac-8e81-1042da415103, a8333e72-cb12-4777-bf55-e339ff41ece1, ccd87251-f61f-47ff-84f3-9e8864e6c2d8, f66298f8-02e8-48cc-a2c8-92181bea2c56]
protocols           : []
sflow               : []
status              : {}
stp_enable          : false

So everything as before, but now our bridge has the mirror (since it's a list, as shown by the fact that it's in square brackets, there can be more than one).

Now that we have our mirror created and in the bridge, we should configure its source ports and destination ports. We want to mirror all traffic going in/out port vnet0, and we want to send it to bridge port vnet2 (where presumably we have a traffic monitoring application).

We must be careful with the terminology here. A mirror has a set of "source" and "destination" ports, but these only select the traffic to be mirrored; they have nothing to do with where the mirrored copy is sent. If a port is included in the source port set (select_src_port in openvswitch terms), packets entering the bridge through it will be mirrored; if it's included in the destination port set (select_dst_port), packets leaving the bridge through it will be mirrored. So if we want to mirror both incoming and outgoing traffic for vnet0, we must include it in both sets:

# f66298f8-02e8-48cc-a2c8-92181bea2c56 is the UUID of vnet0
# ovs-vsctl set mirror mymirror select_src_port=f66298f8-02e8-48cc-a2c8-92181bea2c56 select_dst_port=f66298f8-02e8-48cc-a2c8-92181bea2c56
# ovs-vsctl list mirror mymirror
_uuid               : cd94ea72-bb7f-4a26-816f-983a085a4bfd
external_ids        : {}
name                : mymirror
output_port         : []
output_vlan         : []
select_all          : false
select_dst_port     : [f66298f8-02e8-48cc-a2c8-92181bea2c56]
select_src_port     : [f66298f8-02e8-48cc-a2c8-92181bea2c56]
select_vlan         : []
statistics          : {}

Thanks to the previously introduced --id=@name feature, we could have done the same thing without having to specify the actual UUID of vnet0:

# ovs-vsctl -- --id=@vnet0 get port vnet0 -- set mirror mymirror select_src_port=@vnet0 select_dst_port=@vnet0

In general, this syntax is both clearer and easier, so we're going to use it for the remaining steps.

If we wanted to mirror both vnet0 and vnet1 in both directions, we would do:

# ovs-vsctl \
  -- --id=@vnet0 get port vnet0 \
  -- --id=@vnet1 get port vnet1 \
  -- set mirror mymirror 'select_src_port=[@vnet0,@vnet1]' 'select_dst_port=[@vnet0,@vnet1]'

So the trick is to populate select_src_port and select_dst_port with the (list(s) of) UUIDs of the ports that we're interested in.

So far we've told openvswitch which port(s) we want to mirror, but we haven't said yet to which port we want to send this mirrored traffic. That is the purpose of the output_port attribute, which again is the UUID of the port which will receive the mirrored traffic. In our case, we know that this port is vnet2, so here's how we add it:

# ovs-vsctl -- --id=@vnet2 get port vnet2 -- set mirror mymirror output-port=@vnet2
# ovs-vsctl list mirror mymirror
_uuid               : cd94ea72-bb7f-4a26-816f-983a085a4bfd
external_ids        : {}
name                : mymirror
output_port         : a8333e72-cb12-4777-bf55-e339ff41ece1
output_vlan         : []
select_all          : false
select_dst_port     : [f66298f8-02e8-48cc-a2c8-92181bea2c56]
select_src_port     : [f66298f8-02e8-48cc-a2c8-92181bea2c56]
select_vlan         : []
statistics          : {}

So if we now go to our VM connected to vnet2 we're going to see the mirrored traffic from vnet0. Try it and see.
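For a quick check, something like this inside that VM (assuming its NIC shows up as eth0 there) should display vnet0's traffic:

# tcpdump -e -n -i eth0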

Now that we have seen the step-by-step procedure, it should not come as a surprise that we could also have done all the above in a single command (reformatted for clarity):

# ovs-vsctl \
  -- --id=@m create mirror name=mymirror \
  -- add bridge ovsbr0 mirrors @m \
  -- --id=@vnet0 get port vnet0 \
  -- set mirror mymirror select_src_port=@vnet0 select_dst_port=@vnet0 \
  -- --id=@vnet2 get port vnet2 \
  -- set mirror mymirror output-port=@vnet2
cd94ea72-bb7f-4a26-816f-983a085a4bfd

A quick and dirty way to mirror all traffic passing through the bridge to a given port is to use the select_all property of the mirror:

# ovs-vsctl -- --id=@vnet2 get port vnet2 -- set mirror mymirror select_all=true output-port=@vnet2
# ovs-vsctl list mirror mymirror
_uuid               : cd94ea72-bb7f-4a26-816f-983a085a4bfd
external_ids        : {}
name                : mymirror
output_port         : a8333e72-cb12-4777-bf55-e339ff41ece1
output_vlan         : []
select_all          : true
select_dst_port     : []
select_src_port     : []
select_vlan         : []
statistics          : {tx_bytes=216769, tx_packets=1400}

Openvswitch mirrors preserve VLAN tags, so the traffic is received untouched.

To remove a specific mirror, the following command can be used:

# ovs-vsctl -- --id=@m get mirror mymirror -- remove bridge ovsbr0 mirrors @m

To remove all existing mirrors from a bridge:

# ovs-vsctl clear bridge ovsbr0 mirrors

Traditional bridging

Before Openvswitch came about, Linux already had (and of course still has) in-kernel bridging, which has been around since about forever.
This is a much simpler yet functional bridge implementation in the Linux kernel, which provides basic functionality like STP but not much more. In particular, there is no native port mirroring functionality.
But fear not: Linux has a powerful tool which, among a lot of other things, can also mirror traffic. We're talking of the traffic control subsystem (tc for short), which can do all sorts of magic things.
Since it's a generic framework, its capabilities (including mirroring) are not limited to bridges; this means that we can mirror traffic for any interface(s) and send it to any other(s), regardless of whether they are physical, virtual, part of a bridge or not, etc.

Indeed, for this example we're going to mirror the incoming/outgoing traffic of the interface bond0 and have it copied to the dummy interface dummy0 (very useful for testing). Replace with vnetx/vifx.y/whatever as needed. It works just the same.

First a very brief and simplified recap, since tc is very akin to a black art. Every interface, in Linux, has a so-called queuing discipline (qdisc), which basically defines the criteria used to send packets out the interface. This applies to outgoing packets; it is also possible, although not usually done, to set a qdisc for incoming traffic. Its usefulness is somewhat limited, but it is definitely used for mirroring.
These qdiscs are usually referred to as the "root qdisc" (for outgoing traffic) and the "ingress qdisc" (for incoming traffic).
So the idea is: to mirror the traffic for an interface, we configure the relevant qdisc (root and/or ingress) to mirror packets before doing anything else.

To do this, we need to attach a classifier (filter in tc speak) to the relevant qdisc. Simply put, a filter tries to match packets according to some criteria and, if the match succeeds, performs certain actions on them.

Let's start with the code to mirror incoming traffic for an interface, which is simpler. The first thing to do is to establish an ingress qdisc for the interface, as there's none by default:

# tc qdisc add dev bond0 ingress

This creates an ingress qdisc for bond0 and gives it the ffff: identifier (it's always ffff:, for any interface, so no surprises):

# tc qdisc show dev bond0
qdisc ingress ffff: parent ffff:fff1 ----------------

Now, as said, we attach a filter to it. This filter simply matches all packets, and mirrors them to dummy0. A filter is attached to a qdisc, so it must have a reference to the parent. Here's the syntax to create the filter:

# tc filter add dev bond0 parent ffff: \
    protocol all \
    u32 match u8 0 0 \
    action mirred egress mirror dev dummy0

The syntax is arcane (and, in this case, not really immediately understandable), but there are basically 3 parts. Let's break it down. The first part is the filter creation linked to the parent qdisc for interface bond0:

tc filter add dev bond0 parent ffff:

Then come the matching rules; first, we say that the match should be attempted on any protocol, since we want all the traffic:

protocol all

This is not yet part of the actual filter; it's just part of the syntax that tc needs to know which packets it should attempt to apply actual matching rules to (ok, it is effectively a filter, but not in the tc sense).
Then we give the actual filter rule:

u32 match u8 0 0

This is the syntax used to tell the u32 filter that, of the packets it's seeing (that is, all of them), all should be matched. "u32" informs the parser that a u32 match follows, and the actual matching happens in the "u8 0 0" part, which, in simple language, returns true if the first byte of the packet (u8), ANDed with 0, gives 0. Some basic knowledge of bitwise operations tells us that X AND 0 == 0 for any X, so the match is always true.

Finally, the third part of the command specifies the action that is to be executed on matching packets (again, all of them):

action mirred egress mirror dev dummy0

Here we use the mirred action, which basically has two modes of operation: mirror (which is what we want here) to, er, mirror the packet, and redirect, to, uhm, redirect it. Both do their job using the device specified in the "dev" argument. As for the "egress" part, that's the only supported mode as of this writing.
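For comparison, a redirect variant would look like this; note that it's only a sketch to show the syntax, since redirecting (rather than copying) the packets to dummy0 would steal them from bond0:

# tc filter add dev bond0 parent ffff: \
    protocol all \
    u32 match u8 0 0 \
    action mirred egress redirect dev dummy0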

If we wanted to mirror to multiple devices, all we would have to do is to specify multiple actions:

action mirred egress mirror dev dummy0 \
action mirred egress mirror dev dummy1 ...
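Also, nothing forces us to match all the traffic: both the protocol selector and the u32 match can be more specific. For example, this sketch mirrors IPv4 packets only (protocol ip restricts the filter to IPv4 frames, and the always-true u32 match then accepts all of them):

# tc filter add dev bond0 parent ffff: \
    protocol ip \
    u32 match u32 0 0 \
    action mirred egress mirror dev dummy0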

So if you've made it this far, you'll be happy to know that applying these rules for outgoing traffic is almost the same, just a bit more complicated. The thing is, unlike the ingress case, interfaces normally do have an egress (outgoing) qdisc, but we can't attach filters directly to it since it's a classless qdisc ("classless" just means that it can't have "child" classes and filters). So the first thing to do is to add a classful egress qdisc; once we've done that, the filter is attached in the same way as for the ingress qdisc.
As a side note, the mq qdisc found in wireless interfaces, despite claiming to be classful, doesn't seem to support direct filter attachment.

If we add a classful qdisc, we should decide which one to use, since there are a few of them. The most common ones are PRIO, CBQ and HTB. Of these, the simplest is PRIO, which is what we're going to use for our example. So without further ado, let's add our classful egress qdisc to our interface:

# tc qdisc add dev bond0 handle 1: root prio

We choose to give it the handle 1:; we could just as well have used 100: or 42:, it doesn't matter as long as we use the same number when attaching the filter.
Once we have a classful qdisc to play with, we can finally attach the filter to it, exactly in the same way as we did for the ingress qdisc:

# tc filter add dev bond0 parent 1: \
    protocol all \
    u32 match u8 0 0 \
    action mirred egress mirror dev dummy0

Now, let's bring dummy0 up and check:

# ip link set dummy0 up
# tcpdump -e -v -n -i dummy0
tcpdump: WARNING: dummy0: no IPv4 address assigned
tcpdump: listening on dummy0, link-type EN10MB (Ethernet), capture size 65535 bytes
18:56:41.237966 00:13:72:af:11:23 > 00:16:3e:fd:aa:67, ethertype IPv4 (0x0800), length 153: (tos 0x0, ttl 64, id 57195, offset 0, flags [DF], proto TCP (6), length 139)
    192.168.1.3.17569 > 192.168.1.232.514: Flags [P.], cksum 0x84b9 (correct), seq 3603440679:3603440766, ack 1213686729, win 229, options [nop,nop,TS val 1217617195 ecr 69837571], length 87
18:56:41.238131 00:16:3e:fd:aa:67 > 00:13:72:af:11:23, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 51990, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.1.232.514 > 192.168.1.3.17569: Flags [.], cksum 0x9889 (correct), ack 87, win 1307, options [nop,nop,TS val 69844202 ecr 1217617195], length 0
...
18:57:06.687832 00:26:b9:72:16:99 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 14, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.7.1.1 is-at 00:26:b9:72:16:99, length 46

As can be seen above, VLAN tags are copied.

So to sum it up, here's how to enable bidirectional mirroring from bond0 to dummy0:

sif=bond0
dif=dummy0

# ingress
tc qdisc add dev "$sif" ingress
tc filter add dev "$sif" parent ffff: \
          protocol all \
          u32 match u8 0 0 \
          action mirred egress mirror dev "$dif"

# egress
tc qdisc add dev "$sif" handle 1: root prio
tc filter add dev "$sif" parent 1: \
          protocol all \
          u32 match u8 0 0 \
          action mirred egress mirror dev "$dif"

Of course, to mirror traffic for multiple source interfaces, the above (all or only half of it, depending on whether we want traffic in both or only one direction) should be repeated for each of them.
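For example, a sketch that repeats the full bidirectional setup for a handful of source interfaces (the names are placeholders):

dif=dummy0
for sif in vnet0 vnet1 vnet2; do
  tc qdisc add dev "$sif" ingress
  tc filter add dev "$sif" parent ffff: protocol all u32 match u8 0 0 \
            action mirred egress mirror dev "$dif"
  tc qdisc add dev "$sif" handle 1: root prio
  tc filter add dev "$sif" parent 1: protocol all u32 match u8 0 0 \
            action mirred egress mirror dev "$dif"
done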

To remove the mirroring, it's enough to delete the root and ingress qdiscs from all the involved source interfaces (the default root qdisc will be restored automatically):

tc qdisc del dev bond0 ingress
tc qdisc del dev bond0 root

Daemonlogger

So, just for the sake of it, let's see another method to mirror traffic under Linux.

There's a nice utility called daemonlogger, which, according to its description, "is able to log packets to file or mirror to another interface", which sounds just like what we are looking for. Debian has it in its standard repositories.

A quick read of the man page shows that we can use it as follows:

# daemonlogger -i bond0 -o dummy0
[-] Interface set to bond0
[-] Log filename set to "daemonlogger.pcap"
[-] Tap output interface set to dummy0
[-] Pidfile configured to "daemonlogger.pid"
[-] Pidpath configured to "/var/run"
[-] Rollover size set to 18446744071562067968 bytes
[-] Rollover time configured for 0 seconds
[-] Pruning behavior set to oldest IN DIRECTORY

-*> DaemonLogger <*-
Version 1.2.1
By Martin Roesch
(C) Copyright 2006-2007 Sourcefire Inc., All rights reserved

sniffing on interface bond0

At this point, tcpdump on dummy0 gives us all the traffic of bond0. Admittedly, less sophisticated than both Openvswitch and tc, but definitely much more "quick and dirty". It's also worth mentioning that it supports BPF filters just like tcpdump, so traffic can be filtered out before mirroring.
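For instance, assuming the filter is given as trailing arguments in the usual tcpdump fashion, something like this should mirror only TCP port 514 traffic:

# daemonlogger -i bond0 -o dummy0 tcp port 514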
Nevertheless, a word of caution: the README file says, at the end:

This code is largely untested and probably completely shoddy.

Poor man’s directory tree replication

So you have this /var/lib/mysql directory that you need to copy to three other machines. A quick and dirty solution is to use ssh and tee (it goes without saying that passwordless ssh is needed, here and for all the other examples):

$ tar -C /var/lib/mysql -cvzf - . |\
  tee >(ssh dstbox1 'tar -C /var/lib/mysql/ -xzvf -') \
      >(ssh dstbox2 'tar -C /var/lib/mysql/ -xzvf -') \
      >(ssh dstbox3 'tar -C /var/lib/mysql/ -xzvf -') > /dev/null

If the directory tree to be transferred is not local, it is again possible to use ssh to get to it:

$ ssh srcbox 'tar -C /var/lib/mysql -cvzf - .' |\
  tee >(ssh dstbox1 'tar -C /var/lib/mysql/ -xzvf -') \
      >(ssh dstbox2 'tar -C /var/lib/mysql/ -xzvf -') \
      >(ssh dstbox3 'tar -C /var/lib/mysql/ -xzvf -') > /dev/null

This means that all the data flows from the source, through the machine where the pipeline runs, to the targets. On the other hand this solution has the advantage that there is no need to set up passwordless ssh between the origin and the target(s); the only machine that needs passwordless ssh to all the others is the machine where the command runs.

Now this is all basic stuff, but after doing this I wondered whether it would be possible to generalize the logic to a variable number of target machines, so that for example a nettar-style operation would be possible, as in

$ nettar2.sh /var/lib/mysql dstbox1:/var/lib/mysql dstbox2:/var/tmp dstbox3:/var/lib/mysql ...

This would mean: take (local) /var/lib/mysql and replicate it to dstbox1 under /var/lib/mysql, to dstbox2 under /var/tmp, to dstbox3 under /var/lib/mysql, and so on for any extra arguments supplied. Arguments could have the form targetname:[targetpath], with a missing targetpath indicating the same path as the source (ie, /var/lib/mysql in this example).

It turns out that such a generalization is not easy.

Note that in the following code, all error checking and other refinements are omitted for simplicity. In particular, care should be taken at least to:

  • validate the arguments passed to the script for number (at least two) and correct syntax
  • check that paths exist (or create them if not, etc)
  • properly escape arguments to commands that are executed using ssh (for example using printf %q)
  • validate data that is used to dynamically build commands to be run with eval

None of the above is done in the code that follows.

Concurrent transfers

An obvious way to do it is to run three (or however many) concurrent transfers, eg

#!/bin/bash
 
# syntax: $0 /src/dir dstbox1:[/dst/dir] [ dstbox2:[/dst/dir] dstbox3:[/dst/dir] ... ]
# parallel transfers
 
srcpath=$1
shift
 
for arg in "$@"; do
  dstbox=${arg%:*}
  dstpath=${arg#*:}
  [ -n "$dstpath" ] || dstpath=$srcpath
  tar -C "$srcpath" -cvzf - . | ssh "$dstbox" "tar -C '$dstpath' -xvzf -" &
done
 
wait

This obviously simply reads $srcpath multiple times and transfers it to each target machine; we are not exploiting the data duplication done by tee. If the source directory is huge, this will not be efficient, as multiple processes will try to read it at once; although the OS will probably cache most of it, it doesn't look like a satisfactory solution.

So what if we actually want to use tee (which in turn implies that we need process substitution or an equivalent facility)?

Using eval

The first thing that comes to mind is to use the questionable eval command:

#!/bin/bash
 
# syntax: $0 /src/dir dstbox1:[/dst/dir] [ dstbox2:[/dst/dir] dstbox3:[/dst/dir] ... ]
# using tee + eval
 
do_sshtar(){
  local dstbox=$1 dstpath=$2
  ssh "$dstbox" "tar -C '$dstpath' -xvzf -"
}
 
declare -a args
 
srcpath=$1
shift
 
for arg in "$@"; do
  dstbox=${arg%:*}
  dstpath=${arg#*:}
  [ -n "$dstpath" ] || dstpath=$srcpath
  args+=( ">(do_sshtar '$dstbox' '$dstpath')" )
done
 
tar -C "$srcpath" -cvzf - . | eval tee "${args[@]}" ">/dev/null"

This effectively builds the full list of process substitutions at runtime and executes them. However, when using eval we should be well aware of what we're doing. See the following pages for a good discussion of the implications of using eval: http://mywiki.wooledge.org/BashFAQ/048 and http://wiki.bash-hackers.org/commands/builtin/eval.

Note that with process substitution there is also the (in this case minor) issue that the created processes are run asynchronously in background, and we have no way to wait for their full termination (not even using wait), so the script might give us back the prompt slightly before all the background processes have fully completed their job.

Coprocesses

Bash and other shells have coprocesses (see also here), so it would seem that they could be useful for our purposes.
However, at least in bash, it seems that it's not possible to create a coprocess whose name is stored in a variable (which is how we would create a bunch of coprocesses programmatically), eg:

$ coproc foo { command; }      # works
$ cname=foo; coproc $cname { command; }  # does not work as expected (creates a coproc literally named $cname)

So to use coprocesses for our task, we would need again to resort to eval.
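That is, something along these lines (a sketch, where command is just a stand-in as above; the variable is expanded before the coproc keyword is parsed):

cname=foo
eval "coproc $cname { command; }"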

Named pipes

Let's see if there is some other possibility. Indeed there is, and it involves using named pipes (aka FIFOs):

#!/bin/bash
 
# syntax: $0 /src/dir dstbox1:[/dst/dir] [ dstbox2:[/dst/dir] dstbox3:[/dst/dir] ... ]
# using tee + FIFOs (ssh version)
 
declare -a fifos
 
srcpath=$1
shift
 
count=1
for arg in "$@"; do
  dstbox=${arg%:*}
  dstpath=${arg#*:}
  [ -n "$dstpath" ] || dstpath=$srcpath
  curfifo=/tmp/FIFO${count}
  mkfifo "$curfifo"
  fifos+=( "$curfifo" )
  ssh "$dstbox" "tar -C '$dstpath' -xvzf -" < "$curfifo" &
  ((count++))
done
 
tar -C "$srcpath" -cvzf - . | tee -- "${fifos[@]}" >/dev/null
 
wait
# cleanup the FIFOs
rm -- "${fifos[@]}"

Here we're creating N named pipes, whose names are saved in an array, and an instance of ssh + tar to the target machine is launched in the background reading from each pipe. Finally, tee is run against all the existing named pipes to send them the data; all the FIFOs are removed at the end.
This is not too bad, but we have to manually set up the interprocess communication (ie, create/delete the FIFOs); the beauty of process substitution is that bash sets up those channels for us, and here we're not taking advantage of that.

A point to note is that here we used ssh for the data transfer; it's always possible to change the code to use netcat, as explained in the nettar article. Here's an adaptation of the last example to use the nettar method (the other cases are similar):

#!/bin/bash
 
# syntax: $0 /src/dir dstbox1:[/dst/dir] [ dstbox2:[/dst/dir] dstbox3:[/dst/dir] ... ]
# using tee + FIFOs (netcat version)
 
declare -a fifos
 
srcpath=$1
shift
 
count=1
for arg in "$@"; do
  dstbox=${arg%:*}
  dstpath=${arg#*:}
  [ -n "$dstpath" ] || dstpath=$srcpath
 
  if ssh "$dstbox" "cd '$dstpath' || exit 1; { nc -l -p 1234 | tar -xvzf - ; } </dev/null >/dev/null 2>&1 &"; then
    curfifo=/tmp/FIFO${count}
    mkfifo "$curfifo"
    fifos+=( "$curfifo" )
    nc "$dstbox" 1234 < "$curfifo" &
    ((count++))
  else
    echo "Warning, skipping $dstbox" >&2   # or whatever
  fi
done
 
tar -C "$srcpath" -cvzf - . | tee -- "${fifos[@]}" >/dev/null
 
wait
# cleanup the FIFOs
rm -- "${fifos[@]}"

There should be some other way. I'll update the list if I discover some other method. As always, suggestions welcome.

Recursion

Update 19/05/2014: Marlon Berlin suggested (thanks) that recursion could be used to build an implicit chain of >(...) process substitutions, and indeed that's true. So here it is:

#!/bin/bash
 
# syntax: $0 /src/dir dstbox1:[/dst/dir] [ dstbox2:[/dst/dir] dstbox3:[/dst/dir] ... ]
# using recursion (ssh version)
 
do_sshtar(){
 
  local dstbox=${1%:*} dstpath=${1#*:}
  [ -n "$dstpath" ] || dstpath=$srcpath
  shift
 
  if [ $# -eq 0 ]; then
    # end recursion
    ssh "$dstbox" "tar -C '$dstpath' -xzvf -"
  else
    # send data to "current" $dstbox and recurse
    tee >(ssh "$dstbox" "tar -C '$dstpath' -xzvf -") >(do_sshtar "$@") >/dev/null
  fi
}
 
srcpath=$1
shift
 
tar -C "$srcpath" -czvf - . | do_sshtar "$@"

When the do_sshtar function receives only one argument, it just transfers the data directly via ssh to terminate the recursion. Otherwise, it uses tee to transfer the data and continue the recursion. Simple and elegant. Here's the netcat version:

#!/bin/bash
 
# syntax: $0 /src/dir dstbox1:[/dst/dir] [ dstbox2:[/dst/dir] dstbox3:[/dst/dir] ... ]
# using recursion (netcat version)
 
do_nctar(){
 
  local dstbox=${1%:*} dstpath=${1#*:}
  [ -n "$dstpath" ] || dstpath=$srcpath
  shift
 
  # set up listening nc on $dstbox
  if ssh -n "$dstbox" "cd '$dstpath' || exit 1; { nc -l -p 1234 | tar -xvzf - ; } </dev/null >/dev/null 2>&1 &"; then
    if [ $# -eq 0 ]; then
      # end recursion
      nc "$dstbox" 1234
    else
      # send data to "current" $dstbox and recurse
      tee >(nc "$dstbox" 1234) >(do_nctar "$@") >/dev/null
    fi
  else
    echo "Warning, skipping $dstbox" >&2
    # one way or another, we must consume the input
    if [ $# -eq 0 ]; then
      cat > /dev/null
    else
      do_nctar "$@"
    fi
  fi
}
 
srcpath=$1
shift
 
tar -C "$srcpath" -czvf - . | do_nctar "$@"

The -n switch to ssh is important, otherwise it will try to read from stdin, consuming our tar data.

Many ways to encrypt passwords

Specifically, using crypt(3). One typical use case is, you have a plaintext password and need the damn full thing to put into the /etc/shadow file, which nowadays is usually something like:

$5$sOmEsAlT$pKHkGjoFXUgvUv.UYQuekdpjoZx7mqXlIlKJj6abik7   # sha-256

or

$6$sOmEsAlT$F3DN61SEKPHtTeIzgzyLe.rpctiym/qxz5xQz9YM.PyTdH7R13ZDXj6sDMeZg5wklbYJYSqDBXcH4UnAWQrRN0   # sha-512

The input to the crypt(3) library function is a cleartext password and a salt. Here we assume the salt is provided, but it's easy to generate a random one (at least one that's "good enough").
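For instance, a quick sketch that builds a random 16-character salt out of the valid salt alphabet ([a-zA-Z0-9./]):

salt=$(tr -dc 'a-zA-Z0-9./' < /dev/urandom | head -c 16)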

In the case of sha-256 and sha-512 hashes (identified respectively by the $5$ and $6$ in the first field; together with the old md5, which uses code $1$, these are the only ones supported by Linux), the salt can be augmented by prepending the rounds=<N>$ directive, to change the default number of rounds used by the algorithm, which is 5000. So for example we could supply a salt like

rounds=500000$sOmEsAlT

and thus use 500000 rounds (this is called stretching and is used to make brute force attacks harder). If the rounds= argument is specified, the output of crypt() includes it as well, since its value must be known every time the hash is recalculated.

It seems there's no utility to directly get the hash string (there used to be a crypt(1) command, which however had troubles related to the export of cryptographic software that made it so weak that many distros stopped shipping it). So we'll have to find some command that calls the crypt(3) function.

In the following examples, we assume the algorithm number, the salt and the password are stored in the shell variables $alg, $salt and $password respectively:

alg=6
salt='rounds=500000$sOmEsAlT'
password='password'

This way, the code doesn't hardcode anything and can be reused.

Perl

$ perl -e 'print crypt($ARGV[1], "\$" . $ARGV[0] . "\$" . $ARGV[2]), "\n";' "$alg" "$password" "$salt"
$6$rounds=500000$sOmEsAlT$Rf3.xi9RRiCW/FTh4gp67TSLyKotq1QkGkbn0O6cYDYEExwrFE30zeKGDIaZ3TZ.RDwiNya5nKlPDRTA0U4E8/
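As a side note, the same one-liner can be used to verify a password against an existing hash, exploiting the fact that crypt() takes all its parameters from the salt string: passing the full hash as the salt must reproduce the hash itself if the password matches:

$ hash='$6$rounds=500000$sOmEsAlT$Rf3.xi9RRiCW/FTh4gp67TSLyKotq1QkGkbn0O6cYDYEExwrFE30zeKGDIaZ3TZ.RDwiNya5nKlPDRTA0U4E8/'
$ perl -e 'print crypt($ARGV[0], $ARGV[1]) eq $ARGV[1] ? "match\n" : "no match\n";' "$password" "$hash"
match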

Python

# python 2/3
$ python -c 'import crypt; import sys; print (crypt.crypt(sys.argv[2],"$" + sys.argv[1] + "$" + sys.argv[3]))' "$alg" "$password" "$salt"
$6$rounds=500000$sOmEsAlT$Rf3.xi9RRiCW/FTh4gp67TSLyKotq1QkGkbn0O6cYDYEExwrFE30zeKGDIaZ3TZ.RDwiNya5nKlPDRTA0U4E8/

MySQL

Yes, MySQL has a built-in function that uses crypt(3):

$ mysql -B -N -e "select encrypt('$password', '\$$alg\$$salt');" 
$6$rounds=500000$sOmEsAlT$Rf3.xi9RRiCW/FTh4gp67TSLyKotq1QkGkbn0O6cYDYEExwrFE30zeKGDIaZ3TZ.RDwiNya5nKlPDRTA0U4E8/

Obviously, extra care should be taken with this one if $password or $salt contain quotes or other characters that are special to MySQL.

Php

$ php -r 'echo crypt($argv[2], "\$" . $argv[1] . "\$" . $argv[3]) . "\n";' "$alg" "$password" "$salt"
$6$rounds=500000$sOmEsAlT$Rf3.xi9RRiCW/FTh4gp67TSLyKotq1QkGkbn0O6cYDYEExwrFE30zeKGDIaZ3TZ.RDwiNya5nKlPDRTA0U4E8/

Ruby

$ ruby -e 'puts ARGV[1].crypt("$" + ARGV[0] + "$" + ARGV[2]);' "$alg" "$password" "$salt"
$6$rounds=500000$sOmEsAlT$Rf3.xi9RRiCW/FTh4gp67TSLyKotq1QkGkbn0O6cYDYEExwrFE30zeKGDIaZ3TZ.RDwiNya5nKlPDRTA0U4E8/

mkpasswd

This utility comes with the whois package (at least in Debian). Here it's better to introduce another separate variable to hold the number of rounds:

# password as before
rounds=500000
salt=sOmEsAlT

(and of course the other examples can be adapted to use the three variables instead of two). Then it can be used as follows:

$ mkpasswd -m sha-512 -R "$rounds" -S "$salt" "$password"
$6$rounds=500000$sOmEsAlT$Rf3.xi9RRiCW/FTh4gp67TSLyKotq1QkGkbn0O6cYDYEExwrFE30zeKGDIaZ3TZ.RDwiNya5nKlPDRTA0U4E8/

If using the standard number of rounds the -R option can be omitted, of course. Here the algorithm is specified by name, so the $alg variable is not used.

Some notes on macvlan/macvtap

There's not a lot of documentation about these interfaces. Here are some notes to summarize what I've been able to gather so far. Surely there's more to it (corrections and/or more information welcome).

macvlan

Macvlan interfaces can be seen as subinterfaces of a main ethernet interface. Each macvlan interface has its own MAC address (different from that of the main interface) and can be assigned IP addresses just like a normal interface.

So with this it's possible to have multiple IP addresses, each with its own MAC address, on the same physical interface. Applications can then bind specifically to the IP address assigned to a macvlan interface, for example. The physical interface to which the macvlan is attached is usually referred to as "the lower device" (the macvlan itself being the corresponding "upper device"); here we'll use the term "lower device".
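As a minimal sketch (interface name and address are made up for the example; the default operating mode, discussed below, applies here):

# ip link add link eth0 macvlan0 type macvlan
# ip addr add 192.168.1.50/24 dev macvlan0
# ip link set macvlan0 up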

The main use of macvlan seems to be container virtualization (for example LXC guests can be configured to use a macvlan for their networking and the macvlan interface is moved to the container's namespace), but there are other scenarios, mostly very specific cases, like using virtual MAC addresses (see for example this keepalived feature).

A macvlan interface can work in one of four modes, defined at creation time.

  • VEPA (Virtual Ethernet Port Aggregator) is the default mode. If the lower device receives data from a macvlan in VEPA mode, this data is always sent "out" to the upstream switch or bridge, even if it's destined for another macvlan in the same lower device. Since macvlans are almost always assigned to virtual machines or containers, this makes it possible to see and manage inter-VM traffic on a real external switch (whereas with normal bridging it would not leave the hypervisor), with all the features provided by a "real" switch. However, at the same time this implies that, for VMs to be able to communicate, the external switch should send back inter-VM traffic to the hypervisor out of the same interface it was received from, something that is normally prevented from happening by STP. This feature (the so-called "hairpin mode" or "reflective relay") isn't widely supported yet, which means that if using VEPA mode with an ordinary switch, inter-VM traffic leaves the hypervisor but never comes back (unless it's sent back at the IP level by a router somewhere, but then there's nothing special about that, it has always worked that way).
    Since there are few switches supporting hairpin mode, VEPA mode isn't used all that much yet. However it's worth mentioning that Linux's own internal bridge implementation does support hairpin mode in recent versions; assuming eth0 is a port of br0, hairpin mode can be enabled by doing

    # echo 1 > /sys/class/net/br0/brif/eth0/hairpin_mode

    or using a recent version of brctl:

    # brctl hairpin br0 eth0 on

    or even better, using the bridge program that comes with recent versions of iproute2:

    # bridge link set dev eth0 hairpin on

    So a Linux box could very well be used in the role of "external switch" as mentioned above.

  • Bridge mode: this works almost like a traditional bridge, in that data received on a macvlan in bridge mode and destined for another macvlan of the same lower device is sent directly to the target (if the target macvlan is also in bridge mode), rather than being sent outside. This of course works well with non-hairpin switches, and inter-VM traffic has better performance than VEPA mode, since the external round-trip is avoided. In the words of a kernel developer,

    The macvlan is a trivial bridge that doesn't need to do learning as it
    knows every mac address it can receive, so it doesn't need to implement
    learning or stp. Which makes it simple, stupid and fast.

  • Private mode: this is essentially like VEPA mode, but with the added feature that no macvlans on the same lower device can communicate, regardless of where the packets come from (so even if inter-VM traffic is sent back by a hairpin switch or an IP router, the target macvlan is prevented from receiving it). I haven't tried, but I suppose that it is the operating mode of the target macvlan that determines whether it receives the traffic or not. This mode is useful, of course, if we really want macvlan isolation.
  • Passthru mode: this mode was added later, to work around some limitations of macvlans (more details here). I'm not 100% clear on what problem passthru mode tries to solve, as I was able to set promiscuous mode, create bridges, vlans and sub-macv{lan,tap} interfaces in KVM guests using a plain macvtap in VEPA mode for their networking (so no need for passthru). Since I'm surely missing something, more information (as usual) is welcome.

VEPA, bridge and private modes come from a standard called EVB (edge virtual bridging); a good article which provides more information can be found here.

Curiously (at least, in the case of the three original operating modes), the operating mode is per-macvlan interface rather than global (per-physical device); I guess that it's then more or less mandatory to configure all the macvlans of the same lower device to operate in the same mode, or at least match the macvlan modes so that only intended inter-VM traffic is possible; not sure what would happen, for instance, if a macvlan using VEPA mode tries to communicate with another one using bridge mode, or vice versa. This may well be worth investigating.

Irrespective of the mode used for the macvlan, there's no connectivity from whatever uses the macvlan (eg a container) to the lower device. This is by design, and is due to the way macvlan interfaces "hook into" their physical interface. If communication with the host is needed, the solution is kind of easy: just create another macvlan on the host on the same lower device, and use this to communicate with the guest.
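A sketch of that workaround, with made-up names and addresses (for direct host-guest traffic both ends should be in bridge mode, or a hairpin-capable switch must be upstream):

# ip link add link eth0 machost type macvlan mode bridge
# ip addr add 192.168.1.60/24 dev machost
# ip link set machost up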

The documentation of iproute2 about setting operating mode for macvlans isn't complete, since neither "ip link help" nor the man pages mention how to do that. Fumbling around a bit, it can be seen that the syntax is

# ip link add link eth2 macvlan2 type macvlan mode aaa    # hit enter here to force an error message
Error: argument of "mode" must be "private", "vepa", "bridge" or "passthru"

Even more undocumented (if possible) is the way to show the operating mode of a macvlan, which turns out to be

# ip -d link show macvlan2
27: macvlan2@eth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT 
    link/ether 26:8a:3c:07:7d:f4 brd ff:ff:ff:ff:ff:ff
    macvlan  mode vepa 

Let's hope that all this appears in the documentation soon.

The MAC address of the macvlan is normally autogenerated; to explicitly specify one, the following syntax can be used (which also specifies custom name and operating mode at the same time):

# ip link add link eth2 FOOMACVLAN address 56:61:4f:7c:77:db type macvlan mode bridge

Final note, it's also possible to create a macvlan interface and bridge it (eg brctl addif br0 macvlan2); though it's a bit weird, it does work fine.

macvtap interfaces

A macvtap is a virtual interface based on macvlan (thus tied to another interface), vaguely similar to a regular tap interface: a program can attach to it and read/write frames. However, the similarities end there. The most prominent user of macvtap interfaces seems to be libvirt/KVM, which allows guests to be connected to macvtap interfaces. Doing so allows for (almost) bridge-like behavior of guests but without the need to have a real bridge on the host, as a regular ethernet interface can be used as the macvtap's lower device.

Some notes about macvtap (more information is always welcome):

  • Since it's based on macvlan, macvtap supports the same operating modes (VEPA, bridge, private and passthru)
  • Similarly, a guest using a macvtap interface cannot communicate directly with its lower device in the host. In fact, if you run tcpdump on the macvtap interface on the host, no traffic will be seen. Again this is by design, but can be surprising. This link has some details and suggests workarounds for KVM in case this functionality is needed. A quick workaround is to create a macvlan (not macvtap) interface on the host, which will then be visible from the guests. (On a side note, this is also a way to use routed mode for the macvtap guests: put the host's macvlan and all guests on the same IP subnet, configure the guests to use the host macvlan's IP as their default gateway, and have the host do NAT between the macvlan and the physical interface. But then, in this case, it's probably easier to use a real bridge).
  • Creation of a macvtap interface is not done by opening /dev/net/tun; instead, it looks like the only way to create one is to directly send appropriate messages to the kernel via a netlink socket (at least, that's how iproute2 and libvirt do it; strace and/or the source will show the details, as there seems to be no documentation whatsoever). This makes it a bit more complicated than a normal tun/tap interface.
  • macvtap interfaces are persistent by default. Once the macvtap interface has been created via netlink, an actual character device file appears under /dev (this does not happen with normal tap interfaces). The device file is called /dev/tapNN, where NN is the interface index of the macvtap (which can be seen for example with "ip link show"). It's this device file that has to be opened by programs wanting to use the interface (eg libvirtd/qemu to connect a guest).
  • One consequence of there being an actual device file for the macvtap interface is that traffic entering the interface can be seen and "stolen" from the intended recipient by simply reading from the device file; doing "cat /dev/tap22" (for example) while a guest VM is using it dumps the raw ethernet frames and prevents the VM from seeing them. On the other hand, neither seeing outgoing traffic nor injecting frames by writing to the device file from the outside seems to be possible.
  • If a VM is connected to the macvtap, the MAC address of the macvtap interface as seen on the host is the same that is seen by the guest; this is different from regular tap interfaces, where the guest is somehow "behind" the tap interface (the vnetX interfaces on the host have a MAC address which is not the same that the guest uses).
  • All traffic for guests connected to a macvtap does show up if running tcpdump on the lower device, even in bridge mode and for guest-to-guest traffic. However, as said, tcpdump (on the host) on the macvtap device itself shows no traffic.
  • If the lower device is a wireless card, macvtap doesn't work (the guest is isolated, nothing enters, nothing exits). Perhaps it's just that it only works with some wireless cards, and I happened to have one that doesn't work. Again, I could not find more information.

As said, creating a macvtap interface via code is a bit complicated, but luckily iproute2 can do it on the command line. To create a macvtap interface called macvtap2, with eth2 as its lower physical interface:

# ip link add link eth2 macvtap2 address 00:22:33:44:55:66 type macvtap mode bridge
# ip link set macvtap2 up
# ip link show macvtap2
18: macvtap2@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 500
    link/ether 00:22:33:44:55:66 brd ff:ff:ff:ff:ff:ff
# ls -l /dev/tap18 
crw------- 1 root root 250, 1 May 26 10:51 /dev/tap18

To delete the interface, the usual command can be used:

# ip link del macvtap2

Two links which provide good information about macvtap:
http://seravo.fi/2012/virtualized-bridged-networking-with-macvtap
http://virt.kernelnewbies.org/MacVTap