“On the fly” IPsec VPN with iproute2

(This has been on the TODO list for months, let's finally get to it.)

Basically we're going to create an IPsec VPN with static manual keys, using only the ip command from iproute2.

As there seems to be some confusion, note that the VPN we're setting up here has nothing to do with a PSK setup, in which normally there is an IKE daemon that dynamically computes the keys and generates new ones at regular intervals. The PSK is used for IKE authentication purposes, and after that the actual IPsec keys are truly dynamic and change periodically.

Here instead we're not using IKE; we just manually generate some static IPsec keys (technically, we generate a bunch of security associations (SAs); each SA has a number of properties, among which are the SPI - a number that identifies the SA - and two keys, one used for authentication/integrity checking and the other for encryption).
As should be clear, this is not something to be used on a regular basis and/or for a long time; it's rather a quick and dirty hack to be used in emergency situations. The problem with passing a lot of VPN traffic using the same keys should be apparent: the more data encrypted with the same key an attacker has, the higher the chances of a successful decryption (which would affect all traffic, including past one). Manual keying also has other problems, like the lack of peer authentication and the possibility of replay attacks.
On the plus side, it's very easy to set up and it does not require installing any software; the only prerequisites are iproute2 (installed by default on just about any distro nowadays) and a kernel that supports IPsec (ditto).

So here's the sample topology:

[ipsectun diagram: 10.0.0.0/24 -- GW1 (internal 10.0.0.254, public 192.0.2.1) == IPsec tunnel == GW2 (public 198.51.100.20, internal 172.16.0.1) -- 172.16.0.0/24]

So, quite obviously, we want traffic between 10.0.0.0/24 and 172.16.0.0/24 to be encrypted.

There are two important concepts in IPsec: the SPD (Security Policy Database) and the SAs (Security Associations). A SA contains the actual security parameters and keys to be used between two given peers; since a SA is unidirectional, there will always be two of them, one for each direction of the traffic. The SPD, on the other hand, defines which traffic the SAs should be applied to, in terms of source/destination IP ranges and protocols. If a packet matches an SPD policy, the associated SA is applied to it, resulting in its encryption, signing or whatever the SA prescribes.
For them to be of any use, the SPD and the SAs must match (with reversed source/destination values) at both ends.

In our case, we're going to manually define both the SPD policies and the SAs using the xfrm subcommand of the ip utility. (In a traditional IKE setup, instead, the security policies are statically defined in configuration files and the SAs are dynamically established by IKE either at startup or upon seeing matching traffic, and periodically renewed.)

The idea here is to have some code that outputs the commands to be run on GW1 and GW2 respectively, so they can be copied and pasted.

Since authentication and encryption keys are an essential part of a SA, let's start by generating them. We want to use AES256 to encrypt, and SHA256 for integrity checking, so we know the key length: 256 bits, or 32 bytes. Each SA contains two keys, and there will be two SAs, so we generate four keys.

# bash
declare -a keys
for i in {1..4}; do
  # keys 1 and 3 are for HMAC, keys 2 and 4 are for encryption
  keys[i]=$(xxd -p -l 32 -c 32 /dev/random)
done

As usual, if reading from /dev/random blocks, you need to add some entropy to the pool or use /dev/urandom (less secure, but this setup isn't meant to be very secure anyway).
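For reference, if xxd happens to be unavailable, od (which is POSIX) can produce the same kind of hex key; here's a rough sketch, using /dev/urandom so that it never blocks:

```shell
# 32 random bytes as 64 lowercase hex digits,
# equivalent to "xxd -p -l 32 -c 32 /dev/urandom"
key=$(od -An -tx1 -N32 /dev/urandom | tr -d ' \n')
echo "${#key}"    # 64 hex digits = 32 bytes = 256 bits
```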
Each SA needs a unique ID, called the SPI (Security Parameter Index), which is 32 bits (4 bytes) long:

declare -a spi
for i in {1..2}; do
  spi[i]=$(xxd -p -l 4 /dev/random)
done

Finally, there has to be what iproute calls the reqid, which is what links a SA with a SPD. Again this is 4 bytes, so let's generate it (each SA has its own reqid):

declare -a reqid
for i in {1..2}; do
  reqid[i]=$(xxd -p -l 4 /dev/random)
done

The code to create the SAs is as follows (same code for GW1 and GW2):

ip xfrm state add src 192.0.2.1 dst 198.51.100.20 proto esp spi "0x${spi[1]}" reqid "0x${reqid[1]}" mode tunnel auth sha256 "0x${keys[1]}" enc aes "0x${keys[2]}"
ip xfrm state add src 198.51.100.20 dst 192.0.2.1 proto esp spi "0x${spi[2]}" reqid "0x${reqid[2]}" mode tunnel auth sha256 "0x${keys[3]}" enc aes "0x${keys[4]}"

Now to the SPD. Here we define which traffic should be encrypted. In our case, of course, it's all traffic between 10.0.0.0/24 and 172.16.0.0/24, in both directions.

# for GW1
ip xfrm policy add src 10.0.0.0/24 dst 172.16.0.0/24 dir out tmpl src 192.0.2.1 dst 198.51.100.20 proto esp reqid "0x${reqid[1]}" mode tunnel
ip xfrm policy add src 172.16.0.0/24 dst 10.0.0.0/24 dir fwd tmpl src 198.51.100.20 dst 192.0.2.1 proto esp reqid "0x${reqid[2]}" mode tunnel
ip xfrm policy add src 172.16.0.0/24 dst 10.0.0.0/24 dir in tmpl src 198.51.100.20 dst 192.0.2.1 proto esp reqid "0x${reqid[2]}" mode tunnel

# for GW2
ip xfrm policy add src 172.16.0.0/24 dst 10.0.0.0/24 dir out tmpl src 198.51.100.20 dst 192.0.2.1 proto esp reqid "0x${reqid[2]}" mode tunnel
ip xfrm policy add src 10.0.0.0/24 dst 172.16.0.0/24 dir fwd tmpl src 192.0.2.1 dst 198.51.100.20 proto esp reqid "0x${reqid[1]}" mode tunnel
ip xfrm policy add src 10.0.0.0/24 dst 172.16.0.0/24 dir in tmpl src 192.0.2.1 dst 198.51.100.20 proto esp reqid "0x${reqid[1]}" mode tunnel

The commands are symmetrical, with src/dst pairs swapped. I'm not 100% sure why a fourth policy in the "fwd" direction is not needed (more information welcome), but looking at what eg openswan does, it seems that it creates only three policies as above and everything works, so let's stick with that.

The last thing to do is to add suitable routes for the traffic that must be encrypted:

# for GW1
ip route add 172.16.0.0/24 dev eth0 src 10.0.0.254
# for GW2
ip route add 10.0.0.0/24 dev eth0 src 172.16.0.1

Specifying the "src" parameter is important here if we want traffic originating on the gateways themselves to go through the tunnel.

Now any traffic between the two networks 10.0.0.0/24 and 172.16.0.0/24 will go through the VPN.

So here is the complete script, with endpoint and subnet addresses parametrized (yes, argument checking - and not only that - could certainly be better):

#!/bin/bash

# doipsec.sh

dohelp(){
  echo "usage: $0 <GW1_public_IP|GW1_range|GW1_internal_IP[|GW1_public_iface[|GW1_GWIP]]> <GW2_public_IP|GW2_range|GW2_internal_IP[|GW2_public_iface[|GW2_GWIP]]>" >&2
  echo "Output commands to set up an ipsec tunnel between two machines" >&2
  echo "Example: $0 '192.0.2.1|10.0.0.0/24|10.0.0.254|eth0' '198.51.100.20|172.16.0.0/24|172.16.0.1|eth1'" >&2
}

if [ $# -ne 2 ] || [ "$1" = "-h" ]; then
  dohelp
  exit 1
fi

IFS="|" read -r GW1_IP GW1_NET GW1_IIP GW1_IF GW1_GWIP <<< "$1"
IFS="|" read -r GW2_IP GW2_NET GW2_IIP GW2_IF GW2_GWIP <<< "$2"

if [ "${GW1_IP}" = "" ] || [ "${GW1_NET}" = "" ] || [ "${GW1_IIP}" = "" ] || \
   [ "${GW2_IP}" = "" ] || [ "${GW2_NET}" = "" ] || [ "${GW2_IIP}" = "" ]; then
  dohelp
  exit 1
fi

# assume eth0 if not specified
[ "${GW1_IF}" = "" ] && GW1_IF=eth0
[ "${GW2_IF}" = "" ] && GW2_IF=eth0

# generate variable data

rand_device=/dev/random    # change to urandom if needed

declare -a keys
for i in {1..4}; do
  # keys 1 and 3 are for HMAC, keys 2 and 4 are for encryption
  keys[i]=$(xxd -p -l 32 -c 32 "${rand_device}")
done

declare -a spi
for i in {1..2}; do
  spi[i]=$(xxd -p -l 4 "${rand_device}")
done

declare -a reqid
for i in {1..2}; do
  reqid[i]=$(xxd -p -l 4 "${rand_device}")
done

# route statement to allow default routing through the tunnel

# sucking heuristic
if [ "${GW1_GWIP}" != "" ] && [ "${GW2_NET}" = "0.0.0.0/0" ]; then
  # add a /32 route to the peer before pointing the default to the tunnel
  GW1_GW2_ROUTE="ip route add ${GW2_IP}/32 dev ${GW1_IF} via ${GW1_GWIP} && ip route del ${GW2_NET} && ip route add ${GW2_NET} dev ${GW1_IF} src ${GW1_IIP}" 
else
  GW1_GW2_ROUTE="ip route add ${GW2_NET} dev ${GW1_IF} src ${GW1_IIP}" 
fi

if [ "${GW2_GWIP}" != "" ] && [ "${GW1_NET}" = "0.0.0.0/0" ]; then
  GW2_GW1_ROUTE="ip route add ${GW1_IP}/32 dev ${GW2_IF} via ${GW2_GWIP} && ip route del ${GW1_NET} && ip route add ${GW1_NET} dev ${GW2_IF} src ${GW2_IIP}" 
else
  GW2_GW1_ROUTE="ip route add ${GW1_NET} dev ${GW2_IF} src ${GW2_IIP}" 
fi

cat << EOF
**********************
Commands to run on GW1
**********************

ip xfrm state flush; ip xfrm policy flush

ip xfrm state add src ${GW1_IP} dst ${GW2_IP} proto esp spi 0x${spi[1]} reqid 0x${reqid[1]} mode tunnel auth sha256 0x${keys[1]} enc aes 0x${keys[2]}
ip xfrm state add src ${GW2_IP} dst ${GW1_IP} proto esp spi 0x${spi[2]} reqid 0x${reqid[2]} mode tunnel auth sha256 0x${keys[3]} enc aes 0x${keys[4]}

ip xfrm policy add src ${GW1_NET} dst ${GW2_NET} dir out tmpl src ${GW1_IP} dst ${GW2_IP} proto esp reqid 0x${reqid[1]} mode tunnel
ip xfrm policy add src ${GW2_NET} dst ${GW1_NET} dir fwd tmpl src ${GW2_IP} dst ${GW1_IP} proto esp reqid 0x${reqid[2]} mode tunnel
ip xfrm policy add src ${GW2_NET} dst ${GW1_NET} dir in tmpl src ${GW2_IP} dst ${GW1_IP} proto esp reqid 0x${reqid[2]} mode tunnel

${GW1_GW2_ROUTE}

**********************
Commands to run on GW2
**********************

ip xfrm state flush; ip xfrm policy flush

ip xfrm state add src ${GW1_IP} dst ${GW2_IP} proto esp spi 0x${spi[1]} reqid 0x${reqid[1]} mode tunnel auth sha256 0x${keys[1]} enc aes 0x${keys[2]}
ip xfrm state add src ${GW2_IP} dst ${GW1_IP} proto esp spi 0x${spi[2]} reqid 0x${reqid[2]} mode tunnel auth sha256 0x${keys[3]} enc aes 0x${keys[4]}

ip xfrm policy add src ${GW2_NET} dst ${GW1_NET} dir out tmpl src ${GW2_IP} dst ${GW1_IP} proto esp reqid 0x${reqid[2]} mode tunnel
ip xfrm policy add src ${GW1_NET} dst ${GW2_NET} dir fwd tmpl src ${GW1_IP} dst ${GW2_IP} proto esp reqid 0x${reqid[1]} mode tunnel
ip xfrm policy add src ${GW1_NET} dst ${GW2_NET} dir in tmpl src ${GW1_IP} dst ${GW2_IP} proto esp reqid 0x${reqid[1]} mode tunnel

${GW2_GW1_ROUTE}
EOF

So for our example we'd run it with something like:

$ doipsec.sh '192.0.2.1|10.0.0.0/24|10.0.0.254|eth0' '198.51.100.20|172.16.0.0/24|172.16.0.1|eth1'
**********************
Commands to run on GW1
**********************

ip xfrm state flush; ip xfrm policy flush

ip xfrm state add src 192.0.2.1 dst 198.51.100.20 proto esp spi 0xfd51141e reqid 0x62502e58 mode tunnel auth sha256 0x4046c2f9ff22725b850e2d981968249dc6c25fba189e701cf9a14e921f91cffb enc aes 0xccd80053ae1b55113a89bc476d0de1d9e8b7bc94655f3af1b0dad7bb9ada1065
ip xfrm state add src 198.51.100.20 dst 192.0.2.1 proto esp spi 0x34e0aac0 reqid 0x66a32a19 mode tunnel auth sha256 0x1caf04f262e889b9b53b6c95bfbb4ef0292616362e8878fe96123610ca000892 enc aes 0x9380e038247fcd893d4f8799389b90bfa4d0b09195495bb94fe3a9fa5c5b699d

ip xfrm policy add src 10.0.0.0/24 dst 172.16.0.0/24 dir out tmpl src 192.0.2.1 dst 198.51.100.20 proto esp reqid 0x62502e58 mode tunnel
ip xfrm policy add src 172.16.0.0/24 dst 10.0.0.0/24 dir fwd tmpl src 198.51.100.20 dst 192.0.2.1 proto esp reqid 0x66a32a19 mode tunnel
ip xfrm policy add src 172.16.0.0/24 dst 10.0.0.0/24 dir in tmpl src 198.51.100.20 dst 192.0.2.1 proto esp reqid 0x66a32a19 mode tunnel

ip route add 172.16.0.0/24 dev eth0 src 10.0.0.254

**********************
Commands to run on GW2
**********************

ip xfrm state flush; ip xfrm policy flush

ip xfrm state add src 192.0.2.1 dst 198.51.100.20 proto esp spi 0xfd51141e reqid 0x62502e58 mode tunnel auth sha256 0x4046c2f9ff22725b850e2d981968249dc6c25fba189e701cf9a14e921f91cffb enc aes 0xccd80053ae1b55113a89bc476d0de1d9e8b7bc94655f3af1b0dad7bb9ada1065
ip xfrm state add src 198.51.100.20 dst 192.0.2.1 proto esp spi 0x34e0aac0 reqid 0x66a32a19 mode tunnel auth sha256 0x1caf04f262e889b9b53b6c95bfbb4ef0292616362e8878fe96123610ca000892 enc aes 0x9380e038247fcd893d4f8799389b90bfa4d0b09195495bb94fe3a9fa5c5b699d

ip xfrm policy add src 172.16.0.0/24 dst 10.0.0.0/24 dir out tmpl src 198.51.100.20 dst 192.0.2.1 proto esp reqid 0x66a32a19 mode tunnel
ip xfrm policy add src 10.0.0.0/24 dst 172.16.0.0/24 dir fwd tmpl src 192.0.2.1 dst 198.51.100.20 proto esp reqid 0x62502e58 mode tunnel
ip xfrm policy add src 10.0.0.0/24 dst 172.16.0.0/24 dir in tmpl src 192.0.2.1 dst 198.51.100.20 proto esp reqid 0x62502e58 mode tunnel

ip route add 10.0.0.0/24 dev eth1 src 172.16.0.1

Note that it's not necessarily the two networks local to GW1 and GW2 that have to be connected by the tunnel. If GW2 had, say, an existing route to 192.168.0.0/24, it would be perfectly possible to say:

$ doipsec.sh '192.0.2.1|10.0.0.0/24|10.0.0.254|eth0' '198.51.100.20|192.168.0.0/24|172.16.0.1|eth1'
...

to encrypt traffic from/to 10.0.0.0/24 and 192.168.0.0/24. Of course, in this case either hosts in 192.168.0.0/24 must somehow have a route back to 10.0.0.0/24 going through GW2, or GW2 must NAT traffic coming from 10.0.0.0/24 destined to 192.168.0.0/24 (and hosts there must still have a route back to GW2's masquerading address), but that should be obvious.

In the same way, it's possible to just route everything to/from site A through the tunnel (although I would not recommend it):

$ doipsec.sh '192.0.2.1|10.0.0.0/24|10.0.0.254|eth0|192.0.2.254' '198.51.100.20|0.0.0.0/0|172.16.0.1|eth1'
**********************
Commands to run on GW1
**********************

ip xfrm state flush; ip xfrm policy flush

ip xfrm state add src 192.0.2.1 dst 198.51.100.20 proto esp spi 0x00127764 reqid 0xd7d184b1 mode tunnel auth sha256 0x8dcce7d80f7c8bb81e6a526b9d5d7ce2e7a474e3406c40953108b6d92b61cb77 enc aes 0xf9d41041fc014b94d602ed051800601464cdbc525847d5894ed03f55b8b5e78c
ip xfrm state add src 198.51.100.20 dst 192.0.2.1 proto esp spi 0xec8fe8cb reqid 0x18fcbfd1 mode tunnel auth sha256 0xc1dbbafc0deff6d4bfe0e2736d443d94ffe25ce8637e6f70e3260c87cf8f9724 enc aes 0x6170cd164092554bfd8402c528439c2c3d9823b74b493d9c18ca05a9c3b40a0d

ip xfrm policy add src 10.0.0.0/24 dst 0.0.0.0/0 dir out tmpl src 192.0.2.1 dst 198.51.100.20 proto esp reqid 0xd7d184b1 mode tunnel
ip xfrm policy add src 0.0.0.0/0 dst 10.0.0.0/24 dir fwd tmpl src 198.51.100.20 dst 192.0.2.1 proto esp reqid 0x18fcbfd1 mode tunnel
ip xfrm policy add src 0.0.0.0/0 dst 10.0.0.0/24 dir in tmpl src 198.51.100.20 dst 192.0.2.1 proto esp reqid 0x18fcbfd1 mode tunnel

ip route add 198.51.100.20/32 dev eth0 via 192.0.2.254 && ip route del 0.0.0.0/0 && ip route add 0.0.0.0/0 dev eth0 src 10.0.0.254

**********************
Commands to run on GW2
**********************

ip xfrm state flush; ip xfrm policy flush

ip xfrm state add src 192.0.2.1 dst 198.51.100.20 proto esp spi 0x00127764 reqid 0xd7d184b1 mode tunnel auth sha256 0x8dcce7d80f7c8bb81e6a526b9d5d7ce2e7a474e3406c40953108b6d92b61cb77 enc aes 0xf9d41041fc014b94d602ed051800601464cdbc525847d5894ed03f55b8b5e78c
ip xfrm state add src 198.51.100.20 dst 192.0.2.1 proto esp spi 0xec8fe8cb reqid 0x18fcbfd1 mode tunnel auth sha256 0xc1dbbafc0deff6d4bfe0e2736d443d94ffe25ce8637e6f70e3260c87cf8f9724 enc aes 0x6170cd164092554bfd8402c528439c2c3d9823b74b493d9c18ca05a9c3b40a0d

ip xfrm policy add src 0.0.0.0/0 dst 10.0.0.0/24 dir out tmpl src 198.51.100.20 dst 192.0.2.1 proto esp reqid 0x18fcbfd1 mode tunnel
ip xfrm policy add src 10.0.0.0/24 dst 0.0.0.0/0 dir fwd tmpl src 192.0.2.1 dst 198.51.100.20 proto esp reqid 0xd7d184b1 mode tunnel
ip xfrm policy add src 10.0.0.0/24 dst 0.0.0.0/0 dir in tmpl src 192.0.2.1 dst 198.51.100.20 proto esp reqid 0xd7d184b1 mode tunnel

ip route add 10.0.0.0/24 dev eth1 src 172.16.0.1

In this last case, GW2 must obviously perform NAT on at least some of the traffic coming from 10.0.0.0/24. IMPORTANT: since the fifth argument has been specified for GW1 and the remote network is 0.0.0.0/0, the resulting commands include a statement that temporarily deletes the default route on GW1, before recreating it to point into the tunnel. If you're running the commands remotely (eg via SSH) on the relevant machine, things can go wrong and screw up pretty easily. You must always inspect the generated routing code to make sure it's fine for your case, and take the necessary precautions to avoid losing access. This code isn't meant to be production-level anyway.

Another point worth noting is that the generated commands

ip xfrm state flush; ip xfrm policy flush

will remove any trace of IPsec configuration, including preexisting tunnels that may be configured and possibly running. But if that is the case, it means there is a "real" IPsec implementation on the machine, so that's what should be used for the new tunnel too, not the kludgy script described here.

So that's it for this klu^Wexperiment. In principle, one could envision some sort of scheduled task synchronized between the machines that updates the SAs or generates new ones with new keys at regular intervals (ip xfrm allows for that), but in practice anything more complex would be too much work for a task for which a well-known protocol exists, namely IKE, which is what should be used for any serious IPsec deployment in any case.

“Range of fields” in awk

This is an all-time awk FAQ. It can be stated in various ways. A typical way is:

"How can I print the whole line except the first (or the first N, or the Nth) field?"

Or also:

"How can I print only from field N to field M?"

The underlying general question is:

"How can I print a range of fields with awk?"

There are actually quite a few ways to accomplish the task; each has its applicability scenario(s) and its pros and cons. Let's start with methods that only use standard awk features, then we'll get to GNU awk.

Use a loop

This is the most obvious way: just loop from N to M and print the corresponding fields.

sep = ""
for (i = 3; i<=NF; i++) {
  printf "%s%s", sep, $i
  sep = FS
}
print ""

This is easy, but has some issues: first, the original record spacing is lost. If the input record (line) was, say,

  abc  def   ghi    jkl     mno

the above code will print

ghi jkl mno

instead. This might or might not be a problem. For the same reason, if FS is a complex regular expression, whatever separated the fields in the original input is lost.
On the other hand, if FS is exactly a single-char expression (except space, which is the default and special cased), the above code works just fine.
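To make the snippet runnable, here it is wrapped in a complete one-liner on a made-up input line:

```shell
echo 'abc  def   ghi    jkl     mno' |
awk '{
  sep = ""
  for (i = 3; i <= NF; i++) {
    printf "%s%s", sep, $i
    sep = FS    # a single space, with the default FS
  }
  print ""
}'
# prints: ghi jkl mno   (the original spacing is lost)
```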

Assign the empty string to the unwanted fields

So for example one might do:

$1 = $2 = ""; print substr($0, 3)

That presents the same problems as the first solution (formatting is lost), although for different reasons (here it's because awk rebuilds the line with OFS between fields), and introduces empty fields, which have to be skipped when printing the line (in the above example, the default OFS of space is assumed, so we must print starting from the third character; adapt accordingly if OFS is something else).
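A runnable sketch of the same idea, with the default OFS of a single space assumed:

```shell
echo 'abc  def   ghi    jkl     mno' |
awk '{ $1 = $2 = ""; print substr($0, 3) }'
# prints: ghi jkl mno   (the line was rebuilt with OFS between fields)
```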

Delete the unwanted fields

Ok, so it's not possible to delete a field by assigning the empty string to it, but if we modify $0 directly we can indeed remove parts of it and thus fields. We can use sub() for the task:

# default FS
# removes first 2 fields
sub(/^[[:blank:]]*([^[:blank:]]+[[:blank:]]+){2}/,""); print
# removes last 3 fields
sub(/([[:blank:]]+[^[:blank:]]+){3}[[:blank:]]*$/,""); print

# one-char FS, for example ";"
# removes first 2 fields
sub(/^([^;]+;){2}/,""); print
# removes last 3 fields
sub(/(;[^;]+){3}$/,""); print

While this approach has the advantage that it preserves the original formatting (this is especially important if FS is the default, which in awk is slightly special-cased, as can be seen from the first example), it has the problem that it's not applicable at all if FS is a regular expression (that is, when it's not the default and is longer than one character).
It also requires that the Awk implementation in use understands the regex {} quantifier operator, something many awks don't do (although this can be worked around by "expanding" the expression, that is, for example, using "[^;]+;[^;]+;[^;]+;" instead of "([^;]+;){3}". However, the resulting expression might be quite long and awkward - pun intended).
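For instance, using the expanded-form workaround with ";" as separator (so it also works in awks that don't understand the {} quantifier):

```shell
printf '%s\n' 'a;b;c;d;e' |
awk '{ sub(/^[^;]+;[^;]+;/, ""); print }'   # remove the first 2 fields
# prints: c;d;e
```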

Manually find start and end of fields

Let's now try to find a method that works regardless of FS or OFS. We observe that we can use index($0, $1) to find where $1 begins. We also know the length of $1, so we know where it ends within $0. Now we can use index() again, starting from the next character, to find where $2 begins, and so on for all fields of $0; this way we can discover the starting positions within $0 of all fields. Sample code:

pos = 0
for (i=1; i<= NF; i++) {
  start[i] = index(substr($0, pos + 1), $i) + pos
  pos = start[i] + length($i)
}

Now, start[1] contains the starting position of field 1 ($1), start[2] the starting position of $2, etc. (As customary in awk, the first character of a string is at position 1.) With this information, printing field 3 to NF without losing information is as simple as doing

first = 3
last = NF
print substr($0, start[first], start[last] - start[first] + length($last))

Seems easy, right? Well, this approach has a problem: it assumes that the input has no empty fields, which however are perfectly fine in awk. If some of the fields in the desired range are empty, it may or may not work. So let's see if we can do better.
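Here's the method as a complete, runnable program on input with irregular spacing (and no empty fields):

```shell
echo 'abc  def   ghi    jkl     mno' |
awk '{
  pos = 0
  for (i = 1; i <= NF; i++) {
    start[i] = index(substr($0, pos + 1), $i) + pos
    pos = start[i] + length($i)
  }
  first = 3; last = NF
  print substr($0, start[first], start[last] - start[first] + length($last))
}'
# prints: ghi    jkl     mno   (spacing preserved)
```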

Manually find the separators

By design, FS can never match the empty string (more on this later), so perhaps we can look for matches of FS (using match()) and use those offsets to extract the needed fields. The idea is the same as in the previous approach: each match is attempted starting from where the previous one left off, plus the length of the following field.
If we go this route, however, we must keep in mind that the default FS in awk is special-cased, in that leading and trailing blanks (spaces + tabs) in the record are not counted for the purpose of field splitting, and furthermore fields are separated by runs of blanks despite FS being just a single space. This only happens with the default FS; with any other value, each match terminates exactly one field. Fortunately, it is possible to check whether FS is the default by comparing it to the string " " (a space). If we detect the default FS, we remove leading and trailing blanks from the record, and, for the purpose of matching, change it to its effectively equivalent pattern, that is, "[[:blank:]]+".
If FS is not the default, there is still another special case we should check. The awk specification says that if FS is exactly one character (and is not a space), it must NOT be treated as a regular expression. Since we want to use match() and FS as a pattern, this is especially important, for example if FS is ".", or "+", or "*", which are special regular expression metacharacters but should be treated literally in this case.
All that being said, here's some code that finds and saves all matches of FS:

BEGIN {
  # sep_re is the "effective" FS, so to speak, to be
  # used to find where separators are
  sep_re = FS
  defaultfs = 0

  # ...but check for special cases
  if (FS == " ") {
    defaultfs = 1
    sep_re = "[[:blank:]]+"
  } else if (length(FS) == 1) {
    if (FS ~ /[][^$.*?+{}\\()|]/) {
      sep_re = "\\" FS
    }
  }
}

{
  # save $0 and work on the copy
  record = $0

  if (defaultfs) {
    gsub(/^[[:blank:]]+|[[:blank:]]+$/, "", record)
  }

  # find separators
  i = 0
  while(1) {
    if (match(record, sep_re)) {
      i++
      seps[i] = substr(record, RSTART, RLENGTH)
      record = substr(record, RSTART + RLENGTH)
    } else {
      break
    }
  }

  # ...continued below

With the above code seps[i] contains the string that matched FS between field i and i + 1. We of course also have the fields themselves in $1...$NF, so we can finally write the code that extracts a range of fields from the line:

  # ...continued from above

  result = ""

  first = 3
  last = NF
  for (i = first; i < last; i++) {
    result = result $i seps[i]
  }
  result = result $last
  print result
}

Are we still overlooking something? Unfortunately, yes.
We said earlier that FS can't match the empty string; however, technically we can obviously set it to a value that would ordinarily match the empty string, for example

FS="a*"

That matches zero or more a's, so in particular it will produce a zero-length match if it can't find an "a".
But, just as obviously, an FS that can match a zero-length string is useless as field "separator", so what happens in these cases is that awk just does not allow it to match:

$ echo 'XXXaYYYaaZZZ' | awk -F 'a*' '{for (i=1; i<=NF; i++) print i, $i}'
1 XXX
2 YYY
3 ZZZ

In other words, if awk finds a match of length zero it just ignores it and skips to the next character until it can find a match of length at least 1 for FS.

(Let's leave aside the fact that setting FS to "a*" makes no sense, as in that case what's really wanted is "a+" instead and let's try to make the code handle the worst case.)

In our sample code, we're using match(), which can indeed produce zero-length matches, but we are not checking for those cases; the result is that running it with an FS that can produce zero-length matches will loop forever.

Thus we need to mimic awk's field splitting a little bit more, in that if we find a zero-length match, we just ignore it and try to match again starting from the next character.
So here's the full code to print a range of fields preserving format and separators, with the revised loop to find separators skipping zero-length matches:

BEGIN {
  # sep_re is the "effective" FS, so to speak, to be
  # used to find where separators are
  sep_re = FS
  defaultfs = 0

  # ...but check for special cases
  if (FS == " ") {
    defaultfs = 1
    sep_re = "[[:blank:]]+"
  } else if (length(FS) == 1) {
    if (FS ~ /[][^$.*?+{}\\()|]/) {
      sep_re = "\\" FS
    }
  }
}

{
  # save $0 and work on the copy
  record = $0

  if (defaultfs) {
    gsub(/^[[:blank:]]+|[[:blank:]]+$/, "", record)
  }

  # find separators
  i = 0
  while(1) {
    if (length(record) == 0) break;
    if (match(record, sep_re)) {
      if (RLENGTH > 0) {
        i++
        seps[i] = substr(record, RSTART, RLENGTH)
        record = substr(record, RSTART + RLENGTH)
      } else {
        # ignore zero-length match: go to next char
        record = substr(record, 2)
      }
    } else {
      break
    }
  }

  result = ""

  first = 3
  last = NF
  for (i = first; i < last; i++) {
    result = result $i seps[i]
  }
  result = result $last
  print result
}

A simple optimization of the above code would be to directly skip the next field upon finding a match for FS, eg

# attempt next match after the field that begins here
record = substr(record, RSTART + RLENGTH + length($i))

since, by definition, a field can never match FS, so it can be skipped entirely for the purpose of finding matches of FS.
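To see the whole program in action from the shell, here it is run with a regex FS (' *: *'), something the sub()-based approach could not handle; note how the separators are reproduced exactly as they appeared in the input:

```shell
printf '%s\n' 'a : b:c  :  d' |
awk -F ' *: *' '
BEGIN {
  # sep_re is the "effective" FS used to find the separators
  sep_re = FS
  defaultfs = 0
  if (FS == " ") {
    defaultfs = 1
    sep_re = "[[:blank:]]+"
  } else if (length(FS) == 1) {
    if (FS ~ /[][^$.*?+{}\\()|]/) sep_re = "\\" FS
  }
}
{
  record = $0
  if (defaultfs) gsub(/^[[:blank:]]+|[[:blank:]]+$/, "", record)

  # find separators, skipping zero-length matches
  i = 0
  while (1) {
    if (length(record) == 0) break
    if (match(record, sep_re)) {
      if (RLENGTH > 0) {
        i++
        seps[i] = substr(record, RSTART, RLENGTH)
        record = substr(record, RSTART + RLENGTH)
      } else {
        record = substr(record, 2)
      }
    } else break
  }

  result = ""
  first = 3; last = NF
  for (i = first; i < last; i++) result = result $i seps[i]
  result = result $last
  print result
}'
# prints: c  :  d   (fields 3..NF, original separator kept)
```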

GNU awk

As often happens, life is easier for GNU awk users, in this case thanks to the optional fourth argument to the split() function (a GNU awk extension present at least since 4.0): an array where the separators are saved. So all that is needed is something like:

# this does all the hard work, as split() is
# guaranteed to behave like field splitting
nf = split($0, fields, FS, seps)

result = ""
first = 3
last = nf
for (i = first; i < last; i++) {
  result = result fields[i] seps[i]
}
result = result fields[last]
print result

For more and a slightly different take on the subject, see also this page on the awk.freeshell.org wiki.

Three text processing tasks

Just three problems that came up in different circumstances in the last couple of months.

Ranges, again

Ranges strike again. This time the task is to print or select everything from the first occurrence of /START/ in the input to the last occurrence of /END/, either including the extremes or not. So, given this sample input:

 1 xxxx
 2 xxxx
 3 END
 4 aaa
 5 START
 6 START
 7 zzz
 8 START
 9 hhh
10 END
11 ppp
12 END
13 mmm
14 START

we want to match from line 5 to 12 (or from line 6 to 11 in the noninclusive version).

The logic is something along the lines of: when /START/ is seen, start collecting lines. Each time an /END/ is seen (and /START/ was previously seen), print what we have so far, empty the buffer and start collecting lines again, in case we see another /END/ later.

Here's an awk solution for the inclusive case:

awk '!ok && /START/ { ok = 1 }
ok { p = p sep $0; sep = RS }
ok && /END/ { print p; p = sep = "" }' file.txt
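Running it on the sample input above confirms that lines 5 to 12 are printed:

```shell
printf '%s\n' ' 1 xxxx' ' 2 xxxx' ' 3 END' ' 4 aaa' ' 5 START' \
  ' 6 START' ' 7 zzz' ' 8 START' ' 9 hhh' '10 END' '11 ppp' \
  '12 END' '13 mmm' '14 START' |
awk '!ok && /START/ { ok = 1 }
ok { p = p sep $0; sep = RS }
ok && /END/ { print p; p = sep = "" }'
# prints lines 5 through 12, inclusive
```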

and here's the noninclusive case, which is mostly the same code with the order of the blocks reversed:

awk 'ok && /END/ { if (content) print p; p = sep = "" }
ok { p = p sep $0; sep = RS; content = 1 }
!ok && /START/ { ok = 1 }' file.txt
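And the noninclusive version, on the same input, prints lines 6 to 11:

```shell
printf '%s\n' ' 1 xxxx' ' 2 xxxx' ' 3 END' ' 4 aaa' ' 5 START' \
  ' 6 START' ' 7 zzz' ' 8 START' ' 9 hhh' '10 END' '11 ppp' \
  '12 END' '13 mmm' '14 START' |
awk 'ok && /END/ { if (content) print p; p = sep = "" }
ok { p = p sep $0; sep = RS; content = 1 }
!ok && /START/ { ok = 1 }'
# prints lines 6 through 11
```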

The "content" variable is necessary for the obscure corner case in which the input contains something like

...
START

END
...

If we relied upon "p" not being empty to decide whether to print or not, this case would be indistinguishable from this other one:

...
START
END
...

We could also (perhaps a bit cryptically) avoid the extra variable and rely on "sep" being set instead. We keep the extra variable for the sake of clarity.

Here are two sed solutions implementing the same logic (not really recommended, but the original request was to solve this with sed). The hold buffer is used to accumulate lines.
Inclusive:

# sed -n
# from first /START/ to last /END/, inclusive version

/START/ {
  H
  :loop
  $! {
    n
    H
    # if we see an /END/, sanitize and print
    /END/ {
      x
      s/^\n//
      p
      s/.*//
      x
    }
    bloop
  }
}
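The inclusive sed version can be checked against the same sample input (GNU sed used here; note that in sed "." also matches the newlines embedded in the pattern space, so "s/.*//" really empties the whole accumulated buffer):

```shell
printf '%s\n' ' 1 xxxx' ' 2 xxxx' ' 3 END' ' 4 aaa' ' 5 START' \
  ' 6 START' ' 7 zzz' ' 8 START' ' 9 hhh' '10 END' '11 ppp' \
  '12 END' '13 mmm' '14 START' |
sed -n '/START/ {
  H
  :loop
  $! {
    n
    H
    /END/ {
      x
      s/^\n//
      p
      s/.*//
      x
    }
    bloop
  }
}'
# prints lines 5 through 12, inclusive
```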

The noninclusive version uses the same logic, except we discard the first /START/ line that we see (done by the "n" in the loop), and, when we see an /END/, we print what we have so far (which crucially does not include the /END/ line itself, which however is included for the next round of accumulation).

# sed -n
# from first /START/ to last /END/, noninclusive version

/START/ {
  :loop
  $! {
    n
    /END/ {
      # recover lines accumulated so far
      x

      # if there is something, print
      /./ {
        # remove leading \n added by H
        s/^\n//
        p
      }

      # empty the buffer
      s/.*//

      # recover the /END/ line for next round
      x
    }
    H
    bloop
  }
}

Note that the above solutions assume that no line exists that matches both /START/ and /END/. Other solutions are of course possible.

Conditional line join

In this case we have some special lines (identified by a pattern). Every time a special line is seen, all the previous or following lines should be joined to it. An example to make it clear, using /SPECIAL/ as our pattern:

SPECIAL 1
line2
line3
SPECIAL 2
line5
line6
line7
SPECIAL 3
SPECIAL 4
line10
SPECIAL 5

So we want one of the two following outputs, depending on whether we join the special lines to the preceding or the following ones:

# join with following lines
SPECIAL 1 line2 line3
SPECIAL 2 line5 line6 line7
SPECIAL 3
SPECIAL 4 line10
SPECIAL 5
# join with preceding lines
SPECIAL 1
line2 line3 SPECIAL 2
line5 line6 line7 SPECIAL 3
SPECIAL 4
line10 SPECIAL 5

The sample input has been artificially crafted to work with both types of change; in practice, in real inputs either the first or the last line won't match /SPECIAL/, depending on the needed processing.

So here's some awk code that joins each special line with the following ones, until a new special line is found, thus producing the first of the two outputs shown above:

awk -v sep=" " '/SPECIAL/ && done == 1 {
  print ""
  s = ""
  done = 0
}
{
  printf "%s%s", s, $0
  s = sep
  done = 1
}
END {
  if (done) print ""
}' file.txt
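Run on the sample input, it produces the first output shown above:

```shell
printf '%s\n' 'SPECIAL 1' 'line2' 'line3' 'SPECIAL 2' 'line5' 'line6' \
  'line7' 'SPECIAL 3' 'SPECIAL 4' 'line10' 'SPECIAL 5' |
awk -v sep=" " '/SPECIAL/ && done == 1 {
  print ""
  s = ""
  done = 0
}
{
  printf "%s%s", s, $0
  s = sep
  done = 1
}
END {
  if (done) print ""
}'
# prints:
# SPECIAL 1 line2 line3
# SPECIAL 2 line5 line6 line7
# SPECIAL 3
# SPECIAL 4 line10
# SPECIAL 5
```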

And here's the idiomatic solution to produce the second output (join with preceding lines):

awk -v sep=" " '{ ORS = /SPECIAL/ ? RS : sep }1' file.txt

The variable "sep" should be set to the desired separator to be used when joining lines (here it's simply a space).
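The one-liner can be verified on the sample input as well:

```shell
printf '%s\n' 'SPECIAL 1' 'line2' 'line3' 'SPECIAL 2' 'line5' 'line6' \
  'line7' 'SPECIAL 3' 'SPECIAL 4' 'line10' 'SPECIAL 5' |
awk -v sep=" " '{ ORS = /SPECIAL/ ? RS : sep }1'
# prints:
# SPECIAL 1
# line2 line3 SPECIAL 2
# line5 line6 line7 SPECIAL 3
# SPECIAL 4
# line10 SPECIAL 5
```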

Intra-block sort

(for want of a better name)

Let's imagine an input file like

alpha:9832
alpha:11
alpha:449
delta:23847
delta:113
gamma:1
gamma:10
gamma:100
gamma:101
beta:5768
beta:4

The file has sections, where the first field names the section (alpha, beta etc.). Now we want to sort each section according to its second field (numeric), but without changing the overall order of the sections. In other words, we want this output:

alpha:11
alpha:449
alpha:9832
delta:113
delta:23847
gamma:1
gamma:10
gamma:100
gamma:101
beta:4
beta:5768

As a variation, blocks can be separated by a blank line, as follows:

alpha:9832
alpha:11
alpha:449

delta:23847
delta:113

gamma:1
gamma:10
gamma:100
gamma:101

beta:5768
beta:4

So the corresponding output should be

alpha:11
alpha:449
alpha:9832

delta:113
delta:23847

gamma:1
gamma:10
gamma:100
gamma:101

beta:4
beta:5768
Shell

The blatantly obvious solution using the shell is to number each section by prepending a new field, then sort according to field 1 + field 3, and finally print the result, removing the extra field that we added:

awk -F ':' '$1 != prev {count++} {prev = $1; print count FS $0}' file.txt | sort -t ':' -k1,1n -k3,3n | awk -F ':' '{print substr($0,index($0,FS)+1)}'
alpha:11
alpha:449
alpha:9832
delta:113
delta:23847
gamma:1
gamma:10
gamma:100
gamma:101
beta:4
beta:5768

Instead of reusing awk, the job of the last part of the pipeline could have been done for example with cut or sed.

For the variation with separated blocks, an almost identical solution works. Paragraphs are numbered prepending a new field, the result sorted, and the prepended numbers removed before printing:

awk -v count=1 '/^$/{count++}{print count ":" $0}' file.txt | sort -t ':' -k1,1n -k3,3n | awk -F ':' '{print substr($0,index($0,FS)+1)}'
alpha:11
alpha:449
alpha:9832

delta:113
delta:23847

gamma:1
gamma:10
gamma:100
gamma:101

beta:4
beta:5768

A crucial property of this solution is that empty lines are always treated as part of the next paragraph (not the previous one), so when sorting they remain where they are. This also means that runs of empty lines in the input are preserved in the output.

Perl

The previous solutions treat the input as a single entity, regardless of how many blocks it has. After preprocessing, sort is applied to the whole data, and if the file is very big, many temporary resources (disk, memory) are needed to do the sorting.

Let's see if it's possible to be a bit more efficient and sort each block independently.

Here is an example with perl that works with both variations of the input (without and with separated blocks).

#!/usr/bin/perl

use warnings;
use strict;

sub printblock {
  print $_->[1] for (sort { $a->[0] <=> $b->[0] } @_);
}

my @block = ();
my ($prev, $cur, $val);

while(<>){

  my $empty = /^$/;

  if (!$empty) {
    ($cur, $val) = /^([^:]*):([^:]*)/;
    chomp($val);
  }

  if (@block && ($empty || $cur ne $prev)) {
    printblock(@block);
    @block = ();
  }

  if ($empty) {
    print;
  } else {
    push @block, [ $val, $_ ];
    $prev = $cur;
  }
}

printblock(@block) if (@block);

Of course all the sample code given here must be adapted to the actual input format.
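For comparison, the same streaming, block-at-a-time idea can be sketched in Python, where itertools.groupby detects the blocks; the ':'-separated name:number format is taken from the sample input and is an assumption about the real data:

```python
from itertools import groupby

def sort_blocks(lines):
    """Sort each block (consecutive lines with the same first field)
    by its numeric second field; blank lines pass through unchanged
    and act as block boundaries."""
    out = []
    # group consecutive lines by section name; blank lines
    # (whose "name" is the empty string) form their own groups
    for _, group in groupby(lines, key=lambda l: l.split(":")[0]):
        block = list(group)
        if block[0].strip() == "":
            out.extend(block)  # preserve blank separator lines as-is
        else:
            out.extend(sorted(block, key=lambda l: int(l.split(":")[1])))
    return out
```

Only one block is held in memory at a time, which is the efficiency gain over piping everything through sort.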

File encryption on the command line

This list is just a reference which hopefully saves some googling.

Let's make it clear that we're talking about symmetric encryption here, that is, a password (or better, a passphrase) is supplied when the file is encrypted, and the same password can be used to decrypt it. No public/private key stuff or other preparation should be necessary. We want a quick and simple way of encrypting files (for example, before moving them to the cloud or to an offsite backup not under our control). As said, file encryption, not whole filesystems or devices.

Another important thing is that symmetric encryption is vulnerable to brute force attacks, so a strong password should always be used and the required level of security should always be evaluated. It may be that symmetric encryption is not the right choice for a specific situation.

It is worth noting that the password or passphrase supplied to the commands is not used directly for encryption/decryption, but rather is used to derive the actual encryption/decryption keys. However, this is done transparently by the tools (usually through some sort of hashing), and for all practical purposes these passwords or passphrases are the keys, and should be treated as such.

In particular, one thing that should be avoided is putting them directly on the command line. Although some tools allow that, the same tools generally also offer options to avoid it, and they should definitely be used.
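To illustrate that derivation step in isolation (this is not necessarily what any of the tools below does internally), here is a sketch using PBKDF2, a standard password-based key derivation function; the salt size, iteration count and key length are arbitrary example values:

```python
import hashlib
import os

def derive_key(passphrase, salt=None, iterations=100_000, length=24):
    """Stretch a passphrase into a 192-bit key with PBKDF2-HMAC-SHA256.
    A random salt makes identical passphrases yield different keys;
    the iteration count slows down brute force attempts."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                              salt, iterations, dklen=length)
    return key, salt
```

The same passphrase and salt always produce the same key, which is what allows decryption later.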

Openssl

Probably the simplest and most commonly installed tool is openssl.

# Encrypt
$ openssl enc -aes-192-cbc -in plain.txt -out encrypted.enc
# Decrypt
$ openssl enc -d -aes-192-cbc -in encrypted.enc -out plain.txt

The above is the basic syntax. The cipher name can of course be different; the man page for the enc openssl subcommand lists the supported algorithms (the official docs also say: "The output of the enc command run with unsupported options (for example openssl enc -help) includes a list of ciphers, supported by your version of OpenSSL, including ones provided by configured engines." Still, it seems that adding a regular -help or -h option wouldn't be too hard). Other useful options:

  • -d to decrypt
  • -pass to specify a password source. In turn, the argument can have various formats: pass:password to specify the password directly in the command, env:var to read it from the environment variable $var, file:pathname to read it from the file at pathname, fd:number to read it from a given file descriptor, and stdin to read it from standard input (equivalent to fd:0, but NOT equivalent to reading it from the user's terminal, which is the default behavior if -pass is not specified)
  • -a to base64-encode the encrypted file (or assume it's base64-encoded if decrypting)

Openssl can also read the data to encrypt from standard input (if no file is specified with -in) and/or write to standard output (if -out is not given). Example with password from file:

# Encrypt
$ tar -czvf - file1 file2 ... | openssl enc -aes-192-cbc -pass file:/path/to/keyfile -out archive.tar.gz.enc
# Decrypt
$ openssl enc -d -aes-192-cbc -pass file:/path/to/keyfile -in archive.tar.gz.enc | tar -xzvf -

GPG

There are two main versions of GPG, the 1.x series and the 2.x series (respectively 1.4.x and 2.0.x at the time of writing).

gpg comes with a companion program, gpg-agent, that can be used to store and retrieve the passphrases used to unlock private keys, in much the same way that ssh-agent caches password-protected SSH private keys (actually, in addition to its own job, gpg-agent can optionally do the job of ssh-agent and replace it). Using gpg-agent is optional with gpg 1.x, but mandatory with gpg 2.x. In practice, when doing symmetric encryption the agent is not used, so we won't talk about it here (although we will briefly mention it later when talking about aespipe, since that tool can use it).

GPG 1.4.x
# Encrypt file
$ gpg --symmetric --cipher-algo AES192 --output encrypted.enc plain.txt
# Decrypt file
$ gpg --decrypt --output plain.txt encrypted.enc

# Encrypt stdin to file
$ tar -czvf - file1 file2 ... | gpg --symmetric --cipher-algo AES192 --output archive.tar.gz.enc
# Decrypt file to stdout
$ gpg --decrypt archive.tar.gz.enc | tar -xzvf -

Useful options:

  • -a (when encrypting) create an ASCII-armored file (ie, a special text file)
  • --cipher-algo ALG (when encrypting) use ALG as cipher algorithm (run gpg --version to get a list of supported ciphers)
  • --batch avoid asking questions to the user (eg whether to overwrite a file). If the output file exists, the operation fails unless --yes is also specified
  • --yes assume an answer of "yes" to most questions (eg when overwriting an output file, which would otherwise ask for confirmation)
  • --no-use-agent to avoid the "gpg: gpg-agent is not available in this session" message that, depending on configuration, might be printed if gpg-agent is not running (it's only to avoid the message; as said, the agent is not used anyway with symmetric encryption)
  • --passphrase string use string as the passphrase
  • --passphrase-file file read passphrase from file
  • --passphrase-fd n read passphrase from file descriptor n (use 0 for stdin)
  • --quiet suppress some output messages
  • --no-mdc-warning (when decrypting) suppress the "gpg: WARNING: message was not integrity protected" message. Probably, a better thing to do is use --force-mdc when encrypting, so GPG won't complain when decrypting.

In any case, GPG will create and populate a ~/.gnupg/ directory if it's not present (I haven't found a way to avoid it - corrections welcome).

Similar to openssl, GPG reads from standard input if no filename is specified at the end of the command line. Getting it to write to standard output, however, is less obvious.

When encrypting, if no --output option is given, GPG will create a file with the same name as the input file, with the added .gpg extension (eg file.txt becomes file.txt.gpg), unless input comes from stdin, in which case output goes to stdout. If the input comes from a regular file and writing to standard output is desired, --output - can be used. --output can of course also be used if we want an output file name other than the default with .gpg appended.
On the other hand, when decrypting with --decrypt, output goes to stdout unless --output is used to override that. If --decrypt is not specified, GPG still decrypts, but the default operation is to write to a file named like the one on the command line but with the .gpg suffix removed (eg file.txt.gpg becomes file.txt); if the specified file does not end in .gpg, then --output must be specified (--output - writes to stdout), otherwise GPG exits with an "unknown suffix" error.

GPG 2.0.x
# Encrypt file
$ gpg --symmetric --batch --yes --passphrase-file key.txt --cipher-algo AES256 --output encrypted.enc plain.txt
# Decrypt file
$ gpg --decrypt --batch --yes --passphrase-file key.txt --output plain.txt encrypted.enc

# Encrypt stdin to file
$ tar -czvf - file1 file2 ... | gpg --symmetric --batch --yes --passphrase-file key.txt --cipher-algo AES256 --output archive.tar.gz.enc
# Decrypt file to stdout
$ gpg --decrypt --batch --yes --passphrase-file key.txt archive.tar.gz.enc | tar -xzvf -

In this case, the --batch option is mandatory (and thus probably --yes too) if we don't want gpg to prompt for the passphrase and instead have it use the one supplied on the command line with one of the --passphrase* options. The --no-use-agent option is ignored in gpg 2.0.x, as using the agent is mandatory and thus it should always be running (even though it's not actually used when doing symmetric encryption).

aespipe

As the name suggests, aespipe only does AES in its three variants (128, 192, 256). Aespipe tries hard to prevent the user from specifying the passphrase on the command line (and rightly so), so the passphrase(s) must normally be in a file (plaintext or encrypted with GPG). It is of course possible to come up with kludges to work around these restrictions, but they are there for a reason.

Aespipe can operate in single-key mode, where only one key/passphrase is necessary, and in multi-key mode, for which at least 64 keys/passphrases are needed. With 64 keys it operates in multi-key-v2 mode, with 65 keys it switches to multi-key-v3 mode, which is the safest and recommended mode, and the one that will be used for the examples.

So we need a file with 65 lines of random garbage; one way to generate it is as follows:

$ tr -dc '[:print:]' < /dev/random | fold -b | head -n 65 > keys.txt

If the above command blocks, it means that the entropy pool of the system isn't providing enough data. Either generate some entropy by doing some work or using an entropy-gathering daemon, or use /dev/urandom instead (at the price of losing some randomness).
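An equivalent keyfile can also be generated with Python's secrets module, which draws from the OS CSPRNG; the line length of 60 characters here is an arbitrary choice (comfortably above aespipe's minimum passphrase length):

```python
import secrets
import string

def make_keyfile(path, nkeys=65, length=60):
    """Write nkeys lines of random printable characters,
    usable as an aespipe multi-key-v3 passphrase file."""
    # letters, digits and punctuation; unlike [:print:], no spaces
    alphabet = string.ascii_letters + string.digits + string.punctuation
    with open(path, "w") as f:
        for _ in range(nkeys):
            f.write("".join(secrets.choice(alphabet)
                            for _ in range(length)) + "\n")
```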

Aespipe can also use a pgp-encrypted key file; more on this later. For now let's use the cleartext one.

# Encrypt a file using aes256
$ aespipe -e AES256 -P keys.txt < plain.txt > encrypted.enc
# Decrypt
$ aespipe -d -P keys.txt < encrypted.enc > plain.txt

As can be seen from the examples, aespipe works as a pipe (hence the name), so reading from stdin and writing to stdout is its default and only mode of operation; there is no separate syntax to show for that.

Useful options:

  • -C count run count rounds of hashing when generating the encryption key from the passphrase. This stretching helps to slow down brute force attacks. Recommended if using single-key mode, not needed in multi-key mode(s)
  • -e ENCALG (when encrypting) use ENCALG as cipher algorithm (AES128, AES192, AES256)
  • -h HASHALG use HASHALG to generate the actual key from the passphrase (default depends on encryption algorithm, see the man page)

One very important thing to note is that aespipe has a minimum block granularity when encrypting and decrypting; in simple terms, this means that the result of the decryption will always be a multiple of this minimum (16 bytes in single-key mode, 512 bytes in multi-key modes). NULs are added as padding if needed. Here is a blatant demonstration of this fact:

$ echo hello > file.txt.orig
$ ls -l file.txt.orig
-rw-r--r-- 1 waldner users 6 Jul 11 16:52 file.txt.orig
$ aespipe -P keys.txt < file.txt.orig > file.txt.enc
$ aespipe -d -P keys.txt < file.txt.enc > file.txt.dec
$ ls -l file.txt.*
-rw-r--r-- 1 waldner users 512 Jul 11 16:58 file.txt.dec
-rw-r--r-- 1 waldner users 512 Jul 11 16:57 file.txt.enc
-rw-r--r-- 1 waldner users   6 Jul 11 16:52 file.txt.orig
$ od -c file.txt.dec 
0000000   h   e   l   l   o  \n  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0001000

Some file formats can tolerate garbage at the end (eg tar), others can't, so this is something to take into account when using aespipe. In the cases where the original size is known, it may be possible to postprocess the decrypted file to remove the padding, but this may not always be practical:

$ origsize=$(wc -c < file.txt.orig)
$ truncate -s "$origsize" file.txt.dec
# alternatively
$ dd if=file.txt.dec bs="$origsize" count=1 > file.txt.likeorig

In the cases where the exact byte size is needed and no postprocessing is possible or wanted, another tool should be used (eg gpg or openssl).

Ok, so let's now see how to use an encrypted keyfile with aespipe. The file should be encrypted with GPG, which in turn can do symmetric encryption (as previously seen in this same article) or public-key encryption (using a public/private key pair, which should be already generated and available - not covered here).
Let's encrypt our keys.txt file both with symmetric and public-key encryption (separately)

# using symmetric encryption
$ gpg --symmetric --output keys.enc.sym keys.txt
# enter passphrase, or use some --passphrase* option to specify one

# using public key encryption
$ gpg --encrypt --recipient 199705C4 --output keys.enc.pubk keys.txt
# no passphrase is required, as only the public key is used to encrypt
# here "199705C4" is the id of the (public) key

Now, we want to encrypt or decrypt some file using the keys contained in our password-protected keyfile(s). This is done by passing the -K option (instead of -P) to aespipe. Let's start with the symmetrically encrypted keyfile (keys.enc.sym):

# encrypt
$ aespipe -e aes256 -K keys.enc.sym < plain.txt > encrypted.enc
# aespipe prompts for the gpg passphrase to decrypt the keyfile

# decrypt
$ aespipe -d -e aes256 -K keys.enc.sym < encrypted.enc > plain.txt
# same thing, passphrase for keyfile is prompted

Now with the public-key encrypted keyfile:

# encrypt
$ aespipe -e aes256 -K keys.enc.pubk < plain.txt > encrypted.enc
# to decrypt keys.enc.pubk, the private key is needed, 
# aespipe prompts for the passphrase to unlock the private key

# decrypt
$ aespipe -d -e aes256 -K keys.enc.pubk < encrypted.enc > plain.txt
# same thing, passphrase to unlock the private key is prompted

So far, nothing special. However, for this last case (keyfile encrypted with public key cryptography), aespipe can actually use gpg-agent (if it's running) to obtain the passphrase needed to unlock the private key. This is done with the -A option, which tells aespipe the path to the socket where gpg-agent is listening. Assuming gpg-agent has already seen the passphrase to unlock the private key, it can transmit it to aespipe.

# The gpg-agent socket information is in the GPG_AGENT_INFO environment variable
# in the session where the agent is running, or one to which the variable has been exported. For example:
$ echo "$GPG_AGENT_INFO"
/tmp/gpg-gXM3Pm/S.gpg-agent:4897:1
# encrypt using a public-key encrypted keyfile, but tell aespipe to ask gpg-agent for the passphrase
$ aespipe -e aes256 -A "$GPG_AGENT_INFO" -K keys.enc.pubk < plain.txt > encrypted.enc
# similar for decryption

Other utilities

Let's have a look at some other utilities that are simpler but lack the flexibility provided by the previous ones.

mcrypt

This seems to be almost unusable, as doing practically anything beyond simple, optionless encryption produces a message like

Signal 11 caught. Exiting.

so it doesn't seem to be a good candidate for serious use. Some research shows many users in the same situation. More information is welcome.

aescrypt

This is a little-known program, but aescrypt is open source and very simple to use. It is multiplatform and even has a GUI for graphical operation. Here, however, we'll use the command-line version.

# encrypt a file
$ aescrypt -e -p passphrase file.txt
# creates file.txt.aes

# decrypt a file
$ aescrypt -d -p passphrase file.txt.aes
# creates file.txt

# encrypt standard input
$ tar -czvf - file1 file2 ... | aescrypt -e -p passphrase - -o archive.tar.gz.aes

# decrypt to stdout
$ aescrypt -d -p passphrase -o - archive.tar.gz.aes | tar -xzvf -

If no -p option is specified, aescrypt interactively prompts for the passphrase.
If no -o option is specified, a file with the same name and the .aes suffix is created when encrypting, and one with the .aes suffix removed when decrypting.

Since putting passwords directly on the command line is bad, it is possible to put the passphrase in a file and tell aescrypt to read it from the file. However, the file is not a simple text file; it has to be in a format that aescrypt recognizes. To create it, the documentation suggests using the aescrypt_keygen utility as follows:

$ aescrypt_keygen -p somesupercomplexpassphrase keyfile.key

The aescrypt_keygen program is only available in the source code package and not in the binary one (at least in the Linux version). However, since this file, according to the documentation, is nothing more than the UTF-16 encoding of the passphrase string, it's easy to produce the same result without the dedicated utility:

# generate keyfile
$ echo somesupercomplexpassphrase | iconv -f ascii -t utf-16 > keyfile.key
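The same can be done directly in Python; this relies on the documentation's claim, quoted above, that the keyfile is just the UTF-16 encoding of the passphrase (Python's "utf-16" codec writes a byte-order mark, as iconv -t utf-16 typically does; whether aescrypt also expects the trailing newline that echo adds is untested here):

```python
def write_aescrypt_keyfile(path, passphrase):
    """Write the passphrase UTF-16-encoded, mimicking aescrypt_keygen.
    The 'utf-16' codec prepends a BOM; no trailing newline is added."""
    with open(path, "wb") as f:
        f.write(passphrase.encode("utf-16"))
```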

Once we have a keyfile, we can encrypt/decrypt using it:

$ aescrypt -e -k keyfile.key file.txt
# etc.
ccrypt

The ccrypt utility is another easy-to-use encryption program that implements the AES(256) algorithm. Be sure to read the man page and the FAQ.

Warning: when not reading from standard input, ccrypt overwrites the source file with the result of the encryption or decryption. This means that, if the encryption process is interrupted, a file could be left in an only partially encrypted state. On the other hand, when encrypting standard input this (obviously) does not happen. Sample usage:

# encrypt a file; overwrites the unencrypted version, creates file.txt.cpt
$ ccrypt -e file.txt

# decrypt a file; overwrites the encrypted version, creates file.txt
$ ccrypt -d file.txt.cpt

In this mode, multiple file arguments can be specified, and they will all be encrypted/decrypted. It is possible to recursively encrypt files contained in subdirectories if the -r/--recursive option is specified.

If no files are specified, ccrypt operates like a pipe:

# Encrypt standard input (example)
$ tar -czvf - file1 file2 ... | ccrypt -e > archive.tar.gz.enc
# Decrypt to stdout (example)
$ ccrypt -d < archive.tar.gz.enc | tar -xzvf -

To use the command non-interactively, it is possible to specify the passphrase in different ways:

  • -K|--key passphrase: directly in the command (not recommended)
  • -E|--envvar var: the passphrase is the content of environment variable $var

  • A useful option is -x|--keychange, which allows changing the passphrase of an already encrypted file: the old and new passphrases are prompted for (or specified on the command line with -K/-H (--key/--key2) or -E/-F (--envvar/--envvar2) respectively), and the file is decrypted with the old passphrase and reencrypted with the new one.

7-zip

The compression/archiving utility 7-zip can apparently do AES256 encryption, deriving the encryption key from the passphrase specified by the user with the -p option:

# encrypt/archive, prompt for passphrase
$ 7z a -p archive.enc.7z file1 file2 ...

# encrypt/archive, passphrase on the command line
$ 7z a -ppassphrase archive.enc.7z file1 file2 ...

# encrypt/archive standard input (prompt for passphrase)
$ tar -cvf - file1 file2 ... | 7z a -si -p archive.enc.tar.7z

# decrypt/extract, prompt for passphrase
$ 7z x -p archive.enc.7z [ file1 file2 ... ]

# decrypt/extract, passphrase on the command line
$ 7z x -ppassphrase archive.enc.7z [ file1 file2 ... ]

# decrypt/extract to stdout (prompt for passphrase)
$ 7z x -so -p archive.enc.tar.7z | tar -xzvf -

It looks like there's no way to run in batch (ie, non-interactive) mode without explicitly specifying the passphrase on the command line.

A simple SameGame implementation

Writing a game is a good way to learn and/or practice a new language, so here it is: a samegame implementation written in Python using the good pygame library.

Link to download: same.py. The only dependency is the pygame library. The game should work with python2 and python3, on all the platforms where python is available (tested on Linux and Windows).

Screenshot:

same-screenshot

Running the program with -h or --help shows the supported options:

$ ./same.py -h
Usage:
same.py [ -h|--help ]
same.py [ -l|--load f ] [ -g|--gameid n ] [ -c|--colors n ] [ -s|--cellsize n ] [ -x|--cols n ] [ -y|--rows n ]

-h|--help        : this help
-l|--load f      : load saved game from file "f" (disables all options)
-g|--gameid n    : play game #n (default: random between 0 and 100000)
-c|--colors n    : use "n" colors (default: 5)
-s|--cellsize n  : force a cellsize of "n" pixels (default: 30)
-x|--cols n      : force "n" columns (default: 17)
-y|--rows n      : force "n" rows (default: 15)

During the game, the following keybindings are supported:

u       undo move
ctrl-r  redo move
r       restart current game (same number)
n       start new game (different number)
q/ESC   exit the game
ctrl-s  save the current state of the game (for later retrieval with --load)
1-3     change the color scheme
a       toggle highlighting of current cell group

Some random notes:

  • At any time, the current state of the game is held in a big dictionary called gameinfo. When saving the game, this data structure is serialized to a file using version 2 of the pickle protocol (so it can be read both from python2 and python3). Games are saved in the current directory using a filename like "_samepy.64622-20.sav" where the two numbers indicate respectively the game number and the current move at the time of saving.
    It would have been nice to use some more standard format (eg JSON), but the data structures used here cannot be serialized into JSON (eg dictionaries with tuples as keys). (Ok, I cheated: with some work it is in fact possible using custom encoders and decoders, but here it's probably not worth the effort.)
  • It is possible to override the default values for cellsize, rows and columns, even all three at the same time (within reason). If overriding one or more of these values results in cells that are too small or too big, or in too few or too many rows or columns, an error is printed.
  • The game number that can be specified with -g is used to seed the random number generator before (randomly) populating the game board, so, on the same machine and with the same python version, the same number will always produce the same game layout. If the python major version changes, that is no longer true: game #100 with python2 is different from game #100 with python3. It might even change between, say, python3.3 and python3.4 (although it seems not to), or when using the same python version on different machines; more information is welcome, as usual.
  • By default 5 colors are used; this can be changed with the -c command line switch. The fewer the colors, the easier it is to solve the game; with two colors success is practically certain. There are three different palettes (ie, color schemes), that can be activated during the game with the keys 1-3. If you don't like them (I don't like them too much, but I'm also too lazy), or want to add more palettes, it's easy to find the place in the code where they can be changed.
  • There is more than one way to keep game history for undo/redo purposes. One could just remember the moves made by the player (ie the groups of cells that were removed at each turn), and upon undo/redo go backwards/forwards in this history, each time re-adding/removing a group of cells and recalculating the resulting board after the insertion or removal. This needs little memory to save the game history, but requires some calculation for each undo and redo. It's true that one of those two functions (the one that removes the cells) must be written anyway, to allow the player to actually play.
    However, the approach followed here is to separately save each board layout in sequence, and designate one of those states as "current" using an index into the sequence. This way, undo and redo are as simple as moving this index to the previous or the next saved state respectively (ie, subtracting or adding 1 to it). Restarting the game is (almost) just a matter of setting the index back to move 0.
    So undo/redo/restart are very simple, but more memory is used to store all the states (this is also apparent from the size of a serialized saved game).
    In retrospect, if I were to rewrite it, I would probably use the first approach.
  • The scoring system is quite simple: removing a group of N cells scores N^2 points. This differs slightly from other implementations of the game.
  • For some reason, the game is slow on machines with few resources. The highlighting of the current cell group, for example, has a certain lag, and so has the removal of cells following a click. It is possible to toggle highlighting on/off using the a key during the game, which makes it a bit better. The algorithms are certainly not optimal, however I think that alone doesn't explain these delays. Is it really all redrawing overhead? More info welcome.
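The snapshot-based history approach described above can be sketched like this (names are illustrative, not the actual ones used in same.py): keep every board state in a list and move an index back and forth.

```python
class History:
    """Undo/redo by storing full snapshots and moving an index."""
    def __init__(self, initial_state):
        self.states = [initial_state]
        self.cur = 0                       # index of the current state

    def push(self, state):
        # a new move discards any previously undone (redoable) states
        del self.states[self.cur + 1:]
        self.states.append(state)
        self.cur += 1

    def undo(self):
        if self.cur > 0:
            self.cur -= 1
        return self.states[self.cur]

    def redo(self):
        if self.cur < len(self.states) - 1:
            self.cur += 1
        return self.states[self.cur]

    def restart(self):
        self.cur = 0
        return self.states[0]
```

Memory use grows with the number of moves, which is the tradeoff mentioned above.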