
Many ways to encrypt passwords

Specifically, using crypt(3). One typical use case is, you have a plaintext password and need the damn full thing to put into the /etc/shadow file, which nowadays is usually something like:

$5$sOmEsAlT$pKHkGjoFXUgvUv.UYQuekdpjoZx7mqXlIlKJj6abik7   # sha-256


$6$sOmEsAlT$F3DN61SEKPHtTeIzgzyLe.rpctiym/qxz5xQz9YM.PyTdH7R13ZDXj6sDMeZg5wklbYJYSqDBXcH4UnAWQrRN0   # sha-512

The input to the crypt(3) library function is a cleartext password and a salt. Here we assume the salt is provided, but it's easy to generate a random one (at least one that's "good enough").
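
As a rough illustration of the "good enough" random salt mentioned above, here is a minimal Python sketch. The helper name random_salt and the default length of 16 are my own choices (16 is the maximum salt length the sha-256/sha-512 schemes use); the alphabet is the usual crypt(3) one, [a-zA-Z0-9./].

```python
import secrets
import string

# Characters normally accepted in a crypt(3) salt: [a-zA-Z0-9./]
SALT_ALPHABET = string.ascii_letters + string.digits + "./"


def random_salt(length=16):
    """Return a random salt of `length` characters (hypothetical helper)."""
    # secrets draws from the OS random source, which is more than enough here
    return "".join(secrets.choice(SALT_ALPHABET) for _ in range(length))
```

The result can be assigned to the salt variable used by the commands further down.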

In the case of sha-256 and sha-512 hashes (identified respectively by the $5$ and $6$ in the first field; along with the old md5, which uses code $1$, these are the ones commonly used on Linux) the salt can be augmented by prepending the rounds=<N>$ directive, which changes the number of rounds used by the algorithm from the default of 5000. So for example we could supply a salt like

rounds=500000$sOmEsAlT

and thus use 500000 rounds (this is called stretching and is used to make brute force attacks harder). If the rounds= argument is specified, the output of crypt() includes it as well, since its value must be known every time the hash is recalculated.
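
To make the layout of these strings explicit, here is a small Python sketch that builds the second argument for crypt(3) and splits a resulting hash back into its fields. Both function names are made up for this example, and the code only does string handling: it does not call crypt(3) and assumes a well-formed $5$/$6$ style string.

```python
def build_setting(alg, salt, rounds=None):
    """Build the salt argument for crypt(3), e.g. '$6$rounds=500000$sOmEsAlT'."""
    setting = "$%s$" % alg
    if rounds is not None:
        # the optional rounds directive goes between the algorithm id and the salt
        setting += "rounds=%d$" % rounds
    return setting + salt


def split_hash(hash_string):
    """Split a crypt(3) output string into (alg, rounds, salt, digest)."""
    fields = hash_string.split("$")[1:]  # drop the empty field before the first '$'
    alg = fields[0]
    rounds = None
    if fields[1].startswith("rounds="):
        rounds = int(fields[1][len("rounds="):])
        fields = [alg] + fields[2:]
    return alg, rounds, fields[1], fields[2]
```

For example, split_hash() applied to the sha-256 string shown at the top returns "5" as the algorithm, None for the rounds (the default was used) and "sOmEsAlT" as the salt.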

It seems there's no utility to directly get the hash string (there used to be a crypt(1) command which however had troubles related to the export of cryptographic software, which made it so weak that many distros stopped shipping it). So we'll have to find some command that calls the crypt(3) function.

In the following examples, we assume the algorithm number, the salt and the password are stored in the shell variables $alg, $salt and $password respectively:


This way, the code doesn't hardcode anything and can be reused.


$ perl -e 'print crypt($ARGV[1], "\$" . $ARGV[0] . "\$" . $ARGV[2]), "\n";' "$alg" "$password" "$salt"


# python 2/3 (note: the crypt module was removed in Python 3.13)
$ python -c 'import crypt; import sys; print (crypt.crypt(sys.argv[2],"$" + sys.argv[1] + "$" + sys.argv[3]))' "$alg" "$password" "$salt"


Yes, MySQL has a built-in function that uses crypt(3):

$ mysql -B -N -e "select encrypt('$password', '\$$alg\$$salt');" 

Obviously, extra care should be taken with this one if $password or $salt contain quotes or other characters that are special to MySQL.


$ php -r 'echo crypt($argv[2], "\$" . $argv[1] . "\$" . $argv[3]) . "\n";' "$alg" "$password" "$salt"


$ ruby -e 'puts ARGV[1].crypt("$" + ARGV[0] + "$" + ARGV[2]);' "$alg" "$password" "$salt"


The mkpasswd utility comes with the whois package (at least in Debian). Here it's better to introduce another separate variable to hold the number of rounds:

# password as before

(and of course the other examples can be adapted to use the three variables instead of two). Then it can be used as follows:

$ mkpasswd -m sha-512 -R "$rounds" -S "$salt" "$password"

If using the standard number of rounds the -R option can be omitted, of course. Here the algorithm is specified by name, so the $alg variable is not used.

Some notes on macvlan/macvtap

There's not a lot of documentation about these interfaces. Here are some notes to summarize what I've been able to gather so far. Surely there's more to it (corrections and/or more information welcome).


Macvlan interfaces can be seen as subinterfaces of a main ethernet interface. Each macvlan interface has its own MAC address (different from that of the main interface) and can be assigned IP addresses just like a normal interface.

So with this it's possible to have multiple IP addresses, each with its own MAC address, on the same physical interface. Applications can then bind specifically to the IP address assigned to a macvlan interface, for example. The physical interface to which the macvlan is attached is often referred to as "the lower device" or "the upper device"; here we'll use the term "lower device".

The main use of macvlan seems to be container virtualization (for example LXC guests can be configured to use a macvlan for their networking and the macvlan interface is moved to the container's namespace), but there are other scenarios, mostly very specific cases, like using virtual MAC addresses (see for example this keepalived feature).

A macvlan interface can work in one of four modes, defined at creation time.

  • VEPA (Virtual Ethernet Port Aggregator) is the default mode. If the lower device receives data from a macvlan in VEPA mode, this data is always sent "out" to the upstream switch or bridge, even if it's destined for another macvlan in the same lower device. Since macvlans are almost always assigned to virtual machines or containers, this makes it possible to see and manage inter-VM traffic on a real external switch (whereas with normal bridging it would not leave the hypervisor), with all the features provided by a "real" switch. However, at the same time this implies that, for VMs to be able to communicate, the external switch should send back inter-VM traffic to the hypervisor out of the same interface it was received from, something that a standard bridge normally never does (frames are not forwarded out of the port they arrived on). This feature (the so-called "hairpin mode" or "reflective relay") isn't widely supported yet, which means that if using VEPA mode with an ordinary switch, inter-VM traffic leaves the hypervisor but never comes back (unless it's sent back at the IP level by a router somewhere, but then there's nothing special about that, it has always worked that way).
    Since there are few switches supporting hairpin mode, VEPA mode isn't used all that much yet. However it's worth mentioning that Linux's own internal bridge implementation does support hairpin mode in recent versions; assuming eth0 is a port of br0, hairpin mode can be enabled by doing

    # echo 1 > /sys/class/net/br0/brif/eth0/hairpin_mode

    or using a recent version of brctl:

    # brctl hairpin br0 eth0 on

    or even better, using the bridge program that comes with recent versions of iproute2:

    # bridge link set dev eth0 hairpin on

    So a Linux box could very well be used in the role of "external switch" as mentioned above.

  • Bridge mode: this works almost like a traditional bridge, in that data received on a macvlan in bridge mode and destined for another macvlan of the same lower device is sent directly to the target (if the target macvlan is also in bridge mode), rather than being sent outside. This of course works well with non-hairpin switches, and inter-VM traffic has better performance than VEPA mode, since the external round-trip is avoided. In the words of a kernel developer,

    The macvlan is a trivial bridge that doesn't need to do learning as it
    knows every mac address it can receive, so it doesn't need to implement
    learning or stp. Which makes it simple stupid and and fast.

  • Private mode: this is essentially like VEPA mode, but with the added feature that no macvlans on the same lower device can communicate, regardless of where the packets come from (so even if inter-VM traffic is sent back by a hairpin switch or an IP router, the target macvlan is prevented from receiving it). I haven't tried, but I suppose that it is the operating mode of the target macvlan that determines whether it receives the traffic or not. This mode is useful, of course, if we really want macvlan isolation.
  • Passthru mode: this mode was added later, to work around some limitation of macvlans (more details here). I'm not 100% clear on what's the problem passthru mode tries to solve, as I was able to set promiscuous mode, create bridges, vlans and sub-macv{lan,tap} interfaces in KVM guests using a plain macvtap in VEPA mode for their networking (so no need for passthru). Since I'm surely missing something, more information (as usual) is welcome.

VEPA, bridge and private modes come from a standard called EVB (edge virtual bridging); a good article which provides more information can be found here.

Curiously (at least, in the case of the three original operating modes), the operating mode is per-macvlan interface rather than global (per-physical device). I guess that it's then more or less mandatory to configure all the macvlans of the same lower device to operate in the same mode, or at least to match the macvlan modes so that only intended inter-VM traffic is possible; I'm not sure what would happen, for instance, if a macvlan using VEPA mode tried to communicate with another one using bridge mode, or vice versa. This may well be worth investigating.

Irrespective of the mode used for the macvlan, there's no connectivity from whatever uses the macvlan (eg a container) to the lower device. This is by design, and is due to the way macvlan interfaces "hook into" their physical interface. If communication with the host is needed, the solution is kind of easy: just create another macvlan on the host on the same lower device, and use this to communicate with the guest.

The documentation of iproute2 about setting operating mode for macvlans isn't complete, since neither "ip link help" nor the man pages mention how to do that. Fumbling around a bit, it can be seen that the syntax is

# ip link add link eth2 macvlan2 type macvlan mode aaa    # hit enter here to force an error message
Error: argument of "mode" must be "private", "vepa", "bridge" or "passthru"

Even more undocumented (if possible) is the way to show the operating mode of a macvlan, which turns out to be

# ip -d link show macvlan2
27: macvlan2@eth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT 
    link/ether 26:8a:3c:07:7d:f4 brd ff:ff:ff:ff:ff:ff
    macvlan  mode vepa 

Let's hope that all this appears in the documentation soon.

The MAC address of the macvlan is normally autogenerated; to explicitly specify one, the following syntax can be used (which also specifies custom name and operating mode at the same time):

# ip link add link eth2 FOOMACVLAN address 56:61:4f:7c:77:db type macvlan mode bridge
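
If you need to come up with an address to pass explicitly, one common convention is to pick a random unicast, locally administered MAC (second-lowest bit of the first octet set, lowest bit clear), like the 56:61:4f:7c:77:db used above. A minimal Python sketch follows; random_local_mac is a made-up helper, and I'm not claiming this is how the kernel generates its own addresses.

```python
import random


def random_local_mac(rng=random):
    """Return a random unicast, locally administered MAC address (hypothetical helper)."""
    first = rng.randrange(256)
    first |= 0x02  # set the "locally administered" bit
    first &= 0xFE  # clear the multicast bit, so the address is unicast
    rest = [rng.randrange(256) for _ in range(5)]
    return ":".join("%02x" % octet for octet in [first] + rest)
```

The returned string can be used as the value of the address argument of ip link add.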

Final note, it's also possible to create a macvlan interface and bridge it (eg brctl addif br0 macvlan2); though it's a bit weird, it does work fine.

macvtap interfaces

A macvtap is a virtual interface based on macvlan (thus tied to another interface). It is similar to a regular tap interface in that a program can attach to it and read/write frames; however, the similarities end there. The most prominent user of macvtap interfaces seems to be libvirt/KVM, which allows guests to be connected to macvtap interfaces. Doing so allows for (almost) bridged-like behavior of guests but without the need to have a real bridge on the host, as a regular ethernet interface can be used as the macvtap's lower device.

Some notes about macvtap (more information is always welcome):

  • Since it's based on macvlan, macvtap shares the same operating modes it can be in (VEPA, bridge, private and passthru)
  • Similarly, a guest using a macvtap interface cannot communicate directly with its lower device in the host. In fact, if you run tcpdump on the macvtap interface on the host, no traffic will be seen. Again this is by design, but can be surprising. This link has some details and suggests workarounds for KVM in case this functionality is needed. A quick workaround is to create a macvlan (not macvtap) interface on the host, which will then be visible from the guests. (On a side note, this is also a way to use routed mode for the macvtap guests: put the host's macvlan and all guests on the same IP subnet, configure the guests to use the host macvlan's IP as their default gateway, and have the host do NAT between the macvlan and the physical interface. But then, in this case, it's probably easier to use a real bridge).
  • Creation of a macvtap interface is not done by opening /dev/net/tun; instead, it looks like the only way to create one is to directly send appropriate messages to the kernel via a netlink socket (at least, that's how iproute2 and libvirt do it; strace and/or the source will show the details, as there seems to be no documentation whatsoever). This makes it a bit more complicated than a normal tun/tap interface.
  • macvtap interfaces are persistent by default. Once the macvtap interface has been created via netlink, an actual character device file appears under /dev (this does not happen with normal tap interfaces). The device file is called /dev/tapNN, where NN is the interface index of the macvtap (can be seen for example with "ip link show"). It's this device file that has to be opened by programs wanting to use the interface (eg libvirtd/qemu to connect a guest).
  • One consequence of there being an actual device file for the macvtap interface is that traffic entering the interface can be seen and "stolen" from the intended recipient by simply reading from the device file; doing "cat /dev/tap22" (for example) while a guest VM is using it dumps the raw ethernet frames and prevents the VM from seeing them. On the other hand, neither seeing outgoing traffic nor injecting frames by writing to the device file from the outside seems to be possible.
  • If a VM is connected to the macvtap, the MAC address of the macvtap interface as seen on the host is the same that is seen by the guest; this is different from regular tap interfaces, where the guest is somehow "behind" the tap interface (the vnetX interfaces on the host have a MAC address which is not the same that the guest uses).
  • All traffic for guests connected to a macvtap does show up if running tcpdump on the lower device, even in bridge mode and for guest-to-guest traffic. However, as said, tcpdump (on the host) on the macvtap device itself shows no traffic.
  • If the lower device is a wireless card, macvtap doesn't work (the guest is isolated, nothing enters, nothing exits). Perhaps it's just that it only works with some wireless cards, and I happened to have one that doesn't work. Again, I could not find more information.

As said, creating a macvtap interface via code is a bit complicated, but luckily iproute2 can do it on the command line. To create a macvtap interface called macvtap2, with eth2 as its lower physical interface:

# ip link add link eth2 macvtap2 address 00:22:33:44:55:66 type macvtap mode bridge
# ip link set macvtap2 up
# ip link show macvtap2
18: macvtap2@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 500
    link/ether 00:22:33:44:55:66 brd ff:ff:ff:ff:ff:ff
# ls -l /dev/tap18 
crw------- 1 root root 250, 1 May 26 10:51 /dev/tap18
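
Since the device file name is derived from the interface index, a program can work out which file to open from the interface name alone. Here's a minimal Python sketch of that lookup; both function names are made up, and the /dev/tapNN naming is simply the one observed above.

```python
import socket


def tap_device_path(ifindex):
    """Build the character device path for a macvtap with the given interface index."""
    return "/dev/tap%d" % ifindex


def macvtap_device(ifname):
    """Look up the interface index of `ifname` and return its device path.

    Raises OSError if no interface with that name exists.
    """
    return tap_device_path(socket.if_nametoindex(ifname))
```

With the interface created above, macvtap_device("macvtap2") would be expected to return /dev/tap18.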

To delete the interface, the usual command can be used:

# ip link del macvtap2


Smart ranges in sed

Since there seem to be still quite a few people who want to do this with sed...let's see how to select ranges of lines in the same way as with awk (explained here).

We should also avoid the same issue described there, that is, if other /BEGIN/ lines are found while we are inside a range, those lines should be printed. So with this input:

1 BEGIN
2 foo
3 bar
4 BEGIN
5 baz
6 END

at least lines 2 to 5 should be printed (line 1, or 6, or both may also be printed, depending on whether and which range endpoint we are including/excluding).

We're going to assume a sed with ERE (-E) support (as should be the norm these days anyway).

From BEGIN to END, inclusive

This is obviously the easy one:

# print lines from /BEGIN/ to /END/, inclusive
$ sed '/BEGIN/,/END/!d'
$ sed -n '/BEGIN/,/END/p'

No mysteries here. Let's get to the interesting cases.

From BEGIN to END, excluding END

# print lines from /BEGIN/ to /END/, excluding /END/
$ sed '/BEGIN/!d; :loop; n; /END/d; $!bloop'

We start a loop when we see a /BEGIN/, and keep looping until we see an /END/, at which point we delete the line so it's not printed.

From BEGIN to END, excluding BEGIN

# print lines from /BEGIN/ to /END/, excluding /BEGIN/
$ sed -E '/BEGIN/!d; :loop; N; /END/{ s/^[^\n]*\n//; p; d;}; $!bloop'

Same loop, but the lines are accumulated in the pattern space, and the first of them is removed before printing the whole block (note that the "D" command cannot be used for that purpose here, as it starts a new cycle).

From BEGIN to END, not inclusive

This is of course just a small variation on the preceding one, in that we delete both the first and the last line:

# print lines from /BEGIN/ to /END/, excluding both lines
$ sed -E '/BEGIN/!d; :loop; N; /END/{ s/^[^\n]*\n//; s/\n?[^\n]*$//; /./p; d;}; $!bloop'

Since we're excluding both the start and the end line, what's left after removing them may be empty, so we check that there's at least one character left and we only print the pattern space if that is the case.
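
For comparison, the same four behaviours can be written as a tiny state machine. This is a Python sketch of the logic, not a translation of the sed scripts: the function name, its parameters and the sample data are mine (the sample is modelled on the input described at the beginning), and unlike some of the sed versions it also emits the lines of a range that never sees its END.

```python
import re

# Sample input: a second BEGIN appears while we're already inside the range
sample = ["1 BEGIN", "2 foo", "3 bar", "4 BEGIN", "5 baz", "6 END", "7 after"]


def select_range(lines, begin="BEGIN", end="END",
                 include_begin=True, include_end=True):
    """Return the lines between a begin and an end match, endpoints optional."""
    out = []
    inside = False
    for line in lines:
        if not inside:
            if re.search(begin, line):
                inside = True
                if include_begin:
                    out.append(line)
        elif re.search(end, line):
            inside = False
            if include_end:
                out.append(line)
        else:
            # inside a range: further BEGIN lines are ordinary content
            out.append(line)
    return out
```

Calling select_range(sample, include_begin=False, include_end=False) returns lines 2 to 5, including the "4 BEGIN" line.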

For anything more complex, just use awk!

Pulling out strings

This is a generic text-processing need that often occurs in different kinds of scripts. Simply put, you want to get a list of the strings in the file (or files) that match a certain pattern. Let's use this simple file as an example:


Our pattern is (using ERE syntax) "foobar[0-9]+", that is, "foobar" followed by one or more digits. We will refine it a bit later.

Using common shell tools, we have several possibilities.

GNU grep

Probably the simplest one, if GNU grep is available, is to use its -o option, to return only the part of the input that matches the pattern, so:

$ grep -Eo 'foobar[0-9]+' test.txt

As said, this needs GNU grep due to the -o option.

GNU awk and BusyBox awk

These two awk implementations support, as a non-standard extension, the assignment of a regular expression to RS, and make whatever matched RS available in the special variable RT (mawk seems to support the former feature, but not the latter, which makes it unsuitable to be used in the way we describe here). So here's how to use these awks for the task:

$ gawk -v RS='foobar[0-9]+' 'RT{print RT}' test.txt

Note that using RS/RT this way allows matching patterns that contain newlines, something that's not easily achieved with other tools (except Perl, see below).

These methods are easy and quick; however, if none of the above implementations is available, we need to use something more standard.

Standard awk

With standard awk, a way to extract all occurrences is to use a loop over each line, repeatedly using match():

$ cat matches.awk
{
  line = $0
  while (match(line, /foobar[0-9]+/) > 0) {
    print substr(line, RSTART, RLENGTH)
    line = substr(line, RSTART + RLENGTH)
  }
}
$ awk -f matches.awk test.txt

Here the original line is saved (in case it's needed for further processing) and a copy is used to find matches. Since match() only finds the first match in the string, when a match is found it's removed so running match() again can find the following occurrence (if any). For this reason, the above code will loop forever if it's given a pattern that can match the empty string, like for example a*. When you do that, you really want a+ instead anyway, so use the latter. The code above is a common awk idiom to find all matches of a pattern.
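
The same "match, cut, repeat" loop can be sketched in Python, with an explicit guard for the empty-match case instead of an endless loop. The function name all_matches is made up for this example; in real Python code re.findall() would do the job in one call, the point here is only to show the idiom.

```python
import re


def all_matches(pattern, line):
    """Collect every match of `pattern` in `line`, cutting the string after each match."""
    regex = re.compile(pattern)
    found = []
    rest = line
    while True:
        m = regex.search(rest)
        if m is None:
            break
        if m.end() == m.start():
            # an empty match would never shorten `rest`, so we'd loop forever
            raise ValueError("pattern matches the empty string")
        found.append(m.group(0))
        rest = rest[m.end():]  # drop everything up to the end of the match
    return found
```

For instance, all_matches(r"foobar[0-9]+", "x foobar1 y foobar22") gives the two strings foobar1 and foobar22, while a pattern like a* raises an error.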


With sed the task is a bit complicated. Basically, we need to somehow "mark" the parts of the data that match our pattern, so we can later delete everything that's not between markers, leaving thus only what we want.

A safe character to use as marker is the newline character (\n), since sed guarantees that, under normal conditions, no input line as seen in the pattern space will contain that character. For the first of the following solutions to work, a sed implementation that recognizes \n in the RHS and the special bracket expression [^\n] (any character except \n) is needed. And since our pattern is an ERE (though it could be rewritten as BRE), we need a sed that recognizes EREs. GNU sed has all these features, and we're going to assume it in the examples.

That said, let's see a couple of ways to solve the task with sed.

One somewhat laborious solution is as follows:

$ sed -E '
s/foobar[0-9]+/\n&/g
s/^[^\n]*\n//
t ok
d
:ok
s/(foobar[0-9]+)[^\n]*/\1/g' test.txt

Here we prepend a \n to each match, then delete what's before the very first match in the line (zero or more non-\n followed by a \n at the beginning of the string). Finally we delete all the parts between matches, which leaves us with just the matches, nicely separated by \n characters.

Another approach to the problem is implemented with the following code (which also has the benefit of using standard syntax; changing the ERE into BRE (foobar[0-9][0-9]*) and converting all the "\n" in the RHS to literal escaped newlines would allow this solution to be used with a standard sed):

$ sed -E '
/\n/!s/foobar[0-9]+/\n&\n/g
/^foobar[0-9]+\n/P
D' test.txt

Here the approach is to "isolate" each match with a \n before and one after (if the pattern space doesn't already have one). If the line begins with a match, it's printed with "P" (up to the following \n, which is what we want). Regardless, the part up to and including the first \n is deleted (with "D"). If something is left, go to the beginning to do the previous steps again, until the whole pattern space is entirely consumed. If there were no matches in the original line, "D" will just delete it entirely and start a new cycle. Rinse and repeat for every input line.


With perl we can do it pretty easily thanks to its powerful regular expression matching operators:

$ perl -ne 'print "$_\n" for (/foobar\d+/g);' test.txt

If the pattern we want has newlines in it, we can just tell perl to slurp the whole file with perl -0777 -ne and we're set.

Context comes to town

All the solutions seen so far strictly match a pattern, regardless of where it appears. In other words, they ignore the context of the matches. However there may be cases where this is important. In our example input data, we might want to match foobar[0-9]+ only if it's delimited, where "delimited" here is defined as "preceded by either a hash (#) or beginning of line, and followed by either a hash or end of line". Clearly, with these new requirements we don't want the foobar12 in the last line.

We thus need to consider the context in the regular expressions, making them include a larger text, so that matches only happen where there's data that we want; however, since the matched text will now be larger than what we need, we need to subsequently "clean up" the match, extracting only what we want from it. Our regular expression now becomes (ERE syntax)

(^|#)foobar[0-9]+(#|$)

Let's see how to modify the previous solutions to work with context.

GNU grep

Grep can't really edit text, so it would seem like it's out of the discussion here, but with a silly trick we can still use it:

$ grep -Eo '(^|#)foobar[0-9]+(#|$)' test.txt | grep -Eo 'foobar[0-9]+'

The first grep prints all matches with their context, and the second one, operating only on the good data, strictly "extracts" the matches that we need.

GNU awk and BusyBox awk

Setting RS to a non-default value obviously causes awk to stop working in line-oriented mode, so the beginning-of-line and end-of-line anchors in our regular expression need to be augmented to consider the newline character.

Now, with the extended RS, RT will contain the full match with context, so we use gsub() to clean it up:

$ gawk -v RS='(^|#|\n)foobar[0-9]+(#|\n|$)' 'RT{gsub(/^(#|\n)|(#|\n)$/, "", RT); print RT}' test.txt

The critical part here is obviously the gsub(), which should be written carefully to remove the context stuff and only leave what we want.

Standard awk

Here we don't change RS so we're using the traditional line-oriented mode:

$ cat matches2.awk
{
  line = $0
  while (match(line, /(^|#)foobar[0-9]+(#|$)/) > 0) {
    m = substr(line, RSTART, RLENGTH)
    gsub(/^#|#$/, "", m); print m
    line = substr(line, RSTART + RLENGTH)
  }
}
$ awk -f matches2.awk test.txt


Things start to get complicated with sed if we want context. However we can still do it.

Of the two sed solutions presented previously, the easiest to adapt is the second one, so here it is:

$ sed -E '
/^#?foobar[0-9]+#?\n/ {
D' test.txt

Again, the critical bit is the part where the context (that we needed to match only the "correct" parts, but no longer want) is removed. This part will be highly dependent on the actual input data and problem requirements.


Perl is again an easy winner, as we can match with context and pull out only the interesting parts in a single go:

$ perl -ne 'print "$_\n" for (/(?:^|#)(foobar\d+)(?:#|$)/g);' test.txt

The regular expressions for what comes before and after are non-capturing, so the list returned by the overall match is already made of clean strings, which we thus just need to print.

Overlap problems

You might have noticed that at the same time we introduced context to the matches, we also introduced the potential for overlap. Consider the following sample input data:


If we run for example the above GNU awk solution on this data, we get:

$ gawk -v RS='(^|#|\n)foobar[0-9]+(#|\n|$)' 'RT{gsub(/^(#|\n)|(#|\n)$/, "", RT); print RT}' test.txt

The foobar9999 is missed since the regular expression that matches foobar3 also "consumes" its surrounding context (the leading and trailing hash) and thus applying the regex with context again on what's left fails to match the second occurrence of the pattern.

However, this does not happen with all the solutions; only with some of them. The standard awk and the sed solutions still work since the previous match is deleted from the line, and the extended pattern we use to include context works if the match is at the beginning of a line without a delimiter, too. In the example, once #foobar3# has been matched and removed what's left is "^foobar9999#blah$", and the expression we're using for the match can still match it again since the pattern is at the very beginning and ^ is a possible anchor.
Of course, this happens to work because of the specific combination of input data and regular expressions that we're using; generally speaking, this doesn't have to be the case. It will depend on the actual situation.

The modern RE engine answer to safely solve the overlapping context problem is, naturally, lookaround, which turns actual consumed characters into zero-length assertions, and leaves them available for the next match attempt. This means that sed and awk are excluded, since their RE engines do not support lookaround.

What's left is GNU grep (with its -P option to match in PCRE mode, where available), and of course perl.


$ grep -Po '(?<=^|#)foobar[0-9]+(?=#|$)' test2.txt

There's also a pcregrep utility that comes with the PCRE library, with a syntax similar to that of grep. In particular, it supports the -o option, so we can also do:

$ pcregrep -o '(?<=^|#)foobar[0-9]+(?=#|$)' test2.txt

Let's try perl:

$ perl -ne 'print "$_\n" for (/(?<=^|#)(foobar\d+)(?=#|$)/g);' test2.txt
Variable length lookbehind not implemented in regex m/(?<=^|#)(foobar\d+)(?=#|$)/ at -e line 1.

It seems PCRE is more advanced than perl itself in this particular feature. As man pcrepattern informs us,

The contents of a lookbehind assertion are restricted such that all the strings it matches must have a fixed length. However, if there are several top-level alternatives, they do not all have to have the same fixed length. Thus

(?<=bullock|donkey)

is permitted, but

(?<!dogs?|cats?)

causes an error at compile time. Branches that match different length strings are permitted only at the top level of a lookbehind assertion. This is an extension compared with Perl, which requires all branches to match the same length of string. An assertion such as

(?<=ab(c|de))

is not permitted, because its single top-level branch can match two different lengths, but it is acceptable to PCRE if rewritten to use two top-level branches:

(?<=abc|abde)

So what can we do with perl? We have two possibilities.

We note that, strictly speaking, and in this particular case, only what follows the match has to be preserved for the next attempt; the lookbehind is not strictly needed, and we can replace it with a regular match. Thus:

$ perl -ne 'print "$_\n" for (/(?:^|#)(foobar\d+)(?=#|$)/g);' test2.txt

Another way to solve the problem is a bit ugly, but it works: we can just move the ^ anchor outside the lookbehind and make it part of a regular alternation; since it's a zero-length match anyway, nothing is harmed:

$ perl -ne 'print "$_\n" for (/(?:^|(?<=#))(foobar\d+)(?=#|$)/g);' test2.txt

It is important to understand that there's no generic rule here, and the solution will necessarily have to depend on the problem at hand. Depending on the actual situation, transforming a variable-length lookbehind into something accepted by perl may not always be so easy (or even possible).
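
As a cross-check of the reasoning above, here's the same experiment as a Python sketch. Python's re module has a fixed-width lookbehind restriction of its own, so it rejects (?<=^|#) as well, and the "move the anchor out of the lookbehind" trick applies unchanged. The sample line is a guess at the kind of input discussed in this section, and the helper name compiles is made up.

```python
import re

# Hypothetical sample line: two delimited matches sharing the middle '#'
line = "#foobar3#foobar9999#blah"

# Context matched with ordinary (consuming) groups: the shared '#' is used up
consuming = re.findall(r"(?:^|#)(foobar\d+)(?:#|$)", line)

# Context matched with zero-length assertions: nothing is consumed around the match
lookaround = re.findall(r"(?:^|(?<=#))(foobar\d+)(?=#|$)", line)


def compiles(pattern):
    """Tell whether the re module accepts `pattern`."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False
```

Here consuming only contains foobar3, while lookaround contains both foobar3 and foobar9999; compiles(r"(?<=^|#)foobar\d+") is False, since the two branches of the lookbehind have different widths.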

Diskless iSCSI boot with PXE HOWTO

Here we will boot a machine (diskless or not, but even if it has a disk it won't be used) entirely from the network using PXE and the iSCSI protocol.

There are a few options to boot a system whose root partition is on iSCSI:

  • The machine could have a local bootloader that loads a local kernel and initrd. With suitable options, the initrd scripts are directed to log into an iSCSI LUN and use it as /. In this case, the LUN that is used as root filesystem does not need to have a kernel or bootloader installed.
  • Same as above, but the kernel and initrd are downloaded using PXE (via TFTP or HTTP).
  • The most interesting option (and the one that will be described here) is booting the iSCSI LUN directly via PXE. In this case, the LUN looks exactly like a local disk, with partitions, MBR, bootloader (grub) etc. The MBR is read and executed, which loads the second-stage bootloader and so on, just as if the disk were local.

A peculiar thing about iSCSI is that it doesn't really like the network going away while a session is connected. For this reason it is very important that the network be stable and reliable, but there are also a few specific boot-time tweaks to do in the Linux distribution that is being run from iSCSI. One of them is, of course, supplying the needed iSCSI information to the kernel; another one is preventing the initscripts from trying to (re)configure the network on the interface that is being used for the iSCSI session, as this may cause it to go down temporarily. In this case, the network is configured early, by the initrd, and should not be touched afterwards.

For this example, we will boot a Debian Wheezy over iSCSI, using PXE to read the LUN right from the very beginning (MBR and bootloader stage). For this to work, a PXE implementation that supports booting from iSCSI is obviously needed. iPXE is one such implementation (see here for more information on how to set up a more complete PXE infrastructure); here we will assume that the booting client is sent iPXE commands.


Debian does not (yet?) support direct installation to iSCSI, so there are two ways to do this: the first way is to transfer an existing installation to the LUN (eg using dd or rsync). The second (described here) is to use debootstrap on an existing helper machine to partition, install and prepare the LUN. The specific tweaks described starting from "iSCSI boot configuration" have to be performed regardless of whether it's an existing or a new install (if it's an existing installation, remember to chroot into it before).

When commands are shown, the prompt shows where they have to be run: "helper" is the helper machine, "client" is the chroot environment (ie the future iSCSI boot client).

Log into the LUN

We assume that our LUN is provided by the SAN at (, is called and has a size of 10G. So from a (possibly Debian or Ubuntu) machine with open-iscsi installed, we can log into it:

helper# iscsiadm -m discovery -t sendtargets -p,1
helper# iscsiadm -m node -T '' -p -l
Logging in to [iface: default, target:, portal:,3260] (multiple)
Login to [iface: default, target:, portal:,3260] successful.
helper# ls -l /dev/disk/by-path
lrwxrwxrwx 1 root root  9 Nov  2 15:03 -> ../../sda

To make things more interesting (not much), we're going to use the newer GPT partitioning. For simplicity, here we'll create a 512MB swap partition and a 9.5G root partition. On BIOS systems, which are still the majority, GPT also needs a small partition at the beginning of the disk, the so-called "BIOS boot partition" (type EF02). See here, here and here for more info (all three documents are very interesting reads). So here's the disk layout:

helper# gdisk -l /dev/sda
GPT fdisk (gdisk) version 0.8.5

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 20971520 sectors, 10.0 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 67D92849-CD16-4CB1-8B3B-0758E62227CA
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 20971486
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048            8191   3.0 MiB     EF02  BIOS boot partition
   2            8192         1056767   512.0 MiB   8200  Linux swap
   3         1056768        20971486   9.5 GiB     8300  Linux filesystem
helper# mkfs.ext4 /dev/sda3
mke2fs 1.42.5 (29-Jul-2012)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
622592 inodes, 2489339 blocks
124466 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2550136832
76 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done 

helper# mkswap /dev/sda2
Setting up swapspace version 1, size = 524284 KiB
no label, UUID=e4f25981-3886-4939-a5cf-b05a0c7058a6
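
As a quick sanity check of the layout, the sizes reported by gdisk can be recomputed from the start and end sectors in the listing above. This is only a sketch: part_mib is a helper name made up for this example, and it assumes the 512-byte logical sector size shown by gdisk.

```shell
#!/bin/sh
# Size in MiB of a partition, given its first and last sector (inclusive)
# and the logical sector size in bytes (defaults to 512, as in the listing).
# part_mib is a hypothetical helper, not part of gdisk.
part_mib() {
    start=$1 end=$2 sector=${3:-512}
    echo $(( (end - start + 1) * sector / 1024 / 1024 ))
}

part_mib 2048 8191          # BIOS boot partition -> 3
part_mib 8192 1056767       # swap                -> 512
part_mib 1056768 20971486   # root                -> 9723
```

The last value is rounded down to whole MiB, which matches the 9.5 GiB that gdisk reports for the root partition.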

System installation

Let's mount the partition and install a minimal system with debootstrap:

helper# mkdir /mnt/chroot
helper# mount /dev/sda3 /mnt/chroot
helper# debootstrap wheezy /mnt/chroot
I: Retrieving Release
I: Retrieving Release.gpg
I: Checking Release signature
I: Configuring tasksel...
I: Configuring tasksel-data...
I: Base system installed successfully.

Now let's chroot into the system to finish the install:

helper# mount -t proc none /mnt/chroot/proc
helper# mount -t sysfs none /mnt/chroot/sys
helper# mount --bind /dev /mnt/chroot/dev
helper# chroot /mnt/chroot /bin/bash

Let's create /etc/mtab which is needed by many programs:

client# cp /proc/mounts /etc/mtab
client# sed -i '\|^/dev/sda3|,$!d' /etc/mtab

The sed command removes the first lines from the file, which are not relevant for the chrooted system, and keeps only lines from the one starting with /dev/sda3 to the end (replace sda3 if your partition name is different, of course).
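
To see what the sed expression does without touching a real /etc/mtab, it can be tried on a scratch file first. The sample lines below are made up for illustration; only the expression itself is the one used above, run here without -i so that the result goes to standard output.

```shell
#!/bin/sh
# Sample /proc/mounts-like content; these lines are invented for the demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
rootfs / rootfs rw 0 0
/dev/sdb1 / ext4 rw 0 0
/dev/sda3 / ext4 rw,relatime 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
EOF

# The range "first line starting with /dev/sda3" to "last line" ($) is
# selected, and "!d" deletes every line outside that range.
trimmed=$(sed '\|^/dev/sda3|,$!d' "$sample")
printf '%s\n' "$trimmed"
rm -f "$sample"
```

The first two sample lines are dropped, and the output starts at the /dev/sda3 line.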

Now let's create /etc/fstab. In this case, the best option is working with UUIDs, so let's find them:

client# blkid /dev/sda2 /dev/sda3
/dev/sda2: UUID="e4f25981-3886-4939-a5cf-b05a0c7058a6" TYPE="swap" 
/dev/sda3: UUID="6c816f51-0613-45e7-a15b-bc2d5cd00f88" TYPE="ext4"
client# echo 'UUID=6c816f51-0613-45e7-a15b-bc2d5cd00f88 / ext4 errors=remount-ro 0 1' >> /etc/fstab
client# echo 'UUID=e4f25981-3886-4939-a5cf-b05a0c7058a6 none swap sw 0 0' >> /etc/fstab
client# cat /etc/fstab
UUID=6c816f51-0613-45e7-a15b-bc2d5cd00f88 / ext4 errors=remount-ro 0 1
UUID=e4f25981-3886-4939-a5cf-b05a0c7058a6 none swap sw 0 0
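
If copying UUIDs by hand feels error-prone, the same lines can be produced by a small helper. This is only a sketch: fstab_line is a name made up for this example, and its arguments simply mirror the six fstab columns.

```shell
#!/bin/sh
# Print one fstab line for a filesystem identified by UUID.
# Arguments: uuid mountpoint fstype options dump pass
# fstab_line is a hypothetical helper, not a standard tool.
fstab_line() {
    printf 'UUID=%s %s %s %s %s %s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Using the UUIDs reported by blkid above:
fstab_line 6c816f51-0613-45e7-a15b-bc2d5cd00f88 / ext4 errors=remount-ro 0 1
fstab_line e4f25981-3886-4939-a5cf-b05a0c7058a6 none swap sw 0 0
```

Inside the chroot, the UUID argument could also be taken straight from blkid, for example "$(blkid -s UUID -o value /dev/sda3)", and the output appended to /etc/fstab.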

Here we can install any extra package that we want:

client# apt-get install vim less openssh-server locales

This is also the time to do any other needed customization (eg localization, setting hostname, repositories, etc.).

Finally, we need to install a kernel, a bootloader and the initramfs utilities that we'll use later:

client# apt-get install linux-image-amd64 grub2 initramfs-tools

When prompted, we choose to install grub to /dev/sda, just as we'd do with a local hard disk.

iSCSI boot configuration

Now it's time to finally do what it takes for the actual boot process to work. Basically, we need a special initrd that configures the network, logs into the iSCSI target LUN, mounts it as / and calls pivot_root() on it. We will provide the needed information in the form of kernel command line arguments.

The open-iscsi package includes the necessary initrd hooks to do the above, so let's install it:

client# apt-get install open-iscsi

The relevant bits are in /usr/share/initramfs-tools/scripts/local-top/iscsi, where we learn that we can pass information by setting various ISCSI_* variables. We also want early (ie, kernel-level) IP configuration, which again can be done with special arguments to the kernel. We pass all this information by modifying the grub kernel command line, so we need the following line in the client's /etc/default/grub:

GRUB_CMDLINE_LINUX=" ISCSI_TARGET_IP= ISCSI_TARGET_PORT=3260 root=UUID=6c816f51-0613-45e7-a15b-bc2d5cd00f88 ip="

Here we're using static IP configuration; use "ip=dhcp" for DHCP (here is the full story). Also, the GRUB_CMDLINE_LINUX_DEFAULT variable is normally set to "quiet", but it's probably better to remove that to be able to see what happens at boot. It can be re-added later if wanted.
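
For reference, the static form of the kernel's ip= argument is a colon-separated list of fields, in the order client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf (as described in the kernel's nfsroot documentation). The sketch below just assembles such a string; kernel_ip_arg is a made-up helper name and all the addresses are hypothetical values from the 192.0.2.0/24 documentation range, not the ones used in this setup.

```shell
#!/bin/sh
# Assemble the kernel's ip= argument from its seven fields:
# client-ip server-ip gw-ip netmask hostname device autoconf
# kernel_ip_arg is a hypothetical helper used only for illustration.
kernel_ip_arg() {
    printf 'ip=%s:%s:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# Hypothetical values: the server-ip field is left empty, and "off"
# disables autoconfiguration so that the static values are used.
kernel_ip_arg 192.0.2.10 "" 192.0.2.1 255.255.255.0 client eth0 off
# prints: ip=192.0.2.10::192.0.2.1:255.255.255.0:client:eth0:off
```

An empty field keeps its place between the colons, which is why two colons appear in a row after the client address.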

Also note that if the SAN needs authentication, more variables are needed, most likely ISCSI_USERNAME and ISCSI_PASSWORD.

Looking into /usr/share/initramfs-tools/hooks/iscsi, we learn that for the initrd update process to know that we want the iSCSI stuff included, we need to create the file /etc/iscsi/iscsi.initramfs:

client# touch /etc/iscsi/iscsi.initramfs

We also see that the file /etc/iscsi/initiatorname.iscsi gets copied into the initrd and sourced to learn the initiator name, so let's write the name into it in the expected format:

client# echo "" > /etc/iscsi/initiatorname.iscsi
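
Since the file is sourced by a shell script, its content is a plain variable assignment of the form InitiatorName=<iqn>. The sketch below shows the idea on a scratch file rather than on the real /etc/iscsi/initiatorname.iscsi; the IQN is a hypothetical one, made up for this example.

```shell
#!/bin/sh
# Hypothetical IQN, for illustration only; use your client's real one.
conf=$(mktemp)
echo "InitiatorName=iqn.2012-11.com.example:client" > "$conf"

# The file is plain shell syntax, so a script can source it and then
# read the variable it defines.
. "$conf"
echo "$InitiatorName"
rm -f "$conf"
```

The same value normally has to match the initiator name that the SAN expects for this LUN.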

Now to apply all our changes, we regenerate grub config and the initrd:

client# update-grub
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-3.2.0-4-amd64
Found initrd image: /boot/initrd.img-3.2.0-4-amd64
client# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-3.2.0-4-amd64

We also need to set a root password, otherwise we won't be able to login:

client# passwd
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully

Lastly, since eth0 is already configured early by the kernel, we don't want the Debian initscripts to try to configure it again at boot. This is achieved in a simple way by either removing any reference to eth0 in /etc/network/interfaces, or just telling Debian that the configuration is "manual":

auto eth0
    iface eth0 inet manual
# other interfaces here ...

We can finally exit the chroot environment and log out of the iSCSI LUN on the helper machine:

client# exit
helper# umount /mnt/chroot/{dev,proc,sys,}
helper# iscsiadm -m node -T '' -p -u


Let's summarize what happens when our client is booted:

  • iPXE configures the network (either via DHCP or statically)
  • iPXE logs into the iSCSI LUN, mapping it as a local disk.
  • The MBR is read, and the boot process is kickstarted, which loads the kernel and the initrd.
  • Early IP configuration is performed during the boot, and an initrd script logs into the iSCSI LUN as specified on the kernel command line (the kernel is unaware of the PXE login)
  • pivot_root() is called on the iSCSI partition specified on the command line with root=, and from there the boot process proceeds normally

So we need to configure the first three steps. Using iPXE, all we have to do is send this iPXE script to the client:

set initiator-iqn

This is the bare minimum; if your SAN needs authentication, then username and password should also be set before attempting to boot (see the iPXE docs, and SAN URIs explained).

Test it!

So if we boot our client, we should see that iPXE logs into the LUN and loads GRUB:

Registered as SAN device 0x80
Booting from SAN device 0x80
GRUB loading.
Welcome to GRUB!

and after GRUB has booted the kernel, something like this in the kernel messages:

[    2.073406] scsi2 : iSCSI Initiator over TCP/IP
[    2.335112] scsi 2:0:0:0: Direct-Access     EQLOGIC  100E-00          4.3  PQ: 0 ANSI: 5
[    2.337709] scsi 2:0:0:0: Attached scsi generic sg1 type 0
[    2.349859] sd 2:0:0:0: [sda] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[    2.351322] sd 2:0:0:0: [sda] Write Protect is off
[    2.352271] sd 2:0:0:0: [sda] Mode Sense: 77 00 00 08
[    2.353451] sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    2.368450]  sda: sda1 sda2 sda3
[    2.370812] sd 2:0:0:0: [sda] Attached SCSI disk
[    3.396538] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
[    4.810052] Adding 524284k swap on /dev/sda2.  Priority:-1 extents:1 across:524284k 
[    4.824409] EXT4-fs (sda3): re-mounted. Opts: (null)
[    4.959888] EXT4-fs (sda3): re-mounted. Opts: errors=remount-ro

At this point, we can use this machine and do all the normal administrative operations (add/remove packages, upgrades, kernel configuration, etc.) in the usual way, as if it had a local hard disk.
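
One way to double-check, on the booted client, that the parameters set in /etc/default/grub actually reached the kernel is to look them up in /proc/cmdline. This is a minimal sketch: cmdline_value is a helper name made up for this example, and it assumes that values contain no spaces.

```shell
#!/bin/sh
# Print the value of KEY=value from a kernel command line file
# (defaults to /proc/cmdline). Prints nothing if the key is absent.
# cmdline_value is a hypothetical helper, not a standard tool.
cmdline_value() {
    key=$1 file=${2:-/proc/cmdline}
    # One argument per line, then strip the "KEY=" prefix where it matches.
    tr ' ' '\n' < "$file" | sed -n "s/^${key}=//p"
}

# Usage on the booted client:
#   cmdline_value ISCSI_TARGET_PORT
#   cmdline_value root
```

With the configuration used in this article, the first call should print 3260 and the second the UUID of the root filesystem.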