
On pipes, subshells and descriptors

Everybody has run, at some point, some form of shell pipeline:

sort file.txt | grep foo

and so on. Now, how exactly shells implement that varies from shell to shell. The POSIX standard is a bit vague on the subject:

...each command of a multi-command pipeline is in a subshell environment; as an extension, however, any or all commands in a pipeline may be executed in the current environment.

which in practice means each shell may run any, all, or none of a pipeline's commands in their own subshell environments. And that is indeed what happens; there are well-known differences between shell implementations.

Usually, what matters is whether the last element of a pipeline is run in a subshell or in the current shell. A blatant example of this, and one that often bites people, is using a while loop as follows:

count=0
somecommand | while IFS= read -r line; do
  count=$((count + 1))
  ...
done

echo "Count is $count"   # this prints 0 in most shells

Here we have a pipeline, and if the shell runs the last element of the pipeline (the while loop here) in a subshell, any variable set there does not affect the parent; when "count" is printed at the end, we are back in the parent, so it will still be 0. Here's another example that does not work as expected in shells that run the last part of a pipeline in a subshell:

echo 200 | read num

Here, if "num" is printed, it will have whatever value it had before the pipeline. Of the most popular shells, bash runs the last element of a pipeline in a subshell, while ksh runs it in the current shell. Good shell programming should not depend on which behavior the shell implements. (As a matter of fact, in a second I'm going to present some code that is heavily dependent on what the shell does in these cases; it was part of a quick and dirty hack, so good practices probably don't matter too much there.)

For shells like bash that run the last part in a subshell, there are workarounds: see this page for more information.
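Two of the common workarounds, sketched here for bash (the printf commands are just stand-ins for "somecommand"): feed the loop via process substitution so it runs in the current shell, or use a here-string for the single-variable case.

```shell
#!/bin/bash
# Workaround 1: process substitution; the loop runs in the current shell,
# so the variable survives.
count=0
while IFS= read -r line; do
  count=$((count + 1))
done < <(printf 'a\nb\nc\n')
echo "Count is $count"    # prints 3, not 0

# Workaround 2: a here-string, for the single-variable "read" case.
read -r num <<< "200"
echo "$num"               # prints 200
```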

Now here's another, more subtle example that is affected by the behavior that I just described. In the CentOS network install behind a proxy article, I use a carefully crafted shell script that is run by socat, mentioning that it MUST be written that way otherwise it won't work. Here is the script again (for simplicity, using netcat instead of socat to connect to the proxy):

#!/bin/bash
# mangle.sh: invoke as "mangle.sh <mirror name> <proxy_address> <proxy_port>"

mirror=$1
proxy_addr=$2
proxy_port=$3

{ nc "$proxy_addr" "$proxy_port" && exec 1>&- ; } < <(
  gawk -v mirror="$mirror" '/^Host: /{$0="Host: " mirror "\r"}
  /^GET /{$2="http://" mirror "/" $2}
  {print; fflush("")}' )

Here is the history of this script. The very first version was as follows:

#!/bin/sh
# mangle.sh: invoke as "mangle.sh <mirror name> <proxy_address> <proxy_port>"

mirror=$1
proxy_addr=$2
proxy_port=$3

gawk -v mirror="$mirror" '/^Host: /{$0="Host: " mirror "\r"}
/^GET /{$2="http://" mirror "/" $2}
{print; fflush("")}' | nc "$proxy_addr" "$proxy_port"

(note the shell is /bin/sh, not bash)
This was not working. The client sent the request, it was correctly mangled by the script, the server received it and sent back the reply, which socat relayed back to the client; but after that, the client seemed to hang. Some investigation showed that the server was indeed closing the connection after sending all the data, but the EOF was not being propagated back to the client (which would unblock it). Further investigation revealed that nc was indeed detecting the server close, and terminating; so why did socat seem to be unaware of it?

Let's revisit how the above script was invoked by socat:

socat TCP-L:4444 EXEC:"./mangle.sh centos.cict.fr localproxy.example.com 8080"

With EXEC, socat runs the specified program connecting its standard input and output back to itself, so it can send/receive data to the program. Since this is a shell script, what is started is effectively a shell, with its standard input and output connected to socat. But the script contains a pipeline, so the shell spawns other processes (purposely vague for now) to run the various parts of the pipeline. All these processes inherit the parent's descriptors, so these processes start out with their standard inputs and outputs connected to socat. Then, gawk's stdout and nc's stdin are further redirected to the pipe connecting them. But gawk's stdin and nc's stdout remain connected to the external socat.

So when nc terminates, socat should detect EOF on that channel, right?

Wrong! Remember, there is another process that is sharing the same channel, and that is the initial shell that was spawned by socat to run the script. When a file descriptor is shared, it's not considered closed until all its open instances are closed! This means that, for socat to detect EOF on that channel, both nc and the parent shell must close it; it is not enough if only nc terminates. This is perhaps obvious, and I do know that all the descriptors must be closed; only, in this case, I was not considering the full picture.
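The rule can be demonstrated with a small sketch, independent of socat (the fifo path and the "data" payload are arbitrary): a reader on a pipe is unblocked by EOF only when every process holding the write end has closed it.

```shell
#!/bin/bash
# Sketch: a reader sees EOF only when ALL holders of the write end close it.
fifo=$(mktemp -u) && mkfifo "$fifo"
cat "$fifo" &            # reader: blocks until EOF on the fifo
exec 3>"$fifo"           # the current shell now holds the write end open
( echo data ) >&3        # a child writes through the shared fd and exits...
# ...cat has received the data, but it is NOT unblocked yet:
# the parent shell still holds fd 3 open.
exec 3>&-                # only now are ALL write ends closed...
wait                     # ...so cat sees EOF and terminates
rm -f "$fifo"
```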

Once I realized this, I immediately saw the solution (or, that's what I thought):

#!/bin/sh
# mangle.sh: invoke as "mangle.sh <mirror name> <proxy_address> <proxy_port>"

mirror=$1
proxy_addr=$2
proxy_port=$3

gawk -v mirror="$mirror" '/^Host: /{$0="Host: " mirror "\r"}
/^GET /{$2="http://" mirror "/" $2}
{print; fflush("")}' | { nc "$proxy_addr" "$proxy_port" && exec 1>&- ; }

That is, once nc terminates, close fd 1 in the shell too. But this, again, was NOT working! Why not? Because /bin/sh on that system is bash, and bash runs the last element of a pipeline in a subshell! So I was closing fd 1 in that subshell, leaving the parent unaffected.

Alright. So we need to have this part:

{ nc "$proxy_addr" "$proxy_port" && exec 1>&- ; }

run in the context of the main shell. With bash, one can use process substitution and do this:

#!/bin/bash
# mangle.sh: invoke as "mangle.sh <mirror name> <proxy_address> <proxy_port>"

mirror=$1
proxy_addr=$2
proxy_port=$3

{ nc "$proxy_addr" "$proxy_port" && exec 1>&- ; } < <(
  gawk -v mirror="$mirror" '/^Host: /{$0="Host: " mirror "\r"}
  /^GET /{$2="http://" mirror "/" $2}
  {print; fflush("")}' )

which is the final (working) version. Yes, gawk still runs in a subshell, but it doesn't matter here. What matters is that after nc terminates because it gets EOF from the proxy, file descriptor 1 is closed in the context of the current shell; once that happens, all the instances are closed and socat DOES see EOF from the script, propagates it back to the client, and everything works.

Note that, with a shell that runs the last element of a pipeline in the context of the current shell (eg ksh), the penultimate version should work too.

Thanks to Gerhard Rieger who with his hints made me realize that what I initially saw as a problem in socat was instead due to my incomplete understanding of what was going on.

Update 19/02/2012: Recent versions of Bash introduced an option that allows the last command in a pipeline to be executed in the current shell, not in a subshell: lastpipe (enabled with shopt -s lastpipe). Here's the description from the manual page: "If set, and job control is not active, the shell runs the last command of a pipeline not executed in the background in the current shell environment."
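Here's a minimal sketch of lastpipe in action (assuming bash 4.2 or later, run non-interactively so that job control is off):

```shell
#!/bin/bash
# With lastpipe set (and job control off, as in a non-interactive script),
# the while loop runs in the current shell, so "count" survives.
shopt -s lastpipe
count=0
printf 'a\nb\nc\n' | while IFS= read -r line; do
  count=$((count + 1))
done
echo "Count is $count"   # prints 3, not 0
```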

Smart ranges in awk, part two

In a past article we saw how to use flags in awk to selectively process lines included in a range, delimited by two distinct patterns marking the beginning and end of the interval of interest. But let's focus on the case, also common, where there is a single pattern, and what's wanted is to select lines between occurrences of that pattern. As usual, the examples will just print the selected lines, but any kind of processing can be done on them.

A simple example input may be

nothing interesting here
nothing interesting here
--
interesting stuff here
interesting stuff here
--
nothing interesting here
nothing interesting here
nothing interesting here
--
other interesting stuff here
--
nothing interesting here
nothing interesting here

In this example, the lines with two dashes delimit the interesting blocks. So we may want to print the contents of, say, the second interesting block, or, for that matter, the n-th interesting block. This translates into printing lines between the m-th and the (m+1)-th occurrence of the pattern (m may or may not be equal to n, depending on whether a delimiter line can only be the beginning or the end of a block, or is allowed to be both at the same time, as with two consecutive blocks). A further variation is whether to print the delimiter lines or not.

This approach can be used if the delimiter lines have to be printed:

$ awk -v m=3 'count == m; /^--$/{count++; if(count==m) print}' file.txt
--
other interesting stuff here
--

If the delimiter lines are not wanted, here's how to do it:

$ awk -v m=3 '/^--$/{count++; next} count == m' file.txt
other interesting stuff here

Between first and last

Another, somewhat unrelated, variation may be "select all lines between the first and the last occurrence of a pattern". This is tricky, because when reading the input sequentially it's not possible to know whether the pattern we're seeing will be "the last".

If it can be assumed that there will be exactly two occurrences overall, this degenerate case is simpler to manage:

awk 'f; /pattern/{f=!f; if(f)print}' file.txt

("f" is for "flag") or, if the delimiter lines should not be printed,

awk '/pattern/{f=!f; next}; f' file.txt

Both can be written differently, by shuffling around when to print and/or the position of the flag, but the idea should be clear.

But let's see now how to solve the general case of this problem, that is, when the pattern can occur an arbitrary number of times (including zero or one).

One approach is to make two passes, the first to discover where the first and last occurrence are, the second to actually print the lines in between:

awk 'NR==FNR{if(/pattern/){last=NR;if(!first)first=NR};next} FNR>=first && FNR<=last' file.txt file.txt

If the pattern never occurs, nothing is printed; if the pattern occurs only once, only that line is printed because "first" and "last" will have the same value. It can be trivially modified to exclude the delimiter lines:

awk 'NR==FNR{if(/pattern/){last=NR-1;if(!first)first=NR+1};next} FNR>=first && FNR<=last' file.txt file.txt

and this has the result that if the pattern occurs only once not even that line will be printed. This method uses a popular awk idiom, but it makes two passes over the file.

Now for a single-pass approach (which, obviously, should also handle the degenerate cases):

awk '!ok && /pattern/{ok=1} ok{p=p s $0;s=RS} /pattern/{print p;p=s=""}' file.txt

Since we can't know whether a matching line will be the last, the idea here is to accumulate lines in a buffer (an array could be used as well), and when we see a matching line (which could potentially be the last), print the buffer. The buffer includes the matching line itself, so the result is that the first and last matching delimiter lines are included in the output.

Here is a version that excludes those first and last delimiter lines (but still includes any matching line that occurs in between):

awk '!ok && /pattern/{ok=1; next} /pattern/{if(p)print p;p=s=""} ok{p=p s $0;s=RS}' file.txt
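As mentioned above, an array can be used instead of the string buffer. Here's a sketch of the delimiters-included single-pass version rewritten that way (same logic, lines stored in buf[] instead of being concatenated):

```shell
# Same single-pass idea, buffering lines in an array instead of a string;
# buf[] holds the pending lines, n counts them, and on each match the
# buffer (which includes the matching line) is printed and reset.
awk '!ok && /pattern/{ok=1}
     ok{buf[++n]=$0}
     /pattern/{for(i=1;i<=n;i++) print buf[i]; n=0}' file.txt
```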

Removing newlines in text files

This comes up so often (though for reasons I can't imagine) that it deserves its own space.

Basically, what we want to do is to remove all the newlines (optionally replacing them with something else, say spaces) from a file, so from this input:

$ cat file.txt
line1
line2
line3
line4
line5

we get this output:

line1line2line3line4line5

If newlines are replaced by spaces, which seems to be a somewhat more common requirement, then we want this:

line1 line2 line3 line4 line5

Here I'm going to assume that the output should be a regular text stream, that is, correctly terminated with a newline character. If this requirement is removed, things are easier: basically tr alone can do the job (see below). But we want a real text file as output.

So let's look at a number of ways to accomplish our goal using common shell tools. If the newlines should be replaced with something other than a space, it's easy to adapt the examples accordingly.

tr

The easiest way would seem to be with the good old tr:

$ tr '\n' ' ' < file.txt
line1 line2 line3 line4 line5 $

(To remove newlines instead of replacing them with spaces, just use

tr -d '\n' < file.txt

in this and the following commands)

But we immediately see that this has two issues: the first is that it replaces ALL the newline characters, so the output does not end with a newline, which, strictly speaking, is not correct for a text file. That can be "fixed" with this kludgy code:

$ tr '\n' ' ' < file.txt; echo
line1 line2 line3 line4 line5 
$

Apart from the ugliness, there is another issue: if you inspect the output with tools like od or hexdump (or carefully watch the output of the first example run above), you'll see that the last item has a trailing space, which is the result of the conversion of the ending newline. Maybe we don't want that extra space, so to remove it we have to add another kludge to the already kludgy code:

$ { tr '\n' ' ' < file.txt; echo; } | sed '$s/ $//'
line1 line2 line3 line4 line5
$

So tr is not good enough for our purposes.

paste

The oft-unknown paste utility can be (ab)used perfectly for the task:

$ paste -s -d ' ' file.txt
line1 line2 line3 line4 line5
$ paste -s -d '' file.txt
line1line2line3line4line5
$

sed

Note that the sed examples assume a modern sed, like GNU sed, that understands the syntax used.

To do the job with sed, we can use this code:

$ sed ':a;$!{N;s/\n/ /;ba;}' file.txt
line1 line2 line3 line4 line5
$ sed ':a;$!{N;s/\n//;ba;}' file.txt
line1line2line3line4line5
$

This works correctly because sed always adds a newline whenever it prints the pattern space. What the code does is to accumulate input lines in the pattern space, replacing (or removing) the newline inserted by the N command as each new line is read in. This is executed in a loop, which ends when the last line of input is reached. At that time, sed will print out the (non-newline-terminated) pattern space, and add a trailing newline, to give us the neat output we want.

Or if you prefer to explicitly slurp the file, you can do this:

$ sed ':a;$!{N;ba;};s/\n/ /g' file.txt
line1 line2 line3 line4 line5
$

This is not too bad, but it still slurps the whole file in memory, which may not be very efficient if the file is big.

awk

If we want to use awk, there are a couple of ways to do it. The most straightforward, which uses the string concatenation idiom, is as follows:

$ awk '{a=a s $0;s=" "}END{print a}' file.txt
line1 line2 line3 line4 line5
$ awk '{a=a $0}END{print a}' file.txt
line1line2line3line4line5
$

However, if the file is huge, the string variable a becomes as huge, because it accumulates all the lines, so it's the same issue as sed above: while this is not a major problem memory-wise nowadays, it will probably perform suboptimally (to say the least). To improve on that, we can use the same idea as above, but without storing lines in memory, instead printing them as we go using printf:

$ awk '{printf "%s%s",s,$0;s=" "}END{print""}' file.txt
line1 line2 line3 line4 line5
$ awk '{printf "%s",$0}END{print""}' file.txt
line1line2line3line4line5
$ awk -v ORS= '1;END{print RS}' file.txt  # similar to the previous one, without explicit printf
line1line2line3line4line5
$

Compare the two approaches applied to huge files, and you'll see a big difference in performance.
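A rough way to see the difference (the file name and size here are arbitrary; the actual timings depend on your system):

```shell
# Generate a large test file, then time the accumulating and the
# streaming variant; both produce the same output.
seq 1 5000000 > big.txt
time awk '{a=a s $0;s=" "}END{print a}' big.txt > /dev/null        # accumulates in memory
time awk '{printf "%s%s",s,$0;s=" "}END{print""}' big.txt > /dev/null   # streams
rm -f big.txt
```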

Here are two other awk ways:

$ awk 'NR>1{printf "%s ",p}{p=$0}END{print p}' file.txt
line1 line2 line3 line4 line5
$ awk '$1=$1' RS= FS='\n' file.txt  # slurps the whole file, assumes no empty lines in the input
line1 line2 line3 line4 line5
$

Perl

Apologies to all the real Perl programmers!

Since Perl can be used in a sed- and awk-way, all the methods described for these tools can be implemented in Perl. However, in some cases, Perl can express the same things in more compact ways:

# slurp the file, replace all the newlines except the last
$ perl -p0777e 's/\n(?!$)/ /gs' file.txt
line1 line2 line3 line4 line5
$ perl -p0777e 's/\n(?!$)//gs' file.txt
line1line2line3line4line5
$

Here a negative lookahead is used to check that the newline character is not the last character in the file. Alternatively, we can avoid slurping and just replace all the newlines only if we are not at the end of file:

$ perl -pe 's/\n/ / if ! eof' file.txt
line1 line2 line3 line4 line5
$ perl -pe 'chomp if ! eof' file.txt
line1line2line3line4line5
$

This is nice because as said it doesn't slurp the file, and relies on the eof function which awk does not provide. Another (more compact) way again exploits eof:

$ perl -ple '$\=eof()?"\n":" "' file.txt
line1 line2 line3 line4 line5
# same but more obfuscated
$ perl -ple '$\=(eof)?$/:$"' file.txt
line1 line2 line3 line4 line5
$

The special variable $\ is Perl's output record separator, ie much like ORS in awk (you can even use $ORS as a synonym).

CentOS network install behind a proxy

Warning: this is a gross hack, which may or may not work for you. It did work for me, but that doesn't really mean much. It's also very inefficient and resource-intensive. However, it's quick and dirty, and if you have no alternatives, it may be worth a try.

As most people sadly discover (the writer among them), the CentOS net install does not support installation behind an HTTP proxy.

But still, let's see if it's possible to work around that limitation.

Plan

This started out as a crazy idea, and turned out to be actually working (at least with my proxies), so hopefully it will be useful to other people.
I had been playing with socat lately, and I figured it could help for this task.

The idea is: trick the CentOS installer into thinking it's talking directly with the mirror, but instead send its requests to the local office proxy. Of course they can't just be forwarded unchanged; we need to intercept and mangle them into a format that is palatable to the proxy.

The basic technique is very simple: take requests like this, coming from the installer

GET /something HTTP/1.0      # or 1.1
Host: somehost
...rest of headers here...

turn them into

GET http://some.centos.mirror/something HTTP/1.0     # or 1.1
Host: some.centos.mirror
...rest of headers here...

and forward this to the proxy. With a little luck, requests will always have that form, and the proxy will like the modified version.
Also, obviously, forward the responses coming from the proxy back to the installer.

All that is needed is socat, bash, perl or awk, and the name of a CentOS mirror (and a pair of crossed fingers).

Implementation

We're using a host called fake.example.com where socat is installed. To make it work, we need to point the CentOS installer at this host, either by name or IP address.

The basic socat command to run is this:

$ socat TCP-L:4444 EXEC:some_mangling_code

Socat spawns the mangling code and connects itself to the code's standard input and output. So what the code needs to do is to edit the data it receives on standard input, send it to the local proxy, read the replies and print them on standard output, where socat will read them and forward them back to the installer.

The mangling code to achieve the transformation described above is quite straightforward. The text editing part can be easily implemented in sed, awk or perl, and another instance of socat can be used to connect to the proxy (netcat could be used as well). Here is an example bash script to implement it:

#!/bin/bash
# mangle.sh: invoke as "mangle.sh <mirror name> <proxy_address> <proxy_port>"

mirror=$1
proxy_addr=$2
proxy_port=$3

{ socat - TCP:"$proxy_addr":"$proxy_port" && exec 1>&- ; } < <(
  gawk -v mirror="$mirror" '/^Host: /{$0="Host: " mirror "\r"}
  /^GET /{$2="http://" mirror "/" $2}
  {print; fflush("")}' )

If you have netcat installed, you may use that rather than socat, as it's probably a somewhat lighter process to spawn:

#!/bin/bash
# mangle.sh: invoke as "mangle.sh <mirror name> <proxy_address> <proxy_port>"

mirror=$1
proxy_addr=$2
proxy_port=$3

{ nc "$proxy_addr" "$proxy_port" && exec 1>&- ; } < <(
  gawk -v mirror="$mirror" '/^Host: /{$0="Host: " mirror "\r"}
  /^GET /{$2="http://" mirror "/" $2}
  {print; fflush("")}' )

In any case, the code MUST be written that way, and not some other simpler or more obvious way, due to some subtleties related to the way pipelines and file descriptors are handled by the shell. This is interesting enough that it deserves an article of its own; I won't go into the details here. But the above code (with process substitution and explicit closing of descriptor 1) did work for me, while other variations did not. Also, the buffer flushing code is vital to quickly send data to the proxy; otherwise awk would buffer its output, seeing that its stdout is not connected to a terminal.

Lines should be terminated by CR+LF as dictated by the standard. What the awk code does is: if the line starts with "Host: ", replace the whole line with a brand new Host: header with the name of the chosen mirror; if the line starts with "GET ", prepend "http://<mirror name>/" to the path that is specified. All other lines are printed verbatim.

I'm using the CentOS mirror "centos.cict.fr" here, but it's only an example and any mirror can be used, as long as the appropriate CentOS directory for the mirror is used in the installer text box. Here is the list of mirrors from the official page: North America, Europe, other regions. Pick one that is close to you, and note the CentOS directory to use for it.

So in the end here's the complete command to start the socat redirector/mangler:

socat TCP-L:4444,reuseaddr,fork EXEC:"./mangle.sh centos.cict.fr localproxy.example.com 8080"

Since multiple TCP requests are likely to come from the installer, the fork option to socat spawns a child instance to manage each request, while keeping the main instance listening for future requests. Note that the above mechanism will spawn a ridiculous number of processes, so keep this in mind.

Test it

I was surprised that this worked at all. Here are some screenshots:

As said, the "Web site name" should point to the redirector, but the "CentOS directory" should match the real CentOS directory on the real mirror that was specified as argument to mangle.sh.

The installer seems to like it:

In fact, from here the installation can be completed without problems, at least in my tests.

And finally:

We were able to fool the installer.

Socat 2

Socat 2 is still in beta, but according to the documentation, its new address chains feature should make the task described in this article easier.

There is an example at this page demonstrating unidirectional EXEC addresses:

socat - "exec1:gzip % exec1:gunzip | tcp:remotehost:port"

This can be used for our task, but it needs some modification:

  • Of course we want to edit the stream, not compress it. In the "left to right" direction, we'll apply some mangling code (similar to the one we used with socat 1, but without the socat/netcat part); in the "right to left" direction (replies from the proxy), we can use the special NOP address to pass everything unchanged, as we're not mangling the replies;
  • We want a listening "server" rather than stdin/stdout;
  • Multiple distinct TCP streams are likely to come from the installer, so we need to fork a child to service each of them.

So here's a socat 2 version of the CentOS installer proxifier:

socat TCP-L:4444,reuseaddr,fork "EXEC1:./mangle.awk centos.cict.fr % NOP | TCP:localproxy.example.com:8080"

And here's a sample mangle.awk:

#!/usr/bin/gawk -f
# mangle.awk: invoke as "mangle.awk <mirror name>"

BEGIN{mirror = ARGV[1]; ARGC--}
/^Host: /{$0 = "Host: " mirror "\r"}
/^GET /{$2 = "http://" mirror "/" $2}
{print; fflush("")}

or if you prefer Perl:

#!/usr/bin/perl -p 
# mangle.pl: invoke as "mangle.pl <mirror name>"

BEGIN{use IO::Handle; autoflush STDOUT; $mirror = $ARGV[0]; shift}
s%^Host: .*%Host: ${mirror}\r%; s%^GET (.*)%GET http://${mirror}/$1%;

From a few tests, the socat 2 version seems to run fine most of the time, but there are some occasional hiccups where the installer reports an error and it should be told to retry the operation, after which it usually succeeds (not investigated).

Conclusions

Please note that a huge, HUGE number of processes (thousands) are spawned on the redirector, especially in the last stage of the install, where individual packages are downloaded. This is very resource-intensive for the machine where the redirector runs, so be considerate when choosing where to run it. Perhaps using netcat instead of another instance of the (heavier) socat in the shell script could be marginally better, but it doesn't change the number of processes that are spawned.

Also, it makes a number of assumptions about the requests and replies exchanged by client and server, and it will most likely break if something deviates from those assumptions (eg, HTTP redirects just to name one).

Nevertheless, despite being such a crude hack, it seems to work surprisingly well, at least in the environment where it needed to run. There even was no need to mangle the replies from the proxy. As usual, YMMV.

Anyway, let's hope that future releases will have native proxy support!

Access partitions in non-disk block devices with kpartx

Ever wondered why for normal disk devices (eg /dev/sda), device files for the contained partitions are usually available (eg /dev/sda1 etc.), while for other non-disk devices (eg disk images, LVM or software RAID volumes) there are no such device files? How can one access those partitions?

A typical scenario is an LVM logical volume that is used as virtual disk by a guest VM, and the guest OS creates partitions on it. On the host, you just see, say, /dev/vg0/guestdisk, yet it does contain partitions:

# sfdisk -l /dev/mapper/vg0-guestdisk 

Disk /dev/mapper/vg0-guestdisk: 4568 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/mapper/vg0-guestdisk1   *      0+   4376    4377-  35158221   83  Linux
/dev/mapper/vg0-guestdisk2       4377    4567     191    1534207+  82  Linux swap / Solaris
/dev/mapper/vg0-guestdisk3          0       -       0          0    0  Empty
/dev/mapper/vg0-guestdisk4          0       -       0          0    0  Empty

But those mysterious devices /dev/mapper/vg0-guestdisk1 etc. are nowhere.

The same can happen for plain disk images:

# sfdisk -l guest.img
Disk guest.img: cannot get geometry

Disk guest.img: 1305 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
guest.img1   *      0+    497     498-   4000153+  83  Linux
guest.img2        498    1119     622    4996215   83  Linux
guest.img3       1120    1304     185    1486012+  82  Linux swap / Solaris
guest.img4          0       -       0          0    0  Empty

and also for some software (md) RAID devices.

Anyway, in all these cases, it sometimes happens that one needs to do "something" with the inner partitions (eg, mount them, or recreate or resize a file system, etc.). That obviously needs a device node to use, to avoid losing sanity. Here's where the neat utility kpartx saves the day.

Basically, what kpartx does is scan a device or file, apply some magic to detect the partition table in it, and create devices corresponding to those partitions. Since it uses the device mapper, the devices it creates go under /dev/mapper, which may be somewhat confusing, because that is also where other devices created with the device mapper (LVM volumes, SAN multipath devices) live, and those are exactly the kind of devices kpartx may be run against.

Depending on the distribution, kpartx comes either as part of multipath-tools, or packaged separately.

Some examples

So let's take a partitioned LVM logical volume, the one shown in the previous example:

# kpartx -l /dev/mapper/vg0-guestdisk
vg0-guestdisk1 : 0 70316442 /dev/mapper/vg0-guestdisk 63
vg0-guestdisk2 : 0 3068415 /dev/mapper/vg0-guestdisk 70316505

With -l, kpartx only displays what it found and the devices it would create, but doesn't actually create them. To create them, use -a:

# kpartx -a /dev/mapper/vg0-guestdisk

Nothing seems to happen, but let's have a look under /dev/mapper:

# ls -l /dev/mapper/vg0-guestdisk*
brw-rw---- 1 root disk 251, 0 2010-09-24 18:57 /dev/mapper/vg0-guestdisk
brw-rw---- 1 root disk 251, 3 2010-09-24 18:54 /dev/mapper/vg0-guestdisk1
brw-rw---- 1 root disk 251, 4 2010-09-24 18:54 /dev/mapper/vg0-guestdisk2

And now we can access them just fine:

# mount /dev/mapper/vg0-guestdisk1 /mnt
# ls /mnt
bin  boot  cdrom  dev  etc  home  initrd  initrd.img  initrd.img.old  lib  lost+found  media  mnt  opt  proc  root  sbin  srv  sys  tmp  usr  var  vmlinuz  vmlinuz.old

But what happened? Let's have a look. After all, the new devices are just device maps (yes, on top of the main logical volume, which is itself a device map):

# dmsetup table /dev/mapper/vg0-guestdisk1
0 70316442 linear 251:0 63
# dmsetup table /dev/mapper/vg0-guestdisk2
0 3068415 linear 251:0 70316505

What the above fields mean is as follows (values for the first of the two maps):

  • 0: starting block of the map
  • 70316442: number of blocks in the map (in this case this is the total number of blocks in the "device")
  • linear: mapping mode. Linear just means that: blocks are mapped sequentially from the source to this map
  • 251:0: mapped device; here it's the main logical volume vg0-guestdisk, as could be seen from the previous ls output
  • 63: starting block on the mapped device; this means that block 0 in the vg0-guestdisk1 map corresponds to block 63 in the vg0-guestdisk logical volume, block 1 here corresponds to block 64 there, etc.

A block here is 512 bytes, which means that 70316442 blocks are 36002018304 bytes, or about 33GiB or 36GB, depending on whether you like binary or decimal units (in case anybody cares at all, that is).

As a small aside just for completeness, I said that the "partitioned" device (/dev/mapper/vg0-guestdisk) is itself a device map, so here it is:

# dmsetup table /dev/mapper/vg0-guestdisk
0 73400320 linear 104:3 384

Which shows that this logical volume is a linear map (LVM also allows for striped maps) built on top of the device with major 104 and minor 3, which on this system is nothing else than /dev/cciss/c0d0p3, a partition in an HP hardware RAID volume, which was previously turned into an LVM physical volume and added to the volume group vg0.
For an excellent introduction to the device mapper, which is what LVM, multipath devices and some disk encryption technologies are built upon, I suggest this Linux Gazette article which is quite enlightening.

Disk images

For disk images, kpartx can still be used, but since they are not real block devices, a block device needs to be associated with the file first. This sounds like a job for loop devices, and indeed kpartx is smart enough to set up a loop device automatically if it sees that what it's being asked to work on is not a real block device:

# losetup -a    # no loop devices in use now
# kpartx -a guest.img
# losetup -a
/dev/loop0: [6801]:131312 (guest.img)
# ls -l /dev/mapper/loop0*
brw-rw---- 1 root disk 251, 5 2010-09-24 23:22 /dev/mapper/loop0p1
brw-rw---- 1 root disk 251, 7 2010-09-24 23:22 /dev/mapper/loop0p2

Needless to say, /dev/mapper/loop0p1 and /dev/mapper/loop0p2 are maps that reference /dev/loop0 (which in turn is associated with our image file).

Conclusion

When the devices created by kpartx are no longer needed, the maps can be removed (either manually using dmsetup remove, or with kpartx -d). The devices should also be removed before the partitioning is changed (with fdisk, etc.) because it seems that otherwise kpartx sometimes is not able to delete the old maps, giving errors like "ioctl: LOOP_CLR_FD: Device or resource busy" when trying to delete or update the old maps. So to be safe, it's better to run kpartx -d, change the partitions, then again kpartx -a. If old maps lie around and are accidentally used, disaster is likely as they will be referencing the start and end of partitions that no longer exist, resulting in mapping now-unrelated parts of the device.
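So the safe sequence described above looks like this (a sketch, using the device from the earlier examples; requires root):

```shell
# Safe repartitioning workflow for a kpartx-managed device:
kpartx -d /dev/mapper/vg0-guestdisk   # 1. remove the old partition maps
fdisk /dev/mapper/vg0-guestdisk       # 2. change the partitioning
kpartx -a /dev/mapper/vg0-guestdisk   # 3. recreate maps for the new layout
```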

kpartx makes working with embedded partitions much easier, a scenario especially common in virtualization.

kpartx can handle different types of partition tables besides the classical DOS format, including BSD, Solaris, Sun and GPT (not tried, but it would seem so by looking at the source).

Finally, kpartx can be used manually on the command line, but it can also be integrated in udev rules to run automatically when the main device is created, so the corresponding devices for the partitions are created too. For example, many distributions run kpartx in a udev rule when a multipath device (eg /dev/mapper/mpath1 etc.) is created, so its partitions will show up as well as the main device.