Ever wondered why, for normal disk devices (e.g. /dev/sda), device files for the contained partitions are usually available (e.g. /dev/sda1 etc.), while for other non-disk devices (e.g. disk images, LVM or software RAID volumes) there are no such device files? And how can such partitions be accessed?
A typical scenario is an LVM logical volume that is used as a virtual disk by a guest VM, and the guest OS creates partitions on it. On the host, you just see, say,
# sfdisk -l /dev/mapper/vg0-guestdisk
Disk /dev/mapper/vg0-guestdisk: 4568 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
                    Device Boot Start   End  #cyls   #blocks  Id  System
/dev/mapper/vg0-guestdisk1   *     0+  4376  4377-  35158221  83  Linux
/dev/mapper/vg0-guestdisk2      4377   4567   191   1534207+  82  Linux swap / Solaris
/dev/mapper/vg0-guestdisk3         0      -     0          0   0  Empty
/dev/mapper/vg0-guestdisk4         0      -     0          0   0  Empty
But those mysterious devices
The same can happen for plain disk images:
# sfdisk -l guest.img
Disk guest.img: cannot get geometry
Disk guest.img: 1305 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
    Device Boot Start   End  #cyls   #blocks  Id  System
guest.img1   *     0+   497   498-  4000153+  83  Linux
guest.img2       498   1119   622   4996215   83  Linux
guest.img3      1120   1304   185   1486012+  82  Linux swap / Solaris
guest.img4         0      -     0         0    0  Empty
and also for some software (md) RAID devices.
Anyway, in all these cases, one sometimes needs to do "something" with the inner partitions (e.g. mount them, or recreate or resize a file system). Doing that without losing one's sanity requires a device node for each partition. Here's where the neat utility kpartx saves the day.
Basically, what kpartx does is to scan a device or file and apply some magic to detect the partition table in it, and create devices corresponding to those partitions. Since it uses the device mapper, the devices it creates go under
Depending on the distribution, kpartx comes either as part of multipath-tools, or packaged separately.
Some examples
So let's take a partitioned LVM logical volume, the one shown in the previous example:
# kpartx -l /dev/mapper/vg0-guestdisk
vg0-guestdisk1 : 0 70316442 /dev/mapper/vg0-guestdisk 63
vg0-guestdisk2 : 0 3068415 /dev/mapper/vg0-guestdisk 70316505
With -l, kpartx only displays what it found and the devices it would create, but doesn't actually create them. To create them, use -a:
# kpartx -a /dev/mapper/vg0-guestdisk
Nothing seems to happen, but let's have a look under /dev/mapper:
# ls -l /dev/mapper/vg0-guestdisk*
brw-rw---- 1 root disk 251, 0 2010-09-24 18:57 /dev/mapper/vg0-guestdisk
brw-rw---- 1 root disk 251, 3 2010-09-24 18:54 /dev/mapper/vg0-guestdisk1
brw-rw---- 1 root disk 251, 4 2010-09-24 18:54 /dev/mapper/vg0-guestdisk2
And now we can access them just fine:
# mount /dev/mapper/vg0-guestdisk1 /mnt
# ls /mnt
bin boot cdrom dev etc home initrd initrd.img initrd.img.old lib lost+found media mnt opt proc root sbin srv sys tmp usr var vmlinuz vmlinuz.old
But what happened? Let's have a look. After all, the new devices are just device maps (yes, on top of the main logical volume, which is itself a device map):
# dmsetup table /dev/mapper/vg0-guestdisk1
0 70316442 linear 251:0 63
# dmsetup table /dev/mapper/vg0-guestdisk2
0 3068415 linear 251:0 70316505
What the above fields mean is as follows (values for the first of the two maps):
- 0: starting block of the map
- 70316442: number of blocks in the map (in this case this is the total number of blocks in the "device")
- linear: mapping mode. Linear just means that: blocks are mapped sequentially from the source to this map
- 251:0: mapped device (major:minor); here it's the main logical volume vg0-guestdisk, as can be seen from the previous ls output
- 63: starting block on the mapped device; this means that block 0 in the vg0-guestdisk1 map corresponds to block 63 in the vg0-guestdisk logical volume, block 1 here corresponds to block 64 there, etc.
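The offset arithmetic described in the last bullet can be sketched as a small shell function. This is only an illustration: map_sector is a hypothetical helper, not part of kpartx or dmsetup, and it assumes a single-line linear table in the format shown above.

```shell
#!/bin/sh
# map_sector: hypothetical helper (not part of dmsetup or kpartx).
# Given one line of a linear table, as printed by "dmsetup table",
# and a sector number inside the map, print the corresponding sector
# on the underlying device.
map_sector() {
    table=$1
    sector=$2
    # Fields: <start> <length> <mode> <major:minor> <offset>
    set -- $table
    start=$1 length=$2 mode=$3 offset=$5
    if [ "$mode" != "linear" ]; then
        echo "only linear maps are handled" >&2
        return 1
    fi
    if [ "$sector" -lt "$start" ] || [ "$sector" -ge $((start + length)) ]; then
        echo "sector $sector is outside the map" >&2
        return 1
    fi
    echo $((sector - start + offset))
}

# Sectors 0 and 1 of vg0-guestdisk1, using the table shown above
map_sector "0 70316442 linear 251:0 63" 0
map_sector "0 70316442 linear 251:0 63" 1
```

For the vg0-guestdisk1 table this prints 63 and 64, matching the description in the list.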
A block here is 512 bytes, which means that 70316442 blocks are 36002018304 bytes, or about 33GiB or 36GB, depending on whether you like binary or decimal units (in case anybody cares at all, that is).
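For anyone who wants to check the numbers, the conversion can be done with plain shell arithmetic. This is a minimal sketch: sectors_to_bytes is a made-up helper name, and it assumes a shell with 64-bit integer arithmetic, which is the norm on current systems.

```shell
#!/bin/sh
# sectors_to_bytes: hypothetical helper; assumes the 512-byte blocks
# (sectors) that the device mapper uses in its tables.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

bytes=$(sectors_to_bytes 70316442)
echo "bytes: $bytes"
# Integer division, so both results are rounded down
echo "GiB (2^30 bytes): $(( bytes / 1073741824 ))"
echo "GB (10^9 bytes): $(( bytes / 1000000000 ))"
```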
As a small aside just for completeness, I said that the "partitioned" device (/dev/mapper/vg0-guestdisk) is itself a device map, so here it is:
# dmsetup table /dev/mapper/vg0-guestdisk
0 73400320 linear 104:3 384
Which shows that this logical volume is a linear map (LVM also allows for striped maps) built on top of the device with major 104 and minor 3, which on this system is none other than /dev/cciss/c0d0p3, a partition in an HP hardware RAID volume, which was previously turned into an LVM physical volume and added to the volume group vg0.
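One way to turn a major:minor pair such as 104:3 into a device name is to look it up in sysfs. The following is a sketch under assumptions: devname_from_majmin and the SYSFS_ROOT variable are hypothetical names introduced for illustration, and it relies on the kernel exposing /sys/dev/block/<major>:<minor> symlinks, which reasonably recent Linux kernels do.

```shell
#!/bin/sh
# devname_from_majmin: hypothetical helper. It assumes a Linux kernel
# that exposes /sys/dev/block/<major>:<minor> symlinks; SYSFS_ROOT is
# an illustrative override for a non-standard sysfs mount point.
devname_from_majmin() {
    link="${SYSFS_ROOT:-/sys}/dev/block/$1"
    if [ ! -L "$link" ]; then
        echo "no block device with numbers $1" >&2
        return 1
    fi
    # The link target ends in the kernel name of the device
    basename "$(readlink "$link")"
}

# Example (the output depends on the system):
#   devname_from_majmin 104:3
```

The name printed is the kernel's own name for the device, which may differ slightly from the path under /dev (device-mapper devices, for example, show up as dm-N).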
For an excellent introduction to the device mapper, which is what LVM, multipath devices and some disk encryption technologies are built upon, I suggest this Linux Gazette article which is quite enlightening.
Disk images
For disk images, kpartx can still be used, but since they are not real block devices, a block device needs to be associated with the file first. This sounds like a job for loopback devices, and indeed kpartx is smart enough to set up a loopback device automatically if it sees that what it's being asked to use is not a real block device:
# losetup -a    # no loop devices in use now
# kpartx -a guest.img
# losetup -a
/dev/loop0: [6801]:131312 (guest.img)
# ls -l /dev/mapper/loop0*
brw-rw---- 1 root disk 251, 5 2010-09-24 23:22 /dev/mapper/loop0p1
brw-rw---- 1 root disk 251, 7 2010-09-24 23:22 /dev/mapper/loop0p2
No need to add that
Conclusion
When the devices created by kpartx are no longer needed, the maps can be removed (either manually using
kpartx makes working with embedded partitions much easier, a scenario especially common in virtualization.
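To tie the steps together, here is a rough sketch of the whole add/mount/remove cycle for a disk image. The run and with_partition functions and the DRY_RUN variable are made up for this illustration; by default the script only prints the commands, since the real ones need root and an existing image. It assumes that kpartx -d on the image removes the maps created by kpartx -a.

```shell
#!/bin/sh
# Hypothetical wrapper around the commands used in this article.
# With DRY_RUN=1 (the default here) it only prints what it would run.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# with_partition IMAGE MAPPED_DEVICE MOUNTPOINT
# MAPPED_DEVICE is passed explicitly (e.g. /dev/mapper/loop0p1) because
# its name depends on which loop device kpartx picks.
with_partition() {
    image=$1 mapped=$2 mountpoint=$3
    run kpartx -a "$image" || return 1       # create the partition maps
    if ! run mount "$mapped" "$mountpoint"; then
        run kpartx -d "$image"               # clean up if the mount fails
        return 1
    fi
    # ... work on the files under $mountpoint here ...
    run umount "$mountpoint"
    run kpartx -d "$image"                   # remove the maps again
}

with_partition guest.img /dev/mapper/loop0p1 /mnt
```

Running it with DRY_RUN=0 as root would execute the commands for real, so double-check the arguments first.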
kpartx can handle different types of partition tables besides the classical DOS format, including BSD, Solaris, Sun and GPT (not tried, but it would seem so by looking at the source).
Finally, kpartx can be used manually on the command line, but it can also be integrated in udev rules to run automatically when the main device is created, so the corresponding devices for the partitions are created too. For example, many distributions run kpartx in a udev rule when a multipath device (eg
OK, but do you know of any way to map the blocks of an LVM LV-hosted filesystem back to the underlying physical disk (presuming a sequential, contiguous LV hosting the filesystem)? Basically, I'm trying to find a Linux LVM2 equivalent to doing a vxunroot.
Sorry I'm not aware of such a tool. IIUC what it does, assuming a simple contiguous LV entirely contained in a single PV, it might be possible to do something by moving each LV block back by N blocks (where N is the size of the LVM metadata in the PV), but this is just wild speculation, I may be misunderstanding (and even if not, it may not be possible to do what I suggest or it may be wrong).
You might want to look into libguestfs. It doesn't need root, and doesn't open up the security holes of using mount on untrusted guest images.
Thanks. Libguestfs is indeed one of those things that have been on my TODO list for quite a while; I hope I'll be able to give it a go soon.
No problem. We're keen to get people to try out the Debian and Ubuntu version too, although I guess from reading your blog you are using CentOS (which will ship with it in CentOS 6).
In fact I use many different distros; the article about CentOS was only because it came up at a time when I had to install CentOS (and actually, CentOS isn't exactly my favorite, but I digress). It would be nice to see a Gentoo ebuild of libguestfs, but I see from the related bug that the road may be a bit longer (unless one is willing to experiment a bit).
But I won't mind trying it out under Debian, Ubuntu or Fedora/RHEL (which I suppose have the best support, seeing where you work) when I have time.