Here we will boot a machine (diskless or not, but even if it has a disk it won't be used) entirely from the network using PXE and the iSCSI protocol.
There are a few options to boot a system whose root partition is on iSCSI:
- The machine could have a local bootloader that loads a local kernel and initrd. With suitable options, the initrd scripts are directed to log into an iSCSI LUN and use it as /. In this case, the LUN that is used as root filesystem does not need to have a kernel or bootloader installed.
- Same as above, but the kernel and initrd are downloaded using PXE (via TFTP or HTTP).
- The most interesting option (and the one described here) is booting the iSCSI LUN directly via PXE. In this case, the LUN looks exactly like a local disk, with partitions, MBR, bootloader (grub) etc. The MBR is read and executed, which loads the second-stage bootloader and so on, just as if the disk were local.
A peculiar thing about iSCSI is that it does not cope well with the network going away while a session is connected. When the root filesystem lives on the LUN, even a brief outage can stall the whole system. For this reason it is very important that the network be stable and reliable, but there are also a few boot-time tweaks needed in the Linux distribution that runs from iSCSI. One of them is, of course, supplying the needed iSCSI information to the kernel. Another is preventing the initscripts from trying to (re)configure the network on the interface used for the iSCSI session, as this may cause it to go down temporarily. In this setup, the network is configured early by the initrd and should not be touched afterwards.
For this example, we will boot Debian Wheezy over iSCSI, using PXE to read the LUN right from the very beginning (MBR and bootloader stage). For this to work, a PXE implementation that supports booting from iSCSI is obviously needed. iPXE is one such implementation (see here for more information on how to set up a more complete PXE infrastructure). Here we will assume that the booting client is sent iPXE commands.
Installation
Debian does not (yet?) support direct installation to iSCSI, so there are two ways to do this. The first is to transfer an existing installation to the LUN (e.g. using dd or rsync). The second (described here) is to use an existing helper machine to partition the LUN, install the system with debootstrap and prepare it. The specific tweaks described starting from "iSCSI boot configuration" have to be performed regardless of whether it's an existing or a new install (if it's an existing installation, remember to chroot into it first).
When commands are shown, the prompt indicates where they have to be run: "helper" is the helper machine, "client" is the chroot environment (i.e. the future iSCSI boot client).
Log into the LUN
We assume that our LUN is provided by the SAN at 10.10.10.10 (san.example.com), is called iqn.2007-08.com.example.san:rootp and has a size of 10G. So from a (possibly Debian or Ubuntu) machine with open-iscsi installed, we can log into it:
helper# iscsiadm -m discovery -t sendtargets -p 10.10.10.10
10.10.10.10:3260,1 iqn.2007-08.com.example.san:rootp
helper# iscsiadm -m node -T 'iqn.2007-08.com.example.san:rootp' -p 10.10.10.10 -l
Logging in to [iface: default, target: iqn.2007-08.com.example.san:rootp, portal: 10.10.10.10,3260] (multiple)
Login to [iface: default, target: iqn.2007-08.com.example.san:rootp, portal: 10.10.10.10,3260] successful.
helper# ls -l /dev/disk/by-path
...
lrwxrwxrwx 1 root root 9 Nov 2 15:03 ip-10.10.10.10:3260-iscsi-iqn.2007-08.com.example.san:rootp-lun-0 -> ../../sda
...
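In this example the LUN shows up as sda because the helper has no other disks. On a helper with a local disk it will get a different name, and using the wrong one in the following steps would be destructive. Here is a minimal sketch, assuming POSIX sh and the udev by-path naming shown above (the function name lun_device is hypothetical), that resolves the device from the stable by-path link instead of guessing:

```shell
# Hypothetical helper: print the block device behind an iSCSI LUN's
# /dev/disk/by-path symlink (created by udev once the session is logged in).
# $1: portal as IP:port, $2: target IQN, $3: LUN number,
# $4: by-path directory (optional, defaults to /dev/disk/by-path).
lun_device() {
    link="${4:-/dev/disk/by-path}/ip-$1-iscsi-$2-lun-$3"
    if [ ! -L "$link" ]; then
        echo "lun_device: $link not found (session not logged in?)" >&2
        return 1
    fi
    # readlink -f resolves the relative ../../sdX target to e.g. /dev/sda
    readlink -f "$link"
}
```

For example, DISK=$(lun_device 10.10.10.10:3260 iqn.2007-08.com.example.san:rootp 0) sets DISK to /dev/sda on this helper. Substitute it (and its partition names) for sda in the commands that follow.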
Partitioning
To make things more interesting (not much), we're going to use the newer GPT partitioning. For simplicity, here we'll create a 512MB swap partition and a 9.5G root partition. On BIOS systems, which are still the majority, booting GRUB from a GPT disk also requires a small partition at the beginning of the disk, the so-called "BIOS boot partition" (type EF02), where GRUB stores its core image. See here, here and here for more info (all three documents are very interesting reads). So here's the disk layout:
helper# gdisk -l /dev/sda
GPT fdisk (gdisk) version 0.8.5

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 20971520 sectors, 10.0 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 67D92849-CD16-4CB1-8B3B-0758E62227CA
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 20971486
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048            8191   3.0 MiB     EF02  BIOS boot partition
   2            8192         1056767   512.0 MiB   8200  Linux swap
   3         1056768        20971486   9.5 GiB     8300  Linux filesystem
helper# mkfs.ext4 /dev/sda3
mke2fs 1.42.5 (29-Jul-2012)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
622592 inodes, 2489339 blocks
124466 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2550136832
76 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

helper# mkswap /dev/sda2
Setting up swapspace version 1, size = 524284 KiB
no label, UUID=e4f25981-3886-4939-a5cf-b05a0c7058a6
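The listing above only shows the resulting layout. One non-interactive way to create it is sgdisk, the scriptable counterpart of gdisk from the same package. The wrapper below is a sketch (make_layout and DRY_RUN are hypothetical names). The sector numbers are taken from the listing, and the root partition simply extends to the end of the disk. It wipes the partition table of the disk it is given, so double-check the device name first.

```shell
# Hypothetical wrapper around sgdisk (shipped in the gdisk package) that
# recreates the layout listed above: 3 MiB BIOS boot partition (EF02),
# 512 MiB swap (8200), rest of the disk for the root filesystem (8300).
# DESTRUCTIVE: -o clears any existing partition table on the target disk.
# With DRY_RUN=1 the command is only printed, not executed.
make_layout() {
    disk=${1:?usage: make_layout DISK}
    # An end sector of 0 selects sgdisk's default, here the last usable
    # sector of the disk (20971486 on the 10 GiB LUN above).
    set -- sgdisk -o \
        -n 1:2048:8191    -t 1:EF02 -c 1:"BIOS boot partition" \
        -n 2:8192:1056767 -t 2:8200 -c 2:"Linux swap" \
        -n 3:1056768:0    -t 3:8300 -c 3:"Linux filesystem" \
        "$disk"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Running DRY_RUN=1 make_layout /dev/sda first prints the command for review. Afterwards, sgdisk -p /dev/sda (or gdisk -l, as above) shows the result.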
System installation
Let's mount the partition and install a minimal system with debootstrap:
helper# mkdir /mnt/chroot
helper# mount /dev/sda3 /mnt/chroot
helper# debootstrap wheezy /mnt/chroot
I: Retrieving Release
I: Retrieving Release.gpg
I: Checking Release signature
...
I: Configuring tasksel...
I: Configuring tasksel-data...
I: Base system installed successfully.
Now let's chroot into the system to finish the install:
helper# mount -t proc none /mnt/chroot/proc
helper# mount -t sysfs none /mnt/chroot/sys
helper# mount --bind /dev /mnt/chroot/dev
helper# chroot /mnt/chroot /bin/bash
client#
Let's create /etc/mtab which is needed by many programs:
client# cp /proc/mounts /etc/mtab
client# sed -i '\|^/dev/sda3|,$!d' /etc/mtab
The sed command deletes the first lines of the file (the helper's own mounts, which are not relevant to the chrooted system) and keeps only the lines from the one starting with /dev/sda3 to the end (replace sda3 if your partition name is different, of course).
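The same filtering can be wrapped in a small function that takes the device name as a parameter. This is only a sketch (filter_mtab is a hypothetical name). The one change from the command above is the space after the device name, so that /dev/sda3 does not also match a partition such as /dev/sda30.

```shell
# Hypothetical helper equivalent to the sed command above, with the device
# as a parameter: read a mounts table on stdin and print it starting at the
# first line for device $1, dropping everything before that line.
# The space after the device name keeps /dev/sda3 from matching /dev/sda30.
filter_mtab() {
    sed '\|^'"$1"' |,$!d'
}
# Usage in the chroot:
#   filter_mtab /dev/sda3 < /proc/mounts > /etc/mtab
```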
Now let's create /etc/fstab. Here it's best to refer to filesystems by UUID, since device names such as /dev/sda3 may differ between the helper and the booted client. So let's find the UUIDs:
client# blkid /dev/sda2 /dev/sda3
/dev/sda2: UUID="e4f25981-3886-4939-a5cf-b05a0c7058a6" TYPE="swap"
/dev/sda3: UUID="6c816f51-0613-45e7-a15b-bc2d5cd00f88" TYPE="ext4"
client# echo 'UUID=6c816f51-0613-45e7-a15b-bc2d5cd00f88 / ext4 errors=remount-ro 0 1' >> /etc/fstab
client# echo 'UUID=e4f25981-3886-4939-a5cf-b05a0c7058a6 none swap sw 0 0' >> /etc/fstab
client# cat /etc/fstab
# UNCONFIGURED FSTAB FOR BASE SYSTEM
UUID=6c816f51-0613-45e7-a15b-bc2d5cd00f88 / ext4 errors=remount-ro 0 1
UUID=e4f25981-3886-4939-a5cf-b05a0c7058a6 none swap sw 0 0
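The UUID lookup and the echo commands can also be combined so that no UUID has to be copied by hand. Here is a sketch, assuming the same partitions as above (fstab_entries is a hypothetical name). Running blkid -o value -s UUID prints just the UUID of a device.

```shell
# Hypothetical helper that prints the two fstab entries shown above.
# $1: UUID of the root filesystem, $2: UUID of the swap partition,
# $3: root filesystem type (optional, defaults to ext4).
fstab_entries() {
    printf 'UUID=%s / %s errors=remount-ro 0 1\n' "$1" "${3:-ext4}"
    printf 'UUID=%s none swap sw 0 0\n' "$2"
}
# Usage inside the chroot:
#   fstab_entries "$(blkid -o value -s UUID /dev/sda3)" \
#                 "$(blkid -o value -s UUID /dev/sda2)" >> /etc/fstab
```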
Here we can install any extra package that we want:
client# apt-get install vim less openssh-server locales
This is also the time to do any other needed customization (e.g. localization, hostname, repositories, etc.).
Finally, we need to install a kernel, a bootloader and the initramfs utilities that we'll use later:
client# apt-get install linux-image-amd64 grub2 initramfs-tools
When prompted, we choose to install grub to /dev/sda, just as we'd do with a local hard disk.
iSCSI boot configuration
Now it's time to finally do what it takes for the actual boot process to work. Basically, we need a special initrd that configures the network, logs into the iSCSI target LUN, mounts it and switches to it as the new root filesystem. We will provide the needed information in the form of kernel command line arguments.
The open-iscsi package includes the necessary initrd hooks to do the above, so let's install it:
client# apt-get install open-iscsi
The relevant bits are in /usr/share/initramfs-tools/scripts/local-top/iscsi, where we learn that we can pass information by setting various ISCSI_* variables. We also want early (i.e. kernel-level) IP configuration, which again can be done with special kernel arguments. We pass all this information by modifying the grub kernel command line, so we need the following lines in the client's /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX="ISCSI_INITIATOR=iqn.2007-08.com.example.client:client ISCSI_TARGET_NAME=iqn.2007-08.com.example.san:rootp ISCSI_TARGET_IP=10.10.10.10 ISCSI_TARGET_PORT=3260 root=UUID=6c816f51-0613-45e7-a15b-bc2d5cd00f88 ip=10.10.10.50::10.10.10.1:255.255.255.0:client:eth0:off"
Here we're using static IP configuration; use "ip=dhcp" for DHCP (the full story is here). Also, the GRUB_CMDLINE_LINUX_DEFAULT variable is normally set to "quiet", but it's probably better to remove that to be able to see what happens at boot. It can be added back later if wanted.
Also note that if the SAN needs authentication, more variables are needed, most likely ISCSI_USERNAME and ISCSI_PASSWORD.
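To make the structure of this command line explicit, here is a sketch of a function that builds it from its parts (iscsi_cmdline is a hypothetical name). The port is fixed at 3260. The kernel ip= option is used in its static form, client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf, with server-ip left empty as above. Authentication variables, if needed, would have to be appended.

```shell
# Hypothetical helper that assembles the GRUB_CMDLINE_LINUX value above.
# The kernel's ip= option has the form
#   ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
# <server-ip> is left empty and autoconf is "off" (static configuration).
# $1 initiator IQN, $2 target IQN, $3 target IP, $4 root UUID,
# $5 client IP, $6 gateway, $7 netmask, $8 hostname, $9 interface.
iscsi_cmdline() {
    printf 'ISCSI_INITIATOR=%s ISCSI_TARGET_NAME=%s ' "$1" "$2"
    # 3260 is the standard iSCSI port, as used throughout this example
    printf 'ISCSI_TARGET_IP=%s ISCSI_TARGET_PORT=3260 ' "$3"
    printf 'root=UUID=%s ' "$4"
    printf 'ip=%s::%s:%s:%s:%s:off\n' "$5" "$6" "$7" "$8" "$9"
}
```

For example, iscsi_cmdline iqn.2007-08.com.example.client:client iqn.2007-08.com.example.san:rootp 10.10.10.10 6c816f51-0613-45e7-a15b-bc2d5cd00f88 10.10.10.50 10.10.10.1 255.255.255.0 client eth0 prints exactly the GRUB_CMDLINE_LINUX value shown above.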
Looking into /usr/share/initramfs-tools/hooks/iscsi, we learn that for the initrd update process to know that we want the iSCSI stuff included, we need to create the file /etc/iscsi/iscsi.initramfs:
client# touch /etc/iscsi/iscsi.initramfs
We also see that the file /etc/iscsi/initiatorname.iscsi gets copied into the initrd and sourced to learn the initiator name, so let's write the name to it in the expected format:
client# echo "InitiatorName=iqn.2007-08.com.example.client:client" > /etc/iscsi/initiatorname.iscsi
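The same initiator name now appears in three places: this file, ISCSI_INITIATOR on the kernel command line and, later, initiator-iqn in the iPXE script. If they drift apart, the SAN sees different initiators, and access control on the target may reject one of the logins. Here is a small sketch to check the file (initiator_name and initiator_matches are hypothetical names):

```shell
# Hypothetical helpers for /etc/iscsi/initiatorname.iscsi.
# initiator_name prints the value of the InitiatorName= line of file $1
# (default: /etc/iscsi/initiatorname.iscsi).
initiator_name() {
    sed -n 's/^InitiatorName=//p' "${1:-/etc/iscsi/initiatorname.iscsi}"
}
# initiator_matches succeeds if that value equals the expected IQN $1;
# $2 is the file to check (optional, same default as above).
initiator_matches() {
    [ "$(initiator_name "$2")" = "$1" ]
}
# Usage:
#   initiator_matches iqn.2007-08.com.example.client:client || echo mismatch
```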
Now, to apply all our changes, we regenerate the grub configuration and the initrd:
client# update-grub
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-3.2.0-4-amd64
Found initrd image: /boot/initrd.img-3.2.0-4-amd64
done
client# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-3.2.0-4-amd64
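Before rebooting, it may be worth checking that the parameters actually made it into grub.cfg. Since they were put in GRUB_CMDLINE_LINUX rather than GRUB_CMDLINE_LINUX_DEFAULT, they should appear on every linux line, recovery entries included. Here is a sketch using awk (grub_cfg_ok is a hypothetical name; it assumes the default /boot/grub/grub.cfg location):

```shell
# Hypothetical post-update-grub check: every "linux" line in grub.cfg
# (normal and recovery entries alike) should carry the iSCSI parameters.
# Succeeds only if at least one linux line exists and none lacks
# ISCSI_TARGET_NAME=.
grub_cfg_ok() {
    awk '$1 == "linux" { n++; if ($0 !~ /ISCSI_TARGET_NAME=/) bad++ }
         END { exit !(n > 0 && bad == 0) }' "${1:-/boot/grub/grub.cfg}"
}
# The initrd contents can be inspected with lsinitramfs (initramfs-tools):
#   lsinitramfs /boot/initrd.img-3.2.0-4-amd64 | grep iscsi
```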
We also need to set a root password, otherwise we won't be able to log in:
client# passwd
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Lastly, as mentioned above, we don't want the Debian initscripts to try to configure eth0 at boot. This can be achieved either by removing any reference to eth0 from /etc/network/interfaces, or simply by telling Debian that its configuration is "manual":
# /etc/network/interfaces
auto eth0
iface eth0 inet manual

# other interfaces here
...
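A quick way to double-check this is sketched below in awk (iface_is_manual is a hypothetical name). It succeeds if the interface has no iface stanza at all or only manual ones, and fails if any stanza for it uses another method such as dhcp or static.

```shell
# Hypothetical check for /etc/network/interfaces: succeed unless the given
# interface has an "iface" stanza with a method other than "manual"
# (removing the stanza entirely is also fine, as noted above).
# $1: interface name, $2: interfaces file (optional).
iface_is_manual() {
    awk -v ifc="$1" '$1 == "iface" && $2 == ifc && $4 != "manual" { bad = 1 }
                     END { exit bad }' "${2:-/etc/network/interfaces}"
}
# Usage:
#   iface_is_manual eth0 || echo "eth0 would be reconfigured at boot"
```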
We can finally exit the chroot environment and log out of the iSCSI LUN in the helper machine:
client# exit
helper# umount /mnt/chroot/{dev,proc,sys,}
helper# iscsiadm -m node -T 'iqn.2007-08.com.example.san:rootp' -p 10.10.10.10 -u
PXE
Let's summarize what happens when our client is booted:
- iPXE configures the network (either via DHCP or statically)
- iPXE logs into the iSCSI LUN, mapping it as a local disk.
- The MBR is read and executed, starting the boot process (GRUB), which loads the kernel and the initrd.
- Early IP configuration is performed during boot, and an initrd script logs into the iSCSI LUN as specified on the kernel command line (the kernel is unaware of the iPXE login).
- The initrd mounts the iSCSI partition specified on the command line with root= and switches to it as the new root, and from there the boot process proceeds normally.
So we need to configure the first three steps. Using iPXE, all we have to do is send this iPXE script to the client:
#!ipxe
set initiator-iqn iqn.2007-08.com.example.client:client
sanboot iscsi:san.example.com:6:3260:0:iqn.2007-08.com.example.san:rootp
This is the bare minimum; if your SAN needs authentication, then username and password should also be set before attempting to boot (see the iPXE docs, and SAN URIs explained).
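If several clients are booted this way, the script can be generated per client, e.g. by whatever serves iPXE scripts over HTTP or TFTP. Here is a sketch (ipxe_sanboot_script is a hypothetical name) that emits the script above and, optionally, CHAP credentials through iPXE's username and password settings. Values containing spaces would need extra quoting, which this sketch does not handle.

```shell
# Hypothetical generator for the per-client iPXE script shown above.
# $1: initiator IQN, $2: SAN host, $3: target IQN,
# $4/$5: optional CHAP username/password (emitted only if $4 is set).
ipxe_sanboot_script() {
    printf '#!ipxe\n'
    printf 'set initiator-iqn %s\n' "$1"
    if [ -n "$4" ]; then
        printf 'set username %s\n' "$4"
        printf 'set password %s\n' "$5"
    fi
    # SAN URI fields: iscsi:<server>:<protocol>:<port>:<LUN>:<target IQN>
    printf 'sanboot iscsi:%s:6:3260:0:%s\n' "$2" "$3"
}
```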
Test it!
So if we boot our client, we should see that iPXE logs into the LUN and loads GRUB:
...
Registered as SAN device 0x80
Booting from SAN device 0x80
GRUB loading.
Welcome to GRUB!
and after GRUB has booted the kernel, something like this in the kernel messages:
...
[ 2.073406] scsi2 : iSCSI Initiator over TCP/IP
[ 2.335112] scsi 2:0:0:0: Direct-Access EQLOGIC 100E-00 4.3 PQ: 0 ANSI: 5
[ 2.337709] scsi 2:0:0:0: Attached scsi generic sg1 type 0
[ 2.349859] sd 2:0:0:0: [sda] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[ 2.351322] sd 2:0:0:0: [sda] Write Protect is off
[ 2.352271] sd 2:0:0:0: [sda] Mode Sense: 77 00 00 08
[ 2.353451] sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 2.368450] sda: sda1 sda2 sda3
[ 2.370812] sd 2:0:0:0: [sda] Attached SCSI disk
[ 3.396538] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
...
[ 4.810052] Adding 524284k swap on /dev/sda2. Priority:-1 extents:1 across:524284k
[ 4.824409] EXT4-fs (sda3): re-mounted. Opts: (null)
[ 4.959888] EXT4-fs (sda3): re-mounted. Opts: errors=remount-ro
...
At this point, we can use this machine and do all the normal administrative operations (add/remove packages, upgrades, kernel configuration, etc.) in the usual way, as if it had a local hard disk.
Update 07/11/2014: According to the README.Debian file that comes with open-iscsi, when passing iSCSI variables from grub their names should be lowercase, though uppercase seems to work just as well. It is also possible to set the variables directly in the /etc/iscsi/iscsi.initramfs file (this time in uppercase). The relevant part of README.Debian:
If manual configuration is necessary, there are two ways to include iSCSI boot
options in your initramfs:

1) Touch /etc/iscsi/iscsi.initramfs and provide options on the command line.
This provides flexibility, but if passwords are used, is not very secure.
Available boot line options:
   iscsi_initiator, iscsi_target_name, iscsi_target_ip,
   iscsi_target_port, iscsi_target_group, iscsi_username,
   iscsi_password, iscsi_in_username, iscsi_in_password
See iscsistart --help for a description of each option.

2) Provide iSCSI options in /etc/iscsi/iscsi.initramfs.
Available options:
   ISCSI_INITIATOR, ISCSI_TARGET_NAME, ISCSI_TARGET_IP,
   ISCSI_TARGET_PORT, ISCSI_TARGET_GROUP, ISCSI_USERNAME,
   ISCSI_PASSWORD, ISCSI_IN_USERNAME, ISCSI_IN_PASSWORD
Example syntax:
   ISCSI_INITIATOR="iqn.1993-08.org.debian:01:9b3e5634fdb9"
   ISCSI_TARGET_NAME=iqn.2008-01.com.example:storage.foo
   ISCSI_TARGET_IP=192.168.1.1
   ISCSI_TARGET_PORT=3160
   ISCSI_USERNAME="username"
   ISCSI_PASSWORD="password"
   ISCSI_IN_USERNAME="in_username"
   ISCSI_IN_PASSWORD="in_password"
   ISCSI_TARGET_GROUP=1
Remember to set proper permissions if username/passwords are used.
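Following that permission advice, here is a sketch that writes the file with root-only permissions (write_iscsi_initramfs is a hypothetical name). It runs in a subshell so the umask change does not leak to the caller. It wraps every value in double quotes, so values containing double quotes, $ or backticks would break it.

```shell
# Hypothetical helper: write an iscsi.initramfs file readable by root only.
# $1: output file (normally /etc/iscsi/iscsi.initramfs),
# $2: initiator IQN, $3: target IQN, $4: target IP, $5: target port,
# $6/$7: optional CHAP username/password (written only if $6 is set).
write_iscsi_initramfs() (
    # subshell body: the umask below only applies inside this function
    umask 077
    {
        printf 'ISCSI_INITIATOR="%s"\n' "$2"
        printf 'ISCSI_TARGET_NAME="%s"\n' "$3"
        printf 'ISCSI_TARGET_IP="%s"\n' "$4"
        printf 'ISCSI_TARGET_PORT="%s"\n' "$5"
        if [ -n "$6" ]; then
            printf 'ISCSI_USERNAME="%s"\n' "$6"
            printf 'ISCSI_PASSWORD="%s"\n' "$7"
        fi
    } > "$1"
    # umask only affects newly created files; fix an existing one too
    chmod 600 "$1"
)
```

After writing the file, update-initramfs -u is needed again so that the new values end up in the initrd.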
Update2 29/01/2015: It seems iPXE has trouble booting from iSCSI targets that use multiple IP addresses (e.g. most Dell EqualLogic arrays). On such targets, initiators connect to a "main" or "group" IP, but the actual session is established against another IP to which the initiator is redirected after the first contact (typically the IP of a specific iSCSI interface on the storage). iPXE seems unable to follow this form of redirection and simply fails the iSCSI login. Single-IP targets such as iSCSI Enterprise Target, on the other hand, work fine.
This page has been very helpful in gaining an understanding of iPXE with iSCSI.
My world runs on a home grown LFS 8.1 Linux version that boots through syslinux 6.03 on regular disks
This config has been moved to a Synology iSCSI Lun/Target.
The Lun is partitioned using gpt with only 2 partitions for / (ext4) and /boot (fat32)
The gptmbr is put in place
The syslinux is put in place on /boot and extlinux -i . is run
When powering up the machine, I get to the iPXE prompt
= run dhcp -> route shows that I got IP/mask and gateway addresses
= set the iSCSI initiator identifier
= sanboot the iSCSI Lun on the Synology that was prepared above
At that point I see:
. the SAN device 0x80 gets registered
. it tells me 'Booting from SAN device 0x80'
. A "SYSLINUX 6.03...." banner is shown
.... immediately followed by the machine resetting
It looks as if syslinux/extlinux is barely starting and crashing for reasons it cannot tell on the console...
Any pointers as to why the recipe on this page should not be used for my setup, or on how I could address this, are appreciated :-)
Sadly, I don't have any experience booting syslinux.
Thanks for this.
I confirm this also works for Ubuntu 12.04.
What you need to do is install Ubuntu 12.04 to iSCSI first (log in to the iSCSI target, use the entire disk and LVM).
When you boot up you will have a lot of ip-config problems.
To solve that just drop into recovery mode, make file system r/w, and then perform the above modifications to grub:
i.e. modify /etc/default/grub and add those two lines
then update-grub.
Do you have problems with reboots with this configuration?
I use Debian 7, and when I try to reboot the client I get a "sync SCSI cache" message and the system halts.
I'm not seeing any problems. When I reboot I see that Debian correctly detects that / is on iSCSI and delays/avoids any umount or deactivation attempt. Other than that, which is expected and correct, the system reboots just fine.
Sample output:
/etc/fstab on that machine:
Great article. Worked more or less flawlessly with Debian Wheezy.
When trying with Ubuntu Lucid, I ran into the issue that installing open-iscsi in the chroot environment led to apt attempting to start the service (which fails in a chroot). This left dpkg in a bad state, even though I was able to proceed with the rest of this doc. Whenever I later tried to install any software, apt would attempt to finish the open-iscsi install, restarting iscsid and dropping the root filesystem while running the system live.
The solution was to temporarily modify /etc/init.d/open-iscsi, blank out the "start_daemon" function, finish the apt install, then restore the initscript (and disable it at boot using 'update-rc.d -f open-iscsi remove').
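A commonly used alternative to editing the init script (not what was done above) is Debian's policy-rc.d hook. invoke-rc.d, which package maintainer scripts use to start services, consults /usr/sbin/policy-rc.d and skips the action if it exits with status 101. Here is a sketch, assuming the package in question starts its service through invoke-rc.d (block_daemons and unblock_daemons are hypothetical names), run on the helper against the chroot directory:

```shell
# Hypothetical helpers: forbid service starts inside a chroot through
# Debian's policy-rc.d mechanism, and lift the ban again afterwards.
# Exit status 101 tells invoke-rc.d that the action is not allowed.
block_daemons() {
    root=${1:?usage: block_daemons CHROOT_DIR}
    mkdir -p "$root/usr/sbin"
    printf '#!/bin/sh\nexit 101\n' > "$root/usr/sbin/policy-rc.d"
    chmod 755 "$root/usr/sbin/policy-rc.d"
}
unblock_daemons() {
    rm -f "${1:?usage: unblock_daemons CHROOT_DIR}/usr/sbin/policy-rc.d"
}
```

For example, run block_daemons /mnt/chroot before installing open-iscsi in the chroot and unblock_daemons /mnt/chroot afterwards, so that services start normally once the client boots.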
Great article here, most of this stuff appears either undocumented or very sparsely documented, and this article pieced together information I found across the net into a clear explanation.
A great follow-up article would be avoiding GRUB entirely. Without it, the entire boot process could be driven by kernel arguments: since the kernel and initramfs would not be loaded via iSCSI, booting could be done via iPXE+HTTP and no "sanboot" would be needed.
Thanks for sharing your findings! In principle, booting without GRUB (i.e. getting the kernel + initramfs directly via PXE) should be easier. I might write a short howto for that in the future.