Foreword: please note that the code available here is only for demonstration purposes. If you want to be serious, you'll have to make it more robust and integrate it with other code. Also, the description is by no means a definitive reference on the subject, but rather the result of my experimentation. Please report any bug or error you find in the code or otherwise in this article. Thanks.
Link to the source tarball described in the article: simpletun.
Update 18/07/2010: Thanks to this post, I've learned that recent versions of iproute2 can (finally) create tun/tap devices, although the functionality is (still?) blissfully undocumented. Thus, installing tunctl (UML utilities) or OpenVPN just to be able to create tun devices is no longer needed. The following is with iproute2-2.6.34:
# ip tuntap help
Usage: ip tuntap { add | del } [ dev PHYS_DEV ]
[ mode { tun | tap } ] [ user USER ] [ group GROUP ]
[ one_queue ] [ pi ] [ vnet_hdr ]
Where: USER := { STRING | NUMBER }
GROUP := { STRING | NUMBER }
Tun/tap interfaces are a feature offered by Linux (and probably by other UNIX-like operating systems) that can do userspace networking, that is, allow userspace programs to see raw network traffic (at the ethernet or IP level) and do whatever they like with it. This document attempts to explain how tun/tap interfaces work under Linux, with some sample code to demonstrate their usage.
How it works
Tun/tap interfaces are software-only interfaces, meaning that they exist only in the kernel and, unlike regular network interfaces, they have no physical hardware component (and so there's no physical "wire" connected to them). You can think of a tun/tap interface as a regular network interface that, when the kernel decides that the moment has come to send data "on the wire", instead sends data to some userspace program that is attached to the interface (using a specific procedure, see below). When the program attaches to the tun/tap interface, it gets a special file descriptor, reading from which gives it the data that the interface is sending out. In a similar fashion, the program can write to this special descriptor, and the data (which must be properly formatted, as we'll see) will appear as input to the tun/tap interface. To the kernel, it would look like the tun/tap interface is receiving data "from the wire".
The difference between a tap interface and a tun interface is that a tap interface outputs (and must be given) full ethernet frames, while a tun interface outputs (and must be given) raw IP packets (and no ethernet headers are added by the kernel). Whether an interface functions like a tun interface or like a tap interface is specified with a flag when the interface is created.
The interface can be transient, meaning that it's created, used and destroyed by the same program; when the program terminates, even if it doesn't explicitly destroy the interface, the interface ceases to exist. Another option (the one I prefer) is to make the interface persistent; in this case, it is created using a dedicated utility (like tunctl or openvpn --mktun
Once a tun/tap interface is in place, it can be used just like any other interface, meaning that IP addresses can be assigned, its traffic can be analyzed, firewall rules can be created, routes pointing to it can be established, etc.
With this knowledge, let's try to see how we can use a tun/tap interface and what can be done with it.
Creating the interface
The code to create a brand new interface and to (re)attach to a persistent interface is essentially the same; the difference is that the former must be run by root (well, more precisely, by a user with the CAP_NET_ADMIN capability), while the latter can be run by an ordinary user if certain conditions are met. Let's start with the creation of a new interface.
First, whatever you do, the device
The next step in creating the interface is issuing a special
If the ioctl() succeeds, the virtual interface is created and the file descriptor we had is now associated to it, and can be used to communicate.
At this point, two things can happen. The program can start using the interface right away (probably configuring it with at least an IP address before), and, when it's done, terminate and destroy the interface. The other option is to issue a couple of other special tunctl or openvpn --mktun
The basic code used to create a virtual interface is shown in the file
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/if.h>
#include <linux/if_tun.h>

int tun_alloc(char *dev, int flags) {

  struct ifreq ifr;
  int fd, err;
  char *clonedev = "/dev/net/tun";

  /* Arguments taken by the function:
   *
   * char *dev: the name of an interface (or '\0'). MUST have enough
   *   space to hold the interface name if '\0' is passed
   * int flags: interface flags (eg, IFF_TUN etc.)
   */

  /* open the clone device */
  if( (fd = open(clonedev, O_RDWR)) < 0 ) {
    return fd;
  }

  /* preparation of the struct ifr, of type "struct ifreq" */
  memset(&ifr, 0, sizeof(ifr));

  ifr.ifr_flags = flags;   /* IFF_TUN or IFF_TAP, plus maybe IFF_NO_PI */

  if (*dev) {
    /* if a device name was specified, put it in the structure; otherwise,
     * the kernel will try to allocate the "next" device of the
     * specified type. Copy at most IFNAMSIZ - 1 bytes: the memset()
     * above guarantees that the name stays NUL-terminated */
    strncpy(ifr.ifr_name, dev, IFNAMSIZ - 1);
  }

  /* try to create the device */
  if( (err = ioctl(fd, TUNSETIFF, (void *) &ifr)) < 0 ) {
    close(fd);
    return err;
  }

  /* if the operation was successful, write back the name of the
   * interface to the variable "dev", so the caller can know
   * it. Note that the caller MUST reserve space in *dev (see calling
   * code below) */
  strcpy(dev, ifr.ifr_name);

  /* this is the special file descriptor that the caller will use to talk
   * with the virtual interface */
  return fd;
}
The tun_alloc() function takes two parameters:
- char *dev contains the name of an interface (for example, tap0, tun2, etc.). Any name can be used, though it's probably better to choose a name that suggests which kind of interface it is. In practice, names like tunX or tapX are usually used. If *dev is '\0', the kernel will try to create the "first" available interface of the requested type (eg, tap0, but if that already exists, tap1, and so on).
- int flags contains the flags that tell the kernel which kind of interface we want (tun or tap). Basically, it can either take the value IFF_TUN to indicate a TUN device (no ethernet headers in the packets), or IFF_TAP to indicate a TAP device (with ethernet headers in packets).
Additionally, another flag IFF_NO_PI can be ORed with the base value. IFF_NO_PI tells the kernel to not provide packet information. The purpose of IFF_NO_PI is to tell the kernel that packets will be "pure" IP packets, with no added bytes. Otherwise (if IFF_NO_PI is unset), 4 extra bytes are added to the beginning of the packet (2 flag bytes and 2 protocol bytes). IFF_NO_PI need not match between interface creation and reconnection time. Also note that when capturing traffic on the interface with Wireshark, those 4 bytes are never shown.
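To make the 4 extra bytes more concrete, here is a minimal sketch of how a program might split them from the payload when the interface was opened without IFF_NO_PI. The helper name pi_split() and the PI_HDR_LEN constant are made up for this illustration; the layout assumed is the one of struct tun_pi in linux/if_tun.h (16-bit flags in host byte order, followed by a 16-bit protocol in network byte order, eg 0x0800 for IPv4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs() */

#define PI_HDR_LEN 4     /* 2 flag bytes + 2 protocol bytes */

/* Hypothetical helper: split a buffer read from a tun/tap descriptor
 * opened WITHOUT IFF_NO_PI into the packet information fields and the
 * payload. Returns a pointer to the payload (the IP packet or ethernet
 * frame), or NULL if the buffer is too short to hold the header. */
const unsigned char *pi_split(const unsigned char *buf, size_t len,
                              uint16_t *flags, uint16_t *proto,
                              size_t *payload_len)
{
  uint16_t raw_proto;

  assert(buf && flags && proto && payload_len);

  if (len < PI_HDR_LEN) {
    return NULL;
  }

  memcpy(flags, buf, 2);            /* flags: host byte order */
  memcpy(&raw_proto, buf + 2, 2);   /* protocol: network byte order */
  *proto = ntohs(raw_proto);
  *payload_len = len - PI_HDR_LEN;

  return buf + PI_HDR_LEN;
}
```

A caller would pass the buffer and byte count obtained from read() on the tun/tap descriptor, and then process only the returned payload.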
A program can thus use the following code to create a device:
char tun_name[IFNAMSIZ];
char tap_name[IFNAMSIZ];
char *a_name;

...

strcpy(tun_name, "tun1");
tunfd = tun_alloc(tun_name, IFF_TUN);   /* tun interface */

strcpy(tap_name, "tap44");
tapfd = tun_alloc(tap_name, IFF_TAP);   /* tap interface */

a_name = malloc(IFNAMSIZ);
a_name[0]='\0';
tapfd = tun_alloc(a_name, IFF_TAP);     /* let the kernel pick a name */
At this point, as said before, the program can either use the interface as is for its purposes, or it can set it persistent (and optionally assign ownership to a specific user/group). If it does the former, there's not much more to be said. But if it does the latter, here's what happens.
Two additional ioctl()s are available, which are usually used together. The first syscall can set (or remove) the persistent status on the interface. The second allows assigning ownership of the interface to a regular (non-root) user. Both features are implemented in the programs tunctl (part of UML utilities) and openvpn --mktun
...
  /* "delete" is set if the user wants to delete (ie, make nonpersistent)
     an existing interface; otherwise, the user is creating a new
     interface */
  if(delete) {
    /* remove persistent status */
    if(ioctl(tap_fd, TUNSETPERSIST, 0) < 0){
      perror("disabling TUNSETPERSIST");
      exit(1);
    }
    printf("Set '%s' nonpersistent\n", ifr.ifr_name);
  }
  else {
    /* emulate behaviour prior to TUNSETGROUP */
    if(owner == -1 && group == -1) {
      owner = geteuid();
    }

    if(owner != -1) {
      if(ioctl(tap_fd, TUNSETOWNER, owner) < 0){
        perror("TUNSETOWNER");
        exit(1);
      }
    }
    if(group != -1) {
      if(ioctl(tap_fd, TUNSETGROUP, group) < 0){
        perror("TUNSETGROUP");
        exit(1);
      }
    }

    if(ioctl(tap_fd, TUNSETPERSIST, 1) < 0){
      perror("enabling TUNSETPERSIST");
      exit(1);
    }

    if(brief)
      printf("%s\n", ifr.ifr_name);
    else {
      printf("Set '%s' persistent and owned by", ifr.ifr_name);
      if(owner != -1)
        printf(" uid %d", owner);
      if(group != -1)
        printf(" gid %d", group);
      printf("\n");
    }
  }
...
These additional ioctl()s must still be run by root. But what we have now is a persistent interface owned by a specific user, so processes running as that user can successfully attach to it.
As said, it turns out that the code to (re)attach to an existing tun/tap interface is the same as the code used to create it; in other words,
- The interface must exist already and be owned by the same user that is attempting to connect (and probably be persistent)
- The user must have read/write permissions on /dev/net/tun
- The flags provided must match those used to create the interface (eg if it was created with IFF_TUN then the same flag must be used when reattaching)
This is possible because the kernel allows the TUNSETIFF ioctl() to succeed if the user issuing it specifies the name of an already existing interface and he is the owner of the interface. In this case, no new interface has to be created, so a regular user can successfully perform the operation.
So this is an attempt to explain what happens when
- If a non-existent or no interface name is specified, that means the user is requesting the allocation of a new interface. The kernel thus creates an interface using the given name (or picking the next available name if an empty name was given). This works only if done by root.
- If the name of an existing interface is specified, that means the user wants to connect to a previously allocated interface. This can be done by a normal user, provided that: the user has appropriate rights on the clone device AND is the owner of the interface (set at creation time), AND the specified mode (tun or tap) matches the mode set at creation time.
You can have a look at the code that implements the above steps in the file
In any case, no non-root user is allowed to configure the interface (ie, assign an IP address and bring it up), but this is true of any regular interface too. The usual methods (suid binary wrapper, sudo, etc.) can be used if a non-root user needs to do some operation that requires root privileges.
This is a possible usage scenario (one I use all the time):
- The virtual interfaces are created, made persistent, assigned to a user, and configured by root (for example, by initscripts at boot time, using tunctl or equivalent)
- The regular users can then attach and detach as many times as they wish from virtual interfaces that they own.
- The virtual interfaces are destroyed by root, for example by scripts run at shutdown time, perhaps using tunctl -d or equivalent
Let's try it
After this lengthy but necessary introduction, it's time to do some work with it. So, since this is a normal interface, we can use it as we would another regular interface. For our purposes, there is no difference between tun and tap interfaces; it's the program that creates or attaches to it that must know its type and accordingly expect or write data. Let's create a persistent interface and assign it an IP address:
# openvpn --mktun --dev tun2
Fri Mar 26 10:29:29 2010 TUN/TAP device tun2 opened
Fri Mar 26 10:29:29 2010 Persist state set to: ON
# ip link set tun2 up
# ip addr add 10.0.0.1/24 dev tun2
Let's fire up a network analyzer and look at the traffic:
# tshark -i tun2
Running as user "root" and group "root". This could be dangerous.
Capturing on tun2

# On another console
# ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.115 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.105 ms
...
Looking at the output of tshark, we see...nothing. There is no traffic going through the interface. This is correct: since we're pinging the interface's IP address, the operating system correctly decides that no packet needs to be sent "on the wire", and the kernel itself is replying to these pings. If you think about it, it's exactly what would happen if you pinged another interface's IP address (for example eth0): no packets would be sent out. This might sound obvious, but could be a source of confusion at first (it was for me).
Knowing that the assignment of a /24 IP address to an interface creates a connected route for the whole range through the interface, let's modify our experiment and force the kernel to actually send something out of the tun interface (NOTE: the following works only with kernels < 2.6.36; later kernels behave differently, as explained in the comments):
# ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
From 10.0.0.1 icmp_seq=2 Destination Host Unreachable
From 10.0.0.1 icmp_seq=3 Destination Host Unreachable
...

# on the tshark console
...
0.000000 10.0.0.1 -> 10.0.0.2 ICMP Echo (ping) request
0.999374 10.0.0.1 -> 10.0.0.2 ICMP Echo (ping) request
1.999055 10.0.0.1 -> 10.0.0.2 ICMP Echo (ping) request
...
Now we're finally seeing something. The kernel sees that the address does not belong to a local interface, and a route for 10.0.0.0/24 exists through the tun2 interface. So it duly sends the packets out tun2. Note the different behavior here between tun and tap interfaces: with a tun interface, the kernel sends out the IP packet (raw, no other headers are present - try analyzing it with tshark or wireshark), while with a tap interface, being ethernet, the kernel would try to ARP for the target IP address:
# pinging 10.0.0.2 now, but through tap2 (tap)
# ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.

# on the tshark console
...
0.111858 82:03:d4:07:62:b6 -> Broadcast ARP Who has 10.0.0.2? Tell 10.0.0.1
1.111539 82:03:d4:07:62:b6 -> Broadcast ARP Who has 10.0.0.2? Tell 10.0.0.1
...
Furthermore, with a tap interface the traffic will be composed of full ethernet frames (again, you can check with the network analyzer). Note that the MAC address for a tap interface is autogenerated by the kernel at interface creation time, but can be changed using the SIOCSIFHWADDR ioctl() (look again in drivers/net/tun.c, function tun_chr_ioctl()). Finally, being an ethernet interface, the MTU is set to 1500:
# ip link show dev tap2
7: tap2: mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
link/ether 82:03:d4:07:62:b6 brd ff:ff:ff:ff:ff:ff
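To check from code what a tap interface hands over, a program could decode the first 14 bytes of each frame it reads. The following is only a sketch: eth_parse(), struct eth_info and the TAP_* constants are names invented for this example, and it assumes untagged ethernet frames (no VLAN tag) read from a descriptor opened with IFF_TAP | IFF_NO_PI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAP_ETH_HDR_LEN 14      /* 6 dst MAC + 6 src MAC + 2 ethertype */
#define TAP_ET_IPV4     0x0800
#define TAP_ET_ARP      0x0806

/* Hypothetical container for the decoded ethernet header */
struct eth_info {
  unsigned char dst[6];
  unsigned char src[6];
  uint16_t ethertype;           /* converted to host byte order */
};

/* Decode the ethernet header at the start of a frame read from a tap
 * descriptor. Returns 0 on success, -1 if the frame is too short. */
int eth_parse(const unsigned char *frame, size_t len, struct eth_info *out)
{
  assert(frame && out);

  if (len < TAP_ETH_HDR_LEN) {
    return -1;
  }

  memcpy(out->dst, frame, 6);
  memcpy(out->src, frame + 6, 6);
  /* the ethertype is big-endian on the wire */
  out->ethertype = (uint16_t)((frame[12] << 8) | frame[13]);

  return 0;
}
```

With the ARP requests shown earlier, such a function would report the broadcast destination ff:ff:ff:ff:ff:ff, the source 82:03:d4:07:62:b6 and an ethertype equal to TAP_ET_ARP.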
Of course, so far no program is attached to the interface, so all these outgoing packets are just lost. So let's take a step forward and write a simple program that attaches to the interface and reads packets sent out by the kernel.
A simple program
We're going to write a program that attaches to a tun interface and reads packets that the kernel sends out that interface. Remember that you can run the program as a normal user if the interface is persistent, provided that you have the necessary permissions on the clone device
...
  /* tunclient.c */

  char tun_name[IFNAMSIZ];

  /* Connect to the device */
  strcpy(tun_name, "tun77");
  tun_fd = tun_alloc(tun_name, IFF_TUN | IFF_NO_PI);  /* tun interface */

  if(tun_fd < 0){
    perror("Allocating interface");
    exit(1);
  }

  /* Now read data coming from the kernel */
  while(1) {
    /* Note that "buffer" should be at least the MTU size of the interface, eg 1500 bytes */
    nread = read(tun_fd,buffer,sizeof(buffer));
    if(nread < 0) {
      perror("Reading from interface");
      close(tun_fd);
      exit(1);
    }

    /* Do whatever with the data */
    printf("Read %d bytes from device %s\n", nread, tun_name);
  }
...
If you configure tun77 as having IP address 10.0.0.1/24 and then run the above program while trying to ping 10.0.0.2 (or any address in 10.0.0.0/24 other than 10.0.0.1, for that matter), you'll read data from the device:
# openvpn --mktun --dev tun77 --user waldner
Fri Mar 26 10:48:12 2010 TUN/TAP device tun77 opened
Fri Mar 26 10:48:12 2010 Persist state set to: ON
# ip link set tun77 up
# ip addr add 10.0.0.1/24 dev tun77
# ping 10.0.0.2
...

# on another console
$ ./tunclient
Read 84 bytes from device tun77
Read 84 bytes from device tun77
...
If you do the math, you'll see where these 84 bytes come from: 20 are for the IP header, 8 for the ICMP header, and 56 are the payload of the ICMP echo message, as you can see when you run the ping command:
$ ping 10.0.0.2 PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data. ...
Try experimenting with the above program sending various traffic types through the interface (also try using tap), and verify that the size of the data you're reading is correct for the interface type. Each read() returns a full packet (or frame if using tap mode); similarly, if we were to write, we would have to write an entire IP packet (or ethernet frame in tap mode) for each write().
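One way to verify the sizes is to compare the byte count returned by read() with the total length field of the IP header. Below is a small sketch for tun mode with IFF_NO_PI; ipv4_summarize() and struct ipv4_summary are illustrative names, and only IPv4 is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the fields we care about */
struct ipv4_summary {
  unsigned version;      /* should be 4 */
  unsigned header_len;   /* IP header length in bytes */
  unsigned total_len;    /* total packet length in bytes, from the header */
  unsigned protocol;     /* 1 = ICMP, 6 = TCP, 17 = UDP */
};

/* Extract a few IPv4 header fields from a raw packet as read from a tun
 * descriptor. Returns 0 on success, -1 if this doesn't look like IPv4. */
int ipv4_summarize(const unsigned char *pkt, size_t len,
                   struct ipv4_summary *out)
{
  assert(pkt && out);

  if (len < 20) {
    return -1;                        /* shorter than a minimal IP header */
  }

  out->version    = pkt[0] >> 4;
  out->header_len = (pkt[0] & 0x0f) * 4u;
  out->total_len  = ((unsigned)pkt[2] << 8) | pkt[3];
  out->protocol   = pkt[9];

  if (out->version != 4 || out->header_len < 20 || out->header_len > len) {
    return -1;
  }
  return 0;
}
```

For the ping packets above, total_len should equal the 84 bytes returned by read(), header_len should be 20, and total_len - header_len - 8 (the ICMP header) gives the 56 bytes of payload.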
Now what can we do with this data? Well, we could for example emulate the behavior of the target of the traffic we're reading; again, to keep things simple, let's stick with the ping example. We could analyze the received packet, extract the information needed to reply from the IP header, ICMP header and payload, build an IP packet containing an appropriate ICMP echo reply message, and send it back (ie, write it into the descriptor associated with the tun/tap device). This way the originator of the ping will actually receive an answer. Of course you're not limited to ping, so you can implement all kinds of network protocols. In general, this implies parsing the received packet and acting accordingly. If using tap, to correctly build reply frames you would probably need to implement ARP in your code. All of this is exactly what User Mode Linux does: it attaches a modified Linux kernel running in userspace to a tap interface that exists on the host, and communicates with the host through that. Of course, being a full Linux kernel, it does implement TCP/IP and ethernet. Newer virtualization platforms like libvirt use tap interfaces extensively to communicate with guests that support them like
In the same way, you can attach with your own code to the interface and practice network programming and/or ethernet and TCP/IP stack implementation. To get started, you can look at (you guessed it)
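As a starting point for the ping emulation described above, here is a rough sketch that turns an ICMP echo request into an echo reply in place. The function names are invented for this example, and several simplifying assumptions are made: tun mode with IFF_NO_PI, IPv4 only, unfragmented packets, and the request's TTL is reused as is. The checksum routine is the usual Internet checksum (RFC 1071).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum (RFC 1071) over len bytes. The result is returned in
 * host order and must be stored big-endian in the packet. Running it over
 * data that already contains a correct checksum yields 0. */
uint16_t inet_checksum(const unsigned char *data, size_t len)
{
  uint32_t sum = 0;
  size_t i;

  for (i = 0; i + 1 < len; i += 2) {
    sum += ((uint32_t)data[i] << 8) | data[i + 1];
  }
  if (len & 1) {
    sum += (uint32_t)data[len - 1] << 8;   /* pad the odd byte with zero */
  }
  while (sum >> 16) {
    sum = (sum & 0xffff) + (sum >> 16);    /* fold the carries */
  }
  return (uint16_t)~sum;
}

/* Hypothetical helper: rewrite an IPv4 ICMP echo request in place so that
 * it becomes the matching echo reply. Returns the number of bytes to
 * write back to the tun descriptor, or -1 if the packet is not an echo
 * request we can handle. */
int icmp_echo_to_reply(unsigned char *pkt, size_t len)
{
  unsigned char tmp[4];
  size_t ihl, total;
  uint16_t csum;

  assert(pkt);

  if (len < 20 || (pkt[0] >> 4) != 4) return -1;
  ihl   = (size_t)(pkt[0] & 0x0f) * 4;
  total = ((size_t)pkt[2] << 8) | pkt[3];
  if (ihl < 20 || total > len || total < ihl + 8) return -1;
  if (pkt[9] != 1) return -1;              /* not ICMP */
  if (pkt[ihl] != 8) return -1;            /* not an echo request */

  /* swap source and destination addresses */
  memcpy(tmp, pkt + 12, 4);
  memcpy(pkt + 12, pkt + 16, 4);
  memcpy(pkt + 16, tmp, 4);

  /* recompute the IP header checksum */
  pkt[10] = pkt[11] = 0;
  csum = inet_checksum(pkt, ihl);
  pkt[10] = (unsigned char)(csum >> 8);
  pkt[11] = (unsigned char)(csum & 0xff);

  /* type 0 = echo reply; id, sequence and payload stay untouched */
  pkt[ihl] = 0;
  pkt[ihl + 2] = pkt[ihl + 3] = 0;
  csum = inet_checksum(pkt + ihl, total - ihl);
  pkt[ihl + 2] = (unsigned char)(csum >> 8);
  pkt[ihl + 3] = (unsigned char)(csum & 0xff);

  return (int)total;
}
```

In the read loop of the sample program, one could call icmp_echo_to_reply(buffer, nread) after each read() and, if the result is positive, write() that many bytes back to tun_fd.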
Tunnels
But there's another thing we can do with tun/tap interfaces. We can create tunnels. We don't need to reimplement TCP/IP; instead, we can write a program to just relay the raw data back and forth to a remote host running the same program, which does the same thing in a mirrored fashion. Let's suppose that our program above, in addition to attaching to the tun/tap interface, also establishes a network connection to a remote host, where a similar program (connected to a local tun/tap interface as well) is running in server mode. (Actually the two programs are the same; which one acts as server and which as client is decided with a command line switch.) Once the two programs are running, traffic can flow in either direction, since the main body of the code will be doing the same thing at both sites. The network connection here is implemented using TCP, but any other means can be used (eg UDP, or even ICMP!). You can download the full program source code here: simpletun.
Here is the main loop of the program, where the actual work of moving data back and forth between the tun/tap interface and the network tunnel is performed. For clarity, debug statements have been removed (you can find the full version in the source tarball).
...
  /* net_fd is the network file descriptor (to the peer), tap_fd is the
     descriptor connected to the tun/tap interface */

  /* use select() to handle two descriptors at once */
  maxfd = (tap_fd > net_fd)?tap_fd:net_fd;

  while(1) {
    int ret;
    fd_set rd_set;

    FD_ZERO(&rd_set);
    FD_SET(tap_fd, &rd_set);
    FD_SET(net_fd, &rd_set);

    ret = select(maxfd + 1, &rd_set, NULL, NULL, NULL);

    if (ret < 0 && errno == EINTR) {
      continue;
    }

    if (ret < 0) {
      perror("select()");
      exit(1);
    }

    if(FD_ISSET(tap_fd, &rd_set)) {
      /* data from tun/tap: just read it and write it to the network */

      nread = cread(tap_fd, buffer, BUFSIZE);

      /* write length + packet */
      plength = htons(nread);
      nwrite = cwrite(net_fd, (char *)&plength, sizeof(plength));
      nwrite = cwrite(net_fd, buffer, nread);
    }

    if(FD_ISSET(net_fd, &rd_set)) {
      /* data from the network: read it, and write it to the tun/tap interface.
       * We need to read the length first, and then the packet */

      /* Read length */
      nread = read_n(net_fd, (char *)&plength, sizeof(plength));

      /* read packet */
      nread = read_n(net_fd, buffer, ntohs(plength));

      /* now buffer[] contains a full packet or frame, write it into the tun/tap interface */
      nwrite = cwrite(tap_fd, buffer, nread);
    }
  }
...
(for the details of the read_n() and cwrite() functions, refer to the source; what they do should be obvious. Yes, the above code is not 100% correct with regard to select(), and makes some naive assumptions like expecting that read_n() and cwrite() do not block. As I said, the code is for demonstration purposes only)
Here is the main logic of the above code:
- The program uses select() to keep both descriptors under control at the same time; if data comes in from either descriptor, it's written out to the other.
- Since the program uses TCP, the receiver will see a single stream of data, which makes recognizing packet boundaries difficult. So when a packet or frame is written to the network, its length is prepended (2 bytes) to the actual packet.
- When data comes in from the tap_fd descriptor, a single read reads a full packet or frame; thus this can directly be written to the network, with its length prepended. Since that length number is a short int, thus longer than one byte, written in "raw" binary format, ntohs()/htons() are used to interoperate between machines with different endianness.
- When data comes in from the network, thanks to the aforementioned trick, we can know how long the next packet is going to be by reading the two-byte length that precedes it in the stream. When we've read the packet, we write it to the tun/tap interface descriptor, where it will be received by the kernel as coming "from the wire".
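The length-prefix framing described in the list above can be sketched in isolation as follows. These are not the read_n() and cwrite() functions from the simpletun source; read_full(), write_full(), send_frame() and recv_frame() are hypothetical stand-ins written for this illustration, assuming blocking descriptors and non-empty packets.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>   /* htons(), ntohs() */

/* Read exactly n bytes unless EOF or an error occurs first.
 * Returns the number of bytes read (less than n on EOF), or -1. */
ssize_t read_full(int fd, void *buf, size_t n)
{
  unsigned char *p = buf;
  size_t left = n;

  while (left > 0) {
    ssize_t r = read(fd, p, left);
    if (r < 0) {
      if (errno == EINTR) continue;
      return -1;
    }
    if (r == 0) break;                 /* EOF */
    p += r;
    left -= (size_t)r;
  }
  return (ssize_t)(n - left);
}

/* Write exactly n bytes, retrying on short writes. Returns n or -1. */
ssize_t write_full(int fd, const void *buf, size_t n)
{
  const unsigned char *p = buf;
  size_t left = n;

  while (left > 0) {
    ssize_t w = write(fd, p, left);
    if (w < 0) {
      if (errno == EINTR) continue;
      return -1;
    }
    p += w;
    left -= (size_t)w;
  }
  return (ssize_t)n;
}

/* Send one packet preceded by its length (2 bytes, network byte order). */
int send_frame(int fd, const void *pkt, uint16_t len)
{
  uint16_t be = htons(len);

  assert(pkt);
  if (write_full(fd, &be, sizeof(be)) != (ssize_t)sizeof(be)) return -1;
  if (write_full(fd, pkt, len) != (ssize_t)len) return -1;
  return 0;
}

/* Receive one packet into buf (capacity cap). Returns the packet length,
 * 0 if the peer closed the stream at a frame boundary, -1 on error, on a
 * truncated frame, or if the announced length doesn't fit in buf. */
ssize_t recv_frame(int fd, void *buf, size_t cap)
{
  uint16_t be;
  size_t len;
  ssize_t r = read_full(fd, &be, sizeof(be));

  if (r == 0) return 0;
  if (r != (ssize_t)sizeof(be)) return -1;
  len = ntohs(be);
  if (len > cap) return -1;
  if (read_full(fd, buf, len) != (ssize_t)len) return -1;
  return (ssize_t)len;
}
```

Checking the announced length against the buffer capacity is a small addition that the demonstration loop does not make; with a 2-byte prefix a peer can announce up to 65535 bytes.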
So what can you do with such a program? Well, you can create a tunnel! First, create and configure the necessary tun/tap interfaces on the hosts at both ends of the tunnel, including assigning them an IP address. For this example, I'll assume two tun interfaces: tun11, 192.168.0.1/24 on the local computer, and tun3, 192.168.0.2/24 on the remote computer. simpletun connects the hosts using TCP port 55555 by default (you can change that using the -p command line switch). The remote host will run simpletun in server mode, and the local host will run in client mode. So here we go (the remote server is at 10.2.3.4):
[remote]# openvpn --mktun --dev tun3 --user waldner
Fri Mar 26 11:11:41 2010 TUN/TAP device tun3 opened
Fri Mar 26 11:11:41 2010 Persist state set to: ON
[remote]# ip link set tun3 up
[remote]# ip addr add 192.168.0.2/24 dev tun3

[remote]$ ./simpletun -i tun3 -s
# server blocks waiting for the client to connect

[local]# openvpn --mktun --dev tun11 --user waldner
Fri Mar 26 11:17:37 2010 TUN/TAP device tun11 opened
Fri Mar 26 11:17:37 2010 Persist state set to: ON
[local]# ip link set tun11 up
[local]# ip addr add 192.168.0.1/24 dev tun11

[local]$ ./simpletun -i tun11 -c 10.2.3.4
# nothing happens, but the peers are now connected

[local]$ ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.
64 bytes from 192.168.0.2: icmp_seq=1 ttl=241 time=42.5 ms
64 bytes from 192.168.0.2: icmp_seq=2 ttl=241 time=41.3 ms
64 bytes from 192.168.0.2: icmp_seq=3 ttl=241 time=41.4 ms
64 bytes from 192.168.0.2: icmp_seq=4 ttl=241 time=41.0 ms

--- 192.168.0.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 41.047/41.599/42.588/0.621 ms

# let's try something more exciting now
[local]$ ssh waldner@192.168.0.2
waldner@192.168.0.2's password:
Linux remote 2.6.22-14-xen #1 SMP Fri Feb 29 16:20:01 GMT 2008 x86_64

Welcome to remote!

[remote]$
When a tunnel like the above is set up, all that can be seen from the outside is just a connection (TCP in this case) between the two peer simpletuns. The "real" data (ie, that exchanged by the high level applications - ping or ssh in the above example) is never exposed directly on the wire (although it IS sent in cleartext, see below). If you enable IP forwarding on a host that is running simpletun, and create the necessary routes on the other host, you can reach remote networks through the tunnel.
Also note that if the virtual interfaces involved are of the tap kind, it is possible to transparently bridge two geographically distant ethernet LANs, so that the devices think that they are all on the same layer 2 network. To do this, it's necessary to bridge, on the gateways (ie, the hosts that run simpletun or another tunneling software that uses tap interfaces), the local LAN interface and the virtual tap interface together. This way, frames received from the LAN are also sent to the tap interface (because of the bridge), where the tunneling application reads them and sends them to the remote peer; there, another bridge will ensure that frames so received are forwarded to the remote LAN. The same thing will happen in the opposite direction. Since we are passing ethernet frames between the two LANs, the two LANs are effectively bridged together. This means that you can have 10 machines in London (for instance) and 50 in Berlin, and you can create a 60-computer ethernet network using addresses from the 192.168.1.0/24 subnet (or any subnet address you want, as long as it can accommodate at least 60 host addresses). However, do NOT use simpletun if you want to set up something like that!
Extensions and improvements
simpletun is very simple and simplistic, and can be extended in a number of ways. First of all, new ways of connecting to the peer can be added. For example, UDP connectivity could be implemented, or, if you're brave, ICMP (perhaps also over IPv6). Second, data is currently passed in cleartext over the network connection. But when the data is in the program's buffer it could be changed somehow before being transmitted, for example it could be encrypted (and similarly decrypted at the other end).
However, for the purpose of this tutorial, the limited version of the program should already give you an idea of how tunnelling using tun/tap works. While simpletun is a simple demonstration, this is the way many popular programs that use tun/tap interfaces work, like OpenVPN, vtun, or OpenSSH's VPN feature.
Finally, it's worth noting that if the tunnel connection is over TCP, we can have a situation where we're running the so-called "tcp over tcp"; for more information see "Why tcp over tcp is a bad idea". Note that applications like OpenVPN use UDP by default for this very reason, and using TCP is well-known for reducing performance (although in some cases it's the only option).
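For comparison, here is what the relay step might look like over a datagram transport such as UDP. This is not part of simpletun; relay_to_datagram() and relay_from_datagram() are hypothetical names, and sock_fd is assumed to be an already connected datagram socket. Since each datagram carries exactly one packet or frame, no length prefix is needed; on the other hand, datagrams can be lost or reordered, which is acceptable here because the tunneled protocols already cope with that.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

#define RELAY_BUFSIZE 2000   /* comfortably above a 1500-byte MTU */

/* Read one packet from the tun/tap descriptor and send it to the peer as
 * a single datagram. Returns the number of bytes relayed, 0 on EOF, or
 * -1 on error. */
ssize_t relay_to_datagram(int tap_fd, int sock_fd)
{
  unsigned char buf[RELAY_BUFSIZE];
  ssize_t n, sent;

  assert(tap_fd >= 0 && sock_fd >= 0);

  do {
    n = read(tap_fd, buf, sizeof(buf));
  } while (n < 0 && errno == EINTR);
  if (n <= 0) return n;

  do {
    sent = send(sock_fd, buf, (size_t)n, 0);
  } while (sent < 0 && errno == EINTR);
  return sent;
}

/* Receive one datagram from the peer and write it to the tun/tap
 * descriptor as a single packet. Returns the number of bytes relayed,
 * or -1 on error. */
ssize_t relay_from_datagram(int sock_fd, int tap_fd)
{
  unsigned char buf[RELAY_BUFSIZE];
  ssize_t n, written;

  assert(tap_fd >= 0 && sock_fd >= 0);

  do {
    n = recv(sock_fd, buf, sizeof(buf), 0);
  } while (n < 0 && errno == EINTR);
  if (n <= 0) return n;

  do {
    written = write(tap_fd, buf, (size_t)n);
  } while (written < 0 && errno == EINTR);
  return written;
}
```

In a select() loop like the one shown earlier, these two functions would take the place of the length-prefixed read and write branches.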

Dear Waldner!
I am facing following problem while compiling.
/usr/include/linux/if.h:165: error: field 'ifru_addr' has incomplete type
/usr/include/linux/if.h:166: error: field 'ifru_dstaddr' has incomplete type
/usr/include/linux/if.h:167: error: field 'ifru_broadaddr' has incomplete type
/usr/include/linux/if.h:168: error: field 'ifru_netmask' has incomplete type
/usr/include/linux/if.h:169: error: field 'ifru_hwaddr' has incomplete type
Please let me know if you know the error, and how can I get rid of it.
Hi saime,
try replacing
#include <linux/if.h>
with
#include <net/if.h>
and see if that helps.
Thanx Waldner,
It works, though I could also corrected error while using #include before I got this reply.
But solution you proposed is also work perfectly.
I have one more question, if you can help me with it. Actually I wanna read these packets from java. Is it any possibility to read TUN packets via java. As I can see from tun_alloc call I am getting descriptor but it is C-compatible.
Other solution for me would be call this function from java via JINI and then write it into one file and then again read it from Java.
But is there any easier solution.
Thanks in advance.
Hi saime,
I'm not really able to help you much with Java. I think there are some libraries floating around to access POSIX syscalls from Java (never tried myself, so I can't really speak); for example this http://www.bmsi.com/java/posix/index.html seems to be reasonably recent; others are old and unmaintained, you can still find them if you google a bit.
If such a library is not good enough for you, then I'm afraid JNI is your best bet.
Using TAP with Java is indirectly possible, look at the source of this project:
http://p2pvpn.org/
P2PVPN is a Java program and uses BitTorrent-Trackers for finding other VPN-clients (only invited users get the keys to join such a VPN). Cool idea, never tested this program.
Hi Waldner,
I was wondering about the format of the bytes.
Now I am able to setup tun, collect the data via C program and send it to the named pipe and retrieve it from java code at the other end of the pipe. But I was wondering about the format. Do I need to change the format? and besides that, how can I represent this data in the way that the wireshark or other network analyzers display, Any hint?
thanks in advance.
br,
Saime
Hi Saime,
Wireshark wants its data in pcap format. This is not just a raw dump of the data; it contains additional information. The format is documented for example here in the Wireshark wiki.
Of course, a pcap file is also what you get by default if you run Wireshark directly on the interface while traffic is flowing, and then save it (File -> Save...).
In case you want to write the pcap file yourself, you can use libpcap, which is the low-level library upon which tcpdump, Wireshark and other network analyzers are built. There are some Java wrappers to use libpcap from Java: see for example here and here. Both have tutorial and examples, so you should be able to write your own Java code to capture the packets and save them in pcap format.
I am trying to do something similar. Can you describe how you were able to create a tun and retrieve data to java code.
I didn't. The code in the article is C code. To work with tun devices with java, you may check out these urls:
http://www.bmsi.com/java/posix/index.html To use POSIX system calls with Java
http://p2pvpn.org/ A VPN using tun/tap devices written in Java
http://www.koders.com/java/fid6C0CBC76450F649DE9081FE8596BB50FEC023D88.aspx Sample Java code from the P2PVPN application.
I am attempting to use java native code to have java code attach to a tun device that was set up with the above C code. Can you describe how you were able to create a tun and retrieve data to java code.
Sorry Waldner, the question was meant for Saime.
Hi Waldner,
Jpcap also looks like the better option to me, not for reading from the C program's file but for reading packets directly from the network or from a tun interface. Reading directly from the network would be better, though.
I was wondering about something: how can I write exactly the bytes that I read from the tun interface? A ping packet should be 84 bytes, but the entire 1500-character buffer is filled. Shouldn't only 84 characters of the buffer be filled?
My second problem is with the named pipe, though I don't know if you're familiar with it: the Java code keeps on reading even though I have stopped the C program that writes into the pipe. (This seems weird to me because, as far as I know, it should be a FIFO.)
Thanks in advance, and thanks for the Jpcap library info.
Hi Saime,
I'm not sure what you mean by "reading directly from a network". Isn't that what you do when you use Wireshark or your own pcap/Jpcap program to sniff traffic passing through the interface? That should do what you want.
As for the buffer filling, if you're talking about writing packets to the pcap output file, as documented in the Wireshark wiki page you only have to write the actual packet bytes in the file, with no padding; but you should also write the correct packet length in the packet header (again refer to that page for the details). However, if you use a library like Jpcap, it should have functions that automatically take care of writing the data to the output file in the correct format; for Jpcap, see for example the writePacket method in the JpcapWriter class. See also the good tutorial on the Jpcap web site.
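For those who would rather write the file by hand from C, here is a minimal sketch of the classic pcap layout as I understand it from the format documentation: one 24-byte global header, then a 16-byte record header in front of each packet. The function names are made up for this example, and it assumes the tun/tap device was opened with IFF_NO_PI so that the buffer holds only the packet itself. The point to note is that the length written is the value returned by read(), not the size of the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PCAP_MAGIC        0xa1b2c3d4u  /* classic pcap, microsecond timestamps */
#define LINKTYPE_ETHERNET 1u           /* frames read from a tap device */
#define LINKTYPE_RAW      101u         /* raw IP packets read from a tun device */

/* Write the 24-byte global header once, at the start of the file.
   Fields are written in host byte order; readers detect it from the magic. */
int pcap_write_global_header(FILE *f, uint32_t linktype, uint32_t snaplen)
{
    struct {
        uint32_t magic;
        uint16_t version_major, version_minor;
        int32_t  thiszone;
        uint32_t sigfigs, snaplen, network;
    } h = { PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype };

    assert(f != NULL);
    return fwrite(&h, sizeof(h), 1, f) == 1 ? 0 : -1;
}

/* Write one record: a 16-byte header followed by exactly len packet bytes.
   len must be the byte count returned by read(), not sizeof(buffer). */
int pcap_write_packet(FILE *f, const void *pkt, uint32_t len,
                      uint32_t ts_sec, uint32_t ts_usec)
{
    /* ts_sec, ts_usec, incl_len, orig_len (nothing is truncated here) */
    uint32_t rec[4] = { ts_sec, ts_usec, len, len };

    assert(f != NULL && pkt != NULL);
    if (fwrite(rec, sizeof(rec), 1, f) != 1)
        return -1;
    return fwrite(pkt, 1, len, f) == len ? 0 : -1;
}
```

So for an 84-byte ping read from a tun device into a 1500-byte buffer, you would call pcap_write_packet(f, buffer, nread, ...) with nread equal to 84, and the file grows by 16 + 84 bytes.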
Hello,
regarding IPv6, there is a difference between distributions or kernel configurations:
Start simpletun on the first computer:
./simpletun -i tun0 -s &
ifconfig tun0 up
ifconfig tun0 add fe80::1234/64
And on the second computer:
./simpletun -i tun0 -c computer1 &
ifconfig tun0 up
ifconfig tun0 add fe80::5678/64
ping6 -I tun0 fe80::1234
I have run this on my 2 Debian PCs. There it worked as expected: The pings were answered through the tun device.
But then I ran this on 2 SUSE PCs as well. On the SUSE PCs the IPv6 pings were transmitted through the tun device,
as could be seen with Wireshark. But the kernel otherwise ignored the IPv6 pings!
So there must be a difference in kernel configuration between Debian and SUSE that is related to this.
Does anyone have an idea which parameter or compile-time configuration prevents the kernel from handling IPv6 packets received on the tun device (but not on eth)?
Greetings
Juergen
Hi Juergen,
I don't have a SUSE system to test. For what it's worth, it works for me on Debian, Gentoo and Arch Linux. The only thing I can think of is to check that there are no firewall rules on the SUSE boxes that block the ICMPv6 packets in either direction (check with ip6tables-save).
I'm assuming that the basic simpletun IPv4 connection is successful? If the other end's firewall is dropping IPv4 packets, it may look like the peers are connected whereas they are not. Try running simpletun with -d to get debug messages and make sure that the two peers are able to connect successfully.
Hello Waldner,
with the -d option and also with wireshark I saw that the IPv6 and the IPv4 packets were transmitted through the tun device tunnel.
The SUSE kernel responded to the IPv4 pings but not to the IPv6 pings.
But the SUSE kernel responded to IPv6 pings transmitted through the ethernet device.
I have also tried with an embedded linux 2.6.15 kernel without firewall. There I saw the same behavior.
Therefore I don't believe that this is caused by a firewall.
Greetings
Juergen
Hi Juergen,
ok, fair enough. I'm a bit out of ideas here; just out of curiosity, have you tried to
1) use tap instead of tun and
2) use non-link-local addresses (eg something in the 2000::/3 range)?
Hello Waldner,
1: Not yet
2: I have tried with an fc00:: prefix.
Now I have found this old bug report for FreeBSD, with a patch for the tun device:
http://www.mail-archive.com/freebsd-net@freebsd.org/msg05969.html
So I will take a look into the kernel sources.
Greetings
Juergen
Hello,
I have just found in the linux git history that there was a bug in the tun driver:
commit f271b2cc78f09c93ccd00a2056d3237134bf994c
Author: Max Krasnyansky
Date: Mon Jul 14 22:18:19 2008 -0700
tun: Fix/rewrite packet filtering logic
Please see the following thread to get some context on this
http://marc.info/?l=linux-netdev&m=121564433018903&w=2
Basically the issue is that current multi-cast filtering stuff in
the TUN/TAP driver is seriously broken.
Original patch went in without proper review and ACK. It was broken and
confusing to start with and subsequent patches broke it completely.
To give you an idea of what's broken here are some of the issues:
- Very confusing comments throughout the code that imply that the
character device is a network interface in its own right, and that packets
are passed between the two nics. Which is completely wrong.
- Wrong set of ioctls is used for setting up filters. They look like
shortcuts for manipulating state of the tun/tap network interface but
in reality manipulate the state of the TX filter.
- ioctls that were originally used for setting address of the the TX filter
got "fixed" and now set the address of the network interface itself. Which
made filter totaly useless.
- Filtering is done too late. Instead of filtering early on, to avoid
unnecessary wakeups, filtering is done in the read() call.
The list goes on and on :)
So I have to replace the kernel.
Greetings
Juergen
Juergen,
I posted a message describing a similar problem, but wanted to reply directly in hopes this reaches you. Posted below:
"I'm using an embedded build of kernel 2.6.35. My application sits between the tun and a serial port, mostly just passing IPv6 packets back and forth. When I run a ping6, I see the echo reply and write it to the tun; the kernel increments bytes rx'd on the tun etc., however ping6 *usually* doesn't get the data. I say usually, because there is one exception: the first ping6 works; all subsequent pings fail. Anyone seen this, or have a guess as to what the issue would be?
Juergen - Were you able to resolve the issue you were seeing? "
Thanks in advance - Matthew
Hello Matthew,
yes, I solved the problem by backporting the bugfix to the older kernel.
But kernel 2.6.35 should already contain this bugfix.
I guess that your problem could be related to the configuration of the tun/tap device
-> verify with ifconfig
Or something is going wrong with IPv6 neighbor discovery or something related.
-> verify with tcpdump
Greetings
Juergen
Excellent tutorial.
Has anyone come across this problem, though? All the above works perfectly for me on Debian Etch 2.6.18-6-686, but on Debian Lenny 2.6.31 write() always returns -1.
E.g. I can open a tap interface fine, do the ioctl without error, and I can even read packets from the tap interface; however, write() always returns -1 with errno 22 (Invalid argument).
On Mac OS X there is a kernel extension to create virtual network interfaces: TunTap.
http://tuntaposx.sourceforge.net
Another nice multipurpose network tool: "socat". It's possible to create the simpletun-test with this tool. socat is a relay for bidirectional data transfer between two independent data channels. It has many options... :-)
http://www.dest-unreach.org/socat/
http://www.dest-unreach.org/socat/doc/socat.html#EXAMPLES
Yes, in fact I use socat all the time, and setting up a simpletun-like connection is a matter of one line of code on the client and one on the server (with the additional benefit that the server can be made forking so it doesn't terminate when the client disconnects, SSL can be added to the mix etc.). Just as an example with SSL and a self-signed cert (10.100.1.131:4444 is the address and port where the server is listening):
server# socat TUN:172.16.44.1/24,iff-up,tun-name=ssl0,tun-type=tun \
  OPENSSL-LISTEN:4444,pf=ip4,cipher=HIGH,method=TLSv1,verify=1,cert=selfsign.pem,key=selfsign.key,cafile=selfsign.pem,fork,su=nobody
client# socat TUN:172.16.44.2/24,iff-up,tun-name=ssl0,tun-type=tun \
  OPENSSL:10.100.1.131:4444,cipher=HIGH,method=TLSv1,verify=1,cert=selfsign.pem,key=selfsign.key,cafile=selfsign.pem,su=nobody
That's pretty much it, one may of course want to add routes etc.
Of course, as much as I like it, using socat this way doesn't give the same insight on the internal workings of a tun/tap interface.
Hi, Waldner,
This is an excellent tutorial!
I have a question about using tap interfaces. I'm trying to set up a tap interface on a host, where one end will be the network interface for a virtual machine. On the other side of the interface, I want to read all the outgoing VM traffic and send it to another host (or hosts) which also have VMs and taps running.
I've had limited success so far: I successfully capture the outgoing VM network traffic. However, it seems as though the host side of the tap interface is handling the traffic as well, and will even respond to the VM via the tap interface. I don't want it to do this! How do I disable the host's network stack for the tap interface?
Thanks,
Dan
Hi Dan,
how did you set it up? You didn't provide much detail, so these are just wild guesses:
Many virtualization platforms like KVM/virt-manager set up things so the communication between the VM and the host is via a tap interface. In those cases, the virtualization code attaches to the tap interface to relay the packets from/to the VM, so the host sees them as incoming/outgoing on the tap interface. If you're also trying to attach to the tap interface at the same time, I wouldn't be surprised (although I haven't tried) to see a race condition between your code and the virtualization code, where you step on each other's toes and end up reading only part of the data each, depending on who issues the read() call first. In this situation, packets that are not caught by your code are obviously caught by the virtualization code, which sends them to the host where they are processed.
If your code is the only entity attached to the interface and there is no contention, then you are somehow sending the packets to the tap interface thus making them visible to the host (as opposed to blocking them and sending them somewhere else, eg to another host via the network).
Finally, consider whether what you're trying to do could not be accomplished using the standard available tools (iproute2, iptables) without the need to write ad-hoc code.
Ok, I actually tried my first guess above, and if I try to connect to a tap interface that is in use (by a KVM virtual machine) I get ioctl(TUNSETIFF): Device or resource busy, so that hypothesis can be ruled out.
Still, it's difficult to tell what your problem may be without more information.
I have a tun which I use to forward packets to some other nodes (there is not a tunnel among them). The packets sent through the tun interface are received by the service, but I cannot receive the responses on the tun. Do you have any solution?
You provide very little information, so it's difficult to tell. Check that the node running the service has a route back to the tun node. Also, what does "there is not a tunnel" mean?
I am a newbie. When I run the code copied from simpletun.c, I get this error. Can I have some hints? Thanks.
if( (err = ioctl(fd, TUNSETIFF, (void *) &ifr)) < 0 ) {
  /* report the error before close(), which may overwrite errno */
  printf("error: %s \n", strerror(errno));
  close(fd);
  return err;
}
error: Operation not permitted
To run that code, you either have to be root, or the interface must already exist and be owned by the user you're running as.
I figured that out right after posting the comment, thanks!
Could you advise? I get nothing from:
char buffer[];
nread = read(tun_fd,buffer,sizeof(buffer));
printf("Read %d bytes from device %s\n", nread, tun_name);
printf("Buffer Length: %d", sizeof(buffer));
printf("DATA: %s\n", buffer);
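I can't tell from the snippet alone what goes wrong, but a few things stand out: the array needs an explicit size (for example unsigned char buffer[2000];), sizeof yields a size_t and should be printed with %zu, and packet data is binary, so printing it with %s stops at the first zero byte. If the device was opened without IFF_NO_PI, the data starts with a 4-byte header whose first two bytes are normally zero, in which case %s prints nothing at all. A minimal sketch of a hex dump that can be used instead (hex_dump is a name invented for this example):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format the first n bytes of buf as "xx xx xx ..." into out.
   Stops early if out is too small. Returns the number of characters
   written, not counting the terminating NUL. */
size_t hex_dump(const unsigned char *buf, size_t n, char *out, size_t outsz)
{
    size_t used = 0;

    assert(buf != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    /* each step needs at most 3 characters plus the NUL */
    for (size_t i = 0; i < n && used + 4 <= outsz; i++) {
        used += (size_t)snprintf(out + used, outsz - used,
                                 i ? " %02x" : "%02x", buf[i]);
    }
    return used;
}
```

After nread = read(tun_fd, buffer, sizeof(buffer)); one would call hex_dump(buffer, nread, text, sizeof(text)) and print text with %s. For an IPv4 packet read without IFF_NO_PI, the dump should start with 00 00 08 00 followed by the IP header (45 ...).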
Thank you, this is a great tutorial.
I came across it while trying to understand KVM networking. It helps a lot.
Very nice tutorial - Thanks
This is a noob question, but can you not use sysfs instead of ioctl to write to the tun/tap devices?
Strictly speaking, to write you use write(); ioctl() is used to set certain operating parameters on the interface. As far as I know, you cannot set those parameters via sysfs/procfs, but new information is always welcome.
It seems I spoke too soon. There are indeed a number of sysfs entries for a tun/tap interface (and for any interface, for that matter). For example:
however, only a few of them are writable. Some of those can also be set using iproute2 (mtu, tx_queue_len). So I think it can still be said that ioctl() calls are needed.
I've had no trouble opening the "/dev/net/tun" device, and creating the tap device. I don't persist the interface, since I only want the interface running when my program is running, and I'm using libevent to process the network packets (as opposed to using select). My purpose for this is basically to take a machine with two network ports (i.e. eth0, eth1), and virtualize them. By setting the physical ports up in promiscuous mode, and clearing all routes and IP addresses (ifconfig flush ethX), and instead assigning those routes and IP addresses to the tapX devices, then I have a situation where I can read every packet coming in the physical interface (using RAW sockets), write them to the TAP interface, and Vice versa. This way, the kernel will drop all the packets from the physical interfaces (after I get them in the raw socket), and I can firewall/view/modify all packets coming in and out of the system in user space. This is just background info, so you can understand where I'm coming from. I wonder if there are better, more standard ways of doing this, but that's not my real question here.
I use the following code to create my tap interface:
int tuntap_init(char *dev, int tun_or_tap) {
  struct ifreq ifr;
  int tuntap_fd, err;
  char *tundev = "/dev/net/tun";
  if((tuntap_fd = open(tundev, O_RDWR)) < 0) {
    perror("opening /dev/net/tun");
    return tuntap_fd;
  }
  memset(&ifr, 0, sizeof(ifr));
  if(tun_or_tap) { //TAP
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
  }
  else { //TUN
    ifr.ifr_flags = IFF_TUN;
  }
  /*
    If a desired name is given, try that one
    otherwise, a default name will be assigned
  */
  if(*dev) {
    strncpy(ifr.ifr_name, dev, IFNAMSIZ);
  }
  /*
    Setup TUN/TAP device
  */
  if((err = ioctl(tuntap_fd, TUNSETIFF, (void *)&ifr)) < 0) {
    perror("ioctl(TUNSETIFF)");
    return err;
  }
}
So, the first time I run this, it works great, the tap device is created, I can set up the interfaces the way I want. The problem I have is that once I exit this program that created the tap device (ctrl-c or kill), I cannot ever get it to run again successfully without rebooting. After killing the program, the tap device does disappear from the network devices (as seen by ifconfig) as expected. Running the program a second time, however, generates the following error (note, this program creates tap0 and tap1 devices):
sudo ./taptest -1 eth0 -2 eth1
ioctl(TUNSETIFF): Invalid argument
Once I reboot, this problem goes away, and it works again 1 time. I've tried removing the tun kernel module and re-adding it (rmmod tun...modprobe tun), without any success.
What am I doing wrong?
Thanks!
Your code is a bit odd. How are you using tuntap_init(), and in particular, its return value? As written, after the function has been called, the program has no way to read/write data from/to the tun interface because no file descriptor is returned. Although it's not certain that this is related to your problem, I would start by adding a
return tuntap_fd;
at the end of your code (and making use of it in the caller).
Then, you may want to copy the interface name into the provided string, so the caller can see which name was chosen by the kernel (if it was asked to do so):
strcpy(dev, ifr.ifr_name);
The caller must reserve enough space in the buffer pointed to by "dev".
Also, depending on what you do afterwards, you may want to close(tuntap_fd) if the ioctl() returns error.
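Putting those suggestions together, a minimal sketch could look like the following. It is not the poster's code: the names tuntap_fill_ifreq and tuntap_open are invented here, IFF_NO_PI is set for both device types (the posted code sets it only for tap), and the name is copied with IFNAMSIZ - 1 so that it stays NUL-terminated. Opening the device still needs the appropriate privileges.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/if_tun.h>

/* Prepare the request for TUNSETIFF. No system calls are made here. */
static void tuntap_fill_ifreq(struct ifreq *ifr, const char *dev, int want_tap)
{
    assert(ifr != NULL);
    memset(ifr, 0, sizeof(*ifr));
    ifr->ifr_flags = (want_tap ? IFF_TAP : IFF_TUN) | IFF_NO_PI;
    if (dev != NULL && *dev != '\0') {
        /* the struct was zeroed, so the name stays NUL-terminated */
        strncpy(ifr->ifr_name, dev, IFNAMSIZ - 1);
    }
}

/* Returns a descriptor attached to the interface, or -1 on error.
   dev must point to at least IFNAMSIZ bytes; on success it receives
   the interface name actually chosen by the kernel. */
int tuntap_open(char *dev, int want_tap)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);

    if (fd < 0) {
        perror("opening /dev/net/tun");
        return -1;
    }
    tuntap_fill_ifreq(&ifr, dev, want_tap);
    if (ioctl(fd, TUNSETIFF, (void *)&ifr) < 0) {
        perror("ioctl(TUNSETIFF)");
        close(fd);              /* don't leak the descriptor on failure */
        return -1;
    }
    if (dev != NULL) {
        strcpy(dev, ifr.ifr_name);
    }
    return fd;                  /* the caller reads and writes through this */
}
```

The caller would do something like char name[IFNAMSIZ] = "tap0"; int fd = tuntap_open(name, 1); and then use fd with read(), write() or an event loop.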
Sorry, somehow the rest of my code was truncated:
  /*
    If the system assigns a different name,
    copy the name back to the dev name buffer
  */
  strcpy(dev, ifr.ifr_name);
  return tuntap_fd;
}
So - a little more investigation shows that iproute2 also has the same error:
#ip tuntap add tap0 mode tap
ioctl(TUNSETIFF): Invalid argument
Finally, I was able to get the program working again if:
ifconfig tap0 down
ifconfig tap1 down
rmmod -f tun
modprobe tun
Then it works again...
I discovered that I can create the tap devices using iproute2, and then connect to them later on in my program. Then I simply leave the tap devices in place, and they persist after my program exits. Would this be a better way of doing this?
Thanks.
Yes, if you have a version of iproute2 recent enough, you can create interfaces with it. "Better" is subjective anyway, and depends on what you need to do. If you plan to use existing tools to connect to the tap interface, then it's probably quicker and easier to also use existing tools to create it (eg iproute2, tunctl, openvpn can all do it), so you don't have to bother with writing code yourself. If instead you need to do something special or specific to your task that cannot be accomplished with existing tools, then of course writing your own code (C or otherwise) is the way to go. Another reason for writing your own code is to learn and understand better how things work.
I'm using Debian Squeeze btw
Hi Waldner,
This is a great tutorial given the rather sketchy txt file that comes with the tun/tap interface, and it has helped me get a tap interface up and running. I am trying to forward Ethernet frames to an external API which cannot see the Linux protocol stack. I am nearly there but wondered if I am missing something. When I read in Ethernet frames they do not seem to conform to spec. I see no preamble. I am assuming the interface is filtering that out, and yet the source and dest MAC addresses seem to be in the wrong place. As "read" is a blocking call, when a packet arrives at my tap interface, which is set up as instructed, I wait for the bytes read, for example a simple ICMP ping. I see the correct number of bytes. I dump the packet buffer to a txt file and expect to at least see a length field at the specified place. I can see the TCP header starting with the version but it seems to be in the wrong place. Am I missing something? Is there something I am seriously overlooking here? Any help would be really appreciated as I am losing sleep over this.
Best Regards
Rob
Hi Rob,
Difficult to tell without seeing the real thing, but in my experience, ethernet frames you read from the tap interface do not have a preamble, if by that you mean the "10101010" etc. pattern used at the physical level in real ethernet networks to synchronize with the start of the frame. The first byte of the frame you read from the tap interface is the first byte of the destination MAC, and the other fields follow.
Also if the frames you're reading are coming from the external application, make sure that they are correctly formed, that they are of type ethernet II/DIX, and also watch out for added fields like VLAN tags etc. Ah and of course if you can sniff the traffic with Wireshark or tcpdump, that will also be a great help. Finally, make sure that fields longer than one byte are using the correct endianness.
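To make the layout concrete, here is a small sketch of how the first 14 bytes of a frame read from a tap descriptor can be picked apart, assuming the device was opened with IFF_NO_PI and the frame carries no VLAN tag. The struct and function names are invented for this example. The type/length field is big-endian on the wire, so it is assembled byte by byte rather than cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_BYTES 6
#define ETH_HDR_BYTES  14   /* 6 (destination) + 6 (source) + 2 (type/length) */

struct eth_info {
    uint8_t  dst[ETH_ADDR_BYTES];
    uint8_t  src[ETH_ADDR_BYTES];
    uint16_t ethertype;     /* in host byte order; values below 0x0600
                               are an 802.3 length, not an EtherType */
};

/* Returns 0 on success, -1 if the frame is too short to hold a header. */
int parse_eth_header(const uint8_t *frame, size_t len, struct eth_info *out)
{
    assert(frame != NULL && out != NULL);
    if (len < ETH_HDR_BYTES)
        return -1;
    memcpy(out->dst, frame, ETH_ADDR_BYTES);                  /* bytes 0..5  */
    memcpy(out->src, frame + ETH_ADDR_BYTES, ETH_ADDR_BYTES); /* bytes 6..11 */
    out->ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    return 0;
}
```

For an ARP request, for instance, dst would be ff:ff:ff:ff:ff:ff and ethertype would come out as 0x0806.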
Hi Waldner, I worked out that the IFF_NO_PI flag was not set. I was getting confused since the source MAC address was always 33:33:00:00:00:16, but I quickly worked out what that was. My biggest problem now is the read function. I am trying to forward all Ethernet traffic to an external API of a programmable device. Older Ethernet frames are easy since they have a length field, but Ethernet II packets have a type field instead. I am assuming that, because the actual "read" call is executed in Linux kernel space, the kernel does not interrupt in the middle of a read, so when the read function returns, say, 200 as the number of bytes it has read, I can safely assume this is the entire frame. Or can I? If I can't, these are dangerous assumptions and the whole thing can fall over. But apart from that, the interface is working.
Glad you sorted it. I didn't think of the NO_PI flag in my previous reply, but yes, that's something that would give you extra stuff at the beginning, well spotted.
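For reference, the "extra stuff" is a 4-byte packet information header: 2 bytes of flags followed by 2 bytes of protocol (an EtherType value in network byte order). A minimal sketch of skipping it when the device was opened without IFF_NO_PI follows; skip_tun_pi is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TUN_PI_BYTES 4   /* 2 bytes of flags + 2 bytes of protocol */

/* Given nread bytes read from a tun/tap descriptor opened WITHOUT
   IFF_NO_PI, return a pointer to the packet or frame that follows the
   header, and store the protocol (host byte order) and remaining length.
   Returns NULL if fewer than 4 bytes were read. */
const uint8_t *skip_tun_pi(const uint8_t *buf, size_t nread,
                           uint16_t *proto, size_t *payload_len)
{
    assert(buf != NULL && proto != NULL && payload_len != NULL);
    if (nread < TUN_PI_BYTES)
        return NULL;
    *proto = (uint16_t)((buf[2] << 8) | buf[3]);
    *payload_len = nread - TUN_PI_BYTES;
    return buf + TUN_PI_BYTES;
}
```

With this in place, the destination MAC of a tap frame is found at the returned pointer rather than at the start of the buffer.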
Regarding read(), although I'm not 100% sure, I'm reasonably confident that when reading tun/tap devices you read a whole packet/frame at a time. It's not written anywhere explicitly, but I've never seen otherwise. This is consistent with write() (you must write a whole packet/frame at once otherwise the packet is discarded because it's invalid) and it's also sensible because otherwise you would be forced to peek inside what you've read to determine whether what you got is a complete packet/frame or only a part of it, which would rapidly become complicated to handle correctly.
Update: it also seems that other programs that use tun/tap devices make the same assumption.
Hi Waldner, I wonder if you know if Ethernet Frame CRC is handled by the NIC rather than the TAP interface? and if there's a way of specifying which NIC the TAP will attach to as I have multiple cards & want to just see & forward traffic from eth1 not eth0.
Best Regards
Rob
Hi Rob,
regarding the CRC: it seems it's not part of the frame you get when read()ing the tap descriptor. It's also easy to verify with a simple test (eg, ping): a standard ping under Linux gives you 98 bytes of data from the tap descriptor; of these, 84 are the IP+ICMP stuff, and you have 14 bytes left which must be the ethernet data. Since the source and destination MAC are 6 bytes each, the type/length is another 2 bytes, 6+6+2=14 and thus there's nothing else.
Regarding traffic: the tap interface doesn't attach to any NIC; it is an interface on its own. You attach to it to read its traffic. What you get when reading the tap interface is the traffic that the kernel decides has to be routed through the tap interface. So, you have to manage this at the kernel/routing level (iptables, iproute2). You want the kernel to send only the traffic you're interested in to the tap interface.
Hi Waldner, I've been using the Tap interface for a while now but seem to have hit a little speed bump.
My userland API, which sends out packets, uses 32-bit int values, so I adjusted the tap read to read 32-bit integers. For some reason, when I do a read, the value returned for the number of bytes read is always 8 bytes short. tcpdump on another machine shows this, so I never get replies to my ICMP ping. If I add 8 bytes to the number of bytes read, the ping packet is valid, so the data is there and it all works properly. However, this is a hack and I want to understand what's going on here.
I am assuming this is an issue with the Linux Read System call & 32bit integers.. have you seen this before? I could keep the rx buffer as char buffer but casting gets messy in C++ since the API takes a structure & a 32bit int pointer to data.
Sorry, I'm not sure I follow you here. What do you mean by adjusting the tap read to read 32-bit integers? The read() on the tap fd should read an entire packet or frame every time it's called. Not sure how or where 32 or 64-bit integers come in here.
What I mean is that originally I had declared the buffer uint8_t buffer[_MAX_MTU_SIZE]; and then called
bytes_read = read(_this->_tap_file_descriptor,
_this->_buffer,
sizeof(buffer));
but since the API I need to call requires packets to be passed as a 32-bit integer pointer and a length, I adjusted the code so I'm using
uint32_t buffer[(_MAX_MTU_SIZE/4)+1];
bytes_read = read(_this->_tap_file_descriptor,
_this->_buffer,
sizeof(buffer));
the read command uses a void* for the buffer so I was assuming I could use a uint32_t buffer..
but on a standard ping I am missing 8 bytes
on a ping -s 512
IP truncated-ip - 456 bytes missing! 192.168.0.77 > 192.168.0.76: ICMP echo request, id 2796, seq 557, length 520
Sorry for being dense. The only API I see you calling there is read(), and read() wants a void * pointer, so it doesn't matter whether it points to an array of n 1-byte elements or an array of n/4 4-byte elements. What's the real reason you're doing that?
Also you're using _this->_buffer but the name of the array is just "buffer" with no underscore. Is that meant to be the same variable?
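To illustrate the point about void *, here is a small sketch that reads into an array of 32-bit words. The helper names are invented, and it works on any descriptor (a pipe is enough to try it, no tap device needed). The size passed to read() and the value it returns are both byte counts, whatever the element type of the array; a word count, if the downstream API wants one, has to be derived separately and rounded up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>

/* Number of 32-bit words needed to hold nbytes bytes (rounds up). */
size_t words_for_bytes(size_t nbytes)
{
    return (nbytes + 3) / 4;
}

/* Read into a buffer of nwords 32-bit words. The return value is still
   the number of BYTES read, exactly as returned by read(). */
ssize_t read_into_words(int fd, uint32_t *buf, size_t nwords)
{
    assert(buf != NULL);
    return read(fd, buf, nwords * sizeof(uint32_t));
}
```

A 98-byte frame read this way still reports 98, and needs 25 words of storage (the last word is only half used).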
that's what I thought...
the reason is that when I make an Api Call the packet is a structure based on
int32_t* data;
int32_t length
..etc..etc..
so in C++ it saves trying to reinterpret cast a 32bit pointer onto an 8 bit Array although I can do that & it does work.. if the buffer is int32_t I can just say
buffer->data = &_test_buffer[0];
and be done with it
Rob,
I think you are running into an alignment problem, and it sounds like you are on a 64-bit machine. This would explain the missing 8 bytes.
You are better off using dynamic memory allocation, as the memory pointer will be properly aligned:
int32_t* data;
data = (int32_t *)malloc(_MAX_MTU_SIZE);
sorry please disregard the which NIC part.. need to engage brain before keyboard
Very good tutorial. Thank you
I am having UDP packet loss on the client host somewhere between client application -> client host kernel -> client-host-tun device. These packets are missing from my application that reads from the tun. My tun is configured like so:
tun1 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:192.168.13.5 P-t-P:192.168.13.5 Mask:255.255.0.0
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:6500 Metric:1
RX packets:8617887 errors:0 dropped:0 overruns:0 frame:0
TX packets:6127019 errors:0 dropped:20 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:551235232 (525.6 MiB) TX bytes:391798708 (373.6 MiB)
I am messing with the MTU and txqueuelen options.
My client application sends a burst of 64 byte UDP packets followed by a sync packet. I am using the sync packet to throttle the amount of outstanding packets in the previous burst. So in effect (or so it seems) I do not think I am over running any queues. I will send a burst of 20 packets, but only read say 12 or 15 (random amount each time) from the tun1 device. Where are they getting dropped?
I looked at /proc/net/snmp and I can see that all my expected UDP packets are logged here (no errors). Then I look at the "TX packets and dropped" counts from ifconfig. Usually I do not see any increase in ifconfig's reported dropped, but the TX packet counts do not match that found in /proc/net/snmp.
It seems the UDP packets are dropped at the tun, but not always logged as dropped..
Any ideas?
Thanks,
Steve
Perhaps it's not related at all, but this line:
inet addr:192.168.13.5 P-t-P:192.168.13.5
are you sure it's the way it's supposed to be?
Thanks for the response Waldner.
Very interesting. I am not entirely sure about this point-to-point config to be honest. What I am trying to do is configure the tun device to accept any packets headed to 192.168.x.x. For this I created tun using the famous tunctl like this:
tunctl -u sseeley -n -t tun1
ifconfig tun1 192.168.13.5/16 txqueuelen 50
This creates this funny p-t-p link to itself.
So I just tried:
ifconfig tun1 192.168.13.5/16 txqueuelen 50 pointopoint 192.168.13.13
I get the following:
tun1 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:192.168.13.5 P-t-P:192.168.13.13 Mask:255.255.0.0
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:64996 errors:0 dropped:0 overruns:0 frame:0
TX packets:370163 errors:0 dropped:59 overruns:0 carrier:0
collisions:0 txqueuelen:50
RX bytes:2169920 (2.0 MiB) TX bytes:39235816 (37.4 MiB)
But I still get random amounts of UDP packet loss. I've tried larger MTUs too.
I expected to be able to configure a tun device to accept this multicast of destination addresses. It seems to allow this, since I can send UDP packets to many listening servers on different hosts at the other end of my tunnel, but I get this random UDP packet loss before the traffic even arrives at my originating tun device. I even tried with no multicast (netmask == 255.255.255.255) and it is still the same.
I feel I am missing something fundamental. I do not think I should be getting UDP packet loss on the client host before packets arrive at tun. Am I wrong to think this?
Thanks,
Steve
Alright, it seems that the ifconfig output is misleading.
Ok, so how do you generate the traffic? For packets to be sent "out" the tun1 interface, they should have a destination IP address in the range 192.168.0.0/16 (except, of course, 192.168.13.5 itself). Alternatively, the host should have a route to whatever destination address is in the packet, and that route should point to the tun1 interface.
That way, if you attach an application to the descriptor corresponding to tun1, that application should receive the packets that the kernel routes out tun1.
So, how does your application connect to the tun1 interface descriptor, and how do you actually read the packets? Also, how does the application-level program generate the UDP datagrams?
Hi Waldner,
Thank you for the tutorial. It's very useful.
I'm having the following problem, and I'm hoping you could help me out. I'm creating a bridge, and then add to it two tap interfaces. No physical interface is added to the bridge. These are the commands:
brctl addbr test
ip tuntap add mode tap tap0
ip tuntap add mode tap tap1
ifconfig test up
ifconfig tap0 up
ifconfig tap1 up
brctl addif test tap0
brctl addif test tap1
The problem is that the bridge doesn't seem to work correctly. I sent through tap0 some broadcast frames (WOL frames), and they didn't reach tap1. I was sending packets with:
etherwake -b -i tap0 00:00:00:00:00:00
The tshark command for tap0 showed the frames being sent with tap0, but another tshark for tap1 didn't show them.
Then I added to tap0 the IP address 192.168.10.1/24, and did:
arping 192.168.1.2
I saw ARP request broadcast frames on tap0, but they didn't reach tap1.
This is the output of ifconfig for test, tap0, and tap1 interfaces:
root@computer:~# ifconfig test
test Link encap:Ethernet HWaddr 02:07:b1:eb:2c:2a
inet6 addr: fe80::944b:d9ff:fe10:b240/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:3377 (3.2 KiB)
root@computer:~# ifconfig tap0
tap0 Link encap:Ethernet HWaddr 02:07:b1:eb:2c:2a
inet addr:192.168.10.0 Bcast:192.168.10.255 Mask:255.255.255.0
inet6 addr: fe80::7:b1ff:feeb:2c2a/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:278 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
root@computer:~# ifconfig tap1
tap1 Link encap:Ethernet HWaddr b2:ee:2c:f9:d5:0d
inet6 addr: fe80::b0ee:2cff:fef9:d50d/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:19 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
The output of the brctl:
root@computer:~# brctl show
bridge name bridge id STP enabled interfaces
pan0 8000.000000000000 no
test 8000.0207b1eb2c2a no tap0
tap1
root@computer:~# brctl showmacs test
port no mac addr is local? ageing timer
1 02:07:b1:eb:2c:2a yes 0.00
2 b2:ee:2c:f9:d5:0d yes 0.00
The output of route:
root@computer:/home/iszczesniak# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.2.0 0.0.0.0 255.255.255.0 U 2 0 0 eth1
192.168.10.0 0.0.0.0 255.255.255.0 U 0 0 0 tap0
169.254.0.0 0.0.0.0 255.255.0.0 U 1000 0 0 eth1
0.0.0.0 192.168.2.1 0.0.0.0 UG 0 0 0 eth1
What am I doing wrong?
Thanks,
Irek
Hi Irek,
I think you've got it backwards. The WOL frame is sent to tap0, which means it goes out "on the wire". There, you should have a program attached to the tap descriptor that catches the frame and does something with it. If there is no such program, the frame is dropped. It would be no different if you added, say, eth0 to the bridge and ran "etherwake -i eth0": the frame would be sent out the network card onto the LAN, and would not appear in the bridge. Incoming frames, on the other hand, appear in the bridge as appropriate.
So in other words, for your frame to show on tap1, you have to set up things so that the WOL frame is incoming to the bridge; which means, for example, connect a VM to tap0, and generate the WOL frame from the VM. This way, the bridge will see the WOL frame incoming from the tap0 "port", and will broadcast it (or whatever) as appropriate. Also, using "etherwake -i test" (where "test" is the bridge name) should work too.
Thanks for your response, Waldner.
In my original case, in which nothing is connected to tap0, why does the wire leaving tap0 go to the application that should be connected to the tap0 interface, and not to the bridge? It seems like tap0 has two wires going different ways. So is the configuration something like this?
bridge
| ^
v |
tap0 |
intf |
| |
v |
application
The configuration above seems like a good bet, because when the application is a VM, and when I send frames from the VM to the MAC address of tap0, the frames are received by the "test" bridge.
You are right: when I connect a VM to tap0 and send a broadcast frame from the VM, I can see it on tap1. I get it: the "wire" is connected to the NIC on the VM.
You are also right that when I send broadcast frames to my "test" bridge, the frames get to both tap0 and tap1. But why? Interfaces "tap0" and "test" look the same as reported by ifconfig, and yet they are different.
Moreover, why is the MAC address of the bridge the same as the MAC address of the interface last added to it?
Why does a bridge have a MAC address at all? It shouldn't have an address! After all, hardware switches don't have MAC addresses.
Yes, the situation is how you depict it in the ASCII diagram. When you do "etherwake -i tap0", the frame goes to the application at the bottom. Remember that the bottom part of the diagram represents the "wire" to the kernel. As I said, it would work exactly the same if you did "etherwake -i eth0", except in that case the wire is a real wire (the LAN cable).
Also, imagine that you had no bridge; doing "etherwake -i tap0" in that situation, and doing the same when tap0 is part of a bridge, must work the same way in both cases (same for eth0, etc). Adding the interface to a bridge should not generally change its semantics.
Strictly speaking, a bridge does not need to have a MAC address (in fact, I believe most low-end, cheap unmanaged bridges and switches have no MAC address). However, that way your Linux box would just sit there moving frames and nothing else, strictly acting as a bridge, no differently from a 15€ switch, which would be a bit of a waste. If you want to use the machine for something else, you want to assign an IP address to the bridge interface, so you will be able to receive and send local traffic in addition to the bridged traffic. Having an IP address means you also need to have a layer 2 address (ie, a MAC address here). Under Linux, the bridge interface automatically takes the lowest MAC address of all the enslaved interfaces (see this article for more information on the implications of this).
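To make the "lowest MAC address" rule concrete, here is a minimal sketch in C that picks the numerically lowest of a list of MAC addresses by comparing them octet by octet. The function name lowest_mac_index is made up for this illustration, and this is not the kernel's actual code; it only shows the comparison that the rule implies.

```c
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical helper: return the index of the numerically lowest MAC
 * address in macs[0..n-1], comparing from the most significant octet.
 * Returns -1 if the list is empty. */
int lowest_mac_index(const unsigned char macs[][MAC_LEN], size_t n)
{
    size_t best = 0;
    size_t i;

    if (n == 0)
        return -1;

    for (i = 1; i < n; i++) {
        /* memcmp compares bytes as unsigned char, which is what we want */
        if (memcmp(macs[i], macs[best], MAC_LEN) < 0)
            best = i;
    }
    return (int)best;
}
```

With the two ports from the brctl output earlier in this thread (b2:ee:2c:f9:d5:0d and 02:07:b1:eb:2c:2a), the function selects 02:07:b1:eb:2c:2a, which matches the bridge id 8000.0207b1eb2c2a shown by "brctl show".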
What I like is a minimal design, and a bridge doesn't need a MAC address. I understand that a Linux box might offer more than a regular switch, and for that you need a MAC address. But the services should be provided by a new tap interface added to the bridge. I believe that the bridge should not even be shown by ifconfig.
That's interesting that the bridge takes the lowest MAC address of the bridge interfaces. I wonder why this is needed.
Well, in that case I think you may want to look into something else, because as far as I am aware Linux doesn't offer a way to change how this currently works (short of modifying the kernel code, of course).
Waldner, thanks again for the tutorial and the comments. I think I need to experiment with the bridges more.
Hi Waldner,
Thank you for this detailed tutorial :)
I am trying to create a tunnel from my Linux server to my Windows client. The problem is that I am getting an invalid packet once I write it to the Windows tun driver. Wireshark says that it has a "Bogus IP header length (0, must be at least 20)". I believe it has something to do with the IFF_NO_PI flag. Is the 4-byte tun header needed when writing data to tun?
I have already done some research and found out that the Win32 tap driver doesn't prepend the 4-byte tun header. Is there anything I should do to a packet read from the Win32 tap driver before writing it to a Linux tun driver?
I'm afraid I have no experience with tun/tap under Windows. Have you tried to use IFF_NO_PI at both sides?
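For reference, here is a small sketch of what the 4-byte header looks like from the program's side when IFF_NO_PI is not set on Linux: two bytes of flags followed by two bytes of protocol (in network byte order), and only then the IP packet. The helper name skip_tun_pi is made up for this example. A packet that still carries the header (typically 00 00 08 00 for IPv4) but is parsed as raw IP would start with a zero byte, which would be consistent with the "header length 0" complaint above, although that is only a guess about this particular case.

```c
#include <stddef.h>
#include <stdint.h>

#define TUN_PI_LEN 4  /* 2 bytes of flags + 2 bytes of protocol */

/* Hypothetical helper: given a buffer read from a tun descriptor opened
 * WITHOUT IFF_NO_PI, return a pointer to the packet that follows the
 * 4-byte header. The protocol (e.g. 0x0800 for IPv4) is stored in *proto
 * and the remaining length in *payload_len.
 * Returns NULL if the buffer is too short to contain the header. */
const uint8_t *skip_tun_pi(const uint8_t *buf, size_t len,
                           uint16_t *proto, size_t *payload_len)
{
    if (len < TUN_PI_LEN)
        return NULL;

    /* the protocol field is in network byte order (big endian) */
    *proto = (uint16_t)((buf[2] << 8) | buf[3]);
    *payload_len = len - TUN_PI_LEN;
    return buf + TUN_PI_LEN;
}
```

A program forwarding packets to a peer that does not expect the header would send only the part starting at the returned pointer; alternatively, setting IFF_NO_PI when creating the interface avoids the header entirely.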
Thanks, the problem was solved by setting IFF_NO_PI on the Linux server.
I still have one problem. I have already created my server and client, which use UDP. The problem is that sometimes the data sent from the client to the server is received out of order. I expected that to happen because of the nature of UDP. I continued with my UDP server/client anyway, thinking that would be fine, since I also expected that once the data is written to the tun driver, the TCP layer would handle the error correction. And it did; it actually worked, but the upload speed of the client is very slow (1-2kb/s), while the download speed averages 1-5mbps.
Could that be the effect of the retransmission of packets? Do you have any ideas on how to make UDP more reliable?
Hello all,
I'm trying to get this simple tun program to work over UDP,
but I can't get it working.
What I did is to comment out the listen and accept part of TCP because I want to use UDP.
I also changed the sock fd to UDP style:
code:
-----------------------------------------------
if ( (sock_fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
perror("socket()");
exit(1);
}
-----------------------------------------------
I'm not familiar with the tap interface of Linux, so the while loop is a bit fuzzy for me.
When I compile and start the client and server, netstat shows a UDP socket on port 55555 marked as established.
However, when I send data from the client to the server, I see the debug output with the number of bytes going to the tap interface and to the network, but this data does not arrive at the server.
When I initiate a ping from the server to the client (not in debug mode), I see text popping up on the screen; this text is the data field of an ICMP packet that I try to send to the client side of the tunnel.
I'm doing something stupid but I just can't figure it out...
Thanks in advance!
If somebody has an example that would also be very handy!
Thanks
DoDo
I have to look into this, as I've never tried to implement the connection over UDP. I'm a bit busy ATM, but I hope I'll be able to look into it soon.
Hi! I tested this on my PC (Ubuntu 11.04) and it worked well: it created a tun interface and logged the incoming packets. But when I tried it on my router (OpenWRT - Backfire-rc4), nothing happened. I tried to figure it out, and it seems that ping -I tun0 sends the data, but the program is waiting at nread(). Does anybody have an idea why this isn't working?
Are you pinging a non-local IP address (ie, one that would cause the kernel to actually send the data out, as opposed to replying directly)?
No, I'm pinging a non-loopback address... (I know Unix-like systems don't actually send those out)
root@OpenWrt:~# ping -I tun0 192.168.2.22
PING 192.168.2.22 (192.168.2.22): 56 data bytes
^C
--- 192.168.2.22 ping statistics ---
13 packets transmitted, 0 packets received, 100% packet loss
./tuntap
Waiting for data in
ifconfig:
tun0 Link encap:Ethernet HWaddr 56:F3:A8:84:D3:42
inet addr:192.168.2.2 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:1 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Another guess...firewall rules blocking outgoing packets?
This is a test router with no rules...
iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Another strange issue is that if I create the tun interface with openvpn, I can't connect to it. I read about a similar error further up in the comments, but in a different situation...
openvpn --mktun --dev tun0 --dev-node /dev/net/tun --user root
ifconfig tun0 192.168.2.2 netmask 255.255.255.0
root@OpenWrt:/tmp# ifconfig tun0
tun0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:192.168.2.2 P-t-P:192.168.2.2 Mask:255.255.255.0
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
The error:
ioctl(TUNSETIFF): Invalid argument
Error connecting to tun/tap interface tun0!
I've noticed it doesn't seem to work with kernel 2.6.38-11
(This worked with 2.6.32-34 - I saw traffic via tshark)
host$ sudo openvpn --mktun --dev tun3 --user myself
Thu Oct 20 22:02:50 2011 TUN/TAP device tun3 opened
Thu Oct 20 22:02:50 2011 Persist state set to: ON
host$ sudo ip link set tun3 up
host$ sudo ip addr add 10.0.0.1/24 dev tun3
host$ ifconfig tun3
tun3 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:10.0.0.1 P-t-P:10.0.0.1 Mask:255.255.255.0
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
host$ ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
From 10.0.0.1 icmp_seq=1 Destination Host Unreachable
From 10.0.0.1 icmp_seq=2 Destination Host Unreachable
From 10.0.0.1 icmp_seq=3 Destination Host Unreachable
--- 10.0.0.2 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4018ms
I'm not sure what you are expecting that to do. If you don't connect any program to the tun interface to catch outgoing traffic, packets will be lost.
Apologies, I didn't state my observation clearly ...
When running the above (which basically matches the example you provided)
the behaviour is different depending on kernel versions.
under kernel 2.6.32
When you start pinging 10.0.0.2 you will see packets on tshark -i tun3
under kernel 2.6.38
when pinging 10.0.0.2 *nothing* appears on tshark -i tun3
Behaviour under 2.6.38 is different, for me at least:)
Is that what should happen?
(I should also thank you for the very useful tutorial btw)
Thanks, you're correct (I wasn't aware of this). It seems that behavior was changed between 2.6.35 and 2.6.36, specifically by this patch. Basically, earlier a tun device was always up and running from the moment it was created, while now it needs to see some process attached to the special fd to become up (ie, to get the "carrier").
If no process is attached, the test you are running (and which I too run in the article) now fails because the interface is down and packets are dropped and not "transmitted". I've put a note in the article to point out that what is described there works only with kernels < 2.6.36.
I understand the patch does the right thing, as having a process attached to the tun fd is the equivalent of having the "link up" for a tun interface; however, for the purposes of the article, this is a bit of a loss because the simple test described there to show how the interface works cannot be done anymore.
Thanks!
Then I'm out of ideas. Make sure you're checking the result of every system call in the code, there may be a failure somewhere.
The invalid flag argument was a mistake (I had different versions of this flag in the compiler and in the kernel). But it's still not working on the router :S
Hello,
you also need to replace the send() and recv() parts.
Here is an example, which uses IPv6 multicasts:
int make_socket (uint16_t port)
{
  int sock;
  struct sockaddr_in6 name;

  memset(&name,0,sizeof(name));

  /* Create the socket. */
  sock = socket (PF_INET6, SOCK_DGRAM, 0);
  if (sock < 0)
  {
    ERROR_OUTPUT("socket failed: %s",strerror(errno));
    exit (EXIT_FAILURE);
  }

  /* Give the socket a name. */
  name.sin6_family = AF_INET6;
  name.sin6_port = htons (port);
  if (bind (sock, (struct sockaddr *) &name, sizeof (name)) < 0)
  {
    perror ("bind");
    exit (EXIT_FAILURE);
  }

  return sock;
}

[...]

char* remote_ip= "FF02::1";
struct sockaddr_in6 destination_socket;

tunnel_fd = make_socket(port);
if(tunnel_fd < 0)
  exit(EXIT_FAILURE);

// initialize the destination socket, which is used to simulate
// the send path of the lower layer
memset(&destination_socket,0,sizeof(destination_socket));
destination_socket.sin6_family = AF_INET6;
retval = inet_pton(AF_INET6,remote_ip,&destination_socket.sin6_addr);
destination_socket.sin6_port = htons(port);

if(netdevicename)
{
  // set the IPv6 scope id, if the user has selected a netdevice
  struct ifreq netdevice;
  strncpy(netdevice.ifr_name,netdevicename,IFNAMSIZ);

  // read the interface index
  retval=ioctl(tunnel_fd,SIOCGIFINDEX,&netdevice);
  if(retval < 0)
  {
    ERROR_OUTPUT("Failed to read the interface index of %s: %s",
                 netdevicename,strerror(errno));
    exit(EXIT_FAILURE);
  }
  destination_socket.sin6_scope_id = netdevice.ifr_ifindex;
}

[...]

retval = sendto(tunnel_fd,telegram, telegram_len,0,
                (struct sockaddr*) &destination_socket,
                sizeof(destination_socket));
if(retval < 0)
{
  ERROR_OUTPUT("sendto failed: %s",strerror(errno));
  break; // leave the while loop
}

[...]

rx_telegram_len=recvfrom(tunnel_fd,rx_telegram, sizeof(rx_telegram),0,
                         (struct sockaddr *) &name, &size);
if(rx_telegram_len < 0)
{
  ERROR_OUTPUT("Failed to read datagram from lower layer: %s",
               strerror(errno));
  break; // leave the while loop
}

If you want to use IPv4, then create the socket as you have already posted. But you also need an IPv4 destination socket.
Greetings
Juergen
Oops,
the above was meant as a reply to the post of DoDo.
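For the IPv4 case mentioned at the end of that reply, a minimal sketch could look like the following. It assumes the UDP socket has already been created with socket(AF_INET, SOCK_DGRAM, 0) as in DoDo's post. The helper names make_ipv4_dest, udp_send and udp_recv are made up for this example and are not part of simpletun.

```c
#include <string.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill in an IPv4 destination address.
 * Returns 0 on success, -1 if ip is not a valid dotted-quad address. */
int make_ipv4_dest(const char *ip, uint16_t port, struct sockaddr_in *dst)
{
    memset(dst, 0, sizeof(*dst));
    dst->sin_family = AF_INET;
    dst->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &dst->sin_addr) == 1 ? 0 : -1;
}

/* Replacement for the write to the TCP socket: one datagram per call. */
ssize_t udp_send(int sock_fd, const void *buf, size_t len,
                 const struct sockaddr_in *dst)
{
    return sendto(sock_fd, buf, len, 0,
                  (const struct sockaddr *)dst, sizeof(*dst));
}

/* Replacement for the read from the TCP socket: returns one datagram
 * and stores the sender's address in *src. */
ssize_t udp_recv(int sock_fd, void *buf, size_t len,
                 struct sockaddr_in *src)
{
    socklen_t slen = sizeof(*src);
    return recvfrom(sock_fd, buf, len, 0, (struct sockaddr *)src, &slen);
}
```

The side that must receive first (the "server") still needs to bind() its socket to the chosen port, for example 55555, before calling udp_recv(); since UDP is connectionless, there is no listen() or accept() step.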
I've been trying to write ethernet frames to the kernel using a TAP interface, but when I create the tap interface, no /dev/tapX is ever created. If I create one using "mknod /dev/tap0 c 36 16" I get an error message saying "no such device or address". What's going on? How do I insert ethernet frames directly into the kernel? Where and how should the /dev/tapX device get created? It is not happening at all for me.
I'm not sure where you got the idea that /dev/tapX should be created. No device files exist under linux representing network interfaces, so it's perfectly normal that you don't see a /dev/tap0 device (in the same way that you don't see /dev/eth0, or whatever).
Well, I got it from the tun/tap txt file that describes how there are two userland application interfaces:
" - /dev/tunX - character device;
- tunX - virtual Point-to-Point interface.
Userland application can write IP frame to /dev/tunX
and kernel will receive this frame from tunX interface.
In the same time every frame that kernel writes to tunX
interface can be read by userland application from /dev/tunX
device."
I'm trying to write data to the kernel. Is there a way to do that? It seemed like from the txt file that there are two ways to write to it from a user level application, but I can only find one way (the tunX virtual point-to-point interface, which if I write to it, it goes "out on the wire" and not to the kernel).
Thanks for your response. I would appreciate any help you could provide.
It looks like you're looking at an old version of the documentation (ie for 2.4.x), because in more recent kernels that file does not mention /dev/tunX, see for example the document at http://www.kernel.org/doc/Documentation/networking/tuntap.txt.
Anyway, to write data to the kernel (although it's not very clear what you mean by that; you're probably trying to do something else, which you don't explain): briefly, as explained in the tutorial, you have to open /dev/net/tun to get a file descriptor which can then be used to send/receive packets to/from the kernel. It's all explained in the article, including sample C code.
Ah, I see, I was looking at an old version. I guess the problem I'm having is that when I open the file descriptor as described in the article and write some ethernet frames to it (using a tap interface), the kernel doesn't respond to the frames. For example, if the ethernet frames are ARP frames asking for the MAC address of the IP assigned to the tap0 interface and I write them to tap0, I can see them with tcpdump, but the kernel does not send ARP replies back to my user application, as I would expect...
Ok, never mind, I think I was just leaving out the IFF_NO_PI flag, because now it seems to be working. Thanks for your help.
I have a question regarding this tutorial and Android. I want to create a tun device on an Android tablet and send the layer 2 data to a Java program. Do you have a suggestion for how to interface with the tun device from an Android app?
Sorry no, I don't have any suggestion. I suppose that, since Android is Linux (well, kind of sort of), you may be able to connect to the tun interface as explained in the article or use existing tools, but I have not tried so this is all guessing.
Hi waldner,
Thank you for this tutorial. Let me first describe what I am trying to achieve. I'm capturing packets with pcap, decapsulating my already encapsulated packets and then pushing them back to a tap interface. These packets are ethernet packets, with proper header and checksum. The IP address in these packets is the same as that of the tap interface. So, I'm expecting that the tap interface will forward the packets up the stack. These packets do show up in tshark. But the tap interface drops the packets and does not forward them up the network stack. From your conversation with Irek, it seems that tap considers these packets as outgoing packets and sends them to the wire, instead of pushing them up the network stack. Is there a way that I can push the packets up the stack using the tap interface?
I'm not sure I understand what you're trying to do. What is your goal?
Thank you for this writeup. It was most helpful.
Hello, I have a problem routing traffic to two virtual interfaces I have created on my machine (CentOS 6).
Using tunctl, I created two virtual interfaces, tap1 and tap2.
Let's imagine I gave them two different addresses:
tap1: 10.1.1.1 net 255.255.255.0
tap2: 10.1.2.1 net 255.255.255.0
I'm receiving traffic on my real interface eth0: 192.168.1.23 net 255.255.255.0
I tried using brctl and iptables to send some traffic to tap1 and the rest to tap2. Unfortunately, I can't figure out how to do it.
An example of splitting might be: send all the ICMP packets to tap2 and everything else to tap1. Is it possible somehow? Can you help me with some instructions?
Regards
Not sure what you're trying to do with brctl...anyway, this is a routing issue and has nothing to do with tun/tap interfaces.
I think, if I understand you correctly, that you need policy routing rules to route traffic differently based on protocol. In your specific example of tap1 and tap2, you could create a second routing table where traffic is routed out tap2, then mark ICMP traffic with iptables, and finally add a routing rule which uses the alternate routing table for marked packets. You can find some theory at http://linux-ip.net/html/routing-tables.html and some examples at http://linux-ip.net/html/adv-multi-internet.html and generally googling for "linux policy routing" should turn up something.
Hmm, what I would like to do is monitor the traffic, with tcpdump for example.
And to do that, I surely need to give it an interface, don't I?
When you inspect traffic, it's better to reduce it somehow by selecting only what you most need to check. This way the process is less stressed overall when it receives high-rate traffic.
Because of that, I would like to route traffic coming in from eth0 to two virtual interfaces, which is where I would attach monitoring software such as tshark, tcpdump, etc.
What do you think?
Sorry, I don't understand what you're asking. If you're using tcpdump, you can specify a specific interface with -i or the special interface "any" which captures traffic on all interfaces (but not in promiscuous mode).
If you want to capture only a certain type of traffic, you can specify filters to tcpdump, for example
tcpdump -i eth0 icmp
will capture only ICMP traffic, or
tcpdump -i eth0 tcp port 80
will capture (hopefully) only HTTP traffic, etc. The manual page for tcpdump, or pcap-filter, provides all the details on the syntax to use for filtering.
Hope this answers your question.
Yes I know how tcpdump works :)
As you suggested, that is good only for specific cases; I want to be able to select traffic based, for instance, even on the packet length. That's why I need iptables (mangle / nat): it gives me more options for splitting the traffic across several interfaces.
Why do I want to do that? Because when too much traffic is coming in, tcpdump may not be able to handle it all. So, by launching several tcpdump processes, each with different traffic to monitor, less traffic is lost by the kernel.
tcpdump -i tun1
tcpdump -i tun2
(..)
tcpdump -i tunN
I can open more processes on more cores.
So in my opinion what I should do is split the traffic across several interfaces. That way, I can use the same interfaces for other applications (e.g. snort).
What I wanted to know from you is:
given that the traffic is coming from and going to the real interface eth0, I want to send a copy of it (selected by an iptables filter) to tun1, tun2, .. tunN.
I saw options of ip route 2 (tee, --gw, .. ) but they don't work. Do you know an easier way to suggest?
Sorry for disturbing,
Thanks a lot for your help
I don't know if there's an easy way to do what you want. Perhaps using the TEE target of iptables, but it's just a wild guess.
To be honest, if your traffic is coming from a SPAN port or equivalent (ie, not destined to the machine where tcpdump is running), I think iptables wouldn't even see it.