Some common networking operations in Perl

Posted by waldner on 11 December 2014, 10:09 am

A compilation of common operations that are often needed when writing networking code. Hopefully this saves some googling.

The examples will use Perl code, however from time to time the C data structures will be cited and the C terminology will be used. IPv4 and IPv6 will be covered.

Links to sample programs used in the examples: getaddrinfo.pl, getnameinfo.pl. A reasonably new version of Perl is required (in particular, 5.14 from Debian Wheezy is not new enough).

Socket addresses

In C, there's this notion of a "socket address", which is basically the combination of an IP address and a port (and other data, but address and port are the essential pieces of information). Here are the C data structures for IPv4 and IPv6:

/* IPv4 */
struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    in_port_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */
};

/* IPv6 */
struct sockaddr_in6 {
    sa_family_t     sin6_family;   /* AF_INET6 */
    in_port_t       sin6_port;     /* port number */
    uint32_t        sin6_flowinfo; /* IPv6 flow information */
    struct in6_addr sin6_addr;     /* IPv6 address */
    uint32_t        sin6_scope_id; /* Scope ID (new in 2.4) */
};

In C, lots of networking-related functions accept or return these structures (or, often, pointers to them). The connect() and the bind() functions are two notable examples.
In fact, in the C function prototypes the generic struct sockaddr type is used, which doesn't really exist in practice (although it has a definition); either a sockaddr_in or a sockaddr_in6 must be used, after casting it to sockaddr.

The actual IP addresses are themselves structs, which are defined as follows:

/* IPv4 */
struct in_addr {
    uint32_t       s_addr;         /* address in network byte order */
};

/* IPv6 */
struct in6_addr {
    unsigned char   s6_addr[16];   /* IPv6 address */
};

Then there's the more recent struct addrinfo, which includes a sockaddr member and, additionally, more data:

struct addrinfo {
    int              ai_flags;       // AI_PASSIVE, AI_CANONNAME, etc.
    int              ai_family;      // AF_INET, AF_INET6, AF_UNSPEC
    int              ai_socktype;    // SOCK_STREAM, SOCK_DGRAM
    int              ai_protocol;    // use 0 for "any"
    size_t           ai_addrlen;     // size of ai_addr in bytes
    struct sockaddr  *ai_addr;       // struct sockaddr_in or _in6
    char             *ai_canonname;  // full canonical hostname
    struct addrinfo  *ai_next;       // linked list, next node
};

This strcture is used by a class of newer, address-family-independent functions. In particular, code is expected to deal with linked lists of struct addrinfo, as indicated by the fact that the ai_next member points to the same data structure type.

From sockaddr to (host, port, ...) data and viceversa

If we have a Perl variable that represents a sockaddr_in or a sockaddr_in6 (for example as returned by recv()), we can extract the actual member data with code similar to the following:

# IPv4
use Socket qw(unpack_sockaddr_in);
my ($port, $addr4) = unpack_sockaddr_in($sockaddr4);

# IPv6
use Socket qw(unpack_sockaddr_in6);
my ($port, $addr6, $scopeid, $flowinfo) = unpack_sockaddr_in6($sockaddr6);

Note that $addr4 and $addr6 are still binary data; to get their textual representation a further step is needed (see below).

Conversely, if we have the individual fields of a sockaddr, we can pack it into a sockaddr variable as follows:

# IPv4
use Socket qw(pack_sockaddr_in);
$sockaddr4 = pack_sockaddr_in($port, $addr4);

# IPv6
use Socket qw(pack_sockaddr_in6);
$sockaddr6 = pack_sockaddr_in6($port, $addr6, [$scope_id, [$flowinfo]]);

Again, $addr4 and $addr6 must be the binary versions of the addresses, not their string representation.

As a convenience, it is possible to use the sockaddr_in() and sockaddr_in6() functions as shortcuts for both packing and unpacking:

# IPv4
use Socket qw(sockaddr_in);
my ($port, $addr4) = sockaddr_in($sockaddr4);
my $sockaddr4 = sockaddr_in($port, $addr4);

# IPv6
use Socket qw(sockaddr_in6);
my ($port, $addr6, $scopeid, $flowinfo) = sockaddr_in6($sockaddr6);
$sockaddr6 = sockaddr_in6($port, $addr6, [$scope_id, [$flowinfo]]);

From binary address to string representation and viceversa

If we have a binary IP address, we can use inet_ntop() and inet_pton() to convert it to a string (printable) representation:

# IPv4
use Socket qw(AF_INET inet_ntop);
$straddr4 = inet_ntop(AF_INET, $addr4);

# IPv6
use Socket qw(AF_INET6 inet_ntop);
$straddr6 = inet_ntop(AF_INET6, $addr6);

And the reverse process, from string to binary:

# IPv4
use Socket qw(AF_INET inet_pton);
$addr4 = inet_pton(AF_INET, $straddr4);

# IPv6
use Socket qw(AF_INET6 inet_pton);
$addr6 = inet_pton(AF_INET6, $straddr6);

All these functions fail if the argument to be converted is not a valid address in the respective representation.

Get sockaddr data from a socket variable

Sometimes it is necessary to know to which local or remote address or port a certain socket is associated. Typically we have a socket variable (for example, obtained with accept()), which in Perl can be stored in a handle, and we want the corresponding sockaddr data. So here's how to get it:

# Get remote sockaddr info from socket handle
$remotesockaddr = getpeername(SOCK);

# then, as already shown...

# IPv4
($port, $addr4) = sockaddr_in($remotesockaddr);

# or IPv6
($port, $addr6, $scopeid, $flowinfo) = sockaddr_in6($remotesockaddr);

To get sockaddr information for the local end of the socket, getsockname() is used:

# Get local sockaddr info from socket
$localsockaddr = getsockname(SOCK);
...

Note that depending on the protocol (TCP or UDP) and/or the bound status of the socket, the resuts may or may not make a lot of sense, but this is something that the code writer should know.

From hostname to IP address and viceversa

There are two ways to perform this hyper-common operation: one is older and deprecated, the other is newer and recommended.

The old way

The older way, which is still extremely popular, is somewhat protocol-dependent. Here it is:

# List context, return all the information
($canonname, $aliases, $addrtype, $length, @addrs) = gethostbyname($name);

As an example, let's try it with www.kernel.org:

#!/usr/bin/perl
 
use warnings;
use strict;
 
use Socket qw ( :DEFAULT inet_ntop );
 
my ($canonname, $aliases, $addrtype, $length, @addrs) = gethostbyname('www.kernel.org');
 
print "canonname: $canonname\n";
print "aliases: $aliases\n";
print "addrtype: $addrtype\n";
print "length: $length\n";
print "addresses: " . join(",", map { inet_ntop(AF_INET, $_) } @addrs), "\n";

Running the above outputs:

canonname: pub.all.kernel.org
aliases: www.kernel.org
addrtype: 2
length: 4
addresses: 198.145.20.140,149.20.4.69,199.204.44.194

So it seems there's no way to get it to return IPv6 addresses.

gethostbyname() can also be run in scalar context, in which case it just returns a single IP(v4) address:

# Scalar context, only IP address is returned
$ perl -e 'use Socket qw (:DEFAULT inet_ntop); my $a = gethostbyname("www.kernel.org"); print inet_ntop(AF_INET, $a), "\n";'
149.20.4.69
$ perl -e 'use Socket qw (:DEFAULT inet_ntop); my $a = gethostbyname("www.kernel.org"); print inet_ntop(AF_INET, $a), "\n";'
198.145.20.140
$ perl -e 'use Socket qw (:DEFAULT inet_ntop); my $a = gethostbyname("www.kernel.org"); print inet_ntop(AF_INET, $a), "\n";'
199.204.44.194

Normal DNS round-robin.

The inverse process is done with gethostbyaddr(), which supports also IPv6, though it's deprecated nonetheless. Again, the results differ depending on whether we are in list or scalar context (remember that all addresses have to be binary):

# List context, return more data

# IPv4
use Socket qw(:DEFAULT)
my ($canonname, $aliases, $addrtype, $length, @addrs) = gethostbyaddr($addr4, AF_INET);

# IPv6
use Socket qw(:DEFAULT)
my ($canonname, $aliases, $addrtype, $length, @addrs) = gethostbyaddr($addr6, AF_INET6);

In these case, of course, the interesting data is in the $canonname variable.

In scalar context, only the name is returned:

# scalar context, just return one name
use Socket qw(:DEFAULT);
my $hostname = gethostbyaddr($addr4, AF_INET);

# IPv6
use Socket qw(:DEFAULT);
my $hostname = gethostbyaddr($addr6, AF_INET6);

Note that, again, in all cases the passed IP addresses are binary.

The new way

The new and recommended way is protocol-independent (meaning that a name-to-IP lookup can return both IPv4 and IPv6 addresses) and is based on the addrinfo structure mentioned at the beginning. The forward lookup is done with the getaddrinfo() function. The idea is that, when an application needs to populate a sockaddr structure, the system provides it with one already filled with data, which can be directly used for whatever the application needs to do (eg, connect() or bind()).
In fact, getaddrinfo() returns a list of addrinfo structs (in C it's a linked list), each with its own sockaddr data, so the application can try each one in turn, in the same order that they are provided. (Normally the first one will work, without needing to try the next; but there are cases where having more than one possibility to try is useful.)

The C version returns a pointer to a linked list of struct addrinfo; with Perl it's easier as the list is returned in an array. The sample Perl code for getaddrinfo() is:

use Socket qw(:DEFAULT getaddrinfo);
my ($err, @addrs) = getaddrinfo($name, $service, $hints);

If $err is not set (that is, the operation was successful), @addrs contains a list of results. Since in Perl there are no structs, each element is a reference to a hash whose elements are named after the struct addrinfo members.

However, there are a few things to note:

getaddrinfo() can do hostname-to-address as well as service-to-port-number lookups, hence the first two arguments $name and $service. Depending on the actual task, an application might need to do just one type of lookup or the other, or both. In this paragraph we will strictly do hostname resolution; in the following we will do service name resolution.
getaddrinfo() is not only IP-version agnostic (in that it can return IPv4 and IPv6 addresses); it is also, so to speak, protocol (TCP, UDP) and socket type (stream, datagram, raw) agnostic. However, suitable values can be passed in the $hints variable to restrict the scope of the returned entries. This way, an application can ask to be given results suitable only for a specific socket type, protocol or address family. But this also means that, if everything is left unspecified, the getaddrinfo() lookup may (and usually does) return up to three entries for each IP address to which the supplied name resolves: one for protocol 6, socket type 1 (TCP, stream socket), one for protocol 17, socket type 2 (UDP, datagram socket) and one for protocol 0, socket type 3 (raw socket).
As briefly mentioned, the last argument $hints is a reference to a hash whose keys provide additional information or instructions about the way the lookup should be performed (see example below).

Let's write a simple code snippet to check the above facts.

#!/usr/bin/perl
 
use warnings;
use strict;
 
use Socket qw(:DEFAULT AI_CANONNAME IPPROTO_TCP IPPROTO_UDP IPPROTO_RAW SOCK_STREAM SOCK_DGRAM SOCK_RAW getaddrinfo
              inet_ntop inet_pton);
 
# map protocol number to name
sub pprotocol {
  my ($proto) = @_;
  if ($proto == IPPROTO_TCP) {
    return 'IPPROTO_TCP';
  } elsif ($proto == IPPROTO_UDP) {
    return 'IPPROTO_UDP';
  } else {
    return 'n/a';
  }
}
 
# map socket type number to name
sub psocktype {
  my ($socktype) = @_;
  if ($socktype == SOCK_STREAM) {
    return 'SOCK_STREAM';
  } elsif ($socktype == SOCK_DGRAM) {
    return 'SOCK_DGRAM';
  } elsif ($socktype == SOCK_RAW) {
    return 'SOCK_RAW';
  } else {
    return 'unknown';
  }
}
 
die "Must specify name to resolve" if (not $ARGV[0] and not $ARGV[1]);
 
my $name = $ARGV[0] or undef;
my $service = $ARGV[1] or undef;
 
# we want the canonical name on the first entry returned
my $hints = {};
if ($ARGV[0]) {
  $hints->{flags} = AI_CANONNAME;
}
 
my ($err, @addrs) = getaddrinfo ($name, $service, $hints);
 
die "getaddrinfo: error or no results" if $err;
 
# If we get here, each element of @addrs is a hash
# reference with the following keys (addrinfo struct members):
 
# 'family'      (AF_INET, AF_INET6)
# 'protocol'    (IPPROTO_TCP, IPPROTO_UDP)
# 'canonname'   (Only if requested with the AI_CANONNAME flag, and only on the first entry)
# 'addr'        This is a sockaddr (_in or _in6 depending on the address family above)
# 'socktype'    (SOCK_STREAM, SOCK_DGRAM, SOCK_RAW)
 
# dump results
for(@addrs) {
 
  my ($canonname, $protocol, $socktype) = (($_->{canonname} or ""), pprotocol($_->{protocol}), psocktype($_->{socktype}));
 
  if ($_->{family} == AF_INET) {
 
    # port is always 0 when resolving a hostname
    my ($port, $addr4) = sockaddr_in($_->{addr});
 
    print "IPv4:\n";
    print "  " . inet_ntop(AF_INET, $addr4) . ", port: $port, protocol: $_->{protocol} ($protocol), socktype: $_->{socktype} ($socktype), canonname: $canonname\n";
  } else {
 
    my ($port, $addr6, $scope_id, $flowinfo) = sockaddr_in6($_->{addr});
    print "IPv6:\n";
    print "  " . inet_ntop(AF_INET6, $addr6) . ", port: $port, protocol: $_->{protocol} ($protocol), socktype: $_->{socktype} ($socktype), (scope id: $scope_id, flowinfo: $flowinfo), canonname: $canonname\n";
  }
}

Let's test it:

$ getaddrinfo.pl www.kernel.org
IPv6:
  2001:4f8:1:10:0:1991:8:25, port: 0, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), (scope id: 0, flowinfo: 0), canonname: pub.all.kernel.org
IPv6:
  2001:4f8:1:10:0:1991:8:25, port: 0, protocol: 17 (IPPROTO_UDP), socktype: 2 (SOCK_DGRAM), (scope id: 0, flowinfo: 0), canonname: 
IPv6:
  2001:4f8:1:10:0:1991:8:25, port: 0, protocol: 0 (n/a), socktype: 3 (SOCK_RAW), (scope id: 0, flowinfo: 0), canonname: 
IPv4:
  198.145.20.140, port: 0, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), canonname: 
IPv4:
  198.145.20.140, port: 0, protocol: 17 (IPPROTO_UDP), socktype: 2 (SOCK_DGRAM), canonname: 
IPv4:
  198.145.20.140, port: 0, protocol: 0 (n/a), socktype: 3 (SOCK_RAW), canonname: 
IPv4:
  199.204.44.194, port: 0, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), canonname: 
IPv4:
  199.204.44.194, port: 0, protocol: 17 (IPPROTO_UDP), socktype: 2 (SOCK_DGRAM), canonname: 
IPv4:
  199.204.44.194, port: 0, protocol: 0 (n/a), socktype: 3 (SOCK_RAW), canonname: 
IPv4:
  149.20.4.69, port: 0, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), canonname: 
IPv4:
  149.20.4.69, port: 0, protocol: 17 (IPPROTO_UDP), socktype: 2 (SOCK_DGRAM), canonname: 
IPv4:
  149.20.4.69, port: 0, protocol: 0 (n/a), socktype: 3 (SOCK_RAW), canonname:

As expected, three entries are returned for each resolved IP address (BTW, the order of the entries matters: this is the order in which client applications should attempt to use them. In this case, IPv6 addresses are given preference, as it should be if the machine has good IPv6 connectivity - again, as it should be -).
In practice, as said, one may want to filter the results, for example by address family (IPv4, IPv6) and/or socket type (stream, datagram, raw) and/or protocol (TCP, UDP). For illustration purposes, let's filter by socket type. This is done using the socktype key of the $hints hash. For example, let's change it as follows to only return results suitable for the creation of sockets of type SOCK_STREAM:

my $hints = {}
$hints->{socktype} = SOCK_STREAM;   # add this line
if ($ARGV[0]) {
  $hints->{flags} = AI_CANONNAME;
}

Now let's run it again:

$ getaddrinfo.pl www.kernel.org
IPv6:
  2001:4f8:1:10:0:1991:8:25, port: 0, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), (scope id: 0, flowinfo: 0), canonname: pub.all.kernel.org
IPv4:
  198.145.20.140, port: 0, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), canonname: 
IPv4:
  149.20.4.69, port: 0, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), canonname: 
IPv4:
  199.204.44.194, port: 0, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), canonname:

Now the result is more like one would expect.

Note that there is much more to hint flags than shown above; the C man page for getaddrinfo() and the Perl reference linked at the end provide all the details.

So getaddrinfo() is the recommended way to do hostname to IP address resolution, although gethostbyname() won't probably go away soon.

The reverse process (from address to name) is performed using getnameinfo(), which is the counterpart to getaddrinfo(). Its usage is quite different from the C version and is as follows:

use Socket qw(:DEFAULT getnameinfo);
my ($err, $hostname, $servicename) = getnameinfo($sockaddr, [$flags, [$xflags]]);

Note that it accepts a sockaddr, so we pass it an address (IPv4 or IPv6) and a port. This should suggest that, just like getaddrinfo(), getnameinfo() can also do port to service name inverse resolution, which it indeed does (see below). Here we are concerned with reverse address resolution; in the following paragraph we'll do service port inverse resolution.

Let's write some code to test getnameinfo():

#!/usr/bin/perl
 
use warnings;
use strict;
 
use Socket qw(:DEFAULT inet_ntop inet_pton getnameinfo);
 
die "Usage: $0 [address] [port]" if (not $ARGV[0] and not $ARGV[1]);
 
my $straddr = ($ARGV[0] or "0.0.0.0");
my $port = ($ARGV[1] or 0);
 
# pack address + port
 
my $sockaddr;
 
# note that we assume the address is correct,
# real code should verify that
 
# stupid way to detect address family
if ($straddr =~ /:/) {
  $sockaddr = sockaddr_in6($port, inet_pton(AF_INET6, $straddr));
} else {
  $sockaddr = sockaddr_in($port, inet_pton(AF_INET, $straddr));
}
 
# do the inverse resolution 
 
my $flags = 0;
my $xflags = 0;
 
my ($err, $hostname, $servicename) = getnameinfo($sockaddr, $flags, $xflags);
 
die "getnameinfo: error or no results" if $err;
 
# dump
print "hostname: $hostname, servicename: $servicename\n";

Let's try it:

$ getnameinfo.pl 198.145.20.140 
hostname: tiz-korg-pub.kernel.org, servicename: 0
$ getnameinfo.pl  2001:4f8:1:10:0:1991:8:25
hostname: pao-korg-pub.kernel.org, servicename: 0

The Perl Socket reference page linked at the bottom provides more details about the possible hint flags that can be passed to getaddrinfo() and getnameinto(), and their possible return values in case of errors.

According to some sources, if a string representation of an address is passed to getaddrinfo() and the AI_CANONNAME flag is set, that should also work to do inverse resolution, in that the 'canonname' hash key of the returned value should be filled with the hostname. However, it does not seem to be working:

$ getaddrinfo.pl 198.145.20.140
IPv4:
  198.145.20.140, port: 0, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), canonname: 198.145.20.140  # not the name

From service name to port number and viceversa

Here, again, there are two ways: the old one, and the new one.

The old way

This is done using getservbyname() and getservbyport() for forward and inverse resolution respectively:

my ($name, $aliases, $port, $proto) =  getservbyname($name, $proto);
my ($name, $aliases, $port, $proto) =  getservbyport($port, $proto);

Examples for both:

$ perl -e 'use warnings; use strict; my ($name, $aliases, $port, $proto) = getservbyname($ARGV[0], $ARGV[1]); print "name is: $name, aliases is: $aliases, port is: $port, proto is: $proto\n";' smtp tcp
name is: smtp, aliases is: , port is: 25, proto is: tcp

$ perl -e 'use warnings; use strict; my ($name, $aliases, $port, $proto) = getservbyport($ARGV[0], $ARGV[1]); print "name is: $name, aliases is: $aliases, port is: $port, proto is: $proto\n";' 80 tcp
name is: http, aliases is: , port is: 80, proto is: tcp

The new way

The new way is done again with getaddrinfo()/getnameinfo(), as explained above, since they can do hostname and service resolution on both directions (forward and reverse).
Whereas we ignored the port number in the sockaddr data when doing host-to-IP resolution above, in this case the port number is of course very important.

We can reuse the same code snippets from above, since we allowed for a (then unused) second argument to the program:

$ getaddrinfo.pl '' https
IPv6:
  ::1, port: 443, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), (scope id: 0, flowinfo: 0), canonname: 
IPv4:
  127.0.0.1, port: 443, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), canonname:

$ getnameinfo.pl '' 443
hostname: 0.0.0.0, servicename: https
$ getnameinfo.pl '' 389
hostname: 0.0.0.0, servicename: ldap

As mentioned before, it's also possible to ask for simultaneous hostname and service name resolution in both directions, eg

$ getaddrinfo.pl www.kernel.org www
IPv6:
  2001:4f8:1:10:0:1991:8:25, port: 80, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), (scope id: 0, flowinfo: 0), canonname: pub.all.kernel.org
IPv4:
  198.145.20.140, port: 80, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), canonname: 
IPv4:
  149.20.4.69, port: 80, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), canonname: 
IPv4:
  199.204.44.194, port: 80, protocol: 6 (IPPROTO_TCP), socktype: 1 (SOCK_STREAM), canonname: 

$ getnameinfo.pl 2001:4f8:1:10:0:1991:8:25 443
hostname: pao-korg-pub.kernel.org, servicename: https

doing so is useful in the common case where the program needs a specific, ready-to-use sockaddr for a given service, address family and/or protocol (ie, the majority of cases), as opposed to just wanting to perform name or service resolution.

Reference: Perl Socket documentation.

Filed under networking, tips Tagged networking, perl, programming, socket API

Comments are closed | Permalink

\1