IPv6 address normalization

So we have this application that sends us IPv6 addresses in some odd (though valid) format, like

2001:db8:0:0:0:0:cafe:1111
2001:db8::a:1:2:3:4
2001:0DB8:AAAA:0000:0000:0000:0000:000C
2001:db8::1:0:0:0:4

and we need to compare them to addresses stored elsewhere (for example, in the output of ip6tables-save). However we find that, though the same addresses do already exist "elsewhere", there they look like this:

2001:db8::cafe:1111
2001:db8:0:a:1:2:3:4
2001:db8:aaaa::c
2001:db8:0:1::4

So our code thinks they're not present, and processes them again (or adds them twice, or whatever). With IPv4, of course, there was no such problem.

What we need is a way to "normalize" all these addresses so that if two addresses really are the same, their string representation must be the same as well, so they can be compared.

There is a proposed standard (RFC 5952) for IPv6 address text representation, which essentially boils down to these simple rules:

  1. Always suppress leading zeros in each field, e.g. "0004" becomes just "4", "0000" becomes "0"
  2. Always use lowercase letters ("1AB4" becomes "1ab4")
  3. When replacing a run of zero-valued 16-bit fields with "::", always replace the longest run; if several runs have the same length, replace the first (leftmost) one. A single zero-valued 16-bit field is not considered a run and must not be replaced with "::" (but rule 1 above still applies to it).
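
To make the three rules concrete, here is a minimal sketch in Python 3 that applies them by hand to the eight 16-bit fields of an address. The function name rfc5952() is made up for this illustration, the standard socket.inet_pton() call is used only to parse the input, and the sketch deliberately ignores the mixed dotted-quad notation that the RFC recommends for some special prefixes (such as IPv4-mapped addresses).

```python
import socket
import struct

def rfc5952(addr):
    """Format an IPv6 address string following the three rules above."""
    # Parse the string into eight 16-bit fields (network byte order).
    fields = struct.unpack("!8H", socket.inet_pton(socket.AF_INET6, addr))

    # Rule 3: find the longest run of zero fields; the strict ">" below
    # means the leftmost run wins when two runs have the same length.
    best_start, best_len = -1, 0
    i = 0
    while i < 8:
        if fields[i] == 0:
            j = i
            while j < 8 and fields[j] == 0:
                j += 1
            if j - i > best_len:
                best_start, best_len = i, j - i
            i = j
        else:
            i += 1

    # Rules 1 and 2: "%x" drops leading zeros and prints lowercase hex.
    hextets = ["%x" % f for f in fields]

    # A single zero field is not a run, so it is left as "0".
    if best_len < 2:
        return ":".join(hextets)

    head = ":".join(hextets[:best_start])
    tail = ":".join(hextets[best_start + best_len:])
    return head + "::" + tail

print(rfc5952("2001:0DB8:AAAA:0000:0000:0000:0000:000C"))
```

In real code there is little reason to reimplement this; the sketch is only meant to show what each rule does to the fields.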

The two functions we want are inet_pton() and inet_ntop(), which (at least in glibc) do implement the above rules. Although they are C library functions, the most popular scripting languages expose an interface to them so they can be called from scripts.

inet_pton() takes a string and converts it into an internal format (failing if the string is invalid, so it can also be used to validate the address string), and inet_ntop() takes an address in internal format and converts it into a string representation that follows the rules above.
I'm being purposely vague about the "internal format" because its representation may be platform-dependent, and scripting languages could obfuscate it even more, so it's better not to mess with it and just treat it as a sort of black box. At least with Perl and Python, it's the binary representation of the address.
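
A quick way to see the black box at work, as a sketch in Python 3 (where the packed form happens to be a 16-byte bytes object; the helper names pack() and is_valid() are made up for this example): two different spellings of the same address pack to identical bytes, and an invalid string makes inet_pton() raise an error.

```python
import socket

def pack(addr):
    """Return the packed (internal) form of an IPv6 address string."""
    return socket.inet_pton(socket.AF_INET6, addr)

def is_valid(addr):
    """True if inet_pton() accepts addr as an IPv6 address."""
    try:
        pack(addr)
        return True
    except OSError:  # socket.error is an alias of OSError in Python 3
        return False

a = pack("2001:db8:0:0:0:0:cafe:1111")
b = pack("2001:db8::cafe:1111")
print(len(a))                   # 16 bytes = 128 bits
print(a == b)                   # True: same address, different spelling
print(is_valid("2001:db8::g"))  # False: "g" is not a hex digit
```

Comparing the packed forms directly is enough when both sides can be parsed by our own code; the string normalization is for when the other side is text we don't control.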

Perl

#!/usr/bin/perl
use warnings;
use strict;
use Socket qw(AF_INET6 inet_ntop inet_pton);
 
while(<>){
  chomp;
  my $internal = inet_pton(AF_INET6, $_);
  if (defined($internal)) {
    print inet_ntop(AF_INET6, $internal), "\n";
  } else {
    print "Invalid address $_\n";
  }
}

If we're 100% sure that the input address is valid, then the main loop can be shortened to

while(<>){
  chomp;
  print inet_ntop(AF_INET6, inet_pton(AF_INET6, $_)), "\n";
}

Though extra checking surely can't hurt and avoids surprises.
Note that this needs a relatively recent Perl; for example, the Socket module of Perl 5.10.1 does not export AF_INET6 and friends (though I seem to recall there used to be a Socket6 module which did, but it had to be installed separately).

Python 2

#!/usr/bin/python
 
import os
import sys
import socket
 
for addr in sys.stdin:
  addr = addr.rstrip(os.linesep)
 
  try:
    internal = socket.inet_pton(socket.AF_INET6, addr)
    print socket.inet_ntop(socket.AF_INET6, internal)
 
  except socket.error:
    print "Invalid address " + addr

Again, if validation can be skipped, it can be written as

for addr in sys.stdin:
  print socket.inet_ntop(socket.AF_INET6, socket.inet_pton(socket.AF_INET6, addr.rstrip(os.linesep)))

which in turn can be made into a one-liner by turning the loop into a list comprehension and doing

print "\n".join([socket.inet_ntop(socket.AF_INET6, socket.inet_pton(socket.AF_INET6, addr.rstrip(os.linesep))) for addr in sys.stdin])
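
The scripts above are Python 2. Under Python 3, print is a function and socket.error is an alias of OSError, so a rough equivalent could look like the sketch below. The normalize_lines() helper is a name made up for this example, and the sample list just reuses the addresses from the beginning of the article plus one deliberately invalid string.

```python
import socket

def normalize_lines(lines):
    """Yield one normalized address (or an error message) per input line."""
    for line in lines:
        addr = line.strip()
        try:
            packed = socket.inet_pton(socket.AF_INET6, addr)
            yield socket.inet_ntop(socket.AF_INET6, packed)
        except OSError:
            yield "Invalid address " + addr

sample = [
    "2001:db8:0:0:0:0:cafe:1111",
    "2001:db8::a:1:2:3:4",
    "2001:0DB8:AAAA:0000:0000:0000:0000:000C",
    "2001:db8::1:0:0:0:4",
    "not-an-address",
]

for out in normalize_lines(sample):
    print(out)
```

Passing sys.stdin instead of the sample list turns it back into a filter like the Python 2 version.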

Code in action

So now running the original list of addresses through the normalization code we get:

$ echo '2001:db8:0:0:0:0:cafe:1111
2001:db8::a:1:2:3:4
2001:0DB8:AAAA:0000:0000:0000:0000:000C
2001:db8::1:0:0:0:4' | normalize.py
2001:db8::cafe:1111
2001:db8:0:a:1:2:3:4
2001:db8:aaaa::c
2001:db8:0:1::4
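
As a cross-check, Python 3 also ships an ipaddress module in its standard library that does its own parsing and formatting instead of calling the C library functions. The sketch below (normalize() is a made-up name) should print the same four lines for the sample addresses. Its handling of special cases such as IPv4-mapped addresses may differ from inet_ntop(), so treat it as an alternative to try rather than a drop-in replacement.

```python
import ipaddress

def normalize(addr):
    """Return the compressed text form of addr (raises ValueError if invalid)."""
    return ipaddress.IPv6Address(addr).compressed

sample = [
    "2001:db8:0:0:0:0:cafe:1111",
    "2001:db8::a:1:2:3:4",
    "2001:0DB8:AAAA:0000:0000:0000:0000:000C",
    "2001:db8::1:0:0:0:4",
]

for addr in sample:
    print(normalize(addr))
```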

4 Comments

  1. Imran says:

    Thank you for reply.

    Python approach is not normalizing all the formats, where Java approach does.
    Example:
    If I ask to normalize IP fd12:ba74:4ba8:2e::3, outputs are

    Python: fd12:ba74:4ba8:2e::3
    Java: fd12:ba74:4ba8:2e:0:0:0:3

    Output by Java APIs seems following RFC.

    normIP = java.net.InetAddress.getByName(IP).toString().substring(1)

    • waldner says:

      I think we have a terminology misunderstanding here. The address you're trying to "normalize" (fd12:ba74:4ba8:2e::3) is already normalized according to the article and the RFC (see http://tools.ietf.org/html/rfc5952#page-10 for the details), thus it is correct that python outputs it unchanged. It's java that applies some other, different, formatting. If that's the one you need, then by all means go with java.

  2. Imran says:

    Very informative and helping. Facing another issue now.
    In an existing shell program I have to normalize IPs, doing something like below

    #!/bin/bash
    fill(){
    convert="python -c \"import socket; print socket.inet_ntop(socket.AF_INET6, socket.inet_pton(socket.AF_INET6, '${IP}'))\""
    }

    IP="0:0:0:0:0:FFFF:204.152.189.116"
    fill
    echo "command: $convert"
    norm_ip=`$convert`
    echo $norm_ip

    Execution of this gives:

    command: python -c "import socket; print socket.inet_ntop(socket.AF_INET6, socket.inet_pton(socket.AF_INET6, '0:0:0:0:0:FFFF:204.152.189.116'))"

    SyntaxError: EOL while scanning string literal

    • waldner says:

      Your problem is one of bash syntax. For unknown/untold reasons, you're trying to put a complex command in a variable for later execution, something that looks completely unnecessary to me here. If you insist on doing that, please read this page: http://mywiki.wooledge.org/BashFAQ/050 where the topic is covered in detail and at least use one of the methods described there. You should also get into the habit of using quotes around your variables (http://mywiki.wooledge.org/Quotes), and $() instead of backticks for command substitution.

      I'd rewrite your code as follows:

      #!/bin/bash
      fill(){
       python2 -c "import socket; print socket.inet_ntop(socket.AF_INET6, socket.inet_pton(socket.AF_INET6, '${1}'))"
      }
      
      IP="0:0:0:0:0:FFFF:204.152.189.116"
      norm_ip=$(fill "$IP")
      echo "$norm_ip"