This is an all-time awk FAQ. It can be stated in various ways. A typical way is:
"How can I print the whole line except the first (or the first N, or the Nth) field?"
Or also:
"How can I print only from field N to field M?"
The underlying general question is:
"How can I print a range of fields with awk?"
There are actually quite a few ways to accomplish the task; each has its own applicability scenarios and its pros and cons. Let's start with methods that use only standard awk features, then we'll get to GNU awk.
Use a loop
This is the most obvious way: just loop from N to M and print the corresponding fields.
    sep = ""
    for (i = 3; i <= NF; i++) {
      printf "%s%s", sep, $i
      sep = FS
    }
    print ""
This is easy, but has some issues: first, the original record spacing is lost. If the input record (line) was, say,
    abc   def  ghi    jkl   mno
the above code will print
ghi jkl mno
instead. This might or might not be a problem. For the same reason, if FS is a complex regular expression, whatever separated the fields in the original input is lost.
On the other hand, if FS is exactly a single-char expression (except space, which is the default and special cased), the above code works just fine.
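As a minimal sketch of the loop approach for an arbitrary range N to M, the snippet can be wrapped in a shell function. The name print_fields and the variables first, last and to are made up for this illustration, and the fields are joined with OFS (a single space by default) rather than FS:

```shell
# print_fields FIRST LAST: hypothetical helper that prints fields
# FIRST..LAST of every input line, joined by OFS (default: one space).
print_fields() {
  awk -v first="$1" -v last="$2" '{
    # never go past the last field actually present
    to = (last + 0 > NF) ? NF : last + 0
    sep = ""
    for (i = first + 0; i <= to; i++) {
      printf "%s%s", sep, $i
      sep = OFS
    }
    print ""
  }'
}

# original spacing is not preserved: prints "def ghi jkl"
printf 'abc   def  ghi    jkl mno\n' | print_fields 2 4
```

As with the original snippet, whatever separated the fields in the input is replaced by a single separator.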
Assign the empty string to the unwanted fields
So for example one might do:
$1 = $2 = ""; print substr($0, 3)
This has the same problem as the first solution (the original formatting is lost), although for a different reason: here, assigning to a field makes awk rebuild the record with OFS between fields. It also leaves empty fields at the start of the rebuilt record, which have to be skipped when printing. The example above assumes the default OFS of a single space, so printing starts at the third character; adapt accordingly if OFS is something else.
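One possible way to "adapt accordingly" is to compute the starting offset from the length of OFS instead of hard-coding 3. This is only a sketch; the function name drop_first_two and its optional OFS argument are invented here:

```shell
# drop_first_two [OFS]: hypothetical helper that blanks $1 and $2 and
# skips the two leading separators that awk puts into the rebuilt record.
drop_first_two() {
  awk -v OFS="${1:- }" '{
    $1 = $2 = ""
    # the rebuilt record starts with two copies of OFS
    print substr($0, 1 + 2 * length(OFS))
  }'
}

printf 'abc   def ghi  jkl\n' | drop_first_two      # prints "ghi jkl"
printf 'a b c d\n' | drop_first_two ';'             # prints "c;d"
```

The original spacing is still lost, since the record is rebuilt with OFS.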
Delete the unwanted fields
Ok, so it's not possible to delete a field by assigning the empty string to it, but if we modify $0 directly we can indeed remove parts of it and thus fields. We can use sub() for the task:
    # default FS
    # removes first 2 fields
    sub(/^[[:blank:]]*([^[:blank:]]+[[:blank:]]+){2}/,""); print
    # removes last 3 fields
    sub(/([[:blank:]]+[^[:blank:]]+){3}[[:blank:]]*$/,""); print

    # one-char FS, for example ";"
    # removes first 2 fields
    sub(/^([^;]+;){2}/,""); print
    # removes last 3 fields
    sub(/(;[^;]+){3}$/,""); print
While this approach has the advantage that it preserves the original formatting (this is especially important if FS is the default, which in awk is slightly special-cased, as can be seen from the first example), it has the problem that it's not applicable at all if FS is a regular expression (that is, when it's not the default and is longer than one character).
It also requires that the awk implementation in use understands the regex {} quantifier operator, something many awks don't do (although this can be worked around by "expanding" the expression, that is, writing the repeated group out explicitly as many times as needed).
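For instance, removing the first two fields without the {} quantifier might look like the sketch below. The function names are invented, and [ \t] is used in place of [[:blank:]] on the assumption that some older awks lack POSIX character classes too:

```shell
# Hypothetical helpers: the {2} group is simply written out twice.

# one-char FS ";"
drop_two_semicolon() {
  awk '{ sub(/^[^;]+;[^;]+;/, ""); print }'
}

# default FS (runs of blanks, leading blanks ignored)
drop_two_default() {
  awk '{ sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, ""); print }'
}

printf 'a;b;c;d\n' | drop_two_semicolon                 # prints "c;d"
printf '  abc   def  ghi    jkl\n' | drop_two_default   # prints "ghi    jkl"
```

Note that the spacing between the remaining fields is left untouched.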
Manually find start and end of fields
Let's now try to find a method that works regardless of FS or OFS. We observe that we can use index(), together with substr() and length(), to find where each field starts within $0:

    pos = 0
    for (i = 1; i <= NF; i++) {
      start[i] = index(substr($0, pos + 1), $i) + pos
      pos = start[i] + length($i)
    }
Now, start[1] contains the starting position of field 1 ($1), start[2] the starting position of $2, etc. (As customary in awk, the first character of a string is at position 1.) With this information, printing field 3 to NF without losing information is as simple as doing
    first = 3
    last = NF
    print substr($0, start[first], start[last] - start[first] + length($last))
Seems easy, right? Well, this approach has a problem: it assumes that the input has no empty fields, which however are perfectly fine in awk. If some of the fields in the desired range are empty, it may or may not work. So let's see if we can do better.
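For reference, here is how the two snippets of this section might be combined into something runnable, for input without empty fields. The function name print_from_third and the NF guard are additions made for this sketch:

```shell
# print_from_third: hypothetical helper that prints field 3 to NF with
# the original separators, assuming no field in the record is empty.
print_from_third() {
  awk '{
    # locate the starting position of every field in $0
    pos = 0
    for (i = 1; i <= NF; i++) {
      start[i] = index(substr($0, pos + 1), $i) + pos
      pos = start[i] + length($i)
    }
    first = 3
    last = NF
    # guard added for this sketch: skip records that are too short
    if (NF >= first)
      print substr($0, start[first], start[last] - start[first] + length($last))
  }'
}

printf 'abc   def  ghi    jkl mno\n' | print_from_third   # prints "ghi    jkl mno"
```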
Manually find the separators
By design, FS can never match the empty string (more on this later), so perhaps we can look for matches of FS (using match()) and save the text of each separator we find.
If we go this route, however, we must keep in mind that the default FS in awk is special-cased, in that leading and trailing blanks (spaces + tabs) in the record are not counted for the purpose of field splitting, and furthermore fields are separated by runs of blanks despite FS being just a single space. This only happens with the default FS; with any other value, each match terminates exactly one field. Fortunately, it is possible to check whether FS is the default by comparing it to the string " " (a single space).
If FS is not the default, there is still another special case we should check. The awk specification says that if FS is exactly one character (and is not a space), it must NOT be treated as a regular expression. Since we want to use match() and FS as a pattern, this is especially important, for example if FS is ".", or "+", or "*", which are special regular expression metacharacters but should be treated literally in this case.
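The difference between the two interpretations is easy to see with a dot. In this small sketch (the function name literal_vs_regex is invented), the same "." is a regular expression when given to match(), but a literal character when used as a one-character separator in split(), which follows the same rule as FS:

```shell
# literal_vs_regex: hypothetical demo of "." as a regex versus as a
# literal one-character separator.
literal_vs_regex() {
  awk 'BEGIN {
    s = "a.b.c"
    match(s, ".")      # regex: any character, so it matches the "a"
    print "regex dot:", RSTART, RLENGTH
    match(s, "\\.")    # escaped: only a real dot matches
    print "escaped dot:", RSTART, RLENGTH
    n = split(s, parts, ".")   # single char: taken literally
    print "split on FS-style dot:", n
  }'
}

literal_vs_regex
```

This is why a single-character FS has to be escaped before it is handed to match().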
All that being said, here's some code that finds and saves all matches of FS:
    BEGIN {
      # sep_re is the "effective" FS, so to speak, to be
      # used to find where separators are
      sep_re = FS
      defaultfs = 0

      # ...but check for special cases
      if (FS == " ") {
        defaultfs = 1
        sep_re = "[[:blank:]]+"
      } else if (length(FS) == 1) {
        if (FS ~ /[][^$.*?+{}\\()|]/) {
          sep_re = "\\" FS
        }
      }
    }

    {
      # save $0 and work on the copy
      record = $0

      if (defaultfs) {
        gsub(/^[[:blank:]]+|[[:blank:]]+$/, "", record)
      }

      # find separators
      i = 0
      while(1) {
        if (match(record, sep_re)) {
          i++
          seps[i] = substr(record, RSTART, RLENGTH)
          record = substr(record, RSTART + RLENGTH)
        } else {
          break
        }
      }

      # ...continued below
With the above code seps[i] contains the string that matched FS between field i and i + 1. We of course also have the fields themselves in $1...$NF, so we can finally write the code that extracts a range of fields from the line:
      # ...continued from above
      result = ""
      first = 3
      last = NF
      for (i = first; i < last; i++) {
        result = result $i seps[i]
      }
      result = result $last
      print result
    }
Are we still overlooking something? Unfortunately, yes.
We said earlier that FS can't match the empty string; however, technically we can obviously set it to a value that would ordinarily match the empty string, for example
FS="a*"
That matches zero or more a's, so in particular it will produce a zero-length match if it can't find an "a".
But, just as obviously, an FS that can match a zero-length string is useless as field "separator", so what happens in these cases is that awk just does not allow it to match:
    $ echo 'XXXaYYYaaZZZ' | awk -F 'a*' '{for (i=1; i<=NF; i++) print i, $i}'
    1 XXX
    2 YYY
    3 ZZZ
In other words, if awk finds a match of length zero it just ignores it and skips to the next character until it can find a match of length at least 1 for FS.
(Let's leave aside the fact that setting FS to "a*" makes no sense, as in that case what's really wanted is "a+" instead and let's try to make the code handle the worst case.)
In our sample code, we're using match(), which can indeed produce zero-length matches, but we are not checking for those cases; the result is that running it with an FS that can produce zero-length matches will loop forever.
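The zero-length match is easy to observe in isolation. In this small sketch (the function name zero_length_match_demo is invented), /a*/ matches the empty string at the very start of the input, because that is the leftmost possible match:

```shell
# zero_length_match_demo: hypothetical demo showing that match()
# succeeds with an empty match at position 1.
zero_length_match_demo() {
  awk 'BEGIN {
    match("XXXaYYY", /a*/)
    print RSTART, RLENGTH
  }'
}

zero_length_match_demo   # prints "1 0"
```

With RSTART + RLENGTH equal to 1, the substr() call in the loop returns the record unchanged, so the same empty match is found again and again.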
Thus we need to mimic awk's field splitting a little bit more, in that if we find a zero-length match, we just ignore it and try to match again starting from the next character.
So here's the full code to print a range of fields preserving format and separators, with the revised loop to find separators skipping zero-length matches:
    BEGIN {
      # sep_re is the "effective" FS, so to speak, to be
      # used to find where separators are
      sep_re = FS
      defaultfs = 0

      # ...but check for special cases
      if (FS == " ") {
        defaultfs = 1
        sep_re = "[[:blank:]]+"
      } else if (length(FS) == 1) {
        if (FS ~ /[][^$.*?+{}\\()|]/) {
          sep_re = "\\" FS
        }
      }
    }

    {
      # save $0 and work on the copy
      record = $0

      if (defaultfs) {
        gsub(/^[[:blank:]]+|[[:blank:]]+$/, "", record)
      }

      # find separators
      i = 0
      while(1) {
        if (length(record) == 0) break;
        if (match(record, sep_re)) {
          if (RLENGTH > 0) {
            i++
            seps[i] = substr(record, RSTART, RLENGTH)
            record = substr(record, RSTART + RLENGTH)
          } else {
            # ignore zero-length match: go to next char
            record = substr(record, 2)
          }
        } else {
          break
        }
      }

      result = ""
      first = 3
      last = NF
      for (i = first; i < last; i++) {
        result = result $i seps[i]
      }
      result = result $last
      print result
    }
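A simple optimization of the above code would be to directly skip the next field upon finding a match for FS. Since i has already been incremented at that point, the field that begins right after the separator is $(i + 1), e.g.

    # attempt next match after the field that begins here
    record = substr(record, RSTART + RLENGTH + length($(i + 1)))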
since, by definition, a field can never match FS, so it can be skipped entirely for the purpose of finding matches of FS.
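Putting everything together, a shell wrapper around the full program with this optimization applied might look like the following sketch. The function name print_range, its arguments and the variable to are invented for the example; it also uses [ \t] instead of [[:blank:]] and an index() lookup instead of the bracket expression, on the assumption that these are more widely portable:

```shell
# print_range FIRST LAST [FS]: hypothetical wrapper that prints fields
# FIRST..LAST keeping the original separators between them.
print_range() {
  awk -v first="$1" -v last="$2" -v FS="${3:- }" '
  BEGIN {
    sep_re = FS
    defaultfs = 0
    if (FS == " ") {
      defaultfs = 1
      sep_re = "[ \t]+"
    } else if (length(FS) == 1 && index("][^$.*?+{}\\()|", FS) > 0) {
      sep_re = "\\" FS          # one-char FS is literal: escape it
    }
  }
  {
    record = $0
    if (defaultfs) gsub(/^[ \t]+|[ \t]+$/, "", record)

    i = 0
    while (length(record) > 0 && match(record, sep_re)) {
      if (RLENGTH > 0) {
        i++
        seps[i] = substr(record, RSTART, RLENGTH)
        # skip the separator and the field that follows it
        record = substr(record, RSTART + RLENGTH + length($(i + 1)))
      } else {
        record = substr(record, 2)   # zero-length match: next char
      }
    }

    to = (last + 0 > NF) ? NF : last + 0
    result = ""
    for (i = first + 0; i < to; i++) result = result $i seps[i]
    if (first + 0 <= to) result = result $to
    print result
  }'
}

printf 'abc   def  ghi    jkl mno\n' | print_range 3 5    # "ghi    jkl mno"
printf 'a.b..d\n' | print_range 2 4 '.'                   # "b..d"
```

The second call shows that empty fields and a one-character FS that happens to be a regex metacharacter are both handled.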
GNU awk
As often happens, life is easier for GNU awk users. In this case, thanks to the optional fourth argument to the split() function, which names an array where the separators found between the elements are stored, the task becomes much simpler:

    # this does all the hard work, as split() is
    # guaranteed to behave like field splitting
    nf = split($0, fields, FS, seps)

    result = ""
    first = 3
    last = NF
    for (i = first; i < last; i++) {
      result = result fields[i] seps[i]
    }
    result = result $last
    print result
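As a usage sketch, the GNU awk version can be wrapped in a shell function too. This assumes GNU awk is installed under the name gawk; the function name gawk_range and its arguments are made up for the example:

```shell
# gawk_range FIRST LAST: hypothetical wrapper, requires GNU awk (gawk)
# for the four-argument form of split().
gawk_range() {
  gawk -v first="$1" -v last="$2" '{
    nf = split($0, fields, FS, seps)
    to = (last + 0 > nf) ? nf : last + 0
    result = ""
    for (i = first + 0; i < to; i++) result = result fields[i] seps[i]
    if (first + 0 <= to) result = result fields[to]
    print result
  }'
}

# Example (only where gawk is available):
#   printf 'abc   def  ghi    jkl mno\n' | gawk_range 3 5
```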
For more and a slightly different take on the subject, see also this page on the awk.freeshell.org wiki.