Smart ranges in awk

Yes, we all know that awk has built-in support for range expressions, like this:

# print lines from /BEGIN/ to /END/, inclusive
awk '/BEGIN/,/END/'

Sometimes, however, we need a bit more flexibility. We might want to print the lines between two patterns but exclude the pattern lines themselves, or include only one of them. One way to do this is something like the following:

# print lines from /BEGIN/ to /END/, not inclusive
awk '/BEGIN/,/END/{if (!/BEGIN/&&!/END/)print}'

# print lines from /BEGIN/ to /END/, not including /BEGIN/
awk '/BEGIN/,/END/{if (!/BEGIN/)print}'

However, these have a problem. With this input, for example:

1 BEGIN
2 foo
3 bar
4 BEGIN
5 baz
6 END

the BEGIN on line 4 is not printed, although we probably want it, since it falls inside the range rather than starting a new one. Even if these commands were correct, they are quite clunky. There is a better way to select the lines we want, and it is another typical awk idiom: use a flag to keep track of whether we are currently inside the interesting range, and print lines based on the value of the flag. Let’s see how it’s done:

# print lines from /BEGIN/ to /END/, not inclusive
awk '/END/{p=0};p;/BEGIN/{p=1}'

# print lines from /BEGIN/ to /END/, excluding /END/
awk '/END/{p=0} /BEGIN/{p=1} p'

# print lines from /BEGIN/ to /END/, excluding /BEGIN/
awk 'p; /END/{p=0} /BEGIN/{p=1}'

All these programs set p to 1 when /BEGIN/ is seen and set p to 0 when /END/ is seen. The crucial difference between them is where the bare "p" (the condition that triggers the printing of lines) is located. Depending on its position (at the beginning, in the middle, or at the end), different parts of the desired range are printed. To print the complete range (inclusive), you can just use the regular /BEGIN/,/END/ expression, or use the flag technique with the /BEGIN/ and /END/ rules swapped relative to the non-inclusive version:

# print lines from /BEGIN/ to /END/, inclusive
awk '/BEGIN/{p=1};p;/END/{p=0}'
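
To make the difference concrete, here is a small sketch that feeds the six-line sample input from earlier to the range-based command and to two of the flag-based ones. The helper function and variable names (sample, range_exclusive and so on) are made up for this illustration; only the awk programs come from the text above.

```shell
# Sample input from above (the leading numbers are part of each line).
sample() {
  printf '%s\n' '1 BEGIN' '2 foo' '3 bar' '4 BEGIN' '5 baz' '6 END'
}

# Range expression with both delimiters filtered out:
# the inner BEGIN on line 4 is dropped as well.
range_exclusive=$(sample | awk '/BEGIN/,/END/{if (!/BEGIN/&&!/END/)print}')

# Flag technique, not inclusive: line 4 is kept because p is already 1
# when it is read.
flag_exclusive=$(sample | awk '/END/{p=0};p;/BEGIN/{p=1}')

# Flag technique, inclusive: both delimiter lines are printed.
flag_inclusive=$(sample | awk '/BEGIN/{p=1};p;/END/{p=0}')

printf '%s\n' '--- range, not inclusive' "$range_exclusive"
printf '%s\n' '--- flag, not inclusive' "$flag_exclusive"
printf '%s\n' '--- flag, inclusive' "$flag_inclusive"
```

The range-based command prints lines 2, 3 and 5, the non-inclusive flag version prints lines 2 to 5, and the inclusive flag version prints all six lines.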

It goes without saying that while we are only printing lines here, the important thing is that we have a way of selecting lines within a range, so you can of course do anything you want instead of printing. And of course /BEGIN/ and /END/ should be changed to match the lines you want to select as starting and ending points.
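
As one possible illustration of doing something other than printing, the sketch below counts the lines inside each range and reports the total when /END/ is reached. The input, the helper function name, and the variables n and r are assumptions made up for this example; the flag layout is the same as in the non-inclusive printer.

```shell
# Hypothetical input: two BEGIN/END blocks with a stray line in between.
input() {
  printf '%s\n' 'BEGIN' 'a' 'b' 'END' 'skip' 'BEGIN' 'c' 'END'
}

# Same rule order as the non-inclusive printer, but the bare "p" is
# replaced by an action that counts lines (n). When /END/ closes an
# open range, the count is reported and the state is reset.
counts=$(input | awk '
  /END/   { if (p) printf "range %d: %d line(s)\n", ++r, n; p = 0; n = 0 }
  p       { n++ }
  /BEGIN/ { p = 1 }
')

printf '%s\n' "$counts"
```

Because the counting rule sits between the /END/ and /BEGIN/ rules, the delimiter lines themselves are not counted, just as they are not printed in the non-inclusive version.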

UPDATE 16/10/10: a file may have many /BEGIN/,/END/ ranges. What if one wants to print, say, only the fourth such range? The solutions using flags are trivially modified by adding a counter, and only printing when the counter is four, or whatever instance is desired:

# print lines in the n-th /BEGIN/,/END/ range, not inclusive
awk -v n=4 '/END/{p=0}; p && c == n; /BEGIN/ && !p {p=1; c++}'

The other cases can be adapted similarly.
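
For instance, here is a sketch of how the inclusive and the excluding-/BEGIN/ variants might be adapted, keeping the same counter logic and only moving the rules around as before. The test input and the names blocks, nth_inclusive and nth_no_begin are invented for this example.

```shell
# Hypothetical input with three BEGIN/END blocks.
blocks() {
  printf '%s\n' 'BEGIN' 'one' 'END' 'noise' \
                'BEGIN' 'two' 'END' \
                'BEGIN' 'three' 'END'
}

# n-th range, inclusive: open the range and bump the counter first,
# print in the middle, close the range last.
nth_inclusive=$(blocks | awk -v n=2 '/BEGIN/ && !p {p=1; c++}; p && c == n; /END/{p=0}')

# n-th range, excluding /BEGIN/: print first, then update the flag.
nth_no_begin=$(blocks | awk -v n=2 'p && c == n; /END/{p=0}; /BEGIN/ && !p {p=1; c++}')

printf '%s\n' '--- inclusive' "$nth_inclusive"
printf '%s\n' '--- excluding BEGIN' "$nth_no_begin"
```

With n=2, the first command prints BEGIN, two and END, and the second prints two and END.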

2 Comments

  1. Jotne says:

    awk '/BEGIN/,/END/{if (!/BEGIN/&&!/END/)print}'
    can be done like this
    awk '/BEGIN/,/END/{if (!/BEGIN|END/)print}'