
Smart ranges in awk, part two

In a past article we saw how to use flags in awk to selectively process the lines in a range delimited by two distinct patterns marking the beginning and end of the interval of interest. Here we focus on the case, also common, where there is a single pattern and we want to select the lines between occurrences of that pattern. As usual, the examples just print the selected lines, but any kind of processing can be done on them.

A simple example input may be

nothing interesting here
nothing interesting here
--
interesting stuff here
interesting stuff here
--
nothing interesting here
nothing interesting here
nothing interesting here
--
other interesting stuff here
--
nothing interesting here
nothing interesting here

In this example, the lines consisting of two dashes delimit the interesting blocks. We may want to print the contents of, say, the second interesting block or, more generally, the n-th interesting block, which translates into printing the lines between the m-th and the (m+1)-th occurrence of the pattern. Whether m equals n depends on the role of the delimiter lines: if a delimiter can only be either the beginning or the end of a block, as in the sample above, then m = 2n-1; if a single delimiter is allowed to end one block and begin the next (for example when there are two consecutive blocks), then m = n. A further variation is whether or not to print the delimiter lines.

This approach can be used if the delimiter lines have to be printed:

$ awk -v m=3 'count == m; /^--$/{count++; if(count==m) print}' file.txt
--
other interesting stuff here
--

If the delimiter lines are not wanted, here's how to do it:

$ awk -v m=3 '/^--$/{count++; next} count == m' file.txt
other interesting stuff here
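
To make the relation between n and m concrete, here is a minimal sketch that selects a block by its number n rather than by the delimiter count m. It assumes that delimiters strictly alternate between opening and closing a block, as in the sample input, so that m = 2n-1; the variable n, the BEGIN computation and the temporary file are additions for illustration, not part of the commands above.

```shell
# Sample input from the article, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
nothing interesting here
nothing interesting here
--
interesting stuff here
interesting stuff here
--
nothing interesting here
nothing interesting here
nothing interesting here
--
other interesting stuff here
--
nothing interesting here
nothing interesting here
EOF

# Assumption: delimiters alternate between "open" and "close",
# so block n lies between occurrences m = 2n-1 and m+1.
block1=$(awk -v n=1 'BEGIN{m=2*n-1} /^--$/{count++; next} count == m' "$file")
block2=$(awk -v n=2 'BEGIN{m=2*n-1} /^--$/{count++; next} count == m' "$file")

printf '%s\n' "$block1"
printf '%s\n' "$block2"
rm -f "$file"
```

If instead a delimiter may close one block and open the next, the BEGIN block would simply set m=n.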

Between first and last

Another, somewhat unrelated, variation may be "select all lines between the first and the last occurrence of a pattern". This is tricky, because when reading the input sequentially it's not possible to know whether the pattern we're seeing will be "the last".

If it can be assumed that there will be exactly two occurrences overall, this degenerate case is simpler to manage:

awk 'f; /pattern/{f=!f; if(f)print}' file.txt

("f" is for "flag") or, if the delimiter lines should not be printed,

awk '/pattern/{f=!f; next}; f' file.txt

Both can be written differently, by shuffling around when to print and/or the position of the flag, but the idea should be clear.

But let's now see how to solve the general case of this problem, that is, when the pattern can occur an arbitrary number of times (including zero or one).

One approach is to make two passes, the first to discover where the first and last occurrence are, the second to actually print the lines in between:

awk 'NR==FNR{if(/pattern/){last=NR;if(!first)first=NR};next} FNR>=first && FNR<=last' file.txt file.txt

If the pattern never occurs, nothing is printed; if the pattern occurs only once, only that line is printed because "first" and "last" will have the same value. It can be trivially modified to exclude the delimiter lines:

awk 'NR==FNR{if(/pattern/){last=NR-1;if(!first)first=NR+1};next} FNR>=first && FNR<=last' file.txt file.txt

and this has the result that if the pattern occurs only once not even that line will be printed. This method uses a popular awk idiom, but it makes two passes over the file.
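
As a quick check of the edge cases just described, the sketch below runs both two-pass commands on two small made-up inputs, one with three delimiter lines and one with a single delimiter line. The pattern ^--$, the file contents and the shell variable names are assumptions chosen for illustration.

```shell
# Two small, made-up inputs: three delimiters, and a single delimiter.
many=$(mktemp); once=$(mktemp)
printf 'a\n--\nb\n--\nc\n--\nd\n' > "$many"
printf 'a\n--\nb\n' > "$once"

# The two-pass commands from above, with /pattern/ replaced by /^--$/.
incl='NR==FNR{if(/^--$/){last=NR;if(!first)first=NR};next} FNR>=first && FNR<=last'
excl='NR==FNR{if(/^--$/){last=NR-1;if(!first)first=NR+1};next} FNR>=first && FNR<=last'

incl_many=$(awk "$incl" "$many" "$many")   # first to last delimiter, inclusive
excl_many=$(awk "$excl" "$many" "$many")   # same range without the outer delimiters
incl_once=$(awk "$incl" "$once" "$once")   # only the single delimiter line
excl_once=$(awk "$excl" "$once" "$once")   # nothing at all

printf '%s\n' "$incl_many"
rm -f "$many" "$once"
```

Note that the file has to be named twice on the command line, so this method cannot be used when the input comes from a pipe.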

Now for a single-pass approach (which, obviously, should also handle the degenerate cases):

awk '!ok && /pattern/{ok=1} ok{p=p s $0;s=RS} /pattern/{print p;p=s=""}' file.txt

Since we can't know whether a matching line will be the last, the idea here is to accumulate lines in a buffer (an array could be used as well), and when we see a matching line (which could potentially be the last), print the buffer. The buffer includes the matching line itself, so the result is that the first and last matching delimiter lines are included in the output.
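
The array alternative mentioned above could look roughly like the following sketch. It is a variant, not a literal translation: non-matching lines are held in an array and flushed when the next match arrives, while the matching line itself is printed directly. The names buf and n, the pattern ^--$ and the sample input are assumptions made for illustration.

```shell
file=$(mktemp)
printf 'a\n--\nb\n--\nc\n--\nd\n' > "$file"

out=$(awk '
  /^--$/ {
    ok = 1                                   # we are past the first delimiter
    for (i = 1; i <= n; i++) print buf[i]    # flush lines buffered since the previous match
    n = 0                                    # empty the buffer (old entries get overwritten)
    print                                    # the matching line itself
    next
  }
  ok { buf[++n] = $0 }                       # after the first match, hold lines back
' "$file")

printf '%s\n' "$out"
rm -f "$file"
```

Lines buffered after the last match (here the final "d") are never flushed, which is what keeps the trailing part of the file out of the output.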

Here is a version that excludes those first and last delimiter lines (but still includes any matching line that occurs in between):

awk '!ok && /pattern/{ok=1; next} /pattern/{if(s!="")print p;p=s=""} ok{p=p s $0;s=RS}' file.txt
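
The exclusive variant can also be written with an array buffer. The following is a sketch under the same kind of assumptions (the names buf and n, the pattern ^--$ and the sample input are hypothetical). Because it counts buffered lines rather than testing the buffer's contents, a block made only of empty lines is still printed.

```shell
file=$(mktemp)
printf 'a\n--\nb\n--\nc\n--\nd\n' > "$file"

out=$(awk '
  /^--$/ {
    if (!ok) { ok = 1; next }                # first delimiter: start buffering, print nothing
    for (i = 1; i <= n; i++) print buf[i]    # a later delimiter: what is buffered is safe to print
    n = 0
  }
  ok { buf[++n] = $0 }                       # buffer the line, including in-between delimiters
' "$file")

printf '%s\n' "$out"
rm -f "$file"
```

Each delimiter after the first is buffered rather than printed, so it only reaches the output if yet another delimiter follows; the last one never does.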