Debugging sed programs

Posted by waldner on 22 November 2009, 11:31 pm

To understand why and how a sed program works (and how sed programs work in general, for that matter), I suggest starting with an unusual, very low-tech approach. Sit down with paper and pencil, draw some boxes to represent the pattern space and the hold space (the hold space is not used in this example), and try to simulate the behavior of the program on some simple input.

Let's debug this sed program:

sed ':begin;$!N;/PATTERN\n/s/\n//;tbegin;P;D'    # if the line ends in PATTERN, join it with the next line

The input is like this:

line1
line2foo
line3foo
line4
line5foo
line6

and we want to join every line that ends with "foo" with the line that follows it. Our sed program thus becomes:

sed ':begin;$!N;/foo\n/s/\n//;tbegin;P;D' file.txt
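To follow along at a terminal, the sample input can be written to a file first. This is only a sketch: the scratch directory and the variable names workdir and output are assumptions made for illustration. The one-liner is used exactly as written above, with ; after the label, which GNU sed accepts but some other implementations may not.

```shell
# Scratch directory (hypothetical name) holding the sample input.
workdir=$(mktemp -d)
printf '%s\n' line1 line2foo line3foo line4 line5foo line6 > "$workdir/file.txt"

# Run the program from the article and keep its output for inspection.
output=$(sed ':begin;$!N;/foo\n/s/\n//;tbegin;P;D' "$workdir/file.txt")
printf '%s\n' "$output"

# Clean up the scratch directory.
rm -rf "$workdir"
```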

Let's go through the input, see what sed does, and what the pattern space contains at each step. When the program starts, the first line of input is in the pattern space:

PATTERN SPACE: line1

:begin is just a label, so no action is performed. Then $!N appends the next input line to the pattern space, separated by a newline, provided the current line isn't the last one in the input. So now we have:

PATTERN SPACE: line1\nline2foo

Now /foo\n/ checks whether the pattern space contains "foo" followed by a newline, which it doesn't. So the s command is not run, and the following t does not branch, since no substitution was performed. sed therefore executes P and D: P prints the pattern space up to the first newline (line1), and D deletes that same part, newline included, leaving

PATTERN SPACE: line2foo

D starts a new cycle, so sed goes back to the beginning of the program, without reading any new line. Again, $!N reads in a new line, since we're not at the last line, so we have

PATTERN SPACE: line2foo\nline3foo
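As an aside, the fact that D restarts the cycle without reading input is what kept line2foo alive in the step above. The difference from its lowercase sibling d is easy to see on a three-line input. The sketch below is not part of the original program; the input lines a, b, c and the variable names are made up for illustration.

```shell
# D: delete up to the first newline, restart the cycle with what is left.
with_D=$(printf '%s\n' a b c | sed '$!N;P;D')

# d: delete the whole pattern space, then start a cycle with a fresh line.
with_d=$(printf '%s\n' a b c | sed '$!N;P;d')

# Show each result on a single line for easy comparison.
printf 'with D: %s\n' "$(printf '%s' "$with_D" | tr '\n' ' ')"   # with D: a b c
printf 'with d: %s\n' "$(printf '%s' "$with_d" | tr '\n' ' ')"   # with d: a c
```

With d, the line that N had appended (b) is thrown away together with the rest of the pattern space, so P never gets a chance to print it.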

Now /foo\n/ does match, so the substitution is performed, giving

PATTERN SPACE: line2fooline3foo

Also, since the substitution was performed, the t test is true, so control jumps to the label begin, i.e. back to the beginning of the program. Nothing was printed or deleted, and the pattern space is still as we left it. Once again, $!N reads in another line, yielding

PATTERN SPACE: line2fooline3foo\nline4

Again /foo\n/ matches here, so the substitution is performed, giving

PATTERN SPACE: line2fooline3fooline4

The t test is true again, and we're back at the beginning. You'll notice that the pattern space is accumulating lines. $!N reads in a new line (since we're not at the end of the file yet):

PATTERN SPACE: line2fooline3fooline4\nline5foo

Now we don't have a match for /foo\n/, so the s is skipped, and t is false, which leads us to P and D. Not surprisingly, the first prints line2fooline3fooline4, and the second leaves the pattern space as follows:

PATTERN SPACE: line5foo

and starts a new cycle. $!N reads another line, giving

PATTERN SPACE: line5foo\nline6

We have a match, and the \n is removed, thus

PATTERN SPACE: line5fooline6

The t sends us once again to the beginning. Now we really are at the last line of input, so N is not executed and no line is read. There is no match for /foo\n/, so s is skipped and t does not branch; sed goes straight to P, which prints line5fooline6, and D, which deletes the whole pattern space since it contains no newline. Since there is no more input, the program terminates. The final output is thus

line1
line2fooline3fooline4
line5fooline6

which satisfies our initial requirements to join lines ending in "foo" with the next line. You can use a similar method to trace the actions performed by any sed program.
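The paper boxes can also be cross-checked against sed itself. One way, sketched here under the assumption that the sample input is recreated in a scratch file, is to insert the l command right after $!N: l prints the current pattern space in an unambiguous form, with an embedded newline shown as \n and the end marked by $. The variable names workdir and trace are made up for this example.

```shell
# Scratch directory (hypothetical name) holding the sample input.
workdir=$(mktemp -d)
printf '%s\n' line1 line2foo line3foo line4 line5foo line6 > "$workdir/file.txt"

# Same program as above, with an extra l after $!N to dump the pattern space.
trace=$(sed ':begin;$!N;l;/foo\n/s/\n//;tbegin;P;D' "$workdir/file.txt")
printf '%s\n' "$trace"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Lines ending in $ come from l and should match the PATTERN SPACE boxes drawn each time the program passes $!N; the remaining lines are the regular output printed by P.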

Uhm, I think sed is not for me...

Wait! If going through the gory details by hand hasn't scared you away from sed, you'll be happy to know that there is a sed debugger that automatically does exactly what we just did: it's called sedsed. It's a great piece of software and very easy to use, and you can find it here.
