OK, the title isn't the best one, but essentially the problem is this: I want to replace FOO with BAR, but only if FOO is (or is not) part of a text in brackets. Brackets are just an example (although a commonly occurring one); the point is that FOO must, or must not, be in a certain context. So, in this example:
abcd FOO efgh [this FOO is in brackets] ijkl FOO [another FOO in brackets]
The output should be either
abcd FOO efgh [this BAR is in brackets] ijkl FOO [another BAR in brackets]
or
abcd BAR efgh [this FOO is in brackets] ijkl BAR [another FOO in brackets]
depending on whether we want in-context or out-of-context replacement.
This is an interesting problem. There are a few different ways to approach it.
In-context replacement
In-context replacement is probably easier, so let's start with it.
awk
The idea with awk is to use match() repeatedly to find all the instances of the context, and perform the replacements only on them. In our example, the contexts are all the bracketed blocks, so:
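# incontext.awk
{
  newline = ""
  while (match($0, /\[[^]]*\]/) > 0) {
    # copy the text before the context unchanged
    newline = newline substr($0, 1, RSTART - 1)
    # extract the context and do the replacement in it
    context = substr($0, RSTART, RLENGTH)
    gsub(/FOO/, "BAR", context)
    newline = newline context
    # go on with the rest of the line
    $0 = substr($0, RSTART + RLENGTH)
  }
  newline = newline $0
  print newline
}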
The variable newline holds the changed line, which is built up gradually. Parts of the original line that lie outside any context are appended to newline unchanged, while each context is appended after FOO has been replaced with BAR in it. At the end, newline is printed.
Note that for simplicity we're NOT considering nested contexts (in the example, bracketed blocks containing bracketed subblocks), which quickly become very hard to parse using regular expressions. We're also deliberately ignoring the case where the context-delimiting characters can appear (perhaps escaped somehow) in places where they should not count as delimiters.
Let's build a simple test file (which will be used throughout the examples) and test the script:
$ cat sample.txt
abcd FOO efgh [this FOO is in brackets] ijkl FOO nmop [another FOO in brackets] blah
abcd FOO efgh [this FOO is in brackets] ijkl FOO mnop [another FOO in brackets] blah
efgh [normal text in brackets] ijkl mnop [another normal text] blah
FOO FOO [FOO FOO][FOO FOO]FOO FOO [FOO] [FOO]ijkl
$ awk -f incontext.awk sample.txt
abcd FOO efgh [this BAR is in brackets] ijkl FOO nmop [another BAR in brackets] blah
abcd FOO efgh [this BAR is in brackets] ijkl FOO mnop [another BAR in brackets] blah
efgh [normal text in brackets] ijkl mnop [another normal text] blah
FOO FOO [BAR BAR][BAR BAR]FOO FOO [BAR] [BAR]ijkl
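For comparison, here is the same match-and-rebuild loop as a Python sketch. It is a port for illustration only and not one of the tools discussed here. The function name replace_in_context and its old/new parameters are hypothetical. The context regex is the same bracket pattern, under the same no-nesting assumption. It handles one line at a time, so to mimic awk you would call it once per input line with the trailing newline stripped.

```python
import re

# A context is a bracketed block with no nesting, as in incontext.awk.
CONTEXT = re.compile(r"\[[^\]]*\]")


def replace_in_context(line: str, old: str = "FOO", new: str = "BAR") -> str:
    """Replace old with new only inside bracketed contexts of one line."""
    newline = ""
    rest = line
    # Like awk's while (match(...)) loop: find the next context in what's left.
    while (m := CONTEXT.search(rest)) is not None:
        # Text before the context is copied unchanged.
        newline += rest[: m.start()]
        # The context is copied with the replacement applied (awk's gsub).
        newline += m.group(0).replace(old, new)
        # Keep scanning after the context (awk's $0 = substr(...)).
        rest = rest[m.end():]
    # Whatever follows the last context is copied unchanged.
    return newline + rest
```

Applied to each line of sample.txt, it should produce the same output as the awk script.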
sed
With sed, we use a loop and keep replacing FOOs that appear in a context:
$ sed ':loop; s/\(\[[^]]*\)FOO\([^]]*\]\)/\1BAR\2/; t loop' sample.txt
abcd FOO efgh [this BAR is in brackets] ijkl FOO nmop [another BAR in brackets] blah
abcd FOO efgh [this BAR is in brackets] ijkl FOO mnop [another BAR in brackets] blah
efgh [normal text in brackets] ijkl mnop [another normal text] blah
FOO FOO [BAR BAR][BAR BAR]FOO FOO [BAR] [BAR]ijkl
Note that this will not work if the replacement string contains the text being replaced (i.e. FOO here): every substitution would create a new match, leading to an endless loop.
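The same substitution loop can be sketched in Python. Here re.subn() with count=1 acts like a single s/// without g, and the count it returns tells us whether to go round again, much as t does. This is an illustrative port and the function name is hypothetical. Unlike the sed version, it stops after max_rounds rounds (an arbitrary limit I chose) instead of looping forever when the replacement re-creates the pattern.

```python
import re


def replace_in_context_loop(line: str, old: str = "FOO", new: str = "BAR",
                            max_rounds: int = 1000) -> str:
    """Emulate sed ':loop; s/.../; t loop' for in-context replacement."""
    # Same shape as the sed regex: "[", non-"]" chars, old, non-"]" chars, "]".
    pattern = re.compile(r"(\[[^\]]*)" + re.escape(old) + r"([^\]]*\])")
    for _ in range(max_rounds):
        # One s/// without g: replace only the first match in this round.
        line, n = pattern.subn(lambda m: m.group(1) + new + m.group(2), line, count=1)
        if n == 0:
            # Nothing replaced: this is where sed's "t loop" stops jumping back.
            return line
    # If new contains old, every round re-creates a match; sed would loop
    # forever, while this sketch gives up after max_rounds rounds.
    raise RuntimeError("replacement keeps re-creating the pattern")
```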
Perl
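Perl is the most powerful of the bunch, so we can do the replacement directly on each matched context with the /e (eval) modifier, which makes Perl evaluate the replacement part as code. Here that code copies the matched text ($&) into $a, replaces every FOO with BAR in the copy, and returns the copy as the replacement: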
$ perl -pe 's/\[.*?\]/($a=$&)=~s%FOO%BAR%g;$a/eg' sample.txt
abcd FOO efgh [this BAR is in brackets] ijkl FOO nmop [another BAR in brackets] blah
abcd FOO efgh [this BAR is in brackets] ijkl FOO mnop [another BAR in brackets] blah
efgh [normal text in brackets] ijkl mnop [another normal text] blah
FOO FOO [BAR BAR][BAR BAR]FOO FOO [BAR] [BAR]ijkl
Also note that Perl's regular expressions can match contexts that would be difficult or impossible to match with standard awk/sed REs (think non-greedy quantifiers or lookaround). The example uses a simple context (brackets), so all the tools can handle it.
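Python's re.sub() accepts a function as the replacement. That is the closest equivalent of Perl's /e: the function receives each match and returns the text to put in its place. The sketch below shows that approach, with hypothetical function names. It also includes a lookahead-only variant of my own, which replaces a FOO when a "]" comes before any "[" further along the line. That variant relies on brackets being balanced and not nested.

```python
import re

# Perl's \[.*?\] : a "[", as little as possible, then "]".
CONTEXT = re.compile(r"\[.*?\]")


def replace_in_context_callback(line: str) -> str:
    # The function passed to sub() plays the role of Perl's /e code: it gets
    # each matched context and returns a copy with FOO replaced by BAR.
    return CONTEXT.sub(lambda m: m.group(0).replace("FOO", "BAR"), line)


# Lookahead variant: a FOO is inside brackets if, scanning forward, a "]"
# is reached before any "[". Assumes balanced, non-nested brackets.
FOO_IN_CONTEXT = re.compile(r"FOO(?=[^\[]*\])")


def replace_in_context_lookahead(line: str) -> str:
    return FOO_IN_CONTEXT.sub("BAR", line)
```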
Out-of-context replacement
This is a bit harder to accomplish, and in some cases we must resort to dirty tricks.
awk
Looking closely at the awk in-context solution, we see that the loop visits both the contexts and the out-of-context data, alternately. So all we need to do is perform the replacements on the out-of-context data instead of on the contexts, which makes the solution almost the same as the one for in-context replacement. The only extra step is a final gsub() on whatever is left after the last context, since that part is out of context too:
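# outofcontext.awk
{
  newline = ""
  while (match($0, /\[[^]]*\]/) > 0) {
    # do the replacement in the text before the context
    outofcontext = substr($0, 1, RSTART - 1)
    gsub(/FOO/, "BAR", outofcontext)
    newline = newline outofcontext
    # copy the context unchanged
    context = substr($0, RSTART, RLENGTH)
    newline = newline context
    $0 = substr($0, RSTART + RLENGTH)
  }
  # the rest of the line after the last context is out of context too
  gsub(/FOO/, "BAR")
  newline = newline $0
  print newline
}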
$ awk -f outofcontext.awk sample.txt
abcd BAR efgh [this FOO is in brackets] ijkl BAR nmop [another FOO in brackets] blah
abcd BAR efgh [this FOO is in brackets] ijkl BAR mnop [another FOO in brackets] blah
efgh [normal text in brackets] ijkl mnop [another normal text] blah
BAR BAR [FOO FOO][FOO FOO]BAR BAR [FOO] [FOO]ijkl
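A Python sketch of this variant differs from the in-context one only in which pieces get the replacement. Again, it is a hypothetical port with the same no-nesting assumption, and the function name is my own:

```python
import re

CONTEXT = re.compile(r"\[[^\]]*\]")


def replace_out_of_context(line: str, old: str = "FOO", new: str = "BAR") -> str:
    """Replace old with new only outside bracketed contexts of one line."""
    newline = ""
    rest = line
    while (m := CONTEXT.search(rest)) is not None:
        # The text before the context is out of context: replace there...
        newline += rest[: m.start()].replace(old, new)
        # ...and copy the context itself unchanged.
        newline += m.group(0)
        rest = rest[m.end():]
    # The tail after the last context is out of context too (the final gsub).
    return newline + rest.replace(old, new)
```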
sed
The idea here is that contexts are removed from the line and stored away, the replacement is done on what's left (which thus must be the out-of-context data), and finally the contexts are restored to their original positions. Of course, to "remember" where the removed contexts were, we need some sort of placeholder character.
So we put the contexts in the hold space (separated by an ASCII 1 character, \x1 in the script), and we use an ASCII 1 in the original line to mark each spot where a context has to be reinserted after the replacements.
# save line to hold space
h
# remove non-contexts (ie, leave only contexts separated by \x1)
s/^[^[]*\[/[/
s/\][^]]*$/]\x1/
s/\][^[]*\[/]\x1[/g
# swap hold/pattern space to get the original line in pattern space
x
# remove contexts (ie, leave only non-contexts separated by \x1)
s/\[[^]]*\]/\x1/g
# do the actual replacement
s/FOO/BAR/g
# append hold space to pattern space, this gives <patternspace>\n<holdspace> in pattern space
G
# reinsert contexts where they belong
:loop
s/\x1\(.*\)\n\([^\x1]*\)\x1/\2\1\n/
t loop
# remove leftover stuff
s/\n.*//
Not the most straightforward approach, but in cases like this sed is a bit limited. I probably wouldn't recommend using sed for this task.
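Python has no hold space, but the placeholder idea itself is easy to sketch. A list stands in for the hold space, and "\x01" marks where each context goes. As in the sed script, this sketch assumes that ASCII 1 never appears in the input or in the replacement text. The function name is hypothetical.

```python
import re

CONTEXT = re.compile(r"\[[^\]]*\]")
PLACEHOLDER = "\x01"  # ASCII 1; assumed never to occur in input or replacement


def replace_out_of_context_placeholder(line: str, old: str = "FOO",
                                       new: str = "BAR") -> str:
    # 1. Store the contexts away, in order (the sed script uses the hold space).
    contexts = CONTEXT.findall(line)
    # 2. Leave a placeholder where each context was.
    skeleton = CONTEXT.sub(PLACEHOLDER, line)
    # 3. Replace in what's left, which is only out-of-context data.
    skeleton = skeleton.replace(old, new)
    # 4. Put the contexts back, one per placeholder, left to right.
    stored = iter(contexts)
    return re.sub(PLACEHOLDER, lambda _m: next(stored), skeleton)
```

Compared with the sed script, the reinsertion step is simpler here: the stored contexts are just consumed in order, so no loop over the pattern space is needed.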
With a sed that supports EREs, such as GNU sed (which is probably needed anyway for the \x1 escapes in the script above), there is also the option of using a loop, similar to the in-context solution:
sed -r ':loop; s/((^|\])[^[]*)FOO([^[]*($|\[))/\1BAR\3/; t loop' sample.txt
This has the same problem as the in-context solution (the replacement can't contain the pattern), and also leads us directly to the Perl solution.
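Here is the same ERE loop sketched in Python, as an illustrative port with a hypothetical function name. The regex carries over almost unchanged; only `[^[]` is written `[^\[]` so it is unambiguous in Python. As before, an arbitrary round limit replaces sed's potential endless loop.

```python
import re

# The ERE from the sed -r command: FOO preceded (back to "^" or "]") and
# followed (up to "$" or "[") by text containing no "[".
OUT_OF_CONTEXT_FOO = re.compile(r"((^|\])[^\[]*)FOO([^\[]*($|\[))")


def replace_out_of_context_loop(line: str, max_rounds: int = 1000) -> str:
    for _ in range(max_rounds):
        # Replace one FOO per round, keeping groups 1 and 3 as in \1BAR\3.
        line, n = OUT_OF_CONTEXT_FOO.subn(r"\1BAR\3", line, count=1)
        if n == 0:
            return line
    raise RuntimeError("replacement keeps re-creating the pattern")
```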
Perl
With Perl, again, it's quite easy:
$ perl -pe 's/(?:^|\]).*?(?:$|\[)/($a=$&)=~s%FOO%BAR%g;$a/eg' sample.txt
abcd BAR efgh [this FOO is in brackets] ijkl BAR nmop [another FOO in brackets] blah
abcd BAR efgh [this FOO is in brackets] ijkl BAR mnop [another FOO in brackets] blah
efgh [normal text in brackets] ijkl mnop [another normal text] blah
BAR BAR [FOO FOO][FOO FOO]BAR BAR [FOO] [FOO]ijkl
Essentially the idea is the same as before, but this time we match all the out-of-context parts (that is, from either the beginning of the line or "]" to either the end of the line or "[") and do the replacement only in those.
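Finally, the Perl one-liner translates almost directly to Python using re.sub() with a function as the replacement. This is a sketch, and the function name is hypothetical. As in Perl, `.*?` is non-greedy, and without re.MULTILINE `^` matches only at the start of the string. So each match runs from the line start or a "]" to the nearest "[" or the line end.

```python
import re

# Perl's (?:^|\]).*?(?:$|\[) : from line start or "]" to the nearest "[" or line end.
OUT_OF_CONTEXT = re.compile(r"(?:^|\]).*?(?:$|\[)")


def replace_out_of_context_callback(line: str) -> str:
    # As with /e, each out-of-context piece is handed to a function that
    # returns it with FOO replaced by BAR.
    return OUT_OF_CONTEXT.sub(lambda m: m.group(0).replace("FOO", "BAR"), line)
```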