This usually comes up in a form similar to "I'm running this code:
$ echo "foobar 123" | sed 's/[0-9]*/blah/' blahfoobar 123
but I don't get the expected result!" (which they think should be "foobar blah"). The result they get is instead as shown above.
Why is that? Remember that the star quantifier "*" means "zero or more" times of whatever is quantified. So in particular, "zero times" is perfectly fine. As a matter of fact, there's a match for "zero times [0-9]" just at the beginning of our string, and that's what sed replaces. Yes, it's a zero-length match (but, can a match of "zero times" something be non-zero-length?). Yet, it's still a perfectly valid match.
But there's more. By convention, there's a match for "zero times anything" between any two characters of the string, and also another one at the end:
$ echo "foobar" | sed 's/[0-9]*/BLAH/g' BLAHfBLAHoBLAHoBLAHbBLAHaBLAHrBLAH
So why, in the original example, does sed replace the first match even if there is a longer match later in the string (that is, "123")? Shouldn't quantifiers be greedy and always find the longest match?
Sure quantifiers are greedy, but greediness matters only when multiple matches are possible at the same position. So if our pattern is
But this is not the case in our original example. In it, there's no contention: there is only a single possible match at the beginning of the string, and the regex engine is content with it.
The usual (although somewhat simplistic) description for this behavior is "leftmost longest match", so stop at the first match that is found starting from the left, and if at that position multiple matches are possible, take the longest one (the aforementioned greediness). Our sed took the leftmost match, which is also the longest at that position, even if it has length zero.
Another pitfall which results from the same misconception is for example grepping for '[0-9]*' and wondering why all the input lines are printed. It's because they all match the pattern.
To sum it up, what people really mean when they write this code is usually "one or more", not "zero or more", so they really want this:
$ echo "foobar 123" | sed 's/[0-9][0-9]*/blah/' foobar blah
There are other ways to express "one or more". With BREs (Basic Regular Expressions, those used by sed or grep) it is possible to say
or, with GNU tools, also
With ERE (Extended Regular Expressions) tools, like awk or grep -E, "one or more" can be expressed as