This is one of the all-time most frequently asked questions.
"How do I run a command on all .txt files (or .cpp, or .html, or whatever) in a big directory hierarchy"? Most often, the command is an editing command like sed, so the requester also wants to edit the files "in-place" (but of course he says that only after you've already given him a generic solution and he found it doesn't do what he wanted).
The general way to run a command on a directory hierarchy is to use find:
$ find /basedir -type f -name '*.txt' -exec something {} \;
The above will run the command something on every .txt file in the hierarchy; the {} is replaced by the name of the file being processed. For example, to list the .txt files that contain a certain pattern:
$ find /basedir -type f -name '*.txt' -exec grep -l 'foobar[0-9]*' {} \;   # list of matching files follows...
Now, the above command runs grep once for each file found. If you have thousands of .txt files, that means an equivalent number of processes is spawned, one after another, which is slow and wasteful. The classic workaround is to pipe the list of files to xargs, which packs as many file names as possible into each invocation of the command.
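A sketch of that approach, using the -print0/-0 pair so that file names containing spaces or other odd characters are passed safely:

$ find /basedir -type f -name '*.txt' -print0 | xargs -0 grep -l 'foobar[0-9]*'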
Now it turns out that find has its own mechanism to do internally what xargs does; to enable it, just replace the semicolon that terminates -exec with a plus sign (+):
$ find /basedir -type f -name '*.txt' -exec grep -l 'foobar[0-9]*' {} +
Now grep will be invoked with as many arguments as possible each time (within the system's limit for the maximum command line length); so instead of spawning hundreds or thousands of processes like before, we're now running only a few, possibly just one. This is much better.
And you're not limited to just a single command: if you exec a shell, you can run arbitrary shell code on the files:
$ find /basedir -type f -name '*.txt' -exec sh -c 'some code here' sh {} +
The "sh" after the code is a placeholder; this is what the spawned shell will see as its $0 (as per the manual). Since we're using "+", the {} is then turned into as many arguments as possible, so in the shell code you can use "$@" to refer to them, or loop over them using for, etc.
If the code in single quotes becomes long and complex, you can of course put it in a file and then do -exec script.sh {} +
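As a sketch, such a script (script.sh is of course just a placeholder name) might look like the following; remember to make it executable with chmod +x, and give its path (e.g. ./script.sh) if it is not in your PATH:

#!/bin/sh
# The batch of file names arrives as the positional parameters.
for f in "$@"; do
    grep -l 'foobar[0-9]*' "$f"    # replace with whatever processing you actually need
done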
Now for the following question..."but I'm running sed and I want to change the files in place". Ok, you know that sed has an option for in-place editing, -i (at least in GNU sed), so:
$ find /basedir -type f -name '*.txt' -exec sed -i 's/foo/bar/g' {} +
Now read carefully: if something goes wrong, or your sed code does something you didn't intend (yes, it happens!), the above command can make a mess of your files in a way that may be impossible to recover from. You have been warned. When playing with commands like this it's extremely easy to end up with, say, 2500 files changed in an unintended but irreversible way. This is why you should either make a backup of the whole hierarchy before attempting anything, or specify a backup extension to sed's -i option so that the original files are preserved.
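For example, with GNU sed you can attach a suffix to -i (the .bak extension here is just one possible choice), so each original file is kept alongside the modified one:

$ find /basedir -type f -name '*.txt' -exec sed -i.bak 's/foo/bar/g' {} +

Once you've checked the results, the backups can be removed, for instance with find's -delete (where supported):

$ find /basedir -type f -name '*.txt.bak' -delete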
For an excellent page on find, see Greg's wiki.
Sometimes the command you want to run is compute-intensive. If you have access to several computers, GNU Parallel (http://www.gnu.org/software/parallel/) makes it possible to have those computers help with the computation. E.g. convert all .wav files to .mp3 using the local computer plus computer1 and computer2, running one job per CPU core:
$ find . -type f -name '*.wav' | parallel -j+0 --trc {.}.mp3 -S :,computer1,computer2 "lame {} -o {.}.mp3"
To learn more, watch the intro video for GNU Parallel: http://www.youtube.com/watch?v=OpaiGYxkSuQ