I figured this was worth sharing because I myself had written two (fairly lame) clones of this program before I discovered it.
Sometimes I find myself composing and running huge shell scripts, like the following:
$ cat process-files.sh
sox input/foo.ogg output/foo.ogg channels 1
sox input/bar.ogg output/bar.ogg channels 1
sox input/baz.ogg output/baz.ogg channels 1
sox input/quux.ogg output/quux.ogg channels 1
# more of the same, for perhaps hundreds of lines...
(Aside: why not xargs? For complicated tasks, it can be error-prone or just plain insufficient. Moreover, there's a lot of value in being able to just look at the script and see exactly what is going to be executed on your behalf, especially for one-off tasks. If you know emacs macros, scripts like this are not onerous at all to generate anyway.)
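If emacs macros aren't your thing, a short loop can generate the same kind of script. What follows is only a minimal sketch: gen_script is a hypothetical helper name, and it assumes the inputs end in .ogg and that the filenames contain no whitespace or shell metacharacters, because the generated lines are not quoted.

```shell
# gen_script is a hypothetical helper: it prints one sox command per
# .ogg file found in the given input directory.
# Assumption: filenames contain no whitespace or shell metacharacters,
# since the generated lines are not quoted.
gen_script() {
    in_dir=$1
    out_dir=$2
    for f in "$in_dir"/*.ogg; do
        [ -e "$f" ] || continue   # the glob matched nothing
        name=${f##*/}             # strip the directory part
        printf 'sox %s %s channels 1\n' "$in_dir/$name" "$out_dir/$name"
    done
}

# Usage: gen_script input output > process-files.sh
```

The result is still a plain file you can read through before running anything.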
If you have a sequence of tasks like this that can run independently (and they are CPU-bound), then it pays to distribute the tasks over all your CPU cores. Here's where GNU Parallel comes in handy. Just pipe into it the commands you want to execute:
$ parallel -j4 < process-files.sh
Now parallel runs up to 4 tasks concurrently, starting up a new one when each one finishes (just as if you had a queue and a pool of 4 workers). What an elegant interface.
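To make the queue-and-workers picture concrete, here is a rough approximation of that model in plain bash. It is not how GNU Parallel is implemented and is no substitute for it. run_pool is a hypothetical name, and the sketch assumes bash 4.3 or newer (for `wait -n`) and a script with one complete command per line.

```shell
#!/usr/bin/env bash
# run_pool is a hypothetical helper illustrating the "queue plus N workers"
# model in plain bash. It is NOT how GNU Parallel is implemented.
# Assumptions: bash >= 4.3 (for `wait -n`), one complete command per line.
run_pool() {
    local max_jobs=$1 script=$2 running=0 line
    while IFS= read -r line; do
        # Skip blank lines and comment lines.
        [[ -z $line || $line == '#'* ]] && continue
        # Pool is full: block until some job finishes, freeing a slot.
        if (( running >= max_jobs )); then
            wait -n || true
            running=$(( running - 1 ))
        fi
        bash -c "$line" < /dev/null &   # start the next queued command
        running=$(( running + 1 ))
    done < "$script"
    wait   # let the remaining jobs drain
}

# Usage: run_pool 4 process-files.sh
```

The counter never lets more than the requested number of jobs run at once, though it can occasionally run fewer. Exit statuses are ignored here.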
GNU Parallel has a bunch of more advanced features that are worth checking out, for example, preserving the proper ordering of standard output across tasks (to maintain the illusion of sequential-ness), or showing an ETA.
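The two features just mentioned correspond to the --keep-order (-k) and --eta options. A small sketch, assuming GNU Parallel is installed and reusing process-files.sh from above; run_ordered is a hypothetical wrapper name:

```shell
# run_ordered is a hypothetical wrapper around GNU Parallel.
#   --keep-order (-k): print each task's output in input order,
#                      even when later tasks finish first
#   --eta:             show an estimate of the remaining time
run_ordered() {
    parallel -j4 --keep-order --eta < "$1"
}

# Usage: run_ordered process-files.sh
```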
GNU Parallel is not in the official Debian/Ubuntu repos (as far as I can tell) but it is a snap to build from source, and it's the sort of thing I'd want floating around in my ~/bin everywhere I work.
Consider:
cd input
ls | parallel -j+0 sox {} ../output/{} channels 1
This would save you making the huge process-files script.
Thanks! I sometimes use xargs and the xargs-style syntax for parallel, but in the general case I sometimes fall back to a Huge Shell Script. For example, if you want to do something like
convert foo.jpg [...] foo_small.jpg
then I don't know whether it's simple (or possible) to automate that using the xargs-style syntax.
xargs cannot, but GNU Parallel can:
parallel convert {} {.}_small.jpg ::: *.jpg
Wow, that's really neat. Thanks!
You may also have a look at http://sf.net/projects/paexec
This tool does similar tasks (distributes your tasks over CPUs or hosts) but in a different way. It also provides features absent in GNU parallel.
Thanks for this post.
Just a note for those who might accidentally install the 'moreutils' package on Ubuntu: this package provides a *different* 'parallel' binary that is not GNU Parallel. It has different syntax and generally won't work as described here.
It is 2013 and GNU Parallel is as easy as 'apt-get install parallel -y' in Debian Squeeze.
I have prepared a shell script to check URL status, but when I executed it to check 1000 URLs, it took at least 30 minutes to finish.
Could you please help me out? Is there a way I can reduce the run time?