Why I love sed (and cygwin)

sed is a wonderful tool that can help a developer a lot with various housekeeping tasks. I like to use it for two things: regular-expression replacements across multiple files, and renaming multiple files.

But using more complex expressions becomes cumbersome when you want to use sed from a terminal, because the special characters used in the regular expressions for sed overlap with the bash/sh special characters.

Let’s say that we have a set of folders that have the names all messed up: they are dated with years and the year is placed at the back. So we would use a script like:
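The listing itself is missing from this copy of the post. Judging by the command quoted in the first comment below, it was presumably close to the following sketch; the scratch directory and the sample folder names are assumptions added here so that it can be tried safely.

```shell
# Scratch directory with sample folders (hypothetical names, illustration only)
demo=$(mktemp -d)
mkdir "$demo/foo2012" "$demo/bar1999"

(
  cd "$demo" || exit 1
  # One folder name per line; each name is rewritten by sed and passed to mv
  ls | while read line ; do
    mv "$line" "`echo $line | sed 's/\(.*\)\([12][90][0-9][0-9]\)/\2 \1/g'`"
  done
)

ls "$demo"
```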

I know that this looks scary, especially the three types of quotes in a row. But in reality it’s quite simple: the idea is to pass each line through a sed filter that moves the year to the front. So we use the expression:

    s/\(.*\)\([12][90][0-9][0-9]\)/\2 \1/g

to say that the first token is everything before the trailing year, and the second token is the year itself; we place the tokens in parentheses, escaped with backslashes so that sed treats them as grouping rather than as literal ( and ) characters. The expression is placed between straight apostrophes (') so that the shell does not expand anything in it (otherwise we’d have to double the backslashes as well as escape the parentheses).
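One way to check such an expression before renaming anything is to pipe a few sample names through it. The following is only a small sketch: the variable name `year_first` and the sample names are made up for illustration, and the expression is the one quoted in the first comment below.

```shell
# The expression, single-quoted so the shell leaves the backslashes alone
year_first='s/\(.*\)\([12][90][0-9][0-9]\)/\2 \1/g'

# Sample folder names (hypothetical); nothing is renamed here
for name in 'foo2012' 'bar1999' 'barfoo'; do
  printf '%s -> %s\n' "$name" "$(echo "$name" | sed "$year_first")"
done
# foo2012 -> 2012 foo
# bar1999 -> 1999 bar
# barfoo -> barfoo   (no year, so the name is left unchanged)
```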

To feed the original input we break it down into lines (one file per line), and each line is fed to the sed filter we wrote. But mv takes the folder names as command-line arguments rather than from standard input, so we keep each name in a shell variable and capture the output of the sed expression with backticks, which run a command and substitute its output.
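As a small illustration of that capture step, here is the substitution on its own, outside the loop. The folder name is a made-up sample, and the `$(...)` form is shown only as the equivalent POSIX spelling of the backticks, not as something the original command used.

```shell
line='foo2012'   # hypothetical folder name

# Backticks run the pipeline and substitute its output
new_name=`echo "$line" | sed 's/\(.*\)\([12][90][0-9][0-9]\)/\2 \1/g'`

# $(...) is the equivalent POSIX form, which is easier to nest
same_name=$(echo "$line" | sed 's/\(.*\)\([12][90][0-9][0-9]\)/\2 \1/g')

echo "$new_name"   # prints: 2012 foo
```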

Finally, the double quotes are needed because the input and output may contain spaces (and in fact, the output will surely contain spaces).
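The effect of the quotes can be seen by counting the arguments a command actually receives. This is a minimal sketch that assumes the default IFS; `count_args` is a hypothetical helper defined only for the demonstration.

```shell
new_name='2012 foo'   # hypothetical renamed folder, containing a space

# Hypothetical helper: prints how many arguments it was called with
count_args() { echo $#; }

count_args $new_name     # unquoted: the shell splits on the space, prints 2
count_args "$new_name"   # quoted: the name stays one argument, prints 1
```

Without the quotes, mv would see two separate names instead of one folder name with a space in it.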

This command can be simplified by using a sed script: we can place the expression in a file, ‘input.sed’. Then we can change the command to:
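The changed command is also missing from this copy of the post. A plausible version, assuming the script file is passed to sed with its standard `-f` option, is sketched below; the scratch directory layout and the sample folder names are assumptions made so the example can run on its own.

```shell
# Scratch area: input.sed sits next to the folders directory, not inside it
work=$(mktemp -d)
mkdir -p "$work/folders/foo2012" "$work/folders/bar1999"

# The script file holds only the expression, with no shell quoting around it
printf '%s\n' 's/\(.*\)\([12][90][0-9][0-9]\)/\2 \1/g' > "$work/input.sed"

(
  cd "$work/folders" || exit 1
  ls | while read line ; do
    mv "$line" "`echo $line | sed -f ../input.sed`"
  done
)

ls "$work/folders"
```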

This is a clearer statement, and it’s simpler to reuse the replacement expression.

Tools like this are why I always keep the bash shell and the UNIX text tools around; on Windows you can use the Red Hat-owned Cygwin project. And to add a bit of spice, the terminal window of Cygwin does resize properly, unlike the terminal application that Microsoft delivers (which only maximizes to 80 columns). And, of course, you’d have the flexibility of a proper shell.

Comments

  1. I pretty much gave up on using the classic Unix tools (sed, awk, etc.) when I realised I can do the same with a single tool, perl (yeah, “If the only tool you have is a hammer, you tend to see every problem as a nail”).

    Using your approach (which attempts to rename ALL the files from a folder) I get (on an Ubuntu box):
    > ls -1
    2011 
    bar1999
    barfoo
    foo2012
    > ls | while read line ; do
    >   mv "$line" "`echo $line | sed 's/\(.*\)\([12][90][0-9][0-9]\)/\2 \1/g'`"
    > done
    mv: cannot stat `2011': No such file or directory
    mv: `barfoo' and `barfoo' are the same file
    sqa-nx01_vati> ls -1
    1999 bar
    2011 
    2012 foo
    barfoo
    >

    A cleaner way (hey, at least for me!) would be:
    > ls -1
    2011 
    bar1999
    barfoo
    foo2012
    > ls | perl -lne 'if (/(.+?)([12][90][0-9][0-9])$/) { print qq(mv "$_" "$2 $1"); }' | ksh
    > ls -1
    1999 bar
    2011 
    2012 foo
    barfoo
    >

    Another good thing about the above perl way is that I can postpone the final pipe to the shell (‘ | ksh’) and first construct the perl one-liner to make sure I’ll get the wanted effect.

  2. Right, forgot about the perl built in ‘rename’, no need to shell out 🙁

    Anyway, the idea was that these days I find myself using mostly just perl one-liners, sometimes in combination with ‘ack’ (http://betterthangrep.com/ – BTW, a great tool which I would highly recommend if you aren’t already using it!), to generate on-the-fly scripts for various tasks and mass changes on directory trees, hence my “itch” to comment on this post 😉

    • 😛 Yup, it’s nice anyway that we can have various methods for this; all of them pretty much valid. I like the sed version because I know it’s easy to test the expression before the rename.