Geeky Muse ([personal profile] delladea) wrote in [site community profile] dw_dev_training 2012-02-03 01:52 pm

Strip tab characters from multiple text files and replace them with spaces (or something else)

I switch between Gedit, Notepad++, and vim fairly often depending on what I'm doing and whose computer I'm on. Sometimes I end up with tab characters where I really wanted four spaces, mainly when I'm using vim and I haven't figured out how to get vim to not do this. Gedit and Notepad++ have settings to use spaces instead of tabs, so there's no issue there.

Either I don't notice the tab characters until after I've put lots of them in the file I'm editing, or I'm editing a file from someone else whose editor uses tab characters for indentation. I know it's not a big deal to some people, but tab indentation mixed with space indentation is a huge pet peeve of mine.

Thus, a perl script was born:


View Gist (strip-tabs.pl)

Feel free to gank away if you find it useful!

[staff profile] mark 2012-02-03 08:34 pm (UTC)(link)
Cool, thanks for sharing!

Another way of doing this is using sed, like this:

sed -ie 's/\t/    /g' yourfile.pl

That will replace all tabs with four spaces in yourfile.pl. You can do it in all files of a certain type with something like this:

find . -name \*.pl -exec sed -ie 's/\t/    /g' {} \;

That will find all Perl files (starting in the current directory and going down, so it'll recurse into any subdirectories) and replace the tabs with spaces.
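Two caveats on the sed commands above, as far as I can tell: sed reads `-ie` as `-i` with a backup suffix of `e`, so each edited file leaves a copy behind (for example `yourfile.ple`), and the BSD sed shipped with macOS may not treat `\t` in a pattern as a tab. A minimal sketch that sidesteps both, assuming a POSIX shell; the scratch directory and `demo.pl` are made up for the demonstration:

```shell
#!/bin/sh
# Hypothetical demo file in a scratch directory, so nothing real is touched.
tmpdir=$(mktemp -d)
printf 'if (1) {\n\tprint "hi";\n}\n' > "$tmpdir/demo.pl"

# Put a literal tab character in a shell variable instead of relying on \t.
tab=$(printf '\t')

# -i.bak edits in place and keeps the original as demo.pl.bak.
sed -i.bak -e "s/$tab/    /g" "$tmpdir/demo.pl"

cat "$tmpdir/demo.pl"
```

Attaching the suffix directly to `-i` is the form that both GNU and BSD sed should accept; once you are happy with the result, the `.bak` files can be deleted.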

These work from a Linux/Mac command line. Windows, you're out of luck.
Edited 2012-02-03 20:34 (UTC)

[personal profile] shadowspar 2012-02-04 01:01 am (UTC)(link)

There's definitely a lot of ways to accomplish any given task in the *nix environment. One of the Perl mottos is "TMTOWTDI" (There's More Than One Way To Do It), but in a lot of ways that's really true of the Unix culture that Perl grew out of. =)

The settings you're looking for in vim are expandtab (convert tabs to spaces), tabstop (how many spaces a tab counts for), and shiftwidth (how many spaces an indent should be). In my case, I want a four space indent using spaces not tabs, so I have this line in my .vimrc file:

    set expandtab tabstop=4 shiftwidth=4

This article in the vim wiki elaborates a bit more.

Thanks for posting this! Tabs in source files are the bane of my existence too. ^_^;


[personal profile] momijizukamori 2012-02-04 01:16 am (UTC)(link)
Gedit and Notepad++ have settings to use spaces instead of tabs, so there's no issue there.

I did not know this setting existed in gedit until right now. Congratulations, you have just made my day more magical.

[personal profile] shadowspar 2012-02-04 03:17 pm (UTC)(link)
Yaaaay, so glad to hear! \o/ I think there's a way to do almost anything in vim, but sometimes the path to get there can be a little bit arcane. =)

[personal profile] pne 2012-02-06 10:09 am (UTC)(link)
Another way of doing this is using sed, like this:

sed -ie 's/\t/    /g' yourfile.pl


Perl also has an -i switch (I think due, in part, to its originally having to compete with the established sed).

So you could boil down the script to something like perl -i -pe "s/\t/    /g" file1 file2 file3 .
Edited 2012-02-06 10:10 (UTC)

[personal profile] pne 2012-02-06 10:21 am (UTC)(link)
Would you be interested in some comments on the Perl code and suggestions for other ways to do things?

[personal profile] pne 2012-02-06 02:12 pm (UTC)(link)
OK, here are the two big thoughts I had.

$num_args = $#ARGV + 1;

An array in scalar context evaluates to the number of its elements, so this could be $num_args = @ARGV; instead.

I like to separate $#foo and scalar @foo, and use the former only in contexts where it means "index of the last entry" (for example, in a for loop iterating over the indices of an array) and the latter when I want a number of elements.

(Also, $#foo is sensitive to setting $[, but you shouldn't mess with that variable anyway.)

The other one is that "iterating over the files in @ARGV" is such a common use case that Perl has a shortcut for this.

If you read from the empty filehandle (as in while (<>) with nothing in between the angle brackets), you'll get a line at a time from all of the files in succession. Perl will automatically handle opening them and closing them for you. And if you didn't supply any file names, Perl will read from standard input. (This is a bit like Unix tools such as gzip or grep which will also work on standard input if there are no file name arguments.) See http://perldoc.perl.org/perlop.html#I/O-Operators for more on this. (That also mentions that you can find out which file you're currently on by examining $ARGV, which the magic will set for you appropriately on each new file.)

And if you don't assign <> to anything in the while loop, it'll automatically assign to $_ - which is the default thing that s/// operates on, which can be handy. It's also the default operand for lots of other operations.

So if you were just reading from the files, you could replace the whole "# Have files to parse..." loop with:

while (<>) {
  s/\t/    /g;
}


That would just be missing the printing of the changed line and the editing behaviour.

In the one-liner I suggested, these are supplied by the -p and -i command-line switches, respectively; see http://perldoc.perl.org/perlrun.html for more information on those.

You can also turn on -i inside the program by assigning to the magic variable $^I.

[personal profile] foxfirefey 2012-02-07 12:54 am (UTC)(link)
There's a [community profile] command_liners community, if you like.

[personal profile] pne 2012-02-08 07:56 pm (UTC)(link)
Clearly I have a lot to learn still.

Well, there's a lot to learn, but not everyone is expected to know everything :)

Plus, There's More Than One Way To Do It (TMTOWTDI, tim-toady) in Perl.

It seems like it would be worth my while to also learn sed.

It depends, but having more tools in one's personal toolbox is often useful.

If you do learn sed, getting to know the basics of awk may also be useful. (And grep, if you don't know it already, as well as find and xargs, which are useful in connection with it, though less necessary if you have GNU grep.)
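For instance, find, xargs, and grep can be combined to list which files still contain tabs before you rewrite anything. A sketch under the assumption that your find and xargs support `-print0` and `-0` (GNU and BSD versions do); the demo files are made up:

```shell
#!/bin/sh
# Hypothetical files: one indented with a tab, one with spaces.
tmpdir=$(mktemp -d)
printf '\tuses a tab\n' > "$tmpdir/tabbed.pl"
printf '    uses spaces\n' > "$tmpdir/spaced.pl"

# A literal tab to search for.
tab=$(printf '\t')

# find lists the Perl files, xargs hands them to grep, and grep -l prints
# only the names of files that contain a match.
found=$(cd "$tmpdir" && find . -name '*.pl' -print0 | xargs -0 grep -l "$tab")
echo "$found"

# With GNU grep the same search should fit in one command:
#   grep -rl --include='*.pl' "$tab" .
```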