Geeky Muse (delladea) wrote in dw_dev_training, 2012-02-03 01:52 pm
Strip tab characters from multiple text files and replace them with spaces (or something else)
I switch between Gedit, Notepad++, and vim fairly often depending on what I'm doing and whose computer I'm on. Sometimes I end up with tab characters where I really wanted four spaces, mainly when I'm using vim and I haven't figured out how to get vim to not do this. Gedit and Notepad++ have settings to use spaces instead of tabs, so there's no issue there.
Either I don't notice the tab characters until after I've put lots of them in the file I'm editing, or I'm editing a file from someone else whose editor uses tab characters for indentation. I know it's not a big deal to some people, but tab indentation mixed with space indentation is a huge pet peeve of mine.
Thus, a perl script was born:
View Gist (strip-tabs.pl)
Feel free to gank away if you find it useful!

no subject
$num_args = $#ARGV + 1;
An array in scalar context evaluates to the number of its elements, so this could be
$num_args = @ARGV;
instead.
I like to separate $#foo and scalar @foo, and use the former only in contexts where it means "index of the last entry" (for example, in a for loop iterating over the indices of an array) and the latter when I want a number of elements.
(Also, $#foo is sensitive to setting $[, but you shouldn't mess with that variable anyway.)
The other one is that "iterating over the files in @ARGV" is such a common use case that Perl has a shortcut for this.
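Before getting to that shortcut, the count-versus-last-index distinction from the first point can be checked directly. This is only a small sketch; count_and_last is a made-up helper name for illustration, not something from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the number
# of elements in a list and the index of its last element.
sub count_and_last {
    my @list  = @_;
    my $count = @list;    # array in scalar context: number of elements
    my $last  = $#list;   # index of the last element (count minus one)
    return ($count, $last);
}

my ($count, $last) = count_and_last('a.txt', 'b.txt', 'c.txt');
print "count=$count, last index=$last\n";               # count=3, last index=2

my ($empty_count, $empty_last) = count_and_last();
print "count=$empty_count, last index=$empty_last\n";   # count=0, last index=-1
```

The empty case shows why the two are worth keeping apart: the count is 0, while the last index is -1.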
If you read from the empty filehandle (as in while (<>) with nothing in between the angle brackets), you'll get a line at a time from all of the files in succession. Perl will automatically handle opening them and closing them for you. And if you didn't supply any file names, Perl will read from standard input. (This is a bit like Unix tools such as gzip or grep which will also work on standard input if there are no file name arguments.) See http://perldoc.perl.org/perlop.html#I/O-Operators for more on this. (That also mentions that you can find out which file you're currently on by examining $ARGV, which the magic will set for you appropriately on each new file.)
And if you don't assign <> to anything in the while loop, it'll automatically assign to $_ - which is the default thing that s/// operates on, which can be handy. It's also the default operand for lots of other operations.
So if you were just reading from the files, you could replace the whole "# Have files to parse..." loop with:
while (<>) { s/\t/ /; }
That would just be missing the printing of the changed line and the editing behaviour.
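To see the empty filehandle at work without touching any real files, here is a rough sketch that reads a temporary file through <> and collects the changed lines. The helper name expand_tabs_in, the four-space replacement, and the /g flag (so every tab on a line is replaced, not only the first) are my assumptions, not taken from the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read every line of the named files through the
# empty filehandle and return the lines with tabs replaced by spaces.
sub expand_tabs_in {
    my @files = @_;
    return unless @files;        # with no files, <> would read standard input
    my @out;
    local @ARGV = @files;        # <> works through whatever is in @ARGV
    while (<>) {                 # each line lands in $_
        s/\t/    /g;             # assumption: four spaces for every tab
        push @out, "$ARGV: $_";  # $ARGV is the name of the file being read
    }
    return @out;
}

# Demo on a throwaway file.
my ($fh, $name) = tempfile(UNLINK => 1);
print {$fh} "a\tb\n\tc\n";
close $fh;

print expand_tabs_in($name);
```

This only prints the changed lines; the files themselves are left as they were.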
In the one-liner I suggested, these are supplied by the -p and -i command-line switches, respectively; see http://perldoc.perl.org/perlrun.html for more information on those.
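For a script rather than a one-liner, the effect of those two switches can be approximated as in the sketch below. It sets $^I, the in-program counterpart of -i, and prints each line back explicitly, which is what -p would do. The helper name expand_tabs_in_place, the '.bak' backup suffix, and the four-space replacement are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: rewrite the named files in place, replacing tabs.
sub expand_tabs_in_place {
    my @files = @_;
    return unless @files;    # with no files, <> would read standard input
    local $^I   = '.bak';    # like -i.bak: edit in place, keep a backup copy
    local @ARGV = @files;
    while (<>) {
        s/\t/    /g;         # assumption: four spaces for every tab
        print;               # like -p: output goes to the file being edited
    }
    return;
}

# Demo on a throwaway file.
my ($fh, $name) = tempfile(UNLINK => 1);
print {$fh} "if (1) {\n\treturn;\n}\n";
close $fh;

expand_tabs_in_place($name);

open my $in, '<', $name or die "Cannot reopen $name: $!";
print while <$in>;           # the tab before "return;" is now four spaces
close $in;
unlink "$name.bak";          # remove the backup that $^I created
```

Keeping a backup suffix is the cautious choice while testing; once you trust the substitution you may not want the extra files lying around.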
You can also turn on -i inside the program by assigning to the magic variable $^I.
no subject
I haven't had the chance to revisit this yet, but I wanted to thank you for taking the time to reply. Clearly I have a lot to learn still. It seems like it would be worth my while to also learn sed.
no subject
Well, there's a lot to learn, but not everyone is expected to know everything :)
Plus, There's More Than One Way To Do It (TMTOWTDI, tim-toady) in Perl.
> It seems like it would be worth my while to also learn sed.
It depends, but having more tools in one's personal toolbox is often useful.
If you do learn sed, getting to know the basics of awk may also be useful. (And grep, if you don't know it already, as well as find and xargs, which are useful in connection with it, though less necessary if you have GNU grep.)