Geeky Muse (delladea) wrote in dw_dev_training, 2012-02-03 01:52 pm
Strip tab characters from multiple text files and replace them with spaces (or something else)
I switch between Gedit, Notepad++, and vim fairly often depending on what I'm doing and whose computer I'm on. Sometimes I end up with tab characters where I really wanted four spaces, mainly when I'm using vim and I haven't figured out how to get vim to not do this. Gedit and Notepad++ have settings to use spaces instead of tabs, so there's no issue there.
Either I don't notice the tab characters until after I've put lots of them in the file I'm editing, or I'm editing a file from someone else whose editor uses tab characters for indentation. I know it's not a big deal to some people, but tab indentation mixed with space indentation is a huge pet peeve of mine.
Thus, a Perl script was born:
View Gist (strip-tabs.pl)
Feel free to gank away if you find it useful!

no subject
Another way of doing this is using sed, like this:
sed -ie 's/\t/    /g' yourfile.pl
That will replace all tabs with four spaces in yourfile.pl. You can do it in all files of a certain type with something like this:
find . -name \*.pl -exec sed -ie 's/\t/    /g' {} \;
That will find all Perl files (starting in the current directory and going down, so it'll recurse into any subdirectories) and replace the tabs with spaces.
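These work from a Linux command line with GNU sed. Two catches: GNU sed reads -ie as -i with a backup suffix of e, so each original is kept as a copy with an e on the end of its name (yourfile.ple), and the stock sed on a Mac doesn't treat \t as a tab. Windows, you're out of luck.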
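For anyone whose sed doesn't handle \t or -i the same way, here is a minimal sketch of a more portable variant. It gets a literal tab from printf and writes through a temporary file instead of editing in place. The function name strip_tabs is made up for illustration, and it assumes mktemp is available.

```shell
#!/bin/sh
# Hypothetical helper: replace every tab in one file with four spaces.
# A literal tab from printf stands in for the \t escape, and a temporary
# file stands in for -i, since both of those vary between sed versions.
strip_tabs() {
    tab=$(printf '\t')
    tmp=$(mktemp) || return 1
    # Only overwrite the original if sed succeeded; writing back with cat
    # keeps the original file's permissions.
    sed "s/$tab/    /g" "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (not run here):
#   strip_tabs yourfile.pl
#   for f in *.pl; do strip_tabs "$f"; done
```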
no subject
I spent many years as a Windows client-server application developer in VB6 and VB.NET. One-liners don't exist there, and so usually my brain doesn't go there! I'm trying to broaden my horizons though *g*.
no subject
There are definitely a lot of ways to accomplish any given task in the *nix environment. One of the Perl mottos is "TMTOWTDI" (There's More Than One Way To Do It), but in a lot of ways that's really true of the Unix culture that Perl grew out of. =)
The settings you're looking for in vim are expandtab (convert tabs to spaces), tabstop (how many spaces a tab counts for), and shiftwidth (how many spaces an indent should be). In my case, I want a four-space indent using spaces, not tabs, so I have this line in my .vimrc file:
set expandtab tabstop=4 shiftwidth=4
This article in the vim wiki elaborates a bit more.
Thanks for posting this! Tabs in source files are the bane of my existence too. ^_^;
no subject
I did not know this setting existed in gedit until right now. Congratulations, you have just made my day more magical.
no subject
This just made my morning. :D
no subject
sed -ie 's/\t/    /g' yourfile.pl
Perl also has an -i switch (I think due, in part, to its originally having to compete with the established sed).
So you could boil down the script to something like
perl -i -pe "s/\t/    /g" file1 file2 file3
no subject
$num_args = $#ARGV + 1;
An array in scalar context evaluates to the number of its elements, so this could be
$num_args = @ARGV;
instead.
I like to separate $#foo and scalar @foo, and use the former only in contexts where it means "index of the last entry" (for example, in a for loop iterating over the indices of an array) and the latter when I want a number of elements.
(Also, $#foo is sensitive to setting $[, but you shouldn't mess with that variable anyway.)
The other one is that "iterating over the files in @ARGV" is such a common use case that Perl has a shortcut for this.
If you read from the empty filehandle (as in while (<>) with nothing in between the angle brackets), you'll get a line at a time from all of the files in succession. Perl will automatically handle opening them and closing them for you. And if you didn't supply any file names, Perl will read from standard input. (This is a bit like Unix tools such as gzip or grep which will also work on standard input if there are no file name arguments.) See http://perldoc.perl.org/perlop.html#I/O-Operators for more on this. (That also mentions that you can find out which file you're currently on by examining $ARGV, which the magic will set for you appropriately on each new file.)
And if you don't assign <> to anything in the while loop, it'll automatically assign to $_ - which is the default thing that s/// operates on, which can be handy. It's also the default operand for lots of other operations.
So if you were just reading from the files, you could replace the whole "# Have files to parse..." loop with:
while (<>) { s/\t/    /; }
That would just be missing the printing of the changed line and the editing behaviour.
In the one-liner I suggested, these are supplied by the -p and the -i command-line switch, respectively; see http://perldoc.perl.org/perlrun.html for more information on those.
You can also turn on -i inside the program by assigning to the magic variable $^I.
no subject
I haven't had the chance to revisit this yet, but I wanted to thank you for taking the time to reply. Clearly I have a lot to learn still. It seems like it would be worth my while to also learn sed.
no subject
Well, there's a lot to learn, but not everyone is expected to know everything :)
Plus, There's More Than One Way To Do It (TMTOWTDI, tim-toady) in Perl.
It seems like it would be worth my while to also learn sed.
It depends, but having more tools in one's personal toolbox is often useful.
If you do learn sed, getting to know the basics of awk may also be useful. (And grep, if you don't know it already, as well as find and xargs, which are useful in connection with it, though less necessary if you have GNU grep.)
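As a rough illustration of find, xargs and grep working together, this lists the files that still contain a tab, which is handy for checking a tree before or after a cleanup. The name files_with_tabs and the *.pl pattern are made up for the example, and it assumes a find and xargs that support -print0 and -0 (GNU and BSD both do).

```shell
#!/bin/sh
# Hypothetical helper: print the names of *.pl files under a directory
# (default: the current directory) that contain at least one tab.
files_with_tabs() {
    tab=$(printf '\t')
    # -print0 / -0 keep file names with spaces in one piece;
    # grep -l prints only the names of the files that match.
    find "${1:-.}" -type f -name '*.pl' -print0 | xargs -0 grep -l "$tab"
}

# Usage (not run here):
#   files_with_tabs ~/src/myproject
```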