dw_dev_training | Strip tab characters from multiple text files and replace them with spaces (or something else)

I switch between Gedit, Notepad++, and vim fairly often depending on what I'm doing and whose computer I'm on. Sometimes I end up with tab characters where I really wanted four spaces, mainly when I'm using vim and I haven't figured out how to get vim to not do this. Gedit and Notepad++ have settings to use spaces instead of tabs, so there's no issue there.

Either I don't notice the tab characters until after I've put lots of them in the file I'm editing, or I'm editing a file from someone else whose editor uses tab characters for indentation. I know it's not a big deal to some people, but tab indentation mixed with space indentation is a huge pet peeve of mine.

Thus, a Perl script was born:

#!/usr/bin/perl -w

use File::Copy;

# This script replaces tab characters in a file with four spaces, or whatever else you want

$num_args = $#ARGV + 1;
$num_warnings = 0;
$tab_replace = "    ";  # Change this to whatever you want in place of tabs

if ( $num_args == 0 ) {
    print "usage: strip-tabs.pl file1 [file2, file3...]\n";
    exit;
}

# Have files to parse...
for my $f ( 0 .. ($num_args - 1) ) {
    if ( open(INPUTFILE, "<$ARGV[$f]") ) {
        if ( open(OUTPUTFILE, ">$ARGV[$f]~") ) {
            # Start parsing the file
            while ( my $line = <INPUTFILE> ) {
                $line =~ s/\t/$tab_replace/g;
                print OUTPUTFILE $line;
            }
            close(OUTPUTFILE);
            close(INPUTFILE);
            # Move the converted copy over the original file
            if ( !move("$ARGV[$f]~", $ARGV[$f]) ) {
                print "Could not write output to file $ARGV[$f]: $!\n";
                $num_warnings += 1;
            }
        }
        else {
            close(INPUTFILE);
            print "Could not create output file for $ARGV[$f]: $!\n";
            $num_warnings += 1;
        }
    }
    else {
        print "Could not open $ARGV[$f] for reading: $!\n";
        $num_warnings += 1;
    }
}
die "$num_warnings warnings encountered during file operation." unless $num_warnings == 0;


Feel free to gank away if you find it useful!


Cool, thanks for sharing!

Another way of doing this is using sed, like this:

sed -i 's/\t/    /g' yourfile.pl

That will replace all tabs with four spaces in yourfile.pl. You can do it in all files of a certain type with something like this:

find . -name '*.pl' -exec sed -i 's/\t/    /g' {} \;

That will find all Perl files (starting in the current directory and going down, so it'll recurse into any subdirectories) and replace the tabs with spaces.

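These work from a Linux command line with GNU sed. The BSD sed on a Mac wants a backup suffix after -i (use -i '' for none) and doesn't understand \t. Windows, you're out of luck.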

Edited 2012-02-03 20:34 (UTC)

Neat! And quite a bit more efficient! :) I'm always amazed at how many different ways there are to accomplish a given task on *nix systems.

I spent many years as a Windows client-server application developer in VB6 and VB.NET. One-liners don't exist there, and so usually my brain doesn't go there! I'm trying to broaden my horizons though *g*.

There are definitely a lot of ways to accomplish any given task in the *nix environment. One of the Perl mottos is "TMTOWTDI" (There's More Than One Way To Do It), but in a lot of ways that's really true of the Unix culture that Perl grew out of. =)

The settings you're looking for in vim are expandtab (convert tabs to spaces), tabstop (how many spaces a tab counts for), and shiftwidth (how many spaces an indent should be). In my case, I want a four-space indent using spaces, not tabs, so I have this line in my .vimrc file:

    set expandtab tabstop=4 shiftwidth=4

This article in the vim wiki elaborates a bit more.

Thanks for posting this! Tabs in source files are the bane of my existence too. ^_^;

Gedit and Notepad++ have settings to use spaces instead of tabs, so there's no issue there.

I did not know this setting existed in gedit until right now. Congratulations, you have just made my day more magical.

Thank you so much!! I knew there had to be a way to change tab settings in vim.

This just made my morning. :D

Yaaaay, so glad to hear! \o/ I think there's a way to do almost anything in vim, but sometimes the path to get there can be a little bit arcane. =)

Another way of doing this is using sed, like this:

sed -ie 's/\t/ /g' yourfile.pl

Perl also has an -i switch (I think due, in part, to its originally having to compete with the established sed).

So you could boil the script down to something like this:

    perl -i -pe "s/\t/    /g" file1 file2 file3

Edited 2012-02-06 10:10 (UTC)

Would you be interested in some comments on the Perl code and suggestions for other ways to do things?

Yes, I'm always open for suggestions! :)

OK, here are the two big thoughts I had.

$num_args = $#ARGV + 1;

An array in scalar context evaluates to the number of its elements, so this could be $num_args = @ARGV; instead.

I like to separate $#foo and scalar @foo, and use the former only in contexts where it means "index of the last entry" (for example, in a for loop iterating over the indices of an array) and the latter when I want a number of elements.
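To make that concrete, here is a small sketch of the convention. The variable names are just for illustration. The count comes from the array in scalar context, and $#ARGV only shows up where "last index" is what's meant.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Number of elements: an array in scalar context gives its length.
my $num_args = @ARGV;          # same value as scalar(@ARGV)

# Index of the last element: $#ARGV (it is -1 when @ARGV is empty).
my $last_index = $#ARGV;

print "$num_args argument(s), last index $last_index\n";

# $#ARGV reads naturally as the upper bound of an index loop.
for my $i (0 .. $#ARGV) {
    print "argument $i: $ARGV[$i]\n";
}
```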

(Also, $#foo is sensitive to setting $[, but you shouldn't mess with that variable anyway.)

The other one is that "iterating over the files in @ARGV" is such a common use case that Perl has a shortcut for this.

If you read from the null filehandle (as in while (<>), with nothing between the angle brackets), you'll get one line at a time from all of the files in succession. Perl handles opening and closing them for you. And if you didn't supply any file names, Perl reads from standard input instead. (This is a bit like Unix tools such as gzip or grep, which also work on standard input when there are no file name arguments.) See http://perldoc.perl.org/perlop.html#I/O-Operators for more on this. That page also mentions that you can find out which file you're currently on by examining $ARGV, which the magic sets for you on each new file.
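Here is a rough sketch of that magic in action. lines_per_file is a made-up helper, not anything built into Perl. It localizes @ARGV so that <> reads whichever files it is handed, and it records $ARGV each time a new file comes up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the lines in each file, reading them all through the null
# filehandle <>. Perl opens and closes each file in @ARGV for us and
# sets $ARGV to the name of the file currently being read.
sub lines_per_file {
    local @ARGV = @_;    # <> takes its file list from @ARGV
    local $_;            # while (<>) assigns each line to $_
    my (%count, @order);
    while (<>) {
        push @order, $ARGV unless exists $count{$ARGV};
        $count{$ARGV}++;
    }
    return map { [ $_, $count{$_} ] } @order;
}

# Only run on explicit file names here. Called with an empty list,
# lines_per_file() would fall back to reading STDIN, just as <> does.
if (@ARGV) {
    for my $entry (lines_per_file(@ARGV)) {
        my ($name, $n) = @$entry;
        print "$name: $n line(s)\n";
    }
}
```

A file with no lines never produces a line from <>, so it simply doesn't show up in the output.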

And if you don't assign <> to anything in the while loop, it'll automatically assign to $_ - which is the default thing that s/// operates on, which can be handy. It's also the default operand for lots of other operations.

So if you were just reading from the files, you could replace the whole "# Have files to parse..." loop with:

while (<>) {
  s/\t/    /g;   # /g: replace every tab on the line, not just the first
}

That would just be missing the printing of the changed line and the in-place editing behaviour.

In the one-liner I suggested, these are supplied by the -p and -i command-line switches, respectively; see http://perldoc.perl.org/perlrun.html for more information on those.
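As a sketch of what -p contributes, the code below follows the loop that perlrun describes for that switch. Your code is wrapped in while (<>) { ... }, and a continue block prints $_. print_detabbed is a hypothetical name. Without -i, the converted text goes to standard output and the files stay as they were.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Roughly what  perl -pe 's/\t/    /g' FILES  runs: read every line
# through <> into $_, apply the substitution, print $_ afterwards.
sub print_detabbed {
    local @ARGV = @_;
    local $_;
    LINE:
    while (<>) {
        s/\t/    /g;      # s/// operates on $_ by default
    }
    continue {
        print or die "-p destination: $!\n";   # print with no args prints $_
    }
}

# An empty list would make <> read STDIN, so only run with file names.
print_detabbed(@ARGV) if @ARGV;
```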

You can also turn on -i inside the program by assigning to the magic variable $^I.
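Putting the pieces together, a rough sketch of the original script rewritten around <> and $^I might look like this. It follows the local($^I, @ARGV) idiom shown in perlfaq5, and strip_tabs is just an illustrative name. An empty string in $^I means no backup copies are kept. A suffix such as '.bak' would keep one per file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $tab_replace = "    ";   # change this to whatever you want in place of tabs

# While $^I is defined, <> edits each file in @ARGV in place: the
# selected output handle points at the file's replacement, so a plain
# print rewrites it. STDOUT is reselected when the loop finishes.
sub strip_tabs {
    local ($^I, @ARGV) = ('', @_);   # '' = in-place, no backup file
    local $_;
    while (<>) {
        s/\t/$tab_replace/g;
        print;
    }
}

if (@ARGV) {
    strip_tabs(@ARGV);
}
else {
    print "usage: strip-tabs.pl file1 [file2 file3 ...]\n";
}
```

One difference from the original: if a file can't be opened, <> prints a warning and moves on to the next one. It doesn't count the failures up for a final die.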

There's a command_liners community, if you like.

(My apologies for getting to this late.)

I haven't had the chance to revisit this yet, but I wanted to thank you for taking the time to reply. Clearly I have a lot to learn still. It seems like it would be worth my while to also learn sed.

Clearly I have a lot to learn still.

Well, there's a lot to learn, but not everyone is expected to know everything :)

Plus, there's More Than One Way To Do It (TMTOWTDI, tim-toady) in Perl.

It seems like it would be worth my while to also learn sed.

It depends, but having more tools in one's personal toolbox is often useful.

If you do learn sed, getting to know the basics of awk may also be useful. (And grep, if you don't know it already, as well as find and xargs, which are useful in connection with it, though less necessary if you have GNU grep.)
