no subject

Sophie's explanation is bang on and explains why I made that change. Thanks, Sophie!

I will take another moment here to make a comment about what "calling a method" actually does, behind the scenes, which helps explain why this particular case was so slow. (Wherein I ramble about computers.)

** What does this code do?

# this is my cool program, it returns my favorite number
return 4;

If you write that and save it as cool.pl and run it... what do you think it does? Well, let's try it:

Can't return outside a subroutine at cool.pl line 2.

This makes intuitive sense. You aren't in a subroutine -- so how can you return? You haven't gone anywhere to return from! The very word itself means "to go back" -- importantly, to a place you've already been.

** How do computers know how to do this?

To over-simplify, computers are very linear devices that go from point A to point B. You give them a set of instructions (commands) and they execute them, start to finish. Some of those instructions include things like "go execute this other code".

When that happens, the computer needs some way of knowing where it's been so that it can get back when you execute a return. The way they keep track of that is a thing called the stack. It is, in its most basic form, a way of keeping track of where you are in the program and where you've been so that you can get back. Kind of like a bookmark.

For this example, I'll be using the following small program:

1: sub cool {
2:     my $num = shift;
3:     return 0 if $num == 0;
4:     return cool($num - 1) + rand();
5: }
6: 
7: print cool(1);   # prints one random number
8: print cool(2);   # prints the sum of two random numbers
9: print cool(4);   # sum of four random numbers

This is a very silly example, but it gives us something recursive and (hopefully) easy to look at. Basically, it calls itself (recursively) the number of times you specify and returns the sum of that many random numbers.

** The stack, step by step!

When your program starts, the stack consists of one thing:

1. start of program [line 7, CURRENT]

I.e., "I'm at the beginning!" is what this bookmark says. Then it's going to do the first line of code: print cool(1);. This code tells it to execute the cool subroutine with the argument of 1.

To do this, the computer creates a new frame and pushes it on the stack. In other words, it creates a new bookmark so it can remember where it was and where it's going. The stack now looks like this:

1. start of program [line 7] 2. subroutine cool, arguments 1 [line 1, CURRENT]

Now, the code starts running. Yay! Eventually, it needs to leave the subroutine, i.e., it needs to return. To do that, the computer uses the stack so it knows where to go to leave. This is called popping from the stack. Once the computer does this, the stack looks like it did before:

1. start of program [line 7, CURRENT]

Now the computer knows where to go back to, so it can resume executing. It increments the current line and yay! Now you're on line 8!

** So... what next?

As you might expect, line 8 wants to call the cool subroutine again. It goes through the very same process: create a stack frame, push it onto the stack, jump to line 1, start executing. Eventually it returns by popping off the stack frame, jumping back to where it was, and continues execution of your program.

This is a lot of work. Every time you call a subroutine, it has to go through this pretty involved process to do all of the bookmarking required to properly jump around between different subroutines. For this reason (and others), calling a subroutine (or method) is a lot slower than you might expect.

In a tight loop like the one I changed the other day, all of that bouncing back and forth involves creating tens of thousands of stack frames. While individually they're very fast, 50,000 of anything is a lot slower than not doing it at all. In this case, it added up to approximately three seconds of time spent just doing bookkeeping. Oh well.

** The rest of the story I didn't talk about.

In reality, it's a lot slower due to dynamic/late binding. Perl has to look up exactly which line of code you are talking about when you tell it to call a method. Because of inheritance, it has to look in a bunch of places to see what method it should call.

Because of polymorphism and because you can change an object's class pretty much at will, it has to do this logic every time you call a method. It doesn't do any caching. This is extremely slow.

(18 comments)

no subject

Post a comment in response: