dw_dev_training | DW object-oriented programming explained (Part 2)

Welcome to the second part of the series on object-oriented programming - or OO - as it applies to the Dreamwidth codebase. :)

If you haven't already read the first part, you'll want to do that before reading this part. I also realise that I never got around to explaining what 'methods' are in the first post, so I'm going to do that right now before delving into the main part of this post:

What are methods?

Recall from the previous post that each object, both in real life and in OO, has what are called "properties" - pieces of information about the object. Each object constructed from the same class will have the same property *names* (e.g. "number_of_pages"), but different *values* (one book might have 500 pages, another might have 150, etc).

But these aren't enough to fully allow an object to work. For example, take an iPod. It would have a number of properties, such as "color" and "disk_space", but they don't help describe what the iPod *does* - plays music.

When an iPod is used to play music, the user generally just selects a song and hits Play. That's all the user needs to know; the iPod itself takes care of the tricky parts, like making sure the status on the screen is up-to-date with what's happening, pumping audio through those earbuds, and turning your body into a silhouette. Okay, maybe it doesn't do that last one, but still, the point is that it knows how to deal when someone wants to play music. It's what it was designed to do, after all.

And that's what methods are for. A class's methods are there to deal with stuff that other programmers shouldn't have to care about - they can just tell an object to do something, and it does it. Methods are defined in the class - the blueprint - but when a programmer using an object invokes one of these methods, the method gets access to that object's memory store, which allows it to take action appropriate for that *particular* object.

That's a little confusing. Let me try to explain it in terms of the iPod. Let's say we have an "iPod" object and the class it was constructed from has a method called "play_song". When this method is invoked for a particular song, the code that's called isn't tailored for that specific iPod - it's the same code that runs for all "iPod" objects.^(*) But some magic in the programming language allows the code to gain access to the property values of that specific "iPod" object, which will have everything the iPod needs to know to play the song it was given, such as the current volume level, etc.
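Here's a rough sketch of that idea in Perl, just as a preview - don't worry about the syntax yet. The "IPod" class, its "new" constructor and its "volume" property are all made up for illustration; nothing like them exists in the DW codebase. The thing to notice is that there's only one copy of play_song, but it sees the property values of whichever object it was called on:

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only.
package IPod {
    # Constructor: builds one object with its own property values.
    sub new {
        my ( $class, %props ) = @_;
        my $self = { color => $props{color}, volume => $props{volume} };
        return bless $self, $class;
    }

    # One shared method. $self is whichever object it was invoked on.
    sub play_song {
        my ( $self, $song ) = @_;
        return "Playing '$song' at volume $self->{volume}";
    }
}

my $mine  = IPod->new( color => 'black', volume => 3 );
my $yours = IPod->new( color => 'pink',  volume => 8 );

print $mine->play_song('Song A'),  "\n";    # Playing 'Song A' at volume 3
print $yours->play_song('Song A'), "\n";    # Playing 'Song A' at volume 8
```

Same method, same song, different result - because each object brings its own memory store along.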

(Before I leave this subject, I wanted to note that in the comments on my last post, my quick explanation of methods involved having a "nextPage" method on a Book class. After further reflection, I figured that this probably wasn't quite accurate, because you can't ask a book to turn its own pages - you have to do that yourself. Hence, I used a new example here.)

^(*) Of course, in real life an iPod has an actual copy of the code to itself stored in a microchip. If you think of the construction process, however, each real-life iPod that's constructed will have the same code in its microchip, which isn't tailored for any particular manufactured iPod - so it still kinda makes sense.

As with the last post, if you have any questions on this, feel free to let me know in the comments!

So, with that explanation of methods out of the way, it's time to move on to our next topic - how OO applies to the DW codebase.

I'm going to do this as a few posts, each dealing with its own topic, because I've got a fair amount to say about them. I'm still not entirely sure how many there'll be, but I'm writing them one at a time so there may be some time (a few days to a week) between each one.

A couple of things to note before I begin:

This post may require some basic knowledge of Perl and/or programming in general. Not much, I promise! (Things such as what a 'string' is, etc.) But all the same, if anybody finds themselves confused by anything I write, feel free to ask for clarification in the comments. I won't bite!

Secondly, if you're used to OO from another language, you'll find some things about Perl's implementation of OO to be strange and baffling. That's because Perl wasn't actually designed with OO in mind; OO support came later, and to be honest, it shows. Still, it's what we use, so I hope I can at least help with understanding it.^(**)

^(**) There is a version of Perl in the works which does a much better job of not only OO but a lot of other things - Perl 6 - but at the cost of revamping a lot of the language such that you probably wouldn't be able to use it without spending some time making sure your code conformed to it. For this series, therefore, I'll be concentrating on Perl 5, which is what most Perl developers - including DW and LJ - use.

With all that said, let's move on to our first topic!

What is an 'object' in Perl?

You may already have seen examples of OO in Perl in the Dreamwidth codebase. For example, when you see something like:

$ret .= "<td>" . $u->ljuser_display . "</td>";

...what you're actually seeing is the coder calling the 'ljuser_display' method on an object called $u. The value that method gives back is then inserted into a string.

But wait. Aren't Perl variables beginning with a dollar symbol supposed to be scalars (variables holding a single value), not objects?

To explain this, let me explain briefly the three different types of variables to be found in Perl:

Scalars: These variables begin with a dollar symbol ($), and represent a single value.
Arrays: These variables begin with an at-sign (@), and represent an ordered series of values which are accessed by number. You'll often hear them loosely called 'lists', but 'array' is Perl's own name for the variable, just as in most other languages.
Hashes: These variables begin with a percent sign (%) and represent an unordered collection of named values, each of which can be accessed by using its name. Other languages might know this as an 'associative array'.
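To make the three types concrete, here's a tiny sketch. The variable names and values are made up for illustration. One thing that trips people up: when you pull a *single* value out of an array or hash, the symbol in front changes to a dollar sign, because the thing you're getting back is a scalar.

```perl
use strict;
use warnings;

my $username = 'example_user';                                         # scalar
my @tags     = ( 'perl', 'oo', 'dreamwidth' );                         # array
my %book     = ( title => 'Example Book', number_of_pages => 500 );    # hash

print "$username\n";                 # example_user
print "$tags[0]\n";                  # perl (arrays are numbered from 0)
print "$book{number_of_pages}\n";    # 500  (looked up by name)
print scalar(@tags), "\n";           # 3    (number of values in the array)
```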

So clearly, $u must be a scalar, because it begins with a dollar sign. But it's *also* an object, and that's not on the list above. Wha?

Here's the thing - unlike many other languages, Perl doesn't have separate 'Object' types. Instead, when you create an object, what you're *really* doing is taking a scalar and "blessing" it as an object of a certain class. (Seriously, that's what it's called.) After that, you can call that class's methods on the scalar.

Why would anybody do such a thing? Because the scalar represents that object's internal memory store.

I didn't tell you this above, but although it's true that scalars can only represent a single value, that single value can be a reference to another variable. That works because, in effect, Perl stores the memory location of that variable as the value. (Other languages can do this too; in some of them, such references are known as 'pointers'.)
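As a quick sketch of a reference in action (the %book hash is made up for illustration): putting a backslash in front of a variable gives you a reference to it, and the arrow follows the reference back to the original.

```perl
use strict;
use warnings;

my %book = ( title => 'Example Book', number_of_pages => 500 );

# The backslash takes a reference to %book; the reference fits in a scalar.
my $book_ref = \%book;

print ref($book_ref), "\n";                   # HASH
print $book_ref->{number_of_pages}, "\n";     # 500

# Changing the value through the reference changes the original hash,
# because both names lead to the same piece of memory.
$book_ref->{number_of_pages} = 150;
print $book{number_of_pages}, "\n";           # 150
```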

Recall from the last post that the memory store of an object consists of 'properties', which are named values, such as 'number_of_pages'. As such, the internal memory store of an object is best represented as a hash. But Perl's "bless" works through a reference rather than on a hash variable directly, so instead you create a reference to the hash (or in Perl parlance, a "hashref"), put it in a scalar, and then bless that scalar. (Strictly speaking, it's the hash being pointed to that gets marked as belonging to the class, but in everyday use the scalar is your handle on the object.) It's a roundabout way of doing it, but because of the convenience of having the memory store variable *right there*, it works.
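Here's a minimal sketch of what that blessing looks like. "Book" is a hypothetical class invented for this example, and "new" is just the conventional name for a constructor, not something Perl requires. The blessed() function comes from the standard Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical class, for illustration only.
package Book {
    sub new {
        my ( $class, %args ) = @_;

        # Curly braces create an anonymous hash and hand back a hashref.
        my $self = { number_of_pages => $args{number_of_pages} };

        # Mark the hash as belonging to this class.
        bless $self, $class;
        return $self;
    }

    sub number_of_pages {
        my ($self) = @_;
        return $self->{number_of_pages};
    }
}

my $plain = { number_of_pages => 500 };             # an ordinary hashref
my $book  = Book->new( number_of_pages => 500 );    # a blessed hashref

print ref($plain), "\n";                # HASH
print ref($book), "\n";                 # Book
print blessed($book), "\n";             # Book
print $book->number_of_pages, "\n";     # 500
```

The only difference between $plain and $book is the blessing: both are scalars holding a hashref, but only $book knows which class to look in when you call a method on it.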

There's one problem with this, and that's that if you have the variable that represents the object, you also have access to its internal memory store, because you can still use the scalar as a normal hashref by using a syntax such as:

my $id = $u->{'userid'};

Here, we're using the "->" syntax to say that we know that $u contains a reference of some kind, and that we want to get to the variable that it's pointing to. We then use that variable as a hash to get to the property named 'userid'.

Now, if this is code within the class itself, then this is generally fine. In most other cases, however, it's bad form to peek directly into the memory store of another object, even if you do have it right there. That's because you don't generally know how that object uses its memory store; it's possible that any information you grab might be out of date, for example. Worse, the layout of the memory store might change in the future; after all, it's only intended to be an *internal* memory store, and as long as the object knows how to deal with its own memory store, that's all that's really required.

Instead, most classes will supply methods that can get you the value you want. (Rather appropriately, they tend to be informally called "getters".) In the example above, although I didn't show its creation, I can tell you that $u is an LJ::User object, and the class for LJ::User defines a method called "id" that will get you the same information, so you can write the above line like so:

my $id = $u->id;

Perl reuses the "->" syntax even when you want to call a method; I'm not entirely sure why. In any case, here we're calling the "id" method to gain the userid instead of looking directly into the memory store, and the class itself gets to decide how to give us the information we want. With this, we can be sure that if LJ::User's memory store layout changes in the future, we'll still get what we need.
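To show why that matters, here's a sketch using a made-up class. To be clear, Example::User is hypothetical and is NOT how LJ::User actually stores its data; I've deliberately given it an odd internal layout (the userid tucked away under a 'row' key) to stand in for "the layout changed at some point".

```perl
use strict;
use warnings;

# Hypothetical class - not the real LJ::User.
package Example::User {
    sub new {
        my ( $class, %args ) = @_;

        # Assumed internal layout: the id lives one level down, under 'row'.
        my $self = { row => { userid => $args{userid} } };
        return bless $self, $class;
    }

    # The getter knows where the class keeps the value.
    sub id {
        my ($self) = @_;
        return $self->{row}{userid};
    }
}

my $u = Example::User->new( userid => 42 );

my $id   = $u->id;          # asks the class: 42
my $peek = $u->{userid};    # pokes at the memory store: undef, wrong place

print "getter: $id\n";
print "peek:   ", ( defined $peek ? $peek : 'undef' ), "\n";
```

Code that called the getter keeps working no matter how the class rearranges its memory store; code that peeked quietly gets nothing back.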

(In practice, this is unlikely to be an issue in DW's codebase, and indeed a lot of code in there *does* use the memory store instead of the appropriate method. It isn't a good idea, though, and it makes future code maintenance much easier if getters are used instead.)

That's about it for this post. There's a lot of stuff here so feel free to ask questions if there's anything you don't understand! My next post will probably talk about how you can create and use an object, as well as some examples of existing classes in the codebase.


So I did some more work on that same code after mark touched it. It turns out that that preload_rows() call was a big problem--the implementation was that in order to see if we had any more comments to load, it scanned through all of the comments that had been touched on that request to see if any hadn't been loaded yet. Now, the method that mark modified was also scanning through all of the loaded comments to check for a different setting... So by having that nested call to preload_rows in there, that moved it from checking n comments to checking n^2 comments. Very very bad.

I don't remember if I fixed that in the update that I made, but it's certainly a fixable issue. If the calls to nodeid() (and therefore preload_rows()) are cheap, then calling nodeid() vs. $_->{nodeid} shouldn't make that much difference. I mean, it'll make some, and if you're really seriously optimizing that could be worth it, but chances are there's some other underlying problem.

Ahhhh. Thanks for that explanation! I had only been looking at the most recent version of nodeid/preload_rows (because I assumed they hadn't been changed), and it did seem odd to me that it was going as slowly as that. That makes a lot more sense, now.

It definitely seemed odd to me that pure getters should have *that* much of an effect! Glad to see this was just a mistake on my part.
