kareila: "Mom, I'm hungry." "Hush, I'm coding. You ate yesterday." (coding)
kareila ([personal profile] kareila) wrote in [site community profile] dw_dev_training2009-08-02 09:29 pm
Entry tags:

Kareila's intro to database design

[staff profile] denise sez:

I'd really like to see someone's thought processes while implementing something big/major that relies on working with DB stuff: what's the best practice bits that we just won't think about because we don't know enough? What sort of stuff should we do just because that's the best way to do it? Etc.


I am not a professional database designer, but I've read a few books and learned my way around MySQL, so I'll try to share my thoughts by way of the spec I've implemented for Bug 215: Implement v-gifts. Fair warning: although that spec has benefited from one round of peer review, it hasn't been approved yet, so it might will change again.

1. Objects, Properties, and Uniqueness



Each table in a database should completely describe the properties specific to one group of similar objects. Obviously the virtual gifts themselves comprise one table (vgiftids), and each gift transaction belongs in another table (vgifts). To come up with the needed properties, I look at the proposed functionality and figure out what data I need to store to provide that functionality. For the vgiftids, the obvious properties are a name, a description, a picture, and a cost. I've also decided to create space for some useful metadata such as creator (who designed the gift), creation date, and a couple of Y/N markers called "featured" and "active" that will let us control how the gift is displayed in the store.

I'll quote the spec for vgiftids here so you don't have to keep flipping back and forth:

CREATE TABLE vgiftids (
    vgiftid    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name       VARCHAR(80) NOT NULL UNIQUE,
    created    INT UNSIGNED NOT NULL,   #unixtime
    creator    INT UNSIGNED NOT NULL DEFAULT 0,
    active     ENUM('Y','N') NOT NULL DEFAULT 'N',
    featured   ENUM('Y','N') NOT NULL DEFAULT 'N',
    desc       VARCHAR(255),
    pickey     VARCHAR(255),
    cost       FLOAT(2,2),

    INDEX (name),
    INDEX (created)
)


I'm not going to explain the SQL syntax here, beyond the fact that the first word on each line is the name of a column in our table, and the second word is the type of data it will contain (numbers or strings). That's stuff you can copy out of other tables that contain similar data to what you want. What I want to focus on here are the phrases "NOT NULL", "PRIMARY KEY", "DEFAULT" and "UNIQUE". Judicious application of these keywords is crucial to a well-designed table.

Each row in a table should uniquely describe a single instance of an object. Having duplicate rows in a table is a bad, bad thing. The most common way to enforce this restriction is to define a column or a group of columns as the table's primary key, and the most common form of primary key is a unique ID number. In my vgiftids table, vgiftid is the primary key. Although the name column also contains unique values, names can be changed; primary keys are forever, until the row is deleted. (You can also think of userids vs. usernames as another illustration of this principle.) By further specifying vgiftid as AUTO_INCREMENT, I've told the database to keep track of this column for me. Every time I create a new row, it will automatically give the new row the next vgiftid number in the sequence.

Advanced topic: explain my primary key for the vgift_tagids table. (Hint: I explain it further down, but I want you to think about it here before continuing.)

Okay, before we move on, we need to consider not only desired database values, but allowed database values. What if we add a row that doesn't have a value specified in every column? The primary key is calculated via auto-increment, but any other column that is left blank will show up as NULL, unless we've specified a default value. In my vgiftids table, my "featured" and "active" flags default to "N", and if a creator is not specified, that defaults to 0 (which is an invalid userid).

You want to carefully consider which values could legitimately be allowed to be NULL. I've decided that a gift could have no description, picture or cost. No cost? On second thought, maybe that should also have a default of 0. A gift with no picture or description would be boring, but maybe it might have one but not the other? I've decided not to enforce a value here. All the other columns in my table have a NOT NULL qualifier, which means the database will throw an error if you try to create or update a record that has no value set for that column. I have two columns specified NOT NULL that have no DEFAULT (name & created), which means I must specify those every time I create a record.

Advanced topic: explain why "created" doesn't default to the current SQL timestamp. (Hint: it has to do with timezones.)

One last thing: defaults only apply upon record creation. You can later update a column with a default value to be NULL unless you've also specified it's NOT NULL. That's why you still have to specify NOT NULL even when you have a DEFAULT, if you want to be sure it has a value.

(I think PRIMARY KEY also enforces NOT NULL, but it doesn't hurt to specify both.)

2. And You Thought Your Relationships Were Complicated



In my vgifts table (the one that lists gift purchases), I've specified "transid" as the primary key. But I also have a column for vgiftid, which specifies which gift was given in the transaction. Guess what? The values in the vgiftid column for that table match up to the ones in the vgiftids table! That defines a relationship between the two tables.

Advanced topic: explain why vgiftid can't be a primary key for the vgifts table. (Hint: in my next version of the spec, I will probably rename this table vgift_trans.)

That means we can do something like this:

SELECT name, desc, pickey FROM vgiftids WHERE vgiftid IN ( SELECT vgiftid FROM vgifts WHERE rcptid = 3 ORDER BY delivery_t DESC LIMIT 1 );


That would, theoretically, return the properties describing the most recent gift given to [staff profile] denise; her list of gifts is in the vgifts table, but the gifts are actually described in the vgiftids table.

Advanced topic: name the two reasons the above query won't run as given. (Hint: I mention one of the problems in a comment on the bug, but don't give an explanation.)

Do you see why we've designed it this way? By having only the vgiftid stored in the transaction, and looking up the other data associated with the vgiftid when we need it, we avoid data duplication and make it easy to change the properties associated with a gift in that one row of the vgiftids table, instead of having to do a search and replace on every copy of the gift that was ever given in the vgifts table. Each table in a database should completely describe the properties specific to one group of similar objects. The name and description of the gift are not properties of the gift transaction.

Of course, one user could receive multiple copies of the same gift from different people, or the same person at different times, or even different gifts at the same time. That's why there are no unique keys in vgifts apart from the primary key, transid.

Have you looked yet at the other two tables, vgift_tagnames & vgift_tagids? I designed those in response to feedback that specifying gift categories as columns of the vgiftids table wasn't flexible enough. The vgift_tagnames table defines the names and corresponding ids of the tags, but what about vgift_tagids?

CREATE TABLE vgift_tagids (
    tagid      INT UNSIGNED NOT NULL,
    vgiftid    INT UNSIGNED NOT NULL,

    PRIMARY KEY (tagid, vgiftid)
)


This table defines the relationships between the tagids and the vgiftids, but one gift can have many tags, and one tag can apply to many gifts! There are no unique columns in this table; the constraint is that there can only be one row describing a given combination of the two (and no NULL values; the relation of a thing to nothing is not of interest). Each row in a table should uniquely describe a single instance of an object. In this case, the object is a mapping between two other types of object, and it's a complex enough relationship that we can't contain it in the other tables.

Okay, I've probably given you enough to think about for one post. If there's interest, I'll do a writeup on database queries after I go off and write the functions that will interact with my vgift tables.

And of course, please point out if you see anything I got wrong! I don't want to lead anyone astray.
afuna: Cat under a blanket. Text: "Cats are just little people with Fur and Fangs" (Default)

[personal profile] afuna 2009-08-03 07:11 am (UTC)(link)

I've also heard of foreign keys but I couldn't find a single occurrence of the word "foreign" in update-db-general.pl so it can't be that important!


I think [staff profile] mark agrees with you!

http://bugs.dwscoalition.org/show_bug.cgi?id=180#c17

yvi: Kaylee half-smiling, looking very pretty (Default)

[personal profile] yvi 2009-08-03 07:32 am (UTC)(link)
But, but, foreign keys... :)
owl: Stylized barn owl (Default)

[personal profile] owl 2009-08-03 08:18 am (UTC)(link)
O_O

Well, they are nice from stopping some clampett from entering an address with a state of BY or something, app then proceeds to go, "Wait, what, I can't sell X product in this BY place!", falls over, Blackberries go off....
allen: (explosiverunes)

[personal profile] allen 2009-08-03 05:35 pm (UTC)(link)
Just for reference, I respectfully disagree with [staff profile] mark here. I think this thread at Stack Overflow gives a pretty good overview on the pros and cons of using foreign keys.