Kurt McKee

lessons learned in production

Hey there! This article was written in 2008.

It might not have aged well for any number of reasons, so keep that in mind when reading (or clicking outgoing links!).

Authorship and extensibility

Posted 20 February 2008 in renquist

Feed items can now have authors associated with them in the database using a one-to-many authors-to-items relationship. By default, however, Renquist will only use a one-to-one relationship, which means that there will be one author entry for each item entry. At first I struggled with the idea that there would be significant data duplication; I'm the only one writing on this blog, so why store ten identical "Kurt" authors given ten feed items? Wouldn't it make more sense to store that information only once?
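To make the duplication concrete, here is a minimal sketch of that arrangement using Python's built-in sqlite3 module. The table names, column names, and helper functions are all my own assumptions for illustration; they are not Renquist's actual schema or code. The schema permits one author to own many items, but the helper deliberately creates a fresh author row for every item, which is the default behavior described above.

```python
import sqlite3

# Hypothetical schema: one author row may be referenced by many items.
SCHEMA = """
CREATE TABLE authors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE items (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES authors(id)
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_item(conn, title, author_name):
    """Store an item, always creating a new author row for it.

    No attempt is made to reuse an existing author with the same name,
    so ten items by "Kurt" produce ten author rows.
    """
    cur = conn.execute("INSERT INTO authors (name) VALUES (?)", (author_name,))
    author_id = cur.lastrowid
    cur = conn.execute(
        "INSERT INTO items (title, author_id) VALUES (?, ?)",
        (title, author_id),
    )
    return cur.lastrowid


def count_authors(conn):
    """Return the total number of author rows."""
    return conn.execute("SELECT COUNT(*) FROM authors").fetchone()[0]
```

Adding ten items with the author "Kurt" through add_item leaves ten rows in the authors table, all carrying the same name.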

Yes, but then I considered the case of comment feeds. I know already that there have been three Davids who have posted on my site. One uses "Dave", another "David", and the third showed up once and called himself "David (a different one)". There could easily be a collision of names, and it would be foolish to think that just because someone says he's "David", he's the same "David" who posted three other comments. Make sense?

Therefore I'm choosing to leave it up to someone else (through an as-yet-unrealized plugin framework) to decide how best to minimize duplication. Maybe the plugin could merely minimize duplication by name; easy, but perhaps not ideal in all circumstances. Maybe the plugin could minimize duplication using more advanced means (those three Davids might be writing in three separate languages, for instance). There are at least two other methods that jump to mind, but it's easy to see the potential for a smart plugin.
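As a rough sketch of what such a plugin might look like, the Python below treats a deduplication strategy as a function that maps an author record to a key; records that share a key are considered the same author. Since the plugin framework doesn't exist yet, everything here is hypothetical: the function names, the dictionary-based author records, and the "lang" field are assumptions made purely for illustration.

```python
from collections import defaultdict


def key_by_name(author):
    """Naive strategy: same name (ignoring case and padding) means same author."""
    return author["name"].strip().casefold()


def key_by_name_and_language(author):
    """Stricter strategy: the name and the (hypothetical) language must both match."""
    return (key_by_name(author), author.get("lang"))


def group_authors(authors, key=key_by_name):
    """Group author ids by whatever key the chosen strategy produces.

    Each group holds the ids of author rows that the strategy
    considers to be the same person.
    """
    groups = defaultdict(list)
    for author in authors:
        groups[key(author)].append(author["id"])
    return dict(groups)


# Hypothetical author rows, loosely modeled on the Davids above.
AUTHORS = [
    {"id": 1, "name": "Dave", "lang": "en"},
    {"id": 2, "name": "David", "lang": "en"},
    {"id": 3, "name": "David (a different one)", "lang": "en"},
    {"id": 4, "name": "David", "lang": "fr"},
]
```

With the sample rows, grouping by name alone merges authors 2 and 4, while grouping by name and language keeps them apart. Neither answer is provably right, which is exactly why the choice seems better left to a plugin.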

In the meantime, however, Renquist will store author information despite likely data duplication.