o9s
is a microblog
home archives search feed blogroll


Things And Data

I’ve been thinking about how Reddit stores its data. They use the concept of things and data as described in this post from yesterday.

I’m currently designing a CMS with a friend and I want to push on this idea for our database instances. I want to only have 2 tables in our schema. A Thing table and a Data table.

A thing will have an id, a type, a create_date and an update date. A data will have a thing_id, a key and a value. A thing will have multiple data.

Diagram

We want to be able to support multiple languages. I would do it by appending the language’s code to the data key. For example, the key for the title wouldn’t be title, it would be title.en or title.fr or title.es. Fields that are not translatable, such as the ones containing numbers, will not have a language code appended to their keys.

Using this approach, the database queries will be slower. But considering that, in our case, a database instance is never going to contain a great number of rows, I don’t expect the impact to be noticeable.

The SQL queries will be harder to write. A SQL query to retrieve posts will look something like this.

SELECT
    thing.*,
    (
        SELECT `value`
        FROM `data`
        WHERE thing_id = thing.id
        AND `key` = "title"
        LIMIT 1
    ) AS title,
    (
        SELECT `value`
        FROM `data`
        WHERE thing_id = thing.id
        AND `key` = "body"
        LIMIT 1
    ) AS `body`
FROM thing
WHERE thing.type = "post"

Using a standard 3FN database schema, it would be much simpler.

SELECT * FROM posts

It will also not be possible to guarantee that a row is unique. It will all have to be done through application code.

The good news is that the database migrations are going to be easier. It’s not hard to migrate from version 1 to version 2 or from version 2 to version 3. It gets more complicated when you want to update from version 1 to version 3. All we will have to do is deal with non-existent rows in our application code. It can be mitigated by using the Null Object design pattern.

I also think that I will be able to write code to write the long SQL queries for me. The queries all follow the same pattern. There is always a -

SELECT ...
FROM thing
WHERE thing.type = ?

- part. There are always -

SELECT `value`
FROM `data`
WHERE thing_id = thing.id
AND `key` = ?
LIMIT 1

- parts.

Additionally, no time will be spent designing the database. At the very least, no more time than the time it took me to write this post.

Posted on 2018-02-02   #database     #sql  






← Next post    ·    Previous post →