How Reddit Does Data

The article referenced here is from 2010. It states that Reddit stores all of its data in only two tables.

“Everything in Reddit is a Thing: users, links, comments, subreddits, awards, etc. Things keep common attribute like up/down votes, a type, and creation date. The Data table has three columns: thing id, key, value. There’s a row for every attribute. There’s a row for title, url, author, spam votes, etc.”

I mocked up a diagram of what this looks like.


The Thing table looks something like this.

id  type           create_date  upvote_count  downvote_count
1   “subreddit”    2010/01/01   0             0
2   “comment”      2010/06/01   5             2
3   “comment”      2010/04/01   1             3
4   “subreddit”    2010/01/01   0             0
5   “award”        2010/04/01   0             0

The Data table looks something like this.

thing_id  key            value
1         “title”        “The best subreddit in the universe”
1         “slug”         “the-best-subreddit-in-the-universe”
1         “description”  “This is a good subreddit. Please subscribe.”
2         “body”         “This is the body of a comment”
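To make the two tables concrete, here is a minimal sketch that loads the mock-up rows into SQLite and rebuilds a single Thing from them. SQLite is used only so the example is self-contained; the lowercase table names and the helper names build_db and load_thing are my own assumptions, not Reddit's code.

```python
import sqlite3

THING_COLUMNS = ["id", "type", "create_date", "upvote_count", "downvote_count"]


def build_db() -> sqlite3.Connection:
    """Create the Thing and Data tables from the mock-up and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE thing (
            id INTEGER PRIMARY KEY,
            type TEXT NOT NULL,
            create_date TEXT NOT NULL,
            upvote_count INTEGER NOT NULL DEFAULT 0,
            downvote_count INTEGER NOT NULL DEFAULT 0
        );
        -- One row per attribute: (thing_id, key) identifies a single value.
        CREATE TABLE data (
            thing_id INTEGER NOT NULL,
            key TEXT NOT NULL,
            value TEXT,
            PRIMARY KEY (thing_id, key)
        );
        """
    )
    conn.executemany(
        "INSERT INTO thing VALUES (?, ?, ?, ?, ?)",
        [
            (1, "subreddit", "2010/01/01", 0, 0),
            (2, "comment", "2010/06/01", 5, 2),
            (3, "comment", "2010/04/01", 1, 3),
            (4, "subreddit", "2010/01/01", 0, 0),
            (5, "award", "2010/04/01", 0, 0),
        ],
    )
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?)",
        [
            (1, "title", "The best subreddit in the universe"),
            (1, "slug", "the-best-subreddit-in-the-universe"),
            (1, "description", "This is a good subreddit. Please subscribe."),
            (2, "body", "This is the body of a comment"),
        ],
    )
    return conn


def load_thing(conn: sqlite3.Connection, thing_id: int):
    """Rebuild one Thing as a dict: fixed columns from thing, everything else from data."""
    row = conn.execute(
        "SELECT id, type, create_date, upvote_count, downvote_count FROM thing WHERE id = ?",
        (thing_id,),
    ).fetchone()
    if row is None:
        return None
    thing = dict(zip(THING_COLUMNS, row))
    # Each key/value row becomes one attribute of the Thing.
    for key, value in conn.execute(
        "SELECT key, value FROM data WHERE thing_id = ?", (thing_id,)
    ):
        thing[key] = value
    return thing
```

For example, load_thing(build_db(), 1) returns the subreddit's type and date plus its title, slug and description, while thing 5 (the award) comes back with only the Thing columns because it has no Data rows.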

I don’t think upvote_count and downvote_count belong in Thing; they should be in Data. Subreddits and awards don’t have upvotes or downvotes, so vote counts are not the common attributes the article suggests.
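A minimal sketch of that alternative, assuming the same SQLite stand-in as the mock-up: Thing keeps only attributes every type shares, and vote counts become ordinary Data rows that exist only for things that can be voted on. The helper name vote_counts is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Thing keeps only the attributes every type really shares.
    CREATE TABLE thing (id INTEGER PRIMARY KEY, type TEXT NOT NULL, create_date TEXT NOT NULL);
    CREATE TABLE data (thing_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT,
                       PRIMARY KEY (thing_id, key));
    """
)
conn.executemany(
    "INSERT INTO thing VALUES (?, ?, ?)",
    [(1, "subreddit", "2010/01/01"), (2, "comment", "2010/06/01")],
)
# Only the comment gets vote rows; the subreddit stores nothing about votes.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(2, "upvote_count", "5"), (2, "downvote_count", "2")],
)


def vote_counts(conn: sqlite3.Connection, thing_id: int) -> tuple[int, int]:
    """Return (upvotes, downvotes); a thing with no vote rows counts as (0, 0)."""
    rows = dict(
        conn.execute(
            "SELECT key, value FROM data WHERE thing_id = ? "
            "AND key IN ('upvote_count', 'downvote_count')",
            (thing_id,),
        )
    )
    # Data values are stored as text, so they must be converted back to integers.
    return int(rows.get("upvote_count", 0)), int(rows.get("downvote_count", 0))
```

The trade-off is visible in the last line: once votes live in a generic value column, every read needs a type conversion and every vote becomes a read, convert and write on a text value instead of a simple integer increment.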

The article goes on to discuss the pros and cons of such an approach.

“When they add new features they didn’t have to worry about the database anymore. They didn’t have to add new tables for new things or worry about upgrades. Easier for development, deployment, maintenance.”
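The claim that new features need no database work can be checked directly. In this minimal sketch (SQLite again as a stand-in; the “poll” type, its keys and the “flair” attribute are hypothetical), a new Thing type and a new attribute are added with plain INSERTs, and the stored schema is identical before and after.

```python
import sqlite3


def schema_snapshot(conn: sqlite3.Connection) -> list[str]:
    """Return the CREATE TABLE statements SQLite has stored for this database."""
    return sorted(
        sql for (sql,) in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE thing (id INTEGER PRIMARY KEY, type TEXT NOT NULL, create_date TEXT NOT NULL);
    CREATE TABLE data (thing_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT,
                       PRIMARY KEY (thing_id, key));
    """
)
before = schema_snapshot(conn)

with conn:
    # A brand-new Thing type needs no new table...
    conn.execute("INSERT INTO thing VALUES (6, 'poll', '2010/07/01')")
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?)",
        [(6, "question", "Tabs or spaces?"), (6, "option_1", "Tabs"), (6, "option_2", "Spaces")],
    )
    # ...and a brand-new attribute on an existing type needs no ALTER TABLE.
    conn.execute("INSERT INTO thing VALUES (2, 'comment', '2010/06/01')")
    conn.execute("INSERT INTO data VALUES (2, 'flair', 'helpful')")

after = schema_snapshot(conn)
```

Here before and after compare equal: the only thing that changed is data, which is why there is no migration to write or deploy.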

“The price is you can’t use cool relational features. There are no joins in the database and you must manually enforce consistency. No joins means it’s really easy to distribute data to different machines. You don’t have to worry about foreign keys are doing joins or how to split the data up. Worked out really well. Worries of using a relational database are a thing of the past.”
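Both sides of that trade-off can be sketched in a few lines. This assumes sharding by thing id modulo a hypothetical shard count, with in-memory SQLite databases standing in for separate machines; the class ShardedThingStore and its refs parameter are made up for illustration. Because a thing’s row and its Data rows share one id, a single shard holds all of them, so loading a thing never needs a join or a second machine. The cost shows up in save(): no foreign key stops a comment from naming an author that doesn’t exist, so the application checks that itself.

```python
import sqlite3

NUM_SHARDS = 4  # hypothetical shard count

SCHEMA = """
CREATE TABLE thing (id INTEGER PRIMARY KEY, type TEXT NOT NULL, create_date TEXT NOT NULL);
CREATE TABLE data (thing_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT,
                   PRIMARY KEY (thing_id, key));
"""


class ShardedThingStore:
    def __init__(self, num_shards: int = NUM_SHARDS):
        self.shards = []
        for _ in range(num_shards):
            conn = sqlite3.connect(":memory:")
            conn.executescript(SCHEMA)
            self.shards.append(conn)

    def _shard(self, thing_id: int) -> sqlite3.Connection:
        # A thing's row and all of its data rows share one key, so they land on one shard.
        return self.shards[thing_id % len(self.shards)]

    def exists(self, thing_id: int) -> bool:
        row = self._shard(thing_id).execute(
            "SELECT 1 FROM thing WHERE id = ?", (thing_id,)
        ).fetchone()
        return row is not None

    def save(self, thing_id, type_, create_date, attrs, refs=()):
        # No foreign keys across shards: the application enforces references by hand.
        for key in refs:
            if not self.exists(int(attrs[key])):
                raise ValueError(f"{key} points at missing thing {attrs[key]}")
        conn = self._shard(thing_id)
        with conn:  # one transaction, on one shard
            conn.execute("INSERT INTO thing VALUES (?, ?, ?)", (thing_id, type_, create_date))
            conn.executemany(
                "INSERT INTO data VALUES (?, ?, ?)",
                [(thing_id, k, str(v)) for k, v in attrs.items()],
            )

    def load(self, thing_id: int):
        # Two single-shard lookups by id replace what would otherwise be a join.
        conn = self._shard(thing_id)
        row = conn.execute(
            "SELECT type, create_date FROM thing WHERE id = ?", (thing_id,)
        ).fetchone()
        if row is None:
            return None
        attrs = dict(conn.execute("SELECT key, value FROM data WHERE thing_id = ?", (thing_id,)))
        return {"id": thing_id, "type": row[0], "create_date": row[1], **attrs}
```

A usage example: after store.save(10, "account", "2010/01/01", {"name": "alice"}), saving comment 11 with {"author_id": 10} and refs=["author_id"] succeeds even though the two things sit on different shards, while a comment pointing at a nonexistent author raises ValueError before anything is written.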

Reddit user kemitche commented that the article is wrong: “we’ve got two tables per thing. That means Accounts have an ‘account_thing’ and an ‘account_data’ table, Subreddits have a ‘subreddit_thing’ and ‘subreddit_data’ table, etc.”

This explains why upvote_count and downvote_count are in Thing: the article was probably describing a table like post_thing or comment_thing, where every row can be voted on.
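Under kemitche’s description, the two-table pattern is repeated once per type. A minimal sketch of that, assuming the column layout from the mock-up (the thread does not give Reddit’s real columns): the votable flag and the helper names create_type_tables and column_names are hypothetical. Two details fall out of the design: the type column disappears, because the table name already says what kind of thing a row is, and vote columns only appear on the types that are actually voted on.

```python
import re
import sqlite3


def create_type_tables(conn: sqlite3.Connection, type_name: str, votable: bool = False) -> None:
    """Create a <type>_thing / <type>_data table pair for one kind of Thing."""
    # Table names can't be bound as SQL parameters, so validate the prefix first.
    if not re.fullmatch(r"[a-z_]+", type_name):
        raise ValueError(f"unsafe table prefix: {type_name!r}")
    vote_cols = (
        ",\n            upvote_count INTEGER NOT NULL DEFAULT 0"
        ",\n            downvote_count INTEGER NOT NULL DEFAULT 0"
        if votable
        else ""
    )
    conn.executescript(
        f"""
        CREATE TABLE {type_name}_thing (
            id INTEGER PRIMARY KEY,
            create_date TEXT NOT NULL{vote_cols}
        );
        CREATE TABLE {type_name}_data (
            thing_id INTEGER NOT NULL,
            key TEXT NOT NULL,
            value TEXT,
            PRIMARY KEY (thing_id, key)
        );
        """
    )


def column_names(conn: sqlite3.Connection, table: str) -> list[str]:
    """List a table's columns in order (PRAGMA table_info puts the name at index 1)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

With create_type_tables called for account, subreddit, post and comment (the last two with votable=True), comment_thing ends up with id, create_date, upvote_count and downvote_count, while subreddit_thing has only id and create_date.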

