How Reddit Does Data
The article referenced is from 2010. It states that Reddit uses only two tables to store all of its data.
“Everything in Reddit is a Thing: users, links, comments, subreddits, awards, etc. Things keep common attribute like up/down votes, a type, and creation date. The Data table has three columns: thing id, key, value. There’s a row for every attribute. There’s a row for title, url, author, spam votes, etc.”
I mocked up a diagram of what this looks like.
Thing table looks something like this.
Data table looks something like this.
| thing id | key | value |
|---|---|---|
| 1 | “title” | “The best subreddit in the universe” |
| 1 | “description” | “This is a good subreddit. Please subscribe.” |
| 2 | “body” | “This is the body of a comment” |
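To make the layout concrete, here is a minimal sketch of the two tables using Python's built-in sqlite3 module. It is my own illustration, not Reddit's actual schema: the column names upvote_count and created_at, the placeholder dates, and the load_thing helper are all assumptions. The article only says that Thing keeps votes, a type, and a creation date, and that Data has thing id, key, and value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (
        id             INTEGER PRIMARY KEY,
        type           TEXT NOT NULL,               -- e.g. 'subreddit', 'comment'
        upvote_count   INTEGER NOT NULL DEFAULT 0,  -- assumed column name
        downvote_count INTEGER NOT NULL DEFAULT 0,
        created_at     TEXT NOT NULL                -- assumed column name
    );
    CREATE TABLE data (
        thing_id INTEGER NOT NULL,
        key      TEXT NOT NULL,
        value    TEXT,
        PRIMARY KEY (thing_id, key)                 -- one row per attribute
    );
""")

# The dates are placeholders; the rows in data mirror the table above.
conn.executemany(
    "INSERT INTO thing (id, type, created_at) VALUES (?, ?, ?)",
    [(1, "subreddit", "2010-01-01"), (2, "comment", "2010-01-01")],
)
conn.executemany(
    "INSERT INTO data (thing_id, key, value) VALUES (?, ?, ?)",
    [
        (1, "title", "The best subreddit in the universe"),
        (1, "description", "This is a good subreddit. Please subscribe."),
        (2, "body", "This is the body of a comment"),
    ],
)


def load_thing(conn, thing_id):
    """Rebuild one thing as a dict from its thing row plus its data rows."""
    row = conn.execute(
        "SELECT type, upvote_count, downvote_count, created_at "
        "FROM thing WHERE id = ?",
        (thing_id,),
    ).fetchone()
    if row is None:
        return None
    thing = {
        "id": thing_id,
        "type": row[0],
        "upvote_count": row[1],
        "downvote_count": row[2],
        "created_at": row[3],
    }
    # Every attribute is a separate row, so fold them into the dict.
    for key, value in conn.execute(
        "SELECT key, value FROM data WHERE thing_id = ?", (thing_id,)
    ):
        thing[key] = value
    return thing
```

Calling load_thing(conn, 1) gives back a single dict with the common columns and the title and description merged in, which is roughly what the application has to do on every read in place of a join.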
I don’t think vote counts such as downvote_count should be in Thing. They should be in Data. Subreddits don’t have upvotes or downvotes. Awards don’t have upvotes or downvotes. They are not common attributes like the article suggests.
The article goes on to discuss the pros and cons of such an approach.
“When they add new features they didn’t have to worry about the database anymore. They didn’t have to add new tables for new things or worry about upgrades. Easier for development, deployment, maintenance.
The price is you can’t use cool relational features. There are no joins in the database and you must manually enforce consistency. No joins means it’s really easy to distribute data to different machines. You don’t have to worry about foreign keys [or] doing joins or how to split the data up. Worked out really well. Worries of using a relational database are a thing of the past.”
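Both sides of that trade-off show up in a few lines of code. The sketch below is my own, again with sqlite3, and the set_attr helper and the sidebar_color attribute are hypothetical. A new attribute is just another row, so no schema change is needed, but because there is no foreign key the application has to check for itself that the thing exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
    -- No FOREIGN KEY on thing_id: the database will not check it for us.
    CREATE TABLE data (
        thing_id INTEGER NOT NULL,
        key      TEXT NOT NULL,
        value    TEXT,
        PRIMARY KEY (thing_id, key)
    );
""")
conn.execute("INSERT INTO thing (id, type) VALUES (1, 'subreddit')")


def set_attr(conn, thing_id, key, value):
    """Insert or overwrite one attribute, checking by hand that the thing exists."""
    exists = conn.execute(
        "SELECT 1 FROM thing WHERE id = ?", (thing_id,)
    ).fetchone()
    if exists is None:
        raise KeyError(f"no thing with id {thing_id}")
    conn.execute(
        "INSERT OR REPLACE INTO data (thing_id, key, value) VALUES (?, ?, ?)",
        (thing_id, key, value),
    )


# A brand-new attribute needs no ALTER TABLE, just another row.
set_attr(conn, 1, "sidebar_color", "orange")  # hypothetical attribute
```

If the existence check in set_attr were skipped, nothing in the database would stop a Data row from pointing at a thing that was never created. That is the "manually enforce consistency" part of the quote.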
User kemitche commented on Reddit explaining that the article is wrong. He says: “we’ve got two tables per thing. That means Accounts have an ‘account_thing’ and an ‘account_data’ table, Subreddits have a ‘subreddit_thing’ and ‘subreddit_data’ table, etc.”
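Here is how I picture the layout kemitche describes, as a small sqlite3 sketch. Only the four table names come from his comment. The columns, the create_tables_for helper, and the THING_TYPES list are my own assumptions.

```python
import sqlite3

# The two thing types kemitche names; the real list is longer ("etc.").
THING_TYPES = ["account", "subreddit"]


def create_tables_for(conn, type_name):
    """Create the <type>_thing and <type>_data pair for one thing type."""
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not type_name.isidentifier():
        raise ValueError(f"bad type name: {type_name!r}")
    conn.executescript(f"""
        CREATE TABLE IF NOT EXISTS {type_name}_thing (
            id         INTEGER PRIMARY KEY,
            created_at TEXT NOT NULL          -- assumed column
        );
        CREATE TABLE IF NOT EXISTS {type_name}_data (
            thing_id INTEGER NOT NULL,
            key      TEXT NOT NULL,
            value    TEXT,
            PRIMARY KEY (thing_id, key)
        );
    """)


conn = sqlite3.connect(":memory:")
for type_name in THING_TYPES:
    create_tables_for(conn, type_name)

table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

With a separate pair per type, each thing table is free to carry only the columns that make sense for that type.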
That correction explains why vote counts such as downvote_count are in Thing. The author probably meant the post_thing or the