Wednesday, December 12, 2007

Hibernate Shards

I suppose I have been on sort of a Hibernate kick, but my buddy John was asking about this tonight, so I decided to give a short description of what it is and why it is useful.

I have explained what Hibernate is, but not demonstrated it through a deployable example. So I apologize ahead of time if you have not used Hibernate and are still scratching your head wondering what the hell I am actually talking about. That is my bad, and I will correct it.

Moving right along, let's describe what a shard is. Simply put, a shard is one database in a set of databases that all share the same schema. What makes a single shard special is that it does not hold all of the information for the entire application. It holds only the slice of data selected by the rule you give to the shard strategy algorithm. Let me give you an example. Let us assume you have a database full of users. Let us also assume that all of those users have contacts. We have a three-shard solution. Now, I can implement a sharding algorithm that says: put all of the users that have a login name starting with A-J in shard 1, K-S in shard 2, and T-Z in shard 3. Each user's contacts live in the same shard as the user.
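To make that rule concrete, here is a minimal sketch in plain Java. It is not the Hibernate Shards API; the class name `LoginRangeShardStrategy` and the method `shardFor` are made up for illustration. I am treating the ranges as A-J, K-S, and T-Z so that no letter lands in two shards, and I am assuming that login names which do not start with a letter go to shard 3.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hibernate Shards.
public class LoginRangeShardStrategy {

    /** Returns the shard number (1, 2, or 3) for the given login name. */
    public static int shardFor(String loginName) {
        if (loginName == null || loginName.isEmpty()) {
            throw new IllegalArgumentException("login name must not be empty");
        }
        // Compare on the upper-cased first character so "alice" and "Alice" agree.
        char first = loginName.toUpperCase(Locale.ROOT).charAt(0);
        if (first >= 'A' && first <= 'J') {
            return 1;
        }
        if (first >= 'K' && first <= 'S') {
            return 2;
        }
        // T-Z, plus (by assumption) anything that does not start with a letter.
        return 3;
    }

    public static void main(String[] args) {
        for (String login : new String[] {"alice", "Kevin", "tom"}) {
            System.out.println(login + " -> shard " + shardFor(login));
        }
    }
}
```

The important property is that the same login name always maps to the same shard, so a user and everything hanging off that user can be found again later.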

What is the benefit of doing this? The obvious benefit is that I have gone from one pipe of average size to three pipes of average size. That is huge! The strategy I described is not perfect, since login names are not spread evenly across the alphabet, so I will not always get the full benefit of having three pipes. With a better sharding strategy, though, I can theoretically get three operations done at once the majority of the time.
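Here is a rough sketch of the "three pipes" idea in plain Java. It does not use Hibernate Shards at all: each shard query is just a `Callable` standing in for a real database call, and the class name `ParallelShardQuery` is my own invention. The point is only to show one operation per shard running at the same time, with the results merged afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Hibernate Shards.
public class ParallelShardQuery {

    /** Runs one query per shard concurrently and merges the results in shard order. */
    public static List<String> queryAll(List<Callable<List<String>>> shardQueries) {
        // One thread per shard; the list must not be empty.
        ExecutorService pool = Executors.newFixedThreadPool(shardQueries.size());
        try {
            List<String> merged = new ArrayList<String>();
            // invokeAll returns the futures in the same order as the tasks.
            for (Future<List<String>> result : pool.invokeAll(shardQueries)) {
                merged.addAll(result.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while querying shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A lookup for a single user only needs to hit one shard, so the real win shows up when different users' requests land on different shards at the same moment.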

This is basically a solution for horizontal scaling of a database. I can use any number of shards I want, as long as I decided how many at the beginning of the project. If you did not, you can still add shards later, but it will be painful, so think long and hard about that. And I hope you have a nice DBA at that point :).

There are some drawbacks to this type of thinking. You cannot do cross-shard joins; they are currently not supported. You can't easily increase the number of shards, as I described in the previous paragraph, so you would have to use something called virtual shards to make that process a bit easier. Distributed transactions are not supported by Hibernate Shards either. Oh yeah, and it's in beta, so hopefully you won't come across too many bugs.
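The idea behind virtual shards, as I understand it, is to hash your data into more buckets than you have databases, and then keep a separate map from bucket to database. Below is a minimal sketch in plain Java. Again, this is not the Hibernate Shards API: the class `VirtualShardRouter` and all of its methods are hypothetical, and the round-robin starting layout is just an assumption to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hibernate Shards.
public class VirtualShardRouter {

    private final int virtualShardCount;
    private final Map<Integer, Integer> virtualToPhysical = new HashMap<Integer, Integer>();

    /** Spreads the virtual shards round-robin over the physical shards (numbered from 0). */
    public VirtualShardRouter(int virtualShardCount, int physicalShardCount) {
        this.virtualShardCount = virtualShardCount;
        for (int v = 0; v < virtualShardCount; v++) {
            virtualToPhysical.put(v, v % physicalShardCount);
        }
    }

    /** The virtual shard depends only on the login name, so it never changes. */
    public int virtualShardFor(String loginName) {
        // Mask off the sign bit so the modulo result is never negative.
        return (loginName.hashCode() & 0x7fffffff) % virtualShardCount;
    }

    /** Looks up which physical database currently holds the login's virtual shard. */
    public int physicalShardFor(String loginName) {
        return virtualToPhysical.get(virtualShardFor(loginName));
    }

    /** Points a virtual shard at a different physical database. */
    public void reassign(int virtualShard, int physicalShard) {
        virtualToPhysical.put(virtualShard, physicalShard);
    }
}
```

With, say, 12 virtual shards over 3 databases, adding a fourth database means moving a few virtual shards and calling `reassign` for each. You still have to copy the rows yourself, but the hashing rule never changes, so nobody's data has to be rehashed.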

With that being said, I think the positives outweigh the negatives. If sharding worked for YouTube, I am fairly certain it might work for your project as well, once it gets to the size where you need to be concerned about database scaling, and where replication and vertical scaling (a bigger database server) are no longer able to cut the mustard.

Hopefully I have given you a basic idea of what is going on with Hibernate Shards. Feel free to ask questions and I will answer them to the best of my ability. I will try to create a project to show off what this brilliant technology accomplishes!

Here is a link to documentation if you are interested:
http://www.hibernate.org/hib_docs/shards/reference/en/html/
