Jeff Rothschild of Facebook’s “A Match Made in Heaven? The Social Graph and the Database”
Taking a look at the social graph and what it means for the database.
The social graph:
- At its heart, it’s about people and their connections.
- Learning about people who are in your world.
- Can be a powerful tool for accelerating the use of an application.
“The social graph has transformed a seemingly simple application such as photos into something tremendously more powerful.” We’re interested in what people are saying about us, and about our friends. Social applications are compelling.
Facebook users blew through the estimate for 6 months of storage in 6 weeks. It is serving 250,000 photos per second at peak time, not including profiles. Facebook serves more photos than even the photo sites out there, and serves more event invitations than any other website out there.
E-mail invitations are an example of the power of the social graph. If you get a newsfeed item or an invitation that tells you 12 friends are attending an event, you have more information, and can then make a better decision on whether or not you want to go.
Facebook opened up the social graph by opening up the Facebook API for other people and organizations to build applications. Over 20,000 applications have been developed for the platform, but it’s still very new.
If the social graph is good, then a richer graph should be better, right? Right! It’s a circle of connectivity with positive feedback, bringing people closer together, improving communications within social circles. But people are only one dimension to the graph — the social graph links people to other people, but also to events, pictures, applications. There are many different types of data, many types of networks, many types of media.
Which all means the database has a lot of work to do.
“Stuff” (i.e., pictures)
lots of people + lots of stuff = exponential amounts of data.
There are over 70 million active users, living in overlapping worlds of school, work, family, and geographic regions. Facebook can no longer be partitioned by school, as it was when it started.
- Make it unnecessary to have to go to many different parts of the data — so use very few JOINs (I assume this means they have a LOT of denormalization).
- Memcache — made it multithreaded, then worked on the ethernet drivers as that was the next bottleneck.
- Memcache Proxy to scale Facebook beyond a single proxy: they needed to keep the memcache and web tiers close together for low latency. Even using a regional network wasn’t close enough. So they built web tiers and memcache tiers, but how do you ensure that the memcache caches are consistent at the different locations? Enter the memcache proxy to maintain a coherent cache through all facilities.
- Replication of the db tier to the East Coast. The memcache proxy would no longer be a good fit on its own, because there would be a race condition between MySQL replication and the memcache proxy replication. So they extended MySQL to call the memcache proxy to update the data there, too.
- Where do you put the RAM? Even if you put enough memcache memory to have a 95% hit rate, that’s 500-1,000 queries per second, per MySQL server. So you need enough memory on the MySQL server. The answer is that you put memory everywhere, and in reality the db tier has about double the memory of the memcache tier.
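To make the arithmetic in the last point concrete, here is a small sketch. The function names are my own, and the only numbers taken from the talk are the 95% hit rate and the 500-1,000 queries per second per MySQL server. The implied lookup rate is simply derived from those two figures and was not stated by Jeff.

```python
def db_queries_per_second(cache_lookups_per_second: float, hit_rate: float) -> float:
    """Lookups that miss the cache and therefore fall through to MySQL."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_lookups_per_second * (1.0 - hit_rate)


def cache_lookups_for_db_load(db_qps: float, hit_rate: float) -> float:
    """Inverse view: the lookup rate that produces a given MySQL load."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return db_qps / (1.0 - hit_rate)


if __name__ == "__main__":
    # Figures from the talk: 95% hit rate, 500-1,000 queries/s per MySQL server.
    for db_qps in (500, 1000):
        lookups = cache_lookups_for_db_load(db_qps, 0.95)
        print(f"{db_qps} DB queries/s at a 95% hit rate implies "
              f"about {lookups:,.0f} lookups/s behind that server")
```

The point is that the 5% of traffic that misses is still a substantial query load, which is why the MySQL servers need plenty of memory of their own.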
The problem is that the data is duplicated: it lives in both the db and memcache. Jeff believes that in the future, MySQL will run at the speed of the application.
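The race between MySQL replication and cache updates mentioned in the notes can be illustrated with a toy model. This is only a sketch under my own assumptions: the `Region` class and the scenario functions are hypothetical, plain dictionaries stand in for MySQL and memcache, and I use cache invalidation where the talk only said the data is "updated". It is not Facebook's implementation.

```python
class Region:
    """One facility: a database replica plus a local look-aside cache."""

    def __init__(self):
        self.db = {}      # stands in for the local MySQL replica
        self.cache = {}   # stands in for the local memcache tier

    def read(self, key):
        # Look-aside read: serve from cache, else load from the DB and fill the cache.
        if key not in self.cache:
            self.cache[key] = self.db.get(key)
        return self.cache[key]

    def invalidate(self, key):
        self.cache.pop(key, None)

    def apply_replicated_write(self, key, value, invalidate_cache):
        # The replicated row lands in the local DB; optionally the same step
        # also clears the cached copy.
        self.db[key] = value
        if invalidate_cache:
            self.invalidate(key)


def racy_scenario():
    """Invalidation travels separately and arrives before the replicated row."""
    east = Region()
    east.db["status"] = "old"
    east.read("status")        # cache now holds "old"
    east.invalidate("status")  # invalidation arrives first
    east.read("status")        # miss: re-caches the stale "old" from the replica
    east.apply_replicated_write("status", "new", invalidate_cache=False)
    return east.read("status")


def coupled_scenario():
    """Invalidation is triggered by the replicated write itself."""
    east = Region()
    east.db["status"] = "old"
    east.read("status")        # cache now holds "old"
    east.apply_replicated_write("status", "new", invalidate_cache=True)
    return east.read("status")


if __name__ == "__main__":
    print("separate invalidation:", racy_scenario())
    print("invalidation tied to replication:", coupled_scenario())
```

In the first scenario the cache ends up holding the old value even after the replica has the new one. Tying the cache update to the replication stream removes that window, which is my reading of why MySQL was extended to call the memcache proxy.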