Posts Tagged ‘scaling’

OSCon 2008 Video Matrix

As part of a project of Technocation, Inc., I took a whole bunch of videos at OSCon 2008. The conference was about a month ago, and I finished processing and uploading all the videos about 2 weeks ago, but only today did I have the 5-6 hours I needed to finish posting them all and make this video matrix.

The video may not be the quality that the O’Reilly folks took and put up on blip tv’s OSCon site, but all the videos here are freely downloadable or playable in your browser.

(more…)

BigTable Thoughts

By Sheeri Cabral May 31st, 2008 at 11:09 am
Posted in Group Blog Posts, Non-Tech Articles

So, Paul’s blog post pointing to Todd’s blog post got me thinking.

The main point Paul summarized was that duplicating data is a great way to scale. He used Todd’s reference to Flickr: in their partition-by-user scheme, they put a comment in the commenter’s shard as well as in the commentee’s shard.
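As a rough illustration of that double write, here is a minimal Python sketch. It is not Flickr's actual code: the shard count, the modulo placement in `shard_for`, and the names `post_comment` and `comments_for` are all assumptions made for the example.

```python
from collections import defaultdict

NUM_SHARDS = 4  # assumption: a small fixed shard count for illustration

# Each shard maps a user id to the comment records stored for that user.
shards = [defaultdict(list) for _ in range(NUM_SHARDS)]


def shard_for(user_id):
    """Pick a shard from the user id (hypothetical modulo placement)."""
    return user_id % NUM_SHARDS


def post_comment(commenter_id, owner_id, text):
    """Store one comment twice: once per involved user."""
    record = {"from": commenter_id, "to": owner_id, "text": text}
    # Write 1: the shard of the user who received the comment.
    shards[shard_for(owner_id)][owner_id].append(record)
    # Write 2: the shard of the user who wrote it (skipped for self-comments).
    if commenter_id != owner_id:
        shards[shard_for(commenter_id)][commenter_id].append(record)


def comments_for(user_id):
    """Read every comment a user made or received from that user's shard only."""
    return shards[shard_for(user_id)][user_id]
```

The cost is one extra write per comment; in exchange, any read about a single user touches exactly one shard.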

In my recent post about Twitter, I wrote:

Now, I understand that it is hard to get all the histories for the people I follow. But it only needs to be done once, and could then be cached as “Posts from who Sheeri follows on 5/20”. It would not be difficult, and I would be OK with the functionality changing such that “once you follow a new person, their tweets prior to when you followed them do not show up in the history.”

So using this thinking, every time someone I follow (say, @paulandstorm) makes a comment, it not only writes to their shard, but to mine. Now, that may not work given that the system also has to send messages at the same time, and that there can be numerous followers — dozens, hundreds, thousands.

The Flickr model works because it involves 2 writes to get the faster caching later, and there are more reads than writes. Twitter is more write-heavy, and likely has more writes than reads, considering that many folks do not visit an historical website to see their history.

This particular idea may not work for Twitter. But I’ve picked on Twitter enough….

I thought about livejournal. I’ve been a livejournal member since 2001 — after 2 months of writing my own journaling system with comments, I got wind that a system already existed, so started to use that.

Now, I can go and pick specific entries from specific days, or I can read my “friends list”. I specify my friends and livejournal dynamically populates pages of my friends list, with the number of entries per page that I specify.

Livejournal could also use the idea presented above, as well as the concept of semi-dynamic data. Instead of dynamically generating the last, let’s say 20, entries of my “friends list”, livejournal could be making my friends list as it gets written to. A friend makes a post and it gets added to my shard, whether or not I read it. Once the count gets up to 20, a new cache page is generated.
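To make the idea concrete, here is a minimal Python sketch of building the friends list at write time. It does not describe how livejournal actually works: the in-memory dictionaries stand in for per-user shards, and the names `add_friend`, `publish`, and `friends_page` are hypothetical.

```python
from collections import defaultdict

PAGE_SIZE = 20  # entries per cached friends-list page, as in the example above

followers = defaultdict(set)      # author -> readers who list them as a friend
pending = defaultdict(list)       # reader -> entries not yet in a full page
cached_pages = defaultdict(list)  # reader -> finished pages, oldest first


def add_friend(reader, author):
    """Record that `reader` wants `author` on their friends list."""
    followers[author].add(reader)


def publish(author, entry):
    """Copy a new entry to every reader at write time."""
    for reader in followers[author]:
        pending[reader].append((author, entry))
        # Once a reader has a full page of entries, freeze it as a cache page.
        if len(pending[reader]) == PAGE_SIZE:
            cached_pages[reader].append(pending[reader])
            pending[reader] = []


def friends_page(reader, page_number):
    """Serve a finished page without touching any author's data."""
    return cached_pages[reader][page_number]
```

Reading a page becomes a single lookup, but each call to `publish` costs one write per reader, which is the trade-off that matters when an author has hundreds or thousands of readers.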

Now, livejournal already has great caching, and has indeed had the growing pains Twitter is seeing. And for either livejournal or Twitter to take advantage of these concepts, they would likely require a rewrite from the ground up. So it’s not that I am suggesting this. I just think it’s a great idea, and if you are working on a project, think about where it might be useful to apply. Again, it may not be applicable in all situations. As with Twitter, a livejournal user may have many “friends”, so doing 100 or 1,000 writes every time a post is made may not actually be feasible.

Liveblogging: 10,000 Tables Can’t Be Wrong

By Sheeri Cabral April 17th, 2008 at 4:31 pm
Posted in Group Blog Posts, MySQL

10,000 Tables Can’t Be Wrong: Designing a Highly Scalable MySQL Architecture for Write-intensive Applications by Richard Chart

Chose MySQL for performance and stability, and less important but still there, experience and support. Support is becoming increasingly important.

Starting point: 1 appliance supporting 200 devices
Problem/Goal: Extensible architecture with deep host and app monitoring, over 1000 devices with 100 mgmt points each
Distributed collection over a WAN, with latency and security concerns
Current reality: several times the scale of the original goal
Commercial embedded product, so they actually pay for the embedded MySQL server

Future: The fundamentals are sound: next generation of the product moves up another order of magnitude

Data Characteristics
>90% writes
ACID not important
Resilient to loss, because gaps in data do not invalidate the rest of the data
Data elements by themselves are valuable, but much more so when relationships are added.

Chose MyISAM because: (more…)

Liveblogging: A Match Made in Heaven? The Social Graph and the Database

By Sheeri Cabral April 17th, 2008 at 11:53 am
Posted in Group Blog Posts, MySQL, Non-Tech Articles

Jeff Rothschild of Facebook’s “A Match Made in Heaven? The Social Graph and the Database”

Taking a look at the social graph and what it means for the database.

The social graph:

  • At its heart, it’s about people and their connections.
  • Learning about people who are in your world.
  • Can be a powerful tool for accelerating the use of an application.

“The social graph has transformed a seemingly simple application such as photos into something tremendously more powerful.” We’re interested in what people are saying about us, and about our friends. Social applications are compelling.

Facebook users blew through the estimate for 6 months of storage in 6 weeks. It is serving 250,000 photos per second at peak time, not including profiles. Facebook serves more photos than even the photo sites out there, and serves more event invitations than any other website out there.

E-mail invitations are an example of the power of the social graph. If you get a newsfeed or an invitation that tells you 12 friends are attending an event, you have more information, and can then make a better decision on whether or not you want to go. (more…)

Panel Video: Scaling MySQL — Up or Out?

By Sheeri Cabral April 17th, 2008 at 10:41 am
Posted in Group Blog Posts, MySQL

Yesterday’s keynote panel on “Scaling MySQL — Up or Out?”

Directly download the 310MB wmv file (not if you are on the conference wireless please!), or watch it in your browser via streaming — simply click the “play” link on this page.

Keith Murphy managed to take painstaking notes with all the facts and figures. Venu Anuganti also presents a chart with the results, along with notes on the more detailed answers. Ronald Bradford has a brief summary of the 20 seconds of wisdom from each panelist.

How To Build Scalable Database Architectures

By Shakir March 12th, 2008 at 3:03 pm
Posted in Group Blog Posts, Oracle

I’ve found lately that munching on carrots with French dressing is more satisfying than broccoli. Maybe it’s the tang-and-crunch combination. In any case, I was crunching away yesterday while thinking about how to answer a question one of our newer start-up clients asked me.

No one has ever come out and formally asked me for a document that states “Best Practices to Scale Application X”. It is an unusual demand, since it’s something many of us at Pythian have implemented, but it’s been more of an ad hoc, iterative process — and rightly so, since architectures must be so organic, and so tailored to the application. What’s more, no one has ever brought us on board so early in the game that we have a hand in actually — gasp! — doing the design and data-model from the get-go. Woo hoo!

Now, a little background. I have built and maintained a few systems. Some of them even supported over 100k concurrent users. These databases didn’t run RAC either (although I do support two very high profile RAC environments now). So, having been in the trenches and knowing what it takes to make a DB move, I got to thinking about some of the fundamentals. There are always rules of thumb, right? This is what you need to know to start building a scalable high-performance system, based on stuff that I’ve seen. Obviously, this assumes a database-centric app. Let’s start with the first ten principles.

(more…)

Oracle 11g Result Cache Tested on Eight-Way Itanium

By Alex Fatkulin November 27th, 2007 at 12:55 pm
Posted in Group Blog Posts, Oracle

This will be the final post in my series on Result Caches. In my previous article, I had already got almost everything. Almost — four CPUs (cores) were still not enough to saturate the single latch. As you’ve probably already guessed, today we are going with an eight-way test.

Please note that today’s numbers are different since I’m using an entirely different hardware platform. While the four-way tests were done on a 2.4GHz Core 2 Quad box, today’s eight-way tests were done using four dual-core Itanium 2 CPUs running at 1.1GHz.

Let’s take a look at the results:

# of processes   Buffer Cache   % linear   Result Cache   % linear
1                15085          100%       15451          100%
2                26745          88.65%     28881          93.46%
3                39144          86.5%      40628          87.65%
4                52342          86.75%     52625          85.15%
5                63922          84.75%     62767          81.25%
6                76336          84.34%     69549          75.02%
7                88844          84.14%     74208          68.61%
8                100959         83.66%     76768          62.11%
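The “% linear” columns appear to be each measurement divided by the ideal figure, meaning the number of processes multiplied by the single-process result. That reading is an inference from the numbers rather than something stated in the post, but it reproduces the table's values. A short Python sketch of the calculation, where the function name `percent_linear` is made up for this example:

```python
def percent_linear(throughputs):
    """Return each measurement as a percentage of perfect linear scaling.

    throughputs[0] is the single-process result; entry i was measured
    with i + 1 processes, so its ideal value is (i + 1) * throughputs[0].
    """
    base = throughputs[0]
    return [
        round(100.0 * value / (n * base), 2)
        for n, value in enumerate(throughputs, start=1)
    ]


# Measurements copied from the table above.
buffer_cache = [15085, 26745, 39144, 52342, 63922, 76336, 88844, 100959]
result_cache = [15451, 28881, 40628, 52625, 62767, 69549, 74208, 76768]

for n, (bc, rc) in enumerate(
    zip(percent_linear(buffer_cache), percent_linear(result_cache)), start=1
):
    print(n, bc, rc)
```

Seen this way, the Buffer Cache settles at roughly 84% of linear from five processes onward, while the Result Cache keeps losing ground with every process added.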

I made a nice-looking graph from this:

[Graph: BC vs. RC]

(more…)

Does Oracle 11g’s Result Cache Scale Poorly?

By Alex Fatkulin November 12th, 2007 at 3:48 pm
Posted in Group Blog Posts, Oracle

In my previous blog entry, I explained why I would expect Result Cache not to scale well. Unfortunately, at the time that blog entry was written, I had no access to hardware with more than two cores. That left me in an everything-but-the-proof state. “Theory without practice is sterile.” ©Albert Einstein.

Since then, I have had a chance to re-run my test cases on a quad-core CPU, moving one step forward.

I re-executed my test cases with one to four processes against the Buffer Cache and the Result Cache in order to capture the number of lookups per second. I did, however, raise the number of iterations to 1M to make the results more stable.

Here is what I got: (more…)

Oracle 11g’s Query Result Cache: Introduce Yourself to RC Latches

By Alex Fatkulin September 13th, 2007 at 11:26 pm
Posted in Group Blog Posts, Oracle

In the previous article, I described my observations of RC Enqueue. Now it is time to take a look at the RC latches.

Latches, being serialization devices, are scalability inhibitors. Not that they inherently prevent you from scaling; quite the opposite is true. Serialization is a must if you expect your system to produce anything apart from GIGO (Garbage In Garbage Out). Concurrency is essentially made possible through serialization of shared resources. That being said, I would expect Result Cache to beat Oracle’s buffer cache on read-only workloads, since that is what RC was designed for. That is, Result Cache should perform faster and scale better.
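The serialization point can be sketched in a few lines of Python. This is only an analogy: a `threading.Lock` stands in for a latch, and nothing here reflects how Oracle implements RC latches. The names `LatchedCounter` and `run` are made up for the example.

```python
import threading


class LatchedCounter:
    """A shared counter whose updates are serialized by a single lock."""

    def __init__(self):
        self._latch = threading.Lock()  # stand-in for a latch, not Oracle's
        self.value = 0

    def increment(self):
        # Only one thread at a time may run this read-modify-write.
        with self._latch:
            self.value += 1


def run(workers, increments_each):
    """Hammer one counter from several threads and return the final value."""
    counter = LatchedCounter()

    def work():
        for _ in range(increments_each):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

The lock is what keeps the count correct, and it is also the single point every thread has to queue on, which is why one hot latch can cap throughput as processes are added.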

(more…)