Twitter Should Get Back to Basics

Posted in: Technical Track

Twitter has had many outages recently. A May 17th, 2008 post says:

What went wrong? We checked in code to provide more accurate pagination, to better distribute and optimize our messaging system basically we just kept tweaking when we should have called it a day. Details are great but getting too caught up in them is a mistake. I’ve been CEO of Twitter for two months now and this an awesome lesson learned. We’re seeing the bigger picture and Twitter is back. Please contact us if something isn’t working right (with Twitter that is).

(in other news, that post was made on May 17th and does not show up on, which it should, between the May 16th and May 19th posts. I found a reference in other posts and had to search the site to find that post).

A real “awesome lesson learned” is “do not tweak production without testing first.” In every job I have had I have first learned and then taught the concept of “test everything possible.” Which Twitter has not learned yet, because, posted on Tuesday May 20th, states:

We caused a database to fail during a routine update early this afternoon.

As someone who has years of experience working with MySQL, and before that was a systems adminsitrator; as someone who was referred to as “the MySQL Queen” yesterday (by someone who wanted me to test their product, so yes, they were flattering me); I can assure you:

no matter how “routine” a change is, if you do it on production without testing it first, you are playing with fire, and 95% of the fires caused by not testing first are completely preventable.

I will repeat this, because repetition is important to learning concepts.

no matter how “routine” a change is, if you do it on production without testing it first, you are playing with fire, and 95% of the fires caused by not testing first are completely preventable.

With a proper testing environment, 19 out of 20 “whoops, didn’t expect THAT from a routine change!” issues are caught. And I can tell you that often “routine changes” cause unexpected results.

Now, I was online during an outage, and was showing their “site isn’t working” page for at least 3 hours between 2 and 5 am EDT yesterday (Tuesday, May 20th, 2008).

So…..there is no read-only copy around that Twitter could use? Maybe I cannot tweet, but I should at least be able to read what was done before!

Of course, since last week Twitter has done the opposite — often I can see the most recent 20 or so posts, but not anything prior. Now, I understand that it is hard to get all the histories for the people I follow. But it only needs to be done once, and could then be cached — “Posts from who Sheeri follows on 5/20”. It would not be difficult, and I would be OK with the functionality changing such that “once you follow a new person, their tweets prior to when you followed them do not show up in the history.”

Alternatively, you could go the snarky way and say: states:
What would be great is if Twitter just moved their blog to another platform so that it doesn’t fail when users need it most.

I am not a huge user of rails. But I will say that given the content of the public announcements, the platform is not the problem. It is the code release process that is the problem. Maybe there’s “agile development” happening, paired programming and code reviews. But there is not adequate testing.

Twitter — if you truly need scaling help, please ask for help — I know Pythian would be happy to help. However, if it really is as it seems — that basic good practice is not being followed — I would like to remind you that backups are really important too, just on the off chance that backups are not happening.

Interested in working with Sheeri? Schedule a tech call.

4 Comments. Leave new

It’s got so bad Twitter now have a blog (on Tumblr obviously) to report all the status issues.

Curiously enough, the seemingly trivial loss of pagination forced me to ditch Twitter so I was interested to read that this was partially as a result of ‘improved pagination’.


That very first link to the twitter blog looks like it’s from May of 2007, not 2008.

Log Buffer #99: a Carnival of the Vanities for DBAs
May 30, 2008 11:55 am

[…] About Us    Our Services    The Pythian Advantage    News Pythian has openings for MySQL and MS SQL Server DBAs in each of our offices in Ottawa, Canada; Boston, USA; Dubai, UAE; and Hyderabad, India. If you are a MySQL and/or SQL Server DBA and would like to evaluate this opportunity, please send us your résumé with an introductory paragraph to « Twitter Should Get Back to Basics […]

BigTable Thoughts
May 31, 2008 11:09 am

[…] Twitter Should Get Back to Basics […]


Leave a Reply

Your email address will not be published. Required fields are marked *