Images in a database

Jun 9, 2009 / By Sheeri Cabral

Tags: , , , , ,

About six months ago, the question of storing images in a database came up. This is one of my favorite topics, and has many database-agnostic parts.

Personally, I think “tell me about storing images in a database” is actually a great interview question, because you will be able to see the difference between someone who has just memorized “what’s right” versus someone who is really thinking. It also helps you see how someone will communicate — if they just say “NEVER do it, it’s as bad as crossing the streams!” then they are a type of person that gives you a short answer, without much explanation, and without many nuances. (That may be what you are looking for, but usually you want someone who gives reasons for why they strongly feel one way or another).

Consider the following cases:

What about storing lots of small
(say 1Mb or less) images that change frequently (say, dating profile pictures)?

What about large images that change infrequently (say, highly detailed satellite images)?

How important is consistency, and how will data consistency be maintained if images are stored outside the database? How will consistent backups be done?

What other resources (such as a CDN) are available?

What compliance and auditing issues might this client have?

What is the “sweet spot” in terms of size of an image stored in the database, and does that apply to text too? Would it be OK to have a TEXT or LONGTEXT field in the database, but all images are stored outside of the database?

All of these (and more) may swing the pendulum one way or another.

Putting images on a filesystem is the most popular method of doing things. We do have a client that stores “big BLOBby data” in a table (documents, images) — this is mostly for consistency and archiving purposes.

The cons of storing images in a database:
– more db traffic — in, out, and through (where through = replication)
– backups and exports from the db are more of a pain. Data/index files and tablespaces can get quite large. It’s nicer to be able to rsync
files….
– restores are more of a pain too.
– there is no computational stuff really that the db can do (other than, say, length). It cannot sort, compare, etc (well it can but it’s
meaningless). Using separate files makes coding actions such as like making thumbnails much easier.

The pros of storing images in a database:
– you have data integrity and consistency. If you delete a record you can cascade that delete down to the image. You cannot do a JOIN of an OS
and a db.
– databases are really really good at storage and retrieval. It can be argued that filesystems are really really good at storage and retrieval too, but for high activity only certain types of filesystems are, and if you do not have a ReiserFS or MogileFS expert, you may end up with more
than you bargained for. Very likely, you have someone already knowledgeable about databases.

Things that can be pros and cons:

– filesystem vs. database caching. This really depends on the filesystem and the DBMS you’re using. Many folks also use other caching methods for images too…..(webserver/app server/memcached/CDN/whatever)

Here are some references:

Oracle’s whitepaper on The Move to Store Images in a Database
An Article I wrote in Mar 2006, which is still highly relevant. Make sure to read all the comments, including following links like Mike Kruckenberg’s thoughts.

2 Responses to “Images in a database”

  • Dups says:

    Good post.

    I’ve always been a fan of looking at the entire picture when figuring out what to do. And just because something can do it doesn’t mean you should. I mean a hammer *can* open a peanut, doesn’t mean you should. A similar analogy I once read about storing blob data in a RDBMS: You can store the entire cow in a fridge, but wouldn’t it make sense to store it as steaks? Not that I am saying not to use blobs, just as with anything, use them wisely ;)

    So I think the points you raise are good. There is no hard and fast rule. I would argue that in most cases storing large binary data that you need to access frequently might not be what you want to do on a high trafficked web site. However, if you can cache those files to disk to serve them, then where you store them might not be as big of a deal.

    As you say it is all about finding the sweet spot in the system, in fact in *YOUR* system :) I was witness to a SQL Server implementation gone horribly wrong where thousands of binary attachments were stored into a database. I am not a SQL Server DBA so I can’t speak to the ins and outs of why it happened… however the result was that for 400 people (so not huge by any means) the DB started crawling and the backups of the DB became unmanageable. In the end I believe the system was changed to not store images and the whole thing improved dramatically.

    With things like memcached these days and the potential to move images off to some high performance file system or even creating a CDN like system with a DB In the background I think there are lots of options.

    I would even add how your server communicates with the DB in a particular network might even have an affect on which method you decide.

    So many options.

  • Dups — it is a good story, but the plural of “story” is not “data”.

    :)

    I will agree, though, that in *most* cases storing images in the database is not the best solution overall for performance. But performance isn’t always the only factor, either. Sometimes it’s more important to have the database consistent, and have BLOBs be easily auditable in the same manner as other data.

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>