“Hello World!” I guess that’s the most appropriate way to start my 1st blog post under the pythian.com domain. I’m going to start slow, but hopefully I’ll pick up speed and have at least a couple of posts each month to share with you. I’ve been blogging at http://appsdbalife.wordpress.com until now, and I haven’t yet decided what the future will be for my previous blog. I wouldn’t like it to become some kind of zombie page that’s been long dead but is still wandering around the internet.

Enough intros, let’s get to business! I hope this blog post doesn’t get lost in the huge amount of posts related to OOW 2011.

A few days ago I was asked to estimate how much space needed to be added to the ASM diskgroup to handle the database growth for one year without any further need to add disks. Obviously, to estimate the disk space to be added, I had to know what the DB size would be one year from now. It looked like an easy task, as I knew we were recording the size of the database every day.

I checked the size of the database: it was 887.3Gb, and a month ago it was 856.9Gb, so it had increased by 30.4Gb in one month. 30.4Gb multiplied by 12 months equals 364.8Gb, so the database would be ~1252Gb after one year. That was easy, right?

But wait a second! Wouldn’t it be logical that as the DB becomes bigger (there is more activity, more users, more transactions, more everything), it grows quicker too? So I checked how much the DB had increased during one month exactly one year ago: 21.2Gb (the DB size was 740.3Gb at that time). Looking at this number, I instantly knew the DB size I had calculated earlier was not correct.

This was the moment when physics kicked in. Velocity, distance and time could easily be substituted with DB growth in one month, size of the DB and time. Also, as the “velocity” had increased during the year, we were dealing with some acceleration too. I graduated from high school 10 years ago, so I had to look up the formulas. First I needed to calculate the acceleration; then, assuming the acceleration stays the same, I would be able to calculate the “distance” (how much the DB would grow in one year). I used the following formulas (where s = distance, u = initial velocity, v = final velocity, a = uniform acceleration, t = time):

- v^2 = u^2 + 2as
- s = u*t + (a*t^2)/2

I applied the 1st formula to the variables I had (v = 30.4 Gb/month, u = 21.2 Gb/month, t = 12 months, s = 887.3 − 740.3 = 147 Gb), and the results were:

- v^2 = u^2 + 2as
- 30.4^2 = 21.2^2 + 2*a*147
- 924.16 = 449.44 + 294*a
- 474.72 = 294*a
- a = ~1.6147 (Gb/month^2)
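The same arithmetic can be sanity-checked in a few lines of Python; this is just a sketch of the derivation above, with variable names mirroring the formulas:

```python
# Growth rates measured a year apart (Gb/month) and total growth over that year (Gb)
u = 21.2            # growth during one month, a year ago
v = 30.4            # growth during the last month
s = 887.3 - 740.3   # total DB growth over the year = 147 Gb

# Rearranged from v^2 = u^2 + 2*a*s
a = (v**2 - u**2) / (2 * s)
print(round(a, 4))  # ~1.6147 Gb/month^2
```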

As I had the acceleration of the DB growth I could calculate the estimated growth of the database during one year (u=30.4 Gb/month, t=12 months, a = ~1.6147 Gb/month^2), and the result was:

- s = u*t + (a*t^2)/2
- s = 30.4*12 + (1.6147*12^2)/2
- s = 364.8 + 1.6147*144/2
- s = 364.8 + 116.2584
- s = ~481.1 Gb
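The second formula works the same way in code; recomputing the acceleration first keeps the snippet self-contained (again, only a sketch of the arithmetic above):

```python
# Acceleration from the first formula: v^2 = u^2 + 2*a*s
a = (30.4**2 - 21.2**2) / (2 * (887.3 - 740.3))

u = 30.4   # current growth rate becomes the initial velocity going forward, Gb/month
t = 12     # months

# s = u*t + (a*t^2)/2 -- estimated growth over the next year
s = u * t + (a * t**2) / 2
print(round(s, 1))  # ~481.1 Gb
```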

Based on these calculations, the size of the DB would be 887.3 + 481.1 = 1368.4Gb, which is quite different from the 1252Gb I had calculated before. In the end, I used the estimated DB size of 1368.4Gb to calculate how much disk space needed to be added to the ASM disk group.

Of course, the acceleration of the DB growth might not be uniform in reality, but I think it’s still better to use this approach, as it’s likely to be more accurate than a result based on the assumption that the DB growth is constant. You can do the same calculations for different types of data; the method is not bound to the database. Use it to estimate the total size of Apache logs, or the total size of concurrent request output files if you’re running e-Business Suite. Keep in mind that estimates are only estimates: don’t rely on them 100%, and they will be more accurate if the input data are accurate.
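Since the method only needs a handful of measurements, it wraps up neatly into a reusable helper. This is a hypothetical sketch, not part of the original post; the fourth input (the size thirteen months ago) is derived here from the u = 21.2Gb/month figure quoted above:

```python
def estimate_growth(size_now, size_month_ago, size_year_ago, size_13_months_ago, months=12):
    """Estimate future growth (same units as the sizes), assuming uniform acceleration."""
    u = size_year_ago - size_13_months_ago   # initial "velocity": monthly growth a year ago
    v = size_now - size_month_ago            # final "velocity": monthly growth now
    s = size_now - size_year_ago             # "distance": total growth over the year
    a = (v**2 - u**2) / (2 * s)              # from v^2 = u^2 + 2as
    # Going forward, today's rate v is the initial velocity: s = u*t + (a*t^2)/2
    return v * months + (a * months**2) / 2

# Numbers from the post: 887.3Gb now, 856.9Gb a month ago, 740.3Gb a year ago,
# and 740.3 - 21.2 Gb thirteen months ago (reconstructed from u = 21.2 Gb/month)
print(round(estimate_growth(887.3, 856.9, 740.3, 740.3 - 21.2), 1))  # ~481.1 Gb
```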

I hope you enjoyed reading this. See you next time!


### 10 Responses to “Back to School: Elementary Physics for DBAs”


Hey Maris,

Nice post! I was also thinking about this some time ago, wondering what I would do if I got the same task. Now I know the correct approach for this :) These formulas really remind me of my school years :)

Regards,

Andrejs.

Hi,

Keep in mind this ain’t no silver bullet!

Each database is unique; you have to evaluate first whether this approach can be used in your situation or not. Keep in mind the results will be less accurate if there was any non-typical activity that impacted the size of the DB, e.g. a new purging process was implemented a few months ago, or a new module was rolled out.

I just wanted to stress that in most cases the growth of the DB might not be linear, but there’s still a quite simple way to estimate the database size in the future.

Thank you for reading and commenting. I’m glad I was able to bring back the nice memories of your school years :)

Maris

Or…you could have taken your historical figures and put them into an Excel spreadsheet, graphed the growth over time and then added a trendline to predict the future.

Hi Brian,

I actually did that! I tried all the types of trendlines with different options, but none of them really provided reliable results. Most of them were just linear, and the ones that were not (exponential and polynomial) were visibly pointing in the wrong direction.

Probably the issue was that I had too many values in the graph (daily values for ~1.5 years); it might have worked a bit better if I had just taken monthly or bi-weekly measurements, but I didn’t try that.

By the way, I have used Excel trendlines successfully to forecast when a diskgroup would run out of space based on DB growth during the last 2-4 weeks; that worked fine.

Maris

At a previous employer I tracked sizes of all data objects over time, then used some of the regression analytic functions in Oracle to create some data that was further massaged in Excel.

The result was predictions that were fairly accurate. Your method however seems much simpler – kudos!

Thanks for the kudos, Jared!

I wonder if you ever let us know more details about the method you used earlier.

Based on the short description you provided, it looks like it should be more accurate than this one. However, I’m not sure what level of complexity would be involved.

Maris

Congrats with the first article Maris!

While I agree that the growth is rarely linear and that in 90% of cases the “acceleration” would be a positive number, I think you should state clearly to the requester that the estimate you are giving is just a guess.

This is most probably the right place to remember BAAG group :) http://www.battleagainstanyguess.com/

Data growth, as you rightly suggested, is driven by business activities, technology, the data model and algorithms. You may give any estimates, but if the business decides it needs to reprocess last year’s data next month, you may find that all the space you allocated for a year is already used.

My point here is: do everything you can to make clear what assumptions your estimate is based on. Otherwise management will blame you for the mistake (they will blame you anyway ;)

Congrats once again and welcome to the Team!

Yury

Thanks Yury!

This is definitely not a guess.

This is a calculated estimate based on some assumptions and the DB growth statistics accumulated over the last year. If I were guessing, I’d probably give a different number each time someone asked.

The recommendation of course remains: you have to state clearly that this is just an estimate, and also provide some information on the assumptions that led to it.

P.S. I have never worked with a manager who didn’t understand the meaning of the word “estimate”, and I really hope I never will.

Maris

Just saw this, Maris.

Good work. I would have probably used moving averages for the prediction.

May I suggest adding “How long before we run out of space” information to the daily disk and tablespace checks? It sounds like useful info.

Thanks Gwen,

I think it’s too complicated to add this as a generic daily monitoring check. It’s easy to do for DBs on ASM, but what about the ones on a filesystem or raw disks?

I think it’s also quite important to keep in mind the specifics of each database and consider the non-routine tasks executed during the period under review (e.g. adding new application components, implementing purging)… if we add this as a daily check, we will ignore these details and the result will be less reliable.