Hardware Component Failures – Survey Results

May 10, 2012 / By Alex Gorbachev

When preparing for the IOUG Collaborate 12 deep dive on deploying Oracle Databases for high availability, I wanted to provide some feedback on which hardware components fail most frequently and which fail less often. I believe I have a reasonably good idea about that, but I thought that providing some more objective data would be better. I couldn’t find any results of more scientific research, so I decided to organize a poll. This blog post shows the results, which I promised to share with several groups.

The results are also in the presentation material, but they might be hidden deep within 100+ slides, so here is a dedicated blog post with some comments on the results.

I asked the following question: “Sort the hardware components in the descending order of failure rates.” The list must be fully ordered, so there is exactly one component for each place (#1, #2 and so on) and we end up with a sorted list of 7 components. I didn’t collect failure rate estimates or anything like that. The answers are already perception based, so asking for estimated ratios would definitely be asking for garbage data. Each component gets a weight based on the place it ends up in on the list. Place #1 (most frequently failing) gives weight 7, place #2 gives weight 6, and so on down to place #7, which gives weight 1.

The resulting weights for each component are averaged, so the higher the average weight, the more frequently respondents think this hardware component fails. Again, this is perception based and I’m sure our brains make it quite subjective, but I hope that having 50+ people in the survey provides a good indication. Please note that you can’t compare the failure rates of components by comparing the average weights below. Remember that a weight only reflects placement on the list; we didn’t collect any failure rates, so no conclusions can be drawn about them.
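To make the scoring concrete, here is a minimal Python sketch of the weighting and averaging described above. It is only an illustration, not the tooling used for the poll: the function name `average_weights`, the placeholder component names A through G, and the three sample responses are all made up and are not survey data.

```python
from collections import defaultdict


def average_weights(rankings):
    """Average the place-based weight of each component.

    rankings: a list of responses; each response is a full ordering of
    the same components, most frequently failing first.
    """
    totals = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        if len(set(ranking)) != n:
            raise ValueError("each response must list every component exactly once")
        for place, component in enumerate(ranking, start=1):
            # With 7 components: place 1 gives weight 7, place 7 gives weight 1.
            totals[component] += n - place + 1
    return {component: total / len(rankings) for component, total in totals.items()}


# Hypothetical responses over placeholder components A..G (not real survey data).
responses = [
    list("ABCDEFG"),
    list("BACDEFG"),
    list("ACBDEGF"),
]

result = average_weights(responses)
# Print components from highest to lowest average weight.
for component, weight in sorted(result.items(), key=lambda item: -item[1]):
    print(f"{component}: {weight:.2f}")
```

The output is an ordering by perceived failure frequency. As noted above, a component with an average weight of 6 is not "twice as likely to fail" as one with an average weight of 3; the numbers only summarize placements.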


(Image: chart of the survey results, showing the average weight of each hardware component)

I was satisfied that the results are quite close to my own estimates, except that I wouldn’t rank failures of network cards and switches as high. I’ll let you comment on that: does the result match your estimates? Did you find anything surprising?

The survey responses come mostly from users of Oracle technology. I think the results should generally apply to other platforms, but if we compare with, say, MySQL deployments, Oracle infrastructure is usually built on more enterprise-class hardware while MySQL infrastructure usually runs on lower-grade servers. Still, I think the results should be useful to anyone; just keep in mind the source of the feedback.
