Hadoop FAQ - But What About the DBAs?
There is one question I hear every time I make a presentation about Hadoop to an audience of DBAs. This question was also recently asked in LinkedIn's DBA Manager forum, so I finally decided to answer it in writing, once and for all.

"As we all see there are lot of things happening on Big Data using Hadoop etc.... Can you let me know where do normal DBAs like fit in this: DBAs supporting normal OLTP databases using Oracle, SQL Server databases DBAs who support day to day issues in Datawarehouse environments. Do DBAs need to learn Java (or) Storage Admin (like SAN technology) to get into Big Data?"

I hear a few questions here:
- Do DBAs have a place at all in the Big Data and Hadoop world? If so, what is that place?
- Do they need new skills? Which ones?
- Everyone knows DBA stands for "Default Blame Acceptor". Since the database is always blamed, DBAs typically have great troubleshooting skills, processes, and instincts. All of these are critical for good cluster admins.
- DBAs are used to managing systems with millions of knobs to turn, all of which have a critical impact on the performance and availability of the system. Hadoop is similar to databases in this sense: there are tons of configurations to fine-tune.
- DBAs, much more than sysadmins, are highly skilled in keeping developers in check and making sure no one accidentally causes critical performance issues on an entire system. This skill is critical when managing Hadoop clusters.
- DBA experience with DWH (especially Exadata) is very valuable. There are many similarities between DWH workloads and Hadoop workloads, and similar principles guide the management of the system.
- DBAs tend to be really good at writing their own monitoring jobs when needed. Every production database system I've seen has a crontab file full of customized monitors and maintenance jobs. This skill continues to be critical for Hadoop systems.
- Sysadmins typically have more experience managing huge numbers of machines (much more so than DBAs).
- They have experience working with configuration management and deployment tools (Puppet, Chef), which is absolutely critical when managing large clusters.
- They can feel more comfortable digging in the OS and network when configuring and troubleshooting systems, which is an important part of Hadoop administration.
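
To make the point about home-grown monitoring jobs concrete, here is a minimal sketch of the kind of check a DBA might schedule from crontab against a Hadoop cluster. It is only an illustration, not a recommended monitor: the 80% threshold and the function names are made up for this example, and the parser assumes the output of `hdfs dfsadmin -report` contains a line of the form `DFS Used%: 12.34%`, which you should verify against your own Hadoop version.

```python
import re
import subprocess

# Hypothetical threshold for this sketch; pick a value that fits your cluster.
DFS_USED_WARN_PCT = 80.0


def parse_dfs_used_pct(report_text):
    """Return the first 'DFS Used%' value in the report as a float, or None.

    Assumes a line shaped like 'DFS Used%: 12.34%' (an assumption about the
    report format, not something guaranteed by this sketch).
    """
    match = re.search(r"^DFS Used%:\s*([0-9.]+)%", report_text, re.MULTILINE)
    return float(match.group(1)) if match else None


def check_capacity(report_text, warn_pct=DFS_USED_WARN_PCT):
    """Return (ok, message) describing HDFS usage relative to the threshold."""
    used = parse_dfs_used_pct(report_text)
    if used is None:
        return False, "could not find 'DFS Used%' in report"
    if used >= warn_pct:
        return False, f"DFS usage {used:.1f}% is at or above {warn_pct:.1f}%"
    return True, f"DFS usage {used:.1f}% is below {warn_pct:.1f}%"


def fetch_report():
    """Run `hdfs dfsadmin -report` and return its stdout.

    Requires a configured Hadoop client on PATH; raises if the command fails.
    """
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-report"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def main():
    """Print the check result and return a cron-friendly exit code."""
    ok, message = check_capacity(fetch_report())
    print(message)
    return 0 if ok else 1
```

To run this from cron, a script would end with something like `sys.exit(main())`, and the non-zero exit code could then feed whatever alerting the team already uses. Keeping the parsing separate from the command invocation means the check can be tested against saved report text without touching a live cluster.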