Hadoop FAQ – But What About the DBAs?

Jan 24, 2013 / By Gwen Shapira

There is one question I hear every time I make a presentation about Hadoop to an audience of DBAs. This question was also recently asked in LinkedIn’s DBA Manager forum, so I finally decided to answer it in writing, once and for all.

“As we all see there are lot of things happening on Big Data using Hadoop etc….
Can you let me know where do normal DBAs like fit in this :
DBAs supporting normal OLTP databases using Oracle, SQL Server databases
DBAs who support day to day issues in Datawarehouse environments .

Do DBAs need to learn Java (or) Storage Admin ( like SAN technology ) to get into Big Data ? ”

I hear a few questions here:

  • Do DBAs have a place at all in the Big Data and Hadoop world? If so, what is that place?
  • Do they need new skills? Which ones?

Let me start by introducing everyone to a new role that now exists in many organizations: Hadoop Cluster Administrator.

Organizations that have not yet adopted Hadoop sometimes imagine it as a developer-only system. I think this is why I get so many questions about whether we need to learn Java every time I mention Hadoop. Even within Pythian, when I first introduced the idea of Hadoop services, my managers asked whether we would need to learn Java or hire developers.

Organizations that did adopt Hadoop found out that any production cluster larger than 20-30 nodes requires a full-time admin. This admin’s job is surprisingly similar to a DBA’s job – he is responsible for the performance and availability of the cluster, the data it contains, and the jobs that run there. The list of tasks is almost endless and also strangely familiar – deployment, upgrades, troubleshooting, configuration, tuning, job management, installing tools, architecting processes, monitoring, backups, recovery, etc.

I have not seen a single organization with a production Hadoop cluster that didn’t have a full-time admin, but if you don’t believe me – note that Cloudera offers a Hadoop Administrator certification and that O’Reilly sells a book called “Hadoop Operations”.

So you are going to need a Hadoop admin.

Who are the candidates for the position? The best option is to hire an experienced Hadoop admin. In 2-3 years, no one will even consider doing anything else. But right now there is an extreme shortage of Hadoop admins, so we need to consider less perfect candidates. The usual suspects tend to be: junior Java developers, sysadmins, storage admins, and DBAs.

Junior Java developers tend not to do well in the cluster admin role, just like PL/SQL developers rarely make good DBAs. Operations and dev are two different career paths that tend to attract different types of personalities.

When we get to the operations personnel, storage admins are usually out of consideration because their skillset is too unique and valuable to other parts of the organization. I’ve never seen a storage admin who became a Hadoop admin, or any place where it was even seriously considered.

I’ve seen both DBAs and sysadmins become excellent Hadoop admins. In my highly biased opinion, DBAs have some advantages:

  • Everyone knows DBA stands for “Default Blame Acceptor”. Since the database is always blamed, DBAs typically have great troubleshooting skills, processes, and instincts. All of these are critical for good cluster admins.
  • DBAs are used to managing systems with millions of knobs to turn, all of which have a critical impact on the performance and availability of the system. Hadoop is similar to databases in this sense – tons of configurations to fine-tune.
  • DBAs, much more than sysadmins, are highly skilled in keeping developers in check and making sure no one accidentally causes critical performance issues on an entire system. This skill is critical when managing Hadoop clusters.
  • DBA experience with DWH (especially Exadata) is very valuable. There are many similarities between DWH workloads and Hadoop workloads, and similar principles guide the management of the system.
  • DBAs tend to be really good at writing their own monitoring jobs when needed. Every production database system I’ve seen has a crontab file full of customized monitors and maintenance jobs. This skill continues to be critical for Hadoop systems.
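
As a rough illustration of the kind of home-grown monitor described in the last bullet, here is a minimal Python sketch of a cron-friendly disk usage check. It is not taken from any real cluster: the directory names, the 85% threshold, and all function names are assumptions made up for this example.

```python
import shutil

# Hypothetical values: replace with the mount points and limit used on your nodes.
DATA_DIRS = ["/data/1", "/data/2"]
THRESHOLD_PCT = 85.0


def percent_used(used_bytes, total_bytes):
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def find_alerts(usage_by_dir, threshold_pct=THRESHOLD_PCT):
    """Return alert strings for directories at or above the threshold.

    usage_by_dir maps a path to a (used_bytes, total_bytes) tuple.
    """
    alerts = []
    for path, (used, total) in sorted(usage_by_dir.items()):
        pct = percent_used(used, total)
        if pct >= threshold_pct:
            alerts.append(f"{path} is {pct:.1f}% full (limit {threshold_pct:.0f}%)")
    return alerts


def collect_usage(paths):
    """Read real disk usage for each path with shutil.disk_usage."""
    usage = {}
    for path in paths:
        stats = shutil.disk_usage(path)
        usage[path] = (stats.used, stats.total)
    return usage


def main():
    """Print any alerts and return an exit code (1 if something needs attention)."""
    problems = find_alerts(collect_usage(DATA_DIRS))
    for line in problems:
        print(line)
    return 1 if problems else 0
```

A script like this would typically end with `sys.exit(main())` and be scheduled from crontab, so that a non-zero exit code or any printed output can trigger a notification. Keeping the threshold logic in `find_alerts` separate from the part that touches the disks makes the check easy to test without a real cluster.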

To be fair, sysadmins also have important advantages:

  • They typically have more experience managing huge numbers of machines (much more so than DBAs).
  • They have experience working with configuration management and deployment tools (Puppet, Chef), which is absolutely critical when managing large clusters.
  • They can feel more comfortable digging in the OS and network when configuring and troubleshooting systems, which is an important part of Hadoop administration.

Note that in both cases I’m talking about good, experienced admins – not those who can just click their way through the UI, but those who really understand their systems and much of what is going on outside the specific system they are responsible for. You need DBAs who care about the OS, who understand how hardware choices impact performance, and who understand workload characteristics and how to tune for them.

There is another important role for DBAs in the Hadoop world: Hadoop jobs often get data from databases or output data to databases. Good DBAs are very useful in making sure this doesn’t cause issues. (Even small Hadoop clusters can easily bring down an Oracle database by starting too many full-table scans at once.) In this role, the DBA doesn’t need to be part of the Hadoop team as long as there is good communication between the DBA and Hadoop developers and admins.
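
To make the "too many full-table scans at once" risk concrete, here is a small Python sketch of the general idea of capping how many extract tasks hit a database at the same time. It does not talk to Hadoop or Oracle at all: `make_fake_scan` is a hypothetical stand-in for a real query, and the class and function names are assumptions for illustration only.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records the peak number of simultaneously running tasks."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self.active -= 1
        return False


def run_throttled(tasks, max_concurrent):
    """Run callables with at most max_concurrent in flight; return results in order."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def make_fake_scan(tracker, chunk_id):
    """Hypothetical stand-in for a query that reads one chunk of a table."""
    def scan():
        with tracker:
            time.sleep(0.01)  # pretend the database is doing work
            return chunk_id
    return scan


def demo(num_chunks=12, max_concurrent=3):
    """Run the fake scans and return (results, peak concurrency observed)."""
    tracker = ConcurrencyTracker()
    tasks = [make_fake_scan(tracker, i) for i in range(num_chunks)]
    results = run_throttled(tasks, max_concurrent)
    return results, tracker.peak
```

The point of the sketch is the agreement it encodes: the DBA and the Hadoop team settle on a value like `max_concurrent` that the database can tolerate, and the extract side enforces it instead of launching one query per available task slot.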

What about Java?
Hadoop is written in Java, and a fairly large number of Hadoop jobs will be written in Java too.
Hadoop admins will need to be able to read Java error messages (because this is typically what you get from Hadoop), understand concepts of Java virtual machines and a bit about tuning them, and write small Java programs that can help in troubleshooting. On the other hand, most admins don’t need to write huge amounts of Hadoop code (you have developers for that), and for what they do write, non-Java solutions such as Streaming, Hive, and Pig (and Impala!) can be enough. My experience taught me that good admins learn enough Java to work on a Hadoop cluster within a few days. There’s really not that much to know.
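
For readers who have never looked at a Java error message, most of the skill is finding the last "Caused by:" line, which usually names the underlying problem. The Python sketch below shows that idea on a bare stack trace. The sample trace is invented for illustration (the `com.example` classes do not exist), and the parser is deliberately simple: it assumes each exception line starts with a fully qualified class name and would need adapting for log lines that carry timestamps or other prefixes.

```python
import re

# Matches the first line of an exception or a "Caused by:" line, for example
# "java.io.IOException: message" or "Caused by: java.net.ConnectException: message".
EXCEPTION_LINE = re.compile(
    r"^(?:Caused by: )?"
    r"(?P<cls>[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)+)"
    r"(?:: (?P<msg>.*))?$"
)

# Made-up trace used only to exercise the parser.
SAMPLE_TRACE = """\
java.io.IOException: Failed to read block
    at com.example.Reader.read(Reader.java:42)
    at com.example.Job.run(Job.java:17)
Caused by: java.net.ConnectException: Connection refused
    at com.example.Net.connect(Net.java:8)
    ... 2 more
"""


def exception_chain(trace_text):
    """Return [(class_name, message), ...] from outermost exception to root cause."""
    chain = []
    for raw in trace_text.splitlines():
        line = raw.strip()
        # Skip stack frames ("at ...") and elision markers ("... 2 more").
        if line.startswith("at ") or line.startswith("..."):
            continue
        match = EXCEPTION_LINE.match(line)
        if match:
            chain.append((match.group("cls"), match.group("msg") or ""))
    return chain


def root_cause(trace_text):
    """Return the innermost (class_name, message) pair, or None if nothing matched."""
    chain = exception_chain(trace_text)
    return chain[-1] if chain else None
```

Calling `root_cause(SAMPLE_TRACE)` on the invented trace returns the `java.net.ConnectException` entry rather than the outer `java.io.IOException`, which is the same reading order an admin would apply by eye: start at the bottom-most "Caused by:" and work outward.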

What about SAN technology?
Hadoop’s storage system is very different from SAN and generally uses local disks (JBOD), not storage arrays and not even RAID. Hadoop admins will need to learn about HDFS, Hadoop’s file system, but not about traditional SAN systems. However, if they are DBAs or sysadmins, I suspect they already know far too much about SAN storage.

So what skills do Hadoop Administrators need?

First and foremost, Hadoop admins need general operational expertise: good troubleshooting skills, an understanding of system capacity and bottlenecks, and the basics of memory, CPU, OS, storage, and networks. I will assume that any good DBA has these covered.

Second, good knowledge of Linux is required, especially for DBAs who spent their lives working with Solaris, AIX, and HP-UX. Hadoop runs on Linux. They need to learn Linux security, configuration, tuning, troubleshooting, and monitoring. Familiarity with open source configuration management and deployment tools such as Puppet or Chef can help. Linux scripting (Perl / bash) is also important – they will need to build a lot of their own tools here.

Third, they need Hadoop skills. There’s no way to avoid this :) They need to be able to deploy a Hadoop cluster, add and remove nodes, figure out why a job is stuck or failing, configure and tune the cluster, find the bottlenecks, monitor critical parts of the cluster, configure name-node high availability, pick a scheduler and configure it to meet SLAs, and sometimes even take backups.

So yes, there’s a lot to learn. But very little of it is Java, and there is no reason DBAs can’t do it. However, with Hadoop Administrator being one of the hottest jobs in the market (judging by my LinkedIn inbox), they may not stay DBAs for long after they become Hadoop Admins…

Any DBAs out there training to become Hadoop admins? Agree that Java isn’t that important? Let me know in the comments.

39 Responses to “Hadoop FAQ – But What About the DBAs?”

  • Ahbaid says:

    Hi Gwen! I prefer DBA = Does Basically Anything. Agree on the Hadoop Admin role, and I am proceeding full speed ahead into the world of Hadoop! Excellent article.

  • Uwe Hesse says:

    Hi Gwen,
    very interesting and helpful article!

    I encounter some worried DBAs myself who want to know exactly what you addressed here: How is Hadoop affecting my job role? You make it clear that it is actually a great chance for them :)

    Thank you &
    Kind regards
    Uwe

    • Levin says:

      Hi Gwen,

      I can relate to the Default Blame Acceptor :-). I encounter it every day. Great article. Hadoop sounds interesting. I will start doing some reading for this.

      Thanks,

      Levin

  • Janis Tupulis says:

    Excellent one, many thanks!

  • Michael says:

    Great article. A few years ago I started saying that there should be a Hadoop DBA position in any company that uses Hadoop, the same as for any other database out there.

  • Yury says:

    Thanks for sharing, Gwen. I am thinking about myself: should I jump into the magical Hadoop world or stay an Oracle DBA for a while?
    My current thoughts are:
    – There are still so many things waiting for me in the Oracle DBA space. I would probably like to cover some of them before jumping somewhere else …
    – Big Data arrived so rapidly that it reminds me of other buzzwords that arrived just as rapidly before Hadoop (e.g. SOA). Will Hadoop/Big Data stay for good? Or might they disappear in a few years?
    – No matter what, I think there is going to be enough work for both DBAs and Hadoop admins in the datafication age :)

    Yury

    • Gwen Shapira says:

      Hi Yury,

      Nuno Pinto De Souto gave a similar sentiment in the Big Data SIG.

      First of all, I agree that Oracle DBA is a huge world and there is always more to learn. I can easily point to areas where I can learn or improve myself. I also agree with Nuno who said that many skills are timeless and always needed. I totally agree on that.

      Is Hadoop here to stay? If I could tell how market trends work, I would be a far richer woman. I’m just a simple DBA :)

      I work with Hadoop because I love it. I love the brilliant simplicity of the platform, the rich eco-system, the flexibility, the tools. I feel very creative when I work with Hadoop, much more so than working with Oracle. But this is personal – everyone has his own favorite tools.

      I try to encourage DBAs to learn Hadoop for two reasons:
      1. Maybe some of them will love it as much as I do. I want to spread the joy!
      2. Hadoop is being actively adopted by many organizations. Hadoop Admins are necessary. Someone has to do the job. I’m trying to encourage more people to study Hadoop, so the job market will become a bit more balanced.

      Will Hadoop go away? Personally, I see it as a real solution to real problems. I don’t see it going away any time soon. It will probably become boring. When I started my career, XML was really hot – everyone was talking about it, learning new technologies around XML, redesigning processes, etc. Now, people just use XML and don’t talk about it much. The time I spent working on DOMs and XSLT and all was not wasted at all. It was fun back then, and is still sometimes useful.

      Knowledge is never wasted, even when the trend is over.

  • Ofir says:

    Hi Gwen,
    great post!
    It is true that DBAs are already used to being at the center of attention – working with sysadmins, network admins, storage admins, developers, and vendors to solve complex technical challenges.
    I also find that Hadoop admin makes sense as a career move for DBAs, especially for the “infrastructure” DBAs who focus more on DB infrastructure than on schema design. Since there is currently a shortage of such admins, it might also be financially beneficial – but the key factor should simply be a passion for playing with and owning cool, new technologies…

  • Jeff P. says:

    Great post.
    When I explain to folks what a lowly DBA does, I tell them
    “I don’t get to drive the train, but when it jumps off the track, guess whose phone starts ringing itself off his desk!”

  • Amazing article, Gwen, which describes what the DBA’s future could be. It’s really useful and good to share.

  • [...] is not going to hire an experienced Hadoop Administrator, it’s a great job for the DBA. Gwen Shapira makes a great note of this, arguing that the DBAs experience with complex tuning requirements, data warehouses, and developers [...]

  • Anil says:

    Excellent inputs. One question: would experience as a RAC DBA, in your opinion, be helpful when administering Hadoop clusters?

    • Gwen Shapira says:

      Here are a few examples of when my RAC DBA experience was helpful in administering a Hadoop cluster:

      1. When setting up a highly available namenode, you need to configure a STONITH method. As a RAC DBA, you probably know all about STONITH and why it’s important, and can easily choose the correct configuration.

      2. In Hadoop, troubleshooting often involves figuring out which specific node is having trouble. RAC DBAs are pretty good at drilling down on randomly occurring issues to find the faulty server.

      3. Troubleshooting also involves correlating messages from large numbers of logs and machines. RAC DBAs are usually experts on that too.

      In general, RAC experience is experience in distributed systems – which is critical for Hadoop administration.

  • Narasimha says:

    Very good information.
    I have 4+ on Linux/Hadoop curently with 7LPA, is it fair enough to ask for 18LPA?

  • hareesh says:

    Hi Gwen
    I’m fresher but i want learn hadoop admin please give me some suggestions
    Is it good for me????

  • Biraj says:

    Hi Gwen,
    Very good & informative article.

    I have 7+ years of exp as iSeries developer. Would Hadoop admin career helpful for developer as well?

    Thanks in advance!!

  • Shaik says:

    Excellent Article for upcoming Hadoop Admins…

  • Harish says:

    Good article.
    i have a doubt sir actually am a B.Tech Passed out student well trained on .NET(DOTNET) but no job still so i heard about Hadoop Admins have more demand in market now..
    So i want you to suggest whether i should go training for hadoop or stick to .NET

  • Krishna says:

    Hi Gwen,
    Great article with good analysis of Hadoop. I have around 1 year of experience with Oracle and I want to learn Hadoop. Please give me your valuable suggestions regarding this.

  • Syed Jahanzaib Bin Hassan says:

    Nice Article
    Mixture of both skills required to troubleshoot the problems and a candidate which have such kind of combination of skills in the market is very difficult

  • Kashif says:

    Hi Gwen: thanks for the wondeful article. Wanted to know your thoughts about the future of the data / database architect and database developer e.g. PL/SQL developer (not the DBA) in the Hadoop world. What would be the next logical step in the Hadoop world for these types of professionals?

  • Sreekanth Matturthi says:

    I have 13+ years of experience as a Solaris & Red Hat administrator with Symantec Cluster technology, as an SME at an MNC. Using this experience, I have already started learning Hadoop and created 2 nodes on my virtual machine for some R&D. I then decided to change my career to Hadoop administration. Please suggest and help me out.

  • Rishi Jian says:

    Thanks Gwen for this article ..
    I am 2013 fresher .. Currently jobless.. Will learning Hadoop/Bigdata is good for me for getting a job having 0 years experience.. ??
    Thanks in advance.

  • Mel Bourne says:

    Funny… you keep mentioning that the DBA fits the role of Hadoop administrator, but most of the tasks you mentioned, like OS performance tuning, hardware, deployments, storage configurations, clusters, backup infrastructure, networking, and coding, are mostly done by Unix sysadmins. Read them again. I have worked with DBAs a lot, and none of them are comfortable enough to work with hardware, OS tuning, and setting up the infrastructure… they even come to me to write some automation scripts. I was a Unix sysadmin, and guess what, I was the only one selected to be trained in Teradata, and was tasked to stand up and manage our first Teradata database and infrastructure… I’ll tell you what, that recommendation came from Teradata itself; they’d rather train a Unix sysadmin who has a background in databases and coding/development than convert an Oracle DBA to manage a Teradata environment.

  • TEJAS says:

    Hi,

    thanks for the wonderful post.

    i am a oracle DBA ,
    i am planning for HADOOP.

    i am planning for a HADOOP ADMIN .

    thanks,
    TEJAS

  • DBA is the basic thing which help to handle hadoop. There are many things which we can get and along with this a perfact hadoop admin need to improve this technology in a refining way.

  • SK says:

    I have 25 years of experience in IT/Software Development/System Admin/DBA. I initially started my career as a DBA and got 8 years in MSSQL/DB2/Oracle/MySQL. I also spent 4 years in Unix/Solaris admin. Strong hands-on PL/SQL and some exposure to Java/C#. Later on I moved to technical solution architecture. Now, even at my age (52 years), I am very techie and involve myself in Hadoop and related technologies. It’s my passion. I LOVE HADOOP. Still working as a Hadoop admin.

  • Jayanta says:

    I am working as Sr. Technical Manager – Database Architect, with 14 years of experience in Oracle administration. I am very interested in moving to Big Data and Hadoop technology, but I am a little confused about which stream will match my skills and experience. Of course Hadoop administration is a very similar type of work, but since I am looking for a senior-level role to match a good package, what would suit me best?
    Thanks
    Jayanta

  • Mohsin says:

    Nyc article..I have one question in my mind that is .. How can a java developer jump into Hadoop without being an oracle DBA. So please suggest me , Im Confused ..thanx

  • Rajeev says:

    I am a MS SQL DBA with no experience in Linux/Unix and Java. Hadoop looks really interesting. I was looking at job market and most of them have Linux administration as one of the requirements.
    Any MS SQL DBA have experience getting into Hadoop to give me hope of making it if I dive into it.

  • Samarth Sharma says:

    Nice article. Cleared all of my doubts. I am a DBA but soon going for Hadoop Cluster Admin :)
    Thanks for this article

  • sairam says:

    I’m fresher but i want learn hadoop admin please give me some suggestions
    How can a fresher can get a job as hadoop admin…what courses need to be done??

  • Praks says:

    I have a good experience of Oracle DB administration. If I want to get into Hadoop admin… where shud i start from .. some one pls guide me

  • marksmithdba says:

    Great article!

    As for those who are expecting career guidance from an expert blogger: no-one owns your career except you – teach yourself the skills you need and prove to your boss that you are THE person in the company who can and who WANTS to support it.

    You can get your hands on an image of a Hadoop implementation easily enough, so all you need is a VM and the documentation.

    IMO, enterprise adoption of Hadoop is very immature. Unlike database appliances, such as Exadata, where the sentiment is that it’s a database on steroids on an engineered system – and has ORACLE on the front – so it’s natural to leave it to the DBA, no-one knows what to do with Hadoop.

    Underneath it all, it’s a file SYSTEM = sysadmins
    But it’s an ENTERPRISE DATA Hub = DBAs
    You need to EXTRACT the raw data so it’s usable = developers?

    Just kidding about the last one :)

    Should Hadoop be deployed as part of a Big Data Appliance, support will be expected from the DBAs with the SAs and network admins saying “it’s an appliance, talk to the vendor about it”.

    If it’s a roll-your-own commodity cluster, I can’t see how the SAs and network admins can throw it over the fence because they’re responsible for the hardware support.

    The savvy companies will realize that, like enterprise data warehouses, Hadoop crosses many “traditional” organizational silos and a dedicated Enterprise Data Management support group is needed – with participation from DBAs, SAs and network admins.

  • Subhransu Sahoo says:

    Great article, concise and inspiring. After deep diving into Hadoop and Big Data ecosystem, I learned that my past DBA+ skills helped a lot. Completely agree with the narration. Thank you.
