Following my “Building Integrated DWH with Oracle and Hadoop” webinar for the IOUG Big Data SIG, I got a bunch of excellent follow-up questions. The most frequently asked questions are: What is the minimum I need to do to get started with Hadoop? and How do I load data into Hadoop? Since so many people are interested in the same questions, it makes more sense to answer on the blog.
Shortly before we all went on break for the holiday, Oracle announced the new BDA X3-2. Now I have time to properly sit down with a glass of fine scotch and dig into the details of what is included in the release. Turns out that there are quite a few changes packed in. We are getting new hardware, new Hadoop, new Connectors and new NoSQL. Tons of awesome features are included. Let’s get into it.
This year’s OOW was even more amazing for me. For the first time, I presented at OpenWorld. It is great to see how many people now share my interest in integrating Hadoop into the enterprise data warehouse ecosystem. To those who missed my presentations and to those who attended and want to review the slides, you can find the content here.
The consistent message, both from Oracle and from independent data architects, has been: “Hadoop will not replace Oracle. Each system has its strengths and they can be used side by side to offer a wider range of data storing and processing possibilities”.
It seems that most professionals understood and agreed with this message, because this year the question I am hearing most is “Which data should we store in Hadoop and which in Oracle?”
I can’t claim to have the definitive answer, but I can offer some pointers and start the discussion.
Tuesday morning at OOW is always occupied by this forum, an opportunity for authors and other attendees to get a heads-up on what’s coming down the pipe from Oracle. The following notes are musings from yours truly as I attended the forum today.
We have the SIG meeting at Oracle OpenWorld. Everyone is welcome. Gwen Shapira is the SIG leader, so expect lots of great things in that space. The SIG is also looking for volunteers. That’s going to be a hot space, so if you want to engage early, come and let us know.
Before I dig into the mechanics under the hood of the Hadoop beastie (which is the part, I assume, that is going to be heady as hell), I thought it would be a good idea to play a little bit with some of its applications to get a feel for the lay of the land. Let’s have a look, shall we?
The chances of getting my hands on 18 servers, each with 12 cores, 48 GB RAM and 84 TB storage, all connected by InfiniBand, are not that great. But I can play with the software, and so can you. Unlike Oracle’s Exadata, almost every software component that is available on the Big Data Appliance is also available for download. So, let’s roll our own Big Data Appliance!
It was fun presenting today in Portland, and I’m looking forward to continuing my user group marathon in Denver tomorrow and on Thursday. Since many people asked me where they can find my slides, and I predict that a few more will keep asking about them over the next few days, I uploaded my Big Data and NoSQL presentations to SlideShare. You can find them here:
Whenever I tell an experienced Oracle DBA about Hadoop and what companies are doing with it, the immediate response is “But I can do this in Oracle”. Let’s compare.