Comparing Hadoop Appliances

Today Oracle announced that its Big Data Appliance is available. You can see the press release here.
The appliance was initially announced at Oracle OpenWorld in September. The appliance announced today is pretty similar to what Oracle presented at OpenWorld. The glaring difference is that the Hadoop shipped with the appliance will not be vanilla Apache Hadoop, but rather Cloudera’s Hadoop distribution, and it will include Cloudera’s administration software. You can read Oracle’s press release about the collaboration here. Alex Popescu blogged about the implications for Oracle and Cloudera.

At this time there are three Hadoop appliances on the market: Oracle’s Big Data Appliance, NetApp’s Hadooplers and EMC’s Greenplum DCA. It looks like a lot of companies that did not adopt Hadoop in 2011 are looking to do so in 2012, and some of them may be considering going with an appliance. I want to take a look at some of the reasons a company would be interested in a Hadoop appliance, and at how the appliances differ from one another.

Hadoops Everywhere

We don’t pay enough attention to Hadoop.

By “we” I mean DBAs; the rest of the world is paying plenty of attention to Hadoop. Recently, I started asking my customers and fellow DBAs about Hadoop adoption in their companies. It turns out that many of them have Hadoop. Hadoop shows up in large companies and small ones, in established industries and in startups. It’s everywhere.

The way Hadoop shows up in all companies, and the way DBAs don’t pay Hadoop much attention, reminds me a lot of how MySQL started showing up in the enterprise. It didn’t start by DBAs showing up one morning and telling their managers:
“There’s this new open source database. It’s not as stable as Oracle and it doesn’t have all the features we need, but man – it’s going to save us tons of money, and it’s pretty simple to manage.”

Nope, this never happened. What happened instead is that developers learned about MySQL, and it seemed to them like an excellent way to get around this whole DBA thing. They could install it themselves, learn how to use it in a week and become happy and productive, without ever having to discuss their schema, data model, requirements, capacity planning, availability, backups and all the other things that DBAs want to talk about.

By the time the application came out of development and had to be deployed in production, MySQL was a done deal. No one was going to rewrite the app just because the DBAs didn’t know MySQL. Sometimes the Oracle DBAs were forced to learn and administer MySQL, but more often it was considered “not a database” and left for the sysadmins to manage, while the DBAs continued to pretend that the entire world is written by Oracle.

So that’s what Hadoop adoption looks like now: it’s usually introduced by the developers and administered by sysadmins, while DBAs continue to pretend it doesn’t exist or doesn’t matter. When pressed, some DBAs will even insist that all this “big data” work can and should be done in a database, but that the developers are too ignorant or lazy to work with a proper RDBMS.

I think the day has arrived when, just as DBAs can no longer ignore MySQL, we can no longer ignore Hadoop either. So let’s talk about it.

Different Technology Stacks On Production and DR?

Last week, I was at the NetApp office in North Sydney for a presentation on NetApp SnapManager for Oracle. It was a good opportunity to learn more about NetApp snapshots while working on a project for one of our clients in Sydney. The topic was especially interesting to me because I have some experience using Veritas Checkpoints (see my presentation on test systems refreshes), and I wanted to see what’s different and new in the NetApp implementation. But I digress.

I learned that NetApp can provide access to the same LUNs via either Fibre Channel (FC) or iSCSI. And this is when an interesting argument surfaced. Apparently, some companies aim to make the technology stack on their disaster-recovery site as different as possible from the one on the primary production site. Their argument is that if one technology fails at the primary site (like FC to access storage), then a DR site using a different technology stack is more likely to be unaffected.

Hrm . . .  I had never thought about this, and when I consider it now, it still doesn’t appeal to me. If I design a highly-available solution with a disaster-recovery site in place, one of my priorities would be to switch between the sites comfortably at any time. The more differences two sites have, the lower my comfort level is.

The only reason I can think of for a company to “demand” different storage technology stacks at production and DR is to justify a more convenient (cheaper?) implementation.

Thoughts? Comments?
