
Oracle Database or Hadoop?

Some decisions sound easy, but it’s also easy to get them wrong. Today I had a choice of hanging around New York City, or working on my big data presentation for RMOUG. Sounds easy, and yet I spent the day working on that presentation.

Whenever I tell an experienced Oracle DBA about Hadoop and what companies are doing with it, the immediate response is “But I can do this in Oracle”.

Read the rest of this entry . . .

Comparing Hadoop Appliances

Today Oracle announced that its Big Data Appliance is available. You can see the press release here.
The appliance was initially announced at Oracle OpenWorld in September. The appliance announced today is pretty similar to what Oracle presented at OpenWorld; the glaring difference is that the Hadoop shipped with the appliance will not be vanilla Apache Hadoop, but rather Cloudera’s Hadoop distribution, and it will include Cloudera’s administration software. You can read Oracle’s press release about the collaboration here. Alex Popescu blogged about the implications for Oracle and Cloudera.

At this time there are three Hadoop appliances on the market: Oracle’s Big Data Appliance, NetApp’s Hadooplers and EMC’s Greenplum DCA. It looks like a lot of companies that did not adopt Hadoop in 2011 are looking to do so in 2012, and some of them may be considering going with an appliance. I want to take a look at some of the reasons a company would be interested in a Hadoop appliance, and at how the appliances differ.

Read the rest of this entry . . .

De-Confusing SSD (for Oracle Databases)

You never forget your first SSD.
For me, the first time I really *noticed* SSDs was when one of my customers encountered serious corruption on one of their databases and we had to restore an entire database. It was not a small database, around 300G in total file size. After I started the RMAN restore and recovery process, the customer asked the inevitable question: “How long will this take?”. I replied that I wasn’t familiar with the performance of their storage, but that in my experience a restore of this size could be expected to take 5 hours. Imagine my surprise when the restore was done after an hour.

This was enough to convince me that SSD is magic, and that if you have money and an IO problem, just go SSD. Of course, if that were the end of the story, I wouldn’t have much of a blog post.

Read the rest of this entry . . .

Secrets of Oracle’s Automatic Degree of Parallelism

Automatic degree of parallelism, or Auto DOP, is a new feature in 11gR2 that promises to help manage systems where a large subset of the workload runs with parallel processing. In this post I’ll introduce the feature and give very useful tips I got from Oracle’s Real World Performance expert Greg Rahn on how to use it. So this is worth reading even if you are familiar with the feature.

The problem is fairly well known: your system has only a finite amount of resources. Only so many CPUs, only so many disks capable of delivering only so many IO/s and MB/s. A certain query may have amazing performance when running with 32 parallel processes all alone on your test system. When 5 people need to run it at once, and at the same time there are two scheduled jobs running, each with its own parallel processes, there are two likely outcomes:

  1. You will run more parallel processes than your system is capable of serving, resulting in long queues on the CPU and storage, and overall performance degradation.
  2. You limit the maximum number of parallel processes to protect the database resources, and some of the queries degrade. If you don’t detect it, the ETL process that should have finished in two hours takes 24, which means that the daily report sent to the CEO is missing some of the data. Ouch.
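The arithmetic behind these two outcomes can be sketched in a few lines of Python. This is only an illustration and not how Auto DOP itself works: the 32-process query, the 5 concurrent users and the two scheduled jobs come from the scenario above, while the job DOP of 16, the limit of 128 parallel processes and the proportional sharing rule are made-up assumptions.

```python
# Illustrative arithmetic only; this is NOT Oracle's Auto DOP algorithm.
QUERY_DOP = 32          # from the text: the query runs with 32 parallel processes
CONCURRENT_USERS = 5    # from the text: 5 people run it at once
SCHEDULED_JOBS = 2      # from the text: two scheduled jobs
JOB_DOP = 16            # assumption: parallel processes per scheduled job
MAX_PARALLEL = 128      # assumption: processes the system can serve well


def requested_processes(users, query_dop, jobs, job_dop):
    """Total parallel processes requested if nobody is throttled."""
    return users * query_dop + jobs * job_dop


def capped_dop(requested_dop, total_requested, limit):
    """DOP a statement gets if the limit is shared proportionally.

    Hypothetical sharing rule: scale every request by limit/total,
    round down, and never go below 1.
    """
    if total_requested <= limit:
        return requested_dop
    return max(1, requested_dop * limit // total_requested)


total = requested_processes(CONCURRENT_USERS, QUERY_DOP, SCHEDULED_JOBS, JOB_DOP)
print(f"requested: {total}, limit: {MAX_PARALLEL}")
print(f"oversubscription: {total / MAX_PARALLEL:.2f}x")
print(f"query DOP under a proportional cap: {capped_dop(QUERY_DOP, total, MAX_PARALLEL)}")
```

With these made-up numbers the workload asks for 192 processes against a limit of 128. Either the system runs oversubscribed by 1.5x (outcome 1), or each query is cut from 32 processes to about 21 (outcome 2).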

Read the rest of this entry . . .

Hadoop and NoSQL Mythbusting

With all the buzz at OOW about the big data machine, there was also a lot of nonsense flying around. I love it that the Oracle community is finally interested in Hadoop and NoSQL, but I hate it when people sound authoritative without having an actual clue. I’ve left a few presentations with smoke coming out of my ears.

Here are a few things that people got all wrong:
Read the rest of this entry . . .

Oracle’s Big Data Machine – Details and Musings

Oracle announced the Big Data Appliance in the Monday morning keynote. Many people, me included, had long been waiting for this to happen. Others didn’t think it would ever happen. So naturally, there is a lot of buzz and excitement around the new device at OpenWorld. The keynote announcement was very short on details and certainly did not satisfy my technical curiosity. So I went to a few presentations to hear what exactly is included in the offering.
Read the rest of this entry . . .

Trends and Data – Notes from Strata NYC 2011

I attended the Strata conference in NYC last week. It’s been many years since I last attended a conference without presenting at it. On one hand, attending only makes for a far more relaxed experience. On the other hand, I missed having random people come up to me and talk about my presentation. I decided to attend the conference since it is considered the foremost data science conference, and I was very much interested in what those data scientists are up to.

Good data scientists combine the abilities of business analysts, statisticians and software engineers. They have the skills, the tools and the mandate to mine and analyze all the data the organization collects to deliver valuable insights to the business and data-based features to the customers of the business. In addition, data scientist is considered the hottest job around. Of course, it is data scientists who mined job postings and job moves to come up with this conclusion, so maybe take it with a grain of salt.

Data scientists normally work with very large amounts of data, both structured (the enterprise data warehouse) and unstructured (web server logs, blog posts). Since I’m a big fan of big data, I was very curious to see what those data scientists care about.

So, in no particular order – stuff data scientists like:

Read the rest of this entry . . .

Database Appliance for the Masses

In early 2005 I worked for an SQL Server and Windows shop that wanted to transition to Linux and Oracle for the improved high availability and scalability. We had exactly one SA who knew Linux, two SQL Server DBAs and one database developer.

The new manager of the DBA team also wanted to go for RAC.

We knew we didn’t have the necessary expertise, so after discussing with Oracle’s sales team, we decided to get a consultant to install it for us.

A week of consultant work and countless server re-provisions later, we still didn’t have a working RAC.
Not the consultant’s fault, but at the time Oracle’s RAC installation was very sensitive to OS and HW configuration, and it wasn’t very good at detecting problems and notifying the users. It also wasn’t easy to uninstall and re-install. Mix that with a team that has no experience with the underlying OS, and you have a guaranteed disaster.
Our SAs simply couldn’t understand and implement the installation requirements, and the DBA team couldn’t communicate this to them very well.

Fast forward six years:
Read the rest of this entry . . .

Hadoops Everywhere

We don’t pay enough attention to Hadoop.

By “we” I mean DBAs; the rest of the world is paying plenty of attention to Hadoop. Recently, I started asking my customers and fellow DBAs about Hadoop adoption in their companies. Turns out that many of them have Hadoop. Hadoop shows up in large companies and small ones, in established industries and in startups. It’s everywhere.

The way Hadoop shows up in all companies, and the way DBAs don’t pay Hadoop much attention, reminds me a lot of how MySQL started showing up in the enterprise. It didn’t start by DBAs showing up one morning and telling their managers:
“There’s this new open source database. It’s not as stable as Oracle and it doesn’t have all the features we need, but man, it’s going to save us tons of money, and it’s pretty simple to manage.”

Nope, this never happened. What happened instead is that developers learned about MySQL, and it seemed to them like an excellent way to go around this whole DBA thing. They could install it themselves, learn how to use it in a week and become happy and productive. Without ever having to discuss their schema, data model, requirements, capacity planning, availability, backups and all the other things that DBAs want to talk about.

By the time the application came out of development and had to be deployed in production, MySQL was a done deal. No one is going to re-write the app just because the DBAs don’t know MySQL. Sometimes the Oracle DBAs were forced to learn and admin MySQL, but more often it was considered “not a database” and left for the sysadmins to manage, while the DBAs continued to pretend that the entire world is written by Oracle.

So that’s what Hadoop adoption looks like now: it’s usually introduced by the developers and administered by sysadmins, while DBAs continue to pretend it doesn’t exist or doesn’t matter. When pressed, some DBAs will even insist that all this “big data” thing can and should be done in a database, but the developers are too ignorant or lazy to work with a proper RDBMS.

I think the day has arrived when, just like DBAs can no longer ignore MySQL, we can no longer ignore Hadoop either. So let’s talk about it.

Read the rest of this entry . . .

Important Things I’ve Learned at Hotsos 2011

Hotsos is a blast. Easily the best technical Oracle conference. The speakers are terrific, the topics are cutting edge and the audience is experienced, intelligent and engaged.

I’ve been to quite a few conferences by now, and one of the things I noticed is that the best learning is rarely as organized as “I’ll go to this presentation about triggers and I’ll learn important things about triggers”. This works too, but often you learn more from chance comments, side conversations, or something a presenter says that causes you to think more deeply about some topic.

I’m documenting the best lessons I’ve learned, so you can learn too and so I won’t forget them.

Read the rest of this entry . . .
