Currently browsing NoSQL

Oracle Connects Big Data to Medium and Small Data

With the announcement of the Oracle Big Data Appliance, Oracle has also come up with a really cool technology stack called Oracle Big Data Connectors (OBDC). This software can be used with both the Oracle Big Data Appliance and other Apache Hadoop-based systems.

Read the rest of this entry . . .

Comparing Hadoop Appliances

Today Oracle announced that its Big Data Appliance is available. You can see the press release here.
The appliance was initially announced at Oracle OpenWorld in September. The appliance announced today is pretty similar to what Oracle presented at OpenWorld. The glaring difference is that the Hadoop shipped with the appliance will not be vanilla Apache Hadoop, but rather Cloudera’s Hadoop distribution, and it will include Cloudera’s administration software. You can read Oracle’s press release about the collaboration here. Alex Popescu blogged about the implications for Oracle and Cloudera.

At this time there are three Hadoop appliances on the market: Oracle’s Big Data Appliance, NetApp’s Hadooplers and EMC’s Greenplum DCA. It looks like a lot of companies that did not adopt Hadoop in 2011 are looking to do so in 2012, and some of them may be considering an appliance. I want to take a look at some of the reasons a company would be interested in a Hadoop appliance, and at the differences between the appliances.

Read the rest of this entry . . .

Hadoop and NoSQL Mythbusting

With all the buzz at OOW about the big data machine, there was also a lot of nonsense flying around. I love it that the Oracle community is finally interested in Hadoop and NoSQL, but I hate it when people sound authoritative without having an actual clue. I’ve left a few presentations with smoke coming out of my ears.

Here are a few things that people got all wrong:
Read the rest of this entry . . .

Oracle’s Big Data Machine – Details and Musings

Oracle announced the Big Data Appliance in the Monday morning keynote. Many people, me included, had long been waiting for this to happen. Others didn’t think it would ever happen. So naturally, there is a lot of buzz and excitement around the new device at OpenWorld. The keynote announcement was very short on details and certainly did not satisfy my technical curiosity. So I went to a few presentations to hear what exactly is included in the offering.
Read the rest of this entry . . .

Oracle Big Data Appliance — Oracle’s Bold Move Into Big Data Space

Oracle Big Data Appliance (BDA) is being announced at the Oracle OpenWorld keynote as I’m posting this. It will take some time for it to actually be available for shipment, and some details will likely change, but here is what we have so far about the Oracle Big Data Appliance.

A rack with InfiniBand, full of 2U servers similar to Exadata Storage. No flash storage is needed, so a couple of sockets and a dozen disks will do. Maybe more RAM than the Exadata storage cells themselves. I suspect you could have as many servers as you want in a configuration, but since Hadoop clusters usually have dozens of nodes or more, a full rack with about 20 Hadoop compute nodes seems a reasonable start. Real deployments should easily go into multiple racks stacked together.

Low latency, high bandwidth communication is critical for fast data loading and later data processing with Hadoop so InfiniBand will be there — same Exadata/Exalogic-like platform.

Oracle should also have its own NoSQL engine: Oracle NoSQL Database. If you know existing Oracle products, Berkeley DB seems to be a reasonable foundation to power Oracle’s new NoSQL engine.
Read the rest of this entry . . .

Trends and Data – Notes from Strata NYC 2011

I attended the Strata conference in NYC last week. It’s been many years since I last attended a conference without presenting at it. On one hand, attending only makes for a far more relaxed experience. On the other hand, I missed having random people come up to me and talk about my presentation. I decided to attend the conference since it is considered the foremost data science conference, and I was very much interested in what those data scientists are up to.

Good data scientists combine the abilities of business analysts, statisticians and software engineers. They have the skills, the tools and the mandate to mine and analyze all the data the organization collects to deliver valuable insights to the business and data-based features to the customers of the business. In addition, data scientist is considered the hottest job around. Of course, it is data scientists who mined job postings and job moves to come up with this conclusion, so maybe take it with a grain of salt.

Data scientists normally work with very large amounts of data, both structured (the enterprise data warehouse) and unstructured (web server logs, blog posts). Since I’m a big fan of big data, I was very curious to see what those data scientists care about.

So, in no particular order – stuff data scientists like:

Read the rest of this entry . . .

Hadoops Everywhere

We don’t pay enough attention to Hadoop.

By “we” I mean DBAs; the rest of the world is paying plenty of attention to Hadoop. Recently, I started asking my customers and fellow DBAs about Hadoop adoption in their companies. Turns out that many of them have Hadoop. Hadoop shows up in large companies and small ones, in established industries and in startups. It’s everywhere.

The way Hadoop shows up in all companies, and the way DBAs don’t pay Hadoop much attention, reminds me a lot of how MySQL started showing up in the enterprise. It didn’t start by DBAs showing up one morning and telling their managers:
“There’s this new open source database. It’s not as stable as Oracle and it doesn’t have all the features we need, but man, it’s going to save us tons of money, and it’s pretty simple to manage.”

Nope, this never happened. What happened instead is that developers learned about MySQL, and it seemed to them like an excellent way to go around this whole DBA thing. They could install it themselves, learn how to use it in a week and become happy and productive. Without ever having to discuss their schema, data model, requirements, capacity planning, availability, backups and all the other things that DBAs want to talk about.

By the time the application came out of development and had to be deployed in production, MySQL was a done deal. No one is going to rewrite the app just because the DBAs don’t know MySQL. Sometimes the Oracle DBAs were forced to learn and admin MySQL, but more often it was considered “not a database” and left for the sysadmins to manage, while the DBAs continued to pretend that the entire world is written by Oracle.

So that’s what Hadoop adoption looks like now: it’s usually introduced by the developers and administered by sysadmins, while DBAs continue to pretend it doesn’t exist or doesn’t matter. When pressed, some DBAs will even insist that all this “big data” thing can and should be done in a database, but the developers are too ignorant or lazy to work with a proper RDBMS.

I think the day has arrived when, just like DBAs can no longer ignore MySQL, we can no longer ignore Hadoop either. So let’s talk about it.

Read the rest of this entry . . .

Log Buffer #232, A Carnival of the Vanities for DBAs

These days products based on database technologies are being hatched at the speed of light. From giants like Oracle and Microsoft to the start-ups, there is an army of products which is growing by the week. It has become hard to stay abreast of all these technologies, but thanks to blogs, we get the latest and greatest news. This week’s edition, Log Buffer #232, lumps some interesting posts together.
Read the rest of this entry . . .

PgEast 11 The End Game

Well, the last busy day here in The Big Apple again brought a number of very good technical talks. It is not often that the developer of a key piece of a technology gives an intro talk, so I grabbed the chance when it came up. Robert Haas gave a very informative talk on the theory behind WAL (Write Ahead Logging) and how it is implemented in PostgreSQL as compared to other DBs. His talk never ventured into the nether world of techno-babble but gave just enough of the technical side to get the understanding across. In the second part of his talk Robert focused on an introduction to the ‘Buzz’ words of WAL that one might have to deal with. This was both very entertaining and armed one with a real understanding of WAL.
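For readers who have not met the idea before, here is a toy sketch of the write-ahead principle: log the change and force it to disk first, apply it second, and rebuild state from the log after a crash. It is an illustration only and not how PostgreSQL implements WAL; the class name `TinyWalStore`, the JSON-lines log format and the `demo` helper are all made up for this example.

```python
import json
import os
import tempfile


class TinyWalStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._replay()  # crash recovery: rebuild state from the log
        self.wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self):
        # Re-apply every logged change in order; later records win.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append the change to the log and force it to disk.
        self.wal.write(json.dumps({"key": key, "value": value}) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        # 2. Only then apply the change to the in-memory state.
        self.data[key] = value

    def close(self):
        self.wal.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "toy.wal")
    store = TinyWalStore(path)
    store.put("a", 1)
    store.put("a", 2)
    store.put("b", 3)
    store.close()  # pretend the process died here and memory was lost
    recovered = TinyWalStore(path)  # a fresh instance replays the log
    state = dict(recovered.data)
    recovered.close()
    return state
```

A real system also has to cope with half-written log records, checkpoints and log recycling, none of which this sketch attempts.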

I next sat in on ‘Little Jim’ Mlodgenski’s ‘Scaling with GridSQL’ talk. Another great technical talk that did not get bogged down in little details. Jim illustrated how GridSQL leverages the Power of Nodes to create a scalable parallel-query data warehouse: a controller splits off most of a large query to the different nodes in a cluster, takes the results from these nodes and then applies the final touches. Jim clearly demonstrated that with simple aggregation queries one sees linear gains in performance for each node added to the cluster. With more complex queries there was an exponential gain for the first few nodes, but one sees a fall-off after only 8. Jim was very open about the pitfalls of this form of scaling (e.g. backup can be problematic), but it is a very good solution for quick, scalable data warehousing.
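The controller pattern Jim described can be sketched in a few lines. This is a generic scatter-gather illustration, not GridSQL code: the function names and the sample data are assumptions, and threads stand in for the cluster nodes. Each "node" computes a partial SUM and COUNT over its own slice, and the controller applies the final touch by combining them into an average.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Work done on one node: SUM and COUNT over its local slice."""
    return sum(rows), len(rows)


def gridded_average(node_slices):
    """Controller: fan the query out, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(node_slices)) as pool:
        partials = list(pool.map(partial_aggregate, node_slices))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical data: one list of values per node in the cluster.
NODE_SLICES = [[10, 20, 30], [40, 50], [60]]
```

Note that the nodes return SUM and COUNT rather than their own averages; averaging the averages would give the wrong answer when the slices differ in size.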

The final talk of the conference, Jake Luciani’s comparison of Apache Cassandra to PostgreSQL, was a very good introduction to this rather novel NoSQL DB. Think of a ring of peer-to-peer hash tables that work together to scale, provide no single point of failure, automate replication and implement tunable consistency. Its basic concept is the opposite of the RDBMS: ‘Store Many! Read Once’, which makes some sense in situations such as large blogs, photo libraries or even diverse catalogs. Jake also introduced us to something he called CQL, a query language for the NoSQL DB.
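To make the "ring of hash tables" picture a bit more concrete, here is a small sketch of a hash ring with replication and a per-read consistency level. It is my own toy model, not Cassandra's partitioner or API: `ToyRing`, the MD5-based `token` function and the explicit `version` argument are all assumptions made for the illustration.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (an integer hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]
        self.storage = {n: {} for n in nodes}  # one hash table per peer

    def replicas(self, key):
        """Walk clockwise from the key's token and take the next N nodes."""
        start = bisect.bisect_right(self.tokens, token(key))
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def write(self, key, value, version, down=()):
        """Store on every reachable replica; return how many acknowledged."""
        acks = 0
        for node in self.replicas(key):
            if node not in down:
                self.storage[node][key] = (version, value)
                acks += 1
        return acks

    def read(self, key, consistency=1, down=()):
        """Ask `consistency` live replicas and return the newest value seen."""
        answers = []
        for node in self.replicas(key):
            if node in down:
                continue
            answers.append(self.storage[node].get(key, (0, None)))
            if len(answers) == consistency:
                break
        if len(answers) < consistency:
            raise RuntimeError("not enough live replicas")
        return max(answers, key=lambda a: a[0])[1]
```

In this model a read at consistency 1 can return a stale value from a replica that missed a write, while asking more replicas trades latency for a fresher answer. That trade-off is what "tunable" refers to.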

The conference ended with one of the better open forums I have attended. I am sure next year’s will be even better.

Hopefully I will be able to make it next year as well.

Day one at PGEast 11

I guess I brought the snow with me to New York, as I awoke to a nice 10cm dump. Anyway, today would best be described as a day of ‘Disruptive Tech’.

I first attended Kevin Kempter’s intro to PostgreSQL High Availability. A very well balanced presentation that gave a very good overview of what is available out of the box for both Warm Standbys and Hot Standbys, and how they can be very easily implemented. He also gave a quick overview of other tools that can be used, including Slony for detailed fail-overs and PgPool for load balancing and replication. Not very disruptive, but it does show that Pg is on par with most of the heavy hitters such as MySQL and Oracle.

The keynote this year was by Ed Boyajian, the CEO of EnterpriseDB, and he gave a big-picture view of the DB market, which is a whopping $26 billion a year in the US alone. The top five players have 90% of that market, with one having more than half.

He drew a comparison with his time at Red Hat, when there was a huge untapped market. Much the same situation exists today for PostgreSQL: it represents a ‘Disruptive player’ in the game, as it is the last open source DB out there. In other words, we can only grow in the future.

To continue on with my Disruptive theme, I also attended B. W. McAdams’ and Justin Dearing’s two talks on Mongo. Mongo is a truly disruptive technology, as it is a NON-SQL database. As an old-timer relational chap I was a little skeptical. It is hard to think of a DB without SQL, schemas, joins or triggers, but they made a good case for it. It is all a question of building the correct tool for the job. Traditional relational DBs were never intended to be used to create blog web sites, and as many of us have found out, they might not be the ‘right’ tool. Mongo with its ‘Document’ orientation solves many of the ‘Blog’ problems very elegantly. Mongo is not just for blogs: both speakers gave a number of examples of its application, for example in a quickie app that displays the nearest subway station to you and one that acts as the cache for a large PostgreSQL DB.
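To show what ‘Document’ orientation means for a blog, here is a rough sketch using plain Python dictionaries as stand-ins for documents. No MongoDB driver is involved, and the field names, sample posts and helper functions are all invented for the illustration. The point is only that a post carries its tags and comments with it, so nothing has to be joined back together.

```python
# Each blog post is one self-contained document. Documents in the same
# collection do not need identical fields: the second post has no comments.
POSTS = [
    {
        "title": "Day one at the conference",
        "tags": ["postgresql", "nosql"],
        "comments": [
            {"by": "reader1", "text": "Nice write-up"},
            {"by": "reader2", "text": "What about joins?"},
        ],
    },
    {
        "title": "Warm and hot standbys",
        "tags": ["postgresql"],
    },
]


def posts_with_tag(collection, tag):
    """Filter on a field inside the document; no tag join table needed."""
    return [p["title"] for p in collection if tag in p.get("tags", [])]


def add_comment(post, by, text):
    """Comments live inside the post, so adding one touches one document."""
    post.setdefault("comments", []).append({"by": by, "text": text})
    return len(post["comments"])
```

In a relational design the same data would typically be spread over posts, tags, post_tags and comments tables and reassembled with joins on every page view.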

I also had the pleasure of hearing a first-time speaker, Vanessa Hurst, who presented on the topic of ORMs (Object Relational Mappers) and the problems they cause for DBs. It was good to hear some of these issues, and she made the very good point that it is always a compromise between speed to market and long-term goals. You might get an ORM DB out in two months, but one year from now your DB may not work anymore because of single object files, lack of planning for scalability or just poor design that was forced upon the team by the ORM.

Well, off to enjoy ‘Le Comte Ory’ at the Met tonight.

Cheers
