How to run OpenTSDB with Google Bigtable
In a previous post (OpenTSDB and Google Cloud Bigtable) we discussed OpenTSDB, an open source distributed database designed specifically for storing time series data. We also explained how OpenTSDB relies on Apache HBase for a reliable and scalable data backend. However, deploying and administering an HBase cluster is not a trivial task, as it requires a full Hadoop setup. It takes a big data engineer (or, better, a team of them) to plan the cluster sizing, provision the machines, set up the Hadoop nodes, configure all the services and tune them for optimal performance. As if that were not enough, operations teams have to constantly monitor the cluster, deal with hardware and service failures, perform upgrades, back up regularly, and handle a ton of other tasks that make maintaining a Hadoop cluster and OpenTSDB a challenge for most organizations.
With the release of Google Bigtable as a cloud service and its support for the HBase API, it was obvious that integrating OpenTSDB with Google Bigtable would give more teams access to the powerful functionality of OpenTSDB by removing the burden of maintaining an HBase cluster.
Nevertheless, integrating OpenTSDB with Bigtable was not as seamless as dropping a few jars into its release directory. The reason is that the OpenTSDB developers went above and beyond the standard HBase libraries by implementing their very own asynchbase library. Asynchbase is a fully asynchronous, non-blocking, thread-safe, high-performance HBase API. No one can put it better than the asynchbase developers themselves, who state: ‘This HBase client differs significantly from HBase's client. Switching to it is not easy as it requires one to rewrite all the code that was interacting with any HBase API.’ This meant that integration with Google Bigtable required OpenTSDB to switch back to the standard HBase API.
We saw the value of such an effort here at Pythian and set about developing this solution.
The asyncbigtable library
Today, we are very happy to announce the release of the asyncbigtable library. The asyncbigtable library is a 100% compatible implementation of the great asynchbase library that can be used as a drop-in replacement, enabling OpenTSDB to use Google Bigtable as a storage backend. Thanks to support from the OpenTSDB team, the asyncbigtable code is hosted in the OpenTSDB GitHub repository.
Challenges
To create asyncbigtable we had to overcome two major challenges. The first was that OpenTSDB assumes that the underlying library (until now asynchbase) performs asynchronous, non-blocking operations, whereas the standard HBase API only supports synchronous, blocking calls. As a workaround, we used the BufferedMutator implementation, which collects Mutation operations in a buffer and performs them in batches, so that individual mutation calls return with very low latency.
The second challenge stemmed from the fact that the OpenTSDB project has a very limited set of jar dependencies, which are explicitly defined in Makefiles. Contrary to this spartan approach, the HBase and Bigtable client libraries have a significant number of transitive dependencies. Since adding those dependencies one by one to the OpenTSDB build process would complicate its dependency management, we decided to package all asyncbigtable dependencies in an uber-jar using the Maven assembly plugin. As a result, building OpenTSDB with asyncbigtable support is now as simple as downloading a single beefy jar.
Before you start
Before you build OpenTSDB with Google Bigtable support, you must complete the following steps:
- Create a Google Bigtable cluster (https://cloud.google.com/bigtable/docs/creating-cluster)
- Install HBase shell with access to the Google Bigtable cluster (https://cloud.google.com/bigtable/docs/installing-hbase-shell)
- Download and install the required tools for compiling OpenTSDB from source (https://opentsdb.net/docs/build/html/installation.html#compiling-from-source)
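If you want a quick sanity check before building, a small helper like the one below can report which build tools are missing from your PATH. This is only a sketch: the function names are made up for illustration, and the tool list is our assumption based on the installation page linked above, so adjust it to match your environment.

```shell
# Print the name of every command in the argument list that is not on PATH.
# (Hypothetical helper, not part of OpenTSDB.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check an assumed list of build prerequisites; returns non-zero if any is missing.
check_build_tools() {
    missing=$(missing_tools git java javac make autoconf automake python gnuplot)
    if [ -n "$missing" ]; then
        echo "Missing tools:" >&2
        printf '  %s\n' $missing >&2
        return 1
    fi
    echo "All assumed build tools were found."
}
```

Calling `check_build_tools` in your shell prints either a confirmation or the list of commands it could not find.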
Build and run OpenTSDB
- Clone and build the modified source code from the Pythian GitHub repository:
    git clone -b bigtable git@github.com:pythian/opentsdb.git
    cd opentsdb
    sh build-bigtable.sh
- Create OpenTSDB tables
OpenTSDB provides a script that uses HBase shell to create its tables. To create the tables run the following command:
    env COMPRESSION=NONE HBASE_HOME=/path/to/hbase-1.1.2 \
        ./src/create_table.sh
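To confirm that the tables were created, you can pipe the output of the HBase shell `list` command through a small filter such as the one sketched here. It assumes the script's default table names (tsdb, tsdb-uid, tsdb-tree and tsdb-meta, as far as we know) and that the shell prints each table name on its own line. The function name is hypothetical.

```shell
# Read HBase shell `list` output on stdin and print every expected table name
# that does not appear on a line of its own. Table names are assumed defaults.
missing_tables() {
    # Strip trailing whitespace so padded lines still match exactly.
    listing=$(sed 's/[[:space:]]*$//')
    for table in tsdb tsdb-uid tsdb-tree tsdb-meta; do
        printf '%s\n' "$listing" | grep -qxF "$table" || printf '%s\n' "$table"
    done
}
```

One possible way to use it is `echo list | /path/to/hbase-1.1.2/bin/hbase shell | missing_tables`. Empty output would mean that all four assumed tables were found.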
- Run OpenTSDB
    export HBASE_CONF=/path/to/hbase-1.1.2/conf
    mkdir -p <tmp_dir>
    ./build/tsdb tsd --port=4242 --staticroot=build/staticroot \
        --cachedir=<tmp_dir>
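Once the TSD is up, a quick way to check that writes actually reach Bigtable is to push one data point through the OpenTSDB HTTP API and read it back. The sketch below uses the `/api/put` and `/api/query` endpoints. The function names, the metric name `smoke.test`, the `host=test` tag and the `localhost` address are our assumptions for illustration; the port matches the `--port=4242` flag above. Unless metric auto-creation is enabled in your configuration, you will probably need to register the metric first, for example with `./build/tsdb mkmetric smoke.test`.

```shell
# Build the JSON body for a single data point.
# Arguments: metric, epoch seconds, numeric value, tag key, tag value.
tsdb_put_payload() {
    printf '{"metric":"%s","timestamp":%s,"value":%s,"tags":{"%s":"%s"}}\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Write one point to the TSD and query it back.
# The base URL defaults to an assumed local TSD on port 4242.
tsdb_smoke_test() {
    base_url=${1:-http://localhost:4242}
    now=$(date +%s)
    tsdb_put_payload smoke.test "$now" 1 host test |
        curl -sf -X POST -H 'Content-Type: application/json' \
             --data @- "$base_url/api/put" &&
    curl -sf "$base_url/api/query?start=5m-ago&m=sum:smoke.test"
}
```

Running `tsdb_smoke_test` should print a JSON document containing the point you just wrote. An error from either `curl` call suggests that the TSD is not reachable or that the write was rejected.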