Open Source

fsfreeze in Linux

The fsfreeze command is used to suspend and resume access to a filesystem. This allows consistent snapshots of the filesystem to be taken. fsfreeze supports Ext3/4, ReiserFS, JFS and XFS. A filesystem can be frozen using the following command: # /sbin/fsfreeze -f /data Now if you are writing to this filesystem, the process/command will be…

Using strace to debug application errors in Linux

strace is a very useful tool that traces system calls and signals for a running process. This helps a lot when debugging application-level performance issues and bugs. The aim of this post is to demonstrate the power of strace in pinning down an application bug. I came across an issue in which Nagios was sending…

Troubleshooting a Multipath Issue

Multipathing lets you configure multiple paths from servers to storage arrays. It provides I/O failover and load balancing. Linux uses the device mapper kernel framework to support multipathing. In this post I will explain the steps taken to troubleshoot a multipath issue. This should provide a glimpse into the tools and technology involved. The problem was reported…

Calculating Business Days in HiveQL

One of the common tasks in data processing is to calculate the number of days between two given dates. You can easily achieve this using Hive's DATEDIFF function. You can also get the weekday number by using this more obscure function: SELECT FROM_UNIXTIME(your_date,'u') FROM some_table; This will return 1 for Monday, 7 for Sunday and…

PGBR 2013 Porto Velho insights

It’s been 4 years since I last spoke at a Brazilian Postgres community event (the last time was at PGCon 2009 in São Paulo, on the PyReplica project), and it seems that the community is still growing and vibrant. The talks given at the meeting were amazing. PalominoDB was there with two talks: “Postgres and…

Benchmarking Postgres on AWS 4,000 PIOPs EBS instances

Introduction: Disk I/O is frequently the performance bottleneck with relational databases. With AWS recently releasing 4,000 PIOPs EBS volumes, I wanted to do some benchmarking with pgbench and PostgreSQL 9.2. Prior to this release the maximum available I/O capacity was 2,000 IOPs per volume. EBS IOPs are read and written in 16 KB chunks with their…

Exploring Configuration Management with Ansible

What is Ansible? Ansible is a configuration management and deployment system, like Puppet, Capistrano, Fabric, and Chef. Its aim is to be radically simple and let you use your existing scripts to help with cluster configuration and software deployment whenever possible. Here are the ways that Ansible differentiates itself. Simplicity: Ansible does not include a client/server…

The Postgres-XC 1.0 beta release overview

When I heard about this project a year ago, I was really excited about it. Many cluster-wide projects based on Postgres were developed very slowly, based on older (e.g. Postgres-R http://www.postgres-r.org/) or proprietary (e.g. Greenplum) versions. The features that this project hoped to achieve were ambitious, as we’ll detail in this post. And best of…

New versions of PgPool released – 3.1.3 & 3.0.7

This essential tool for Postgres architectures is continually improving, and is now available in its new releases. Both are bugfix versions. For those unfamiliar with the tool, it is middleware that functions as a load balancer, pooler and/or replication system for PostgreSQL databases. The 3.1.x versions are compatible with Postgres 9.x, whose streaming replication…

Exploring a new feature of 9.2: Index-only scans

We, like other Postgres DBAs worldwide, have been waiting for the 9.2 release for some time, specifically for the index-only scan feature, which will help reduce I/O by preventing unnecessary access to heap data if you only need data from the index. Although 9.2 is still in development, it is possible to download a version…
