Easier SQL Server Database Restores

Have you ever been asked to restore a database and wondered which backup files were available? And if many were available, which ones you should use, and in which order?

Getting familiar with the msdb schema, especially the backupset and backupmediafamily tables, helps answer that question, since all backup history is stored in that database. But you would still have to check whether the backups are still available on disk (or tape), work out the restore commands, and determine the order in which to restore each file.
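To make the ordering question concrete, here is a minimal sketch in Python. It is not the script from the original post. It assumes the history rows have already been fetched with something like the query in `HISTORY_SQL`, and the names `Backup`, `restore_chain`, and `restore_commands` are made up for illustration. It orders backups by finish time only, which is a simplification: a careful tool would also compare LSNs, skip copy-only backups, and handle backups striped across several files.

```python
from dataclasses import dataclass
from datetime import datetime

# Backup history lives in msdb; the two tables join on media_set_id.
# The "?" placeholder style depends on the database driver you use.
HISTORY_SQL = """
SELECT bs.type, bs.backup_finish_date, bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
  ON bmf.media_set_id = bs.media_set_id
WHERE bs.database_name = ?
ORDER BY bs.backup_finish_date
"""


@dataclass
class Backup:
    kind: str           # backupset.type: 'D' = full, 'I' = differential, 'L' = log
    finished: datetime  # backupset.backup_finish_date
    path: str           # backupmediafamily.physical_device_name


def restore_chain(backups):
    """Latest full, then the latest differential after it, then later logs."""
    ordered = sorted(backups, key=lambda b: b.finished)
    fulls = [b for b in ordered if b.kind == "D"]
    if not fulls:
        raise ValueError("no full backup in history")
    full = fulls[-1]
    diffs = [b for b in ordered if b.kind == "I" and b.finished > full.finished]
    base = diffs[-1] if diffs else full
    logs = [b for b in ordered if b.kind == "L" and b.finished > base.finished]
    return [full] + diffs[-1:] + logs


def restore_commands(database, chain):
    """Build RESTORE statements; only the last one brings the database online."""
    commands = []
    for position, backup in enumerate(chain):
        verb = "LOG" if backup.kind == "L" else "DATABASE"
        is_last = position == len(chain) - 1
        recovery = "RECOVERY" if is_last else "NORECOVERY"
        commands.append(
            f"RESTORE {verb} [{database}] FROM DISK = N'{backup.path}' WITH {recovery};"
        )
    return commands
```

Checking that each file in the chain still exists on disk before generating the commands would be a natural next step.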

Okay, this is not a difficult thing to do, but when you do it over and over again, it becomes tedious, and the automation bug in you starts to look for a better and quicker way to handle it. The bug in me found the following answer.

Read the rest of this entry . . .

Liveblogging: Automated System Management

Usenix 2008 – Automated System Management, by Æleen Frisch of Exponential Consulting (and author of numerous books)

What is automation?

Generic scripts with cron, at

Problem: overlap of effort

So folks developed automation systems. General automation tools are around:

cfengine, Puppet, bcfg2

These are general-purpose: they manage files, directories, etc., so you don’t need to call chmod, chown, and the other underlying commands yourself.

However, they don’t really survive reboots well. For that, we tend to use tools more along the lines of JumpStart and Kickstart.

Monitoring with Nagios; related tools are the RRDtool-based ones such as Cacti, Cricket, Munin, “or any of 8,000 others.” Automating ideas like iostat.

Nessus is a security testing tool.

Homegrown, general, and performance-related tools; also automated backups: Bacula, Amanda, Legato.

What do you want automated?

“Coffee machines”.

A lot of unsolved problems are human interaction.

Other problems solved — using remote power management.

Inventory management is another issue. HP OpenView is one, but Frisch says folks are not happy with it. You can pay for high-end monitoring systems.

A question came up about an inventory of users on systems. LDAP or NIS or Active Directory is the traditional solution where there are no local accounts. There’s authentication and then authorization, and the automated tools usually have authentication information but not authorization information. (You can handle it, but making groups on these tools is usually painful.) Authorization is usually handled either locally or as “if you’re authenticated you’re authorized”.

We talked about how to power down 500 machines when the air conditioning goes out, or when the power is going down. Combinations of temperature probes, “wake-on-lan”, remote power on and off were discussed.
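Of the pieces mentioned, wake-on-LAN is the easiest to show in code. The sketch below builds and sends the standard "magic packet" (six 0xFF bytes followed by the target MAC address repeated 16 times). It covers only the power-on half of the problem; shutting machines down would go through remote power management or a remote shell, which is not shown. The function names, the broadcast address, and UDP port 9 are assumptions chosen for illustration, not anything prescribed in the session.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 12 hex digits in MAC address, got {mac!r}")
    # bytes.fromhex raises ValueError if the string is not valid hex.
    return b"\xff" * 6 + bytes.fromhex(hex_digits) * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

Bringing 500 machines back up would then be a loop over an inventory of MAC addresses, ideally staggered so the power draw does not spike all at once.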

What do people use to automate installs and configuration on Windows? For installation, the Windows native tools are great. It was noted that efs works better on Windows.

Anyone using Splunk with Windows? One answer — it works OK, there are some daemon tools to convert Windows Event Log to syslog.

Splunk came up as a topic of discussion: it’s great log management software and solves a problem we’ve had for decades, namely how to deal with logs. Frisch says, “Splunk is the most promising thing out there.”

Record keeping of time was brought up, as well as time management. This is basically what we do at Pythian, so I explained how we do things. Other folks brought up ticketing systems, including Jira, RT (Request Tracker), and OTRS (Open Ticket Request System).

For change management, some folks use ClearCase (not open source), others use RANCID, and others use Trac or Bugzilla plus a version control system like Subversion. Jira was recommended as a product that does both (with an add-on).

Use DHCP to help automate IP assigning. rsync is your friend too.

(It occurs to me that a dishwasher is an interesting problem; why do we have a dishwasher instead of just having a sink/dishwasher hybrid? Similarly, a hamper that does laundry for you when it’s full.)

Automating To Save Time

This is the first place I am announcing this: The Pythian Group has made me a Team Lead. I am extremely honored and somewhat humbled by this, and I am determined to do a good job. I started officially on Monday, March 3rd, and my first week went pretty well.

On Saturday, I spent a short bit of time automating one process. And while I was waiting for a 300G backup to copy from one machine to another, I worked on automating more.

Currently I have one somewhat junior DBA working for me, and I am getting another DBA tomorrow. But yesterday, I put in more than a half day of work. Why? Well, I was automating more to make the life of my team members easier.

As part of the DBA service, we offer monitoring in the form of alerting, and also in the form of daily checks to ensure everything is running smoothly. Daily checks consist of things that do not need to be checked every minute, but should be checked frequently. For instance, one of our daily checks is to ensure that the running database configuration matches the config file (i.e., my.cnf or my.ini). This is very valuable to ensure that no changes get lost. A DBA might be adjusting the configuration, but forget to put the final changes in the config file. In that case, the next day our daily checks will throw a warning, and that DBA will say “oh yeah, I forgot to put that into the config file!”
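The idea behind that check can be sketched in a few lines of Python. This is not Pythian's actual check, just an illustration under stated assumptions: the `running` dictionary stands in for the output of `SHOW GLOBAL VARIABLES`, the function names are hypothetical, and values are compared as plain strings (so `64M` in the file would not match `67108864` on the server). Files that use `!include` directives would also need extra handling.

```python
import configparser


def normalize(name: str) -> str:
    """my.cnf accepts max-connections or max_connections; the server reports underscores."""
    return name.strip().lower().replace("-", "_")


def config_file_settings(text: str, section: str = "mysqld") -> dict:
    """Parse one section of a my.cnf-style file into {name: value or None}."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # bare flags such as skip-bdb have no value
        interpolation=None,   # treat % literally
        strict=False,         # tolerate repeated options
    )
    parser.read_string(text)
    if not parser.has_section(section):
        return {}
    return {normalize(key): value for key, value in parser.items(section)}


def drift(file_settings: dict, running: dict) -> dict:
    """Return {name: (file_value, running_value)} for settings that disagree."""
    differences = {}
    for name, wanted in file_settings.items():
        if wanted is None:
            continue  # bare flags need option-specific handling, not a string compare
        actual = running.get(name)
        if actual is None or str(actual) != str(wanted):
            differences[name] = (wanted, actual)
    return differences
```

An empty result from `drift` corresponds to "nothing to look at"; anything else is a candidate warning for the next day's review.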

As many monitoring systems do, our system has false positives. Though we do not normally “do the dailies” on weekends, I spent some time Saturday with them. I took the checks that were producing false positives and fixed them so they no longer show errors or warnings in those cases. For instance, many of our machines complained that the have-bdb parameter was set to DISABLED in the database but set to YES in the default file, because we used skip-bdb in /etc/my.cnf. We did not set something like have-bdb=0, so the check complained. I fixed that problem, and a few others.

The result is that I went from 26% of my team’s machines reporting “daily checks are OK, nothing to look at” to 46% of my team’s machines reporting that. This means that my team can be more productive and spend time on the real errors, instead of clicking and taking the time to realize “oh, yeah, that, it’s just a false positive.”