Blog | Pythian

Distribution Cleanup job and xp_cmdshell

Written by Jon Stairs | Mar 24, 2008 4:00:00 AM

Troubleshooting SQL Server replication is often like being a digital detective—you see a cryptic error at the surface, but the real culprit is usually buried three layers deep in a permission setting. It sounds like you had a fun time chasing down those xp_cmdshell identities!

Here are the H2 and H3 headings for your blog post to make it scannable and organized, without changing a single word of your original content.

Troubleshooting the SQL Server 2005 Distribution Cleanup Job

Recently I encountered a SQL Server 2005 Replication setup where the following error message was appearing about every 5 minutes in the SQL Server error logs.

Replication-@rowcount_only parameter must be the value 0,1, or 2. 0=7.0 compatible checksum. 1=only check rowcou: agent distribution@rowcount_only parameter must be the value 0,1, or 2. 0=7.0 compatible checksum. 1=only scheduled for retry. Could not clean up the distribution transaction tables.

Identifying the Root Cause via Job History

The error message was coming from the failing distribution cleanup job. Examining the job history I was able to determine the real error message.

Executed as user: PROD\SQLAgent. Could not remove directory '\\dummy\Replication\unc\DUMMY_TRANSLOG_TRANSLOG\20080205230108\'. Check the security context of xp_cmdshell and close other processes that may be accessing the directory. [SQLSTATE 42000] (Error 20015). The step failed.

Understanding the Distribution Cleanup Mechanism

To provide some background information on the Distribution Cleanup Job. When a server is configured to be a Distributer a job is created titled ‘Distribution clean up: distribution’. The job has one step and it calls a stored proc called dbo.sp_MSdistribution_cleanup where it will clear out records from the MSrepl_commands and MSrepl_transactions tables. The cleanup procedure also deletes any expired snapshots stored in the Replication folder.

Debugging the Cleanup Failure

Step 1: Enabling xp_cmdshell

The first step in debugging was to confirm if xp_cmdshell was enabled. I opened SQL Server Surface Area Configuration and selected “Surface Area Configuration for Features”, selected the xp_cmdshell and checked the enable box. Xp_cmdshell is disabled by default in SQL Server 2005 for security reasons.

Step 2: Verifying SQLAgent Permissions

The second part of debugging was to make sure that the PROD\SQLAgent account had proper permissions to run xp_cmdshell statements. Only members of the sysadmin role has permission to create jobs with steps that run xp_cmdshell, if PROD\SQLAgent is not a member of sysadmin role, the job could be given permission to run xp_cmdshell by using a proxy account.

Step 3: Setting UNC and Share Permissions

The final part of debugging was to ensure that proper permissions were set to delete the specified folder. The stored procedure dbo.sp_MSdistribution uses the unc path, not the local path to the folder. I configured the SQL Server service account to have full control on the replication share. There is a distinction between the permissions set on the folder and the network share.

Identifying the Execution Identity

The reason I set the SQL Server service account instead of PROD\SQLAgent to have full control on the share is because when an xp_cmdshell command it runs on the os as the SQL Server service account not the user issuing the command.

To prove this I have run the following commands:

C:\>echo %USERNAME% pythian  C:\>osql -E 1> xp_cmdshell 'echo %USERNAME% > c:\whoami.txt' 2> go  output   -------------------------------------------------------------------------------          ------------------------------------------------------------------------          ------------------------------------------------------------------------          --------------------------------  NULL  (1 row affected) 1> exit  C:\>more whoami.txt NETWORK SERVICE 

As we can see in the previous test that the xp_cmdshell is run under the NETWORK SERVICE account instead of the pythian user which I was logged in as.

Resolution

After making these changes the distribution clean up job was able to run successfully and the errors stopped occurring in the Error logs.

SQL Server Database Consulting Services

Ready to optimize your SQL Server Database for the future?