Distribution Cleanup job and xp_cmdshell
Troubleshooting SQL Server replication is often like being a digital detective—you see a cryptic error at the surface, but the real culprit is usually buried three layers deep in a permission setting. It sounds like you had a fun time chasing down those xp_cmdshell identities!
Here are the H2 and H3 headings for your blog post to make it scannable and organized, without changing a single word of your original content.
Troubleshooting the SQL Server 2005 Distribution Cleanup Job
Recently I encountered a SQL Server 2005 Replication setup where the following error message was appearing about every 5 minutes in the SQL Server error logs.
Replication-@rowcount_only parameter must be the value 0,1, or 2. 0=7.0 compatible checksum. 1=only check rowcou: agent distribution@rowcount_only parameter must be the value 0,1, or 2. 0=7.0 compatible checksum. 1=only scheduled for retry. Could not clean up the distribution transaction tables.
Identifying the Root Cause via Job History
The error message was coming from the failing distribution cleanup job. Examining the job history I was able to determine the real error message.
Executed as user: PROD\SQLAgent. Could not remove directory '\\dummy\Replication\unc\DUMMY_TRANSLOG_TRANSLOG\20080205230108\'. Check the security context of xp_cmdshell and close other processes that may be accessing the directory. [SQLSTATE 42000] (Error 20015). The step failed.
Understanding the Distribution Cleanup Mechanism
To provide some background information on the Distribution Cleanup Job. When a server is configured to be a Distributer a job is created titled ‘Distribution clean up: distribution’. The job has one step and it calls a stored proc called dbo.sp_MSdistribution_cleanup where it will clear out records from the MSrepl_commands and MSrepl_transactions tables. The cleanup procedure also deletes any expired snapshots stored in the Replication folder.
Debugging the Cleanup Failure
Step 1: Enabling xp_cmdshell
The first step in debugging was to confirm if xp_cmdshell was enabled. I opened SQL Server Surface Area Configuration and selected “Surface Area Configuration for Features”, selected the xp_cmdshell and checked the enable box. Xp_cmdshell is disabled by default in SQL Server 2005 for security reasons.
Step 2: Verifying SQLAgent Permissions
The second part of debugging was to make sure that the PROD\SQLAgent account had proper permissions to run xp_cmdshell statements. Only members of the sysadmin role has permission to create jobs with steps that run xp_cmdshell, if PROD\SQLAgent is not a member of sysadmin role, the job could be given permission to run xp_cmdshell by using a proxy account.
Step 3: Setting UNC and Share Permissions
The final part of debugging was to ensure that proper permissions were set to delete the specified folder. The stored procedure dbo.sp_MSdistribution uses the unc path, not the local path to the folder. I configured the SQL Server service account to have full control on the replication share. There is a distinction between the permissions set on the folder and the network share.
Identifying the Execution Identity
The reason I set the SQL Server service account instead of PROD\SQLAgent to have full control on the share is because when an xp_cmdshell command it runs on the os as the SQL Server service account not the user issuing the command.
To prove this I have run the following commands:
C:\>echo %USERNAME% pythian C:\>osql -E 1> xp_cmdshell 'echo %USERNAME% > c:\whoami.txt' 2> go output ------------------------------------------------------------------------------- ------------------------------------------------------------------------ ------------------------------------------------------------------------ -------------------------------- NULL (1 row affected) 1> exit C:\>more whoami.txt NETWORK SERVICE
As we can see in the previous test that the xp_cmdshell is run under the NETWORK SERVICE account instead of the pythian user which I was logged in as.
Resolution
After making these changes the distribution clean up job was able to run successfully and the errors stopped occurring in the Error logs.
SQL Server Database Consulting Services
Ready to optimize your SQL Server Database for the future?
Share this
Share this
More resources
Learn more about Pythian by reading the following blogs and articles.
Migrate a SQL Server environment with complex replication without reinitializing or rebuilding replication
SQL server: Create missing indexes with unmessing names
SQL server error: 18056, Severity: 20, State: 29
Ready to unlock value from your data?
With Pythian, you can accomplish your data transformation goals and more.