Merge Replication Identity Range Management – An Identity Disorder? Part 1

Jan 24, 2014 / By Mohammed Mawla

Tags: , , , ,

Those who work with Merge replication know replicating articles with identity columns need some planning, especially if the identity column is part of the primary key or if there are many subscribers. Merge replication provides automatic identity range management across replication nodes that can offset the overhead of managing them manually. This blog will briefly shed some light on identity range management and how it works. In part 2, we will explore a specific case when batch inserts can affect the identity range management.

Why Identity Columns ?

Identity columns are very popular (and even touted more by DBA’s) to use as Primary Keys other than GUIDs for multiple reasons, including :

  • They are smaller in size ; GUIDs are four times larger than INT
  • Indexes over GUIDs tend to be larger. If the GUID is clustered index then it will affected every NON-Clustered index as well. Check out those two excellent blogs about GuIDs as PK and as Clustered indexes. http://www.sqlskills.com/blogs/kimberly/category/guids/
  • Because they are random, indexes fragmentation are common on indexes over GUIDs. You can reduce this by using NEWSEQUENTIALID() function but it has a security disadvantage http://technet.microsoft.com/en-us/library/ms189786.aspx

Here’s a very good blog about GUIDs comparing them to INT/BIGINT http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/guid-vs-int-debate.aspx



How Merge Replication Manages Identity Ranges:

So you have a few articles with identity columns that you want to replicate between a publisher and few subscriptions in a merge replication setup so the subscribers will also be able to update data back. Each “node” in the replication topology should be assigned a range which the user processes will use and then replicate to other nodes. It is important, especially if identity column is part of primary key, that no duplicates happen.

Let’s say we want to publish one article with identity column as a primary key to two subscribers. If we do not assign the correct ranges to each “node” here, then it’s possible that user processes will create same identity values. Once the merge agent tries to synchronize, it will insert duplicate values to the tables from other nodes and will fail if identity column is part of a primary key.

If we designate that each “node” gets isolated range of identity values, then user application will write values that cannot be used elsewhere. This is simply what Automatic Identity range does!

- On the publisher , the identity pool is used to assign values when adding new articles. There are two kinds of pools (ranges)

  1. Pool assigned to publisher and subscribers that will be used directly to insert data in these subscribers.

    This is controlled through @identity_range parameter of sp_addmergearticle

  2. If any of the subscribers are SERVER subscriptions , which can possibly republish data , they are assigned a pool (range) so that they can use their own subscribers. Usually this pool is larger than the pool in point one because it can serve multiple subscribers. This is controlled through @pub_identity_range parameter of sp_addmergearticle
Article

Example: If we configure the publisher to have range from 1-1000, Subscriber 1 to have 1100-2100 and subscriber 2 to have range from 2200-3200, then merge replication should NOT insert any duplicate values at any nodes. Kindly note that Merge agent processes do NOT increase the identity values when synchronizing , they just insert the plain values,  like the case when we use “set identity_insert ON”!

How the Ranges are Enforced

Through the usage of constraints on the identity column. Starting with SQL server 2005, each node is assigned two ranges, primary and secondary. Both primary and secondary ranges are equal in size and once the primary range is exhausted, the secondary one is used and merge agent assigns a new range to subscriber (becoming the secondary range); this is when automatic range management is chosen when the article is added (please see @identityrangemanagementoption parameter of sp_addmergearticle).

If you need more documentation, please refer to the following links:
http://blogs.msdn.com/b/suhde/archive/2010/03/21/inf-automatic-identity-range-management-with-merge-replication-in-microsoft-sql-server-2005-2008.aspx

Now some hands on!

A Simple Tutorial

Let us demonstrate a very simple merge replication topology with a publisher and two subscriptions. I am doing this on the same instance of my machine. The instance has to have SQL server agent to run job so STD, DEV or ENT edition is needed.

Let’s create three user databases and one table under the database that will be published:

Use Master
GO
Create database Pub;
GO
Create database sub1;
GO
Create database sub2;
GO
create table pub.dbo.tbl_pub (col1 int identity(1,1), col2 datetime)
GO

Enable distributor feature and create distribution database:

use master
GO
Declare @instance nvarchar(1000)

Set @instance = @@servername;

exec sp_adddistributor @distributor = @instance, @password = N''
exec sp_adddistributiondb @database = N'distribution', @log_file_size = 2, @min_distretention = 0, @max_distretention = 72, @history_retention = 48, @security_mode = 1
exec sp_adddistpublisher @publisher = @instance , @distribution_db = N'distribution', @security_mode = 1,@trusted = N'false', @thirdparty_flag = 0, @publisher_type = N'MSSQLSERVER'
GO

Create publication and add article:

use [Pub]
exec sp_replicationdboption @dbname = N'Pub', @optname = N'merge publish', @value = N'true'
GO
-- Adding the merge publication
use [Pub]
exec sp_addmergepublication @publication = N'Pub1',
@sync_mode = N'native', @retention = 14, @allow_push = N'true', @allow_pull = N'true', @allow_anonymous = N'false', @publication_compatibility_level = N'100RTM', @replicate_ddl = 1
GO
exec sp_addpublication_snapshot @publication = N'Pub1', @frequency_type = 1, @frequency_interval = 0, @frequency_relative_interval = 0,@publisher_security_mode = 1

use [Pub]
exec sp_addmergearticle @publication = N'Pub1',
@article = N'tbl_pub', @source_owner = N'dbo', @source_object = N'tbl_pub', @type = N'table', @pre_creation_cmd = N'drop', @schema_option = 0x000000010C034FD1, @identityrangemanagementoption = N'auto', @pub_identity_range = 1000, @identity_range = 100, @threshold = 80, @destination_owner = N'dbo', @force_reinit_subscription = 1, @column_tracking = N'false',  @allow_interactive_resolver = N'false'
GO

The important parameters here are @pub_identity_range = 1000, @identity_range = 100
@identity_range is what is allocated to the publisher and subscribers (i.e: this is the range that controls the values inserted on the publisher and subscribers as well).

@pub_identity_range : has a role when the subscribers also re-publish data because those republishing subscribers will synchronize data with their own subscribers and eventually will synchronize with the original publisher. It’s not a common practice but can happen , we won’t talk about it here.

Let’s create the subscriptions and run the snapshot:

use [Pub]
exec sp_addmergesubscription @publication = N'Pub1', @subscriber = N'MYINSTANCE',
@subscriber_db = N'sub1', @subscription_type = N'Push', @sync_type = N'Automatic',
@subscriber_type = N'Global', @subscription_priority = 75, @use_interactive_resolver = N'False'

Declare @instance nvarchar(1000)
Set @instance = @@servername;

exec sp_addmergepushsubscription_agent @publication = N'Pub1', @subscriber =@instance,
@subscriber_db = N'sub1', @subscriber_security_mode = 1, @publisher_security_mode = 1;
GO

exec sp_addmergesubscription @publication = N'Pub1', @subscriber = N'MYINSTANCE',
@subscriber_db = N'sub2', @subscription_type = N'Push', @sync_type = N'Automatic',
@subscriber_type = N'Global', @subscription_priority = 75, @use_interactive_resolver = N'False'

Declare @instance nvarchar(1000)
Set @instance = @@servername;

exec sp_addmergepushsubscription_agent @publication = N'Pub1', @subscriber =@instance,
@subscriber_db = N'sub2', @subscriber_security_mode = 1, @publisher_security_mode = 1;
GO

-- Start snapshot agent

Exec pub..sp_startpublication_snapshot 'pub1'
GO
Waitfor delay '00:00:15'
GO

Get snapshot agent output:

Exec sp_MSenum_replication_agents @type = 1
GO

dbname name status publisher publisher_db publication start_time time duration comments
distribution MYINSTANCE-Pub-Pub1-2 2 MYINSTANCE Pub Pub1 20140118 18:05:30.750 20140118 18:05:40.523 10 [100%] A snapshot of 1 article(s) was generated.

Start Merge agent for each subscription to push initial snapshot:
– job_name = —-

Exec msdb..sp_start_job @job_name='MYINSTANCE-Pub-Pub1-MYINSTANCE-3';
GO
Exec msdb..sp_start_job @job_name='MYINSTANCE-Pub-Pub1-MYINSTANCE-4';
GO

Waitfor delay '00:00:15'
-- Get output of the merge agent
use [master]
exec [distribution].sys.sp_replmonitorhelpmergesession @agent_name = N'MYINSTANCE-Pub-Pub1-MYINSTANCE-3', @hours = -1, @session_type = 1
GO
Session_id Status StartTime EndTime Duration UploadedCommands DownloadedCommands ErrorMessages PercentageDone LastMessage
3 2 1/18/2014 18:14:42 1/18/2014 18:14:54 13 0 0 0 100 Applied the snapshot and merged 0 data change(s) (0 insert(s),0 update(s), 0 delete(s), 0 conflict(s)).

This is where Merge replication metadata tracks the history allocated ranges:
http://technet.microsoft.com/en-us/library/ms182607.aspx

select
publication,
article,
subscriber,
subscriber_db,
range_begin,
range_end,
is_pub_range,
next_range_begin,
next_range_end
from
Distribution..MSmerge_identity_range_allocations
order by range_begin
GO
publication article subscriber subscriber_db range_begin range_end is_pub_range next_range_begin next_range_end
Pub1 tbl_pub MYINSTANCE Pub 1 101 0 101 201
Pub1 tbl_pub MYINSTANCE sub2 201 301 0 301 401
Pub1 tbl_pub MYINSTANCE sub1 401 501 0 501 601
Pub1 tbl_pub MYINSTANCE sub2 601 1601 1 1601 2601
Pub1 tbl_pub MYINSTANCE sub1 2601 3601 1 3601 4601

is_pub_range=1 is reserved for the subscribers in case they will re-publish the data.

Let’s check the schema on each table of the replication nodes:


select 'Publisher' [Node], name, definition from pub.sys.check_constraints where type = 'C' and parent_object_id =
(select object_id from pub.sys.objects where name ='tbl_pub')
union all
select 'Sub1', name, definition from Sub1.sys.check_constraints where type = 'C' and parent_object_id =
(select object_id from Sub1.sys.objects where name ='tbl_pub')
union all
select 'Sub2', name, definition from Sub2.sys.check_constraints where type = 'C' and parent_object_id =
(select object_id from Sub2.sys.objects where name ='tbl_pub')
GO
Node name definition
Publisher repl_identity_range_21A0E50F_D0F6_4E43_9FE1_349C1210247F ([col1]>=(1) AND [col1](101) AND [col1]<=(201))
Sub1 repl_identity_range_21A0E50F_D0F6_4E43_9FE1_349C1210247F ([col1]>(401) AND [col1](501) AND [col1]<=(601))
Sub2 repl_identity_range_21A0E50F_D0F6_4E43_9FE1_349C1210247F ([col1]>(201) AND [col1](301) AND [col1]<=(401))

Here we can see that each node has a primary and secondary range enforced through the constrainers.

Let’s add some data!


Insert into pub.dbo.tbl_pub (col2) select getdate();
Go 25
Insert into sub1.dbo.tbl_pub (col2) select getdate();
Go 25
Insert into sub2.dbo.tbl_pub (col2) select getdate();
Go 25

Start sync:


Exec msdb..sp_start_job @job_name='MYINSTANCE-Pub-Pub1-MYINSTANCE-3';
GO
Exec msdb..sp_start_job @job_name='MYINSTANCE-Pub-Pub1-MYINSTANCE-4';
GO

Count rows in each table, should be 75 (3 * 25 rows)
select (select count(*) from pub.dbo.tbl_pub),(select count(*) from sub1.dbo.tbl_pub), (select count(*) from sub2.dbo.tbl_pub)

75 — 75 — 75

Now, let’s see what will happen when we insert many rows that exceed the range. Let’s start with the publisher. We had inserted 25 rows and the primary + secondary ranges are 200.

Insert into sub1.dbo.tbl_pub (col2) select getdate();
Go 177

Let’s see the new range:

publication article subscriber subscriber_db range_begin range_end is_pub_range next_range_begin next_range_end
Pub1 tbl_pub MYINSTANCE Pub 4601 4701 0 4701 4801

How? Well , the largest range last allocated was 4601 (republishing range).

How “new range” was allocated

Under each table , there are 3 triggers corresponding for Insert, Update & Delete. The insert trigger ***On Publisher*** has an extra part to check for the identity range and use secondary range if Primary range is exhausted. If both primary and secondary range are exhausted, a new range is allocated.

Here’s what the trigger code looks here:


ALTER trigger [dbo].[MSmerge_ins_21A0E50FD0F64E439FE1349C1210247F] on [Pub].[dbo].[tbl_pub] for insert   as
    declare @is_mergeagent bit, @at_publisher bit, @retcode smallint

    set rowcount 0
    set transaction isolation level read committed

        select @is_mergeagent = convert(bit, sessionproperty('replication_agent'))
        select @at_publisher = 0
    if (select trigger_nestlevel()) = 1 and @is_mergeagent = 1
        return
    if is_member('db_owner') = 1
    begin
        -- select the range values from the MSmerge_identity_range table
        -- this can be hardcoded if performance is a problem
        declare @range_begin numeric(38,0)
        declare @range_end numeric(38,0)
        declare @next_range_begin numeric(38,0)
        declare @next_range_end numeric(38,0)

        select @range_begin = range_begin,
               @range_end = range_end,
               @next_range_begin = next_range_begin,
               @next_range_end = next_range_end
            from dbo.MSmerge_identity_range where artid='21A0E50F-D0F6-4E43-9FE1-349C1210247F' and subid='95208516-4B98-451D-B264-C1F27B20449A' and is_pub_range=0

        if @range_begin is not null and @range_end is not NULL and @next_range_begin is not null and @next_range_end is not NULL
        begin
            if IDENT_CURRENT('[dbo].[tbl_pub]') = @range_end
            begin
                DBCC CHECKIDENT ('[dbo].[tbl_pub]', RESEED, @next_range_begin) with no_infomsgs
            end
            else if IDENT_CURRENT('[dbo].[tbl_pub]') >= @next_range_end
            begin
                exec sys.sp_MSrefresh_publisher_idrange '[dbo].[tbl_pub]', '95208516-4B98-451D-B264-C1F27B20449A', '21A0E50F-D0F6-4E43-9FE1-349C1210247F', 2, 1
                if @@error<>0 or @retcode<>0
                    goto FAILURE
            end
        end
    end

--

The important part is this:

if @range_begin is not null and @range_end is not NULL and @next_range_begin is not null and @next_range_end is not NULL
        begin
            if IDENT_CURRENT('[dbo].[tbl_pub]') = @range_end
            begin
                DBCC CHECKIDENT ('[dbo].[tbl_pub]', RESEED, @next_range_begin) with no_infomsgs
            end
            else if IDENT_CURRENT('[dbo].[tbl_pub]') >= @next_range_end
            begin
                exec sys.sp_MSrefresh_publisher_idrange '[dbo].[tbl_pub]', '95208516-4B98-451D-B264-C1F27B20449A', '21A0E50F-D0F6-4E43-9FE1-349C1210247F', 2, 1
--

A very important note: Only members of Db_owner will be able to allocate a new range. This is a limitation because, in many instances, you want to limit the permissions of your connected users. If the user inserting the data is not a member of DB_OWNER, then merge agent has to run OR you need to manually allocate a new range using sp_adjustpublisheridentityrange http://msdn.microsoft.com/en-us/library/ms181527.aspx

What about subscribers? Should we expect a new range to be generated if we insert more rows than the allowed range? Let’s see

Insert into sub2.dbo.tbl_pub (col2) select getdate();
Go 175

use sub2
GO
Dbcc checkident (tbl_pub , noreseed)

--Checking identity information: current identity value '401', current column value '426'.

--We just reached the limit of the secondary range, any new insert will violate the constraint unless a new range is allocated

Insert into sub2.dbo.tbl_pub (col2) select getdate();
GO


Msg 548, Level 16, State 2, Line 1
The insert failed. It conflicted with an identity range check constraint in database ‘sub2′, replicated table ‘dbo.tbl_pub’, column ‘col1′. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.

Why wasn’t this handled like the publisher? Simply put , it’s by design. New ranges are allocated for subscribers when they sync with the publisher. This is even reflected in the trigger code. The following part is missing from the trigger on the subscriber side:

exec sys.sp_MSrefresh_publisher_idrange ‘[dbo].[tbl_pub]‘, ’95208516-4B98-451D-B264-C1F27B20449A’, ’21A0E50F-D0F6-4E43-9FE1-349C1210247F’, 2, 1

Some people complained about this, but, no dice!

http://connect.microsoft.com/SQLServer/feedback/details/330476/identity-range-not-working-for-merge-replication-in-sql-server-2005



Conclusion

Here we learnt few points:

- Merge replication can handle identity ranges automatically.

- Starting with SQL 2005 , there are two range pools assigned to each publisher and subscriber.

- A republishing subscriber will have a “republishing” range to use for its own subscribers.

- The size of each pool can be configured.

Some recommendations here as well:

  1. You need to choose the datatype of your identity column carefully because you do not want the subscribers to run out of range. SMALLINT, for example, allows -2^15 (-32,768) to 2^15-1 (32,767) which is quite low for any production system while BIGINT is generous here with range of -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807), it consumes more storage though.
  2. When adding articles , choose a range that can cover your operations without the need to frequently assign new ones. If you choose a datatype like BIGINT then you can be a bit generous as a safe factor.
  3. Plan the subscribers sync before the subscribers consume all range values so that new ranges are assigned. You can monitor the ranges consumption and kick off sync manually.
  4. On publisher , if processes inserting data are not part of a db_owner, then you will need to run agent or manually run sp_adjustpublisheridentityrange

What about you folks? How do you manage identities on Merge replication? Did you face any problems?

Happy reading!

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>