
 


Exploring 2000 -
Replication is the real thing - March 2000
John Savill takes a look at inter- and intra-site replication.

In a previous article (Masters without slaves) we looked at how changes are propagated between domain controllers in a single domain via Update Sequence Numbers and high-watermark vectors (we will have a quick recap in the first section). In this article we will look deeper into replication and how domain controllers actually perform it within a site, between sites and between domains.

A quick recap


Unlike Windows NT 4.0 and earlier versions, where all changes were made on a single primary domain controller and then replicated to backup domain controllers for fault tolerance and load balancing, Windows 2000 employs a multi-master replication mechanism. The concept of primary and backup domain controllers has gone in Windows 2000. You just have domain controllers, some of which hold extra operational roles (FSMOs, Flexible Single Master Operations). One of these is the PDC role, which is used for replication to any older NT 4.0 BDCs that remain in the domain and also for password validation (more on this later).

Obviously, with changes being made at all domain controllers in this more complex replication environment, it’s no longer appropriate to blindly copy every change from one domain controller to all the others, as the same attribute may have been changed multiple times on different domain controllers.

Unlike Exchange 5.5 replication, where the entire object is replicated even if only a single attribute has changed, Windows 2000 replicates only the changed attributes rather than the entire object, saving bandwidth and also processing when the change is committed. Change management is controlled using Update Sequence Numbers (USNs). A USN is a 64-bit value, and each domain controller increments its own USN for each change it makes (original or replicated). The USN does not wrap around, and once it reaches its maximum you will need to reinstall your domain controller; however, at 1,000 changes a minute a 64-bit counter will last for billions of years, and I’d hope by then the Microsoft Galactic Command will have addressed the problem.
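As a rough check on how long a 64-bit counter lasts, the arithmetic can be sketched in Python. The rate of 1,000 changes a minute is the figure used above; whether the counter has 63 or 64 usable bits is an assumption made here for illustration, so both are shown.

```python
# Rough estimate of how long a 64-bit USN lasts at a steady change rate.
MINUTES_PER_YEAR = 60 * 24 * 365.25


def years_until_exhausted(bits: int, changes_per_minute: int) -> float:
    """Years needed to consume 2**bits sequence numbers at the given rate."""
    return (2 ** bits) / changes_per_minute / MINUTES_PER_YEAR


# Assumption: a signed 64-bit counter leaves 63 usable bits.
signed_years = years_until_exhausted(63, 1000)
# Assumption: an unsigned 64-bit counter uses all 64 bits.
unsigned_years = years_until_exhausted(64, 1000)

print(f"{signed_years:.3e} years (63 usable bits)")
print(f"{unsigned_years:.3e} years (64 usable bits)")
```

Under either assumption the result comes out in the tens of billions of years, so exhausting the counter is not a practical concern.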

At each replication cycle the replication partners (and it is these replication partnerships we will go into in more detail) exchange USNs and request any changes since the last replication. They store a table of USNs for all their replication partners and update with the partner’s new USN at each cycle to track the updates. For example, if server A has a USN in its table of 433 for server B, and, when replicating, server B tells A its USN is 436, server A knows it needs changes 434, 435 and 436.
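The bookkeeping in that example can be sketched as a small table keyed by replication partner. This is a simplified illustration, not the actual directory service implementation: the class and method names are hypothetical, and it ignores the high-watermark vectors mentioned earlier.

```python
class UsnTable:
    """Hypothetical per-partner record of the highest USN already received."""

    def __init__(self) -> None:
        self.highest_seen: dict[str, int] = {}

    def changes_needed(self, partner: str, partner_usn: int) -> list[int]:
        """USNs the partner holds that have not yet been replicated here."""
        last = self.highest_seen.get(partner, 0)
        return list(range(last + 1, partner_usn + 1))

    def record_cycle(self, partner: str, partner_usn: int) -> None:
        """Store the partner's new USN once its changes have been applied."""
        current = self.highest_seen.get(partner, 0)
        self.highest_seen[partner] = max(current, partner_usn)


table_on_a = UsnTable()
table_on_a.record_cycle("B", 433)             # state left by the previous cycle
needed = table_on_a.changes_needed("B", 436)  # server B now reports USN 436
print(needed)                                 # [434, 435, 436]
table_on_a.record_cycle("B", 436)             # nothing further is outstanding
```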

Making changes


When a change is made, it is written to the domain controller’s local copy of the Active Directory, and a timer is started that determines when the domain controller’s replication partners should be notified of the change. This interval is configurable and is 5 minutes by default. When it elapses, the domain controller notifies each intra-site replication partner that it has changes that need to be propagated.

This 300 seconds (5 minutes) can be changed by making the following registry change on each domain controller:

  1. Start the registry editor (regedit.exe)
  2. Move to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters
  3. Double click on ‘Replicator notify pause after modify (secs)’
  4. Set to the number of seconds you wish for the pause. Click OK
  5. Close the registry editor
  6. Reboot
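If you have many domain controllers, the same change could be prepared as a .reg file and imported on each one. The following is a minimal sketch only: it assumes the value is stored as a DWORD (the steps above do not state the type) and that the file uses the ‘Windows Registry Editor Version 5.00’ format; the function name is hypothetical. Test on a non-production machine first.

```python
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters"
VALUE_NAME = "Replicator notify pause after modify (secs)"


def build_reg_file(seconds: int) -> str:
    """Return .reg file text that sets the notification pause.

    Assumption: the value is a DWORD, so it must fit in 32 bits.
    """
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("seconds must fit in a 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'"{VALUE_NAME}"=dword:{seconds:08x}',  # DWORD data is written in hex
        "",
    ]
    return "\r\n".join(lines)


reg_text = build_reg_file(300)  # 300 seconds is the default pause
print(reg_text)
```

The text would then be saved to a file and imported with the registry editor, followed by the reboot from step 6.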

You will also notice another parameter, ‘Replicator notify pause between DSAs (secs)’, under the same registry key, which determines the number of seconds to pause between notifications to each of the Directory Service Agents. This parameter prevents simultaneous replies by the replication partners. If no changes are reported for a configurable period (6 hours by default) a replication sequence will be initiated to ensure no changes have been missed.

Inter-site replication, which usually runs over WAN links, can be controlled with much finer granularity than under NT 4.0, which only had a few registry entries to control replication between the PDC and remote BDCs.

Before we move on to how the machines actually perform the replication, there is a concept of urgent replication that can be triggered by the SAM or the LSA (Local Security Authority) and is initiated for the following events:

  • Replicating a newly locked-out account (useful for when you have fired someone)
  • Changing an LSA secret (i.e. a trust account)
  • The RID Manager state changes

Any of the above causes a notification to be sent within the site, triggering immediate replication. As it uses notification, urgent replication is intra-site only by default; changes normally cross site links only as per the schedule, although you can modify site links to enable notification.

An exception to normal multi-master replication is user passwords. As with any other attribute change, the password can be changed at any domain controller; however, the change is then pushed to the PDC FSMO role holder on a best-effort basis. The other domain controllers receive the password through normal replication. The reason for this extra password work is that if password validation fails, the validating domain controller passes the request to the PDC FSMO role holder in case the password has been changed and the new value has not yet arrived via standard replication.
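The password fallback can be pictured with a toy model. This is a sketch for illustration only: the function name is hypothetical, and plain strings stand in for whatever credential data the domain controllers really compare.

```python
def validate_password(supplied: str, local_copy: str, pdc_copy: str) -> bool:
    """Toy model of the validation fallback to the PDC FSMO role holder."""
    if supplied == local_copy:
        return True
    # Local validation failed. The password may have been changed on another
    # domain controller and not yet replicated here, so ask the PDC role
    # holder, which receives password changes on a best-effort basis.
    return supplied == pdc_copy


# The user changed the password elsewhere; this DC still holds the old one.
stale_dc_accepts = validate_password("new-secret", "old-secret", "new-secret")
wrong_password = validate_password("guess", "old-secret", "new-secret")
print(stale_dc_accepts, wrong_password)  # True False
```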

Protocols and Rings of Replication


The actual protocols used for replication vary depending on whether it is intra- or inter-site and on the sort of data being carried. For intra-site replication, RPC (Remote Procedure Call) over TCP/IP is used without any kind of compression. This is because domain controllers within a site are assumed to be on a fast network (as per the definition of a site) and the extra processing required to compress and uncompress is undesirable. For inter-site replication there is a choice of RPC or SMTP (Simple Mail Transfer Protocol), the latter using the CDO v2 (Collaborative Data Objects) interface and the SMTP component in IIS 5.0. Compression is supported for both RPC and SMTP, resulting in data 10-15% of the size of the original! Compression for inter-site replication is important as sites may be connected by slow links.

There are some restrictions on the use of SMTP, however. It can be used to replicate global catalog information, schema and configuration data, but it cannot replicate full domain naming context data such as the data exchanged between domain controllers in the same domain. This is because some domain operations, such as Group Policy, require the File Replication Service (FRS), and FRS cannot currently use SMTP as a transport.

Intra-site Ring


The Knowledge Consistency Checker (KCC) service that runs on each domain controller monitors the domain controllers within the domain and the site container in the enterprise configuration, automatically calculates the replication topology required and creates or removes connection objects accordingly. This calculation happens every 15 minutes. You can view these links using the Active Directory Sites and Services MMC snap-in: expand the site, expand the Servers container and expand the server; the created links are under the ‘NTDS Settings’ leaf.

Within a site the KCC creates a bi-directional ring topology connecting all domain controllers using their GUIDs (Globally Unique Identifiers). Any new domain controller is automatically added into the ring. There is an exception to this ring: in order to maintain reliability and performance there can never be more than three hops between any two domain controllers, which means in some situations you may have multiple rings within a single site. The basic rule is that if you have seven or more domain controllers, extra bi-directional connections will be added.
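To see why a plain ring eventually breaks the three-hop limit, consider the worst-case distance between two members of a bi-directional ring. The sketch below shows only the geometry of a simple ring; the function name is hypothetical, and the KCC’s actual rules for when and where it adds connections are not reproduced here.

```python
def max_ring_hops(dc_count: int) -> int:
    """Worst-case hop count between two members of a bi-directional ring.

    Traffic can travel either way round, so the farthest member is
    half-way round the ring.
    """
    return dc_count // 2


for dc_count in range(2, 11):
    print(f"{dc_count} domain controllers: up to {max_ring_hops(dc_count)} hops")
```

As the ring grows, the worst case climbs past three hops, which is where additional connections become necessary.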

This gets more complicated when you have multiple domains within a site as there are actually two types of ring for any site:

  • One ring for each naming context available in the site (each domain)
  • One ring for schema and configuration information (this is shared between all domains, so there is only one for each site)

In a site where there is only one naming context (one domain), the two rings are actually the same. If you have more than one naming context within a site you will have multiple rings. For example, with two domains you would have three rings: two for the two naming contexts and one for the schema/configuration data. With three domains there would be four rings, and so on.

Manual configuration of intra-site replication should not be needed and is not recommended by Microsoft. The only task you may ever find yourself performing is adding extra connection objects to reduce the hop count between domain controllers.

Inter-site Replication


Whereas the KCC automatically configures replication within a site, replication between sites has to be managed manually, with links created using RPC or SMTP. Once you define the site links, schedules, cost factors and any site link bridges (if appropriate), the KCC can then create the connection objects, provided the site links are transitive. Unlike intra-site replication, inter-site replication does not use a ring topology but rather a spanning tree topology; as long as a replication route can be established between all sites in the enterprise forest, the replication tree is complete. The actual links between sites are created manually by the Administrator, who defines a cost for each link (the cost relates to the speed and/or reliability of the network) and a schedule of when replication can occur.
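The idea of a least-cost spanning tree over site links can be illustrated with Kruskal’s algorithm. This is a generic textbook technique used here as an assumption for illustration; the article does not say which algorithm the KCC itself uses, and the site names and costs below are hypothetical.

```python
def least_cost_tree(sites, links):
    """Kruskal's algorithm. links is a list of (cost, site_a, site_b)."""
    parent = {site: site for site in sites}

    def find(site):
        # Follow parent pointers to the representative of the site's group,
        # shortening the path as we go.
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    chosen = []
    for cost, a, b in sorted(links):      # cheapest links first
        root_a, root_b = find(a), find(b)
        if root_a != root_b:              # link joins two separate groups
            parent[root_a] = root_b
            chosen.append((cost, a, b))
    return chosen


sites = ["London", "Paris", "NewYork"]    # hypothetical sites
links = [
    (100, "London", "Paris"),
    (400, "London", "NewYork"),
    (300, "Paris", "NewYork"),
]
tree = least_cost_tree(sites, links)
tree_is_complete = len(tree) == len(sites) - 1  # every site is reachable
print(tree, tree_is_complete)
```

Here the expensive London to NewYork link is left out because NewYork is already reachable through Paris at lower cost.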

Site links are created and maintained using the Sites and Services MMC snap-in. By default you will have one site link, DEFAULTIPSITELINK, of which your original site will be a member; further sites can be added to it as they are created (a site has to be part of a site link when created). A site can be a member of multiple site links, and can be added or removed by right clicking on a site link, selecting the site and clicking Add or Remove.

It’s also possible to define bridgehead servers, which are the contact points for the exchange of directory information between sites. This is useful if you have a firewall and wish to ensure inter-site traffic is directed through a particular proxy server. You should ensure that the preferred bridgehead servers have sufficient bandwidth to transmit and receive information, as they may become a bottleneck for replication if they lack resources. To nominate a server as a bridgehead server, perform the following:

  1. Start the Sites and Services MMC snap-in (Start – Programs – Administrative Tools – Active Directory Sites and Services)
  2. Expand the Sites – specific site – Servers
  3. Right click on the server and select Properties
  4. Select the protocol it is to become a bridgehead server for and click Add
  5. Click OK

Transitive sites

We mentioned that if you leave the site links as transitive, much of the work is done for you and the available connections between sites are maximised. However, this can cause problems in some environments, and you can choose to remove this transitivity and instead manually create site link bridges that group site links sharing a common transport. The first task is to disable site link transitivity:

  1. Start the Sites and Services MMC snap-in (Start – Programs – Administrative Tools – Active Directory Sites and Services)
  2. Expand the sites branch
  3. Expand ‘Inter-Site Transports’
  4. Right click on the relevant transport (IP or SMTP) and select Properties
  5. Unselect ‘Bridge all site links’ and click OK

Now that the site links are no longer all bridged by default, you can manually group site links by creating a site link bridge (you must disable ‘Bridge all site links’ first or this will have no effect).

  1. Start the Sites and Services MMC snap-in (Start – Programs – Administrative Tools – Active Directory Sites and Services)
  2. Expand the sites branch
  3. Expand ‘Inter-Site Transports’
  4. Select the protocol, e.g. IP
  5. Right click on the protocol and select ‘New Site Link Bridge’
  6. Select the site links to be a part of the bridge (at least two) and enter a name
  7. Click OK

It’s important to understand why these bridges are significant. Suppose we had three sites, A, B and C, and two site links defined, AtoB and BtoC. If the site links are not transitive, A and C have no way to communicate; by placing both site links into a site link bridge, A and C can now communicate via B.
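The A, B and C example can be modelled in a few lines. This is a simplified sketch under stated assumptions: the function and data layout are hypothetical, and schedules, costs and transports are ignored. Two sites can replicate if they share a site link, or if a chain of site links inside one bridge connects them.

```python
def can_replicate(site_links, bridges, src, dst):
    """site_links maps link name -> set of sites; bridges is a list of
    sets of link names. Returns True if src and dst can replicate."""
    # Direct case: both sites are members of the same site link.
    if any(src in members and dst in members for members in site_links.values()):
        return True
    # Bridged case: walk outwards from src using only this bridge's links.
    for bridge in bridges:
        reached, frontier = {src}, [src]
        while frontier:
            site = frontier.pop()
            for link_name in bridge:
                members = site_links[link_name]
                if site in members:
                    for other in members - reached:
                        reached.add(other)
                        frontier.append(other)
        if dst in reached:
            return True
    return False


site_links = {"AtoB": {"A", "B"}, "BtoC": {"B", "C"}}
without_bridge = can_replicate(site_links, [], "A", "C")
with_bridge = can_replicate(site_links, [{"AtoB", "BtoC"}], "A", "C")
print(without_bridge, with_bridge)  # False True
```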

You could, if you wish, turn off the KCC for inter-site replication totally and manually define every single connection object; however, I would recommend against it! For most sites you can just leave site link transitivity turned on and you are done, with no other configuration needed.

If you are really interested in the replication there are a couple of useful tools: repadmin.exe and replmon.exe. Repadmin.exe is a command line tool which lets you check replication consistency, trigger a KCC recalculation and so on. A good switch is /showreps, which displays a list of replication partners. The output includes the invocation ID, which is the database GUID, and also shows the reason for any problems.

Replmon.exe is a GUI tool used to display and monitor replication status and is useful in conjunction with repadmin.exe. There is much more to replication, and we did not touch on the specifics of Global Catalog replication between sites, but hopefully you have an idea of why we love the KCC and why ignorance is bliss.