DFS, FRS, oh my!

Well, the upgrades continue. Now, I am in the process of moving all of our DFS (Distributed File System) shares from our existing Windows Server 2003 Enterprise Edition (WS2003EE), to the new Windows Server 2003 R2 Enterprise Edition (WS2003R2EE). Here are some of the fun things I have found out while doing this.

First of all, DFS is the Distributed File System, which allows you to have shared documents on servers without having to remember the name of the server they are on. This is very handy when you decide to replace a file server, because the new share location will simply use the same DFS name, no matter what the server name is. Here is an example:

You have a server named FileServer1 that shares a folder called Home, that is the My Documents for all of your users. A couple of years have passed and you want to replace FileServer1 with StorageServer. Without DFS, that folder, and each person’s folder in it, would have been at:

\\FileServer1\Home\{person’s name}

You would have to change the logon scripts for users, and run around and change any shortcuts people have created (not to mention shortcuts in documents) to point to the new location of:

\\StorageServer\Home\{person’s name}

With DFS, you can set up the name \\MyDomain\Files\Home, and then set up \\FileServer1\Home as a target for it. Then, you would use:

\\MyDomain\Files\Home\{person’s name}

for their files. Then, when you bring in the new server, you simply add it as a target for the same root name, and you don’t have to change any shortcut names, or any logon scripts for anyone.
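To make the idea concrete, here is a small Python sketch of how a DFS name maps onto one or more targets. It is only an illustration: the lookup table and the `resolve` function are hypothetical, and real DFS referrals are handled by Windows itself, not by anything like this.

```python
# Hypothetical illustration only: real DFS referrals are handled by Windows,
# not by a lookup table like this one.
NAMESPACE = {
    r"\\MyDomain\Files\Home": [r"\\FileServer1\Home"],
}


def resolve(dfs_path, namespace):
    """Return the physical UNC paths that a DFS path could be referred to."""
    for root, targets in namespace.items():
        lowered = dfs_path.lower()
        # Windows paths are case-insensitive, so compare in lower case.
        if lowered == root.lower() or lowered.startswith(root.lower() + "\\"):
            rest = dfs_path[len(root):]
            return [target + rest for target in targets]
    return []


# Before the migration there is a single target behind the DFS name.
before = resolve(r"\\MyDomain\Files\Home\alice", NAMESPACE)

# Adding the new server as a second target changes nothing for the user:
# the DFS name stays the same.
NAMESPACE[r"\\MyDomain\Files\Home"].append(r"\\StorageServer\Home")
after = resolve(r"\\MyDomain\Files\Home\alice", NAMESPACE)
```

The user's path never changes; only the list of servers behind it does.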

When you add a new server as a target for a name, you have to somehow copy all of the files to the new server. This can either be done manually using RoboCopy, or automatically using the File Replication Service (FRS). FRS is also used after the files are copied to the server for the first time, to keep the copies on more than one server up to date.

Now that we have talked about the pie-in-the-sky way that DFS/FRS works, let’s talk about the cold hard reality.

  1. DFS only works for file shares; it cannot be used for printers. So, if you have a print server set up to handle all of the printers for the department, then when you change servers, you are going to have to change the settings on every machine. You might be able to do some of that via VBS scripting or batch files, but then you get into a messy area of “what about that print queue for just that one manager” or how to implement the fix so that everyone gets it, without their machine taking far too long to start up on each restart.
  2. There are many handy tools that come with the DFS MMC snap-in. However, these are only for WS2003R2EE. If one of the machines is older, then those glorious tools are useless for you. Those tools would help you fine-tune replication properties and even view replication status. But, alas, if you are not on an R2-only setup, then they don’t do diddly for you. You are relegated to command line tools, editing registry entries, and the only replication status program I could find at Microsoft.com, Sonar. It is about as user-friendly as jabbing ice-picks in your eyes.

    I mentioned that you will be editing registry entries, and I meant it. There are many things you need to do that can only be done from the registry. After which, you have to either restart the File Replication Service service itself, or just reboot the machine.

  3. Making changes to DFS is insane. Adding a target to an existing share? Removing a target from a share? Testing with some fake shares to see the performance of FRS? Well, all of these will work to drive you insane.

Adding a new server (target) to an existing share (root):

Logically, you would think that adding a target to a root share would be a matter of a few steps. 1) Open the DFS console, select the group and then the specific share. 2) Add a new target to the share. 3) When prompted, set up replication according to the screens given to you. 4) Verify the new target is enabled. Hopefully, that would be it. Your job is finished, and you can go have a beer.

However if you did this, then what would happen is that people would connect to the share and see either: A) nothing, no files, no folders, just nothing. B) lots of things missing. Why would this happen? Well, if a new target is enabled, then users can, and will, be sent to it. Even before replication has finished. They will then see only the things that have already replicated to the new target.

When this happens, you have to open the DFS console, and disable the new target. This will still allow replication, but not send people to the new target for the files. Then, everyone who is already logged in will have to log out and back in, or just reboot their machines.

Another problem you will have is that any file over ~500MB will not replicate, and replication will be awfully slow for everything else. This is because the default staging space for incoming and outgoing files is only 660MB, and FRS starts to bitch and complain, and stop replication, when that is 60% full. Also, the priority for FRS defaults to low, or as I like to put it, “a European Swallow could carry the data on floppy discs faster” priority. The staging size is changed only in the registry, the priority is changed in what seems to be the super-secret menus.

Changing the size assigned to staging for FRS is done in the registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\
Parameters\Staging Space Limit in KB

Yes, ‘Staging Space Limit in KB’ is the name of the actual Registry entry. No, you can not enter the size in MB or GB, just KB. And remember, 1024KB = 1MB, 1024MB = 1GB, or 1048576KB = 1GB. Don’t forget the fact that staging begins to die out at 60% full. So find the largest file in the DFS share, and double that for the staging size. After changing that registry entry, you will need to restart the File Replication Service service, or reboot the server. You will need to edit this on all servers.
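The sizing rule above (find the largest file, double it, express it in KB) is simple arithmetic, and a short Python sketch can do it for you. The function names here are my own and hypothetical, and the script only calculates a number; it does not touch the registry.

```python
import os

KB_PER_MB = 1024
KB_PER_GB = 1024 * 1024  # 1048576 KB = 1 GB


def largest_file_bytes(root):
    """Walk a folder tree and return the size in bytes of its largest file."""
    largest = 0
    for folder, _dirs, files in os.walk(root):
        for name in files:
            try:
                size = os.path.getsize(os.path.join(folder, name))
            except OSError:
                continue  # skip files that vanish or cannot be read
            largest = max(largest, size)
    return largest


def staging_limit_kb(largest_bytes):
    """Double the largest file and round up to whole KB (the rule of thumb above)."""
    doubled = 2 * largest_bytes
    return -(-doubled // 1024)  # ceiling division


# Example: if the largest file in the share is 500 MB, the rule of thumb
# gives 1000 MB, which is 1024000 KB.
print(staging_limit_kb(500 * 1024 * 1024))  # 1024000
```

On a real share you would pass the result of `largest_file_bytes` (pointed at the shared folder) into `staging_limit_kb`, and then type the resulting number into the registry entry by hand.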

Setting the replication priority higher than European Swallow slow requires navigating the myriad properties windows in the DFS console. Remember, all of the information you can easily find today is for the R2 release of Win Server 2003, not older versions.

  1. Open the DFS console, expand the group you want to edit, and then right-click on the share and choose Properties.
  2. From the properties window, click on the Replication tab, and then click the Customize button next to the topology type.
  3. This opens the topology window. From here, select one of the servers from the Connections area and click on the Priorities button.
  4. This opens yet another window, which shows a list of all of the other servers that connect to the server you just chose. Now you select a server, select a priority from the drop-down box, and then click Change. You must click Change, OK will not work. You will need to make this change for every server listed.
  5. Repeat step 4 for each server as the destination.

Once replication is complete, you can then enable the new target. You can either leave both enabled, or disable the old one after enabling the new one. However, if people map drives to these targets, it is best to have them reboot after disabling a target.
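Since nothing in the console tells you plainly that replication is complete, one way to check before enabling the new target is to compare the two folder trees yourself. The Python sketch below is a rough, hypothetical helper of my own, not a DFS or FRS tool. It only compares file names and sizes, so it will not catch files that differ in content but not in length.

```python
import os


def relative_files(root):
    """Return {relative path: size in bytes} for every file under root."""
    found = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            found[os.path.relpath(full, root)] = os.path.getsize(full)
    return found


def replication_gaps(source_root, target_root):
    """List files that are missing on the target or differ in size."""
    source = relative_files(source_root)
    target = relative_files(target_root)
    missing = sorted(p for p in source if p not in target)
    size_mismatch = sorted(
        p for p in source if p in target and source[p] != target[p]
    )
    return missing, size_mismatch
```

You would call `replication_gaps` with the old share and the new share as the two roots. If both returned lists are empty, the new target at least has every file the old one has, at the same size.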

Removing an old server (target) from a share (root):

Once again, I like to look at how it would work, logically. 1) Open the DFS console, select the group and share, and then select the old server to remove. 2) Disable the target (old server). 3) Delete the target. If there were replications still waiting, or in process, then DFS should alert me, and allow me to not delete the target at that time. If so, then I would be able to try again later. When the old server was finally deleted, I could begin cleaning it up, removing files, or generally preparing it for removal.

Well, if I did that, I would quickly find files missing from the new server. Why? Because even though the DFS target was turned off, and this is supposed to turn off File Replication, it doesn’t really. I don’t know how long it takes for that delete to propagate through the system, but 30 minutes is not enough. If you go to the old server and start deleting things, the new server will receive those delete commands through FRS and start deleting them too. So, to fix this you have to either restart the File Replication Service service, or reboot the new server.

Testing with temporary shares:

If you decide to test how DFS and FRS would work on the new servers, you are going to have a number of headaches in store. Whenever you remove a target for a server, the registry entries for that target are not really removed. They still exist, and can cause worlds of problems with FRS, and even cause FRS to fail out. Yep, non-enabled targets that shouldn’t be replicating will cause problems with the replication service.

So, to actually remove something from File Replication Service, you have to dive back into the registry. There are two types of locations you need to work with. First is the Replica Sets, which is the information about what folders are replicated, replica name and type. These are under:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\
Parameters\Replica Sets\

Each set will have its own unique Hex value for the key name. Then the rest of the information is under sub-keys for each one. The other type of location is the Cumulative Replica Sets. This seems to be information about how many replica targets there are, and therefore if replication should be started for that share. It is found at:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\
Parameters\Cumulative Replica Sets\

It uses the same Hex value for each key name. You will need to delete the entries for each unused DFS share from each of these locations. You will, of course, need to stop the FRS service, and make sure that not only the current control set is changed, but any other ControlSet### sets.
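Because the same entry has to be removed from two locations in every control set, it is easy to miss one. The Python sketch below just builds the list of key paths to check, so you can tick them off in the registry editor. It does not read or modify the registry. The replica set ID and the control set names passed in are placeholders you would replace with what you actually see on your server.

```python
# Template for the NtFrs parameters key under a given control set.
NTFRS_PARAMETERS = r"SYSTEM\{control_set}\Services\NtFrs\Parameters"

# The two locations described above.
SUBKEYS = ("Replica Sets", "Cumulative Replica Sets")


def keys_to_delete(replica_set_id, control_sets):
    """Build every registry key path that holds the given replica set."""
    paths = []
    for control_set in control_sets:
        base = "HKEY_LOCAL_MACHINE\\" + NTFRS_PARAMETERS.format(
            control_set=control_set
        )
        for subkey in SUBKEYS:
            paths.append(base + "\\" + subkey + "\\" + replica_set_id)
    return paths


# "EXAMPLE-ID" is a placeholder, not a real replica set key name.
for path in keys_to_delete("EXAMPLE-ID", ["CurrentControlSet", "ControlSet001"]):
    print(path)
```

With two control sets and two locations, that is four keys per unused share, which is exactly the sort of thing that is easy to lose count of when doing it by hand.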

Have fun with it.

A new(er) post about DFS can be found in my DFS Questions and Answers.

12 Responses

  1. Great post about DFS and FRS. I knew about the staging area limits, but I didn’t realize that would restrict large files from replicating. I will have to up my limits as you detail.

    Important question: Do you know if each replicated folder has to have a different staging area? Or can more than one replicated folder share the same staging area, e.g. C:\Frs-staging?

  2. Willie,

    I have done some looking around, and non-R2 2003 uses a single staging folder and a single staging pool size. This is the
    C:\Frs-Staging
    folder, and you need to consider all of your Targets when deciding what size to make the staging area.

    With 2003R2, when you use the newer DFS Management tool to set up DFS, then each Target uses
    TargetFolder\DfsrPrivate\Staging
    as the staging area for that Target. Also, each target has its own staging area size parameter, so you would need to set up each one individually.

  3. Good Post on DFS. Another thing that happens unexpectedly is that when you unshare a target, file replication still occurs.
    One question though, what happens if a target server is unavailable or crashes? Will that cause the other targets to delete their files? If the crashed server is brought online later on, and I recreate the same folder (empty) and same share name target, will that cause other targets to delete their files?

  4. Anand,

    There are two possible scenarios if a target disappears… the first situation is where an actual target, not the root, disappears. Then DFS will just wait until the target shows back up, and start replicating to it. If you build, or re-build, the server and have to add it as a target again, then DFS sees it as a new target, and starts replicating to it.

    However, if the root server disappears, then there can be problems. Especially if you have to reload the root server from a backup, or try to rebuild it. If you reload it from backup, then copy the data on the target server(s) to another location before bringing the re-built server online. When it comes back up, DFS “should” see that the data on the targets are newer than the data on the root, and start replicating the newer files to the root. Then again, it just might think the root has newer data, and blow away all of the updated data on the targets.

    If you have to build the root server from scratch, then it will not be the original root server. In that case, you would have to copy your data to a safe location, and then try to make the rebuilt server the new root. Hopefully, DFS will replicate the data from the targets to the root, but you may have to copy the data from your safe location to the root.

    I have not had to deal with a full server failure for a root target, so I can not say with absolute certainty, but my experience with DFS leads me to believe that backups are my friend, and I will not assume DFS will do anything but try to screw me over any chance it can get.

  5. By root server do you mean root target? If so, then I have 4 root targets for redundancy.

    One other question, where is the new DFS mgmt tool in R2? I am unable to find it.

  6. Anand,

    The root target is the “original” server. The one that had the data on it first, and then the other servers were added to the list.

    No matter how many servers are added, only one server is the root server. That is the one that is considered the final authority on data. While DFS is designed to allow you to add a file onto one of the target servers, and that file be replicated to all of the servers, there are times that does not work correctly.

    In the scenario I was commenting on, if you have a catastrophic failure of the root server, and it is rebuilt from a backup, then before bringing it back online to the network, copy the data from one of the target servers. Because, when the root comes back up, and it has older files (from the backup) and for some reason DFS decides that the root is correct on this item, newer files can be overwritten or deleted from the other target servers.

    In 2003EER2, you can find the “new” DFS tool at START->Administrative Tools->DFS Management. If that is not there, then the actual item is at:
    %SystemRoot%\system32\dfsmgmt.msc

  7. Hi, just a question.
    How can we prevent a DFS link and the FRS Service from stopping? I get event ID 13552 and 13544 errors.
    I’ve read many articles and can’t seem to figure out how to solve the problem.
    Any advice is deeply appreciated.
    Thank you.

  8. sysadwannabe,

    Here is what I have found out about those error messages.

    The error 13552 is usually stating that FRS could not start, while the error 13544 states that DFS is having an “overlap” problem with your shares.

    FRS not starting can happen for a couple of reasons. Either because the necessary folders do not exist or because DFS bombed out, and FRS is just following along with it. Most likely, FRS is not starting because of the DFS problem in 13544.

    Now, the overlap error is one that happens when you have a folder that is already shared by a higher level folder, which is trying to be a DFS share folder also.

    Here is an example for you:

    Let’s say you have the following folder structure on your D drive…
    D:\Shared\
    D:\Shared\Groups
    D:\Shared\Homes

    Now, you have shared the “Shared” folder, so everyone can get to both the Groups and Homes folders. You have set security permissions on the folders under Groups and Homes to keep people out of places they shouldn’t be.

    You decide that you would like everyone to be able to go to:
    \\Our Domain\Homes
    to get to the Homes folder. So, you share the Homes folder itself, go to the other server (your target server) and set up the same folder and sharing layout on its D drive. You then create your DFS namespace and set up replication for them… and things don’t work.

    The problem is simply because you are now sharing the Homes folder directly, but it was already shared by way of the Shared folder. This is the Overlap that error 13544 dies over.

    While there are many ways to fix this, they all depend upon how you are setup. Here is how I would handle the situation. This assumes that you want to share Shared and Homes both under their own names in DFS.

    First, remove any overlapping DFS Namespace items. Then, make sure the Namespace, folder targets, and replication are set up for the Shared folder. Next, set up a new Namespace and folder targets for the Homes folder, but DO NOT ENABLE REPLICATION FOR THEM. Why? Well, the Homes folder is going to be replicated by way of the Shared folder already, so you don’t even need to try and replicate it again.

    This should directly fix the 13544 DFS error, which should indirectly fix the 13552 FRS problem.

  9. I just got through an entire day of setting up DFS and DFS Replication between two Win 2003 R2 SP2 Ent. Servers.

    Works in theory, fucked up in the real world. I did testing such as disabling the nic and watching the fail over. Adding files to the DFS NameSpace Link and watching those actually show up on the last standing server. Restoring the disabled nic on the secondary DFS system resulted in, guess what, JACK SHIT. All the new files that were made in the DFS and last standing server are present, yet I am still waiting for the files created during the test outage to replicate.

    New files are still replicating between the servers fine.

    So this is to prevent outages and keep two folders identical. What a piece of crap, fails miserably at its sole purpose.

  10. venom,

    Just remember that you have to enable replication between each server IN EACH DIRECTION. If you set up replication from your original server to the new one, and did not set up replication the other way, or at least replication faster than European Swallow slow, then you would either see no replication back from the new server to the original, or the replication would be so slow as to seem nonexistent.

    Go back and check your replication between each of the servers. If you follow the steps under “Adding a New Server to an Existing Share,” just make sure you check all directions.

  11. Hi,
    I am having a DFS problem with Windows 2000 domain controllers. We set up a DFS domain share between 2 domain controllers, but it has stopped working. I had to do a non-authoritative restore on the secondary DFS share, and then it started replicating. Since our secondary DFS share is over a slow link at a different site, it was going to take 8 hours or so. But since these are old servers, we had to set up the staging on a different volume, so I changed the staging path as shown in http://support.microsoft.com/kb/291823/ . This was mandatory for our setup. We changed the staging folder path in the registry and in AD on both servers, but after some time it reverted back to the old path and it was no longer replicating. We changed it back to the correct staging path, but it is still not replicating. I am going to try to run some fixes tomorrow and troubleshoot a little bit. If any of you guys have any input on this, that would be great.
    Thanks,
    Sudeep

  12. Sudeep,

    First, a couple of questions. Did you create an “Authoritative Restore Point” before making the Staging Path changes? Also, after the change was made, did you see a log entry with Event ID 13563? If so, there were steps in the page you linked to that you have to follow; basically they are:
    – set explorer to show hidden files and folders
    – from command prompt, run ‘net stop ntfrs’
    – Go to the original staging path and copy the files there to the new path
    – from the command prompt, run ‘net start ntfrs’

    Also, remember that the changes are made in both the Registry and in the AD system using Adsiedit.

    Finally, don’t forget to look in your DFS Replication Event Viewer for Event ID 4208, which means your staging is running out of space. This can be because the staging space defaults to 660MB, and after you account for the space it wants to keep open all of the time, any file over 500MB will not replicate, but will simply clog up replication.

    Good luck.
