Tuesday, April 16, 2013

Netlogon 5719 at startup

This issue was a real booger and I almost threw in the towel and called in the big guns.

This error has been around for a while.  There is a lot of information out there on it and a LOT of reasons it can occur.

I've actually run across this twice now: once in my server VM environment and once on new desktops.  It is possible, even likely, that the two are related, since the switches involved are similar or the same.


In this post I'm focusing on the Virtual Environment issue.

I first discovered the issue when building an Exchange 2010 server and finding that the services were not starting automatically on boot.  This led me to the Netlogon 5719 event.  After a review of the events it was obvious that the Netlogon service was attempting, and failing, to start before the network was connected.

After finding this KB article: http://support.microsoft.com/kb/938449 I tried some of the suggestions, with no luck. Note that this setup was ESXi 5.1 connected to HP ProCurve switches (2810s).  STP was off on the switches.  Also connected to the same switches are a XenServer environment and a few physical servers, none of which see the issue.

Some of the posts and KBs I found suggested that this isn't an issue and can safely be ignored as long as you can reach the DC to log in, since Group Policy will apply after the set time period.  Unfortunately this is NOT a solution, nor a good workaround (for desktops, servers, anything).  It causes lots of issues in a domain environment, especially where folder redirection, logon scripts, etc. are in use.  The proper fix is to get the NIC to initialize before Netlogon starts, OR for MS to provide a method for admins to reliably force Netlogon to wait for the NIC.

After messing around for a while I discovered that this only occurs if the NIC is set to a static IP.  When set to DHCP, all works as expected.

So, at this point we could use DHCP reservations to make it work, BUT this isn't a solution for DCs or DHCP servers, and sometimes a static address is necessary or easier.

I then found a thread on the VMware Communities forum that described exactly my issue; it suggested changing the ArpRetryCount.
http://communities.vmware.com/thread/316237?start=15&tstart=0

Bingo!
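
For anyone wanting to try the same change, here is a rough sketch of generating a .reg file for it in Python. The post above does not say where the value lives or what to set it to, so treat the following as assumptions: that ArpRetryCount is a DWORD under the Tcpip Parameters key, and that 0 is a reasonable value to test with. The function name build_reg_file is just a hypothetical helper for illustration. Check the linked thread for the exact setting before applying anything.

```python
# Sketch: build the text of a .reg file that sets ArpRetryCount.
# Assumptions (not stated in the post): the value is a DWORD under the
# Tcpip Parameters key, and 0 is used here only as an example value.

TCPIP_PARAMS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
)


def build_reg_file(value_name: str, dword_value: int,
                   key: str = TCPIP_PARAMS_KEY) -> str:
    """Return .reg file text that sets one DWORD value under `key`."""
    if not 0 <= dword_value <= 0xFFFFFFFF:
        raise ValueError("DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        # .reg files write DWORDs as 8 hex digits
        f'"{value_name}"=dword:{dword_value:08x}',
        "",
    ]
    # Join with CRLF, the usual Windows line ending for .reg files
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file("ArpRetryCount", 0))
```

Saving the output to a file and importing it on the affected VM (for example with reg import), followed by a reboot, should be enough to test whether the change helps in your environment. Export the key first so the change can be rolled back.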

The fact that changing ArpRetryCount fixes it could indicate a deeper network issue, or possibly a flaw in the logic that decides when the Netlogon service should attempt to start.


Note: I also commonly see a very similar issue on workstations with SSDs, with some differences (for example, it occurs when set to DHCP but not static).  In these cases changing the ArpRetryCount does not help, although I did find that the issue depends heavily on the type of switch the workstation is plugged into.  For instance, it occurs when plugged into HP ProCurve switches, but not when plugged into cheapo Linksys / Cisco switches.  This likely indicates a configuration issue with the HP ProCurve switches (although many report the same or similar issues with enterprise Cisco switches).  It may also be caused by the type of NIC / driver on the system (e.g. a Realtek driver issue).  I have not been able to dig into this issue in great detail yet.

1 comment:

  1. We are having similar issues, Aaron. Thank you for reinforcing our hypothesis.
