IIS Server Troubleshooting

Issue:

A production IIS server once became unreachable via RDP, WinRM, and RPC. Here is how that scenario was investigated:

1. List Dynamic Ports
SHELL> netsh int ipv4 show dynamicport tcp
Protocol tcp Dynamic Port Range
---------------------------------
Start Port      : 1025
Number of Ports : 64511
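
To read this output as an actual port interval, the last dynamic port is the start port plus the number of ports, minus one. The following is a minimal Python sketch of that arithmetic; the function name parse_dynamic_port_range is hypothetical, and the sample text is simply the netsh output shown above.

```python
import re
from typing import Tuple


def parse_dynamic_port_range(netsh_output: str) -> Tuple[int, int]:
    """Return (first_port, last_port) from 'netsh int ipv4 show dynamicport tcp' output."""
    start = re.search(r"Start Port\s*:\s*(\d+)", netsh_output)
    count = re.search(r"Number of Ports\s*:\s*(\d+)", netsh_output)
    if not start or not count:
        raise ValueError("unrecognized netsh output")
    first = int(start.group(1))
    # The range is inclusive, so the last port is first + count - 1.
    last = first + int(count.group(1)) - 1
    return first, last


# Sample copied from the netsh output in step 1.
SAMPLE = """\
Protocol tcp Dynamic Port Range
---------------------------------
Start Port      : 1025
Number of Ports : 64511
"""

if __name__ == "__main__":
    print(parse_dynamic_port_range(SAMPLE))
```

For the values listed in step 1, the range runs from port 1025 through port 65535, which is already the widest range the port space allows above 1024.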

2. Show existing connections
SHELL> netstat | findstr -i "ESTABLISHED LISTEN CLOSE_WAIT TIME_WAIT"
  TCP    127.0.0.1:80         192.168.2.250:4845       TIME_WAIT
  TCP    127.0.0.1:80         192.168.2.250:29244      TIME_WAIT
  TCP    127.0.0.1:80         192.168.2.250:31519      TIME_WAIT
  TCP    127.0.0.1:80         192.168.2.250:39922      TIME_WAIT
  TCP    127.0.0.1:80         192.168.2.250:55248      TIME_WAIT
  TCP    127.0.0.1:80         192.168.2.250:63718      TIME_WAIT
  TCP    127.0.0.1:42708      EAFBL:http             ESTABLISHED
  TCP    127.0.0.1:42709      EAFBL:http             ESTABLISHED
  TCP    127.0.0.1:43974      EAFSQL:ms-sql-s        ESTABLISHED
  TCP    127.0.0.1:47371      edwin1:http            ESTABLISHED
###################### Truncated for brevity #####################
  TCP    127.0.0.1:8402         CONTRA001:42673       ESTABLISHED
  TCP    127.0.0.1:42652        CONTRA001:42653       ESTABLISHED
  TCP    127.0.0.1:42653        CONTRA001:42652       ESTABLISHED
  TCP    127.0.0.1:42673        CONTRA001:8402        ESTABLISHED
  TCP    127.0.0.1:47001        CONTRA001:53423       ESTABLISHED
  TCP    127.0.0.1:53423        CONTRA001:47001       ESTABLISHED

3. Find the PID of the offending process(es) using those ports
SHELL> netstat -aon | findstr :808

4. Find Process Name of PID
SHELL> tasklist /fi "pid eq 3008"
Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
w3wp.exe                      3008 Services                   0  1,400,800 K
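
Steps 3 and 4 can also be approached from the other direction: instead of searching for one port, group all TCP rows of netstat -aon by owning PID and look at which process holds the most. The following is a rough Python sketch of that idea; connections_per_pid is a hypothetical helper, and the sample rows are made-up illustration data (only PID 3008 comes from the tasklist output above).

```python
from collections import Counter


def connections_per_pid(netstat_aon_output: str) -> Counter:
    """Count TCP rows per owning PID in 'netstat -aon' text output."""
    counts = Counter()
    for line in netstat_aon_output.splitlines():
        fields = line.split()
        # With -o, a TCP row is: protocol, local, foreign, state, PID.
        if len(fields) == 5 and fields[0] == "TCP" and fields[4].isdigit():
            counts[int(fields[4])] += 1
    return counts


# Hypothetical rows for illustration only; addresses and ports are invented.
SAMPLE = """\
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING       4
  TCP    10.0.0.5:50001         10.0.0.9:1433          ESTABLISHED     3008
  TCP    10.0.0.5:50002         10.0.0.9:1433          ESTABLISHED     3008
  TCP    10.0.0.5:50003         10.0.0.9:1433          TIME_WAIT       3008
  UDP    0.0.0.0:123            *:*                                    1020
"""

if __name__ == "__main__":
    for pid, total in connections_per_pid(SAMPLE).most_common(3):
        print(pid, total)
```

The PID at the top of the list can then be passed to tasklist /fi "pid eq ..." as in step 4.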

5. Check the Event Logs and address the errors/warnings found:

Event ID 56
1. Disable IPv6
2. Disable all SNP Features:
netsh int tcp set global chimney=disabled
netsh int tcp set global rss=disabled
netsh int ip set global taskoffload=disabled
netsh int tcp set global autotuninglevel=disabled
netsh int tcp set global congestionprovider=none
netsh int tcp set global ecncapability=disabled
netsh int tcp set global timestamps=disabled
3. Disable IPv4 Large Send Offload, Checksum Offload, and TCP Connection Offload

Event ID 4427
1. The location of TcpTimedWaitDelay is:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters. Add a REG_DWORD value named TcpTimedWaitDelay and set it to 30 (seconds). By default, the value is 240 seconds (4 minutes).
Here is the detailed information about TcpTimedWaitDelay:
https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-2000-server/cc938217(v=technet.10)?redirectedfrom=MSDN
2. Then use the command netsh int ipv4 set dynamicport tcp start=10000 num=20000 to set the dynamic port range. This is larger than the Windows default of 16384 ports starting at 49152, but smaller than the range listed in step 1, so compare it with the current range before applying it.
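
To see why these two settings matter, consider a rough upper bound: if every closed outbound connection holds its local port for the full TIME_WAIT period, the server can open at most (number of ports) / (TIME_WAIT seconds) new connections per second before the pool runs dry. The Python sketch below works through that arithmetic with the figures mentioned in this post. The function name is hypothetical, and the model is a simplifying assumption: it ignores connection lifetime and the fact that a port can be reused toward a different remote endpoint.

```python
def max_sustained_connection_rate(num_ports: int, time_wait_seconds: int) -> float:
    """Rough upper bound on new outbound connections per second.

    Assumes every connection occupies one dynamic port for the whole
    TIME_WAIT period after it closes (a deliberate simplification).
    """
    if num_ports <= 0 or time_wait_seconds <= 0:
        raise ValueError("both arguments must be positive")
    return num_ports / time_wait_seconds


if __name__ == "__main__":
    scenarios = [
        ("64511 ports, 240 s TIME_WAIT", 64511, 240),
        ("64511 ports, 30 s TIME_WAIT", 64511, 30),
        ("20000 ports, 30 s TIME_WAIT", 20000, 30),
    ]
    for label, ports, delay in scenarios:
        rate = max_sustained_connection_rate(ports, delay)
        print(f"{label}: about {rate:.0f} connections/second")
```

Under this model, shortening TcpTimedWaitDelay from 240 to 30 seconds raises the ceiling eightfold for the same port range, which is why it is usually the first setting to adjust.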

Event ID 5719
The system cannot find a domain controller (DC).
Configure the Netlogon registry setting to a value that is safely beyond the time required to allow DC connectivity. Note that this is only effective if the machine already has an IP address. It applies to scenarios where a NAP solution puts the machine into a quarantine network. Use the following settings as guidelines:
Registry subkey: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Netlogon\Parameters
Value Name: ExpectedDialupDelay
Data Type: REG_DWORD
Data Value: in seconds (default = 0)
Data Range: 0 to 600 seconds (10 minutes)
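
As a small illustration of the range constraint, the Python sketch below builds a reg add command line for this value and rejects anything outside 0 to 600 seconds. The helper name and the example delay of 60 seconds are hypothetical; the command is only printed, not executed, and should be reviewed before running it on a server.

```python
def expected_dialup_delay_command(seconds: int) -> str:
    """Build a 'reg add' command that sets Netlogon's ExpectedDialupDelay.

    Raises ValueError when the delay is outside the documented
    0..600 second range.
    """
    if not 0 <= seconds <= 600:
        raise ValueError("ExpectedDialupDelay must be between 0 and 600 seconds")
    key = r"HKLM\System\CurrentControlSet\Services\Netlogon\Parameters"
    # /v = value name, /t = data type, /d = data, /f = overwrite without prompting
    return f'reg add "{key}" /v ExpectedDialupDelay /t REG_DWORD /d {seconds} /f'


if __name__ == "__main__":
    # 60 seconds is an arbitrary example value, not a recommendation.
    print(expected_dialup_delay_command(60))
```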

Event ID 36871 - nothing to do
FSMO must be accessible from this server

One suggestion is to set the IIS application pool's “Maximum Worker Processes” to 1. Here is some source documentation: https://learn.microsoft.com/en-us/iis/configuration/system.applicationhost/applicationpools/add/cpu and https://serverfault.com/questions/563140/the-relationship-between-iis-application-pool-maximum-worker-processes-and-compo

Resolution:

From the System Admin / DevOps perspective, this issue is very elusive. The only clue we have in this scenario, which has not been mentioned previously, is that some new code had been deployed to the application. If I were to guess, this may have to do with the application's management of its SQL connections, which may be overflowing. There are tools that can validate that theory. However, it is sometimes not appropriate to chase a root cause and open “a can of worms” by highlighting a developer’s possible mistake. We all make mistakes on occasion. Hence, the resolution here is to bypass any further discovery and proceed with a code roll-back plan. Another day has been saved, and all will be forgiven as well as forgotten.
