Clients and/or servers are not communicating with the Notification server, and are not showing active in the NS Console. How do we troubleshoot this?
The main problems I have with Task Server involve configuration, and remarkably, most of the communication issues are related to this. So, for this discussion / KB, we'll talk about the configuration from the top (NS) down. At each component, we'll also discuss some simple things to look for or check.
For the sake of troubleshooting and using this document, I will refer to the following location as the TS Root: In the NS 6.0 console, Configuration\Notification Server Infrastructure\Task Server.
To troubleshoot, configure, and do just about anything except send tasks, you should use the NS 6.0 console. Most people using TS have the 6.5 console installed, and are likely using it. Its organization is nice, but I've found that the 6.0 console has everything in one place, and for troubleshooting, it's simply easier.
1) Install, Organize, and Configure the Task Servers
First, recognize that the NS should not manage more than about 500 TS clients, if that many. I would generally recommend 0 if possible, though it's completely reasonable to support a small number if that fits your environment. Remember that each TS client is, by default, communicating with its TS every 5 minutes for basic updates, and sending updates every 10 seconds during an install. So, for 4000 clients:
5 x 60 = 300 seconds between check-ins, and 4000 / 300 = about 13 client updates per second. If you are pushing a package to all of them, each client reports every 10 seconds instead of every 300, so you increase that 30 fold, to roughly 400 updates per second.
The net result is that I've had network administrators literally choke off traffic to the Notification Server, and those were the NS's that were still functioning.
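The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model: it assumes check-ins are spread evenly across the interval, and the function name is made up for illustration (it is not part of any Altiris tool).

```python
# Rough load estimate for Task Server check-ins.
# Assumption: clients are spread evenly across the interval, so the
# average rate is simply clients / interval.

IDLE_INTERVAL_SECONDS = 5 * 60   # default basic update interval (5 minutes)
INSTALL_INTERVAL_SECONDS = 10    # update interval while an install is running


def updates_per_second(clients: int, interval_seconds: float) -> float:
    """Average number of client updates per second hitting one server."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return clients / interval_seconds


if __name__ == "__main__":
    clients = 4000
    idle = updates_per_second(clients, IDLE_INTERVAL_SECONDS)
    busy = updates_per_second(clients, INSTALL_INTERVAL_SECONDS)
    print(f"idle:    {idle:.1f} updates/sec")
    print(f"install: {busy:.1f} updates/sec ({busy / idle:.0f}x idle)")
```

Plugging in your own client count per server gives a quick feel for whether one box is being asked to do too much.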
Second, using the same logic as above, be sure to configure enough Task Servers for your environment. In general, people have used their Package Servers for this. This is a "default" in NS7, but in NS6 it has to be chosen.
There are 4 requirements to be met to get a task server installed:
- The server(s) have to be in the Approved Task Servers collection (In the TS Root)
- The server(s) have to be in the Task Servers collection under TS Root\Task Server Rollout
- The Task Server Install policy must be enabled under TS Root\Task Server Rollout.
- The Task Server in question must have IIS installed and using .NET 1.4333. (This isn't strictly required in version 6, as we included a free version of an HTTP server, but it is highly recommended, since the free version is impossible to troubleshoot - literally.) TS communication requires HTTP, so any TS must have a web service, and we only support IIS or the free version, which we don't recommend using.
Meeting these four requirements will place the Task Server agent on the selected client systems. NOTE: Your NS will automatically get the server agent and does not need to be added to these collections.
Third, the task servers need to be organized under the Task Servers policy page. This is a highly important step that is often skipped by users who do not understand the product.
Technically, you can skip this step and TS will run. It just won't run well. Without configuring this page, your clients will "pick" which TS to report to, generally resorting to the one that answers a ping request fastest, and you'll get cross-country communication. It happens all the time, because the auto-selection feature does not work reliably. In theory this process should select the closest server based on subnet and ping results, but what we've found is, in fact, the opposite. The other unfortunate feature of the TS clients is that they remember their "chosen" configuration forever. Translation: Once they choose a server, they don't choose again. So, if on Friday, when the client was pushed, the local server was busy, but a remote server in another time zone was available because its site had shut down for the day, all the clients will connect to that one, and REMAIN connected to that one in perpetuity.
Unless you configure this policy page. So it's a good idea to simply do this, out of the gate.
NOTE: There is no way to use Site Maintenance for this step in NS6. NS7 is integrated with Sites, but not NS6. The best way is to create a collection or collections for each location. Add the collections into this page using the yellow asterisk *. Once added, highlight each collection, click the blue + and add at least one task server to each. A task server may be assigned to more than one collection. Remember to include your Notification Server as one of these, IF you intend on using it. Many will NOT include their notification server, simply to offload all of this traffic completely.
If you have done this correctly, each collection will show one or more active (blue) computers under it, indicating that the associated Task Server is connected and active. In this application alone we have similarities with DS in that you can see the active and inactive state of systems in near real time. If not, see step 3 below.
NOTE: There is a bug (at least there was) on this page where no scroll bar was included. After adding several collections (around 20 or 25) the screen will fill, and you'll have to arrow up and down to get through them. I never see the bug because I don't have that many collections to populate it with.
2) Install the client agents
Once you have the servers configured, you're ready to roll out the clients. This should not be done prior to the servers, because the clients remember their connections and must reboot to get the "new" connections from the NS.
Technically, not all of the TS agents need to be installed, but for general use, they should all be selected. Only skip a client installation if the customer knows for a fact they will not be using it. The problem is that the various task types don't always tell you which version must be installed, and troubleshooting that later can be a pain. So, enable the following policies to their default collection:
- Client Task Agent Rollout\Client Task Agent Install
- Client Task Agent Rollout\Client Task Agent Upgrade
- Power Management Task Agent rollout\Power Management Task Agent Install
- Power Management Task Agent rollout\Power Management Task Agent Upgrade
- Script Task Agent Rollout\Script Task Agent Install
- Script Task Agent Rollout\Script Task Agent Upgrade
- Service Control Task Agent Rollout\Service Control Task Agent Install
- Service Control Task Agent Rollout\Service Control Task Agent Upgrade
- Software Delivery Agent for Task Server Rollout\Software Delivery Agent for Task Server Install
- Software Delivery Agent for Task Server Rollout\Software Delivery Agent for Task Server Upgrade
If you have these agents installed, and if you have previously configured the servers in step 1 above, then when you highlight a server, several agents will show as reporting to it, and these agents will be the "correct" agents for that server (not from another collection/location). If not, see steps 3 and 4 below.
3) Troubleshooting Server Connectivity/Installation issues
The very first thing to check is the process described in step 1 above. About 50% of troubleshooting comes down to the fact that something was missed in the original configuration. For instance, forgetting to put a server in both the Task Servers and Approved Task Servers collections, or failing to configure the Task Servers policy page, or even forgetting to enable the Task Server Install policy. Methodically check through the configuration to see how things are built first, then move on to actual troubleshooting.
If you are confident things on the NS look good, the following are things to look at, and not necessarily in this order. (Generally, I'll launch Computer Management, so I can manipulate services, check logs, and look at IIS in a single window)
- Check the logs (log viewer) for anything obvious. Generally you won't see things here, but if you see repeated errors, it may be indicative of a problem. Most of the things you'd see here, though, will be TS client based, not Task Server based.
- Verify that the NS Server Services are running. Task Servers connect directly over a port, and if the Altiris Service is not running, it will fail.
- Connect to and check the TS to see if the TS agent is installed. If not, verify that the Altiris Agent on it is checking in regularly and receiving policy updates. If it isn't, then you have agent connectivity issues, not TS issues.
- Check the TS to see if the TS Agent package has been received. If so, verify it has been downloaded and attempted to run. If not downloaded, you have Package issues, or connectivity issues that need to be looked at.
- If the TS agent package has been received, try running it manually from the agent. Sometimes running the install this way will resolve an issue because it simply "failed" the first time.
- Verify that the system has IIS installed, and that the .NET version registered is 1.4333. We don't support 2.0 or 3.5 at this time with this version. If IIS is using a number greater than 1.4333, the agent won't run correctly.
- If they are using the Altiris HTTP version, try to get them on to IIS and pull the agent and reinstall it under IIS instead. The Altiris HTTP option is convenient, but there are literally no troubleshooting methods available. It's either working, or it's not, and there's no way to fix it.
- Check the System and Application logs for IIS errors. W3SVC errors can indicate issues with IIS.
- Try restarting IIS (cmd window, run IISReset) to see if that will free up resources. You may even try reinstalling the agent after an IIS Reset.
- Verify that the IIS TS Server web component is present in IIS. Launch IIS and browse to the Default Web Site\altiris\Client Task\Server. If it is not there, then the agent didn't complete its install, or it never ran at all. This will prevent client communications with the TS.
- Verify the location of the Server web component. This is one of the more common issues I'll find with failed installations. Often the virtual folder for this will be pointing to a "bad" location, often indicating a previous installation that was moved, or something like that.
- Look for the Altiris Client Task Dataloader service. If not installed, or not started, this will prevent the Task Server from checking into the NS. If missing, reinstall the agent to see if it will get installed. If that fails, consider removing the agent and reinstalling it.
- Reinstall the Client TS Server agent. Generally, I find this easiest to do by manually adding it to the Uninstallation policy, and manually excluding it from the installation policy, forcing an update on the client, watching it pull all the processes, rebooting, then switching everything back, forcing an update again, and watching the agent install again.
- Check the agent logs on the TS to see if it tells you anything.
- Check communications between the TS and the server, especially if the client appears to be installed correctly but is simply failing to check in. WINS caused one issue where the TS was looking for a partial FQDN of the server and we had to add a host entry to resolve the issue. See KB 47943 for more information on this specific issue. However, note that we do need communication to the server to be solid, and you may have to add entries to the hosts file for the FQDN or Server Name to make this work.
- Finally, make sure all of this works together. For instance, on one system, we got IIS configured, started the service... and IIS crashed. I don't recall off-hand what fixed it (I think it was a .NET 2.0 registration) but the point is to make sure that when you've fixed one thing you didn't break another.
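For the host entry check mentioned in the list above, a small Python sketch like the following can tell you which names are not covered by a hosts file. It is a minimal illustration only: the function names and the sample host names (ns01, ts02, example.local) are hypothetical, and it only parses text you hand it.

```python
from typing import Dict, Iterable, List

# Sample hosts-file text. The names and addresses are placeholders.
SAMPLE_HOSTS = """
# sample only
127.0.0.1   localhost
10.0.0.5    ns01.example.local ns01   # Notification Server
"""


def hosts_entries(hosts_text: str) -> Dict[str, str]:
    """Map each lower-cased host name found in hosts-file text to its IP."""
    entries: Dict[str, str] = {}
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line, or an address with no names
        ip, names = parts[0], parts[1:]
        for name in names:
            entries.setdefault(name.lower(), ip)  # first match wins
    return entries


def missing_names(hosts_text: str, names: Iterable[str]) -> List[str]:
    """Return the names (short or FQDN) that have no hosts-file entry."""
    entries = hosts_entries(hosts_text)
    return [name for name in names if name.lower() not in entries]


if __name__ == "__main__":
    print(missing_names(SAMPLE_HOSTS, ["ns01", "ns01.example.local", "ts02"]))
```

On Windows the file normally lives at %SystemRoot%\System32\drivers\etc\hosts; read it and pass its contents in as hosts_text. A missing entry is not a problem by itself if DNS or WINS resolves the name correctly.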
Once you have the server functional, it should check in to the NS and should show BLUE on the Task Servers policy page you configured. Then you should continue on to Client troubleshooting below:
4) Troubleshooting Client Connectivity/Installation issues
Generally, I've seen very few actual Client issues, and would recommend you check the servers first. However, here are the things to check for the clients:
- Check the Console under the Task Server policy page to see if the client is checking in to an incorrect server. If it's active, but on the wrong server, you may need to reconfigure which server it's connected to manually on this page and then restart the client system. Remember that the client does not re-connect to the correct server until after a reboot.
- Check the server logs to see if there are client connectivity problems. This could indicate IIS being down on the server or on the client, or more likely that a Task Server hasn't been configured. It's pretty common for me to find that the client agents have been enabled/installed, but no task server configured, and the default Task Server on the NS has been removed (The TS uninstallation policy enabled for instance). This will fill the logs with blue warnings really fast in larger environments.
- Verify that more than the NS has been configured to be a Task Server. Several times I've had customers call after the Network Administrator cut off communications or choked them severely because the traffic to the NS was too heavy. Remember the discussion above about how much traffic this causes if there is not another server or servers to handle the load.
- Verify that the Altiris Agent is running on the client system
- Verify that the Task Server agents are installed on the client system
- Verify that the clients have communication with their assigned Task Server. Make sure they can ping it by name, and possibly by FQDN. Host entries may be required, though this is pretty rare.
- Verify that the clients are connecting to the correct Task Server. If connecting to the wrong one, and IF you have configured the client to connect to a different one in the Task Servers policy page, the client will have to be restarted to "read" the changes given in the policy update. A restart of the agent services is not enough. An actual Reboot is required to re-direct a Task Server client to its correct Task Server.
- Make sure Port 80 is open on the Task Server and available for anonymous access. This is how the clients communicate with the server in general.
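The name and port checks in the list above can be roughed out from a client with Python's standard socket module. This is a sketch under assumptions: the function names and the example host names are hypothetical, and it only tests name resolution and whether a TCP connection to the port succeeds. It does not confirm that IIS allows anonymous access or that the Task Server web component is answering.

```python
import socket
from typing import Dict


def resolves(name: str) -> bool:
    """True if the name resolves to an IPv4 address from this machine."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def port_open(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_task_server(short_name: str, fqdn: str, port: int = 80) -> Dict[str, bool]:
    """Run the basic checks a client needs to pass to reach its Task Server."""
    return {
        "short name resolves": resolves(short_name),
        "FQDN resolves": resolves(fqdn),
        "port reachable": port_open(fqdn, port),
    }


if __name__ == "__main__":
    # Placeholder names; substitute the Task Server assigned to this client.
    for check, ok in check_task_server("ts01", "ts01.example.local").items():
        print(f"{check}: {'OK' if ok else 'FAILED'}")
```

If the short name fails but the FQDN works (or the other way around), that points at the name resolution issues described earlier rather than at the Task Server itself.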