Console Slowness After Certificate Replacement or Due to Communication Key Overuse

Article ID: 286506

Products

Carbon Black App Control (formerly Cb Protection)

Issue/Introduction

  • Agent Communication Certificate (System Configuration > Security) recently replaced
  • Console Slowness
  • Unstable Agent Connections
  • Server High Debug Logs show frequent hits for "METHOD_SECURE_MESSAGE":
    ParseMessageHeader(): [action: METHOD_SECURE_MESSAGE] Session GUID From Message Header [0000-00-00]

Environment

  • App Control Server: 8.7.0 and higher
  • App Control Windows Agent: 8.7.0 and higher
  • App Control macOS Agent 8.9.2 and higher

Cause

Summary:

Changes to the Trusted Communication Certificate have not been fully received by a number of Agents. Those Agents must now rely on the Communication Key to send/receive information from the Server. Decryption of that communication by the Server relies on some Windows APIs that add overhead to the underlying application server, which is known to cause Console performance issues.

What scenarios can trigger this situation?

Many scenarios can cause this. The most common include:

  • When replacing the Communication Certificate:
    • The Communication Certificate was already expired.
    • The Certificate Update Schedule chosen was too short.
    • A large number of Agents were offline during the Certificate Update Schedule.
    • The Communication Certificate was replaced multiple times in a row, preventing a proper Certificate Update Schedule.
    • Networking difficulties during the Certificate Update Schedule preventing Agents from downloading the new TrustedCertList.pem.
  • An alternate Resource Download Location is in use and the TrustedCertList.pem file there does not match the one on the App Control Server.
  • A large number of Agents were deployed using an outdated Policy Installer.
  • A Golden Image for the VDI environment has not yet been updated and non-persistent Clones are using outdated information.

How many Agents cause the performance issue?

There is no fixed number, and no exact way to calculate how many such Agents the application server can handle. Factors that play a role in the performance impacts include:

  • How fast Agents are able to download the updated TrustedCertList.pem
    • Once an Agent has the updated PEM it will no longer use the Communication Key.
    • Downloading the updated PEM in this situation is much slower than under normal circumstances.
  • How many Events are being sent to/from the Server/Agents.
    • These Events are sent using the Communication Key which adds to the overall load.
  • Application server load under normal operating conditions.
    • Example: If the application server is already at (or below) the minimum Operating Environment Requirements, performance impacts will be observed much sooner.

Ultimately, as few as 50-100 Agents using the Communication Key may cause performance issues, or it may take 500 or more before performance impacts are observed. Each environment has different variables to consider.

Resolution

Overview

The fix is a straightforward three-step task:

  1. Identify endpoints using the outdated TrustedCertList.pem.
  2. Get the updated TrustedCertList.pem file on the endpoints.
  3. Authenticate with the Agent and instruct it to import the updated file.

The Certificate Update Schedule introduced in Server 8.10.2 was meant to allow more time for Agents to receive these changes. Under normal circumstances, the App Control Server can typically push these changes to hundreds of thousands of Agents in under 48 hours. However, once performance impacts are being encountered, the Server's ability to deliver the file automatically using IIS is drastically limited.

Identifying Endpoints

There are several ways to identify the endpoints either struggling to acquire the new TrustedCertList.pem file or using the Communication Key.

Remotely via Server High Debug Logs

  1. Capture 5-10 minutes of logs while the Server is in High Debug, then open the ServerLog-TIMESTAMP.bt9 in a text editor such as Notepad++.
  2. Searching for the following phrase will show you machines using the Communication Key:
    ParseMessageHeader(): [action: METHOD_SECURE_MESSAGE]
  3. When searching these results, note the number in brackets just before ParseMessageHeader, example:
    [46210] ParseMessageHeader(): [action: METHOD_SECURE_MESSAGE]
  4. A line just before or after should carry the same number in brackets, showing that the request started and the IP Address of the endpoint, for example:
    [46210] request started from host address [192.168.0.52]
  5. This means the Agent from IP Address 192.168.0.52 is using the Communication Key (keychain.json) due to a Trusted Certificate mismatch.
    • Until this Agent acquires the updated TrustedCertList.pem file, it will continue to use the Communication Key for all communication with the Server.
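
On a busy Server, matching these lines by hand becomes tedious. The sketch below shows one possible way to automate steps 2-4 in Python. It assumes the two line shapes shown above appear as plain text in the captured log, and that a bracketed number is not reused by a different endpoint within the capture. The function name and the regular expressions are illustrative and not part of the product.

```python
import re
from collections import Counter

# Line shapes copied from the log excerpts in this article. Real log lines may
# carry extra fields (timestamps, etc.) around them, so search() is used.
SECURE_RE = re.compile(
    r"\[(\d+)\]\s+ParseMessageHeader\(\): \[action: METHOD_SECURE_MESSAGE\]"
)
HOST_RE = re.compile(
    r"\[(\d+)\]\s+request started from host address \[([^\]]+)\]"
)


def hosts_using_communication_key(lines):
    """Count METHOD_SECURE_MESSAGE hits per host address.

    Hypothetical helper: pairs each hit with the 'request started' line that
    shares its bracketed number, whether that line comes before or after.
    """
    last_host = {}       # bracketed number -> most recent host address seen
    waiting = Counter()  # hits whose 'request started' line has not appeared yet
    hits = Counter()     # host address -> number of hits
    for line in lines:
        host_match = HOST_RE.search(line)
        if host_match:
            number, host = host_match.groups()
            last_host[number] = host
            if waiting[number]:
                hits[host] += waiting.pop(number)
            continue
        secure_match = SECURE_RE.search(line)
        if secure_match:
            number = secure_match.group(1)
            if number in last_host:
                hits[last_host[number]] += 1
            else:
                waiting[number] += 1
    return hits


# Demo with the example lines from the steps above.
sample = [
    "[46210] request started from host address [192.168.0.52]",
    "[46210] ParseMessageHeader(): [action: METHOD_SECURE_MESSAGE]",
]
for host, count in hosts_using_communication_key(sample).most_common():
    print(host, count)
```

To run it against a real capture, pass an open file object instead of the sample list. The file encoding is not documented here, so opening with errors="replace" is a cautious choice.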

Locally via File Hash

  1. Remote in to the application server hosting the Console
  2. In File Explorer, open the hostpkg directory. By default this is: C:\Program Files (x86)\Bit9\Parity Server\hostpkg
  3. Confirm the TrustedCertList.pem file exists, then use something like PowerShell to confirm the hash, example:
    Get-FileHash "C:\Program Files (x86)\Bit9\Parity Server\hostpkg\TrustedCertList.pem"
  4. Note the SHA256 Hash returned.
  5. On the relevant endpoint(s), compare the hash noted in step 4 against the Certificate List value reported by the Agent:
    Windows:
    "C:\Program Files (x86)\Bit9\Parity Agent\dascli.exe" status

    macOS:
    /Applications/Bit9/Tools/b9cli --status

    Example output to review:
    Server Information
      Server:            serveraddress.local:41002
      Status:            Up to date
      Policy:            Desktop-HE (4-00000007)
      Config List:       279116 of 279116 (100%)
      Yara Rule Version: 9
      Register Count:    11 (Last 10/3/2024 9:19:37 PM Session[1])
      Poll Count:        212 (Last 10/3/2024 11:04:59 PM)
      File Uploads:      0
      Unsent Queue:      0 Events, 0 File Reports
      Sent Queue:        64483-65595
      Prioritized:       No
      Communication Key: 9FD2F2AD-402B-419E-98EB-A227E3A36F63
      Certificate List:  e6ae090da3821d920580b5b5bd4bf729f3ed393b0302b599c60faacd701ee5da
  6. Any endpoint whose Certificate List value does not match the hash from the Server is being forced to use the Communication Key.
    • Note: The TrustedCertList.pem file is encrypted locally on the endpoint as of Agent 8.9.0.
    • For that reason, hashing the endpoint's local copy of the file is not sufficient. Compare against the Certificate List value reported by the Agent instead.
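
When many endpoints need checking, the comparison in steps 4-6 can be scripted. The following is a minimal Python sketch, assuming the status output has already been captured as text and contains a "Certificate List:" line in the form shown above. The function names are hypothetical. Both values are lowercased before comparing because Get-FileHash prints uppercase hex while the example status output shows lowercase.

```python
import hashlib
import re

# Matches the "Certificate List:" line from the example status output above.
CERT_LIST_RE = re.compile(
    r"^\s*Certificate List:\s*([0-9A-Fa-f]{64})\s*$", re.MULTILINE
)


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 of a file as lowercase hex (same digest as Get-FileHash)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reported_certificate_list(status_output):
    """Extract the Certificate List value from captured status text, or None."""
    match = CERT_LIST_RE.search(status_output)
    return match.group(1).lower() if match else None


def agent_has_current_cert_list(server_hash, status_output):
    """True when the Agent-reported value equals the hash taken on the Server."""
    reported = reported_certificate_list(status_output)
    return reported is not None and reported == server_hash.strip().lower()
```

An endpoint for which agent_has_current_cert_list returns False is a candidate for the options in the next section.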

Updating the TrustedCertList.pem on Endpoints

Any workflow that is able to achieve the steps in Overview will resolve the issue. For purposes of this article, several options will be presented. No matter which option is chosen:

  • Work with your Internal Teams to determine which option is best for the situation & environment.
  • Test the commands manually on an endpoint to be sure your Authentication works as intended.
  • Test the Resolution Option on a small subset to be sure the deployment works as intended.

Option A: Scripting the Commands via SCCM

Scripting the commands allows your SCCM Team (or equivalent) to deploy a package that authenticates with the Agent and imports the updated TrustedCertList.pem file.

  1. From the application server hosting the Console, copy the TrustedCertList.pem to a location that is accessible to the script/SCCM/endpoints.
    • By default this file is in the directory: C:\Program Files (x86)\Bit9\Parity Server\hostpkg
  2. Manually test the commands in this article.
    • This will help you be sure authentication with the Agent, and the import commands, work.
  3. Use the same commands to write a script to authenticate and import the file.
    • An example using PowerShell on Windows is attached to the bottom of this article.
  4. Test the script on a couple machines identified using the steps above in Identifying Endpoints.
  5. Continue to expand out to all relevant endpoints accordingly.

Option B: Scripting the Commands via Active Directory/GPO Run Once

The concept is the same as Scripting the Commands via SCCM: implement against a small group of machines first to verify the results. Work with your Active Directory Team to confirm the proper procedure for your environment. The PDF attached to the bottom of this article (AD Import TrustedCertList.pdf) shows an example workflow of the process:

  1. Acquire the TrustedCertList.pem
  2. Create the GPO
  3. Assign tasks for the import and authentication.
  4. Push the GPOs out accordingly.

Option C: Network Segmentation

Engage your Firewall Team to block access to the Server Address via Port 41002. This prevents Agents from communicating with the App Control Server using the Communication Key, which alleviates the performance impacts. Slowly lift this restriction after confirming that each segment has fully received the updated TrustedCertList.pem file. Example steps:

  1. Block communication to the Server Address on Port 41002 at the firewall layer.
    • It may be necessary to restart the Carbon Black App Control Server service to drop any existing connections.
  2. Allow communication to the Server Address on Port 41002 by a small segment (example, 10 machines).
    1. Verify those initial machines receive the updated TrustedCertList.pem file using the steps above in Identifying Endpoints.
    2. Continue to slowly open communication, expanding the number of machines accordingly.
    3. Monitor Console performance and confirm that Agents are receiving the updated file in a timely fashion.
  3. If performance issues return, too many Agents using the Communication Key were allowed back in. Adjust the quantity or pacing of machines accordingly.
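
Planning the segments ahead of time can make the staged reopening easier to coordinate with the Firewall Team. The Python sketch below splits a list of endpoints into successively larger waves, starting from the 10-machine example above. The doubling factor and the function name are assumptions for illustration only. The right pacing depends on how the Console behaves after each wave.

```python
def rollout_waves(endpoints, first_wave=10, growth=2):
    """Split endpoints into successively larger waves.

    Hypothetical helper: first_wave follows the 10-machine example above;
    growth=2 (doubling each wave) is an assumed pacing, not a product setting.
    """
    if first_wave < 1 or growth < 1:
        raise ValueError("first_wave and growth must both be at least 1")
    waves = []
    size = first_wave
    start = 0
    while start < len(endpoints):
        waves.append(endpoints[start:start + size])
        start += size
        size = max(size, int(size * growth))  # never shrink the next wave
    return waves


# Demo with placeholder host names.
hosts = ["host-%03d" % n for n in range(75)]
for index, wave in enumerate(rollout_waves(hosts), start=1):
    print("wave", index, "->", len(wave), "machines")
```

If performance issues return after a wave, a smaller growth value (or repeating the previous wave size) is the analogue of step 3.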

Additional Information

Preventing Future Issues

A combination of changes may be required, but it is recommended to verify the following items and best practices:

  1. Never allow the Communication Certificate to expire before it is replaced.
  2. Replace the Communication Certificate several days (or more) ahead of expiration.
    1. Verify the Certificate Update Schedule chosen allows maximum time for the Agents to receive the changes.
    2. Verify the full procedure is followed, including updating the relevant IIS Certificate and (if applicable) the changes are synced to the alternate Resource Download Location.
  3. Verify any new Agent installs are always using the latest Policy Installer.
    • Policy Installers are regenerated frequently using all the latest settings, Rules, and any changes to the TrustedCertList.pem.
    • Using outdated Policy Installers will force Agents to download changes/updates that should otherwise be included at install.
  4. Verify any Golden Image(s) for Clones are updated when the Communication Certificate is replaced.
    • Non-persistent Clones will register with the Server matching the Golden Image.
    • If the Golden Image has an outdated Trusted Communication Certificate, so will the Clones.
  5. Verify Communication & Port Requirements between Server & Agent.
  6. Upgrade to the latest Server release to take advantage of enhancements.
    • NOTE: Upgrading after encountering the performance issue will not, on its own, resolve the issue.
    • Server 8.9.4 introduced a Certificate Delay Swap.
    • Server 8.10.2 introduced a customizable Certificate Update Schedule and file transfer improvements.
    • Server 8.11.0 will introduce awareness of Communication Key Usage and a customizable threshold to trigger an Alert.
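
For item 2, confirming that an alternate Resource Download Location is in sync comes down to checking that its TrustedCertList.pem is byte-for-byte the same as the one in the Server's hostpkg directory. A minimal Python sketch, assuming both copies are reachable as file paths from wherever the check runs (the function names are illustrative):

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """SHA-256 of a file's bytes as lowercase hex."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def copies_match(server_copy, alternate_copy):
    """True when both TrustedCertList.pem copies have the same SHA-256."""
    return file_sha256(server_copy) == file_sha256(alternate_copy)
```

A False result suggests the alternate location is still serving an outdated file and needs to be re-synced.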

Attachments

Trusted Cert Import.zip
AD Import TrustedCertList.pdf