Troubleshooting Communication Key Overuse

Article ID: 286506

Products

Carbon Black App Control (formerly Cb Protection)

Issue/Introduction

A recent Server Communication Certificate update (System Configuration > Security tab) is the cause of:

  • Very slow load times for all web console pages
  • Multiple Agents become disconnected
  • Multiple Agents that are connected show as "Approvals Out of Date"
  • Running "dascli status" on the Agents shows:
    SSL Key Pinning:   On (Mismatch)
  • Server debug logs show entries such as:
    ParseMessageHeader(): [action: METHOD_SECURE_MESSAGE] Session GUID From Message Header [0000-00-00]

Environment

  • App Control Server: 8.7.0 and higher
  • App Control Windows Agent: 8.7.0 and higher
  • App Control macOS Agent: 8.9.2 and higher

Cause

Summary:

  • Updates to the Trusted Server Certificate have not been fully received by a large number of Agents.
  • These Agents must now rely on the Communication Key to send/receive information from the Server.
  • Encrypting and decrypting this communication adds significant overhead to the underlying application server, which causes a variety of performance issues.

What scenarios can trigger this situation?

There are many potential scenarios that could cause this; however, the most common include:

  1. When replacing the Server Communication Certificate:
    • The Communication Certificate was already expired
    • The Certificate Update Schedule chosen was too short
    • A large number of Agents were offline during the Certificate Update
    • The Communication Certificate was replaced multiple times in a row, preventing a proper Certificate Update Schedule.
    • Networking issues preventing Agents from downloading the new TrustedCertList.pem
  2. Invalid IIS Certificate bound on Port 443, examples include:
    • Mismatch between the IIS Certificate Common Name and the App Control Server name
    • Missing or invalid Subject Alternative Names
    • The IIS certificate was expired
    • If Strong SSL is enabled, the Root Certificate used by the IIS web server must be present in the local machine's Trusted Root Certificate Store
  3. SSL Inspection between Agents and App Control Server
    • If SSL inspection is used, the certificate used by the VPN/Firewall must be trusted by adding it to the Trusted Certificates table in the App Control web console
  4. An alternate or incorrect Resource Download Location URL is in use and the TrustedCertList.pem file does not match the one on the App Control Server.
  5. A large number of Agents were deployed using an outdated Policy Installer.
  6. A Golden Image for the VDI environment has not yet been updated and non-persistent Clones are using outdated information.

How many Agents cause the performance issues?

There's no exact number or way to calculate what the application server is capable of handling. Factors that play a role in the performance impacts include:

  • How fast Agents are able to download the updated TrustedCertList.pem
    • Once an Agent has the updated PEM it will no longer use the Communication Key.
    • Downloading the updated PEM in this situation is much slower than under normal circumstances.
  • How many Events are being sent to/from the Server/Agents.
    • These Events are sent using the Communication Key which adds to the overall load.
  • Application server load under normal operating conditions.
    • Example: If the application server is already at (or below) the minimum Operating Environment Requirements, performance impacts will be observed much sooner.

Ultimately, as few as 50-100 Agents using the Communication Key could cause performance issues, or it could take 500 or more before an impact is observed. Each environment has different variables to consider.

Resolution

    • Overview

      Resolution is a straightforward three-step task:

      1. Identify endpoints using the outdated TrustedCertList.pem
      2. Get the updated TrustedCertList.pem file on the endpoints.
      3. Authenticate with the Agent and instruct it to import the updated file.

      The Certificate Update Schedule introduced in Server 8.10.2 is meant to allow more time for Agents to receive these changes. Under normal circumstances, the App Control Server can typically push these changes to hundreds of thousands of Agents in under 24 hours. However, when the Server is already experiencing performance issues, its ability to deliver the file automatically using IIS is drastically limited.

      Identifying Endpoints

      There are several ways to identify the endpoints either struggling to acquire the new TrustedCertList.pem file or using the Communication Key.

      Remotely in the Web Console

      Server 8.11.0+

      1. Navigate to Assets > Computers
      2. Set the Saved View to (none)
      3. Click Show Filters > Add Filter > Using Communication Key > Yes > Apply

      Server 8.10.4 and lower

      1. Navigate to Reports > Events
      2. Set the Saved View to (none)
      3. Group By: Source
      4. Click Show Filters > Add Filters:
        • Subtype is: Agent Health Check
        • Description begins with: Carbon Black App Control Agent detected a problem: Untrusted server certificate

      Remotely via Server High Debug Logs

      1. Capture 5-10 minutes of logs while the Server is in High Debug, then open the ServerLog-TIMESTAMP.bt9 file in a text editor such as Notepad++.
      2. Search for the following phrase to find machines using the Communication Key:
        ParseMessageHeader(): [action: METHOD_SECURE_MESSAGE]
      3. In the results, note the number in brackets just before ParseMessageHeader, for example:
        [46210] ParseMessageHeader(): [action: METHOD_SECURE_MESSAGE]
      4. A line just before or after it should carry the same bracketed number along with "request started" and the IP Address of the endpoint, for example:
        [46210] request started from host address [192.168.0.52]
        [46210] request started from host address [192.168.0.52]
      5. This means the Agent from IP Address 192.168.0.52 is using the Communication Key (keychain.json) due to a Trusted Certificate mismatch.
        • Until this Agent acquires the updated TrustedCertList.pem file, it will continue to use the Communication Key for all communication with the Server.
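
The correlation in steps 2 to 5 can be scripted when many endpoints are involved. The following Python sketch assumes the debug log is readable as plain text and that the relevant lines look like the two examples above; the function name and the matching rules are illustrative, not part of the product.

```python
import re

# Patterns modeled on the two example log lines in this article.
SECURE_RE = re.compile(
    r"\[(\d+)\]\s*ParseMessageHeader\(\): \[action: METHOD_SECURE_MESSAGE\]"
)
HOST_RE = re.compile(
    r"\[(\d+)\]\s*request started from host address \[([^\]]+)\]"
)


def hosts_using_communication_key(lines):
    """Return sorted host addresses whose requests used METHOD_SECURE_MESSAGE.

    Assumption: the bracketed number ties a ParseMessageHeader line to the
    "request started" line logged just before or after it.
    """
    latest_host = {}  # bracketed number -> most recent host address seen
    pending = set()   # numbers seen on a secure-message line before any host line
    hosts = set()
    for line in lines:
        host_match = HOST_RE.search(line)
        if host_match:
            number, host = host_match.groups()
            latest_host[number] = host
            if number in pending:
                hosts.add(host)
                pending.discard(number)
            continue
        secure_match = SECURE_RE.search(line)
        if secure_match:
            number = secure_match.group(1)
            if number in latest_host:
                hosts.add(latest_host[number])
            else:
                pending.add(number)
    return sorted(hosts)
```

Passing an open file object (for example, one opened with errors="replace" to tolerate stray bytes) as `lines` yields the list of IP Addresses to follow up on.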

      Locally via DasCLI Status

      1. On an endpoint, open CMD and run:
        • "C:\Program Files (x86)\Bit9\Parity Agent\DasCLI.exe" status
      2. Check for the following line:
        • Client Information
          ...
            SSL Key Pinning:   On (Mismatch)
        • Any Agent showing "SSL Key Pinning: On (Mismatch)" is forced to use the Communication Key.
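
When the status output is collected from many endpoints, the check can be automated. This Python sketch only parses text that has already been captured; it assumes the "SSL Key Pinning:" line appears as in the example above, and the function names are hypothetical.

```python
import re

# Matches the "SSL Key Pinning:" line wherever it appears in the status output.
PINNING_RE = re.compile(r"^\s*SSL Key Pinning:\s*(.+?)\s*$", re.MULTILINE)


def ssl_key_pinning_state(status_output):
    """Return the text after 'SSL Key Pinning:' or None if the line is absent."""
    match = PINNING_RE.search(status_output)
    return match.group(1) if match else None


def uses_communication_key(status_output):
    """True when the status output reports a key pinning mismatch."""
    state = ssl_key_pinning_state(status_output)
    return state is not None and "mismatch" in state.lower()
```

The text to parse could come from a saved file or from `subprocess.run([...], capture_output=True, text=True).stdout` using the DasCLI.exe path shown in step 1.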

    Locally via TrustedCertList File Hash

      1. Remote in to the application server hosting the Web Console
      2. Use PowerShell to get the hash, for example:
        Get-FileHash "C:\Program Files (x86)\Bit9\Parity Server\hostpkg\TrustedCertList.pem"

        Algorithm       Hash                                                                   Path
        ---------       ----                                                                   ----
        SHA256          BC375FC5A9B115C18505565F806E17E7D3193BF217D3F49C3BE940FE0CEC4B80       C:\Program Files (x86)\Bit9\P...
      3. On the relevant endpoints compare the hash of the TrustedCertList.pem file against the Certificate List value reported by the Agent:
        Windows:
        "C:\Program Files (x86)\Bit9\Parity Agent\dascli.exe" status

        macOS:
        /Applications/Bit9/Tools/b9cli --status

        Example output:
        Server Information
          Server:            serveraddress.local:41002
        ...
            Certificate List:  e6ae090da3821d920580b5b5bd4bf729f3ed393b0302b599c60faacd701ee5da
        • Any Agent whose Certificate List value does not match the hash of the Server's TrustedCertList.pem is forced to use the Communication Key.
        • Note: As of Agent 8.9.0, the TrustedCertList.pem file is encrypted locally on the Agents.
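
The comparison can be sketched in Python as follows. It assumes the Certificate List value is the SHA-256 of the Server's TrustedCertList.pem, as the steps above imply, and that status output has been captured as text; the function names are hypothetical. Note that Get-FileHash prints uppercase hex while the Agent reports lowercase, so the comparison ignores case.

```python
import hashlib
import re

# The Agent reports a 64-character hex value on the "Certificate List:" line.
CERT_LIST_RE = re.compile(r"Certificate List:\s*([0-9A-Fa-f]{64})")


def sha256_of_file(path, chunk_size=65536):
    """Return the lowercase SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def agent_certificate_list(status_output):
    """Extract the Certificate List value from Agent status output, if present."""
    match = CERT_LIST_RE.search(status_output)
    return match.group(1).lower() if match else None


def agent_matches_server(server_hash, status_output):
    """True when the Agent's Certificate List equals the Server's file hash."""
    agent_hash = agent_certificate_list(status_output)
    return agent_hash is not None and agent_hash == server_hash.lower()
```

With the example values shown in this section, the two hashes differ, so that Agent would be flagged as still relying on the Communication Key.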

    Updating the TrustedCertList.pem on Endpoints

    Any workflow that is able to achieve the steps in Overview will resolve the issue. For purposes of this article, several options will be presented. No matter which option is chosen:

    • Work with your Internal Teams to determine which option is best for the situation & environment.
    • Test the commands manually on an endpoint to be sure your Authentication works as intended.
    • Test the Resolution Option on a small subset to be sure the deployment works as intended.

    Option A: Resolve any network issues preventing successful downloads

    1. Verify Port 443 into the server is open
    2. Verify the Resource Download Location in System Configuration > Advanced Options is still accurate and contains the updated files.
    3. Verify the IIS Certificate bound to Port 443 is not expired and is formatted correctly:
      • Common Name shown should match Server Address from the General tab.
      • Expiration Date should be in the future.
      • A matching Certificate should be listed in the Trusted Communication Certificates list at the bottom on the System Config > Security tab
    4. Verify whether a Proxy or other VPN/SSL Inspection network device is between the Agents and the App Control Server.
      • If a certificate exists on the Proxy or other Network Appliance, it must be imported into the Trusted Communication Certificates list (System Config > Security tab).
    5. If SSL Inspection is enabled, the Agents will reject the modified packets.
    6. If any other authentication (such as 2FA) is enabled for 41002 or 443 the Agents may fail to properly communicate.
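
The certificate checks in step 3 can be sketched with Python's standard ssl module. This is a minimal illustration, not a product tool: the function names are hypothetical, names are compared exactly (wildcard certificates are not handled), and it only covers expiry, Common Name and Subject Alternative Names.

```python
import socket
import ssl
import time


def fetch_peer_certificate(host, port=443, timeout=10):
    """Connect over TLS and return the certificate dict from getpeercert()."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def check_peer_certificate(cert, server_address, now=None):
    """Return a list of problems found in a getpeercert()-style dict."""
    now = time.time() if now is None else now
    wanted = server_address.lower()
    problems = []
    if ssl.cert_time_to_seconds(cert["notAfter"]) <= now:
        problems.append("certificate is expired")
    # The subject is a tuple of RDNs, each a tuple of (key, value) pairs.
    common_names = [
        value.lower()
        for rdn in cert.get("subject", ())
        for key, value in rdn
        if key == "commonName"
    ]
    if wanted not in common_names:
        problems.append("Common Name does not match the Server Address")
    alt_names = [
        value.lower()
        for kind, value in cert.get("subjectAltName", ())
        if kind == "DNS"
    ]
    if wanted not in alt_names:
        problems.append("Server Address is missing from the Subject Alternative Names")
    return problems
```

Because `fetch_peer_certificate` uses default verification, an expired or untrusted certificate raises `ssl.SSLCertVerificationError` during the handshake, which is itself a sign that step 3 needs attention.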

    Option B: Scripting the Commands and deploying via any deployment tool (GPO/SCCM/Intune)

    Scripting the commands allows a deployment team (for example, SCCM) to push a package that authenticates with the Agent and imports the updated TrustedCertList.pem file.

    1. From the application server hosting the Console, copy the TrustedCertList.pem file to a location that is accessible to all the Agents
      • By default the file is here: C:\Program Files (x86)\Bit9\Parity Server\hostpkg\TrustedCertList.pem
    2. Manually import the files with commands in this article on one or more agents. 
      • This will ensure authentication with the Agent, and that the import commands work.
    3. Use the same commands to write a script or use the PowerShell script attached to the bottom of this article.
    4. Test the script on a couple of agents identified using the steps above in Identifying Endpoints.
    5. Continue to expand out to all relevant endpoints accordingly.

    Option C: Deploying via Active Directory/GPO Run Once

    Work with your Active Directory team to verify the proper procedure for your environment using the PDF attached to the bottom of this article (AD Import TrustedCertList.pdf):

    1. Acquire the TrustedCertList.pem
    2. Create the GPO
    3. Assign tasks for the import and authentication.
    4. Push the GPOs out accordingly.

    Option D: Traffic Throttling

    Work with your Firewall team to block access to the Server via port 41002. This prevents Agents from communicating with the App Control Server using the Communication Key, which alleviates the performance impact.

    Slowly open this restriction back up after confirming various segments have fully received the updated TrustedCertList.pem file. Example steps:

      1. Block communication to the Server Address on Port 41002 at the firewall layer.
        • It may be necessary to restart the Carbon Black App Control Server service to drop any existing connections.
      2. Allow communication to the Server Address on Port 41002 by a small segment (example, 10 machines).
        • Verify those initial machines receive the updated TrustedCertList.pem file using the steps above in Identifying Endpoints.
        • Continue to slowly open communication, expanding the number of machines accordingly.
        • Monitor Console performance and that Agents are receiving the updated file in a timely fashion.
      3. If performance issues return, too many Agents using the Communication Key were allowed back in. Adjust the quantity or pacing of machines accordingly.
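
Planning the segments ahead of time can make the staged reopening easier to coordinate with the Firewall team. The Python sketch below splits a list of endpoints into waves that start at 10 machines, matching the example above; the doubling between waves is purely an assumption for illustration, and the function name is hypothetical.

```python
def plan_waves(endpoints, first_wave=10, growth=2):
    """Split endpoints into consecutive waves, multiplying the size by `growth`.

    `first_wave` follows the 10-machine example in this article; `growth`
    is an illustrative assumption and should be tuned to observed performance.
    """
    if first_wave < 1 or growth < 1:
        raise ValueError("first_wave and growth must be at least 1")
    waves = []
    size = first_wave
    start = 0
    while start < len(endpoints):
        waves.append(endpoints[start:start + size])
        start += size
        size *= growth
    return waves
```

For 75 endpoints this produces waves of 10, 20, 40 and 5 machines. If performance issues return after a wave, a smaller `growth` value (or a value of 1 for constant-size waves) slows the pacing.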

Additional Information

Preventing Future Issues

A combination of changes may be required, but it is recommended to verify the following items and best practices:

  1. Never allow the Communication Certificate to expire before it is replaced
  2. Replace the Communication Certificate several days (or more) ahead of expiration.
    1. Verify the Certificate Update Schedule chosen allows maximum time for the Agents to receive the changes.
    2. Verify the full procedure is followed, including updating the relevant IIS Certificate and (if applicable) syncing the changes to the alternate Resource Download Location.
  3. Verify any new Agent installs are always using the latest Policy Installer.
    • Policy Installers are regenerated frequently using all the latest settings, Rules, and any changes to the TrustedCertList.pem.
    • Using outdated Policy Installers will force Agents to download changes/updates that should otherwise be included at install.
  4. Verify any Golden Image(s) for Clones are updated when the Communication Certificate is replaced.
    • Non-persistent Clones will register with the Server matching the Golden Image.
    • If the Golden Image has an outdated Trusted Communication Certificate, so will the Clones.
  5. Verify Communication & Port Requirements between Server & Agent
  6. Upgrade to the latest Server release to take advantage of enhancements.
    • NOTE: Upgrading after encountering the performance issue will not, on its own, resolve the issue.
    • Server 8.9.4 introduced a Certificate Delay Swap.
    • Server 8.10.2 introduced a customizable Certificate Update Schedule and file transfer improvements.
    • Server 8.11.0 introduced awareness of Communication Key Usage and a customizable threshold to trigger an Alert.

Attachments

Trusted Cert Import.zip
AD Import TrustedCertList.pdf