Isolation not working after upgrading from 1.13 to 1.14 - Failed to connect to engine at: /var/run/fireglass/engine_socket_1

Article ID: 242020

Products

Web Isolation

Issue/Introduction

[FGERROR:BROWSER_MANAGER_FAILED_TO_CONNECT_TO_ENGINE]  (-->) Failed to connect to engine at: /var/run/fireglass/engine_socket_1 

Cause

This error occurs when:

  • An engine, renderer_node, or renderer_core is experiencing an error that causes the connection to close.
  • An engine crashes.

Environment

Release : 1.14.50

Resolution

The fgdiag output shows a connection failure, which causes the server-side page load failure that users see.

The fireglass log contains the following entries:

May 15 13:34:02 d1pintfrgtie08 renderer_core[17]:  [InfNetwork] [WARN] [FG:VUnixConnection::Connect:477] [FGERROR:THREAD_NET_CONNECTION_FAILED]  (-->) Unix connect failed: No such file or directory -- [(17)] <TabId: aa3601d9d47614c7, Username: [email protected], ClientId: RDYzMDU4N0BERVYuTE9, TenantId:  [2022-05-15T13:34:02.681031Z
May 15 13:34:02 d1pintfrgtie08 renderer_core[17]:  [RCoreTabs] [ERROR] [FG:ResCode FG::BrowserManager::ConnectToNewTab:120] [FGERROR:BROWSER_MANAGER_FAILED_TO_CONNECT_TO_ENGINE]  (-->) Failed to connect to engine at: /var/run/fireglass/engine_socket_1 -- [(17)] <TabId: aa3601d9d47614c7, Username: [email protected], ClientId: RDYzMDU4N0BERVYuTE9, TenantId:  [2022-05-15T13:34:02.681105Z
May 15 13:34:02 d1pintfrgtie08 localhost /usr/bin/node[45]: [INFO] [1652621642681] [clif_worker_2[45]::client_handler_map.js:20] Unregistering client with clientNetworkId: aa3601d9d47614c71652621642150 <ClientId: RDYzMDU4N0BERVYuTE9D TabId: aa3601d9d47614c7 Url: >
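The "Unix connect failed: No such file or directory" entry means the socket file /var/run/fireglass/engine_socket_1 did not exist when renderer_core tried to connect, which is consistent with the engine not running. As a minimal sketch (a generic Python check, not a Web Isolation tool), the hypothetical helper probe_unix_socket below attempts the same Unix-socket connection and reports whether the socket file is missing, present but not listening, or reachable. It assumes Python 3 on Linux, run wherever that socket path is visible.

```python
import socket


def probe_unix_socket(path: str, timeout: float = 2.0) -> str:
    """Connect to a Unix domain socket and classify the outcome.

    Returns "ok" if the connection succeeds, "missing" if the socket file
    does not exist (ENOENT, the "No such file or directory" case in the log),
    "refused" if the file exists but nothing is listening, or
    "error: <reason>" for any other OS error (e.g. permission denied).
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"
    except FileNotFoundError:
        return "missing"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc.strerror or exc}"
    finally:
        sock.close()


if __name__ == "__main__":
    # Socket path taken from the log entries above (hypothetical usage).
    print(probe_unix_socket("/var/run/fireglass/engine_socket_1"))
```

A result of "missing" matches the log entry above; "refused" would instead suggest a stale socket file left behind by an engine that is no longer listening.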

To detect failed or crashed engines and relaunch them, follow the steps below.

  • Issue: Engines are down.

  • Customer experience: Usually a new or upgraded environment that shows blank white pages on every site.

  • How to detect it:
    • Examine the fgdiag output. The expected results in this scenario are shown here: https://api-broadcom-ca.wolkenservicedesk.com/attachment/get_attachment_content?uniqueFileId=jDrX+5Q8PU7IO+zVej4YTw==

    • Run fgcli services status. The expected results in this scenario are shown here: https://api-broadcom-ca.wolkenservicedesk.com/attachment/get_attachment_content?uniqueFileId=Oj6s5aunQf8lAptjMdq5dg== and https://api-broadcom-ca.wolkenservicedesk.com/attachment/get_attachment_content?uniqueFileId=DwVRVA8EYikXh2WGUBKN7w==

    • Run fgcli service get-engine-count. The expected result in this scenario is 0, since no engines are running.

  • How to resolve it:
    • In the management console, go to system configuration -> Gateway advanced setting -> Edit, scroll down to Advanced -> Edit, and search for k_engine_count. This shows the manually set engine count, if one exists. Usually this value is empty, and the system auto-detects the preferred number of engines based on the TIE machine's resources (CPU, memory, and so on). https://api-broadcom-ca.wolkenservicedesk.com/attachment/get_attachment_content?uniqueFileId=eUK0sY/9VHi7bpswyisFQw==

    • If there is no manual k_engine_count value, perform the actions shown here: https://api-broadcom-ca.wolkenservicedesk.com/attachment/get_attachment_content?uniqueFileId=NXMmwcVI3dlIqLrCRazG3w==
    • Look for the final_engine_count value. This is the engine count the environment should have: either the manual k_engine_count or, if that value is empty, the auto-detected preferred engine count calculated from the environment's CPU and memory: https://api-broadcom-ca.wolkenservicedesk.com/attachment/get_attachment_content?uniqueFileId=c5mPL3GS+O+YVwUVeKgqYA==
    • Set the engine count to the final_engine_count value by running: fgcli service set-engine-count <final_engine_count>
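The detection and resolution steps above can be scripted roughly as follows. This is a minimal sketch, not a supported procedure: it assumes Python 3.7+ and fgcli on the same host, assumes that fgcli service get-engine-count prints the count as a whitespace-separated integer in its output (verify this against the real output), and expects you to have already read final_engine_count from the management console. The helper names parse_engine_count, plan_fix, and reconcile are hypothetical. By default (dry_run=True) it only returns the command it would run.

```python
import subprocess
from typing import List, Optional

GET_CMD = ["fgcli", "service", "get-engine-count"]


def parse_engine_count(output: str) -> int:
    # ASSUMPTION: the count is the last whitespace-separated integer in the
    # output of `fgcli service get-engine-count`; adapt to the real format.
    numbers = [int(tok) for tok in output.split() if tok.isdigit()]
    if not numbers:
        raise ValueError(f"no engine count found in: {output!r}")
    return numbers[-1]


def plan_fix(current: int, final_engine_count: int) -> Optional[List[str]]:
    # Return the fgcli command that sets the engine count, or None when the
    # running count already matches final_engine_count.
    if final_engine_count < 1:
        raise ValueError("final_engine_count must be a positive integer")
    if current == final_engine_count:
        return None
    return ["fgcli", "service", "set-engine-count", str(final_engine_count)]


def reconcile(final_engine_count: int, dry_run: bool = True) -> Optional[List[str]]:
    # Read the current count, decide, and only execute when dry_run is False.
    result = subprocess.run(GET_CMD, capture_output=True, text=True, check=True)
    current = parse_engine_count(result.stdout)
    cmd = plan_fix(current, final_engine_count)
    if cmd is not None and not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

For example, when no engines are up and final_engine_count is 4, plan_fix(0, 4) returns the fgcli service set-engine-count 4 command; when the counts already match, it returns None. Keeping the decision in a pure function makes it easy to review before letting the script change anything.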