Config Manager (cbscfgmgrd) refuses connections and fails to start

Article ID: 168092

Products

XOS

Issue/Introduction

This article describes a workaround for a problem in which the Config Manager (cbscfgmgrd) rejects connections, starts slowly, closes its port, and then repeats the cycle. You may also observe high system load on the primary CPM. When this problem occurs, entries similar to the following appear in the log files:
###
Aug 6 15:02:04 X45-2cp2 cbscfgmgrd[1922]: [I] Start listening on sock:19 0.0.0.0:9720
Aug 6 15:02:04 X45-2cp2 cbsalarmmond[2122]: [I] Connected to CM IP 1.1.130.20, port 9720
Aug 6 15:02:04 X45-2cp2 cbsflowcalcd[2031]: [I] sending 36 bytes of object registration to 1.1.130.20
Aug 6 15:02:06 X45-2cp2 cbshmonitord[1961]: [I] Successful Connecting to Cfg-Mgr
Aug 6 15:02:06 X45-2cp2 cbsirmd[2124]: [I] Connected to CM localhost 9720
Aug 6 15:02:09 X45-2cp2 cbsalarmmond[2122]: [E] read_cm_data error while peeking for header info from CM
Aug 6 15:02:09 X45-2cp2 cbsalarmmond[2122]: [I] Closing connection to CM
Aug 6 15:02:09 X45-2cp2 cbsflowcalcd[2031]: [W] uint32_t Ctcpclient::recv (void *, unsigned int) read() of 4 bytes from 1.1.130.20:9720 failed, error=104 (Connection reset by peer)
Aug 6 15:02:09 X45-2cp2 cbsflowcalcd[2031]: [E] uint32_t CFSmanager::process_config_mgr_data () failed to rx message from config mgr errno=104 (Connection reset by peer)
Aug 6 15:02:09 X45-2cp2 cbsirmd[2124]: [W] procRecvComplete Error reading: Connection reset by peer
Aug 6 15:02:09 X45-2cp2 kernel: eth2: increased Tx threshold, txcfg 0xd0f01012.
Aug 6 15:02:09 X45-2cp2 cbshmonitord[1961]: [I] CFG MGR closed connection on HM
Aug 6 15:02:10 X45-2cp2 cbsflowcalcd[2031]: [I] sending 36 bytes of object registration to 1.1.130.20
Aug 6 15:02:10 X45-2cp2 cbsflowcalcd[2031]: [W] uint32_t Ctcpclient::connect() connect() to '1.1.130.20' failed, error=111 (Connection refused)
Aug 6 15:02:11 X45-2cp2 cbsflowcalcd[2031]: [I] sending 36 bytes of object registration to 1.1.130.20
###
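The pattern in the excerpt above (the listener opens, connected peers are reset, and new connections are refused) can be checked for mechanically. The following is a minimal sketch, not a supported tool: the function names and the heuristic are hypothetical, and the message fragments are copied from the excerpt, so they may need adjusting if your release words them differently.

```python
import re

# Message fragments taken from the log excerpt in this article (assumed stable).
LISTEN = re.compile(r"cbscfgmgrd\[\d+\]: \[I\] Start listening on sock")
RESET = re.compile(r"Connection reset by peer")
REFUSED = re.compile(r"Connection refused")


def summarize_cfgmgr_flapping(lines):
    """Count listener starts, peer resets, and refusals in syslog lines."""
    counts = {"listen": 0, "reset": 0, "refused": 0}
    for line in lines:
        if LISTEN.search(line):
            counts["listen"] += 1
        # One physical line may carry several messages, so count every match.
        counts["reset"] += len(RESET.findall(line))
        counts["refused"] += len(REFUSED.findall(line))
    return counts


def looks_like_restart_loop(counts):
    """Hypothetical heuristic: port opened, peers reset, then peers refused."""
    return all(counts[key] >= 1 for key in ("listen", "reset", "refused"))
```

Feeding the function the lines of a saved log file (for example, `summarize_cfgmgr_flapping(open(path))`, where `path` is wherever your system log is stored) returns the three counters. A repeating cycle shows up as a `listen` count that keeps growing over time.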


Cause

The Config Manager (cbscfgmgrd) rejects connections and starts slowly; after a few seconds it closes the port and attempts to restart. High system load on the primary CPM, possibly due to DRBD synchronization, may accompany the problem.

Problem source:
The problem can be triggered by a large number of entries in the database, for example a very high number of circuits (several hundred). When counting circuits, include every circuit, including the internal circuit. For example, a VAP group connected to a group-interface (MLT) with 8 physical ports actually uses 9 circuits.
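The counting rule can be written down as simple arithmetic. The sketch below assumes that each group-interface adds exactly one internal circuit on top of its physical ports. This generalizes from the single 8-port example above and is not confirmed for other configurations. The function names are hypothetical.

```python
def circuits_for_group_interface(physical_ports, internal_circuits=1):
    """Effective circuit count for one group-interface.

    Assumption: one internal circuit per group-interface, inferred from the
    article's example (8 physical ports -> 9 circuits).
    """
    return physical_ports + internal_circuits


def total_circuits(port_counts):
    """Sum the effective circuits over several group-interfaces."""
    return sum(circuits_for_group_interface(ports) for ports in port_counts)


# The article's example: a group-interface (MLT) with 8 physical ports.
example = circuits_for_group_interface(8)  # 8 + 1 internal = 9
```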


Resolution

N/A

Workaround

To work around the issue, change the cbsd timeout in /crossbeam/etc/cbsd.cf:

monitortimeout=600000

This sets the timeout to ten minutes instead of the default 20 seconds. In the lab environment, building the database tables takes six minutes, so ten minutes provides a good safety margin for systems with a high number of circuits.
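The arithmetic behind the value, and the edit itself, can be sketched as follows. This is an illustration under assumptions: the unit is taken to be milliseconds (600000 equals ten minutes, as stated above), cbsd.cf is assumed to hold plain `key=value` lines, and the helper names are hypothetical. Back up the file before changing it.

```python
MS_PER_MINUTE = 60_000


def ms_to_minutes(milliseconds):
    """Convert a millisecond timeout to minutes."""
    return milliseconds / MS_PER_MINUTE


def set_monitortimeout(cfg_text, timeout_ms):
    """Return cfg_text with monitortimeout set, replacing or appending it.

    Assumes a flat key=value layout; all other lines are left untouched.
    """
    new_line = f"monitortimeout={timeout_ms}"
    out, replaced = [], False
    for line in cfg_text.splitlines():
        if line.strip().startswith("monitortimeout="):
            out.append(new_line)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


workaround_ms = 600_000          # value from this article
lab_build_ms = 6 * MS_PER_MINUTE  # six-minute table build observed in the lab
margin_minutes = ms_to_minutes(workaround_ms - lab_build_ms)  # 4.0 minutes
```

The helper only transforms text, so the result can be inspected before it is written back to /crossbeam/etc/cbsd.cf. Systems whose table build takes longer than the six-minute lab figure may need a proportionally larger value.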