Config Manager (cbscfgmgrd) refuses connections and fails to start
Article ID: 168092
Products
XOS
Issue/Introduction
Describes a workaround for a problem in which the config manager (cbscfgmgrd) rejects connections, starts slowly, closes the port and then repeats.
When this problem occurs, entries similar to the following appear in the log files. In addition, you may observe high system load on the primary CPM.
###
Aug 6 15:02:04 X45-2cp2 cbscfgmgrd[1922]: [I] Start listening on sock:19 0.0.0.0:9720
Aug 6 15:02:04 X45-2cp2 cbsalarmmond[2122]: [I] Connected to CM IP 1.1.130.20, port 9720
Aug 6 15:02:04 X45-2cp2 cbsflowcalcd[2031]: [I] sending 36 bytes of object registration to 1.1.130.20
Aug 6 15:02:06 X45-2cp2 cbshmonitord[1961]: [I] Successful Connecting to Cfg-Mgr
Aug 6 15:02:06 X45-2cp2 cbsirmd[2124]: [I] Connected to CM localhost 9720
Aug 6 15:02:09 X45-2cp2 cbsalarmmond[2122]: [E] read_cm_data error while peeking for header info from CM
Aug 6 15:02:09 X45-2cp2 cbsalarmmond[2122]: [I] Closing connection to CM
Aug 6 15:02:09 X45-2cp2 cbsflowcalcd[2031]: [W] uint32_t Ctcpclient::recv (void *, unsigned int) read() of 4 bytes from 1.1.130.20:9720 failed, error=104 (Connection reset by peer)
Aug 6 15:02:09 X45-2cp2 cbsflowcalcd[2031]: [E] uint32_t CFSmanager::process_config_mgr_data () failed to rx message from config mgr errno=104 (Connection reset by peer)
Aug 6 15:02:09 X45-2cp2 cbsirmd[2124]: [W] procRecvComplete Error reading: Connection reset by peer
Aug 6 15:02:09 X45-2cp2 kernel: eth2: increased Tx threshold, txcfg 0xd0f01012.
Aug 6 15:02:09 X45-2cp2 cbshmonitord[1961]: [I] CFG MGR closed connection on HM
Aug 6 15:02:10 X45-2cp2 cbsflowcalcd[2031]: [I] sending 36 bytes of object registration to 1.1.130.20
Aug 6 15:02:10 X45-2cp2 cbsflowcalcd[2031]: [W] uint32_t Ctcpclient::connect() connect() to '1.1.130.20' failed, error=111 (Connection refused)
Aug 6 15:02:11 X45-2cp2 cbsflowcalcd[2031]: [I] sending 36 bytes of object registration to 1.1.130.20
###
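To check a saved log for the same pattern, a small script can count the three tell-tale events from the sample log above: cbscfgmgrd starting to listen, clients reporting "Connection reset by peer", and clients reporting "Connection refused". The following is a minimal sketch, not a vendor tool. The function names, the category names, and the rule that all three events must appear at least once are assumptions made for illustration.

```python
import re
from collections import Counter

# Patterns are taken from the sample log entries in this article.
# The category names ("listen", "reset", "refused") are illustrative.
PATTERNS = {
    "listen": re.compile(r"cbscfgmgrd\[\d+\]: \[I\] Start listening"),
    "reset": re.compile(r"Connection reset by peer"),
    "refused": re.compile(r"Connection refused"),
}


def summarize_symptoms(lines):
    """Count how many log lines match each symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_like_restart_loop(counts):
    """Heuristic (an assumption): all three symptoms appear at least once."""
    return all(counts[name] >= 1 for name in ("listen", "reset", "refused"))
```

The functions take any iterable of log lines, so an open file handle for an exported syslog file can be passed directly to summarize_symptoms. A positive result is only a hint that the log resembles the sample; it does not confirm the cause.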
Cause
Config Manager (cbscfgmgrd) rejects connections, starts slowly, closes the port after a few seconds, and then attempts to restart. You may also observe high system load on the primary CPM, possibly due to DRBD synchronization.
Problem source: The problem can be triggered by a large number of entries in the database, for example several hundred circuits. When counting circuits, include all of them, including the internal circuit. For example, a VAP group connected to a group-interface (MLT) with 8 physical ports actually uses 9 circuits.
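The counting rule can be written out as a short calculation. This sketch assumes one circuit per physical port plus one internal circuit, which is inferred from the single example above (8 ports, 9 circuits) and may not hold for other configurations. The function name is hypothetical.

```python
def circuits_for_group_interface(physical_ports, internal_circuits=1):
    """Estimate the circuits used by one group-interface.

    Assumption: one circuit per physical port plus the internal circuit,
    as in the article's example of 8 physical ports giving 9 circuits.
    """
    return physical_ports + internal_circuits


# The example from the article: an MLT with 8 physical ports.
example_total = circuits_for_group_interface(8)
```

Summing this estimate over every group-interface gives a rough total to compare against the "several hundred" range mentioned above.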
Resolution
N/A
Workaround
To work around the issue, change the cbsd timeout in /crossbeam/etc/cbsd.cf:
monitortimeout=600000
This sets the timeout to ten minutes instead of the default 20 seconds. In the lab environment, building the database tables takes six minutes, so ten minutes provides a good safety margin for systems with a high number of circuits.
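The value 600000 corresponds to ten minutes only if monitortimeout is expressed in milliseconds, which is assumed here because 10 x 60 x 1000 = 600000. A minimal sketch for deriving the setting for a different delay follows. The helper name is hypothetical, and the only part taken from the article is the monitortimeout key.

```python
def monitortimeout_line(minutes):
    """Build the cbsd.cf setting for a timeout given in minutes.

    Assumption: monitortimeout is in milliseconds, consistent with the
    article's value of 600000 for ten minutes.
    """
    milliseconds = int(minutes * 60 * 1000)
    return f"monitortimeout={milliseconds}"


# Ten minutes, as recommended in the workaround.
workaround_line = monitortimeout_line(10)
```

If table building on a given system takes longer than the six minutes measured in the lab, the same helper can produce a larger value that keeps a comparable margin.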