Troubleshooting message processing issues in Workflow Server

Article ID: 174085

Products

IT Management Suite

Issue/Introduction

The most common symptoms of message processing issues are that actions relying on message processing, or task timeouts, stop working.

This is a significant problem for ServiceDesk, which relies on message processing for communication between its projects. ServiceDesk 8.x in particular relies on message processing to pass data from most of the dialogs in the SD.Forms projects to the main IM/PM/CM projects.

Resolution

CONFIGURATION

The first thing to check is the configuration:

  1. When the following checkbox is checked, messages are not processed. It can get checked inadvertently in some circumstances, notably (but not always) when upgrading 8.0 and 8.1 RU1 with a Workflow installer.
    LocalMachineInfo Editor > 'Do Not Process Timeouts and Escalations'
  2. There are two places for configuring the message processing interval. In some cases the interval may be too small (1 ms), and increasing it reduces the load on Workflow Server.
    LocalMachineInfo Editor > Workflow Server Configuration > AutoTrigger Info > Polling Interval
    The value is now in milliseconds; 10 is a good reference value for this setting.
  3. WorkflowResponseQueue and Process Manager Sessions can optionally be moved to SQL from their previous location in file storage (%ProgramFiles%\Symantec\Workflow\Data\ProcessManagerFileStorage). This should be the default in Workflow/ServiceDesk 8.5. Storing them in SQL has performance advantages, as both the number of files and the time the OS takes to access them can become a significant problem. The steps to change this configuration are in HOWTO98738.
  4. [Can timeouts/escalations be invoked manually?]
  5. LocalMachineInfo Editor: Start > All Programs > Symantec > Workflow Designer > Tools > LocalMachineInfo Editor
    Scroll down and confirm that the 'Integrated Authentication URL' matches the other locations. There should be no :80 (port number) in the URL.
    The URL should be set to: http://<SN/FQDN>/ProcessManager/WindowsAuthentication.aspx
  6. Properties.config of the problematic project: %ProgramFiles%\Symantec\Workflow\WorkflowDeploy\Release\<Project_Name>\Properties.config
    Note: This needs to be confirmed in all projects that have problems with pass-through authentication.
    Open Properties.config in Notepad and search for <PropertyName>BaseURLToProject</PropertyName>. Just under BaseURLToProject there will be a URL value. Ensure that the http://<SN/FQDN>/ portion of the URL matches what is in the other locations.
    If Properties.config contains ${DEPLOYMENTROOTURL.EN_US}, ensure that the (local) server entry in LocalMachineInfo Editor has the same URL set as its Deployment Root URL.
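If Python is available on the server, the URL checks in points 5 and 6 can be scripted. This is a minimal sketch, not part of Workflow: it assumes the BaseURLToProject value is the first text that follows the property name in Properties.config (the exact XML layout around it is an assumption), and the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumed layout: the value is the first text node after the property name,
# e.g. <PropertyName>BaseURLToProject</PropertyName><Value>http://...</Value>
_BASE_URL_RE = re.compile(
    r"<PropertyName>BaseURLToProject</PropertyName>(?:\s*<[^>]*>)*\s*([^<]+?)\s*<"
)


def base_url_to_project(config_text):
    """Return the raw BaseURLToProject value from Properties.config text, or None."""
    match = _BASE_URL_RE.search(config_text)
    return match.group(1) if match else None


def url_problems(url, expected_host, expected_path=None):
    """List mismatches between a URL and the expected SN/FQDN (and optional path)."""
    if url.startswith("${DEPLOYMENTROOTURL"):
        # Resolved from the Deployment Root URL in LocalMachineInfo Editor; check that instead.
        return ["uses ${DEPLOYMENTROOTURL...}: compare with the Deployment Root URL"]
    parts = urlsplit(url)
    problems = []
    if parts.port is not None:
        # Any explicit port (such as :80) is flagged.
        problems.append(f"explicit port :{parts.port}")
    if (parts.hostname or "").lower() != expected_host.lower():
        problems.append(f"host {parts.hostname!r} != {expected_host!r}")
    if expected_path is not None and parts.path.lower() != expected_path.lower():
        problems.append(f"path {parts.path!r} != {expected_path!r}")
    return problems
```

For example, url_problems(integrated_auth_url, "sd01.example.com", "/ProcessManager/WindowsAuthentication.aspx") should return an empty list on a correctly configured server; sd01.example.com is a placeholder host name.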

Settings similar to those in points 1 and 2 exist for Process Manager Portal message processing in LBME.ReportingQueue (default location %ProgramFiles%\Symantec\Workflow\Data\MQFileStorage). These are generally less important, as this processing is rarely affected, but they can still be verified. The settings are in the PM Portal:
Admin > Portal > Master Settings > Reports Settings

  • Process Reporting Messages
    This setting should be checked to ensure messages are processed.
  • Process Reporting Interval (ms.)
    As with the polling interval above, 10 is a good reference value.

VERIFYING THE CAUSE

Next, make sure that message processing is indeed the cause of the issue. Usually this means performing an action that should create a message, observing that the message is created, waiting for (or preferably forcing) message processing, and then observing that the same message is still in the queue.

Commonly affected actions in ServiceDesk 8.x relate to Incident Management:

  • An Incident moving from 'Awaiting Response' to 'Received' (the 'Classify Email' action in the related EM ticket).
  • Putting an Incident on hold.
  • Resolving an Incident.

The following is an example test for when Incident Management is affected. You need to be logged in to the portal as an administrator user, so that you have access to everything needed, and you also need access to SQL Server.

  1. Create a new Incident (or use an existing one)
  2. Get the GUID of the 'Diagnose New Incident' task. The easiest way is the down arrow next to the task entry in Process History, then Actions > Edit: the Advanced tab of the dialog that opens shows the Task ID. Cancel out of the dialog window.
  3. Go to Smart Tasks > Hold Management, fill out the fields and click 'Schedule for Later'.
  4. A Process Message 'Incident put on hold until...' appears and the assignment is cleared, but the Incident does not change status and the same Work Incident actions remain available.
  5. Now make sure the message is there. Where to check depends on whether WorkflowResponseQueue is in file storage or in the database:
    - File storage:
    %ProgramFiles%\Symantec\Workflow\Data\MQWorkflowFileStorage\localworkflowfilestorage-workflowresponsequeue
    There should be a file in that folder named after the Task ID from step 2 and created at the time the Incident was scheduled.
    - SQL Storage:
    Run this query against the ProcessManager database:
    SELECT MessageId, QueueName, MessagePostedDate FROM [Messages]
    WHERE QueueName='local.workflowsqlexchange-lbme.workflowresponsequeue'
    AND MessageId = '[TaskID from step 2]'

    This should return one message, posted at the time the Incident was scheduled (note: the time is in UTC).
  6. This message should be processed quickly and disappear from these locations. If it is still there after a couple of minutes, message processing is the problem.
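For file storage, steps 5 and 6 can be scripted as a minimal sketch. It assumes the queue file is named after the Task ID (with or without an extension) and treats "a couple of minutes" as a hypothetical 180-second timeout; the function names are illustrative and not part of Workflow. The creation time is returned in UTC so it can be compared with MessagePostedDate when the SQL storage is used instead.

```python
import time
from datetime import datetime, timezone
from pathlib import Path


def find_queue_file(queue_dir, task_id):
    """Return the file in queue_dir named after task_id (any extension), or None."""
    for entry in Path(queue_dir).iterdir():
        # GUIDs may be written in upper or lower case, so compare case-insensitively.
        if entry.is_file() and entry.stem.lower() == task_id.lower():
            return entry
    return None


def created_utc(path):
    """File creation time as a UTC datetime, for comparison with UTC timestamps."""
    st = path.stat()
    # st_birthtime is the creation time where available (Windows on Python 3.12+);
    # older Python versions report the creation time in st_ctime on Windows.
    return datetime.fromtimestamp(getattr(st, "st_birthtime", st.st_ctime), tz=timezone.utc)


def wait_until_processed(queue_dir, task_id, timeout_s=180.0, poll_s=5.0):
    """Poll until the message file is gone; True if it disappeared within timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if find_queue_file(queue_dir, task_id) is None:
            return True
        time.sleep(poll_s)
    return find_queue_file(queue_dir, task_id) is None
```

A False result from wait_until_processed(queue_dir, task_id) corresponds to the "still there after a couple of minutes" case in step 6.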

Note: When WorkflowResponseQueue contains many items (50+ is concerning, 100+ is a problem) and they stay the same, message processing has stopped. If there are many items but some are removed while others are added, the problem is the speed of message processing and general server performance. This is rarely the case with Workflow 8.x but is worth mentioning.

To check the number of items in the queue, depending on whether file or SQL storage is used:

  • File storage:
    %ProgramFiles%\Symantec\Workflow\Data\MQWorkflowFileStorage\localworkflowfilestorage-workflowresponsequeue
    Count the files in that folder.
  • SQL Storage:
    Run this query against the ProcessManager database to see a list of messages:
    SELECT MessageId, QueueName, MessagePostedDate FROM [Messages]
    WHERE QueueName='local.workflowsqlexchange-lbme.workflowresponsequeue'
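The static-versus-moving distinction in the note above can be checked by taking two snapshots of the queue a few minutes apart and comparing them. This is a sketch with hypothetical function names; the 50 and 100 thresholds come from the note, and for SQL storage the snapshots would be the sets of MessageId values returned by the query above.

```python
from pathlib import Path

CONCERNING = 50  # "50+ is concerning" (from the note above)
PROBLEM = 100    # "100+ is a problem" (from the note above)


def snapshot(queue_dir):
    """Set of message file names currently in a file-storage queue folder."""
    return {p.name for p in Path(queue_dir).iterdir() if p.is_file()}


def assess_queue(before, after):
    """Classify a queue from two snapshots (sets of message IDs) taken minutes apart."""
    count = len(after)
    if count < CONCERNING:
        return "ok"
    level = "problem" if count >= PROBLEM else "concerning"
    if not (before - after):
        # Nothing that was queued earlier has been removed: processing has stopped.
        return f"{level}: items are static, message processing has stopped"
    # Items are being removed while others arrive: throughput is the issue.
    return f"{level}: items are moving, check processing speed and server performance"
```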

FIND THE PROBLEMATIC QUEUE

While WorkflowResponseQueue itself is the most common culprit, we have also seen issues caused by .tasks queues. Although the messages remain in WorkflowResponseQueue, the order in which message processing works means that a failure to process task messages stops WorkflowResponseQueue processing as well.

The usual suspects are the Incident Management and Change Management tasks queues (these are always stored in the SQL database, in the Messages table):

  • local.workflowsqlexchange-incident_mgmt.tasks
  • local.workflowsqlexchange-change_mgmt.tasks

These two are singled out because they are the ones usually affected, for a simple reason: they are the main ServiceDesk processes our customers use, so their queues are busy and often hold large numbers of items, which makes problems more likely.
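When the queues are in SQL, one way to compare them is to fetch MessageId, QueueName and MessagePostedDate for all rows of the Messages table (the same columns as the queries above, without the WHERE clause) and summarise them per queue. This sketch assumes the rows are already fetched as tuples; the function names are hypothetical, and treating the queue with the longest-waiting message as the first suspect is a heuristic, not a rule. Because MessagePostedDate is in UTC, now_utc must be a UTC time of the same kind (naive or aware) as the fetched dates.

```python
from collections import defaultdict


def summarize_queues(rows):
    """Map QueueName -> (message count, oldest MessagePostedDate) from Messages rows."""
    counts = defaultdict(int)
    oldest = {}
    for _message_id, queue_name, posted in rows:
        counts[queue_name] += 1
        if queue_name not in oldest or posted < oldest[queue_name]:
            oldest[queue_name] = posted
    return {name: (counts[name], oldest[name]) for name in counts}


def by_oldest(summary):
    """Queue names ordered so the one with the longest-waiting message comes first."""
    return sorted(summary, key=lambda name: summary[name][1])


def wait_minutes(summary, now_utc):
    """Minutes the oldest message in each queue has been waiting."""
    return {name: (now_utc - oldest).total_seconds() / 60
            for name, (_count, oldest) in summary.items()}
```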

To find the queue, check whether messages for another queue are processed properly.

  1. If Incidents do not progress, try creating a Change ticket and moving it to the second (Planning) stage:
    - Submit Request > IT Services > Request Change
    - Fill out the required fields and click 'Finish'
    - Open the Change ticket and select 'Approve/Deny Change Plan task'
    - Fill out the required fields (Add a CAB and enter Implementer) and click 'Submit Change Plan'
    - If you reach the next task, 'CAB Review and Approval', with the 'Approve Change Plan' action, message processing for Change Management is working.
  2. If Changes do not progress, try putting an Incident on hold, following the steps in the VERIFYING THE CAUSE section above.

Generally, if neither of these works, either WorkflowResponseQueue is blocked or there is a more serious issue altogether. For further verification, two custom projects can be created (a Workflow project to listen for a message and a WebForm project to post one), but this is more complex and not usually required.

UNBLOCKING THE QUEUE

To unblock a queue, delete the message that blocks it. Identifying that exact message is where things get tricky.

If WorkflowResponseQueue is in the file system, the first step is simple and quite harmless:

  • Move all files out of the WorkflowResponseQueue folder:
    %ProgramFiles%\Symantec\Workflow\Data\MQWorkflowFileStorage\localworkflowfilestorage-workflowresponsequeue
  • Check whether the issue is resolved, that is, whether messages are now being processed.
  • If it is resolved, start moving the files back to the folder in batches; these should get processed. If a batch does not get processed, the problematic message is in it. Keep moving batches in, and split a problematic batch into smaller ones to narrow down to the problematic message.
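The move-out, move-back-in-batches approach above is a bisection search. In this sketch, batch_is_processed is a hypothetical callback you would implement yourself (move the batch back into the queue folder, wait, report whether every file was processed, and move any remaining files back out); the batch size of 20 is an arbitrary example, and all function names are illustrative.

```python
import shutil
from pathlib import Path


def move_all(src_dir, dst_dir):
    """Move every file from src_dir into dst_dir; return the moved names, sorted."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    names = sorted(p.name for p in src.iterdir() if p.is_file())
    for name in names:
        shutil.move(str(src / name), str(dst / name))
    return names


def _narrow(batch, batch_is_processed):
    """batch is known not to drain; split it in halves to find the blocking item(s)."""
    if len(batch) == 1:
        return list(batch)
    mid = len(batch) // 2
    left, right = batch[:mid], batch[mid:]
    left_ok = batch_is_processed(left)
    found = [] if left_ok else _narrow(left, batch_is_processed)
    # If the left half drained, the blocker must be in the right half, so skip re-testing it.
    if left_ok or not batch_is_processed(right):
        found += _narrow(right, batch_is_processed)
    return found


def find_blocking(names, batch_is_processed, batch_size=20):
    """Feed names back in batches; return every item that stops its batch draining."""
    blocking = []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        if not batch_is_processed(batch):
            blocking += _narrow(batch, batch_is_processed)
    return blocking
```

Splitting in halves means a batch of 20 needs only about five extra rounds of waiting to isolate one blocking message.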

All steps from this point on eventually remove messages from the Messages table. While messages can be moved to a temporary or backup table, the changes may not be easy to reverse, and modifying the database directly is inherently dangerous. Make sure you have a backup of the database before continuing.

If WorkflowResponseQueue is in the SQL database,

[...]

If this does not resolve the issue, the problem is with the .tasks queues.

[to add:
- example SQL for finding the oldest message and the message with a past/oldest trigger date; Get-Date in PowerShell to get a date from ticks.
- advice for removing a message]
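For the ticks conversion mentioned in the note above, a Python equivalent is sketched below. It assumes the trigger-date values are .NET DateTime ticks (100-nanosecond intervals since 0001-01-01 00:00:00); the column that stores them is not named in this article, and the results are naive datetimes because ticks do not record a time zone. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# .NET DateTime ticks count 100-nanosecond intervals from 0001-01-01 00:00:00.
_DOTNET_EPOCH = datetime(1, 1, 1)
_TICKS_PER_MICROSECOND = 10


def from_ticks(ticks):
    """Convert .NET ticks to a naive datetime (sub-microsecond precision is dropped)."""
    return _DOTNET_EPOCH + timedelta(microseconds=ticks // _TICKS_PER_MICROSECOND)


def to_ticks(dt):
    """Convert a naive datetime to .NET ticks, e.g. to compare against a trigger date."""
    return (dt - _DOTNET_EPOCH) // timedelta(microseconds=1) * _TICKS_PER_MICROSECOND
```

For example, from_ticks(621355968000000000) returns datetime(1970, 1, 1, 0, 0).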