This content is dedicated to our next version. It is work in progress: its content will evolve until the new version is released.

Before that time, it cannot be considered as official.

Work execution

Definition

A work is an execution unit. It is java runnable object executed in a thread pool. It’s the most basic component of the Bonita Engine, and is central to the execution of processes.

A work often wraps one another to add specific behaviors. An ExecuteFlowNodeWork contains the flownode execution logic. It is wrapped in a TxBonitaWork, to ensure everything is executing inside a transaction. It is itself wrapped into a LockProcessInstanceWork, to allow to lock any concurrent access to the same process instance.
It is itself wrapped into a InSessionBonitaWork, to set the tenant context (tenantId).

Basic call flow

The ThreadPoolExecutor is responsible for executing works. It calls the work() method on the parent Work. The parent will then call work() on the wrapped work who will call work() on its wrapped work and so on.

Example : flownode execution

The BonitaThreadPoolExecutor is responsible for executing the work. The basic call flow is explained below:

  • The top-level work (of type InSessionBonitaWork) sets the tenantId and then calls the work() method on its wrapped lower-level work, which is of type ProcessInstanceContextWork.

  • ProcessInstanceContextWork fills in some context information and then calls work() on its wrapped lower-level work, which is of type LockProcessInstanceWork.

  • LockProcessInstanceWork [[tries to obtain an exclusive lock

    on the containing process instance corresponding to the flownode to execute:

  • If it cannot get the lock within 20ms (by default), a LockTimeoutException is thrown so that the work is placed at the end of the work queue, for later execution

  • In any case, if the lock has been obtained, it is released after the execution (in a finally block)

  • Then the wrapped TxBonitaWork is executed, which wraps the wrapped work into a JTA transaction

  • Then the ExecuteFlowNodeWork executes its logic.

Error management

There are 3 types of errors:

  • No lock could be acquired.

  • An exception of temporary nature occurred (Timeout, OptimisticLockException, database connection issues…​).

  • All the other type of exceptions.

If the lock could not be acquired a specific set of action occurs. see here. If an exception of temporary nature occurred, the work would be retried (see below). Other exceptions are handled by the method handleFailure() . On a flownode it sets its state to "failed".

Retry mechanism

When a work fails, there is a mechanism in place to retry it. If we identify that the issue that caused the work to fail is a temporary issue, we:

  • put it at the end of the queue

  • set a date before which the work should not be retried

  • increment the retryCount of the WorkDescriptor

Failures

When a flownode fails, the exception that caused the failure is stored and can be retrieved by the API (subscription editions only). A failure is annotated with additional context information and can also be consulted after the flownode has been archived.

Auditing

A special logger can help identify potential issue of work execution:

29-Jul-2020 17:24:04.674 AVERTISSEMENT [Bonita-Worker-1-09] org.bonitasoft.engine.work.audit.AuditListener.abnormalExecutionStatusDetected Potential abnormal execution detected - cause TOO_MANY_EXECUTIONS. org.bonitasoft.engine.work.WorkDescriptor@323b63d6[uuid=55a92f06-7061-43e9-8523-1c6955a26fc2,type=EXECUTE_FLOWNODE,tenantId=1,parameters={processDefinitionId=XXXX, processInstanceId=XXXX, stateCanceling=false, stateExecuting=false, stateId=37, stateAborting=false, flowNodeInstanceId=CCC},retryCount=Y,executionThreshold=2020-07-29T06:24:04.674Z,executionCount=10,registrationDate=2020-07-29T06:11:44.943Z,abnormalExecutionDetected=true]

In that case the warning tells us the work was taken for execution but not completed 10 times.

We can see here:

  • executionCount is the counter of every time was taken for execution.

  • retryCount is the counter of each time the work failed because of a retryable exception.

see Work executiuon audit for more details.