Work locking

Each work needs to lock resources before it can be executed. If a work is unable to lock the given resource, its execution is rejected and the work is put back at the end of the queue.

Why we lock

Most of the execution could happen outside of applicative locks, but we need them for a few concurrency use cases:

  • Interruption of a process instance: we don’t want a work to create a task while the process is being cancelled or aborted (see the sketch after this list)

  • Inclusive gateways: to know if an inclusive gateway can be merged, we need a view of the whole process instance

  • Other unidentified concurrency issues: we might have some optimistic locking issues that are not well handled
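
A minimal Java sketch of the first point, using hypothetical names (CancellationRaceSketch, createTask, cancelProcessInstance) rather than real engine classes: because task creation and cancellation go through the same process instance lock, a work cannot create a task while the instance is being cancelled or aborted.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: task creation and cancellation are serialized by the
// same process instance lock, so no task can appear once the instance is cancelled.
public class CancellationRaceSketch {

    private final ReentrantLock processInstanceLock = new ReentrantLock();
    private volatile boolean cancelled = false;

    // Called by a work that wants to create a task on the process instance.
    public void createTask(String taskName) {
        processInstanceLock.lock();
        try {
            if (cancelled) {
                // The instance was cancelled while we were waiting for the lock: do nothing.
                return;
            }
            System.out.println("Task created: " + taskName);
        } finally {
            processInstanceLock.unlock();
        }
    }

    // Called when cancelling or aborting the process instance.
    public void cancelProcessInstance() {
        processInstanceLock.lock();
        try {
            cancelled = true;
            System.out.println("Process instance cancelled");
        } finally {
            processInstanceLock.unlock();
        }
    }

    public static void main(String[] args) {
        CancellationRaceSketch sketch = new CancellationRaceSketch();
        sketch.cancelProcessInstance();
        sketch.createTask("step1"); // no task is created: the instance is already cancelled
    }
}
```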

Scope of locks

Locks are taken at the parent process instance level (see the sketch at the end of this section).

It means that:

  • Two works that execute something on the same process instance cannot be executed at the same time

  • Two works that execute something on different process instances can be executed simultaneously

  • Two works that execute something on different process instances that are part of the same root process instance can also be executed simultaneously

When executing on a cluster, the lock spans the whole cluster.
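
As a rough illustration of this scope, here is a sketch with a hypothetical ProcessInstanceLockRegistry: one lock object per parent process instance id, so works on the same instance are serialized while works on different instances (even under the same root) can run in parallel. On a cluster, the same idea would rely on a distributed lock rather than an in-memory ReentrantLock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical registry: one lock per parent process instance id.
public class ProcessInstanceLockRegistry {

    private final Map<Long, Lock> locks = new ConcurrentHashMap<>();

    public Lock getLock(long parentProcessInstanceId) {
        return locks.computeIfAbsent(parentProcessInstanceId, id -> new ReentrantLock());
    }

    public static void main(String[] args) {
        ProcessInstanceLockRegistry registry = new ProcessInstanceLockRegistry();
        // Same parent process instance -> same lock -> executions are serialized
        System.out.println(registry.getLock(42L) == registry.getLock(42L)); // true
        // Different parent process instances -> different locks -> can run in parallel
        System.out.println(registry.getLock(42L) == registry.getLock(43L)); // false
    }
}
```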

Resources requiring a process instance lock

Try lock

Only try to get the lock and fail fast if it is not available (timeout = 20 milliseconds)

  • Execute flow node work (ExecuteFlowNodeWork)

  • Complete flow node work (NotifyChildFinishedWork)

  • Execute message couple work

  • Trigger signal work

Lock

Wait as long as possible to get the lock (timeout = 60 seconds). Both acquisition modes are sketched after the list below.

  • Execute the output operations of a connector: we need a lock when saving the data values produced on the connector output, to avoid BatchUpdate-type exceptions when saving data

  • Cancellation of a process instance through the API: we need a lock when cancelling to be sure that all elements of the process are cancelled (i.e. to avoid a new task being created while we cancel the others)
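
Both acquisition modes can be sketched with a plain java.util.concurrent lock; the helper methods tryLock and lock below are hypothetical and only show the difference between failing fast after 20 ms and waiting up to 60 seconds.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helpers showing the two acquisition modes.
public class LockModesSketch {

    // Fail-fast mode used by flow node / message / signal works.
    static boolean tryLock(Lock lock) throws InterruptedException {
        return lock.tryLock(20, TimeUnit.MILLISECONDS);
    }

    // Blocking mode used by connector output operations and API cancellation.
    static boolean lock(Lock lock) throws InterruptedException {
        return lock.tryLock(60, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        Lock processInstanceLock = new ReentrantLock();
        if (tryLock(processInstanceLock)) {
            try {
                System.out.println("fail-fast lock acquired");
            } finally {
                processInstanceLock.unlock();
            }
        } else {
            System.out.println("lock not available within 20 ms, work will be requeued");
        }
    }
}
```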

Locking mechanism of works

All works that require a lock are wrapped in a LockProcessInstance work.

The work first tries to get its lock before opening a transaction.

The work tries to lock the given process instance, waiting at most 20 ms.

If the lock is not available within those 20 ms, the work fails with a LockTimeoutException. In that case, the WorkExecutorService ignores the exception and puts the work at the end of the execution queue. The retry mechanism is not triggered by this special case of lock unavailability, so the retry counter of the work is not incremented.
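
A simplified sketch of this behaviour (the Work interface, LockProcessInstanceWork and the executor loop below are heavily reduced versions, not the real engine classes): the lock is tried for 20 ms before any transaction is opened, and a lock timeout puts the work back at the end of the queue without touching its retry counter.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockingWorkSketch {

    static class LockTimeoutException extends Exception {}

    interface Work {
        void execute() throws Exception;
    }

    // Wraps another work and takes the process instance lock around it.
    static class LockProcessInstanceWork implements Work {
        final Work wrapped;
        final Lock processInstanceLock;

        LockProcessInstanceWork(Work wrapped, Lock processInstanceLock) {
            this.wrapped = wrapped;
            this.processInstanceLock = processInstanceLock;
        }

        @Override
        public void execute() throws Exception {
            // Try to take the lock before opening any transaction.
            if (!processInstanceLock.tryLock(20, TimeUnit.MILLISECONDS)) {
                throw new LockTimeoutException();
            }
            try {
                wrapped.execute(); // open transaction, do the work, commit
            } finally {
                processInstanceLock.unlock();
            }
        }
    }

    // Very simplified executor loop: a LockTimeoutException requeues the work
    // instead of triggering the retry mechanism.
    public static void main(String[] args) throws Exception {
        Lock lock = new ReentrantLock();
        Deque<Work> queue = new ArrayDeque<>();
        queue.add(new LockProcessInstanceWork(() -> System.out.println("work executed"), lock));

        while (!queue.isEmpty()) {
            Work work = queue.poll();
            try {
                work.execute();
            } catch (LockTimeoutException e) {
                queue.addLast(work); // back to the end of the queue, retry counter untouched
            }
        }
    }
}
```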

Locking scenario for the execution of a flow node

A typical execution of a flow node happens as follows (a sketch follows the list):

  • Work ExecuteFlowNodeWork starts

  • ExecuteFlowNodeWork takes the lock on the process instance

  • Open transaction

  • Execute the business logic and change the state of the flow node

  • Commit transaction

  • Submit NotifyChildFinishedWork

  • Release lock

  • NotifyChildFinishedWork takes the lock on the process instance

  • Open transaction

  • Create next flow nodes and delete the current flow node

  • Commit transaction

  • Submit one ExecuteFlowNodeWork for each next flow node

  • Release lock

  • and so on…
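
The sequence above can be condensed into the following sketch (hypothetical FlowNodeExecutionSketch, with transactions reduced to log lines): each work takes the process instance lock, does its part, submits the next work, and only then releases the lock so that the next work can take it in turn.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class FlowNodeExecutionSketch {

    private final Lock processInstanceLock = new ReentrantLock();
    private final Deque<Runnable> workQueue = new ArrayDeque<>();

    void executeFlowNodeWork(String flowNode) {
        processInstanceLock.lock();                                 // take the lock
        try {
            System.out.println("[tx] execute business logic of " + flowNode);
            workQueue.add(() -> notifyChildFinishedWork(flowNode)); // submit next work
        } finally {
            processInstanceLock.unlock();                           // release the lock
        }
    }

    void notifyChildFinishedWork(String flowNode) {
        processInstanceLock.lock();                                 // take the lock again
        try {
            System.out.println("[tx] delete " + flowNode + " and create next flow nodes");
            workQueue.add(() -> executeFlowNodeWork("next-of-" + flowNode)); // one work per next flow node
        } finally {
            processInstanceLock.unlock();
        }
    }

    public static void main(String[] args) {
        FlowNodeExecutionSketch sketch = new FlowNodeExecutionSketch();
        sketch.workQueue.add(() -> sketch.executeFlowNodeWork("step1"));
        // Drain a few works to show the alternation between the two work types.
        for (int i = 0; i < 4 && !sketch.workQueue.isEmpty(); i++) {
            sketch.workQueue.poll().run();
        }
    }
}
```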