Mistral OCR connectors

The Bonita Mistral OCR connectors let you extract text, structured fields, and tables from documents, and classify those documents, using Mistral AI vision models directly from your Bonita processes.

The Bonita Mistral OCR connectors are available for Bonita 10.2 Community (2024.3) and later versions.

This connector is currently in Beta. It has not yet been fully validated in production environments.

We welcome your feedback. Please report testing results or issues using the beta feedback form on GitHub.

We are eager to collaborate with early adopters to bring this connector to General Availability.

Overview

The Mistral OCR connector provides five operations:

Extract Text — extract raw text from documents or images using OCR
Extract Fields — extract structured fields from documents using a JSON schema
Classify Document — classify a document into predefined categories
Extract Table — extract tabular data from documents
Process Batch — process multi-page documents with page range support

Getting started

Add the connector as an extension dependency to your Bonita project. To do so, import the .jar file using the Import from file option in Bonita Studio.

Connection configuration (shared by all operations)

Parameter	Required	Description	Default
apiKey	Yes	Mistral AI API key	—
baseUrl	No	Mistral API base URL	https://api.mistral.ai/v1
model	No	Model to use for processing	mistral-ocr-latest (extract-text, process-batch) or pixtral-large-latest (others)
connectTimeout	No	Connection timeout in milliseconds	30000
readTimeout	No	Read timeout in milliseconds	120000
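
The defaults in the table above can be read as a small resolution rule: the default model depends on the operation, and every other optional parameter falls back to a fixed value. The following Python sketch illustrates that rule only. The function name `resolve_connection` and the short operation names are hypothetical and are not part of the connector, which applies its defaults internally.

```python
DEFAULT_BASE_URL = "https://api.mistral.ai/v1"
# Operations documented as defaulting to the OCR model.
OCR_OPERATIONS = {"extract-text", "process-batch"}


def resolve_connection(operation, api_key, base_url=None, model=None,
                       connect_timeout=None, read_timeout=None):
    """Return the effective connection settings for one operation (illustrative)."""
    if not api_key:
        raise ValueError("apiKey is required")
    default_model = ("mistral-ocr-latest" if operation in OCR_OPERATIONS
                     else "pixtral-large-latest")
    return {
        "apiKey": api_key,
        "baseUrl": base_url or DEFAULT_BASE_URL,
        "model": model or default_model,
        # Timeouts are expressed in milliseconds.
        "connectTimeout": 30000 if connect_timeout is None else connect_timeout,
        "readTimeout": 120000 if read_timeout is None else read_timeout,
    }
```

For large documents, raising `readTimeout` above its default is likely the first setting to adjust, since OCR calls can take longer than the connection handshake.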

Document input (shared by all operations)

Documents can be provided either as base64-encoded content or via URL.

Parameter	Required	Description	Default
documentBase64	Conditional	Base64-encoded document content (required if imageUrl is not set)	—
imageUrl	Conditional	URL of the document or image (required if documentBase64 is not set)	—
mimeType	No	MIME type of the document	application/pdf
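
As a rough illustration of the conditional rule above, the sketch below builds the document input parameters from either raw bytes or a URL. The helper `build_document_input` is hypothetical. This page does not say what the connector does when both values are set, so preferring the base64 content here is an assumption.

```python
import base64


def build_document_input(document_bytes=None, image_url=None,
                         mime_type="application/pdf"):
    """Build the shared document input parameters (illustrative)."""
    if document_bytes is None and not image_url:
        raise ValueError("Provide either documentBase64 or imageUrl")
    params = {"mimeType": mime_type}
    if document_bytes is not None:
        # Assumption: base64 content takes precedence when both are given.
        params["documentBase64"] = base64.b64encode(document_bytes).decode("ascii")
    else:
        params["imageUrl"] = image_url
    return params
```

When sending an image rather than a PDF, set `mimeType` explicitly (for example `image/png`), since the default is `application/pdf`.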

Extract Text (`mistral-ocr-extract-text`)

Extract raw text from documents or images using OCR.

Input parameters

Parameter	Required	Description	Default
includePageSegmentation	No	Whether to segment text by page	true
language	No	Expected language of the document	—

Output parameters

Parameter	Type	Description
extractedText	String	Full extracted text content
pages	List	List of page-level text segments
pageCount	Integer	Number of pages processed
tokensUsed	Integer	Number of tokens consumed
success	Boolean	Whether the operation succeeded
errorMessage	String	Error message if the operation failed

Extract Fields (`mistral-ocr-extract-fields`)

Extract structured fields from a document using a JSON schema definition.

Input parameters

Parameter	Required	Description	Default
fieldsSchema	Yes	JSON schema defining the fields to extract	—
extractionPrompt	No	Additional prompt to guide extraction	—
strictMode	No	Whether to enforce strict schema validation	false
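
To make the `fieldsSchema` parameter more concrete, here is a possible set of inputs for an invoice. This page does not specify which JSON Schema features the connector supports, so the sketch assumes a plain object schema with typed properties. The field names (`invoiceNumber`, `invoiceDate`, `totalAmount`) and the prompt text are invented for illustration.

```python
import json

# Hypothetical schema describing the fields to extract from an invoice.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string"},
        "invoiceDate": {"type": "string", "description": "ISO 8601 date"},
        "totalAmount": {"type": "number"},
    },
    "required": ["invoiceNumber", "totalAmount"],
}

# The schema is passed to the connector as a JSON string.
inputs = {
    "fieldsSchema": json.dumps(invoice_schema),
    "extractionPrompt": "Amounts are in euros. Ignore handwritten notes.",
    "strictMode": True,
}
```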

Output parameters

Parameter	Type	Description
extractedFields	String	JSON string of extracted fields
extractedFieldsMap	Map	Extracted fields as a key-value map
fieldCount	Integer	Number of fields extracted
confidence	Double	Overall extraction confidence score
tokensUsed	Integer	Number of tokens consumed
success	Boolean	Whether the operation succeeded
errorMessage	String	Error message if the operation failed
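
One way to consume these outputs is to check `success` first, then parse `extractedFields` and route low-confidence results to manual review. The sketch below assumes that `confidence` ranges from 0 to 1, which this page does not state. The function `accept_extraction` and the 0.8 threshold are illustrative choices.

```python
import json


def accept_extraction(outputs, min_confidence=0.8):
    """Return (fields, needs_review) from Extract Fields outputs (illustrative)."""
    if not outputs.get("success"):
        raise RuntimeError(outputs.get("errorMessage") or "Extraction failed")
    fields = json.loads(outputs["extractedFields"])
    # Assumption: confidence is a score between 0 and 1.
    confidence = outputs.get("confidence") or 0.0
    return fields, confidence < min_confidence
```

In a process, the `needs_review` flag could drive a gateway that sends the case either to a human task or straight to the next automated step.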

Classify Document (`mistral-ocr-classify-document`)

Classify a document into one of several predefined categories.

Input parameters

Parameter	Required	Description	Default
documentTypes	Yes	Comma-separated list of possible document types	—
includeReasoning	No	Whether to include classification reasoning	false

Output parameters

Parameter	Type	Description
documentType	String	Detected document type
confidence	Double	Classification confidence score
reasoning	String	Reasoning for the classification (when includeReasoning is true)
allScores	String	JSON with scores for all document types
tokensUsed	Integer	Number of tokens consumed
success	Boolean	Whether the operation succeeded
errorMessage	String	Error message if the operation failed
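
The sketch below shows how the `documentTypes` input might be assembled and how `allScores` could be ranked. The exact layout of `allScores` is not documented on this page. The code assumes a JSON object that maps each document type to a numeric score, and the type names are invented.

```python
import json

# Input: the candidate types are passed as one comma-separated string.
document_types = ",".join(["invoice", "contract", "id_card"])


def rank_document_types(all_scores_json):
    """Sort document types by descending score (assumes a type-to-score object)."""
    scores = json.loads(all_scores_json)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Comparing the top two scores is a simple way to detect ambiguous documents: if they are close, the classification may deserve a manual check even when `confidence` looks acceptable.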

Extract Table (`mistral-ocr-extract-table`)

Extract tabular data from a document.

Input parameters

Parameter	Required	Description	Default
columnHeaders	No	Expected column headers (comma-separated)	—
tableHint	No	Description of the table to extract	—
pageNumber	No	Page number to extract the table from	1

Output parameters

Parameter	Type	Description
tableData	String	JSON representation of the table
tableDataList	List	Table data as a list of rows
rowCount	Integer	Number of rows extracted
columnCount	Integer	Number of columns detected
detectedHeaders	String	Detected column headers
tokensUsed	Integer	Number of tokens consumed
success	Boolean	Whether the operation succeeded
errorMessage	String	Error message if the operation failed
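
A common next step is to turn the extracted table into one record per row. The formats of `tableData` and `detectedHeaders` are not specified on this page. The sketch assumes that `tableData` is a JSON array of row arrays and that `detectedHeaders` is a comma-separated string, so verify both against real connector output before relying on it.

```python
import json


def rows_as_dicts(table_data_json, detected_headers):
    """Pair each row with the detected headers (assumed formats, see lead-in)."""
    headers = [h.strip() for h in detected_headers.split(",")]
    rows = json.loads(table_data_json)
    return [dict(zip(headers, row)) for row in rows]
```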

Process Batch (`mistral-ocr-process-batch`)

Process multi-page documents with optional page range selection.

Input parameters

Parameter	Required	Description	Default
imageUrls	No	Comma-separated list of image URLs (alternative to documentBase64)	—
startPage	No	First page to process	—
endPage	No	Last page to process	—
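
Before calling the operation, it can help to validate the page range on the process side. The sketch below assumes that pages are numbered from 1 and that the range is inclusive. This page does not confirm either point, although the `pageNumber` default of 1 for Extract Table suggests 1-based numbering. The helper `validate_page_range` is hypothetical.

```python
def validate_page_range(start_page=None, end_page=None):
    """Return only the page parameters that were set, after basic checks."""
    for name, value in (("startPage", start_page), ("endPage", end_page)):
        # Assumption: page numbers start at 1.
        if value is not None and value < 1:
            raise ValueError(f"{name} must be 1 or greater")
    if start_page is not None and end_page is not None and start_page > end_page:
        raise ValueError("startPage must not be greater than endPage")
    params = {}
    if start_page is not None:
        params["startPage"] = start_page
    if end_page is not None:
        params["endPage"] = end_page
    return params
```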

Output parameters

Parameter	Type	Description
fullText	String	Combined text from all pages
pages	List	List of page-level results
pageCount	Integer	Number of pages processed
totalWordCount	Integer	Total word count
tokensUsed	Integer	Number of tokens consumed
processingTimeMs	Long	Processing time in milliseconds
success	Boolean	Whether the operation succeeded
errorMessage	String	Error message if the operation failed

Error handling

All operations set success=false and populate errorMessage on failure. Error messages are truncated to 1000 characters to prevent database column overflow in Bonita.

HTTP Code	Behavior
200	Success — parse response and populate outputs
400	Bad request — invalid document or parameters
401	Unauthorized — invalid API key
413	Payload too large — document exceeds size limit
429	Rate limited — too many requests
5xx	Server error — Mistral AI service unavailable
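
The behavior described in this section can be sketched as a status lookup plus a truncation step. This is an illustration, not the connector's code. The retry rule in `is_retryable` is an assumption (this page does not say that the connector retries at all), and the plain slice in `truncate_error` assumes that no ellipsis is appended.

```python
MAX_ERROR_LENGTH = 1000  # limit stated in this section


def describe_status(status_code):
    """Map an HTTP status code to the behavior listed in the table above."""
    known = {
        200: "Success",
        400: "Bad request: invalid document or parameters",
        401: "Unauthorized: invalid API key",
        413: "Payload too large: document exceeds size limit",
        429: "Rate limited: too many requests",
    }
    if 500 <= status_code <= 599:
        return "Server error: Mistral AI service unavailable"
    return known.get(status_code, f"Unexpected HTTP status {status_code}")


def is_retryable(status_code):
    # Assumption: only rate limiting and server errors are worth retrying.
    return status_code == 429 or 500 <= status_code <= 599


def truncate_error(message, limit=MAX_ERROR_LENGTH):
    """Cut the message to the stored length; shorter messages are unchanged."""
    return message[:limit]
```

In a process, a check such as `is_retryable` could decide whether a failed call loops back through a timer or goes directly to an error-handling task.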

Source code

The connector source code is available on GitHub: bonita-connector-mistral-ocr
