Mistral OCR connectors
The Bonita Mistral OCR connectors let you extract text, structured fields, and tables from documents, and classify documents, using Mistral AI vision models directly from your Bonita processes.
The Bonita Mistral OCR connectors are available for Bonita 10.2 Community (2024.3) and later.
Note: This connector is currently in Beta. It has not yet been fully validated in production environments. We welcome your feedback: please report testing results or issues using the beta feedback form on GitHub. We are eager to collaborate with early adopters to bring this connector to General Availability.
Overview
The Mistral OCR connector provides five operations:
- Extract Text: extract raw text from documents or images using OCR
- Extract Fields: extract structured fields from documents using a JSON schema
- Classify Document: classify a document into predefined categories
- Extract Table: extract tabular data from documents
- Process Batch: process multi-page documents with page range support
Getting started
Add the connector to your Bonita project as an extension dependency: in Bonita Studio, import the .jar file using the Import from file option.
Connection configuration (shared by all operations)
| Parameter | Required | Description | Default |
|---|---|---|---|
| apiKey | Yes | Mistral AI API key | (none) |
| baseUrl | No | Mistral API base URL | |
| model | No | Model to use for processing | mistral-ocr-latest (extract-text, process-batch) or pixtral-large-latest (others) |
| connectTimeout | No | Connection timeout in milliseconds | 30000 |
| readTimeout | No | Read timeout in milliseconds | 120000 |
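The defaults in the table above can be summarized in a short Java sketch. It is only an illustration of how the model default depends on the operation, not the connector's own code: the class name `ConnectionDefaultsSketch` and the method `resolveModel` are hypothetical, and the operation identifiers are taken from the section headings of this page.

```java
import java.util.Set;

// Hypothetical helper, not part of the connector: mirrors the documented defaults.
final class ConnectionDefaultsSketch {

    // Documented timeout defaults, in milliseconds.
    static final int DEFAULT_CONNECT_TIMEOUT_MS = 30_000;
    static final int DEFAULT_READ_TIMEOUT_MS = 120_000;

    // Operations documented as defaulting to mistral-ocr-latest.
    private static final Set<String> OCR_OPERATIONS =
            Set.of("mistral-ocr-extract-text", "mistral-ocr-process-batch");

    // Returns the configured model if one is set, otherwise the documented default
    // for the given operation.
    static String resolveModel(String operationId, String configuredModel) {
        if (configuredModel != null && !configuredModel.isBlank()) {
            return configuredModel;
        }
        // Set.of(...) rejects null lookups, so guard before calling contains.
        boolean isOcrOperation = operationId != null && OCR_OPERATIONS.contains(operationId);
        return isOcrOperation ? "mistral-ocr-latest" : "pixtral-large-latest";
    }
}
```

For example, `resolveModel("mistral-ocr-extract-text", null)` falls back to the OCR model, while any non-blank `configuredModel` is returned unchanged.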
Document input (shared by all operations)
Documents can be provided either as base64-encoded content or via URL.
| Parameter | Required | Description | Default |
|---|---|---|---|
| documentBase64 | Conditional | Base64-encoded document content (required if imageUrl is not set) | (none) |
| imageUrl | Conditional | URL of the document or image (required if documentBase64 is not set) | (none) |
| mimeType | No | MIME type of the document | application/pdf |
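As a rough illustration of the rule above (one of documentBase64 or imageUrl must be set), the following Java sketch prepares the document inputs before the connector is called. The class `DocumentInputSketch` and its methods are hypothetical helpers. Two points are assumptions rather than documented behavior: that the connector expects the standard Base64 alphabet, and that the encoded content should win when both inputs are available.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the connector: prepares the shared document inputs.
final class DocumentInputSketch {

    // Encodes raw bytes with the standard Base64 alphabet (assumed, not documented).
    static String toBase64(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }

    // Builds the input map and fails fast when neither source is provided.
    static Map<String, String> buildInputs(byte[] content, String imageUrl, String mimeType) {
        boolean hasContent = content != null && content.length > 0;
        boolean hasUrl = imageUrl != null && !imageUrl.isBlank();
        if (!hasContent && !hasUrl) {
            throw new IllegalArgumentException("Set either documentBase64 or imageUrl");
        }
        Map<String, String> inputs = new LinkedHashMap<>();
        if (hasContent) {
            // Assumption: prefer the encoded content when both sources are available.
            inputs.put("documentBase64", toBase64(content));
        } else {
            inputs.put("imageUrl", imageUrl);
        }
        // application/pdf is the documented default for mimeType.
        inputs.put("mimeType", mimeType != null ? mimeType : "application/pdf");
        return inputs;
    }
}
```

Validating the inputs in the process, before the connector runs, surfaces a missing document as a clear error instead of a failed API call.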
Extract Text (mistral-ocr-extract-text)
Extract raw text from documents or images using OCR.
Input parameters
| Parameter | Required | Description | Default |
|---|---|---|---|
| includePageSegmentation | No | Whether to segment text by page | true |
| language | No | Expected language of the document | (none) |
Output parameters
| Parameter | Type | Description |
|---|---|---|
| extractedText | String | Full extracted text content |
| pages | List | List of page-level text segments |
| pageCount | Integer | Number of pages processed |
| tokensUsed | Integer | Number of tokens consumed |
| success | Boolean | Whether the operation succeeded |
| errorMessage | String | Error message if the operation failed |
Extract Fields (mistral-ocr-extract-fields)
Extract structured fields from a document using a JSON schema definition.
Input parameters
| Parameter | Required | Description | Default |
|---|---|---|---|
| fieldsSchema | Yes | JSON schema defining the fields to extract | (none) |
| extractionPrompt | No | Additional prompt to guide extraction | (none) |
| strictMode | No | Whether to enforce strict schema validation | false |
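The exact schema dialect expected by fieldsSchema is not described here, so the following Java sketch is only one plausible shape: a flat JSON Schema object in which every field is a top-level property. The class `FieldsSchemaSketch`, the method `buildSchema`, and the field names `invoiceNumber` and `totalAmount` are all illustrative assumptions. The helper does not escape special characters, so it assumes plain identifier-like field names.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the connector: builds a flat JSON Schema string.
final class FieldsSchemaSketch {

    // Maps each field name to a JSON type ("string", "number", ...) and marks all
    // fields as required. Field names are assumed to need no JSON escaping.
    static String buildSchema(Map<String, String> fieldTypes) {
        StringJoiner properties = new StringJoiner(",");
        StringJoiner required = new StringJoiner(",");
        for (Map.Entry<String, String> field : fieldTypes.entrySet()) {
            properties.add("\"" + field.getKey() + "\":{\"type\":\"" + field.getValue() + "\"}");
            required.add("\"" + field.getKey() + "\"");
        }
        return "{\"type\":\"object\",\"properties\":{" + properties
                + "},\"required\":[" + required + "]}";
    }
}
```

Passing an insertion-ordered map such as a `LinkedHashMap` keeps the properties in a predictable order, which makes the generated schema easier to compare across runs.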
Output parameters
| Parameter | Type | Description |
|---|---|---|
| extractedFields | String | JSON string of extracted fields |
| extractedFieldsMap | Map | Extracted fields as a key-value map |
| fieldCount | Integer | Number of fields extracted |
| confidence | Double | Overall extraction confidence score |
| tokensUsed | Integer | Number of tokens consumed |
| success | Boolean | Whether the operation succeeded |
| errorMessage | String | Error message if the operation failed |
Classify Document (mistral-ocr-classify-document)
Classify a document into one of several predefined categories.
Input parameters
| Parameter | Required | Description | Default |
|---|---|---|---|
| documentTypes | Yes | Comma-separated list of possible document types | (none) |
| includeReasoning | No | Whether to include classification reasoning | false |
Output parameters
| Parameter | Type | Description |
|---|---|---|
| documentType | String | Detected document type |
| confidence | Double | Classification confidence score |
| reasoning | String | Reasoning for the classification (when includeReasoning is true) |
| allScores | String | JSON with scores for all document types |
| tokensUsed | Integer | Number of tokens consumed |
| success | Boolean | Whether the operation succeeded |
| errorMessage | String | Error message if the operation failed |
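One way a process might use these outputs is to route low-confidence classifications to a human task. The Java sketch below shows that idea under stated assumptions: the class `ClassificationRoutingSketch`, the `MANUAL_REVIEW` label, and the threshold are hypothetical, and the scale of the confidence score is not documented here, so the threshold must be chosen to match whatever range the connector actually returns.

```java
import java.util.List;

// Hypothetical helper, not part of the connector: routes a document after classification.
final class ClassificationRoutingSketch {

    static final String MANUAL_REVIEW = "MANUAL_REVIEW";

    // Joins candidate types into the comma-separated form used by documentTypes.
    static String toDocumentTypes(List<String> types) {
        return String.join(",", types);
    }

    // Returns the detected type when the call succeeded and the confidence reaches
    // the threshold; otherwise sends the document to manual review.
    static String route(boolean success, String documentType, Double confidence, double threshold) {
        if (!success || documentType == null || confidence == null) {
            return MANUAL_REVIEW;
        }
        return confidence >= threshold ? documentType : MANUAL_REVIEW;
    }
}
```

The returned value could then drive an exclusive gateway in the process, with one outgoing flow per document type and a default flow for manual review.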
Extract Table (mistral-ocr-extract-table)
Extract tabular data from a document.
Input parameters
| Parameter | Required | Description | Default |
|---|---|---|---|
| columnHeaders | No | Expected column headers (comma-separated) | (none) |
| tableHint | No | Description of the table to extract | (none) |
| pageNumber | No | Page number to extract the table from | 1 |
Output parameters
| Parameter | Type | Description |
|---|---|---|
| tableData | String | JSON representation of the table |
| tableDataList | List | Table data as a list of rows |
| rowCount | Integer | Number of rows extracted |
| columnCount | Integer | Number of columns detected |
| detectedHeaders | String | Detected column headers |
| tokensUsed | Integer | Number of tokens consumed |
| success | Boolean | Whether the operation succeeded |
| errorMessage | String | Error message if the operation failed |
Process Batch (mistral-ocr-process-batch)
Process multi-page documents with optional page range selection.
Input parameters
| Parameter | Required | Description | Default |
|---|---|---|---|
| imageUrls | No | Comma-separated list of image URLs (alternative to documentBase64) | (none) |
| startPage | No | First page to process | (none) |
| endPage | No | Last page to process | (none) |
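For long documents, the page range parameters make it possible to process a document in several smaller calls. The Java sketch below computes such ranges. It assumes that startPage and endPage are 1-based and inclusive, which this page does not state explicitly; the class `PageRangeSketch` and the method `chunk` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the connector: splits a document into page ranges.
final class PageRangeSketch {

    // Returns consecutive {startPage, endPage} pairs covering pages 1..totalPages,
    // each spanning at most chunkSize pages (1-based, inclusive: an assumption).
    static List<int[]> chunk(int totalPages, int chunkSize) {
        if (totalPages < 1 || chunkSize < 1) {
            throw new IllegalArgumentException("totalPages and chunkSize must be at least 1");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 1; start <= totalPages; start += chunkSize) {
            int end = Math.min(start + chunkSize - 1, totalPages);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }
}
```

Each pair can be passed as startPage and endPage to one Process Batch call, which keeps individual requests shorter than a single call over the whole document.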
Output parameters
| Parameter | Type | Description |
|---|---|---|
| fullText | String | Combined text from all pages |
| pages | List | List of page-level results |
| pageCount | Integer | Number of pages processed |
| totalWordCount | Integer | Total word count |
| tokensUsed | Integer | Number of tokens consumed |
| processingTimeMs | Long | Processing time in milliseconds |
| success | Boolean | Whether the operation succeeded |
| errorMessage | String | Error message if the operation failed |
Error handling
All operations set success=false and populate errorMessage on failure. Error messages are truncated to 1000 characters to prevent database column overflow in Bonita.
| HTTP Code | Behavior |
|---|---|
| 200 | Success: parse response and populate outputs |
| 400 | Bad request: invalid document or parameters |
| 401 | Unauthorized: invalid API key |
| 413 | Payload too large: document exceeds size limit |
| 429 | Rate limited: too many requests |
| 5xx | Server error: Mistral AI service unavailable |
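The behavior described above can be pictured with a small Java sketch. It is not the connector's implementation: the class `ErrorHandlingSketch` and its methods are hypothetical. The 1000-character limit comes from this page, while treating 429 and 5xx responses as worth retrying is an assumption about what a process designer might choose to do.

```java
// Hypothetical helper, not part of the connector: illustrates the documented error rules.
final class ErrorHandlingSketch {

    // Documented maximum length of errorMessage.
    static final int MAX_ERROR_LENGTH = 1000;

    // Truncates a message to the documented limit; null becomes an empty string.
    static String truncate(String message) {
        if (message == null) {
            return "";
        }
        return message.length() <= MAX_ERROR_LENGTH
                ? message
                : message.substring(0, MAX_ERROR_LENGTH);
    }

    // Assumption: rate limiting and server errors are transient, so a retry may help.
    // Client errors such as 400, 401, and 413 need a fix to the input instead.
    static boolean isRetryable(int httpStatus) {
        return httpStatus == 429 || (httpStatus >= 500 && httpStatus <= 599);
    }
}
```

In a process, a retry loop or a timer boundary event could be reserved for the retryable cases, while the other errors are routed straight to an error-handling task.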
Source code
The connector source code is available on GitHub: bonita-connector-mistral-ocr