Currently, OCR workflows must be installed into Production in advance by placing the `ocrd process` script files into `./kitodo/data/ocr_workflows` with a `.sh` suffix, and then configuring them in the project settings (thereby tying them to new processes).
But what if a user wants to use a different OCR workflow for some processes in a project, or change the workflow for existing processes (because they did not run through or the results do not look good)?
For now, one would need to edit the file `ocr-workflow.sh` in the process directory and re-trigger the OCR script. (OCR processing itself is already incremental, so the workflow will then continue to build whatever is still missing or out-of-date.) But that is tedious and requires access to the file system (Manager share).
The user experience could be much better if we made workflows configurable on the web pages of the Monitor. Crucially, we should allow editing and re-running OCR workflows:
- create a volume for `kitodo/data/ocr_workflows` to be shared by Production, Manager and Monitor
- add an endpoint (and reference it on the index page) for listing existing workflows
- make workflows editable (in a simple text form field, perhaps with syntax highlighting), create a new version when saving
- in the workspace view, make workspaces multi-selectable and add an action button for (re-)processing with a selectable workflow
- in the job view, add an action button for re-processing with a selectable workflow
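For the listing endpoint, the underlying lookup could be as small as the following sketch. It assumes the shared volume is mounted at a path handed in by the caller and that every regular `*.sh` file in it is a workflow; `list_workflows` is a hypothetical helper, and the wiring into the Monitor's web framework is left out.

```python
from pathlib import Path


def list_workflows(directory):
    """Return the sorted file names of all workflow scripts in *directory*.

    Hypothetical helper: *directory* is assumed to be the shared
    ocr_workflows volume; an endpoint would serialize the result.
    """
    path = Path(directory)
    # A missing or unmounted volume simply yields an empty list.
    if not path.is_dir():
        return []
    # Only regular files with a .sh suffix count as workflows.
    return sorted(p.name for p in path.glob("*.sh") if p.is_file())
```

Returning plain file names keeps the endpoint independent of how descriptions or versions are stored later.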
So if a task cannot be finished because the OCR workflow failed (which in the future could also mean that it did not meet the configured quality threshold), one can then manually trigger said re-processing.
We could even provide a null workflow that will always fail and therefore force you to choose your custom workflow dynamically (per-process).
Saved workflows could also be version-controlled. The workflows should have a free-form description, but their file name should be a hash of their (non-comment, non-whitespace) content.
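To illustrate the naming scheme, here is a sketch that derives the file name from the script content. It assumes SHA-256 as the hash, treats only full-line comments (including a shebang) as comments, and normalizes whitespace by collapsing runs instead of deleting it, so that `a b` and `ab` do not end up with the same name; `canonical_content` and `workflow_filename` are hypothetical names.

```python
import hashlib


def canonical_content(script: str) -> str:
    """Reduce a workflow script to its non-comment, non-whitespace essence (hypothetical)."""
    lines = []
    for line in script.splitlines():
        # Collapse runs of whitespace and trim both ends of the line.
        normalized = " ".join(line.split())
        # Skip blank lines and full-line comments (a shebang counts as one).
        if not normalized or normalized.startswith("#"):
            continue
        lines.append(normalized)
    return "\n".join(lines)


def workflow_filename(script: str) -> str:
    """Name a saved workflow after the SHA-256 digest of its canonical content."""
    digest = hashlib.sha256(canonical_content(script).encode("utf-8")).hexdigest()
    return digest + ".sh"
```

Trailing comments after a command are deliberately kept, since a `#` inside a quoted processor parameter would otherwise be misread. Saving an edited workflow whose name already exists then simply means the content is unchanged.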
Also, the Manager should collect statistics about all workflows (which ones ran how often and with what success or quality level), so the Monitor can show them.
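The statistics could be kept as simple per-workflow counters. The following sketch assumes the Manager can report each finished run as a `(workflow_file, success, quality)` tuple, with `quality` being `None` when no score was computed; `WorkflowStats` and `aggregate` are hypothetical names, not an existing Manager API.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class WorkflowStats:
    """Hypothetical per-workflow counters for the Monitor to display."""
    runs: int = 0
    successes: int = 0
    qualities: list = field(default_factory=list)

    def record(self, success: bool, quality: Optional[float] = None) -> None:
        self.runs += 1
        if success:
            self.successes += 1
        # Only runs that produced a quality score contribute to the mean.
        if quality is not None:
            self.qualities.append(quality)

    @property
    def success_rate(self) -> float:
        return self.successes / self.runs if self.runs else 0.0

    @property
    def mean_quality(self) -> Optional[float]:
        return mean(self.qualities) if self.qualities else None


def aggregate(runs):
    """Fold (workflow_file, success, quality) tuples into per-workflow stats."""
    stats = {}
    for name, success, quality in runs:
        stats.setdefault(name, WorkflowStats()).record(success, quality)
    return stats
```

Keying the statistics by the hash-based file name would keep results of different workflow versions apart.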