OCR Engine - the size of your documents matters!

As mentioned in our last blog post, this post begins a step-by-step explanation of the technology behind our OCR engine. The corresponding functions are available to you via our API.

We emphasize once again that the individual processes and functions described here run in the API. If you use our OCR engine on our cloudintegration platform, these functions run exclusively in the background and you do not need to deal with them. The detailed explanation is meant to give you a better understanding of the processes within the OCR Engine.

Recap

An OCR Server converts a regular PDF into machine-encoded text that is readable and accessible at any time. This process involves many small steps and different functions that can optimize the transformation.

To start off, we want to talk about the size of your document. Depending on its size, the transformation takes place on different servers, based on a decision made at the very start of the transformation process.

Sync vs. async

Depending on whether the uploaded document is larger or smaller than 4 MB, our server decides whether to perform the transformation as a sync or an async task.

What does that mean?

Synchronous tasks are performed one at a time: the next task can only start once the previous one has completed.

It therefore makes sense that our OCR Server transforms smaller documents as sync tasks.

Asynchronous operations, by contrast, can move on to another task before the previous one finishes. This means that multiple large documents can be transformed at the same time.
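To make the difference concrete, here is a minimal Python sketch. It is only an illustration of the general idea, not how our server is implemented: the transform function is a hypothetical stand-in for the OCR work, and the thread pool is just one of several ways to run tasks concurrently.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def transform(name: str) -> str:
    """Hypothetical stand-in for an OCR transformation."""
    time.sleep(0.1)  # simulate work that takes a while
    return f"{name}.txt"


def run_sync(names: list[str]) -> list[str]:
    # One at a time: each transformation starts only after the previous one ends.
    return [transform(n) for n in names]


def run_async(names: list[str]) -> list[str]:
    # Several transformations are in progress at the same time.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(transform, names))
```

Both functions return the same results in the same order. With several documents, run_async usually finishes sooner because the waiting times overlap.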

That was the general definition of sync and async tasks. The two also differ in how the transformation is carried out: in our case, sync and async tasks are performed on different servers. Async tasks are transformed on a server called cloudintegration-celery, while regular-sized documents are processed on a general cloudintegration-server.
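The routing decision can be sketched in a few lines of Python. Please read this as an illustration under assumptions, not as our actual server code: the function name route_document is made up for this example, 4 MB is interpreted as 4 * 1024 * 1024 bytes, and a document of exactly that size is treated as a sync task, since the post does not specify the boundary case.

```python
# Assumption: "4 MB" is read as 4 * 1024 * 1024 bytes.
SIZE_THRESHOLD_BYTES = 4 * 1024 * 1024


def route_document(size_bytes: int) -> tuple[str, str]:
    """Return (task type, server name) for a document of the given size.

    Hypothetical helper for illustration only.
    """
    if size_bytes > SIZE_THRESHOLD_BYTES:
        # Large documents become async tasks on the celery server.
        return ("async", "cloudintegration-celery")
    # Regular-sized documents are handled as sync tasks.
    return ("sync", "cloudintegration-server")
```

For example, route_document(500_000) yields a sync task on cloudintegration-server, while route_document(10 * 1024 * 1024) yields an async task on cloudintegration-celery.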

In the API, the server responds with a “Task-ID”.

This Task-ID enables you to check your document's processing status whenever you want. The status will read “pending” until the transformation is “finished”. At that point, your Task-ID turns into a Download-ID so you can download your processed OCR document.

When a sync task is performed, a Download-ID is given after the transformation right away.
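On the client side, checking the status of an async task could look roughly like the Python sketch below. Everything specific in it is an assumption made for illustration: the get_status callable stands in for whatever API call you use to query a Task-ID, and the "status" and "download_id" field names are hypothetical, not the documented response format.

```python
import time
from typing import Callable


def wait_for_download_id(
    task_id: str,
    get_status: Callable[[str], dict],  # hypothetical status lookup
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a task until it is finished, then return its Download-ID."""
    deadline = time.monotonic() + timeout
    while True:
        result = get_status(task_id)
        # Assumed response shape: {"status": "pending"} or
        # {"status": "finished", "download_id": "..."}.
        if result["status"] == "finished":
            return result["download_id"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} did not finish in time")
        sleep(interval)  # wait before asking again
```

For a sync task no polling is needed, because the Download-ID is already part of the first response.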

Now you know how our API processes documents differently depending on their size. You can also test our online version of the Server here.

Thank you for reading!
