The previous approach decompressed a large tarball (638 MB) N times,
where N=130 is the number of supported languages. Each iteration
extracted only a single file, but still had to decompress the whole
tarball, which is highly inefficient.
Now, the tarball is decompressed only once to extract all relevant
files, and we then iterate N times to copy the file needed for each
language. This massively speeds up builds, at the cost of temporarily
requiring about 1 GB of additional build space.
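A minimal Python sketch of the extract-once-then-copy strategy described above, for illustration only: the actual build does this in the package Makefile. It assumes each language's data is stored in the archive as a hypothetical `<lang>.traineddata` file, and the helper names `extract_once` and `stage_languages` are likewise hypothetical.

```python
import shutil
import tarfile
from pathlib import Path


def extract_once(tarball, wanted_names, staging):
    """Decompress the archive in one streaming pass, keeping only wanted files."""
    staging = Path(staging)
    staging.mkdir(parents=True, exist_ok=True)
    # "r|*" reads the (possibly compressed) archive strictly sequentially,
    # so the whole tarball is decompressed exactly once.
    with tarfile.open(str(tarball), mode="r|*") as tar:
        for member in tar:
            name = Path(member.name).name
            if member.isfile() and name in wanted_names:
                src = tar.extractfile(member)
                with open(staging / name, "wb") as out:
                    shutil.copyfileobj(src, out)
    return staging


def stage_languages(tarball, languages, staging, pkg_root):
    """Extract all language files once, then copy one file per language package."""
    # Hypothetical layout: one '<lang>.traineddata' file per language.
    wanted = {f"{lang}.traineddata" for lang in languages}
    staging = extract_once(tarball, wanted, staging)
    pkg_root = Path(pkg_root)
    # N cheap file copies replace N full decompressions of the tarball.
    for lang in languages:
        dest = pkg_root / lang
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy2(staging / f"{lang}.traineddata", dest)
```

Streaming mode is the important design choice here: a random-access read of a gzip-compressed tarball may seek backwards, which forces decompression to restart from the beginning of the stream and reintroduces the cost this change avoids. The staging directory is what accounts for the extra temporary build space.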
Signed-off-by: Baptiste Jonglez <git@bitsofnetworks.org>
Move the language data menu under the package itself, and shorten the
titles so that all of them show up in the menu.
Signed-off-by: Eneas U de Queiroz <cotequeiroz@gmail.com>
Tesseract is an open-source optical character recognition (OCR) engine, available under the Apache 2.0 license. It can be used directly, or (for programmers) through an API to extract printed text from images. It supports a wide variety of languages.
Signed-off-by: Valentín Kivachuk <vk18496@gmail.com>