Commit graph

39 commits

Author SHA1 Message Date
Stefan Weil e12c65a915 Rename frk -> deu_latf (ISO 639-3, ISO 15924)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-03-09 11:04:42 +01:00
Stefan Weil fa8481f199 Add equ.traineddata (copy from tessdata)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-07-24 10:03:38 +02:00
Stefan Weil e2aad9b983 ita: Remove ita.config from ita.traineddata
It added a user_words_suffix which should be reserved for
user configurations.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-30 22:03:13 +01:00
zdenop 9e8aeef07c
Merge pull request #47 from SherSpock/patch-2
Update README
2020-03-09 08:28:45 +01:00
Ryder Timberlake d288680f57
Update README
Replace unsupported wiki link with equivalent hosted doc link
2020-03-08 17:07:13 -04:00
Stefan Weil c5e0a7294a Update tessconfigs
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-23 13:32:42 +02:00
Stefan Weil e4173f4456 Update URL for tessconfigs submodule (use HTTPS)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-11 13:08:43 +02:00
Stefan Weil 41e829655f Add tessconfigs submodule and links for required tessdata files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-03 16:07:05 +02:00
zdenop e9f15884bc
Merge pull request #37 from stweil/master
Fix extra intra-word spacing for several Asian languages (GitHub issue #991)
2019-05-22 12:15:06 +02:00
Stefan Weil ea00692e71 Fix extra intra-word spacing for Thai (GitHub issue #991)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-21 17:50:06 +02:00
Stefan Weil 80b4d76313 Fix extra intra-word spacing for Japanese (GitHub issue #991)
Fix also the encoding of tessedit_char_blacklist.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-21 17:49:35 +02:00
Stefan Weil 5075f27776 Fix extra intra-word spacing for Chinese (GitHub issue #991)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-21 17:48:52 +02:00
zdenop 95593f0b01
Merge pull request #33 from stweil/master
Improve documentation
2018-10-23 16:57:34 +02:00
Stefan Weil 2d255780f3 Improve documentation
These models don't work with old versions of Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-23 16:36:18 +02:00
zdenop f8c44498f3
Merge pull request #28 from stweil/master
Remove parameter textord_tabfind_vertical_horizontal_mix
2018-05-28 16:11:44 +02:00
Stefan Weil 786983dddb Remove parameter textord_tabfind_vertical_horizontal_mix
It was added to Tesseract in 2010 and removed in 2018, but never used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-28 15:57:28 +02:00
zdenop ce12640701
Merge pull request #26 from Shreeshrii/master
correct name kur_ara to kmr - Kurmanji (Latin script)
2018-04-25 19:31:01 +02:00
Shree Devi Kumar 788e2fe923 correct name kur_ara to kmr - Kurmanji (Latin script)
2018-04-25 22:47:45 +05:30
zdenop 09a3a39156
Merge pull request #25 from Shreeshrii/master
Fix config file for Korean, remove `tessedit_load_sublangs chi_tra`
2018-04-09 19:49:25 +02:00
Shreeshrii c7c86bb8de
Fix config file for Korean, remove tessedit_load_sublangs chi_tra
Addresses https://groups.google.com/d/msgid/tesseract-ocr/1e5142e1-d198-46d3-95ee-1a3206d1a2c4%40googlegroups.com?utm_medium=email&utm_source=footer
2018-04-09 19:58:26 +05:30
zdenop 7a1c6b06d7
Merge pull request #21 from stweil/script
Move trained data for scripts to new subdirectory
2018-03-10 21:29:59 +01:00
Stefan Weil a2f7ced76b Move trained data for scripts to new subdirectory
This fixes a name conflict for Lao.traineddata and lao.traineddata
which could not be distinguished on case insensitive filesystems
(for example macOS, Windows).

It makes it also easier for users to see which data is for scripts.
Choosing a script works now like this: tesseract -l script/Latin ...

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-10 21:12:04 +01:00
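
The name conflict described in a2f7ced76b can be checked for mechanically. The following is a minimal sketch, not part of the repository: the function name `find_case_collisions` is hypothetical, and it assumes that two paths collide on a case-insensitive filesystem exactly when their case-folded forms are equal.

```python
from collections import defaultdict


def find_case_collisions(filenames):
    """Return groups of names that differ only by letter case.

    Hypothetical helper: approximates a case-insensitive filesystem by
    comparing case-folded path strings.
    """
    groups = defaultdict(list)
    for name in filenames:
        groups[name.casefold()].append(name)
    # Keep only groups that contain at least two distinct spellings.
    return [sorted(set(names)) for names in groups.values() if len(set(names)) > 1]


# Before the move: both files sit in the same directory and collide.
before = ["eng.traineddata", "Lao.traineddata", "lao.traineddata"]
# After the move: the script model lives under script/, so the paths differ.
after = ["eng.traineddata", "script/Lao.traineddata", "lao.traineddata"]

print(find_case_collisions(before))
print(find_case_collisions(after))
```

With the script model under `script/`, the two paths no longer fold to the same string, which is the effect the commit message describes.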
zdenop 51ebb64c29
Merge pull request #19 from stweil/master
Add Devanagari config file to fix auto PSM issue #1273
2018-02-27 08:25:38 +01:00
Stefan Weil 84bd10ed89 Add Devanagari config file to fix auto PSM issue #1273
Devanagari.config was copied from tessdata_fast.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-27 07:33:28 +01:00
zdenop 208f104882
Merge pull request #1 from stweil/master
Improve GitHub integration
2018-02-02 10:37:00 +01:00
Stefan Weil e744fa9056 Rename license file
Tesseract uses the file LICENSE to show the Apache License,
so rename COPYING to LICENSE.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-02 10:18:00 +01:00
Stefan Weil 9963c18ace README: Improve description and add link to Tesseract wiki
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-02 10:09:47 +01:00
Stefan Weil fb9ae6ba2d README: Add text from former COPYRIGHT and add links
Format also the text, so it looks nicer on GitHub.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-02 10:09:47 +01:00
Stefan Weil 4928952a62 Use the full Apache License text
Now GitHub is able to detect and show the project license.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-02 10:09:47 +01:00
zdenop 3e6ec162ae
Merge pull request #17 from stweil/deu
deu: Remove unwanted dependency
2018-02-02 10:02:28 +01:00
Stefan Weil ed5410b928 deu: Remove unwanted dependency
The data included a configuration which required frk.traineddata
("tessedit_load_sublangs frk"). Remove that.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-01 15:29:03 +01:00
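
As an illustration of the kind of change made in ed5410b928, the sketch below removes one parameter line from the text of a config file. It assumes the config is plain text with one whitespace-separated `name value` pair per line. The helper name `drop_parameter` is hypothetical, and extracting the config from, and repacking it into, the `.traineddata` file is not shown.

```python
def drop_parameter(config_text, name):
    """Return config_text without any line that sets the parameter `name`.

    Hypothetical helper; assumes one whitespace-separated "name value"
    pair per line.
    """
    kept = []
    for line in config_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            continue  # skip the unwanted setting
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Example input built from settings quoted in this log.
config = "tessedit_load_sublangs frk\ntessedit_ocr_engine_mode\t1\n"
print(drop_parameter(config, "tessedit_load_sublangs"))
```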
Jeff Breidenbach f1d12682c0 Use legacy Orientation Script Detector (OSD) because that is the only thing that currently works.
2017-09-15 11:44:08 -07:00
zdenop 5cf1eaafa4 Merge pull request #3 from Shreeshrii/master
Fix Config files to LSTM only for nep and mar
2017-09-15 17:56:59 +02:00
Shreeshrii 9c5c2cb2e7 Fix Config files to LSTM only for nep and mar
Change default mode to
tessedit_ocr_engine_mode	1
2017-09-15 21:22:28 +05:30
zdenop 84ae67cd6f Merge pull request #2 from Shreeshrii/master
Fix config files - Tesseract/LSTM combiner to LSTM only
2017-09-15 17:04:17 +02:00
Shreeshrii 09e4326246 Fix config files from Use Tesseract/LSTM combiner to LSTM only
Config files had tessedit_ocr_engine_mode	2
causing processing with --oem 3 (default mode based on config file) to fail 

Failed loading language 'san' / 'hin'
Tesseract couldn't load any languages!
Could not initialize tesseract.
2017-09-15 18:37:50 +05:30
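
The fix in 09e4326246 and 9c5c2cb2e7 amounts to changing `tessedit_ocr_engine_mode` from 2 (Tesseract/LSTM combiner) to 1 (LSTM only) in the affected config files. A minimal sketch of that rewrite follows. The helper name `set_engine_mode` is hypothetical, and the sketch assumes one whitespace-separated `name value` pair per line, written back with a tab as in the message above.

```python
def set_engine_mode(config_text, mode=1):
    """Rewrite every tessedit_ocr_engine_mode line to the given mode.

    Hypothetical helper; other lines are passed through unchanged.
    """
    out = []
    for line in config_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "tessedit_ocr_engine_mode":
            line = f"tessedit_ocr_engine_mode\t{mode}"
        out.append(line)
    return "\n".join(out) + ("\n" if out else "")


# A config that still requests the Tesseract/LSTM combiner.
old_config = "tessedit_ocr_engine_mode\t2\n"
print(set_engine_mode(old_config), end="")
```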
Jeff Breidenbach c222ed852e add license info
2017-09-14 15:04:55 -07:00
Jeff Breidenbach 9ddc24e750 Initial import (on behalf of Ray)
2017-09-14 14:45:10 -07:00
theraysmith 549354e9f1 Initial commit
2017-09-11 18:12:33 +01:00