Commit graph

39 commits

Author SHA1 Message Date
Stefan Weil
e12c65a915 Rename frk -> deu_latf (ISO 639-3, ISO 15924)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-03-09 11:04:42 +01:00
Stefan Weil
fa8481f199 Add equ.traineddata (copy from tessdata)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-07-24 10:03:38 +02:00
Stefan Weil
e2aad9b983 ita: Remove ita.config from ita.traineddata
It added a user_words_suffix which should be reserved for
user configurations.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-30 22:03:13 +01:00
zdenop
9e8aeef07c
Merge pull request #47 from SherSpock/patch-2
Update README
2020-03-09 08:28:45 +01:00
Ryder Timberlake
d288680f57
Update README
Replace unsupported wiki link with equivalent hosted doc link
2020-03-08 17:07:13 -04:00
Stefan Weil
c5e0a7294a Update tessconfigs
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-23 13:32:42 +02:00
Stefan Weil
e4173f4456 Update URL for tessconfigs submodule (use HTTPS)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-11 13:08:43 +02:00
Stefan Weil
41e829655f Add tessconfigs submodule and links for required tessdata files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-03 16:07:05 +02:00
zdenop
e9f15884bc
Merge pull request #37 from stweil/master
Fix extra intra-word spacing for several Asian languages (GitHub issue #991)
2019-05-22 12:15:06 +02:00
Stefan Weil
ea00692e71 Fix extra intra-word spacing for Thai (GitHub issue #991)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-21 17:50:06 +02:00
Stefan Weil
80b4d76313 Fix extra intra-word spacing for Japanese (GitHub issue #991)
Fix also the encoding of tessedit_char_blacklist.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-21 17:49:35 +02:00
Stefan Weil
5075f27776 Fix extra intra-word spacing for Chinese (GitHub issue #991)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-21 17:48:52 +02:00
zdenop
95593f0b01
Merge pull request #33 from stweil/master
Improve documentation
2018-10-23 16:57:34 +02:00
Stefan Weil
2d255780f3 Improve documentation
These models don't work with old versions of Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-23 16:36:18 +02:00
zdenop
f8c44498f3
Merge pull request #28 from stweil/master
Remove parameter textord_tabfind_vertical_horizontal_mix
2018-05-28 16:11:44 +02:00
Stefan Weil
786983dddb Remove parameter textord_tabfind_vertical_horizontal_mix
It was added to Tesseract in 2010 and removed in 2018, but never used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-28 15:57:28 +02:00
zdenop
ce12640701
Merge pull request #26 from Shreeshrii/master
correct name kur_ara to kmr - Kurmanji (Latin script)
2018-04-25 19:31:01 +02:00
Shree Devi Kumar
788e2fe923 correct name kur_ara to kmr - Kurmanji (Latin script)
2018-04-25 22:47:45 +05:30
zdenop
09a3a39156
Merge pull request #25 from Shreeshrii/master
Fix config file for Korean, remove `tessedit_load_sublangs chi_tra`
2018-04-09 19:49:25 +02:00
Shreeshrii
c7c86bb8de
Fix config file for Korean, remove tessedit_load_sublangs chi_tra
Addresses https://groups.google.com/d/msgid/tesseract-ocr/1e5142e1-d198-46d3-95ee-1a3206d1a2c4%40googlegroups.com?utm_medium=email&utm_source=footer
2018-04-09 19:58:26 +05:30
zdenop
7a1c6b06d7
Merge pull request #21 from stweil/script
Move trained data for scripts to new subdirectory
2018-03-10 21:29:59 +01:00
Stefan Weil
a2f7ced76b Move trained data for scripts to new subdirectory
This fixes a name conflict for Lao.traineddata and lao.traineddata
which could not be distinguished on case insensitive filesystems
(for example macOS, Windows).

It makes it also easier for users to see which data is for scripts.
Choosing a script works now like this: tesseract -l script/Latin ...

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-10 21:12:04 +01:00
zdenop
51ebb64c29
Merge pull request #19 from stweil/master
Add Devanagari config file to fix auto PSM issue #1273
2018-02-27 08:25:38 +01:00
Stefan Weil
84bd10ed89 Add Devanagari config file to fix auto PSM issue #1273
Devanagari.config was copied from tessdata_fast.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-27 07:33:28 +01:00
zdenop
208f104882
Merge pull request #1 from stweil/master
Improve GitHub integration
2018-02-02 10:37:00 +01:00
Stefan Weil
e744fa9056 Rename license file
Tesseract uses the file LICENSE to show the Apache License,
so rename COPYING to LICENSE.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-02 10:18:00 +01:00
Stefan Weil
9963c18ace README: Improve description and add link to Tesseract wiki
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-02 10:09:47 +01:00
Stefan Weil
fb9ae6ba2d README: Add text from former COPYRIGHT and add links
Format also the text, so it looks nicer on GitHub.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-02 10:09:47 +01:00
Stefan Weil
4928952a62 Use the full Apache License text
Now GitHub is able to detect and show the project license.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-02 10:09:47 +01:00
zdenop
3e6ec162ae
Merge pull request #17 from stweil/deu
deu: Remove unwanted dependency
2018-02-02 10:02:28 +01:00
Stefan Weil
ed5410b928 deu: Remove unwanted dependency
The data included a configuration which required frk.traineddata
("tessedit_load_sublangs frk"). Remove that.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-01 15:29:03 +01:00
Jeff Breidenbach
f1d12682c0 Use legacy Orientation Script Detector (OSD) because that is the only thing that currently works.
2017-09-15 11:44:08 -07:00
zdenop
5cf1eaafa4 Merge pull request #3 from Shreeshrii/master
Fix Config files to LSTM only for nep and mar
2017-09-15 17:56:59 +02:00
Shreeshrii
9c5c2cb2e7 Fix Config files to LSTM only for nep and mar
Change default mode to
tessedit_ocr_engine_mode	1
2017-09-15 21:22:28 +05:30
zdenop
84ae67cd6f Merge pull request #2 from Shreeshrii/master
Fix config files - Tesseract/LSTM combiner to LSTM only
2017-09-15 17:04:17 +02:00
Shreeshrii
09e4326246 Fix config files from Use Tesseract/LSTM combiner to LSTM only
Config files had tessedit_ocr_engine_mode	2
causing processing with --oem 3 (default mode based on config file) to fail 

Failed loading language 'san' / 'hin'
Tesseract couldn't load any languages!
Could not initialize tesseract.
2017-09-15 18:37:50 +05:30
Jeff Breidenbach
c222ed852e add license info
2017-09-14 15:04:55 -07:00
Jeff Breidenbach
9ddc24e750 Initial import (on behalf of Ray)
2017-09-14 14:45:10 -07:00
theraysmith
549354e9f1 Initial commit
2017-09-11 18:12:33 +01:00