Open
Labels: bug (Something isn't working)
Description
This was on a Linux system, and the "A~-" was an "Ö".
- Fix the "Ã." problem above
- Fix the `LookupError: unknown encoding: EUC-TW` problem
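Both symptoms can be reproduced with the Python standard library alone. This is only a sketch: it assumes the garbled text came from UTF-8 bytes being decoded with a single-byte codec (the issue does not say which codec was actually involved), and it relies on Python shipping no codec named `EUC-TW`.

```python
import codecs

# "Ö" is two bytes in UTF-8 (0xC3 0x96). Decoding those bytes with a
# single-byte codec yields two wrong characters instead of one.
utf8_bytes = "Ö".encode("utf-8")
as_latin1 = utf8_bytes.decode("latin-1")  # "Ã" followed by control char U+0096
as_cp1252 = utf8_bytes.decode("cp1252")   # "Ã" followed by an en dash

# Python has no codec called EUC-TW, so handing a detector's answer
# straight to open() or bytes.decode() fails with LookupError.
try:
    codecs.lookup("EUC-TW")
    euc_tw_known = True
    lookup_message = ""
except LookupError as exc:
    euc_tw_known = False
    lookup_message = str(exc)
```

The cp1252 variant ("Ã" plus a dash) would be consistent with the "A~-" seen in the terminal, but that is a guess.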
For plain text files it would be best to:
- Review CLI
  - `cli.py` (esp. `process_dir`)
  - `ocrd_cli.py` - any plain text files supported here?
  - `cli_line_dirs.py`
  - `cli_summarize.py`?
- Add a `--plain-encoding` option so users have the chance to give it manually
- Fall back to detecting
  - while warning about the auto detecting
- What about the BOM now?
  - Do we have a test that checks if files with BOM are read correctly?
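The plan in the list above could look roughly like this. Everything here is a hypothetical sketch, not the project's code: `read_plain_text`, its `encoding` parameter (standing in for the value of a `--plain-encoding` option) and `guess_encoding` are illustrative names, and the detector is a deliberately simple stdlib stand-in for whatever detection library is really used.

```python
import codecs
import tempfile
import warnings
from pathlib import Path
from typing import Optional


def guess_encoding(raw: bytes) -> str:
    """Tiny stand-in for a real charset detector (assumption)."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM while decoding
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"  # decodes any byte sequence, but may be wrong


def read_plain_text(path: Path, encoding: Optional[str] = None) -> str:
    """Read a text file, preferring a user-supplied encoding."""
    raw = path.read_bytes()
    if encoding is None:
        # No explicit encoding: detect, but tell the user about it.
        encoding = guess_encoding(raw)
        warnings.warn(
            f"No encoding given for {path}, auto-detected {encoding!r}",
            stacklevel=2,
        )
    return raw.decode(encoding)


# BOM check: a UTF-8 file with BOM should come back without the BOM character.
with tempfile.TemporaryDirectory() as tmp:
    bom_file = Path(tmp) / "bom.txt"
    bom_file.write_bytes(codecs.BOM_UTF8 + "Ö".encode("utf-8"))
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        detected_text = read_plain_text(bom_file)
    explicit_text = read_plain_text(bom_file, encoding="utf-8-sig")
```

The last few lines double as the kind of BOM test asked about: the same file is read once via detection (with a warning) and once with an explicit encoding (without one).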
Later
- Autodetect over all files
- Fall back to UTF-8 if the detected charset is way out there/unknown, like `EUC-TW`
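One possible way to do the UTF-8 fallback is to check the detected name against Python's codec registry before using it. This is a minimal sketch; `usable_encoding` and its `default` parameter are hypothetical names, not existing functions of the project.

```python
import codecs
import warnings


def usable_encoding(detected: str, default: str = "utf-8") -> str:
    """Return the detected charset if Python can decode it, else the default."""
    try:
        # codecs.lookup() raises LookupError for names Python has no codec for.
        return codecs.lookup(detected).name
    except LookupError:
        warnings.warn(
            f"Unknown charset {detected!r}, falling back to {default!r}"
        )
        return default
```

With this, a detector result of `EUC-TW` would lead to a warning and a UTF-8 read instead of a crash. Whether UTF-8 then decodes the file correctly is a separate question.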