Parsing one PDF I am trying to read takes a very long time, and the output looks like East Asian characters although it should be German letters.
The length of the array "toUnicode" in fonts.js is 4294967293, and most of its elements are undefined. Traversing this array in buildToFontChar() takes several minutes. Other PDFs are parsed immediately without problems. Unfortunately, I cannot provide the document because it contains private information. If you need further information or if I can check something, please tell me.
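To illustrate the symptom, here is a minimal sketch. It is not the actual pdf2json code: the `toUnicode` array below is a hypothetical stand-in built by hand, and `definedEntries` is a helper name I made up. It shows how a single very large index produces the reported length, and how visiting only the keys that are actually set avoids walking billions of undefined slots.

```javascript
// Hypothetical stand-in for the "toUnicode" array; not taken from pdf2json.
const toUnicode = [];
toUnicode[0x41] = "A";
toUnicode[0xe4] = "ä";
// One huge index is enough to make length === 4294967293,
// even though only three elements are defined.
toUnicode[4294967292] = "ß";

// Invented helper: collect [index, value] pairs for defined entries only.
// Object.keys() lists only the indices that exist, so the cost depends on
// the number of defined entries rather than on toUnicode.length.
function definedEntries(sparse) {
  return Object.keys(sparse).map((key) => {
    const index = Number(key);
    return [index, sparse[index]];
  });
}

console.log(toUnicode.length);
console.log(definedEntries(toUnicode));
```

A plain `for (let i = 0; i < toUnicode.length; i++)` loop over the same array would have to step through roughly 4.29 billion indices, which would match the delay of several minutes I see.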
Some more information:
- the version of pdf2json I use is 1.1.8
- the font name parameter value of the calling function is TimesNewRomanPSMT.
Nevertheless: thank you for this great project!
Edit:
It seems the problem is somewhere in readToUnicode() in evaluator.js. The large size of "toUnicode" is caused by the German umlauts "ä", "ö", "ü" and the letter "ß".