pdf2json hangs in fonts.js in Font_buildToFontChar #184

@labsnoir

Parsing one PDF I am trying to read takes a very long time, and the output looks like East Asian characters although it should be German text.
The "toUnicode" array in fonts.js has a length of 4294967293, and most of its elements are undefined. Traversing this array in buildToFontChar() takes several minutes. Other PDFs are parsed immediately without problems. Unfortunately, I cannot provide the document because it contains private information. If you need further information or want me to check something, please tell me.
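
To illustrate why the traversal is so slow, here is a minimal sketch, not the actual pdf2json code. The table contents and the function names (makeSparseTable, countDefinedDense, countDefinedSparse) are hypothetical; only the length 4294967293 is taken from the report. It contrasts an index-based loop, whose cost grows with the array length, with iterating only the keys that exist.

```javascript
// Hypothetical stand-in for the reported table: a few real entries plus
// one stray entry at a huge index, which alone sets length to 4294967293.
function makeSparseTable() {
  const table = [];
  table[65] = 'A';
  table[228] = 'ä';
  table[4294967292] = '?'; // made-up stray entry
  return table;
}

// Dense traversal: visits every index below `limit`, so the cost depends
// on the array length, not on how many entries are actually stored.
function countDefinedDense(table, limit = table.length) {
  let count = 0;
  for (let i = 0; i < limit; i++) {
    if (table[i] !== undefined) count++;
  }
  return count;
}

// Sparse traversal: Object.keys lists only the indices that exist.
function countDefinedSparse(table) {
  return Object.keys(table).length;
}

const table = makeSparseTable();
console.log(table.length);                   // 4294967293
console.log(countDefinedSparse(table));      // 3
console.log(countDefinedDense(table, 1000)); // 2 (only the first 1000 indices)
```

Calling countDefinedDense(table) without a limit would loop over roughly 4.3 billion indices, which matches the multi-minute delay described above.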

Some more information:

  • the version of pdf2json I use is 1.1.8
  • the font name parameter value of the calling function is TimesNewRomanPSMT.

Nevertheless: thank you for this great project!

Edit:
It seems the problem is somewhere in readToUnicode() in evaluator.js. The large size of "toUnicode" comes from the German characters "ä", "ö", "ü" and "ß".
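
If it helps with narrowing this down, a small diagnostic along these lines could be run on the table that readToUnicode() produces. This is only a sketch: summarizeTable is a hypothetical helper, not part of pdf2json, and the 0xFFFF threshold is my assumption about the largest plausible character code.

```javascript
// Hypothetical diagnostic: reports the array length, the number of entries
// that actually exist, and the indices above an assumed maximum code.
function summarizeTable(table, maxExpectedCode = 0xFFFF) {
  const indices = Object.keys(table).map(Number);
  const outliers = indices.filter((index) => index > maxExpectedCode);
  return { length: table.length, defined: indices.length, outliers };
}

// Example with made-up data shaped like the reported table.
const sample = [];
sample[228] = 'ä';
sample[4294967292] = '?'; // made-up stray entry
console.log(summarizeTable(sample));
```

The outliers list would show which character codes inflate the length, and they can then be compared with the codes used for the affected letters.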
