Open
Labels
enhancement (New feature or request)
Description
I extract data from graphs in PDF files, where possible by disassembling the internal PDF commands; see https://shape-of-code.com/2013/12/19/converting-graphs-in-pdf-files-to-csv-format/
To date I have used qpdf, but the 'see content' tag in the browser output of pdfsyntax is wonderful :-)
The left yellow column currently contains the byte offset. Some suggestions for other information to include there:
- the page number
- the number of lines in the 'see content'
- the percentage of characters that are not digits or letters, i.e., a hint as to whether the content is text or a binary image
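To make the last two suggestions concrete, here is a minimal sketch of how they might be computed from the decoded bytes of a stream. It does not use any pdfsyntax API: `summarize_stream` and its result keys are hypothetical names, and it assumes the stream has already been decompressed into a `bytes` object. It takes "non-digits/letters" literally, so whitespace and punctuation also count as non-alphanumeric.

```python
import string

# ASCII letters and digits as byte values; everything else counts as "non-alphanumeric".
_ALNUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def summarize_stream(data: bytes) -> dict:
    """Hypothetical helper: line count and percentage of non-alphanumeric bytes."""
    if not data:
        return {"lines": 0, "non_alnum_pct": 0.0}
    # bytes.splitlines() treats CR, LF and CRLF as line endings,
    # which are the end-of-line markers PDF allows.
    lines = len(data.splitlines())
    alnum = sum(1 for b in data if b in _ALNUM)
    non_alnum_pct = 100.0 * (len(data) - alnum) / len(data)
    return {"lines": lines, "non_alnum_pct": non_alnum_pct}


if __name__ == "__main__":
    # A short fragment in the style of a page content stream.
    print(summarize_stream(b"1 0 0 1 72 720 cm\nBT\n"))
```

A content stream made of drawing operators should score well below a compressed or image stream, where byte values are spread over the whole 0-255 range; any cut-off between "text" and "binary" would need to be chosen by experiment.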
For an example of data that can be extracted, see "108 0 obj " in
http://www.diva-portal.org/smash/get/diva2:509883/FULLTEXT01.pdf