Left yellow column suggested information #7

@Derek-Jones

I extract data from graphs in PDF files, where possible by disassembling the internal PDF commands; see https://shape-of-code.com/2013/12/19/converting-graphs-in-pdf-files-to-csv-format/

To date I have used qpdf, but the 'see content' tag in the browser output of pdfsyntax is wonderful :-)

The left yellow column currently contains the byte offset. Some suggestions for other information to include:

  • the page number
  • the number of lines in the 'see content'
  • the percentage of characters that are not digits or letters, i.e., an indication of whether the content is text or a binary image
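
As a rough illustration of the last two suggestions, here is a minimal sketch in Python. It assumes the decoded stream content is already available as a `bytes` value; the function name `summarize_stream` and the returned keys are hypothetical and are not part of pdfsyntax. No threshold for "text" versus "binary" is proposed here, since that would need tuning against real files.

```python
import string

# ASCII letters and digits as byte values (ints), for fast membership tests.
ALNUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def summarize_stream(data: bytes) -> dict:
    """Hypothetical helper: line count and percentage of non-alphanumeric bytes."""
    if not data:
        return {"lines": 0, "non_alnum_pct": 0.0}
    # bytes.splitlines() treats LF, CR and CRLF as line boundaries,
    # which covers the end-of-line markers used in PDF content streams.
    lines = len(data.splitlines())
    non_alnum = sum(1 for b in data if b not in ALNUM)
    return {"lines": lines, "non_alnum_pct": 100.0 * non_alnum / len(data)}


if __name__ == "__main__":
    # A tiny path-drawing fragment versus every possible byte value.
    print(summarize_stream(b"10 20 m\n30 40 l\nS\n"))
    print(summarize_stream(bytes(range(256))))
```

Note that drawing commands contain many spaces and decimal points, so even a pure text stream will show a fairly high non-alphanumeric percentage; the figure is only useful relative to what compressed or image data produces.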

For an example of data that can be extracted, see "108 0 obj" in
http://www.diva-portal.org/smash/get/diva2:509883/FULLTEXT01.pdf

Labels

enhancement (New feature or request)