@fzzylogic

Added some low-hanging-fruit file types ported from the Go version. Put Dcm under archive, as on the Go side. Did not improve Matroska detection or add docx, xlsx, or pptx. Nice lib ^^.

@codefo

codefo commented Nov 27, 2019

Hey @h2non, what do we need to do to merge this PR? Maybe I can help 🙂

@ixna

ixna commented Nov 28, 2019

https://github.com/h2non/filetype.py/pull/41/files#diff-ad453f8a0e9dcc5a7320fb8fa6e98de5R96-R99

doc, xls and ppt all have the same file signature, so whichever of them is checked, the file will always be detected as doc, because that type is evaluated first.

Edit:
The same applies to docx, pptx and xlsx, which the current repository detects as a zip archive.
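
To illustrate the docx/pptx/xlsx case: these files are ZIP containers, so the leading bytes alone cannot tell them apart, but the entry names inside the archive can. Below is a minimal sketch using only the standard library. `guess_ooxml` and `OOXML_MARKERS` are hypothetical names, not part of filetype.py, and the approach assumes the whole file is available rather than just a header.

```python
import io
import zipfile

# Hypothetical marker table (not part of filetype.py): the main part that
# each Office Open XML format conventionally stores inside its ZIP container.
OOXML_MARKERS = {
    "word/document.xml": "docx",
    "xl/workbook.xml": "xlsx",
    "ppt/presentation.xml": "pptx",
}


def guess_ooxml(data: bytes):
    """Return 'docx', 'xlsx', 'pptx', 'zip', or None for non-ZIP input."""
    # Local file header signature of a non-empty ZIP archive.
    if not data.startswith(b"PK\x03\x04"):
        return None
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return None
    for marker, extension in OOXML_MARKERS.items():
        if marker in names:
            return extension
    # A valid ZIP without any known Office part stays a plain archive.
    return "zip"
```

The trade-off is that this needs the complete archive, which a matcher that only looks at the first bytes of a file does not have.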

So I am not sure how to implement detection for the MS Office document types.

Update:
I think it's better to group the doc file signature (magic number) under a single type, application/x-ole-storage, and determine the specific type from the filename extension.
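
A minimal sketch of that idea, assuming the caller can pass a filename alongside the header bytes. `guess_legacy_office` and `LEGACY_OFFICE_MIME` are hypothetical names used for illustration, not filetype.py API. The signature is the standard OLE2 compound-file magic number.

```python
import os

# Magic number shared by every OLE2 compound file (doc, xls, ppt, and others).
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
OLE_MIME = "application/x-ole-storage"

# Hypothetical extension table used to refine the generic group type.
LEGACY_OFFICE_MIME = {
    ".doc": "application/msword",
    ".xls": "application/vnd.ms-excel",
    ".ppt": "application/vnd.ms-powerpoint",
}


def guess_legacy_office(header: bytes, filename: str = ""):
    """Return a MIME type for OLE2 input, or None if the signature is absent."""
    if not header.startswith(OLE_SIGNATURE):
        return None
    extension = os.path.splitext(filename)[1].lower()
    # Fall back to the group type when the extension is missing or unknown.
    return LEGACY_OFFICE_MIME.get(extension, OLE_MIME)
```

With no filename, or an unrecognized extension, the function returns the group type, so detection from bytes alone stays honest about what it can know.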

- **ttf** - ``application/font-sfnt``
- **otf** - ``application/font-sfnt``

Document
Group the pre-2007 Microsoft Office documents as application/x-ole-storage, since these three types share the same file signature. We can determine the file type from the filename extension.


# Supported application types
DOCUMENT = (
document.Doc(),
The Doc type will also match xls and ppt because they share the same file signature, so there is no need to define the others.
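
The effect can be reproduced with a small stand-alone sketch. The `Matcher` class and `first_match` function below are hypothetical stand-ins for a first-match-wins lookup over a tuple of matchers. They are not the library's actual classes.

```python
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


class Matcher:
    """Hypothetical header-only matcher: compares the leading bytes only."""

    def __init__(self, extension, signature):
        self.extension = extension
        self.signature = signature

    def match(self, buf):
        return buf.startswith(self.signature)


# Three matchers with the same signature, evaluated in order.
MATCHERS = (
    Matcher("doc", OLE_SIGNATURE),
    Matcher("xls", OLE_SIGNATURE),
    Matcher("ppt", OLE_SIGNATURE),
)


def first_match(buf):
    """Return the extension of the first matcher that accepts buf, else None."""
    for matcher in MATCHERS:
        if matcher.match(buf):
            return matcher.extension
    return None
```

Because all three signatures are identical, the `xls` and `ppt` entries are unreachable: any OLE2 buffer is claimed by `doc` first.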
