Hello,
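I've noticed that the current regex used to extract the Content-Type does not cover all content types: the \w+/\w+ part rejects characters such as - and . in the type and subtype, and \s+ requires at least one whitespace character after the colon.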
Current regex:
https://github.com/laluka/bypass-url-parser/blob/main/src/bypass_url_parser/__init__.py#L1495
REGEX_CONTENT_TYPE = re.compile(r"Content-Type:\s+(\w+/\w+)", re.IGNORECASE)
This can be reproduced on regex101.com too.
As a result, the content type in the output can end up empty or truncated.
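To make the failure modes concrete, here is a minimal sketch that runs the current pattern against a few header lines. The sample headers and the extract helper are made up for illustration and are not taken from the project.

```python
import re

# Pattern copied from the issue text (the current one in the project).
REGEX_CONTENT_TYPE = re.compile(r"Content-Type:\s+(\w+/\w+)", re.IGNORECASE)


def extract(header_line):
    """Hypothetical helper: return the captured type, or "" if nothing matches."""
    match = REGEX_CONTENT_TYPE.search(header_line)
    return match.group(1) if match else ""


# Illustrative header lines, not real tool output.
samples = [
    "Content-Type: text/html; charset=utf-8",           # parameters are dropped
    "Content-Type: application/x-www-form-urlencoded",  # cut at the first "-"
    "Content-Type: application/vnd.api+json",           # cut at the first "."
    "Content-Type:text/plain",                          # no space after the colon
    "content-type: x-custom/data",                      # "-" in the type part
]

results = {line: extract(line) for line in samples}
for line, value in results.items():
    print(f"{line!r} -> {value!r}")
```

With these inputs, the last two lines yield an empty string, and the second and third are cut short (application/x and application/vnd).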
Also, header names are case-insensitive and curl may print them in lowercase (for example over HTTP/2), so the match has to stay case-insensitive. To address all of this, I suggest the following regex:
(?i)content-type:\s*([-\w.]+/[-\w.]+(?:\s*;\s*[\w-]+=(?:\"[^\"]*\"|[^\s;]*))*)
So, the code would be:
REGEX_CONTENT_TYPE = re.compile(r'(?i)content-type:\s*([-\w.]+/[-\w.]+(?:\s*;\s*[\w-]+=(?:\"[^\"]*\"|[^\s;]*))*)')
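A quick way to check the suggested pattern is to run it on a few header lines, as in the sketch below. The sample headers, the extract helper and the PLUS_VARIANT pattern are assumptions added for illustration. PLUS_VARIANT is not part of the suggestion above: it only adds + to the character classes, because the suggested pattern stops at the + in types such as application/vnd.api+json.

```python
import re

# Suggested pattern from the issue, split over two lines for readability.
PROPOSED = re.compile(
    r'(?i)content-type:\s*'
    r'([-\w.]+/[-\w.]+(?:\s*;\s*[\w-]+=(?:\"[^\"]*\"|[^\s;]*))*)'
)

# Hypothetical variant (an assumption, not from the issue): also accept "+"
# so that structured suffixes such as "+json" are kept.
PLUS_VARIANT = re.compile(
    r'(?i)content-type:\s*'
    r'([-\w.+]+/[-\w.+]+(?:\s*;\s*[\w-]+=(?:\"[^\"]*\"|[^\s;]*))*)'
)


def extract(pattern, header_line):
    """Hypothetical helper: return the captured value, or "" if nothing matches."""
    match = pattern.search(header_line)
    return match.group(1) if match else ""


# Illustrative header lines, not real tool output.
samples = [
    "Content-Type: text/html; charset=utf-8",
    "content-type:application/x-www-form-urlencoded",
    'Content-Type: multipart/form-data; boundary="abc 123"',
    "Content-Type: application/vnd.api+json",
]

for line in samples:
    print(f"{line!r}")
    print(f"  proposed:     {extract(PROPOSED, line)!r}")
    print(f"  plus variant: {extract(PLUS_VARIANT, line)!r}")
```

On these samples, the suggested pattern keeps the parameters and the hyphenated subtype, and it matches without a space after the colon. It returns application/vnd.api for the last line, while the variant returns the full application/vnd.api+json.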