JSON parsing fails with "lone leading surrogate in hex escape" while the standard json.loads doesn't #120

@lindycoder

Hello,

In our migration to pydantic 2, we found a JSON document that pydantic 1 could load but pydantic 2 can't, failing with the error:

Invalid JSON: lone leading surrogate in hex escape at line...

Here's a simple way of reproducing:

import json

from pydantic_core import from_json

data = rb'{"test": "text\udce2\udc80\udc99text"}'

print(json.loads(data))
print(from_json(data))

The first print, using Python's json module, works:

{'test': 'text\udce2\udc80\udc99text'}

The second one, using pydantic_core (used by pydantic 2), raises:

Traceback (most recent call last):
  File "check.py", line 7, in <module>
    print(from_json(data))
          ^^^^^^^^^^^^^^^
ValueError: lone leading surrogate in hex escape at line 1 column 20
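
For context, here is a stdlib-only sketch of what json.loads does with this payload. It shows that each \uXXXX escape is kept as a lone surrogate code point and that the resulting str cannot be encoded as strict UTF-8, which may be why a stricter parser rejects it. The last step rests on an assumption not confirmed by anything above: that the surrogates were produced upstream by decoding raw bytes with errors="surrogateescape". The names `lone`, `strict_ok` and `recovered` are only illustrative.

```python
import json

# Raw bytes literal: the payload holds literal \uXXXX escapes, as in the reproduction.
data = rb'{"test": "text\udce2\udc80\udc99text"}'

value = json.loads(data)["test"]

# json.loads keeps each escape as a lone surrogate code point in the str.
lone = [hex(ord(ch)) for ch in value if 0xD800 <= ord(ch) <= 0xDFFF]
print(lone)

# A str containing lone surrogates cannot be encoded as strict UTF-8.
try:
    value.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)

# ASSUMPTION: the surrogates are "surrogateescape" stand-ins for raw bytes.
# Under that assumption, re-encoding with the same error handler restores the
# original bytes, which can then be decoded as ordinary UTF-8.
recovered = value.encode("utf-8", "surrogateescape").decode("utf-8")
print(recovered)
```

If that assumption holds for a given document, cleaning the data at its source would let both parsers accept it. It does not change how from_json behaves on the original input.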

Here are the versions:

Python 3.12.2
pydantic 2.8.2
pydantic-core 2.20.1

Thank you!