From 1af1f23cc076bcba0853c49cca7e18a3bf5fc16b Mon Sep 17 00:00:00 2001 From: Jason Desrosiers Date: Wed, 6 Aug 2025 12:19:14 -0700 Subject: [PATCH 1/2] New Security Considerations --- specs/jsonschema-core.md | 149 ++++++++++++++++++++++++++++++++------- 1 file changed, 123 insertions(+), 26 deletions(-) diff --git a/specs/jsonschema-core.md b/specs/jsonschema-core.md index 2cf7f4ce..8c35b966 100644 --- a/specs/jsonschema-core.md +++ b/specs/jsonschema-core.md @@ -2029,32 +2029,129 @@ SHOULD use the terms defined by this document to do so. ## Security Considerations {#security} -Both schemas and instances are JSON values. As such, all security considerations -defined in [RFC 8259][rfc8259] apply. - -Instances and schemas are both frequently written by untrusted third parties, to -be deployed on public Internet servers. Implementations should take care that -the parsing and evaluating against schemas does not consume excessive system -resources. Implementations MUST NOT fall into an infinite loop. - -A malicious party could cause an implementation to repeatedly collect a copy of -a very large value as an annotation. Implementations SHOULD guard against -excessive consumption of system resources in such a scenario. - -Servers MUST ensure that malicious parties cannot change the functionality of -existing schemas by uploading a schema with a pre-existing or very similar -`$id`. - -Individual JSON Schema extensions are liable to also have their own security -considerations. Consult the respective specifications for more information. - -Schema authors should take care with `$comment` contents, as a malicious -implementation can display them to end-users in violation of a spec, or fail to -strip them if such behavior is expected. - -A malicious schema author could place executable code or other dangerous -material within a `$comment`. Implementations MUST NOT parse or otherwise take -action based on `$comment` contents. +While schemas and instances are not always represented as JSON text, they are +defined in terms of the JSON data model. As such, the security considerations +defined in [RFC 8259][rfc8259] may still apply in environments where text-based +representations are used, particularly those considerations related to parsing, +number precision, and structural limitations. + +Schemas and instances are frequently authored by untrusted parties. +Implementations that accept or evaluate such inputs may be exposed to several +classes of attack, particularly denial-of-service (DoS) by means of resource +exhaustion. + +### Nested `anyOf`/`oneOf` + +One risk for resource exhaustion in JSON Schema arises from the nested use of +`anyOf` and `oneOf`. While a single combinator keyword with multiple subschemas +is typically manageable, nesting them causes the number of evaluation paths to +grow exponentially. + +For example, a `oneOf` with 5 subschemas, each containing another `oneOf` with 5 +options, results in 25 evaluation paths. Adding a third level increases this to +125, and so on. Attackers can exploit this by crafting schemas that force +validators to explore a large number of branches. + +This evaluation explosion is particularly dangerous when each path involves +expensive work such as collecting large annotations or evaluating complex +regular expressions. These effects multiply across paths and can result in +excessive CPU or memory consumption, leading to denial-of-service. + +Implementations that evaluate untrusted schema are encouraged to take steps to +mitigate these threats with measures such as bounding combinator keyword depth +and breadth, limiting memory used for annotation collection, and guarding +against resource-intensive validations such as pathological regexes. + +### Dynamic References + +The paper ["The Complexity of JSON Schema: Undecidable, Expensive, Yet +Tractable" (Caroni et al., 2024)](https://doi.org/10.1145/3632891) has shown +that validation in the presence of dynamic references is PSPACE-complete. The +paper describes a method for replacing dynamic references with static ones, but +doing so can cause the size of the schema to grow exponentially. Implementations +should be aware of this risk and may wish to implement the method described in +the paper or impose limits on dynamic reference resolution. + +### Infinite Loops and Cycles + +Infinite loops can occur when evaluating schemas that produce cycles during +reference resolution. These cycles may involve multiple schemas. Not all +recursive schemas create loops, but implementations are advised to detect and +break these cycles when they are encountered. + +### Schema Identity and Collisions + +Schemas may declare an `$id` to identify themselves or have embedded schemas +that declare an `$id`. An attacker may attempt to register a schema with an +`$id` that collides with a previously registered schema, or that differs only by +case, encoding, or other URI normalization quirks. Such collisions could result +in overwriting or shadowing of trusted schemas. + +Implementations should consider rejecting schemas that have identifiers +(including embedded schema identifiers) that conflict with registered schemas +and should apply consistent URI normalization and comparison logic to detect and +prevent conflicts. + +### External Schema Resolution + +JSON Schema implementations are expected to resolve external references using a +local registry. Although the specification allows for dynamic retrieval +(`https:` to fetch schemas over HTTP, or `file:` to read schemas from disk), +this behavior is discouraged unless it's intrinsic to the use case, such as with +JSON Hyper-Schema. + +Resolving schemas dynamically introduces several security concerns, each of +which can be mitigated by limiting or controlling resolution behavior. A tightly +scoped schema resolution policy significantly reduces the attack surface, +especially when validating untrusted data. + +Implementations are advised to disable dynamic retrieval by default and limit +external schema resolution to the local registry unless dynamic retrieval is +explicitly enabled. If enabled, they should consider limiting the number of +dynamic retrievals a validation can perform and defining timeouts on dynamic +retrievals to reduce the risk of resource exhaustion. + +#### HTTP(S) Specific Threats + +Allowing schema references to resolve over HTTP or HTTPS introduces several +threats: + +* **Denial of Service (DoS)**: Validation may hang or become slow if a + referenced schema URL is slow to respond or never returns. +* **Server-Side Request Forgery (SSRF)**: Malicious schemas can reference + internal-only services using hostnames like localhost or private IPs. + Implementations are advised to restrict HTTP schema retrieval to a + configurable allowlist of trusted domains. +* **Lack of Integrity Guarantees**: Retrieved schemas may be altered in transit + or change between validations. If network retrieval is allowed, + implementations are advised to only allow retrieval over HTTPS unless + specifically configured to allow unsecured transport. + +#### File System Specific Threats + +Allowing resolution from the local filesystem (`file:` URIs) raises different +issues: + +* **Information Disclosure**: Malicious schemas may access sensitive files on + the system. Implementations should consider restricting filesystem access to + a specific schema directory tree. +* **Cross-Context Access**: A schema fetched from HTTP may try to reference a + schema on the filesystem. Implementations are advised to allow resolving + `file:` references only when the referencing schema was itself loaded from the + file system, similar to same-origin policies in web browsers. +* **Exposing Internal Paths**: Schemas that use `file:` URIs may reveal + host-specific filesystem details in two ways: through the `$id` itself or + through schema locations in validation output. Implementations are advised to + reject `$id` values that use the `file:` scheme. If `file:` URIs are permitted + internally, implementations are advised to sanitize them (for example, by + converting them to relative URIs) to avoid exposing host filesystem structure + to users. + +### Vocabulary-Specific Risks + +Third-party JSON Schema vocabularies may introduce additional risks. +Implementers are advised to consult the specifications of any extensions they +support and take into account their security considerations as well. ## IANA Considerations From 62dde431a817912e1efe8738a33e7939e92f903a Mon Sep 17 00:00:00 2001 From: Jason Desrosiers Date: Fri, 8 Aug 2025 11:08:52 -0700 Subject: [PATCH 2/2] Clarify reference resolution cycle warning --- specs/jsonschema-core.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/specs/jsonschema-core.md b/specs/jsonschema-core.md index 8c35b966..c2395565 100644 --- a/specs/jsonschema-core.md +++ b/specs/jsonschema-core.md @@ -2076,8 +2076,8 @@ the paper or impose limits on dynamic reference resolution. Infinite loops can occur when evaluating schemas that produce cycles during reference resolution. These cycles may involve multiple schemas. Not all -recursive schemas create loops, but implementations are advised to detect and -break these cycles when they are encountered. +recursive schemas create loops, but implementations are advised to detect these +cycles and terminate evaluation when they are encountered. ### Schema Identity and Collisions