Skip to content

Conversation

@pulphix
Copy link
Contributor

@pulphix pulphix commented Nov 17, 2025

Summary

This PR adds an allOf flattener for JSON Schemas used by PydanticAI tools and structured outputs, and keeps the repo compliant with contributing.md (tests + pre-commit all green).

Motivation

Many OpenAI(-compatible) providers, especially in strict structured-output mode, expect a flat object schema and either reject or mis-handle JSON Schema combinators like allOf / oneOf / anyOf in tool/function parameters. Without flattening, perfectly valid schemas generated by Pydantic can cause API errors or unstable model behavior.

What this PR does

allOf flattener for object schemas

In _json_schema.py, we add logic to rewrite allOf between object schemas into a single type: "object" schema that is semantically equivalent in the supported cases:

  • Merges properties
  • Unions required
  • Merges patternProperties
  • Combines additionalProperties conservatively:
    • false only if all members have additionalProperties: false
    • otherwise, falls back to a more permissive setting

Additional behavior:

  • The transformation is recursive: it also applies to nested objects and array items.
  • If an allOf contains non-object members or obviously conflicting structures, the flattener leaves it untouched to avoid changing semantics.

Provider integration

  • OpenAIJsonSchemaTransformer now applies the flattener at the start of transform, so schemas passed to OpenAI (and OpenAI-compatible providers) no longer contain object-level allOf before strict-mode adjustments.
  • GoogleJsonSchemaTransformer does the same before applying Gemini-specific schema tweaks (inlining $defs, removing unsupported keywords, etc.).

Testing

  • All existing tests pass.
  • New tests were added/updated to cover:
    • Flattening of simple and nested object-level allOf
    • Preservation of schemas when allOf contains non-object members or conflicting structures
  • pre-commit hooks run clean, keeping the repo compliant with contributing.md.

@DouweM
Copy link
Collaborator

DouweM commented Nov 17, 2025

Many OpenAI(-compatible) providers, especially in strict structured-output mode, expect a flat object schema and either reject or mis-handle JSON Schema combinators like allOf / oneOf / anyOf in tool/function parameters. Without flattening, perfectly valid schemas generated by Pydantic can cause API errors or unstable model behavior.

@pulphix Can you please share examples of schemas that currently result in API errors, that this PR fixes? If this only applies to OpenAI-"compatible" APIs and not OpenAI itself, we'll need to make sure it only applies to those APIs.


def transform(self, schema: JsonSchema) -> JsonSchema:
# Flatten object-only allOf to improve compatibility
schema = flatten_allof(schema)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not touch the Google transformer as we have a lot of changes underway in #3357.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed Change


def transform(self, schema: JsonSchema) -> JsonSchema: # noqa C901
# First, flatten object-only allOf constructs to avoid unsupported combinators in strict mode
schema = flatten_allof(schema)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be implemented on JsonSchemaTransformer with a keyword argument we can pass in __init__, similar to existing transformation flags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

transformer = OpenAIJsonSchemaTransformer(schema, strict=True)

# Mock super().walk() to return a result without $defs to test the fallback path
with patch.object(transformer.__class__.__bases__[0], 'walk', return_value={'$ref': '#/$defs/MyModel'}):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is a bit unclear to me and transformer.__class__.__bases__[0] feels a bit brittle. What scenario are we testing exactly? Can't we get there without patching?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a separate test file here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

],
}

transformer = transformer_factory(schema)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary as we only test one transformer :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

transformed = transformer.walk()

# allOf should have been flattened by the transformer
assert 'allOf' not in transformed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use == snapshot() as much as possible so we get to see the exact result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

],
}

flattened = flatten_allof(copy.deepcopy(schema))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to not use the function directly, only the transformer classes.

And as mentioned above: please use a single snapshot() assertion (using the inline-snapshot library) rather than many asserts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@pulphix
Copy link
Contributor Author

pulphix commented Nov 17, 2025

Many OpenAI(-compatible) providers, especially in strict structured-output mode, expect a flat object schema and either reject or mis-handle JSON Schema combinators like allOf / oneOf / anyOf in tool/function parameters. Without flattening, perfectly valid schemas generated by Pydantic can cause API errors or unstable model behavior.

@pulphix Can you please share examples of schemas that currently result in API errors, that this PR fixes? If this only applies to OpenAI-"compatible" APIs and not OpenAI itself, we'll need to make sure it only applies to those APIs.

I'm trying to integrate with notion MCP and getting the following error:

pydantic_ai.exceptions.ModelHTTPError: status_code: 400, model_name: gpt-5-nano-2025-08-07, body: {'message': "Invalid schema for function 'notion__notion-update-page': In context=('properties', 'data'), 'allOf' is not permitted.", 'type': 'invalid_request_error', 'param': 'tools[28].function.parameters', 'code': 'invalid_function_parameters'}
{
  "type": "function",
  "function": {
    "name": "notion__notion-update-page",
    "description": "## Overview\nUpdate a Notion page's properties or content.\n## Properties\nNotion page properties are a JSON map of property names to SQLite values.\nFor pages in a database:\n- ALWAYS use the \"fetch\" tool first to get the data source schema and the\texact property names.\n- Provide a non-null value to update a property's value.\n- Omitted properties are left unchanged.\n\n**IMPORTANT**: Some property types require expanded formats:\n- Date properties: Split into \"date:{property}:start\", \"date:{property}:end\" (optional), and \"date:{property}:is_datetime\" (0 or 1)\n- Place properties: Split into \"place:{property}:name\", \"place:{property}:address\", \"place:{property}:latitude\", \"place:{property}:longitude\", and \"place:{property}:google_place_id\" (optional)\n- Number properties: Use JavaScript numbers (not strings)\n- Checkbox properties: Use \"__YES__\" for checked, \"__NO__\" for unchecked\n\n**Special property naming**: Properties named \"id\" or \"url\" (case insensitive) must be prefixed with \"userDefined:\" (e.g., \"userDefined:URL\", \"userDefined:id\")\nFor pages outside of a database:\n- The only allowed property is \"title\",\twhich is the title of the page in inline markdown format.\n\n## Content\nNotion page content is a string in Notion-flavored Markdown format. See the \"create-pages\" tool description for the full enhanced Markdown spec.\nBefore updating a page's content with this tool, use the \"fetch\" tool first to get the existing content to find out the Markdown snippets to use in the \"replace_content_range\" or \"insert_content_after\" commands.\n## Examples\n\t\t<example description=\"Update page properties\">\n\t\t{\n\t\t\t\"page_id\": \"f336d0bc-b841-465b-8045-024475c079dd\",\n\t\t\t\"command\": \"update_properties\",\n\t\t\t\"properties\": {\n\t\t\t\t\"title\": \"New Page Title\",\n\t\t\t\t\"status\": \"In Progress\",\n\t\t\t\t\"priority\": 5,\n\t\t\t\t\"checkbox\": \"__YES__\",\n\t\t\t\t\"date:deadline:start\": \"2024-12-25\",\n\t\t\t\t\"date:deadline:is_datetime\": 0,\n\t\t\t\t\"place:office:name\": \"HQ\",\n\t\t\t\t\"place:office:latitude\": 37.7749,\n\t\t\t\t\"place:office:longitude\": -122.4194\n\t\t\t}\n\t\t}\n\t\t</example>\n\t\t<example description=\"Replace the entire content of a page\">\n\t\t{\n\t\t\t\"page_id\": \"f336d0bc-b841-465b-8045-024475c079dd\",\n\t\t\t\"command\": \"replace_content\",\n\t\t\t\"new_str\": \"# New Section\nUpdated content goes here\"\n\t\t}\n\t\t</example>\n\t\t<example description=\"Replace specific content in a page\">\n\t\t{\n\t\t\t\"page_id\": \"f336d0bc-b841-465b-8045-024475c079dd\",\n\t\t\t\"command\": \"replace_content_range\",\n\t\t\t\"selection_with_ellipsis\": \"# Old Section...end of section\",\n\t\t\t\"new_str\": \"# New Section\nUpdated content goes here\"\n\t\t}\n\t\t</example>\n\t\t<example description=\"Insert content after specific text\">\n\t\t{\n\t\t\t\"page_id\": \"f336d0bc-b841-465b-8045-024475c079dd\",\n\t\t\t\"command\": \"insert_content_after\",\n\t\t\t\"selection_with_ellipsis\": \"## Previous section...\",\n\t\t\t\"new_str\": \"\n## New Section\nContent to insert goes here\"\n\t\t}\n\t\t</example>\n**Note**: For selection_with_ellipsis, provide only the first ~10 characters, an ellipsis, and the last ~10 characters. Ensure the selection is unique; use longer snippets if needed to avoid ambiguity.",
    "parameters": {
      "type": "object",
      "properties": {
        "data": {
          "allOf": [
            {
              "type": "object",
              "properties": {
                "page_id": {
                  "type": "string",
                  "description": "The ID of the page to update, with or without dashes."
                }
              },
              "required": [
                "page_id"
              ]
            },
            {
              "anyOf": [
                {
                  "type": "object",
                  "properties": {
                    "command": {
                      "type": "string",
                      "enum": [
                        "update_properties"
                      ]
                    },
                    "properties": {
                      "type": "object",
                      "additionalProperties": {
                        "type": [
                          "string",
                          "number",
                          "null"
                        ]
                      },
                      "description": "A JSON object that updates the page's properties. For pages in a database, use the SQLite schema definition shown in <database>. For pages outside of a database, the only allowed property is \"title\", which is the title of the page in inline markdown format. Use null to remove a property's value."
                    }
                  },
                  "required": [
                    "command",
                    "properties"
                  ],
                  "additionalProperties": false
                },
                {
                  "type": "object",
                  "properties": {
                    "command": {
                      "type": "string",
                      "enum": [
                        "replace_content"
                      ]
                    },
                    "new_str": {
                      "type": "string",
                      "description": "The new string to replace all content with."
                    }
                  },
                  "required": [
                    "command",
                    "new_str"
                  ],
                  "additionalProperties": false
                },
                {
                  "type": "object",
                  "properties": {
                    "command": {
                      "type": "string",
                      "enum": [
                        "replace_content_range"
                      ]
                    },
                    "selection_with_ellipsis": {
                      "type": "string",
                      "description": "Unique start and end snippet of the string to replace in the page content, including whitespace. DO NOT provide the entire string to replace. Instead, provide up to the first ~10 characters of the string to replace, an ellipsis, and then up to the last ~10 characters of the string to replace. Make sure you provide enough of the start and end snippet to uniquely identify the string to replace. For example, to replace an entire section, use \"old_start_and_end_snippet\":\"# Section heading...last paragraph.\""
                    },
                    "new_str": {
                      "type": "string",
                      "description": "The new string to replace the old string with."
                    }
                  },
                  "required": [
                    "command",
                    "selection_with_ellipsis",
                    "new_str"
                  ],
                  "additionalProperties": false
                },
                {
                  "type": "object",
                  "properties": {
                    "command": {
                      "type": "string",
                      "enum": [
                        "insert_content_after"
                      ]
                    },
                    "selection_with_ellipsis": {
                      "type": "string",
                      "description": "Unique start and end snippet of the string to match in the page content, including whitespace. DO NOT provide the entire string to match. Instead, provide up to the first ~10 characters of the string to match, an ellipsis, and then up to the last ~10 characters of the string to match. Make sure you provide enough of the start and end snippet to uniquely identify the string to match. For example, to match an entire section, use \"selection_with_ellipsis\":\"# Section heading...last paragraph.\""
                    },
                    "new_str": {
                      "type": "string",
                      "description": "The new content to insert."
                    }
                  },
                  "required": [
                    "command",
                    "selection_with_ellipsis",
                    "new_str"
                  ],
                  "additionalProperties": false
                }
              ]
            }
          ],
          "description": "The data required for updating a page"
        }
      },
      "required": [
        "data"
      ],
      "additionalProperties": false
    },
    "strict": true
  }
}

@pulphix pulphix requested a review from DouweM November 18, 2025 12:28

JsonSchema = dict[str, Any]

__all__ = ['JsonSchemaTransformer', 'InlineDefsJsonSchemaTransformer', 'flatten_allof']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make flatten_allof private and not export it

return s


def flatten_allof(schema: JsonSchema) -> JsonSchema:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may not need this method at all if the JSON transformer can call _recurse_flatten_allof directly


# The transformer should resolve the $ref and use the transformed schema from $defs
# (not the original self.defs, which was the bug we fixed)
assert isinstance(result, dict)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use snapshot, I want to see the entire result

# Simulate edge case: super().walk() returns $defs without root_key
# This shouldn't happen in normal flow, but we test the fallback path
with patch.object(
transformer.__class__.__bases__[0],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel comfortable with this; if we can't come up with a real schema that gets us to that line, do we need it at all?

'a': {'type': 'string'},
'b': {'type': 'integer'},
},
'additionalProperties': True, # When any member has dict additionalProperties, result should be True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that true? If I understand the allOf schema correctly, it would only accept an object with a b key and no other keys, or an empty object.

{
'type': 'object',
# No properties, required, or patternProperties
'additionalProperties': False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this imply that no properties are allowed at all?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants