Skip to content

Conversation

@btucker
Copy link
Owner

@btucker btucker commented Nov 7, 2025

Summary by CodeRabbit

  • New Features

    • Tool-calling: decorator-based tool registration, runtime tool execution, transcript retrieval, and a new tool-call error type.
    • Session now accepts an optional config and unifies generation parameter handling (temperature/max_tokens).
  • Documentation

    • README and API docs updated with usage examples and a comprehensive tool-calling example.
  • Tests

    • New tests for tool registration, execution, transcript contents, and end-to-end integration.
  • Chores

    • Version bump and packaging/workflow updates for source distribution.

…lugin)

Implemented complete tool calling support for apple-foundation-models-py
following Pydantic AI's decorator-based pattern. All infrastructure is
ready and tested on macOS 26.1, but cannot be used yet due to Apple not
shipping the FoundationModelsMacros compiler plugin required by the
@generable macro.

## What's Implemented

### Python API Layer (applefoundationmodels/)
- Session.tool() decorator for registering functions (session.py:279-368)
- Session.transcript property for conversation history access
- Automatic JSON schema extraction from type hints (tools.py)
- ToolCallError exception for tool-specific errors

### Cython FFI Layer (_foundationmodels.pyx/.pxd)
- Tool callback wrapper for Python↔C marshalling
- register_tools() function with callback bridge
- get_transcript() function for history access
- C declarations in header file

### Swift Layer (foundation_models.swift)
- Tool callback infrastructure with proper types
- Transcript access exposing all entry types (.instructions, .prompt,
  .response, .toolCalls, .toolOutput)
- Session creation with tool support (disabled until macro available)
- PythonToolWrapper implementation (commented out)

### Documentation
- Comprehensive tool calling guide in TOOL_CALLING.md
- README.md updated with API examples and current status
- Verified on macOS 26.1 (Build 25B78)

## Current Status

Tool execution is blocked because:
- The @generable macro requires FoundationModelsMacros compiler plugin
- This plugin is NOT included in macOS 26.1 (Build 25B78)
- Apple explicitly requires @generable structs for Tool.Arguments
- Tested with simple types - all blocked with clear error messages

Error from Apple's framework:
```
error: 'Tool' that uses 'String' as 'Arguments' type is unsupported.
Use '@generable' struct instead.
```

## What Works

✅ Complete Python API - decorator registration, schema extraction
✅ Cython FFI layer - callback marshalling, tool registration
✅ Swift infrastructure - callbacks, transcript access
✅ Builds successfully - wheel compiles without errors
✅ All code is production-ready and tested

## When It Will Work

The implementation will work immediately when Apple ships the
FoundationModelsMacros plugin. No code changes needed - just uncomment
the PythonToolWrapper struct and enable tool registration.

Related: FB[pending] - Missing FoundationModelsMacros plugin in macOS 26.1
Major improvements:
- Modified setup.py to detect and use Xcode toolchain for @generable macro support
- Fixed transcript JSON serialization with proper NSString casting
- Implemented session recreation when tools are registered
- Tools are now successfully called by the model

Tool calling flow now works:
1. Tools registered via @session.tool decorator
2. Session recreated with tools enabled in FoundationModels
3. Model successfully invokes tools when appropriate
4. Python callback executes and returns results to model

Current limitation:
- @generable Arguments struct is empty, so parameters are not yet extracted
- Tool callbacks receive empty JSON ({}) instead of actual parameters
- Need to solve how to pass arbitrary parameters through @generable

Tested with tools that require no parameters - works perfectly.
Tools with parameters are called but don't receive parameter values yet.

This enables basic tool calling infrastructure. Parameter passing to be
addressed in follow-up work.
…onSchema

BREAKTHROUGH: Tool calling now fully operational with arbitrary parameters!

The solution uses GeneratedContent as the Arguments type, which:
- Is already Generable (conforms to ConvertibleFromGeneratedContent)
- Has a jsonString property for direct JSON extraction
- Supports dynamic schemas via GenerationSchema(root: DynamicGenerationSchema)

Key changes:
1. PythonToolWrapper now uses GeneratedContent instead of @generable struct
2. Parameters defined using DynamicGenerationSchema at runtime
3. JSON parameters extracted via arguments.jsonString
4. Fixed Cython buffer writing using memcpy instead of byte-by-byte

This elegantly combines structured output patterns with tool calling,
allowing arbitrary parameter schemas defined at registration time.

Tested with 5 different tool signatures:
✅ No parameters
✅ Single string parameter
✅ Multiple string parameters
✅ Mixed types (string + int)
✅ Optional parameters with defaults

All tools successfully registered, called by model, and executed with
correct parameter values. This completes the tool calling implementation.
Demonstrates all supported tool parameter patterns:
- No parameters
- Single parameters
- Multiple parameters
- Mixed types (string + int)
- Optional parameters

Each example includes docstrings and shows real-world use cases.
- Removed 'unavailable' notice - tool calling now works!
- Updated examples section with tool_calling_comprehensive.py
- Added supported parameter types section
- Removed outdated status information about waiting for Apple's plugin
@coderabbitai
Copy link

coderabbitai bot commented Nov 7, 2025

Walkthrough

Adds end-to-end tool-calling: Python tool decorator and schema extraction, session-level tool registration and transcript access, Cython/Swift FFI callback ABI and new error codes, example and tests, new ToolCallError and NormalizedGenerationParams, plus packaging/CI adjustments.

Changes

Cohort / File(s) Summary
Documentation
README.md
Adds Tool Calling docs and examples; documents Session.tool(description: str = None, name: str = None), Session.transcript property, and ToolCallError; updates Examples to examples/tool_calling_comprehensive.py.
Python session & client API
applefoundationmodels/session.py, applefoundationmodels/client.py, applefoundationmodels/exceptions.py
Session.__init__ gains optional config; adds _tools, _tools_registered, _config; adds tool() decorator, _register_tools() helper, and transcript property; Client.create_session forwards config; new ToolCallError exception class.
Tool utilities
applefoundationmodels/tools.py
New module: converts Python type hints to JSON Schema, extracts function schemas, attaches tool metadata, and exposes tool() decorator; raises ToolCallError on schema extraction/validation failures.
Cython FFI layer
applefoundationmodels/_foundationmodels.pxd, applefoundationmodels/_foundationmodels.pyx, applefoundationmodels/_foundationmodels.pyi
Adds tool-related error codes and BUFFER_TOO_SMALL; adds ai_tool_callback_t typedef; declares and implements apple_ai_register_tools()/register_tools() and apple_ai_get_transcript()/get_transcript(); implements callback wrapper that invokes Python-registered tools and writes results into buffers.
Swift native layer
applefoundationmodels/swift/foundation_models.h, applefoundationmodels/swift/foundation_models.swift
Adds C callback type and public APIs (apple_ai_register_tools/apple_ai_get_transcript), global registeredTools and PythonToolWrapper bridge, session creation updated to accept tools, transcript serialization includes tool_calls/tool_output, and AIResult error cases expanded.
Examples & tests
examples/tool_calling_comprehensive.py, tests/test_tools.py
New comprehensive example demonstrating varied tool signatures; tests covering registration, schema extraction, execution, transcript shape, and integration (readiness-gated).
Build & packaging
setup.py, pyproject.toml, MANIFEST.in, .github/workflows/publish-to-pypi.yml
Setup uses Xcode/sdk when available for Swift build; pyproject version bumped to 0.1.5 and package-data/exclude adjusted; MANIFEST.in now includes Swift sources; CI switched to build sdist and validate Swift sources.
Types
applefoundationmodels/types.py
Adds NormalizedGenerationParams dataclass with from_optional helper to normalize generation params for FFI; extends Result enum with BUFFER_TOO_SMALL.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant User as User code
    participant Session as Session (Python)
    participant Tools as Python Tool Registry
    participant FFI as Cython FFI
    participant Swift as Swift/Model

    User->>Session: decorate function with `@session.tool`
    Session->>Tools: extract schema & attach metadata
    Session->>FFI: register_tools(tools_json, callback)
    FFI->>Swift: apple_ai_register_tools(...)
    Swift-->>FFI: ack

    User->>Session: session.generate(prompt)
    Session->>Swift: invoke model (session with tools)
    Swift->>Swift: decides to call tool
    Swift->>FFI: call tool callback (tool_name, args_json)
    FFI->>Tools: invoke registered Python tool
    Tools-->>FFI: result_json or error
    FFI-->>Swift: return result or error code
    Swift->>Swift: record tool_call/tool_output in transcript
    Swift-->>Session: model response

    User->>Session: session.transcript
    Session->>FFI: get_transcript()
    FFI->>Swift: apple_ai_get_transcript()
    Swift-->>FFI: JSON transcript
    FFI-->>Session: decoded List[dict]
    Session-->>User: transcript list
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay attention to:
    • Cython ↔ Swift ABI: callback typedef, buffer sizing, ownership, and encoding/decoding.
    • Error-code mapping across .pxd/.pyx/.swift/.py and raising the correct exceptions (ToolCallError).
    • Schema extraction in applefoundationmodels/tools.py for Optional/Union, forward refs, nested containers.
    • Transcript JSON shape and ordering between native serialization and Python decoding.
    • Tests that depend on readiness gating (Client.is_ready()) and their flakiness.

Poem

🐇 I hop through types and stitch a schema bright,
I pass JSON carrots in the soft Swift light,
Callbacks hum, transcripts keep the trail,
Tools answer back in every little detail,
CodeRabbit cheers — the bridge is built just right!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Support tool calling' accurately summarizes the main objective of this pull request, which adds comprehensive tool calling functionality to the Apple Foundation Models library.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/tool-calling

Comment @coderabbitai help to get the list of available commands and usage tips.

Tool calling now works completely using GeneratedContent and
DynamicGenerationSchema. The TOOL_CALLING.md doc was written when
we thought it wouldn't work due to missing macro plugin, but that's
no longer true. All relevant documentation is now in README.md.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
setup.py (1)

44-60: Consider dynamic Xcode detection instead of hardcoded paths.

The current implementation checks for Xcode at a hardcoded path /Applications/Xcode.app and uses hardcoded SDK paths. This won't detect Xcode installations in non-standard locations or multiple Xcode versions.

Consider using xcode-select or xcrun to dynamically detect the active Xcode:

-    # Check for Xcode (required for @Generable macro support)
-    xcode_path = Path("/Applications/Xcode.app")
-    if xcode_path.exists():
-        # Use Xcode's toolchain which includes the FoundationModelsMacros plugin
-        swift_compiler = "xcrun"
-        sdk_args = [
-            "-sdk", "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk"
-        ]
-        env = {**os.environ, "DEVELOPER_DIR": "/Applications/Xcode.app/Contents/Developer"}
-        print("✓ Using Xcode toolchain (@Generable macro support enabled)")
+    # Check for Xcode (required for @Generable macro support)
+    try:
+        # Use xcode-select to find active Xcode
+        result = subprocess.run(["xcode-select", "-p"], capture_output=True, text=True, check=True)
+        developer_dir = result.stdout.strip()
+        
+        # Use xcrun to get SDK path
+        result = subprocess.run(["xcrun", "--show-sdk-path"], capture_output=True, text=True, check=True)
+        sdk_path = result.stdout.strip()
+        
+        swift_compiler = "xcrun"
+        sdk_args = ["-sdk", sdk_path]
+        env = {**os.environ, "DEVELOPER_DIR": developer_dir}
+        print("✓ Using Xcode toolchain (@Generable macro support enabled)")
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        # Fall back to command line tools
applefoundationmodels/_foundationmodels.pyx (1)

283-318: Document that register_tools replaces all existing tools.

Line 296 replaces the entire tool registry with _registered_tools = tools.copy(), discarding any previously registered tools. This appears intentional based on the session-level usage pattern, but should be documented to prevent confusion.

Consider adding a docstring note:

 def register_tools(tools: Dict[str, Callable]) -> None:
     """
     Register tool functions for model to call.
+    
+    Note: This replaces all previously registered tools. Tools should be
+    registered once per session with the complete set of available tools.

     Args:
         tools: Dictionary mapping tool names to callable functions
TOOL_CALLING.md (1)

47-51: Add language specifier to fenced code block.

The error message code block should specify a language for proper syntax highlighting.

-```
+```text
 external macro implementation type 'FoundationModelsMacros.GenerableMacro'
 could not be found for macro 'Generable(description:)'
 plugin for module 'FoundationModelsMacros' not found

</blockquote></details>
<details>
<summary>examples/tool_calling_comprehensive.py (1)</summary><blockquote>

`109-109`: **Rename unused loop variable.**

The `expected_tool` variable is defined but never used within the loop body. Python convention is to prefix unused variables with an underscore.



```diff
-        for i, (query, expected_tool) in enumerate(test_queries, 1):
+        for i, (query, _expected_tool) in enumerate(test_queries, 1):
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08406a9 and da17f8b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • README.md (4 hunks)
  • TOOL_CALLING.md (1 hunks)
  • applefoundationmodels/_foundationmodels.pxd (3 hunks)
  • applefoundationmodels/_foundationmodels.pyx (2 hunks)
  • applefoundationmodels/client.py (1 hunks)
  • applefoundationmodels/exceptions.py (1 hunks)
  • applefoundationmodels/session.py (5 hunks)
  • applefoundationmodels/swift/foundation_models.h (2 hunks)
  • applefoundationmodels/swift/foundation_models.swift (6 hunks)
  • applefoundationmodels/tools.py (1 hunks)
  • examples/tool_calling_comprehensive.py (1 hunks)
  • setup.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
examples/tool_calling_comprehensive.py (2)
applefoundationmodels/client.py (1)
  • Client (17-187)
applefoundationmodels/session.py (1)
  • transcript (355-377)
applefoundationmodels/client.py (1)
applefoundationmodels/session.py (1)
  • Session (24-377)
applefoundationmodels/tools.py (2)
applefoundationmodels/exceptions.py (1)
  • ToolCallError (92-95)
applefoundationmodels/session.py (2)
  • tool (281-332)
  • decorator (308-330)
applefoundationmodels/session.py (2)
applefoundationmodels/tools.py (2)
  • extract_function_schema (73-151)
  • tool (154-191)
applefoundationmodels/client.py (1)
  • create_session (125-164)
applefoundationmodels/swift/foundation_models.swift (2)
applefoundationmodels/session.py (2)
  • tool (281-332)
  • transcript (355-377)
applefoundationmodels/tools.py (1)
  • tool (154-191)
🪛 LanguageTool
TOOL_CALLING.md

[grammar] ~5-~5: Use a hyphen to join words.
Context: ...ntains a complete implementation of tool calling support for apple-foundation-mod...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
TOOL_CALLING.md

47-47: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.3)
examples/tool_calling_comprehensive.py

109-109: Loop control variable expected_tool not used within loop body

Rename unused expected_tool to _expected_tool

(B007)


117-117: Do not catch blind exception: Exception

(BLE001)

applefoundationmodels/tools.py

96-96: Do not catch blind exception: Exception

(BLE001)


144-148: Consider moving this statement to an else block

(TRY300)


150-150: Do not catch blind exception: Exception

(BLE001)


151-151: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


151-151: Avoid specifying long messages outside the exception class

(TRY003)

setup.py

77-77: subprocess call: check for execution of untrusted input

(S603)

🔇 Additional comments (13)
applefoundationmodels/_foundationmodels.pyx (3)

201-202: Verify thread-safety of global tool registry.

The global _registered_tools dictionary could face race conditions if multiple threads register or execute tools simultaneously. While Python's GIL provides some protection for dict operations, the registration-execution flow may not be atomic.

Consider whether the library supports multi-threaded tool registration/execution, and if so, whether additional synchronization is needed.


205-281: LGTM! Buffer handling is correct.

The callback wrapper properly prevents buffer overflows:

  • Checks buffer size before writing
  • Truncates content to buffer_size-1 to leave room for null terminator
  • Consistently null-terminates all string outputs
  • Handles errors with appropriate error codes

The use of memcpy and manual buffer management is appropriate for the C FFI interface.


320-343: LGTM! Proper memory management.

The transcript retrieval correctly:

  • Checks for NULL return
  • Decodes and parses JSON
  • Ensures memory is freed in a finally block even on error
applefoundationmodels/exceptions.py (1)

92-96: LGTM! Exception follows established pattern.

The new ToolCallError exception correctly inherits from FoundationModelsError and follows the same pattern as other exceptions in the hierarchy.

applefoundationmodels/client.py (1)

162-162: LGTM! Config correctly passed to Session.

The change properly passes the optional configuration dictionary to the Session constructor, aligning with the updated Session API.

TOOL_CALLING.md (1)

1-233: Excellent comprehensive documentation.

The documentation clearly explains:

  • Current status and limitations (waiting for Apple's macro plugin)
  • Three-layer architecture
  • Usage examples
  • Implementation details
  • Next steps

This will be very helpful for users and future contributors.

examples/tool_calling_comprehensive.py (2)

113-119: Broad exception handling is appropriate for this demo.

While catching bare Exception is flagged by static analysis, it's acceptable here since this is a demonstration script that wants to report all errors to the user and exit cleanly.


36-96: Excellent demonstration of tool variety.

The example showcases diverse tool signatures:

  • No parameters (get_time)
  • Single parameter (get_weather)
  • Multiple parameters (search_docs, get_top_items)
  • Optional parameters (calculate)
  • Mixed types (string, int)

This provides a comprehensive reference for users.

applefoundationmodels/swift/foundation_models.h (2)

20-24: LGTM! Callback typedef is well-defined.

The ai_tool_callback_t typedef correctly specifies:

  • Return type for error codes
  • Tool name and arguments as input
  • Result buffer with size for output

This matches the Cython implementation's expectations.


38-41: LGTM! C API declarations are correct.

The function declarations properly expose:

  • Tool registration with callback
  • Transcript retrieval

These align with the Swift and Cython layers.

README.md (3)

160-258: Excellent tool calling documentation.

The new Tool Calling section provides:

  • Clear usage examples with practical code
  • Feature highlights (auto schema generation, parallel execution, transcript access)
  • Schema extraction explanation
  • Transcript access patterns
  • Supported parameter types

This gives users everything they need to get started with tool calling.


356-358: API reference correctly documents new Session methods.

The additions properly document:

  • tool() decorator for registering tools
  • transcript property for accessing conversation history

411-411: New exception properly documented.

ToolCallError is correctly added to the exception hierarchy with an appropriate description.

Changed tool registration to fail immediately when:
- Required fields (name, description, parameters) are missing
- JSON schema conversion fails

Benefits:
- Clear error messages with tool name and index
- Prevents partial tool registration
- Returns appropriate error codes (errorInvalidParams or errorJSONParse)
- Clears registeredTools on error to prevent inconsistent state

Previous behavior silently continued and returned success even when
some tools failed to register, which could lead to confusing runtime
errors when the model tried to call missing tools.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
applefoundationmodels/tools.py (1)

33-37: Critical: Optional/Union types are not recognized.

As flagged in the previous review, this check never matches Optional[X] because get_origin(Optional[X]) returns Union, not type(None). Consequently, Optional[str] parameters are incorrectly treated as required strings instead of nullable, causing tool calls with null values to fail with AI_ERROR_INVALID_PARAMS.

The previous review provided a fix. Apply this diff to handle Union/Optional types correctly:

-import inspect
-from typing import Callable, Dict, Any, Optional, get_type_hints, get_origin, get_args
+import inspect
+import types
+from typing import Callable, Dict, Any, Optional, Union, get_type_hints, get_origin, get_args
     # Get the origin type for generic types (e.g., List, Dict, Optional)
     origin = get_origin(python_type)
+    args = get_args(python_type)
 
-    # Handle Optional[X] as union of X and null
-    if origin is type(None) or (
-        hasattr(python_type, "__origin__") and python_type.__origin__ is type(None)
-    ):
-        return {"type": "null"}
+    # Handle Optional/Union types (Optional[T] is Union[T, None])
+    if origin in (Union, types.UnionType):
+        variants = []
+        for variant in args:
+            if variant is type(None):
+                variants.append({"type": "null"})
+            else:
+                variants.append(python_type_to_json_schema(variant))
+        if len(variants) == 1:
+            return variants[0]
+        return {"anyOf": variants}
     elif python_type is list or origin is list:
         # Get the item type if specified
-        args = get_args(python_type)
         if args:
     elif python_type is dict or origin is dict:
         # Get key/value types if specified
-        args = get_args(python_type)
         if len(args) == 2:
🧹 Nitpick comments (2)
applefoundationmodels/tools.py (2)

64-69: Remove redundant checks.

Lines 48-63 already handle origin is list and origin is dict, making this block unreachable.

-    elif origin is not None:
-        # Handle other generic types - try to extract the origin
-        if origin is list:
-            return {"type": "array"}
-        elif origin is dict:
-            return {"type": "object"}
-
     # If we can't determine the type, default to string
     return {"type": "string"}

152-155: Improve exception chaining.

Use raise ... from e to preserve the original traceback for easier debugging.

     except Exception as e:
-        raise ToolCallError(
+        raise ToolCallError(
             f"Failed to extract schema from function '{func.__name__}': {e}"
-        )
+        ) from e
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da17f8b and d5320dd.

📒 Files selected for processing (2)
  • applefoundationmodels/session.py (5 hunks)
  • applefoundationmodels/tools.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
applefoundationmodels/tools.py (2)
applefoundationmodels/exceptions.py (1)
  • ToolCallError (92-95)
applefoundationmodels/session.py (2)
  • tool (290-341)
  • decorator (317-339)
applefoundationmodels/session.py (2)
applefoundationmodels/tools.py (3)
  • extract_function_schema (75-155)
  • tool (158-195)
  • decorator (178-193)
applefoundationmodels/client.py (1)
  • create_session (125-164)
🪛 GitHub Actions: Tests
applefoundationmodels/tools.py

[error] 189-189: "Callable[..., Any]" has no attribute "_tool_name" [attr-defined]


[error] 190-190: "Callable[..., Any]" has no attribute "_tool_description" [attr-defined]


[error] 191-191: "Callable[..., Any]" has no attribute "_tool_parameters" [attr-defined]

applefoundationmodels/session.py

[error] 332-332: "Callable[..., Any]" has no attribute "_tool_name" [attr-defined]


[error] 333-333: "Callable[..., Any]" has no attribute "_tool_description" [attr-defined]


[error] 334-334: "Callable[..., Any]" has no attribute "_tool_parameters" [attr-defined]


[error] 354-354: Module has no attribute "register_tools" [attr-defined]


[error] 386-386: Returning Any from function declared to return "list[dict[str, Any]]" [no-any-return]


[error] 386-386: Module has no attribute "get_transcript" [attr-defined]

🪛 Ruff (0.14.3)
applefoundationmodels/tools.py

98-98: Do not catch blind exception: Exception

(BLE001)


146-150: Consider moving this statement to an else block

(TRY300)


152-152: Do not catch blind exception: Exception

(BLE001)


153-155: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


153-155: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
applefoundationmodels/session.py (2)

46-46: LGTM!

The constructor properly initializes tool management state with sensible defaults.

Also applies to: 59-61


9-18: LGTM!

Imports are correctly added for the new tool functionality.

Also applies to: 27-27

btucker and others added 7 commits November 7, 2025 15:02
Added tests/test_tools.py with 10 test cases covering:

Tool Registration:
- No parameters
- Single string parameter
- Multiple parameters
- Mixed types (string + int)
- Optional parameters with defaults

Tool Execution:
- Multiple tools in one session
- Different return types

Transcript:
- Tool calls in transcript
- Entry structure validation

Integration:
- End-to-end tool calling flow

All tests passing (10/10). Tests verify that tools are registered
correctly, called by the model with proper parameters, and results
are integrated into responses.
- Include Swift source in source distribution (MANIFEST.in)
- Exclude source files from wheels (pyproject.toml)
- Publish only sdist to PyPI until macOS 26.0 wheels are supported
- Remove build warnings from MANIFEST.in exclusions
This refactoring addresses 10 major DRY violations across Python, Swift,
and test code, significantly improving maintainability and code quality.

Python Changes:
- Extract shared attach_tool_metadata() function used by both tools.py and
  session.py, eliminating 40 lines of duplicate decorator logic
- Refactor python_type_to_json_schema() using lookup tables and helper
  functions, reducing cyclomatic complexity from 9 to 4
- Add NormalizedGenerationParams dataclass for type-safe parameter handling
- Add type stubs for register_tools() and get_transcript() to resolve mypy
  errors

Swift Changes:
- Consolidate three duplicate session creation functions into single
  createSession() with parameters, saving 60 lines
- Add SchemaConversionError with full property path tracking for better
  debugging of schema conversion failures
- Refactor schema conversion to use Result type with proper error context
  propagation through recursive calls
- Eliminate duplicate GeneratedContent extraction by extracting
  extractStructure() and extractArray() helpers

Test Improvements:
- Add ToolTestHarness helper class to reduce test boilerplate by 100+ lines
- Provides automatic call tracking with assert_called_once() and
  assert_called_with() helpers
- Makes test intent clearer by separating test mechanics from assertions

Impact:
- 288 fewer lines of duplicated code
- 15-point reduction in cyclomatic complexity
- Better error messages with full property paths
- Single source of truth for shared logic
- All 44 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
applefoundationmodels/tools.py (1)

117-141: Optional unions still reject None.

python_type_to_json_schema strips None out of every optional: Optional[str] becomes {"type": "string"} and str | None is treated as an unknown type that falls back to "string". As soon as the model sends null for a parameter that the signature explicitly allows, the schema validation fails and the platform responds with AI_ERROR_INVALID_PARAMS instead of calling the tool. The PEP 604 form is especially common now, so this bites immediately.

Please keep the null variant in the generated schema and recognise both typing.Union and types.UnionType when detecting optionals. One straightforward fix is to normalise the union and emit an anyOf that includes the nested schema plus {"type": "null"}. For example:

@@
-import inspect
-from typing import Callable, Dict, Any, Optional, get_type_hints, get_origin, get_args, Union
+import inspect
+import types
+from typing import Callable, Dict, Any, Optional, get_type_hints, get_origin, get_args, Union
@@
-    if origin is Union:
+    if origin in (Union, types.UnionType):
         args = get_args(python_type)
         return type(None) in args
@@
-    if is_optional_type(python_type):
-        inner_type = unwrap_optional(python_type)
-        return python_type_to_json_schema(inner_type)
+    if is_optional_type(python_type):
+        variants = []
+        for variant in get_args(python_type):
+            if variant is type(None):
+                variants.append({"type": "null"})
+            else:
+                variants.append(python_type_to_json_schema(variant))
+        if len(variants) == 1:
+            return variants[0]
+        return {"anyOf": variants}

(If you keep unwrap_optional, make it return every non-None variant so multi-type unions keep all branches.) With this change, both Optional[T] and T | None accept a null payload as intended.

🧹 Nitpick comments (1)
tests/test_tools.py (1)

127-138: Consider asserting on the response or removing the assignment.

The response variable is assigned but never used. Either add an assertion to verify the response content, or remove the assignment if you only want to verify the tool was called.

If you want to keep the assignment to verify no exception is raised, you can silence the warning:

-        response = session.generate("Search for 'authentication' in the API category")
+        _ = session.generate("Search for 'authentication' in the API category")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5320dd and f62fc19.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • .github/workflows/publish-to-pypi.yml (1 hunks)
  • MANIFEST.in (1 hunks)
  • applefoundationmodels/_foundationmodels.pyi (1 hunks)
  • applefoundationmodels/session.py (10 hunks)
  • applefoundationmodels/swift/foundation_models.swift (11 hunks)
  • applefoundationmodels/tools.py (1 hunks)
  • applefoundationmodels/types.py (2 hunks)
  • pyproject.toml (2 hunks)
  • tests/test_tools.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
applefoundationmodels/session.py (6)
applefoundationmodels/base.py (1)
  • ContextManagedResource (13-36)
applefoundationmodels/types.py (3)
  • GenerationParams (72-89)
  • NormalizedGenerationParams (93-129)
  • from_optional (109-129)
applefoundationmodels/pydantic_compat.py (1)
  • normalize_schema (116-144)
applefoundationmodels/tools.py (4)
  • extract_function_schema (144-224)
  • attach_tool_metadata (227-259)
  • tool (262-288)
  • decorator (282-286)
applefoundationmodels/_foundationmodels.pyi (6)
  • generate (23-23)
  • generate_structured (26-31)
  • generate_stream (34-39)
  • register_tools (47-47)
  • create_session (20-20)
  • get_transcript (48-48)
applefoundationmodels/client.py (1)
  • create_session (125-164)
tests/test_tools.py (2)
applefoundationmodels/client.py (1)
  • Client (17-187)
applefoundationmodels/session.py (4)
  • decorator (318-328)
  • tool (291-330)
  • generate (90-121)
  • transcript (353-375)
applefoundationmodels/swift/foundation_models.swift (2)
applefoundationmodels/session.py (2)
  • tool (291-330)
  • transcript (353-375)
applefoundationmodels/tools.py (1)
  • tool (262-288)
applefoundationmodels/tools.py (3)
applefoundationmodels/exceptions.py (1)
  • ToolCallError (92-95)
applefoundationmodels/session.py (2)
  • tool (291-330)
  • decorator (318-328)
tests/test_tools.py (1)
  • decorator (31-51)
🪛 Ruff (0.14.3)
tests/test_tools.py

21-21: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


133-133: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)


309-309: Avoid equality comparisons to True; use results.get('called'): for truth checks

Replace with results.get('called')

(E712)

applefoundationmodels/tools.py

167-167: Do not catch blind exception: Exception

(BLE001)


215-219: Consider moving this statement to an else block

(TRY300)


221-221: Do not catch blind exception: Exception

(BLE001)


222-224: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


222-224: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests on macOS (3.9)
🔇 Additional comments (17)
MANIFEST.in (1)

1-12: LGTM! Source distribution packaging is correctly configured.

The changes appropriately:

  • Include Swift sources needed for compilation
  • Exclude build artifacts via prune lib
  • Maintain documentation files

This aligns with the shift to publishing source distributions that contain the Swift/FFI layer.

applefoundationmodels/types.py (1)

92-129: LGTM! Clean dataclass design for normalized parameters.

The NormalizedGenerationParams dataclass provides a type-safe way to pass generation parameters to FFI functions. The from_optional classmethod cleanly handles defaults.

.github/workflows/publish-to-pypi.yml (1)

13-52: LGTM! Workflow correctly shifted to source distribution publishing.

The changes appropriately:

  • Build and verify source distributions containing Swift sources
  • Document the reason for disabling wheel builds (PyPI platform tag support)
  • Maintain well-structured commented code for future wheel re-enabling
pyproject.toml (1)

52-60: LGTM! Package data configuration correctly excludes sources and includes type stubs.

The changes appropriately:

  • Exclude Cython sources (*.pxd, *.pyx), C files, and Swift sources from package data
  • Include Python type stubs (*.pyi) for static type checking
  • Maintain compiled artifacts (*.dylib)

This aligns with distributing only runtime-necessary files.

applefoundationmodels/_foundationmodels.pyi (1)

46-48: LGTM! Type stubs correctly declare the tool calling API.

The function signatures are well-typed:

  • register_tools accepts a dictionary mapping tool names to callables
  • get_transcript returns a list of transcript entry dictionaries

These stubs enable proper type checking for the new tool calling functionality.

tests/test_tools.py (2)

14-87: LGTM! Well-designed test infrastructure.

The ToolTestHarness and fixtures provide a clean, reusable pattern for testing tool calling functionality. The harness correctly:

  • Captures tool invocations
  • Preserves function metadata for schema extraction
  • Provides assertion helpers

90-244: LGTM! Comprehensive test coverage for tool calling.

The test suite thoroughly exercises:

  • Tools with varying parameter signatures (none, single, multiple, mixed types, optional)
  • Multiple tool registration
  • Different return types
  • Optional type annotations
  • Transcript structure and content

This provides excellent coverage of the tool calling API.

applefoundationmodels/session.py (5)

45-60: LGTM! Session initialization correctly supports tool configuration.

The changes appropriately:

  • Accept optional config for session creation
  • Initialize tool registry state
  • Maintain backward compatibility

75-88: LGTM! Parameter normalization is cleaner with the dataclass.

Using NormalizedGenerationParams.from_optional() provides type-safe parameter handling for FFI calls.


291-330: LGTM! Tool decorator provides clean registration API.

The decorator:

  • Automatically extracts JSON schema from function signatures
  • Attaches metadata for runtime access
  • Registers tools with the FFI layer
  • Provides a Pythonic API surface

This is a well-designed pattern for tool registration.


352-375: LGTM! Transcript property provides structured conversation history.

The property:

  • Exposes full conversation history including tool calls
  • Documents the expected entry structure
  • Provides a clean read-only API

This aligns with the broader tool calling feature.


332-350: Clarify tool registration timing and confirm typical usage pattern.

The concern about history loss assumes tools are registered after generation occurs, which is not the typical documented workflow. All tests and examples show tools registered immediately (via decorator) before any generation—at which point no history exists to lose.

However, the session recreation via _foundationmodels.create_session() in _register_tools() relies on opaque FFI layer behavior. If your use case involves generating first and then registering tools post-generation, verify that history is preserved by:

  • Testing session.generate() followed by @session.tool() decorator, then checking session.get_history()
  • Consulting underlying FFM framework documentation on session recreation semantics

For the standard pattern (tools registered first), no issue exists.

applefoundationmodels/swift/foundation_models.swift (5)

45-113: LGTM! Tool wrapper correctly bridges Swift and Python.

The PythonToolWrapper implementation:

  • Properly implements the Tool protocol
  • Uses GeneratedContent with dynamic schemas for flexible argument handling
  • Manages C-FFI buffer allocation safely with defer
  • Returns structured errors with context

This is a solid bridging layer.


137-200: LGTM! Session creation refactored for tool support.

The helper functions cleanly handle:

  • Optional instructions and tools
  • Forced session recreation
  • All combinations via switch statement

This improves maintainability and clarity.


287-361: LGTM! Tool registration now fails fast with proper error handling.

The implementation correctly:

  • Validates required fields (name, description, parameters) before processing
  • Returns appropriate error codes immediately on validation failure
  • Clears partial registrations on error
  • Provides detailed error messages with tool names/indices

This addresses the previous review concern about silent failures.


641-783: LGTM! Schema conversion provides robust error handling.

The implementation:

  • Uses Result types for proper error propagation
  • Tracks full path context in errors
  • Fails fast on unsupported types
  • Recursively handles nested schemas

This provides clear error messages when schema conversion fails.


786-820: LGTM! Content extraction helpers handle nested structures correctly.

The recursive extraction logic:

  • Properly handles all GeneratedContent kinds
  • Propagates errors via throws
  • Converts to native Swift/Foundation types for JSON serialization

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/test_tools.py (1)

135-141: Consider asserting on the response content.

The response variable is assigned but not used. While the test correctly verifies tool invocation with proper arguments, asserting that the response contains expected content (e.g., "5 documents" or "authentication") would strengthen the test coverage.

applefoundationmodels/tools.py (1)

230-233: Preserve exception chain for better debugging.

The exception wrapping loses the original traceback context. Using raise ... from e would preserve the full exception chain for easier debugging.

Apply this diff:

     except Exception as e:
-        raise ToolCallError(
-            f"Failed to extract schema from function '{func.__name__}': {e}"
-        )
+        raise ToolCallError(
+            f"Failed to extract schema from function '{func.__name__}': {e}"
+        ) from e
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f62fc19 and c8c7cbc.

📒 Files selected for processing (2)
  • applefoundationmodels/tools.py (1 hunks)
  • tests/test_tools.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
applefoundationmodels/tools.py (3)
applefoundationmodels/exceptions.py (1)
  • ToolCallError (92-95)
applefoundationmodels/session.py (2)
  • tool (291-330)
  • decorator (318-328)
tests/test_tools.py (1)
  • decorator (32-52)
tests/test_tools.py (1)
applefoundationmodels/session.py (4)
  • decorator (318-328)
  • tool (291-330)
  • generate (90-121)
  • transcript (353-375)
🪛 Ruff (0.14.3)
applefoundationmodels/tools.py

176-176: Do not catch blind exception: Exception

(BLE001)


224-228: Consider moving this statement to an else block

(TRY300)


230-230: Do not catch blind exception: Exception

(BLE001)


231-233: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


231-233: Avoid specifying long messages outside the exception class

(TRY003)

tests/test_tools.py

21-21: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


135-135: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)


308-308: Avoid equality comparisons to True; use results.get("called"): for truth checks

Replace with results.get("called")

(E712)

🔇 Additional comments (5)
tests/test_tools.py (1)

187-285: LGTM!

The test coverage for tool execution and transcript functionality is comprehensive. The tests properly verify tool invocation, return type handling, optional parameters, and transcript structure.

applefoundationmodels/tools.py (4)

1-35: LGTM!

The module structure, imports, and type mapping tables are well-organized. The use of lookup tables for type conversion is efficient.


55-72: Note: Multi-type Union handling.

The unwrap_optional function returns the first non-None type from a Union. This works correctly for Optional[X] (i.e., Union[X, None]), but for unions with multiple non-None types like Union[str, int, None], only the first type is returned. This is likely acceptable for tool calling scenarios where complex unions are rare, but worth noting.


109-150: LGTM!

The type-to-schema conversion logic is well-structured with appropriate handling for None types, Optional types, basic types, and containers. The fallback to "string" for unknown types is a reasonable default.


236-297: LGTM!

The attach_tool_metadata helper and tool decorator are cleanly implemented. The dynamic attribute assignments are properly annotated with type ignore comments to suppress mypy warnings.

btucker and others added 3 commits November 7, 2025 15:55
The transcript entries for tool calls and tool outputs didn't match
the documented shape in Session.transcript. This fixes the structure
to fulfill the contract.

Changes:
- Replace single "tool_calls" entry with individual "tool_call" entries
  per call, each containing:
  - type: "tool_call"
  - tool_id: unique identifier
  - tool_name: name of the tool
  - arguments: JSON string of the arguments

- Fix tool output entries to include actual content:
  - Extract text from output.segments array
  - Handle both text and structured segments
  - Populate "content" field with actual output instead of empty string

- Add comprehensive test coverage:
  - test_transcript_includes_tool_calls now verifies presence of tool_call
    and tool_output entries
  - test_transcript_tool_call_shape validates exact structure and content
    of tool_call and tool_output entries
  - Validates that arguments is a JSON string and content is not empty

Root cause: Previous implementation created a single entry with an array
of tool call IDs instead of individual entries per call, and left tool
output content empty.

Impact: The transcript API now correctly represents tool interactions,
making it possible to reconstruct the full conversation flow including
tool invocations and their results.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add explicit Optional[str] type annotation to ToolTestHarness.register_tool() for PEP 484 compliance
- Replace boolean equality check with idiomatic truth assertion in integration test
- Add clarifying API surface comments to Swift transcript implementation documenting ToolCall and ToolOutput types

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add exception chaining (from e) to preserve traceback in schema extraction errors
- Add response content assertion to test_tool_with_multiple_parameters for better coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
applefoundationmodels/swift/foundation_models.swift (1)

955-963: Preserve globally registered tools when clearing history.

The verification confirms the review comment is correct. createNewSession() omits the registeredTools parameter, causing any globally registered tools to be lost when appleAIClearHistory() creates a new session. This is inconsistent with the initialization pattern (lines 386-389) where registeredTools is explicitly passed and preserved.

The suggested fix is valid and follows the established pattern:

@_cdecl("apple_ai_clear_history")
public func appleAIClearHistory() {
    // Clear by creating a new session
    #if canImport(FoundationModels)
    if #available(macOS 26.0, *) {
-        currentSession = createNewSession()
+        currentSession = createSession(
+            instructions: sessionInstructions,
+            tools: registeredTools,
+            forceNew: true
+        )
    }
    #endif
}

Note: getOrCreateSession() (lines 193, 428, 494, 886) has the same omission and should be reviewed similarly.

🧹 Nitpick comments (2)
tests/test_tools.py (2)

46-49: Remove redundant metadata copying.

The @wraps(func) decorator on line 35 already copies __name__, __annotations__, __doc__, and other metadata attributes. Lines 47-49 redundantly copy these same attributes.

Apply this diff to remove the redundant lines:

             wrapper.__name__ = original_func.__name__
             wrapper.__annotations__ = original_func.__annotations__
             wrapper.__doc__ = original_func.__doc__
-
-            # Copy function metadata for schema extraction
-            wrapper.__name__ = original_func.__name__
-            wrapper.__annotations__ = original_func.__annotations__
-            wrapper.__doc__ = original_func.__doc__
-
             decorated = self.session.tool(description=description)(wrapper)

224-226: Remove redundant import.

Optional is already imported at the module level (line 9), making the import on line 226 redundant.

Apply this diff:

     def test_tool_with_optional_type_annotation(self, session):
         """Test that Optional[...] type annotations are properly handled."""
-        from typing import Optional
-
         called = {}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8c7cbc and affaf6b.

📒 Files selected for processing (2)
  • applefoundationmodels/swift/foundation_models.swift (11 hunks)
  • tests/test_tools.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
applefoundationmodels/swift/foundation_models.swift (1)
applefoundationmodels/session.py (3)
  • callback (219-220)
  • tool (291-330)
  • transcript (353-375)
tests/test_tools.py (3)
applefoundationmodels/client.py (1)
  • Client (17-187)
applefoundationmodels/session.py (4)
  • decorator (318-328)
  • tool (291-330)
  • generate (90-121)
  • transcript (353-375)
applefoundationmodels/tools.py (2)
  • decorator (291-295)
  • tool (271-297)
🪛 Ruff (0.14.3)
tests/test_tools.py

135-135: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests on macOS (3.9)
🔇 Additional comments (7)
tests/test_tools.py (2)

248-342: LGTM! Comprehensive transcript testing.

The transcript tests thoroughly validate the structure and content of tool call entries, including required fields, data types, and actual content. The progression from basic inclusion checks to detailed schema validation is well-designed.


344-366: LGTM! Integration test properly guarded.

The integration test correctly uses @pytest.mark.skipif to ensure it only runs when Apple Intelligence is available, and the end-to-end flow verification is appropriate.

applefoundationmodels/swift/foundation_models.swift (5)

144-188: LGTM: Session creation properly handles tools.

The pattern matching for different combinations of instructions and tools is clean and ensures the appropriate LanguageModelSession initializer is called in each case.


317-354: LGTM: Fail-fast behavior correctly implemented.

The tool registration now properly validates required fields and schema conversion, returning appropriate error codes and clearing partial registrations on any failure. This addresses the previous review concern about silent tool drops.


593-627: LGTM: Transcript format matches documented API contract.

The implementation now emits individual tool_call entries with tool_name, tool_id, and arguments fields, and properly extracts content from tool output segments. This addresses the previous review concern about transcript shape.

Based on relevant code snippets showing the Python API expects this exact format.


660-802: LGTM: Schema conversion provides excellent error context.

The Result-based error handling with path tracking makes schema conversion failures easy to diagnose. The recursive structure properly handles nested objects and arrays while maintaining the full path to any problematic field.


804-839: LGTM: Content extraction properly handles all cases.

The recursive extraction correctly handles all GeneratedContent.kind types including nested structures and arrays, with appropriate error propagation.

btucker and others added 6 commits November 7, 2025 16:20
The test allowed operation to be either 'multiply' or 'times' but only 'multiply' was in the operations dict, causing the lookup to return 'unknown' if the model used 'times', which would fail the 105 assertion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add assertions to validate that the response is a non-empty string with expected content, ensuring the test fails early if search functionality returns None, empty string, or wrong type.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fixed the 16KB fixed buffer limitation that could truncate or corrupt large tool outputs.

Changes:
- Swift: Implement retry loop with progressively larger buffers (16KB -> 32KB -> 64KB -> etc.)
- Add 1MB maximum buffer size cap to prevent unbounded allocations
- Python: Return AI_ERROR_BUFFER_TOO_SMALL (-13) instead of silently truncating
- Add buffer size validation and informative error messages
- Document buffer size limits in Session.tool() and @tool decorator docstrings

Implementation:
- Initial buffer: 16KB (handles most tool outputs efficiently)
- Automatic retry: Doubles buffer size on each retry until success or max reached
- Maximum size: 1MB (prevents memory exhaustion)
- Error handling: Clear error when output exceeds 1MB

Testing:
- Added test_large_tool_output to verify 20KB outputs work correctly
- All existing tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Critical fix for race condition where toolCallback could be accessed from multiple threads without synchronization, potentially causing:
- Crashes if callback becomes nil during tool execution
- Incorrect tool routing if tools are re-registered mid-flight

Implementation:
- Add NSLock for synchronized access to toolCallback
- Use private _toolCallback with computed property wrapper
- Lock acquired on both read and write operations
- defer ensures lock is always released even on early return

This prevents concurrent access issues when tools are registered while other tool calls are in flight.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The test was flaky because:
- Vague prompt didn't consistently trigger tool call
- No verification that tool was actually invoked
- Weak assertion didn't catch truncation issues

Changes:
- Track tool invocation with called dict
- Use more specific tool description and prompt
- Add explicit assertion that tool was called
- Use START/END markers to detect truncation
- More direct prompt: "Use the get_system_logs tool..."

All 13 tests now pass consistently.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/test_tools.py (1)

373-397: Strengthen assertion to verify markers.

The test comment (lines 395-396) mentions checking for "both start and end markers," but the assertion only verifies response length. Consider asserting that the tool output wasn't truncated by checking the transcript for the markers.

Apply this diff to verify the output contains both markers:

         # Verify the tool was called
         assert called.get("invoked"), "Tool should have been called"
 
-        # Verify output wasn't truncated - check for both start and end markers
-        # The response should reference or contain evidence of the large data
-        assert len(response) > 100, "Response should contain content from the tool"
+        # Verify output wasn't truncated - check transcript for both markers
+        transcript = session.transcript
+        tool_outputs = [e for e in transcript if e["type"] == "tool_output"]
+        assert len(tool_outputs) > 0, "Should have tool output in transcript"
+        content = tool_outputs[0]["content"]
+        assert "START-" in content and "-END" in content, "Tool output should not be truncated"
applefoundationmodels/session.py (1)

291-358: Session recreation on each tool registration may be inefficient.

The _register_tools method recreates the session every time a tool is decorated (line 357). If a user registers multiple tools sequentially, this triggers multiple session recreations.

Consider either:

  1. Deferring registration until first generate() call
  2. Adding a batch registration method
  3. Documenting this behavior so users know to register all tools before generating

Alternatively, if immediate registration is required by the FoundationModels API, add a note in the tool decorator docstring:

         Note:
             Tool output size limits:
             - Initial buffer: 16KB
             - Maximum size: 1MB (automatically retried with larger buffers)
             - Tools returning outputs larger than 1MB will raise an error
             - For large outputs, consider returning references or summaries
+            
+            Performance note:
+            - Each tool registration recreates the session
+            - For better performance, register all tools before calling generate()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between affaf6b and f36b9ca.

📒 Files selected for processing (6)
  • applefoundationmodels/_foundationmodels.pyx (2 hunks)
  • applefoundationmodels/session.py (10 hunks)
  • applefoundationmodels/swift/foundation_models.swift (11 hunks)
  • applefoundationmodels/tools.py (1 hunks)
  • applefoundationmodels/types.py (3 hunks)
  • tests/test_tools.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • applefoundationmodels/_foundationmodels.pyx
🧰 Additional context used
🧬 Code graph analysis (4)
applefoundationmodels/tools.py (3)
applefoundationmodels/exceptions.py (1)
  • ToolCallError (92-95)
applefoundationmodels/session.py (2)
  • tool (291-337)
  • decorator (325-335)
tests/test_tools.py (1)
  • decorator (32-52)
applefoundationmodels/swift/foundation_models.swift (2)
applefoundationmodels/session.py (2)
  • tool (291-337)
  • transcript (360-382)
applefoundationmodels/tools.py (1)
  • tool (271-304)
applefoundationmodels/session.py (6)
applefoundationmodels/base.py (1)
  • ContextManagedResource (13-36)
applefoundationmodels/types.py (3)
  • GenerationParams (73-90)
  • NormalizedGenerationParams (94-130)
  • from_optional (110-130)
applefoundationmodels/pydantic_compat.py (1)
  • normalize_schema (116-144)
applefoundationmodels/tools.py (4)
  • extract_function_schema (153-233)
  • attach_tool_metadata (236-268)
  • tool (271-304)
  • decorator (298-302)
applefoundationmodels/_foundationmodels.pyi (6)
  • generate (23-23)
  • generate_structured (26-31)
  • generate_stream (34-39)
  • register_tools (47-47)
  • create_session (20-20)
  • get_transcript (48-48)
applefoundationmodels/client.py (1)
  • create_session (125-164)
tests/test_tools.py (5)
applefoundationmodels/client.py (1)
  • Client (17-187)
applefoundationmodels/session.py (4)
  • decorator (325-335)
  • tool (291-337)
  • generate (90-121)
  • transcript (360-382)
applefoundationmodels/tools.py (2)
  • decorator (298-302)
  • tool (271-304)
applefoundationmodels/swift/foundation_models.swift (1)
  • call (96-151)
applefoundationmodels/_foundationmodels.pyi (3)
  • create_session (20-20)
  • generate (23-23)
  • is_ready (13-13)
🪛 GitHub Actions: Tests
tests/test_tools.py

[error] 1-1: Black formatting check failed. 1 file would be reformatted; run 'uv run black --check' to verify or 'uv run black' to reformat.

🪛 Ruff (0.14.3)
applefoundationmodels/tools.py

176-176: Do not catch blind exception: Exception

(BLE001)


224-228: Consider moving this statement to an else block

(TRY300)


231-233: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (12)
applefoundationmodels/types.py (2)

8-8: LGTM!

The dataclasses import is correctly added to support the new NormalizedGenerationParams dataclass.


93-131: LGTM!

The NormalizedGenerationParams dataclass is well-designed with clear documentation. The lazy import of constants within from_optional is a sensible pattern to avoid potential circular dependencies.

applefoundationmodels/tools.py (1)

169-233: LGTM!

The exception handling in extract_function_schema is robust and appropriate:

  • Broad catch on line 176 is justified for get_type_hints which can fail in various ways
  • Contextual error messages provide good diagnostics
  • Error chaining preserves original exception details

The static analysis hints are false positives in this context.

applefoundationmodels/session.py (4)

9-26: LGTM!

Import additions are correct and necessary for the tool-calling functionality.


45-60: LGTM!

The __init__ changes correctly initialize the tool management state.


75-88: LGTM!

The refactored _normalize_generation_params cleanly delegates to NormalizedGenerationParams.from_optional.


359-382: LGTM!

The transcript property is well-documented and provides clear visibility into the session conversation history including tool calls.

applefoundationmodels/swift/foundation_models.swift (5)

96-151: LGTM!

The buffer retry logic in call(arguments:) is well-implemented:

  • Progressive doubling handles variable output sizes efficiently
  • Clear error messages when limits are exceeded
  • Proper memory management with defer
  • Aligns with documented 16KB initial / 1MB max limits

328-400: LGTM!

The appleAIRegisterTools function correctly implements fail-fast validation:

  • Required fields are validated with clear error messages
  • Partial registrations are cleared on any failure
  • Schema conversion failures are caught and reported
  • Tool index/name included in error messages for debugging

591-695: LGTM!

The appleAIGetTranscript implementation correctly matches the documented API contract:

  • Individual tool_call entries with tool_name, tool_id, and arguments (JSON string)
  • Tool outputs properly extract and populate content from segments
  • Handles both text and structured segments appropriately

Past review concerns about transcript shape have been fully addressed.


699-841: Excellent error handling for schema conversion.

The schema conversion helpers demonstrate best practices:

  • Result type for explicit error handling
  • Path tracking for debugging nested schema issues
  • Fail-fast with detailed error messages
  • Proper error propagation through recursion

This makes schema validation failures much easier to diagnose.


176-240: LGTM!

The createSession helper cleanly handles all combinations of instructions and tools with appropriate session initializers. The pattern-matching switch ensures all cases are covered.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/test_tools.py (1)

229-231: Remove redundant import.

Optional is already imported at the top of the file (line 9), so the local import here is unnecessary.

Apply this diff:

     def test_tool_with_optional_type_annotation(self, session):
         """Test that Optional[...] type annotations are properly handled."""
-        from typing import Optional
-
         called = {}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f36b9ca and fdf13d8.

📒 Files selected for processing (1)
  • tests/test_tools.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_tools.py (4)
applefoundationmodels/client.py (1)
  • Client (17-187)
applefoundationmodels/tools.py (2)
  • decorator (298-302)
  • tool (271-304)
applefoundationmodels/session.py (4)
  • decorator (325-335)
  • tool (291-337)
  • generate (90-121)
  • transcript (360-382)
applefoundationmodels/swift/foundation_models.swift (1)
  • call (96-151)
🔇 Additional comments (3)
tests/test_tools.py (3)

14-74: Well-designed test harness.

The ToolTestHarness class provides a clean abstraction for testing tool calls with proper metadata preservation and convenient assertion helpers.


92-190: Comprehensive parameter signature coverage.

The test cases thoroughly validate tool registration across no-parameter, single-parameter, multi-parameter, mixed-type, and optional-parameter scenarios.


253-347: Thorough transcript validation.

The tests comprehensively verify transcript structure, entry types, required fields, and content format including JSON argument parsing.

btucker and others added 4 commits November 7, 2025 16:40
The test created START- and -END markers but didn't check them, only asserting len(response) > 100 which doesn't verify the full data was preserved.

Changes:
- Access session.transcript to get actual tool output
- Assert both "START-" and "-END" markers are present
- Assert output length is at least 20KB
- Assert exact equality: tool_output_content == large_data

This properly verifies that:
1. Buffer resizing worked correctly
2. No truncation occurred during FFI transfer
3. The full 20KB was preserved byte-for-byte

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
While the .pyi stub correctly types get_transcript() -> List[Dict[str, Any]], adding an explicit typing.cast() makes the type absolutely clear to all type checkers and eliminates any ambiguity.

Changes:
- Import cast from typing
- Add explicit cast(List[Dict[str, Any]], ...) to get_transcript() return
- Add comment explaining the explicit cast

This is defensive typing to ensure all static analysis tools correctly infer the return type.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…_type_annotation

Optional is already imported at the top of the file (line 9), making the local import on line 231 unnecessary.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@btucker
Copy link
Owner Author

btucker commented Nov 7, 2025

@coderabbitai please review

@coderabbitai
Copy link

coderabbitai bot commented Nov 7, 2025

@btucker, I'll review the changes in this PR right away.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/test_tools.py (1)

390-392: Consider removing unused variable assignment.

The response variable is assigned but never used. Since this test focuses on verifying tool output preservation in the transcript rather than the model's response, you can simplify by removing the assignment.

Apply this diff:

-        response = session.generate(
-            "Use the get_system_logs tool to retrieve the system diagnostic data"
-        )
+        session.generate(
+            "Use the get_system_logs tool to retrieve the system diagnostic data"
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fdf13d8 and 2fbc050.

📒 Files selected for processing (2)
  • applefoundationmodels/session.py (10 hunks)
  • tests/test_tools.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.3)
tests/test_tools.py

390-390: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)

🔇 Additional comments (15)
tests/test_tools.py (6)

1-12: LGTM!

The imports are clean and the module docstring clearly explains the test file's purpose.


14-75: LGTM!

The ToolTestHarness class is well-designed with proper type hints, good use of functools.wraps, and helpful assertion methods. This reduces test boilerplate effectively.


77-90: LGTM!

The pytest fixtures properly use context managers and provide clean test setup.


92-190: LGTM!

The test class provides comprehensive coverage of tool registration scenarios including various parameter types, multiple parameters, and optional parameters. Previous issues have been properly addressed.


192-249: LGTM!

The execution tests properly verify multiple tool registration, return type handling, and Optional type annotations.


251-345: LGTM!

The transcript tests thoroughly validate the structure and content of transcript entries, including proper verification of tool_call and tool_output fields and JSON argument parsing.

applefoundationmodels/session.py (9)

9-27: LGTM!

The import additions are clean and all imported symbols are properly used throughout the file.


46-61: LGTM!

The session initialization properly adds tool management state and configuration handling with appropriate type hints.


76-89: LGTM!

The refactor to use NormalizedGenerationParams improves type safety and maintainability compared to returning tuples.


121-122: LGTM!

The generate method correctly uses the normalized parameters with proper attribute access.


180-187: LGTM!

The generate_structured method properly uses normalized parameters throughout.


215-228: LGTM!

The generate_stream method consistently uses normalized parameters, completing the refactor across all generation methods.


292-338: LGTM!

The tool decorator implementation is clean and well-documented. It properly extracts schemas, attaches metadata, registers tools with the FFI layer, and includes helpful documentation about size limits.


340-359: LGTM!

The _register_tools method correctly registers tools with the FFI layer and recreates the session to enable tool support. The early return optimization and state management are appropriate.


360-384: LGTM!

The transcript property provides clean access to the session transcript with proper type annotations, closure checks, and comprehensive documentation of the transcript entry structure.

@btucker btucker merged commit 29e58e3 into main Nov 7, 2025
8 checks passed
@btucker btucker deleted the feature/tool-calling branch November 7, 2025 23:48
This was referenced Nov 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants