ENH: Add DataFrame.to_toon() method for TOON serialization support #63138

@MuhammadUsman-Khan

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

pandas is increasingly used in workflows involving Large Language Models (LLMs), such as RAG pipelines, context packing, and prompt engineering. It can already export data to JSON, Markdown, CSV, Parquet, and other formats, but it has no built-in way to export a DataFrame to the TOON (Token-Oriented Object Notation) format.

TOON is a new open serialization format designed for representing structured data compactly in LLM prompts. It is reported to use significantly fewer tokens than JSON (often 30–60% fewer), which makes it well suited to passing tabular data into prompts.

The problem: pandas has no native way to convert a DataFrame into TOON format, even though DataFrames map naturally to TOON’s tabular representation. Adding .to_toon() would enable users to efficiently serialize DataFrames for LLM applications, improve cost efficiency, and reduce token usage.

Feature Description

I propose adding a new method:

DataFrame.to_toon(index=False, indent=2, columns=None, **kwargs)

Implementation outline:

  1. A new IO function similar to to_json, to_markdown, and to_csv

  2. Pandas normalizes the DataFrame to a Python dict or list-of-records structure

  3. An external dependency (e.g., pytoon) performs the dict → TOON encoding

  4. TOON’s tabular layout:
    fields: col1 col2 col3
    rows: N
    - value1 value2 value3

  5. Follows pandas conventions:
    - optional index flag (bool)
    - column selection via columns
    - indent for readable output
    - returns a string or writes to file

This design matches existing .to_* patterns and keeps TOON encoding logic outside core pandas.
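
The tabular layout in step 4 can be sketched in plain Python. This is only an illustration under stated assumptions: `render_tabular` is a hypothetical helper, not part of pandas or of any TOON library, and it reproduces the layout exactly as written in this issue. The TOON specification linked under Additional Context remains the authority on the real syntax, and in the proposed design the encoding would be delegated to an external library.

```python
from typing import Any, Mapping, Optional, Sequence


def render_tabular(
    records: Sequence[Mapping[str, Any]],
    columns: Optional[Sequence[str]] = None,
) -> str:
    """Render uniform records in the layout sketched in this issue.

    Hypothetical helper: the layout follows the issue text and has not
    been checked against the TOON specification.
    """
    if columns is None:
        # Take the field order from the first record; empty input has no fields.
        columns = list(records[0]) if records else []
    lines = [
        "fields: " + " ".join(columns),
        f"rows: {len(records)}",
    ]
    for record in records:
        # One line per row, values in the same order as the header.
        lines.append("- " + " ".join(str(record[col]) for col in columns))
    return "\n".join(lines)


records = [{"name": "Ali", "age": 23}, {"name": "Sara", "age": 21}]
print(render_tabular(records))
```

The sketch does no quoting or escaping, so a value containing a space would be ambiguous in this layout. That is one reason to leave the actual encoding to a dedicated library.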

Alternative Solutions

A user can manually convert a DataFrame to a list of dicts and use a separate Python TOON encoding library, but this requires custom code for every project and loses the convenience and consistency of the pandas IO API.
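
For reference, the manual workaround looks roughly like the sketch below. `frame_to_records` and `serialize` are hypothetical names introduced here for illustration, and `json.dumps` is only a stand-in for whichever TOON encoder a project chooses. The only pandas calls used are `DataFrame.reset_index` and `DataFrame.to_dict(orient="records")`.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Sequence

import pandas as pd

Record = Dict[str, Any]


def frame_to_records(
    df: pd.DataFrame,
    index: bool = False,
    columns: Optional[Sequence[str]] = None,
) -> List[Record]:
    """Normalize a DataFrame to a list of row dicts (hypothetical helper)."""
    if columns is not None:
        # Keep only the requested columns, in the requested order.
        df = df[list(columns)]
    if index:
        # Move the index into a regular column so it appears in each record.
        df = df.reset_index()
    return df.to_dict(orient="records")


def serialize(
    df: pd.DataFrame,
    encode: Callable[[List[Record]], str],
    **kwargs: Any,
) -> str:
    """Pass the normalized records to an externally supplied encoder."""
    return encode(frame_to_records(df, **kwargs))


df = pd.DataFrame({"name": ["Ali", "Sara"], "age": [23, 21]})
# json.dumps stands in for a TOON encoder here; swap in a real one as needed.
print(serialize(df, encode=json.dumps))
```

Every project that wants TOON output has to carry some version of this glue code, which is the duplication a built-in method would remove.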

There is currently no built-in pandas method or widely used third-party library that provides seamless DataFrame → TOON serialization similar to to_json().

Additional Context

TOON Specification:
https://github.com/toon-format/spec

Reference implementation:
https://github.com/toon-format/toon

TOON (Token-Oriented Object Notation) is designed for compact, LLM-friendly serialization. The token savings are most pronounced for large DataFrames with a uniform tabular structure.

Example DataFrame:

import pandas as pd

df = pd.DataFrame({"name": ["Ali", "Sara"], "age": [23, 21]})
df.to_toon(index=False)  # proposed method, not yet available in pandas

Possible output according to the TOON spec:

fields: name age
rows: 2
- Ali 23
- Sara 21

I am willing to contribute a PR with a full implementation and tests if the maintainers approve the feature.
