Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Pandas is increasingly used in workflows built around Large Language Models (LLMs), such as RAG pipelines, context packing, and prompt engineering. It can already export data to JSON, Markdown, CSV, Parquet, and other formats, but it has no built-in way to export a DataFrame to TOON (Token-Oriented Object Notation).
TOON is a new open serialization format designed for representing structured data compactly in LLM prompts. It typically needs fewer tokens than JSON for the same data (often 30–60% fewer), which makes it well suited to passing structured tabular data into prompts.
DataFrames map naturally onto TOON’s tabular representation, yet converting one currently requires custom code. A .to_toon() method would let users serialize DataFrames for LLM applications directly, reducing token usage and therefore cost.
Feature Description
I propose adding a new method:
DataFrame.to_toon(index=False, indent=2, columns=None, **kwargs)
Implementation outline:
- A new IO function similar to to_json, to_markdown, and to_csv
- Pandas normalizes the DataFrame to a Python dict or list-of-records structure
- An external dependency (e.g., pytoon) performs the dict → TOON encoding
- TOON’s tabular layout:
    fields: col1 col2 col3
    rows: N
    - value1 value2 value3
- Follows pandas conventions:
  - optional index=bool
  - column selection via columns
  - indent for readable output
  - returns a string or writes to file
This design matches existing .to_* patterns and keeps TOON encoding logic outside core pandas.
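The encoding step in the outline can be sketched in plain Python. This is only an illustration under stated assumptions: `records_to_toon_sketch` is a hypothetical helper (it is not part of pandas or of any TOON library), it emits the layout as drawn in this proposal rather than output validated against the TOON spec, and it does no quoting or escaping. The input is a list of records, the shape that `df.to_dict(orient="records")` returns.

```python
from typing import Optional, Sequence


def records_to_toon_sketch(
    records: Sequence[dict],
    columns: Optional[Sequence[str]] = None,
) -> str:
    """Render list-of-records data in the layout drawn in this proposal.

    Hypothetical helper for illustration only. Quoting and escaping are not
    handled, so values that contain spaces would be ambiguous.
    """
    if columns is None:
        # Default to the keys of the first record, in insertion order.
        columns = list(records[0].keys()) if records else []
    lines = [
        "fields: " + " ".join(columns),
        f"rows: {len(records)}",
    ]
    for record in records:
        # One "- v1 v2 ..." line per row, restricted to the selected columns.
        lines.append("- " + " ".join(str(record[col]) for col in columns))
    return "\n".join(lines)


# Same shape as df.to_dict(orient="records") for a two-row DataFrame.
records = [{"name": "Ali", "age": 23}, {"name": "Sara", "age": 21}]
toon_text = records_to_toon_sketch(records)
print(toon_text)
```

A real to_toon would delegate this step to the external encoder and add the index, indent, and file-writing handling listed above. The sketch only shows how little pandas itself would need to do once the frame is normalized.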
Alternative Solutions
A user can manually convert a DataFrame to a list of dicts and use a separate Python TOON encoding library, but this requires custom code for every project and loses the convenience and consistency of the pandas IO API.
There is currently no built-in pandas method or widely used third-party library that provides DataFrame → TOON serialization as seamlessly as to_json() does for JSON.
Additional Context
TOON Specification:
https://github.com/toon-format/spec
Reference implementation:
https://github.com/toon-format/toon
TOON (Token-Oriented Object Notation) is designed for compact, LLM-friendly serialization. Its token savings are most pronounced for uniform tabular structures, which is the shape a DataFrame naturally has, so the benefit grows with the size of the DataFrame being serialized.
Example DataFrame:
import pandas as pd
df = pd.DataFrame({"name": ["Ali", "Sara"], "age": [23, 21]})
df.to_toon(index=False)  # proposed method; does not exist in pandas today
Possible output (illustrative; the exact syntax must follow the TOON spec):
fields: name age
rows: 2
- Ali 23
- Sara 21
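For a rough sense of the size difference, the output above can be compared with compact JSON for the same two records. This is a sketch only: character count is a crude proxy for token count, real savings depend on the tokenizer and on the data, and the TOON text is copied from the illustrative output above rather than produced by a real encoder.

```python
import json

records = [{"name": "Ali", "age": 23}, {"name": "Sara", "age": 21}]

# Compact JSON (no whitespace after separators), the most favourable JSON case.
json_text = json.dumps(records, separators=(",", ":"))

# The illustrative output shown above, copied verbatim.
toon_text = "fields: name age\nrows: 2\n- Ali 23\n- Sara 21"

# Character counts only; a tokenizer would be needed to measure real tokens.
print(len(json_text), len(toon_text))
```

The gap comes from JSON repeating every column name in every record, while the tabular layout states the names once. That is why the difference should widen as the number of rows grows.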
I am willing to contribute a PR with full implementation + tests if maintainers approve the feature.