diff --git a/peps/pep-0822.rst b/peps/pep-0822.rst
new file mode 100644
index 00000000000..29eeabe44ab
--- /dev/null
+++ b/peps/pep-0822.rst
@@ -0,0 +1,319 @@
+PEP: 822
+Title: Dedented Multiline String (d-string)
+Author: Inada Naoki
+Discussions-To: https://discuss.python.org/t/105519
+Status: Draft
+Type: Standards Track
+Created: 05-Jan-2026
+Python-Version: 3.15
+Post-History: `05-Jan-2026 `__,
+
+
+Abstract
+========
+
+This PEP proposes to add a feature that automatically removes indentation from
+multiline string literals.
+
+Dedented multiline strings use a new prefix "d" (shorthand for "dedent") before
+the opening quote of a multiline string literal.
+
+Example (spaces are visualized as ``_``):
+
+.. code-block:: python
+
+   def hello_paragraph() -> str:
+   ____return d"""
+   ________<p>
+   __________Hello, World!
+   ________</p>
+   ____"""
+
+The closing triple quotes control how much indentation is removed.
+In the above example, the returned string will contain three lines:
+
+* ``"____<p>\n"`` (four leading spaces)
+* ``"______Hello, World!\n"`` (six leading spaces)
+* ``"____</p>\n"`` (four leading spaces)
+
+
+Motivation
+==========
+
+When writing multiline string literals within deeply indented Python code,
+users are faced with the following choices:
+
+* Accept that the content of the string literal will be left-aligned.
+* Use multiple single-line string literals concatenated together instead of
+  a multiline string literal.
+* Use ``textwrap.dedent()`` to remove indentation.
+
+All of these options have drawbacks in terms of code readability and
+maintainability.
+
+* Left-aligned multiline strings look awkward and tend to be avoided.
+  In practice, much code, including Python's own test code, uses other
+  methods.
+* Concatenated single-line string literals are more verbose and harder to
+  maintain.
+* ``textwrap.dedent()`` is implemented in Python, so it incurs some runtime
+  overhead.
+  This makes it unsuitable for hot paths where performance is critical.
+
+This PEP aims to provide a built-in syntax for dedented multiline strings that
+is both easy to read and write, while also being efficient at runtime.
+
+
+Rationale
+=========
+
+The main alternative to this idea is to implement ``textwrap.dedent()`` in C
+and provide it as a ``str.dedent()`` method.
+This idea reduces the runtime overhead of ``textwrap.dedent()``.
+By making it a built-in method, it also allows for compile-time dedentation
+when called directly on string literals.
+
+However, this approach has several drawbacks:
+
+* To support cases where users want to include some indentation in the string,
+  the ``dedent()`` method would need to accept an argument specifying
+  the amount of indentation to remove.
+  This would be cumbersome and error-prone for users.
+* Continuation lines (lines that follow a line ending with a backslash)
+  cannot be dedented.
+* f-strings may interpolate multiline strings that carry no indentation.
+  In such cases, f-string + ``str.dedent()`` cannot dedent the whole string.
+* t-strings do not create ``str`` objects, so they cannot use the
+  ``str.dedent()`` method.
+  While adding a ``dedent()`` method to ``string.templatelib.Template`` is an
+  option, it would lead to inconsistency since t-strings and f-strings are very
+  similar but would have different behaviors regarding dedentation.
+
+The ``str.dedent()`` method can still be useful for non-literal strings,
+so this PEP does not preclude that idea.
+However, for ease of use with multiline string literals, providing dedicated
+syntax is superior.
+
+
+Specification
+=============
+
+Add a new string literal prefix "d" for dedented multiline strings.
+This prefix can be combined with "f", "t", and "r" prefixes.
+
+This prefix is only for multiline string literals.
+So it can only be used with triple quotes (``"""`` or ``'''``).
+Using it with single or double quotes (``"`` or ``'``) is a syntax error.
+
+The opening triple quotes must be followed by a newline character.
+This newline is not included in the resulting string.
+
+The amount of indentation to be removed is determined by the whitespace
+(``' '`` or ``'\t'``) preceding the closing triple quotes.
+Mixing spaces and tabs in indentation raises a ``TabError``, similar to
+Python's own indentation rules.
+
+The dedentation process removes the determined amount of leading whitespace
+from every line in the string.
+A line that is shorter than the determined indentation becomes an empty
+line (e.g. ``"\n"``).
+Otherwise, if the line does not start with the determined indentation,
+Python raises an ``IndentationError``.
+
+Unless combined with the "r" prefix, backslash escapes are processed after
+removing indentation.
+So you cannot use ``\t`` to create indentation.
+Line continuation (a backslash at the end of a line) still works, and
+indentation is removed from the continued line as well.
+
+Examples:
+
+.. code-block:: python
+
+   # Whitespace is shown as _ and tab is shown as ---> for clarity.
+   # Error messages are just for explanation. Actual messages may differ.
+
+   s = d""  # SyntaxError: d-string must be a multiline string
+   s = d"""Hello"""  # SyntaxError: d-string must be a multiline string
+   s = d"""Hello
+   __World!
+   """  # SyntaxError: d-string must start with a newline
+
+   s = d"""
+   __Hello
+   __World!"""  # SyntaxError: d-string must end with an indent-only line
+
+   s = d"""
+   __Hello
+   __World!
+   """  # Zero indentation is removed because closing quotes are not indented.
+   print(repr(s))  # '__Hello\n__World!\n'
+
+   s = d"""
+   __Hello
+   __World!
+   _"""  # One space of indentation is removed.
+   print(repr(s))  # '_Hello\n_World!\n'
+
+   s = d"""
+   __Hello
+   __World!
+   __"""  # Two spaces of indentation are removed.
+   print(repr(s))  # 'Hello\nWorld!\n'
+
+   s = d"""
+   __Hello
+   __World!
+   ___"""  # IndentationError: missing valid indentation
+
+   s = d"""
+   --->Hello
+   __World!
+   __"""  # IndentationError: missing valid indentation
+
+   s = d"""
+   --->--->__Hello
+   --->--->__World!
+   --->--->"""  # Tab is allowed as indentation.
+                # The spaces are part of the string, not indentation to be removed.
+   print(repr(s))  # '__Hello\n__World!\n'
+
+   s = d"""
+   --->____Hello
+   --->____World!
+   --->__"""  # TabError: mixing spaces and tabs in indentation
+
+   s = d"""
+   __Hello \
+   __World!\
+   __"""  # line continuation works as usual
+   print(repr(s))  # 'Hello_World!'
+
+   s = d"""\
+   __Hello
+   __World
+   __"""  # SyntaxError: d-string must start with a newline.
+
+   s = dr"""
+   __Hello\
+   __World!\
+   __"""  # d-string can be combined with r-string.
+   print(repr(s))  # 'Hello\\\nWorld!\\\n'
+
+   s = df"""
+   ____Hello, {"world".title()}!
+   ____"""  # d-string can be combined with f-string and t-string too.
+   print(repr(s))  # 'Hello, World!\n'
+
+   s = dt"""
+   ____Hello, {"world".title()}!
+   ____"""
+   print(type(s))  # <class 'string.templatelib.Template'>
+   print(s.strings)  # ('Hello, ', '!\n')
+   print(s.values)  # ('World',)
+   print(s.interpolations)
+   # (Interpolation('World', '"world".title()', None, ''),)
+
+
+How to Teach This
+=================
+
+In the tutorial, we can introduce d-strings alongside triple-quoted string
+literals.
+Additionally, we can add a note in the ``textwrap.dedent()`` documentation,
+providing a link to the d-string section in the language reference or
+the relevant part of the tutorial.
+
+
+Other Languages having Similar Features
+========================================
+
+Java 15 introduced a feature called `text blocks `__.
+Since Java had not used triple quotes before, it introduced triple quotes for
+multiline string literals with automatic indent removal.
+
+C# 11 also introduced a similar feature called
+`raw string literals `__.
+
+`Julia `__ and
+`Swift `__
+also support triple-quoted string literals that automatically remove indentation.
+
+PHP 7.3 introduced `Flexible Heredoc and Nowdoc Syntaxes `__.
+Although it uses closing marker (e.g. ``<<`__.
+
+
+Rejected Ideas
+==============
+
+``str.dedent()`` method
+-----------------------
+
+As mentioned in the Rationale section, this PEP doesn't reject the idea of a
+``str.dedent()`` method.
+A faster version of ``textwrap.dedent()`` implemented in C would be useful for
+runtime dedentation.
+
+However, d-string is more suitable for multiline string literals because:
+
+* It works well with f/t-strings.
+* It allows specifying the amount of indentation to be removed more easily.
+* It can dedent continuation lines.
+
+
+Triple-backtick
+---------------
+
+It has been suggested that
+`using triple backticks `__
+for dedented multiline strings could be an alternative syntax.
+This notation is familiar to us from Markdown. While there were past concerns
+about certain keyboard layouts,
+nowadays many people are accustomed to typing this notation.
+
+However, this notation conflicts when embedding Python code within Markdown or
+vice versa.
+Therefore, considering these drawbacks, increasing the variety of quote
+characters is not seen as a superior idea compared to adding a prefix to
+string literals.
+
+
+``__future__`` import
+---------------------
+
+Instead of adding a prefix to string literals, the idea of using a
+``__future__`` import to change the default behavior of multiline
+string literals was also considered.
+This could help simplify Python's grammar in the future.
+
+But rewriting all existing complex codebases to the new notation may not be
+straightforward.
+Until all multiline strings in that source code are rewritten to
+the new notation, automatic dedentation cannot be utilized.
+
+Until all users can rewrite existing codebases to the new notation,
+two types of Python syntax will coexist indefinitely.
+Therefore, `many people preferred the new string prefix `__
+over the ``__future__`` import.
+
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.