Whitespaces around character entities #900

@synek317

I maintain a large codebase that used quick-xml for years across hundreds of structs.

After switching from 0.37.5 to 0.38, I hit the well-known whitespace issues. 0.38.1 introduced Config::trim_text, which saved me from touching thousands of fields, but I have now found another caveat: with trim_text(true), XML character entities cause more whitespace to be trimmed than needed:

#!/usr/bin/env cargo
---
[dependencies]
serde = { version = "1", features = ["derive"] }
quick-xml = { version = "0.38", features = ["serialize"] }
quick-xml-old = { package = "quick-xml", version = "0.37", features = ["serialize"] }
---

use quick_xml::reader::NsReader;
use serde::Deserialize;
use std::result::Result;

#[derive(Debug, Deserialize)]
struct Root {
    bar: String,
}

pub fn from_str_trim_text<'de, T>(s: &'de str) -> Result<T, Box<dyn std::error::Error>>
where
    T: Deserialize<'de>,
{
    let mut reader = NsReader::from_str(s);
    let config = reader.config_mut();

    config.trim_text(true); // Also trims the spaces around the decoded &amp;
    config.expand_empty_elements = true;

    let mut de = quick_xml::de::Deserializer::borrowing(reader);
    Ok(T::deserialize(&mut de)?)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let xml = r#"
    <root>
        <bar>
            Foo &amp; Bar  &amp;  Baz Qux   Quux
        </bar>
    </root>"#;

    let old: Root = quick_xml_old::de::from_str(xml)?;
    let new: Root = quick_xml::de::from_str(xml)?;
    let new_trimming: Root = from_str_trim_text(xml)?;

    println!("Old: {}", old.bar); // "Foo & Bar  &  Baz Qux   Quux"
    println!("New: {}", new.bar); // "\n            Foo & Bar  &  Baz Qux   Quux\n        "
    println!("New trimming: {}", new_trimming.bar); // "Foo&Bar&Baz Qux   Quux"

    Ok(())
}

(cargo +nightly -Zscript example.rs)

I know this is documented behavior (thank you), so I'm not calling it a bug. I'm looking for a way to preserve "internal" spaces (e.g., around the decoded entities) while still trimming leading/trailing whitespace globally.
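To make the difference concrete, here is a simplified, std-only model of the two behaviors. It assumes that the content of an element reaches the deserializer as separate pieces (text runs and decoded references) and that trim_text(true) trims each text piece on its own. The Chunk type and the trim_each / trim_edges functions are hypothetical names for illustration only; they are not part of quick-xml.

```rust
/// Simplified stand-in for the pieces of one element's content.
/// Hypothetical type for illustration; this is not quick-xml's `Event`.
enum Chunk<'a> {
    /// A run of literal text between references.
    Text(&'a str),
    /// An already decoded reference, e.g. `&amp;` decoded to '&'.
    Entity(char),
}

/// Trims every text piece independently, which models the output
/// observed with `trim_text(true)` in the example above.
fn trim_each(chunks: &[Chunk<'_>]) -> String {
    let mut out = String::new();
    for chunk in chunks {
        match chunk {
            Chunk::Text(t) => out.push_str(t.trim()),
            Chunk::Entity(c) => out.push(*c),
        }
    }
    out
}

/// Joins all pieces first and trims only the outer edges,
/// which is the behavior being asked for.
fn trim_edges(chunks: &[Chunk<'_>]) -> String {
    let mut out = String::new();
    for chunk in chunks {
        match chunk {
            Chunk::Text(t) => out.push_str(t),
            Chunk::Entity(c) => out.push(*c),
        }
    }
    out.trim().to_string()
}

fn main() {
    // The content of <bar> from the example, split at each `&amp;`.
    let chunks = [
        Chunk::Text("\n            Foo "),
        Chunk::Entity('&'),
        Chunk::Text(" Bar  "),
        Chunk::Entity('&'),
        Chunk::Text("  Baz Qux   Quux\n        "),
    ];

    println!("{:?}", trim_each(&chunks)); // "Foo&Bar&Baz Qux   Quux"
    println!("{:?}", trim_edges(&chunks)); // "Foo & Bar  &  Baz Qux   Quux"
}
```

The second function needs to see the whole content of the element before trimming, so in a real implementation the trimming would probably have to happen where the pieces are joined rather than on each piece as it is read.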

The only workaround that comes to mind at the moment is to either introduce a newtype or use deserialize_with, both of which are error-prone in a mature codebase like mine and require a lot of work.

If there is no existing solution right now, I could try to implement a feature (e.g., a new config option?) if the maintainers think it is a good idea.
