Description
I maintain a large codebase that has used `quick-xml` for years across hundreds of structs. After switching from `0.37.5` to `0.38`, I hit the already well-known whitespace issues. `0.38.1` introduced `Config::trim_text`, which saved me from touching thousands of fields, but now I have found the next caveat: XML character entities make `trim_text(true)` trim more than needed:
#!/usr/bin/env cargo
---
[dependencies]
serde = { version = "1", features = ["derive"] }
quick-xml = { version = "0.38", features = ["serialize"] }
quick-xml-old = { package = "quick-xml", version = "0.37", features = ["serialize"] }
---
use quick_xml::reader::NsReader;
use serde::Deserialize;
use std::result::Result;

#[derive(Debug, Deserialize)]
struct Root {
    bar: String,
}

pub fn from_str_trim_text<'de, T>(s: &'de str) -> Result<T, Box<dyn std::error::Error>>
where
    T: Deserialize<'de>,
{
    let mut reader = NsReader::from_str(s);
    let config = reader.config_mut();
    config.trim_text(true); // Also trims the spaces around decoded entities such as &amp;
    config.expand_empty_elements = true;
    let mut de = quick_xml::de::Deserializer::borrowing(reader);
    Ok(T::deserialize(&mut de)?)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let xml = r#"
<root>
    <bar>
        Foo &amp; Bar &amp; Baz Qux Quux
    </bar>
</root>"#;
    let old: Root = quick_xml_old::de::from_str(xml)?;
    let new: Root = quick_xml::de::from_str(xml)?;
    let new_trimming: Root = from_str_trim_text(xml)?;
    println!("Old: {}", old.bar); // "Foo & Bar & Baz Qux Quux"
    println!("New: {}", new.bar); // "\n        Foo & Bar & Baz Qux Quux\n    "
    println!("New trimming: {}", new_trimming.bar); // "Foo&Bar&Baz Qux Quux"
    Ok(())
}
(Run with `cargo +nightly -Zscript example.rs`.)
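To make the difference between the outputs easier to see, here is a minimal std-only sketch. It assumes the text around each entity reference such as `&amp;` reaches the consumer as a separate fragment, which would explain why trimming removes the inner spaces. `Fragment`, `sample_fragments`, `trim_each_fragment`, and `trim_whole_value` are hypothetical names made up for this illustration; none of them is a quick-xml type or function, and this is not how the library is implemented.

```rust
/// One piece of element content as a consumer might receive it.
/// Hypothetical type for illustration only; not part of quick-xml.
enum Fragment<'a> {
    /// Literal character data between markup or entity references.
    Text(&'a str),
    /// An entity reference, already resolved to its replacement text.
    Entity(&'a str),
}

/// The content of `<bar>` from the example, split at each `&amp;` (assumed split).
fn sample_fragments() -> Vec<Fragment<'static>> {
    vec![
        Fragment::Text("\n        Foo "),
        Fragment::Entity("&"),
        Fragment::Text(" Bar "),
        Fragment::Entity("&"),
        Fragment::Text(" Baz Qux Quux\n    "),
    ]
}

/// Trim every text fragment on its own, then join the pieces.
/// Spaces next to an entity sit at a fragment edge, so they are lost.
fn trim_each_fragment(fragments: &[Fragment<'_>]) -> String {
    fragments
        .iter()
        .map(|f| match f {
            Fragment::Text(t) => t.trim(),
            Fragment::Entity(e) => *e,
        })
        .collect()
}

/// Join the pieces first, then trim only the outer edges of the whole value.
/// Spaces next to an entity are now interior and survive.
fn trim_whole_value(fragments: &[Fragment<'_>]) -> String {
    let joined: String = fragments
        .iter()
        .map(|f| match f {
            Fragment::Text(t) | Fragment::Entity(t) => *t,
        })
        .collect();
    joined.trim().to_owned()
}

fn main() {
    let fragments = sample_fragments();
    println!("per fragment: {:?}", trim_each_fragment(&fragments));
    println!("whole value:  {:?}", trim_whole_value(&fragments));
}
```

The second function is the behavior I am after: only the leading and trailing whitespace of the complete field value is removed.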
I know this is documented behavior (thank you), so I'm not calling it a bug. I'm looking for a way to preserve "internal" spaces (e.g., around the decoded entities) while still trimming leading/trailing whitespace globally.
The only workaround that comes to mind at the moment is either introducing a newtype or using `deserialize_with`, both of which are rather error-prone in a mature codebase like mine and require a lot of work.
If there is no existing solution right now, I could try to implement a feature (e.g., a new config option?) if the maintainers think it is a good idea.