Commit 648c940
[FIX] util/snippets: fix conversion of full html docs
upg-1158494
opw-3577439
```
  File "/tmp/tmpjzebte5h/migrations/account_online_synchronization/saas~15.1.1.0/pre-migrate.py", line 6, in migrate
    util.remove_field(cr, "account.link.journal.line", "action")
  File "/tmp/tmpjzebte5h/migrations/util/fields.py", line 163, in remove_field
    adapt_domains(cr, model, fieldname, "ignored", adapter=adapter, skip_inherit=skip_inherit, force_adapt=True)
  File "/tmp/tmpjzebte5h/migrations/util/domains.py", line 316, in adapt_domains
    with suppress(_Skip), edit_view(cr, view_id=view_id, active=None) as view:
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/tmp/tmpjzebte5h/migrations/util/records.py", line 210, in edit_view
    arch_etree = lxml.etree.fromstring(arch["en_US"])
  File "src/lxml/etree.pyx", line 3257, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1916, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1796, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1085, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 618, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 728, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 657, in lxml.etree._raiseParseError
  File "<string>", line 14
lxml.etree.XMLSyntaxError: Extra content at the end of the document, line 14, column 17
```
The traceback from the lxml etree parser is caused by an earlier upgrade script
that corrupted the document in question. In the origin DB the document is a
full HTML doc with an `html` root tag, which lxml.etree parses just fine. The
culprit script is `website/16.0.1.0/pre-convert_html.py`, which ultimately
calls the `HTMLConverter` in `snippets.py`. The `HTMLConverter` always encloses
the document in `<wrap>` and `</wrap>` tags. This defeats the full-document
detection in `lxml.html.fromstring()` (see
https://github.com/lxml/lxml/blob/2ac88908ffd6df380615c0af35f2134325e4bf30/src/lxml/html/html5parser.py#L184)
and leads to a corrupted result when the converted document is serialized again
via `etree.tostring()`: the `html`, `head` and `body` tags are lost.
To fix this, do not add the `wrap` tags if the document looks like a full HTML
doc according to the same test that `lxml.html.fromstring()` applies.
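A minimal sketch of the idea, not the actual patch: the helper name
`wrap_unless_full_html` is hypothetical, and the regular expression is an
assumption about how lxml decides that a string "looks like" a full HTML doc
(leading `<html` or `<!doctype`, case-insensitive). Verify it against the lxml
version in use.
```python
import re

# Assumption: approximates lxml's "looks like a full HTML document" test,
# i.e. optional leading whitespace followed by <html or <!doctype.
_looks_like_full_html = re.compile(r"^\s*<(?:html|!doctype)", re.I).match


def wrap_unless_full_html(content):
    """Hypothetical helper: wrap fragments in <wrap>, leave full docs untouched."""
    if _looks_like_full_html(content):
        # Full document: wrapping would hide the html root from the parser.
        return content
    # Fragment: the wrapper keeps multi-root content and outer comments together.
    return "<wrap>{}</wrap>".format(content)
```
With this guard, fragments such as `<p>a</p><!-- c -->` still get the wrapper,
while a document starting with `<html` or `<!DOCTYPE` is passed to the parser
unchanged.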
Part of odoo/upgrade#5352
Signed-off-by: Christophe Simonis (chs) <chs@odoo.com>
1 parent e71e5a5 commit 648c940
1 file changed: 7 additions, 3 deletions (lines 202-204 replaced by lines 202-208)