Commit ada8ff0

Keep h1 and other headings
Even though using h1 tags for sections inside an article is semantically wrong, a lot of websites do it anyway. So the idea here is to stop stripping headings, including h1, on Readability's side.

Fixes wallabag/wallabag#5805

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
1 parent c506b7e commit ada8ff0

1 file changed: +0 -12 lines changed

src/Readability.php

Lines changed: 0 additions & 12 deletions
@@ -427,18 +427,6 @@ public function prepArticle(\DOMNode $articleContent)
         $this->clean($articleContent, 'object');
         $this->clean($articleContent, 'iframe');
         $this->clean($articleContent, 'canvas');
-        $this->clean($articleContent, 'h1');
-
-        /*
-         * If there is only one h2, they are probably using it as a main header, so remove it since we
-         * already have a header.
-         */
-        $h2s = $articleContent->getElementsByTagName('h2');
-        if (1 === $h2s->length && mb_strlen($this->getInnerText($h2s->item(0), true, true)) < 100) {
-            $this->clean($articleContent, 'h2');
-        }
-
-        $this->cleanHeaders($articleContent);
 
         // Do these last as the previous stuff may have removed junk that will affect these.
         $this->cleanConditionally($articleContent, 'form');
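
To illustrate the effect of the removed lines, here is a minimal standalone sketch, not the library's code. It uses only PHP's built-in DOM extension. The helpers `stripHeadingsLikeBefore()` and `removeAll()` are hypothetical stand-ins for the removed calls to `clean()` and the lone-h2 check. The real `getInnerText()` normalization and `cleanHeaders()` are not reproduced.

```php
<?php
// Hypothetical helper, not part of Readability.php: approximates the heading
// stripping that this commit removes.
function stripHeadingsLikeBefore(DOMDocument $doc): void
{
    // Old behaviour 1: drop every h1.
    removeAll($doc, 'h1');

    // Old behaviour 2: drop a lone h2 whose text is shorter than 100 characters.
    $h2s = $doc->getElementsByTagName('h2');
    if (1 === $h2s->length && mb_strlen(trim($h2s->item(0)->textContent)) < 100) {
        removeAll($doc, 'h2');
    }
}

// Hypothetical helper: removes every element with the given tag name.
function removeAll(DOMDocument $doc, string $tag): void
{
    // Copy the live node list first; removing while iterating would skip nodes.
    foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
        $node->parentNode->removeChild($node);
    }
}

$html = '<div><h1>Section one</h1><p>Body text.</p>'
      . '<h2>Only subheading</h2><p>More text.</p></div>';

// Before the commit: headings are stripped during article preparation.
$before = new DOMDocument();
$before->loadHTML($html);
stripHeadingsLikeBefore($before);

// After the commit: the same markup is left untouched.
$after = new DOMDocument();
$after->loadHTML($html);

$countBefore = $before->getElementsByTagName('h1')->length
             + $before->getElementsByTagName('h2')->length;
$countAfter = $after->getElementsByTagName('h1')->length
            + $after->getElementsByTagName('h2')->length;

echo "headings kept before the change: $countBefore\n";
echo "headings kept after the change: $countAfter\n";
```

With the old logic, an article that uses h1 for its sections and has a single short h2 loses both headings. With the lines removed, both survive into the extracted content.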
