pywebcopy/pywebcopy/parsers.py
Line 104 in 9f35b4b
data = source.read(0o3000)
If a chunk boundary falls in the middle of an a href attribute, nothing after that point is parsed.
See example:
Wrong:
from lxml import etree

parser = etree.HTMLPullParser()
# The chunk boundary falls inside the href attribute value.
for data in (b'<root><a href="2011-03-13_', b'135411/">2011-03-13_135411/</a></root>',):
    parser.feed(data)
    for _, elem in parser.read_events():
        print(elem.tag)  # prints nothing
parser.close()
Expected:
from lxml import etree

parser = etree.HTMLPullParser()
# The same markup fed as a single chunk.
for data in (b'<root><a href="2011-03-13_135411/">2011-03-13_135411/</a></root>',):
    parser.feed(data)
    for _, elem in parser.read_events():
        print(elem.tag)  # prints "a", then "root"
parser.close()
It may be better just to feed all at once.
parser.feed(source.fp.data)
for event, element in parser.read_events():
    for child in links(element):
        if child is None:
            continue
        yield child
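For a source that does not expose its buffered data directly, the same idea could be sketched as below: drain the stream first, then hand the parser the complete markup in one call. This is only an illustration, not pywebcopy code. `read_all` and `end_tags` are hypothetical helper names, and the chunk size of 26 is chosen only so that a read boundary lands inside the href value from the example above.

```python
import io


def read_all(source, chunk_size=0o3000):
    """Drain a binary file-like object and return its contents as one bytes object."""
    # iter(callable, sentinel) keeps calling read() until it returns b"" (EOF).
    return b"".join(iter(lambda: source.read(chunk_size), b""))


def end_tags(markup):
    """Feed the complete markup in a single call and list the tags of the events read."""
    from lxml import etree  # third-party; imported here so read_all() stays stdlib-only

    parser = etree.HTMLPullParser()
    parser.feed(markup)  # one feed() call, so no attribute is split across chunks
    tags = [elem.tag for _, elem in parser.read_events()]
    parser.close()
    return tags


# Hypothetical stand-in for the response body used in the examples above.
source = io.BytesIO(b'<root><a href="2011-03-13_135411/">2011-03-13_135411/</a></root>')
markup = read_all(source, chunk_size=26)  # first read ends inside the href value
```

Passing `markup` to `end_tags` should then correspond to the Expected case, since the parser never sees a partial attribute. The trade-off is that the whole document is held in memory before parsing starts.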