This repository was archived by the owner on Feb 4, 2020. It is now read-only.
Most of the files process fine, but some (for example ab.bib.14.20160401.full.mrc) fail during processing. The error I am getting is:
Traceback (most recent call last):
  File "domark.py", line 21, in <module>
    for record in reader:
  File "/Library/Python/2.7/site-packages/six.py", line 535, in next
    return type(self).__next__(self)
  File "/Users/markwatkins/Sites/pharvard/pymarc/reader.py", line 97, in __next__
    utf8_handling=self.utf8_handling)
  File "/Users/markwatkins/Sites/pharvard/pymarc/record.py", line 74, in __init__
    utf8_handling=utf8_handling)
  File "/Users/markwatkins/Sites/pharvard/pymarc/record.py", line 307, in decode_marc
    code = subfield[0:1].decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
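As far as I can tell from the last frame, the failure happens when the first byte of a subfield (the subfield code) is decoded as ASCII. The sketch below reproduces that step without pymarc. It assumes the standard MARC subfield delimiter byte 0x1F; `subfield_codes` and the sample field data are hypothetical and only loosely mirror what `decode_marc` does.

```python
SUBFIELD_DELIMITER = b"\x1f"  # standard MARC subfield delimiter byte


def subfield_codes(field_data):
    """Return the subfield codes of one raw variable field.

    Hypothetical helper, not part of pymarc: it only imitates the
    `subfield[0:1].decode('ascii')` step shown in the traceback.
    """
    codes = []
    # The chunk before the first delimiter holds the indicators, so skip it.
    for subfield in field_data.split(SUBFIELD_DELIMITER)[1:]:
        codes.append(subfield[0:1].decode("ascii"))
    return codes


# Made-up field data: two indicators followed by subfields.
good = b"10\x1faSome title\x1fbsubtitle"
bad = b"10\x1f\xc4\x8dbroken"  # a non-ASCII byte sits where the code should be

print(subfield_codes(good))
try:
    subfield_codes(bad)
except UnicodeDecodeError as exc:
    print(exc)
```

If this reading is right, `utf8_handling='ignore'` would not help here, because the subfield code is decoded as ASCII rather than UTF-8.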
The driver code I am using is:
#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
import codecs
import sys

from pymarc import MARCReader

# Wrap stdout so that unicode output is written as UTF-8 (Python 2).
UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)

# Process the MARC file named on the command line, if one was given.
if len(sys.argv) >= 2:
    files = [sys.argv[1]]
    for file in files:
        with open(file, 'rb') as fh:
            reader = MARCReader(fh, utf8_handling='ignore')
            for record in reader:
                # print "%s by %s" % (record.title(), record.author())
                print(record.as_json())
Other MARC processing tools (e.g. MarcEdit) seem to process the file with no issues, so I think the file is legitimate.
Am I doing something wrong? Or is there an issue with pymarc, possibly related to UTF-8 processing?
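To help narrow this down, here is a rough diagnostic sketch that scans raw MARC bytes for subfield delimiters followed by a non-ASCII byte, which is the condition the traceback seems to point at. It assumes the standard MARC control bytes (0x1D record terminator, 0x1F subfield delimiter); `non_ascii_subfield_codes` is a hypothetical helper, the sample data is made up, and I have not confirmed that this is the actual cause in the failing file.

```python
RECORD_TERMINATOR = b"\x1d"   # standard MARC record terminator
SUBFIELD_DELIMITER = b"\x1f"  # standard MARC subfield delimiter


def non_ascii_subfield_codes(data):
    """Yield (record_index, offset, byte_value) for suspicious subfield codes.

    record_index is 0-based, offset is the position of the code byte
    within its record. Hypothetical helper, not part of pymarc.
    """
    for index, record in enumerate(data.split(RECORD_TERMINATOR)):
        pos = record.find(SUBFIELD_DELIMITER)
        while pos != -1:
            # bytearray gives integer values on both Python 2 and 3.
            code = bytearray(record[pos + 1:pos + 2])
            if code and code[0] >= 0x80:
                yield index, pos + 1, code[0]
            pos = record.find(SUBFIELD_DELIMITER, pos + 1)


# Made-up stand-in for file contents; for a real file one could use
# data = open(path, 'rb').read() instead.
records = [b"rec0\x1faTitle\x1e", b"rec1\x1f\xc4oops\x1e"]
data = RECORD_TERMINATOR.join(records) + RECORD_TERMINATOR

for index, offset, value in non_ascii_subfield_codes(data):
    print("record %d, offset %d: byte 0x%02x" % (index, offset, value))
```

This reads the whole input into memory, so it is only meant for a one-off check of a single file.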