Description
Describe the bug
Each of the splitter's methods _move_to_comma_or_closing_curly_bracket and _move_to_closed_bracket contains a check for unexpected block starts, i.e. an "@" sign. Unfortunately, this check also fires on entries whose field values contain a literal "@", so those entries fail to parse.
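To illustrate the problem, here is a minimal sketch of a brace-matching scan that aborts on any "@" found before the closing brace. It is a hypothetical, simplified model (the names move_to_closing_brace and BlockAborted are made up for illustration), not bibtexparser's actual splitter code. It only shows why a literal "@" inside a braced value trips such a check.

```python
# Hypothetical, simplified model of the check described above; not bibtexparser's code.


class BlockAborted(Exception):
    """Raised when the scan decides a new block has started unexpectedly."""


def move_to_closing_brace(text: str, start: int, abort_on_at: bool = True) -> int:
    """Return the index of the '}' matching the '{' at text[start].

    With abort_on_at=True, any '@' found before the match is treated as the
    start of a new block and the scan is aborted.
    """
    if text[start] != "{":
        raise ValueError("start must point at '{'")
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        elif ch == "@" and abort_on_at:
            raise BlockAborted(f"unexpected '@' at index {i}")
    raise BlockAborted("no matching '}' found")


value = "{LeQua @ {CLEF} 2022: {A} Shared Task}"
try:
    move_to_closing_brace(value, 0)
except BlockAborted as exc:
    print("aborted:", exc)  # aborted: unexpected '@' at index 7
print(move_to_closing_brace(value, 0, abort_on_at=False) == len(value) - 1)  # True
```

With abort_on_at=False the same value is matched up to its final closing brace, which mirrors the effect of dropping the check.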
Reproducing
Version: 2.0.0b7
Code:
Parsing this example fails because of the "@" in the title: a BlockAbortedException is raised and the block is added to failed_blocks.
import bibtexparser

test = bibtexparser.parse_string(
"""
@inproceedings{DBLP:conf/cikm/EsuliM021,
author = {Andrea Esuli and Alejandro Moreo and Fabrizio Sebastiani},
editor = {Gao Cong and Maya Ramanath},
title = {LeQua @ {CLEF} 2022: {A} Shared Task for Evaluating Quantification Systems},
booktitle = {Proceedings of the {CIKM} 2021 Workshops co-located with 30th {ACM}
International Conference on Information and Knowledge Management {(CIKM}
2021), Gold Coast, Queensland, Australia, November 1-5, 2021},
series = {{CEUR} Workshop Proceedings},
volume = {3052},
publisher = {CEUR-WS.org},
year = {2021},
url = {https://ceur-ws.org/Vol-3052/abstract4.pdf},
timestamp = {Fri, 10 Mar 2023 16:22:33 +0100},
biburl = {https://dblp.org/rec/conf/cikm/EsuliM021.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
"""
)
print(test.entries_dict['DBLP:conf/cikm/EsuliM021'])
Bibtex: the same entry as in the code above.
Workaround
Monkey-patching the two methods to remove the "@" check makes the parse succeed.
Remaining Questions (Optional)
- I would be willing to contribute a PR to fix this issue.
- This issue is a blocker; I'd be grateful for an early fix.
A comment in the code says that new blocks are identified by starting on a new line. If that assumption is generally safe, I could remove the two checks altogether. The only other solution I can think of is replacing the "@" check with a tuple of the most common entry types, e.g. startswith(("@article", "@book", "@proceedings", ...)). Let me know if either option works and I'll gladly prepare a PR.
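The two options could look roughly like the following sketch. Both function names and the KNOWN_BLOCK_STARTS tuple are hypothetical (the issue itself only names "@article", "@book" and "@proceedings"), and none of this is bibtexparser API. Each function takes the full text and the index of an "@" and decides whether that "@" should count as the start of a new block.

```python
# Hypothetical sketches of the two proposed heuristics; not bibtexparser code.

# Assumed list of entry types; the issue only names @article, @book, @proceedings.
KNOWN_BLOCK_STARTS = (
    "@article", "@book", "@inproceedings", "@proceedings", "@incollection",
    "@misc", "@phdthesis", "@mastersthesis", "@techreport",
    "@string", "@preamble", "@comment",
)
_MAX_PREFIX_LEN = max(len(p) for p in KNOWN_BLOCK_STARTS)


def at_starts_line(text: str, pos: int) -> bool:
    """Option 1: the '@' at text[pos] counts only if it is the first
    non-whitespace character on its line."""
    if text[pos] != "@":
        return False
    line_start = text.rfind("\n", 0, pos) + 1  # 0 when pos is on the first line
    return text[line_start:pos].strip() == ""


def at_starts_known_type(text: str, pos: int) -> bool:
    """Option 2: the '@' at text[pos] counts only if it begins a known
    entry type (BibTeX entry types are case-insensitive)."""
    window = text[pos:pos + _MAX_PREFIX_LEN].lower()
    return window.startswith(KNOWN_BLOCK_STARTS)


title_line = "title = {LeQua @ {CLEF} 2022: {A} Shared Task},"
entry_start = "}\n@Article{key,\n"

print(at_starts_line(title_line, title_line.index("@")))          # False
print(at_starts_known_type(title_line, title_line.index("@")))    # False
print(at_starts_line(entry_start, entry_start.index("@")))        # True
print(at_starts_known_type(entry_start, entry_start.index("@")))  # True
```

Neither heuristic is airtight: the line-start rule still misfires if a multi-line field value has a line beginning with "@", and the prefix rule matches any text that happens to start with a known type, e.g. the "@booking" in "user@booking.com" matches "@book". Both reject the "@ {CLEF}" in the example title.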