Improve handling of Roman numerals #1

@waldoj

@twneale points out a use case that is not allowed for, but that should be:

(h) Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    (i) Integer tincidunt, sem eu pretium condimentum.
    (ii) Sed dui justo, euismod nec mattis a, aliquet quis ante.
(i) Nulla dapibus sem et ligula consectetur vitae sagittis arcu varius.
(j) Proin a mauris sit amet enim ullamcorper ultricies vitae id lectus.

This is a non-trivial modification, because it requires statefulness: upon “realizing” that it’s in the midst of a list of Roman numerals, the code must backtrack, reevaluate where that list began, and modify the ancestry of those subsections accordingly. A single subsection of (i) is especially problematic, because that yields two “i”s in a row, with no hint available that one of them should be a Roman numeral and, thus, a child of (h). Handling this requires an understanding of order (alphabetic, numeric, and Roman numeric) that is not currently present, but that seems conceptually straightforward to add.
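
To make the "understanding of order" concrete, here is a minimal sketch in Python. It is not taken from this project; the names `roman_to_int`, `ordinals`, and `schemes_continuing` are hypothetical, and it assumes labels are lowercase and already stripped of their parentheses. It reports every scheme under which one label directly follows another, which is the information the backtracking step would need.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}
# Accepts well-formed lowercase Roman numerals up to 3999 (also matches "", handled below).
ROMAN_RE = re.compile(r"^m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})$")


def roman_to_int(label):
    """Return the value of a lowercase Roman numeral, or None if it is not one."""
    if not label or not ROMAN_RE.match(label):
        return None
    total = 0
    # Pair each character with the one after it; the last is paired with a space.
    for ch, nxt in zip(label, label[1:] + " "):
        value = ROMAN_VALUES[ch]
        if nxt in ROMAN_VALUES and ROMAN_VALUES[nxt] > value:
            total -= value  # subtractive notation, e.g. the "i" in "iv"
        else:
            total += value
    return total


def ordinals(label):
    """Map each scheme the label could belong to onto its position in that scheme."""
    result = {}
    if label.isdecimal():
        result["numeric"] = int(label)
    if len(label) == 1 and "a" <= label <= "z":
        result["alpha"] = ord(label) - ord("a") + 1
    value = roman_to_int(label)
    if value is not None:
        result["roman"] = value
    return result


def schemes_continuing(prev, cur):
    """Return the schemes under which `cur` is the immediate successor of `prev`."""
    before, after = ordinals(prev), ordinals(cur)
    return {s for s in after if s in before and after[s] == before[s] + 1}
```

Under this sketch, `schemes_continuing("h", "i")` yields only `{"alpha"}` while `schemes_continuing("i", "ii")` yields only `{"roman"}`. That is the moment at which a parser would "realize" it is inside a Roman list and has to revisit the preceding (i).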

Thom has found the example problem within the U.S. Code, so it’s not merely hypothetical.

Realistically, this is two problems. The first is the ability to recognize and handle Roman numerals properly, which is to say, to understand that the letter “i” isn’t necessarily the same as the Roman numeral “i”. The second is the ability to look ahead and handle the unusual-but-extant case of the Roman numeral “i” immediately following “h”.
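
The look-ahead half of the problem could be sketched roughly as follows. This is an illustration only, not the project's code: `assign_depths`, the `ROMAN` list, and the depth values 0 and 1 are all hypothetical, and it deliberately handles only the (h)/(i) collision described above, using a one-label look-ahead instead of full backtracking.

```python
# First ten lowercase Roman numerals; enough for an illustration.
ROMAN = ["i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x"]


def assign_depths(labels):
    """Return (label, depth) pairs for a flat run of lettered subsections.

    Depth 0 is the alphabetic list; depth 1 is a Roman list nested under (h).
    """
    result = []
    roman_pos = None   # index into ROMAN of the last Roman child, if a list is open
    prev_alpha = None  # most recent label placed at depth 0
    for idx, label in enumerate(labels):
        following = labels[idx + 1] if idx + 1 < len(labels) else None

        # Case 1: the label continues a Roman list that is already open.
        if (roman_pos is not None and roman_pos + 1 < len(ROMAN)
                and label == ROMAN[roman_pos + 1]):
            roman_pos += 1
            result.append((label, 1))
            continue

        # Case 2: an "i" right after (h). Peek ahead: "ii" means a Roman list
        # is starting; another "i" means this one is a lone Roman child and
        # the next one is the alphabetic sibling of (h).
        if (roman_pos is None and label == "i" and prev_alpha == "h"
                and following in ("i", "ii")):
            roman_pos = 0
            result.append((label, 1))
            continue

        # Otherwise treat the label as part of the alphabetic list.
        roman_pos = None
        prev_alpha = label
        result.append((label, 0))
    return result
```

Run on the labels from the example at the top, `["h", "i", "ii", "i", "j"]`, this places the first "i" and "ii" at depth 1 and everything else at depth 0. A real fix would also need to cover Roman lists under letters other than (h) and deeper nesting, which this sketch ignores.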
