Improve handling of Roman numerals

@twneale points out [a use case](https://twitter.com/twneale/status/306080682491396096) that is not allowed for, but that should be:

```
(h) Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    (i) Integer tincidunt, sem eu pretium condimentum.
    (ii) Sed dui justo, euismod nec mattis a, aliquet quis ante.
(i) Nulla dapibus sem et ligula consectetur vitae sagittis arcu varius.
(j) Proin a mauris sit amet enim ullamcorper ultricies vitae id lectus.
```

This is a non-trivial modification, because it requires statefulness—an understanding, upon “realizing” that it’s in the midst of a list of Roman numerals, that it must backtrack, reevaluate where that list began, and modify the ancestry of those subsections accordingly. If it encountered only a single subsection of `(i)`, that's especially problematic, because it’s two “i”s in a row, and there’s no hint available that one of them should be a Roman numeral and, thus, a child of `(h)`. That requires an understanding of order (alphabetic, numeric, and Roman numeric) that is not currently present in this, but that seems conceptually straightforward to add.

Thom has found the example problem within the U.S. Code, so it’s not merely hypothetical.

Realistically, this is two problems. The first is the ability to recognize and handle Roman numerals properly, which is to say to understand that "i" isn't necessarily the same as "i". Second is the ability to look ahead and understand the unusual-but-extant problem of the use of the Roman numeral "i" following immediately "h."


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve handling of Roman numerals #1

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Improve handling of Roman numerals #1

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions