faster/simpler Age / DerivedAge.txt generation #1124

We currently compute a character's Age by iterating over all Unicode versions from 1.1 through latest=dev and returning the first version in which its code point is no longer unassigned.
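For reference, the brute-force approach amounts to the loop below. This is a minimal sketch, not the unicodetools code: `BruteForceAge`, `age`, and the `assignedByVersion` map (versions in release order, each mapped to the code points that are not `gc=Cn` in that version) are hypothetical, and plain `java.util` collections stand in for the real property data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical names throughout; not the unicodetools API.
final class BruteForceAge {
    static final String NA = "NA";

    /**
     * assignedByVersion: versions in release order (1.1 ... dev), each mapped to the
     * code points whose General_Category is not Cn in that version. In the real
     * tools, filling in each entry means loading that version's data files.
     */
    static String age(int cp, LinkedHashMap<String, Set<Integer>> assignedByVersion) {
        // Oldest to newest: the first version that assigns cp is its Age.
        for (Map.Entry<String, Set<Integer>> entry : assignedByVersion.entrySet()) {
            if (entry.getValue().contains(cp)) {
                return entry.getKey();
            }
        }
        return NA; // unassigned (gc=Cn) in every version, including dev
    }
}
```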

This means that we have to load and parse the data files for every Unicode version.

This should be unnecessary, because we already have a DerivedAge.txt file from at least the last released version. We should be able to:
- parse the existing dev=latest DerivedAge.txt file
- recent additions: take the set of `[[:age=NA:]&[:^gc=Cn:]]` and map them to the latest version
- recent removals: take the set of `[[:^age=NA:]&[:gc=Cn:]]` and map them to "NA"
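The three steps above could be sketched as follows, assuming the existing DerivedAge.txt has already been parsed into a map and the dev General_Category data into a set of assigned code points. `IncrementalAge`, `update`, `priorAge`, `devAssigned`, and `newVersion` are hypothetical names, and code points absent from the map are treated as `age=NA`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical names throughout; not the unicodetools API.
final class IncrementalAge {
    /**
     * priorAge: Age values parsed from the existing DerivedAge.txt; a code point
     *   with no entry has age=NA (NA is never stored as an explicit value).
     * devAssigned: code points whose General_Category is not Cn in the dev data.
     * newVersion: the Age value to give newly assigned code points.
     */
    static Map<Integer, String> update(Map<Integer, String> priorAge,
                                       Set<Integer> devAssigned,
                                       String newVersion) {
        Map<Integer, String> ages = new HashMap<>(priorAge); // leave the input untouched
        // Recent removals, [[:^age=NA:]&[:gc=Cn:]]: drop the entry so it reads as NA.
        ages.keySet().retainAll(devAssigned);
        // Recent additions, [[:age=NA:]&[:^gc=Cn:]]: give them the new version.
        for (int cp : devAssigned) {
            ages.putIfAbsent(cp, newVersion);
        }
        return ages;
    }
}
```

In this sketch, only one DerivedAge.txt and one version's General_Category data are read, no matter how many Unicode versions exist.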

We should move the old logic into a test, to verify that the simple logic yields the same Age property map as the brute-force code.
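Such a test could compare the Age map produced by the brute-force code with the one produced by the simpler logic over the whole code space. A minimal sketch, with a hypothetical helper `AgeMapCheck.mismatches` and with missing entries read as `NA` in both maps:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of unicodetools.
final class AgeMapCheck {
    static final String NA = "NA";

    /** Returns the code points (0..0x10FFFF) where the two Age maps disagree. */
    static List<Integer> mismatches(Map<Integer, String> expected, Map<Integer, String> actual) {
        List<Integer> diffs = new ArrayList<>();
        for (int cp = 0; cp <= 0x10FFFF; cp++) {
            // A missing entry and an explicit "NA" are treated the same.
            String want = expected.getOrDefault(cp, NA);
            String got = actual.getOrDefault(cp, NA);
            if (!want.equals(got)) {
                diffs.add(cp);
            }
        }
        return diffs;
    }
}
```

Returning the differing code points, rather than a single boolean, makes a failing test easier to diagnose.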

Discussion: see this comment and the replies to it:
https://github.com/unicode-org/unicodetools/pull/1116#issuecomment-2848036146
