faster/simpler Age / DerivedAge.txt generation #1124

@markusicu

We currently compute a character's Age by iterating over all Unicode versions from 1.1 to latest=dev and returning the first version where its code point is no longer unassigned.

This means that we have to load/parse data files for each Unicode version.
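For illustration, the current brute-force approach might be sketched as follows. This is not the actual tool code: the function name `brute_force_age`, the per-version sets of assigned code points, and the toy data are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple


def brute_force_age(
    versions: List[Tuple[str, Set[int]]],
    code_points: Iterable[int],
) -> Dict[int, str]:
    """Return the Age of each code point by scanning versions oldest-first.

    `versions` is a hypothetical list of (version, assigned code points)
    pairs; building it is the expensive part, since it requires loading
    the data files of every Unicode version.
    """
    ages: Dict[int, str] = {}
    for cp in code_points:
        ages[cp] = "NA"  # stays NA if unassigned in every version
        for version, assigned in versions:
            if cp in assigned:
                ages[cp] = version  # first version where cp is assigned
                break
    return ages


# Toy data for the example only, not a real excerpt of any data file.
toy_versions = [
    ("1.1", {0x41}),
    ("2.0", {0x41, 0xAC00}),
    ("2.1", {0x41, 0xAC00, 0x20AC}),
]
toy_ages = brute_force_age(toy_versions, [0x41, 0xAC00, 0x20AC, 0x50000])
```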

This should be unnecessary, because we already have a DerivedAge.txt file from at least the previous version. We should be able to:

  • parse the dev=latest DerivedAge.txt file
  • recent additions: take the set [[:age=NA:]&[:^gc=Cn:]] and map its code points to the latest version
  • recent removals: take the set [[:^age=NA:]&[:gc=Cn:]] and map its code points to "NA"
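The steps above could look roughly like this. It is a minimal sketch, not the tool's implementation: `parse_derived_age`, `update_ages`, and the `assigned` parameter are hypothetical names, and the parser assumes the usual `range ; version # comment` line layout of DerivedAge.txt. A code point missing from the returned map means Age=NA. One caveat: noncharacters have gc=Cn but do carry an Age in DerivedAge.txt, so the sketch leaves it to the caller to decide what goes into `assigned`.

```python
from typing import Dict, Iterable, Set


def parse_derived_age(lines: Iterable[str]) -> Dict[int, str]:
    """Parse DerivedAge.txt-style lines into a code point -> version map."""
    ages: Dict[int, str] = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        range_part, version = (field.strip() for field in data.split(";"))
        first, _, last = range_part.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for cp in range(start, end + 1):
            ages[cp] = version
    return ages


def update_ages(
    old_ages: Dict[int, str],
    assigned: Set[int],
    latest_version: str,
) -> Dict[int, str]:
    """Apply recent additions and removals to a previously parsed Age map.

    `assigned` stands in for [:^gc=Cn:] (an assumption of this sketch);
    the caller may need to add noncharacters, which are gc=Cn but have an Age.
    """
    ages = dict(old_ages)
    # Recent additions: Age was NA, but the code point is now assigned.
    for cp in assigned:
        if cp not in ages:
            ages[cp] = latest_version
    # Recent removals: Age was set, but the code point is now unassigned.
    for cp in list(ages):
        if cp not in assigned:
            del ages[cp]
    return ages
```

With this shape, only one DerivedAge.txt file and the current General_Category data are needed, rather than the data files of every version.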

We should move the old logic into a test, to verify that the simple logic yields the same Age property map as the brute force code.
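Such a test might compare the two Age maps and report every code point where they disagree. The helper below is a hypothetical sketch; it assumes both maps go from code point to version string and that a missing key means "NA".

```python
from typing import Dict, List, Tuple


def age_map_differences(
    expected: Dict[int, str],
    actual: Dict[int, str],
) -> List[Tuple[str, str, str]]:
    """List (code point, expected age, actual age) for every mismatch."""
    diffs: List[Tuple[str, str, str]] = []
    for cp in sorted(expected.keys() | actual.keys()):
        want = expected.get(cp, "NA")  # missing key is treated as NA
        got = actual.get(cp, "NA")
        if want != got:
            diffs.append((f"U+{cp:04X}", want, got))
    return diffs
```

A test would then assert that the list is empty, with `expected` coming from the brute-force code and `actual` from the simple logic.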

Discussion: See this comment and replies to it:
#1116 (comment)
