Understand the part of speech structure

Hi,

I am using sudachi-rs as part of a Neovim plugin to help with Japanese learning.
I am trying to see every possible value of the part-of-speech component so that I can declare my own enums and so on.

I searched for an enum listing the lexicon types (for instance `名詞`, `助詞`, `補助記号`; I want to know what the other possible values are) but I couldn't find one here or in the Sudachi dictionary. I eventually reached https://github.com/WorksApplications/Sudachi, but there seems to be no enum there either: is `名詞` just part of the dictionary? It seems the part of speech is just a list of freeform strings. There must be some convention though, so where can I find such a list?
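
To make the question concrete, here is a minimal sketch of the kind of enum I have in mind. It assumes the part of speech for one token is available as a slice of strings and only looks at the first element. The names `TopLevelPos` and `from_pos` are my own, not part of sudachi-rs, and only the three values quoted above are covered; everything else falls into a catch-all variant because I don't know the full list.

```rust
/// Top-level part-of-speech categories (hypothetical type, not from sudachi-rs).
/// Only the three values mentioned in this issue are modelled explicitly.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TopLevelPos {
    /// 名詞
    Noun,
    /// 助詞
    Particle,
    /// 補助記号
    SupplementarySymbol,
    /// Any value not listed above, kept verbatim.
    Other(String),
}

impl TopLevelPos {
    /// Builds the enum from the part-of-speech strings of one token.
    /// Returns `None` when the slice is empty.
    pub fn from_pos(pos: &[&str]) -> Option<Self> {
        let first = pos.first()?;
        Some(match *first {
            "名詞" => Self::Noun,
            "助詞" => Self::Particle,
            "補助記号" => Self::SupplementarySymbol,
            other => Self::Other(other.to_string()),
        })
    }
}
```

With a documented list of values, the `Other` variant could be replaced by explicit variants.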

My goal is to reproduce in Neovim the output of https://www3.nhk.or.jp/news/easy/ne2025073011585/ne2025073011585.html, i.e. with place names and people's names highlighted differently.
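
As a rough sketch of the classification step, the function below maps a part-of-speech list to a highlight group. It assumes the dictionary follows a UniDic-style hierarchy in which proper nouns are tagged `名詞`, `固有名詞` and then `人名` (person) or `地名` (place); that assumption is exactly what I would like to confirm. `Highlight` and `highlight_for` are hypothetical names.

```rust
/// Highlight groups the plugin would distinguish (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Highlight {
    PersonName,
    PlaceName,
    Plain,
}

/// Chooses a highlight group from the part-of-speech strings of one token.
/// Assumption: proper nouns start with 名詞 / 固有名詞, followed by 人名 or 地名.
pub fn highlight_for(pos: &[&str]) -> Highlight {
    match pos {
        ["名詞", "固有名詞", "人名", ..] => Highlight::PersonName,
        ["名詞", "固有名詞", "地名", ..] => Highlight::PlaceName,
        _ => Highlight::Plain,
    }
}
```

If the tokenizer hands back owned strings rather than `&str`, they would need to be borrowed into a `Vec<&str>` first.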

I wonder if the tokenizer could output JSON in addition to the current format (e.g., with `--output=json`)? It might not be good for performance, but JSON would self-document the various part-of-speech fields.
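
To illustrate the idea, here is a sketch of what one JSON record per token could look like. The `--output=json` flag does not exist as far as I know, and the field names `surface` and `pos` are purely my suggestion. The code uses only the standard library and escapes strings by hand.

```rust
/// Encodes a string as a JSON string literal (quotes, backslashes, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Renders one token as a JSON object with hypothetical field names.
pub fn token_to_json(surface: &str, pos: &[&str]) -> String {
    let pos_json: Vec<String> = pos.iter().map(|p| json_string(p)).collect();
    format!(
        "{{\"surface\":{},\"pos\":[{}]}}",
        json_string(surface),
        pos_json.join(",")
    )
}
```

Named keys for each part-of-speech level would be even more self-documenting, but I don't know what those levels are called, hence the plain array.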

Understand the part of speech structure #304

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Understand the part of speech structure #304

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions