'\n' mixed in Vocabulary['token']

It seems that the counter in the vocabulary is counting 'token' entries that include a trailing newline character.
For example, in vocabulary.pkl for the java-small dataset, I can find
                   'return': 6020684,
and
                   'return\n': 33290,
as separate entries.
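
Here is a minimal sketch of how such split counts could arise. It is not the project's actual code. It assumes each raw sample is one line of the form `label ctx1 ctx2 ...`, where every context is `from_token,path,to_token`; the function name `count_tokens` and the sample line are hypothetical. When the line is split on a single space without stripping, the last context keeps the line's trailing newline.

```python
from collections import Counter


def count_tokens(lines):
    """Count from/to tokens without stripping the line (hypothetical helper)."""
    counter = Counter()
    for line in lines:
        # Splitting on an explicit " " does NOT remove the trailing "\n".
        _label, *contexts = line.split(" ")
        for context in contexts:
            from_token, _path, to_token = context.split(",")
            counter.update([from_token, to_token])
    return counter


# Made-up sample line in the assumed format, ending with a newline.
sample = ["get|x return,p1,a b,p2,return\n"]
counts = count_tokens(sample)
print(counts["return"], counts["return\n"])
```

In this sketch, only the final token of each line is affected, which would be consistent with the `'return\n'` count being much smaller than the `'return'` count.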

I personally fixed this problem by stripping path_context in Vocabulary._process_raw_sample,
but I'm a little confused about whether this behavior (a '\n' mixed into tokens) is intended.
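
For reference, the workaround looks roughly like the sketch below. This is a simplified, hypothetical stand-in and not the real `Vocabulary._process_raw_sample`: the function signature, the `token_counter` argument, and the assumed `from_token,path,to_token` context format are all assumptions for illustration.

```python
from collections import Counter


def process_raw_sample(raw_sample, token_counter):
    """Hypothetical stand-in: count tokens after stripping each path context."""
    _label, *path_contexts = raw_sample.split(" ")
    for path_context in path_contexts:
        # Strip surrounding whitespace so a trailing "\n" cannot leak into a token.
        path_context = path_context.strip()
        if not path_context:
            continue  # skip empty pieces, e.g. from padding spaces
        from_token, _path, to_token = path_context.split(",")
        token_counter.update([from_token, to_token])


token_counter = Counter()
process_raw_sample("get|x return,p1,a b,p2,return\n", token_counter)
print(token_counter["return"], "return\n" in token_counter)
```

Stripping the whole line once before splitting would have the same effect on the trailing newline; stripping each context just keeps the change local to where tokens are extracted.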

Thank you!

'\n' mixed in Vocabulary['token'] #111