Create lexical statistics from ESCO skills #51

@ioggstream

Description

I expect

To have a lexical summary of the ESCO skill labels (both label and altLabel) that includes:

  • average number of words per label, for each skill
  • number of unique lemmas for each skill
  • total number of lemmas

Example

labels = ['assure customer satisfaction',
          'customer satisfaction guarantee',
          'ensure customer satisfaction',
          'guarantee customer satisfaction',
          'guaranteeing customer satisfaction',
          'promise customer satisfaction',
          'provide customer satisfaction',
          'to guarantee customer satisfaction']

  • average number of words for each skill: 3.125
  • number of unique lemmas for each skill: 8
  • lemmas histogram for each skill:

[('customer', 8),
 ('satisfaction', 8),
 ('guarantee', 4),
 ('promise', 1),
 ('to', 1),
 ('ensure', 1),
 ('provide', 1),
 ('assure', 1)]
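
A minimal sketch of how the numbers above could be computed, assuming whitespace tokenization is good enough for ESCO labels. The names `lexical_stats` and `naive_lemma` are hypothetical and not part of the project. `naive_lemma` is only a stand-in that strips a trailing "ing" so the example runs without extra dependencies. A real lemmatizer (for example spaCy or NLTK) would likely be passed in through the `lemmatize` parameter instead.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Iterable


def naive_lemma(word: str) -> str:
    """Hypothetical stand-in lemmatizer: lowercase and strip a trailing 'ing'.

    Enough for the example above ('guaranteeing' -> 'guarantee'),
    but NOT a general-purpose lemmatizer.
    """
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    return word


def lexical_stats(
    labels: Iterable[str],
    lemmatize: Callable[[str], str] = naive_lemma,
) -> dict:
    """Summarize the labels (label + altLabel) of a single skill."""
    # Assumption: labels are split on whitespace only.
    tokens_per_label = [label.split() for label in labels]
    if not tokens_per_label:
        raise ValueError("at least one label is required")

    # Count every lemma occurrence across all labels of the skill.
    histogram = Counter(
        lemmatize(token) for tokens in tokens_per_label for token in tokens
    )
    return {
        "average_words": mean(len(tokens) for tokens in tokens_per_label),
        "unique_lemmas": len(histogram),
        "total_lemmas": sum(histogram.values()),
        # Sorted by descending count.
        "histogram": histogram.most_common(),
    }
```

Calling `lexical_stats(labels)` on the list above should give the same average, unique-lemma count, and per-lemma counts as in the example. Keeping the lemmatizer as a parameter makes it easy to swap in a language-aware one later without touching the counting logic.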

Metadata

Assignees

Labels

documentation: Improvements or additions to documentation
good first issue: Good for newcomers

Projects

Status

In Progress
