verbatimEventDate issues #78

@Mesibov

Some apparently valid verbatimEventDate (vED) entries do not have a corresponding eventDate (eD). These are listed with their ids in the attached valid-vED-no-eD.txt.
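A minimal sketch of how such records could be picked out, assuming each record is available as a dict keyed by field name. The `id` key, the sample ids and the helper name `records_missing_event_date` are hypothetical, and the check only tests for a non-empty vED; it does not judge whether the vED is valid.

```python
def records_missing_event_date(records):
    """Return ids of records with a non-empty verbatimEventDate but an empty eventDate."""
    missing = []
    for rec in records:
        ved = (rec.get("verbatimEventDate") or "").strip()
        ed = (rec.get("eventDate") or "").strip()
        if ved and not ed:
            missing.append(rec["id"])
    return missing


# Hypothetical records, for illustration only (not taken from the dataset)
sample = [
    {"id": "a1", "verbatimEventDate": "17 Jan 1943", "eventDate": "1943-01-17"},
    {"id": "a2", "verbatimEventDate": "20-July-1999", "eventDate": ""},
    {"id": "a3", "verbatimEventDate": "", "eventDate": ""},
]
print(records_missing_event_date(sample))
```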

I also found 239 disagreements between vED and eD in 993 records where both fields have entries. These are listed by disagreement (sorted by vED, with number of records for each) in vED-eD-disagreements-by-type.txt, and by record in vED-eD-disagreements-by-record.txt (sorted by record id).

I've excluded from vED-eD disagreements all the various formatting variations and doubtful constructs, such as

  • single years expanded as interval dates, as in "1953" > "1953-01-01/1953-12-31"
  • single months expanded as interval dates, as in "1954-06" > "1954-06-01/1954-06-30", or as single dates, as in "1954-06" > "1954-06-01"
  • single days expanded as pseudo-interval dates, as in "17 Jan 1943" > "1943-01-17/1943-01-17"
  • seasonal dates, as in "Summer 1887" > "1887-06-01/1887-08-31"
  • various malformed vED entries
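One way to treat the year, month and single-day expansions above as equivalent is to normalise every eD to a start/end interval before comparing. The sketch below assumes the eD values are ISO 8601 years, year-months, dates or intervals; the function names are illustrative, and seasonal or malformed entries are not handled.

```python
import calendar


def expand_to_interval(iso):
    """Expand an ISO 8601 year, year-month or date to a 'start/end' interval."""
    if "/" in iso:
        return iso  # already an interval, leave unchanged
    parts = [int(p) for p in iso.split("-")]
    if len(parts) == 1:
        (y,) = parts
        return f"{y:04d}-01-01/{y:04d}-12-31"
    if len(parts) == 2:
        y, m = parts
        last_day = calendar.monthrange(y, m)[1]  # number of days in that month
        return f"{y:04d}-{m:02d}-01/{y:04d}-{m:02d}-{last_day:02d}"
    y, m, d = parts
    return f"{y:04d}-{m:02d}-{d:02d}/{y:04d}-{m:02d}-{d:02d}"


def same_event_date(a, b):
    """True if two eD strings cover the same interval after expansion."""
    return expand_to_interval(a) == expand_to_interval(b)
```

With this, "1943-01-17" and "1943-01-17/1943-01-17" compare as equal, while "1999-07-20" and "1999-07-20/1999-07-21" do not.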

Formatting variations have greatly increased the number of eDs for the same unique vED, as in this example:

No. of records | verbatimEventDate | eventDate
9 | 20-July-1999 | 1999-07-20/1999-07-20
3 | 20-July-1999 | 1999-07-20/1999-07-21 #error?
2 | 20-July-1999 | 1999-07-20
2 | 20-July-1999 | 1999-06-20/1999-06-20 #apparent error
5 | 20-July-1999 | 1999-07-19/1999-07-21 #error?
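A tally like the one above can be sketched by grouping eD values under each vED. The rows below are copied from the table; the function name `ed_variants` is illustrative and not part of any existing tool.

```python
from collections import Counter, defaultdict

# (number of records, verbatimEventDate, eventDate), copied from the table above
rows = [
    (9, "20-July-1999", "1999-07-20/1999-07-20"),
    (3, "20-July-1999", "1999-07-20/1999-07-21"),
    (2, "20-July-1999", "1999-07-20"),
    (2, "20-July-1999", "1999-06-20/1999-06-20"),
    (5, "20-July-1999", "1999-07-19/1999-07-21"),
]


def ed_variants(rows):
    """Map each vED to a Counter of the eD values assigned to it."""
    variants = defaultdict(Counter)
    for n, ved, ed in rows:
        variants[ved][ed] += n
    return variants


variants = ed_variants(rows)
for ved, counts in variants.items():
    print(ved, len(counts), "distinct eDs across", sum(counts.values()), "records")
```

Any vED with more than one distinct eD is a candidate for closer inspection, although some of the variation will be formatting only.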

Two sorts of ambiguities affect vED-eD relationships. One is that both DMY and MDY constructions appear in vED. Where these look questionable I've included them in the disagreements list.
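The DMY/MDY ambiguity can be illustrated with a small check that tries both readings of an all-numeric date and keeps the ones that are real calendar dates. This is only a sketch: it assumes a four-digit year in last position, the function names are hypothetical, and the example strings in the note below are made up rather than taken from the dataset.

```python
import re
from datetime import date


def numeric_date_readings(text):
    """Return the valid DMY and MDY readings of an all-numeric date string."""
    m = re.fullmatch(r"\s*(\d{1,2})[-/.](\d{1,2})[-/.](\d{4})\s*", text)
    if m is None:
        return {}
    a, b, year = (int(g) for g in m.groups())
    readings = {}
    for label, (day, month) in (("DMY", (a, b)), ("MDY", (b, a))):
        try:
            readings[label] = date(year, month, day).isoformat()
        except ValueError:
            pass  # this reading is not a real calendar date
    return readings


def is_ambiguous(text):
    """True if the DMY and MDY readings are both valid and differ."""
    return len(set(numeric_date_readings(text).values())) > 1
```

For example, "03/04/1999" has two valid readings (1999-04-03 and 1999-03-04), whereas "20/07/1999" can only be read as DMY.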

The second ambiguity is that where only 2 digits are used to designate years in vED, the eD may have the wrong century. The eD compiler may have used formatting or information from other fields to decide on a century, so I've trusted the resulting eDs. To take just one example of many, from vED alone it's hard to understand why these two vEDs are assigned to different centuries:

id | vED | eD
6677199 | 1-VII-90 | 1890-07-01
7031601 | 1-VIII-90 | 1990-08-01

I'll look at possible "century errors" in a later issue here.
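To make the century ambiguity concrete, the sketch below lists every eD that a two-digit-year vED with a Roman-numeral month could stand for. The choice of candidate centuries (1800s and 1900s) is an assumption for illustration, as are the function and constant names; nothing here decides which century is correct.

```python
import re
from datetime import date

ROMAN_MONTHS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}


def century_candidates(ved, centuries=(1800, 1900)):
    """List ISO dates a 'D-ROMAN-YY' vED could represent, one per candidate century."""
    m = re.fullmatch(r"\s*(\d{1,2})-([IVXivx]+)-\s*(\d{2})\s*", ved)
    if m is None:
        return []
    day, yy = int(m.group(1)), int(m.group(3))
    month = ROMAN_MONTHS.get(m.group(2).upper())
    if month is None:
        return []
    candidates = []
    for century in centuries:
        try:
            candidates.append(date(century + yy, month, day).isoformat())
        except ValueError:
            pass  # not a real date in this century (e.g. 29 Feb in a non-leap year)
    return candidates


print(century_candidates("1-VII-90"))
print(century_candidates("1-VIII-90"))
```

Both eDs in the table above are among the candidates for their vED, so the vED alone cannot settle the question.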

Finally, these 3 records have the invalid verbatimEventDate "31-ii-2021" (February has no 31st day):

id | verbatimEventDate | recordedBy | verbatimLatitude | verbatimLongitude
6664453 | 31-ii-2021 | JLR MAR | 42.442 | -88.23489
6664454 | 31-ii-2021 | JLR MAR | 42.442 | -88.23489
6664469 | 31-ii-2021 | JLR MAR | 42.44200 | -88.23489

but they appear to be part of a series, and their lat/lons match those of the record for 31 March 2021:

...
6664550 | 30-III- 2021 | JLR MAR | 41.30246 | -89.03896
6664379 | 31-iii-2021 | JLR, MAR | 42.44200N | -88.23489W
6664445 | 1-IV-2021 | JLR MAR | 41.33954 | -89.04424
...
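Impossible dates of this kind can be caught by attempting to build a real calendar date from the vED. The sketch below assumes the day-Roman month-year pattern seen in this series, with optional stray spaces; the function and constant names are illustrative. It returns `None` both for strings that do not match the pattern and for dates that do not exist.

```python
import re
from datetime import date

ROMAN_MONTHS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}


def parse_roman_date(ved):
    """Parse 'D-roman-YYYY' into a date; return None if unparseable or impossible."""
    m = re.fullmatch(
        r"\s*(\d{1,2})\s*[-.]\s*([ivx]+)\s*[-.]\s*(\d{4})\s*", ved, re.IGNORECASE
    )
    if m is None:
        return None
    month = ROMAN_MONTHS.get(m.group(2).upper())
    if month is None:
        return None
    try:
        return date(int(m.group(3)), month, int(m.group(1)))
    except ValueError:
        return None  # e.g. 31 February


for ved in ("31-ii-2021", "30-III- 2021", "31-iii-2021", "1-IV-2021"):
    print(ved, "->", parse_roman_date(ved))
```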

valid-vED-no-eD.txt

vED-eD-disagreements-by-record.txt

vED-eD-disagreements-by-type.txt
