Description
Some apparently valid verbatimEventDate (vED) entries do not have a corresponding eventDate (eD). These are listed with their ids in the attached valid-vED-no-eD.txt.
I also found 239 disagreements between vED and eD in 993 records where both fields have entries. These are listed by disagreement (sorted by vED, with number of records for each) in vED-eD-disagreements-by-type.txt, and by record in vED-eD-disagreements-by-record.txt (sorted by record id).
I've excluded from vED-eD disagreements all the various formatting variations and doubtful constructs, such as
- single years expanded as interval dates, as in "1953" > "1953-01-01/1953-12-31"
- single months expanded as interval dates, as in "1954-06" > "1954-06-01/1954-06-30", or as single dates, as in "1954-06" > "1954-06-01"
- single days expanded as pseudo-interval dates, as in "17 Jan 1943" > "1943-01-17/1943-01-17"
- seasonal dates, as in "Summer 1887" > "1887-06-01/1887-08-31"
- various malformed vED entries
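As a rough illustration of how such formatting variants could be screened out before counting disagreements, here is a minimal Python sketch. It assumes the vED is already in an ISO-like form (`YYYY`, `YYYY-MM` or `YYYY-MM-DD`); the function name `acceptable_event_dates` is hypothetical and this is not the script used to build the attached lists. Seasonal and free-text vEDs such as "Summer 1887" or "17 Jan 1943" are not handled here.

```python
import calendar
import re


def acceptable_event_dates(ved):
    """Return the eD strings treated as formatting variants of an ISO-like vED.

    Hypothetical helper: an empty set means the vED is not in a form this
    sketch understands, not that the vED is invalid.
    """
    ved = ved.strip()

    # Single year, e.g. "1953" > "1953-01-01/1953-12-31"
    if re.fullmatch(r"\d{4}", ved):
        return {ved, f"{ved}-01-01/{ved}-12-31"}

    # Single month, e.g. "1954-06" > "1954-06-01/1954-06-30" or "1954-06-01"
    m = re.fullmatch(r"(\d{4})-(\d{2})", ved)
    if m and 1 <= int(m.group(2)) <= 12:
        year, month = int(m.group(1)), int(m.group(2))
        last_day = calendar.monthrange(year, month)[1]
        return {ved, f"{ved}-01", f"{ved}-01/{ved}-{last_day:02d}"}

    # Single day, e.g. "1943-01-17" > "1943-01-17/1943-01-17"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", ved):
        return {ved, f"{ved}/{ved}"}

    return set()
```

An eD that falls outside the returned set would then be a candidate disagreement rather than a formatting variation.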
Formatting variations have greatly increased the number of eDs for the same unique vED, as in this example:
No. of records | verbatimEventDate | eventDate
9 | 20-July-1999 | 1999-07-20/1999-07-20
3 | 20-July-1999 | 1999-07-20/1999-07-21 #error?
2 | 20-July-1999 | 1999-07-20
2 | 20-July-1999 | 1999-06-20/1999-06-20 #apparent error
5 | 20-July-1999 | 1999-07-19/1999-07-21 #error?
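A tally like the one above can be produced by counting (vED, eD) pairs. The following is only a sketch in Python, using the five rows from the table as input; the function name `tally_event_dates` is hypothetical.

```python
from collections import Counter, defaultdict


def tally_event_dates(pairs):
    """Group (vED, eD) pairs by vED and count records for each distinct eD."""
    counts = Counter(pairs)
    by_ved = defaultdict(list)
    for (ved, ed), n in counts.items():
        by_ved[ved].append((n, ed))
    # Most frequent eD first for each vED
    return {ved: sorted(rows, reverse=True) for ved, rows in by_ved.items()}


# Rows copied from the table above
pairs = (
    [("20-July-1999", "1999-07-20/1999-07-20")] * 9
    + [("20-July-1999", "1999-07-20/1999-07-21")] * 3
    + [("20-July-1999", "1999-07-20")] * 2
    + [("20-July-1999", "1999-06-20/1999-06-20")] * 2
    + [("20-July-1999", "1999-07-19/1999-07-21")] * 5
)

for n, ed in tally_event_dates(pairs)["20-July-1999"]:
    print(n, "|", "20-July-1999", "|", ed)
```

Any vED with more than one distinct eD in the result is worth a second look.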
Two sorts of ambiguities affect vED-eD relationships. One is that both DMY and MDY constructions appear in vED. Where these look questionable I've included them in the disagreements list.
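One possible way to flag DMY/MDY ambiguity in all-numeric vEDs is to try both readings and see whether more than one gives a real calendar date. This is a minimal Python sketch, not the method used for the attached lists; the function name `dmy_mdy_readings`, the accepted separators and the sample strings in the comments are assumptions for illustration.

```python
import re
from datetime import date


def dmy_mdy_readings(ved):
    """Return the ISO dates a numeric vED could mean under DMY and MDY.

    Two results mean the vED is ambiguous; one means only one reading is a
    valid date; none means the pattern did not match or neither reading is valid.
    """
    m = re.fullmatch(r"(\d{1,2})[-/.](\d{1,2})[-/.](\d{4})", ved.strip())
    if not m:
        return []
    a, b, year = (int(g) for g in m.groups())
    readings = set()
    for day, month in ((a, b), (b, a)):  # DMY reading, then MDY reading
        try:
            readings.add(date(year, month, day).isoformat())
        except ValueError:
            pass  # not a real calendar date under this reading
    return sorted(readings)


# Hypothetical inputs, not taken from the dataset:
# dmy_mdy_readings("03/04/1999") gives two readings (ambiguous)
# dmy_mdy_readings("20/07/1999") gives one reading (unambiguous)
```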
The second ambiguity is that where only 2 digits are used to designate years in vED, the eD may have the wrong century. The eD compiler may have used formatting or information from other fields to decide on a century, so I've trusted the resulting eDs. To take just one example of many, from vED alone it's hard to understand why these two vEDs are assigned to different centuries:
id | vED | eD
6677199 | 1-VII-90 | 1890-07-01
7031601 | 1-VIII-90 | 1990-08-01
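To show why the vED alone cannot settle the century, here is a small Python sketch that lists every reading of a `day-ROMANMONTH-yy` vED. The function name `century_candidates` and the default choice of the 1800s and 1900s as candidate centuries are assumptions for illustration only.

```python
import re
from datetime import date

ROMAN_MONTHS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}


def century_candidates(ved, centuries=(1800, 1900)):
    """Return ISO dates for each candidate century of a two-digit-year vED."""
    m = re.fullmatch(r"(\d{1,2})-([IVXivx]+)-(\d{2})", ved.strip())
    if not m:
        return []
    day = int(m.group(1))
    month = ROMAN_MONTHS.get(m.group(2).upper())
    yy = int(m.group(3))
    if month is None:
        return []
    candidates = []
    for century in centuries:
        try:
            candidates.append(date(century + yy, month, day).isoformat())
        except ValueError:
            pass  # e.g. 29 February in a non-leap year
    return candidates


print(century_candidates("1-VII-90"))
print(century_candidates("1-VIII-90"))
```

Both vEDs in the table yield an 1890 and a 1990 reading, so the choice between them has to come from information outside vED.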
I'll look at possible "century errors" in a later issue here.
Finally, these 3 records have the invalid verbatimEventDate "31-ii-2021":
id | verbatimEventDate | recordedBy | verbatimLatitude | verbatimLongitude
6664453 | 31-ii-2021 | JLR MAR | 42.442 | -88.23489
6664454 | 31-ii-2021 | JLR MAR | 42.442 | -88.23489
6664469 | 31-ii-2021 | JLR MAR | 42.44200 | -88.23489
but they look to be part of a series, and their lat/lons match those of the record for 31 March 2021:
...
6664550 | 30-III- 2021 | JLR MAR | 41.30246 | -89.03896
6664379 | 31-iii-2021 | JLR, MAR | 42.44200N | -88.23489W
6664445 | 1-IV-2021 | JLR MAR | 41.33954 | -89.04424
...
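Impossible dates like this one can be caught by parsing the vED and letting the calendar reject it. A minimal Python sketch follows, assuming the `day-romanmonth-year` pattern seen in these records (with optional stray spaces, as in "30-III- 2021"); the function name `parse_roman_date` is hypothetical.

```python
import re
from datetime import date

ROMAN_MONTHS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}


def parse_roman_date(ved):
    """Return a date for a day-ROMANMONTH-year vED, or None if it is not valid."""
    m = re.fullmatch(
        r"\s*(\d{1,2})\s*[-.]\s*([IVXivx]+)\s*[-.]\s*(\d{4})\s*", ved
    )
    if not m:
        return None
    month = ROMAN_MONTHS.get(m.group(2).upper())
    if month is None:
        return None
    try:
        return date(int(m.group(3)), month, int(m.group(1)))
    except ValueError:
        return None  # e.g. 31 February


for ved in ("31-ii-2021", "30-III- 2021", "31-iii-2021", "1-IV-2021"):
    print(ved, "|", parse_roman_date(ved))
```

Here "31-ii-2021" comes back as `None`, while the neighbouring entries in the series parse to valid dates.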