Description
The dataset is clean UTF-8 but contains several unwanted control characters, as well as the unnecessary formatting character "non-breaking space" (NBSP). The affected records are listed in the attached text file with their "id", field name and field entry, with each unwanted character replaced by "{HERE}". The DEL characters are particularly worrying to see.
Output from "gremlins" (https://www.datafix.com.au/cookbook/characters3.html#1):
carriage return (CR, u000d, 0d): none
non-breaking space (NBSP, u00a0, c2 a0): 116 in 19 records
soft hyphen (SHY, u00ad, c2 ad): none
zero-width space (ZWSP, u200b, e2 80 8b): none
Checking now for gremlin control characters, please wait...
data link escape (DLE, u0010, 10): 1 in 1 records
delete (DEL, u007f, 7f): 241 in 241 records
single character introducer (SCI, u009a, c2 9a): 1 in 1 records
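
For anyone wanting to reproduce these counts without the "gremlins" script, here is a minimal Python sketch. It is not the script itself, only an approximation that checks the seven code points named in the output above. The function names `count_gremlins` and `mark_gremlins` are made up for illustration, and it assumes each record is already available as a decoded string.

```python
from collections import Counter

# Code points taken from the gremlins output above; the short names are labels only.
GREMLINS = {
    "\u000d": "CR",
    "\u00a0": "NBSP",
    "\u00ad": "SHY",
    "\u200b": "ZWSP",
    "\u0010": "DLE",
    "\u007f": "DEL",
    "\u009a": "SCI",
}


def count_gremlins(records):
    """Return {name: (occurrences, records_affected)} for an iterable of strings."""
    occurrences = Counter()
    affected = Counter()
    for record in records:
        seen = set()
        for ch in record:
            name = GREMLINS.get(ch)
            if name is not None:
                occurrences[name] += 1
                seen.add(name)
        # Each record counts at most once per character type.
        affected.update(seen)
    return {name: (occurrences[name], affected[name]) for name in GREMLINS.values()}


def mark_gremlins(text, marker="{HERE}"):
    """Replace every listed character with a visible marker, as in the attached file."""
    return text.translate({ord(ch): marker for ch in GREMLINS})
```

Calling `count_gremlins` on the list of field entries gives an (occurrences, records) pair per character, and `mark_gremlins` makes the otherwise invisible characters easy to spot when reviewing a single entry.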