Make an improvement in accuracy of validated rows #4

@brycemcd

NOTE: A very brief analysis (TODO: put link to analysis here) shows that the algorithm in v0.0.1 makes many mistakes when classifying taxi trips as valid or invalid.

A few tweaks to the auditing workflow should improve accuracy substantially: the workflow needs to account for the distribution of continuous values conditioned on ratecode_id.

The analysis indicates that fitting a linear regression conditioned on ratecode_id exposes structure in the data, which should help reduce the false positive and false negative rates during validation.

  • From the analysis, extract the linear regression coefficients and apply them to the extreme-numeric? function
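
To make the proposal concrete, here is a minimal sketch of the idea in Python. It is an illustration, not the project's implementation. The column names trip_distance and fare_amount, the 3-standard-deviation threshold, and the function names fit_by_ratecode and is_extreme_numeric (a hypothetical stand-in for extreme-numeric?) are all assumptions. The sketch fits one least-squares line per ratecode_id, then flags a row as extreme when its residual is large relative to the residual spread of its own group.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_by_ratecode(rows, x_key="trip_distance", y_key="fare_amount"):
    """Fit y = slope * x + intercept separately for each ratecode_id.

    Column names are assumptions made for this sketch.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["ratecode_id"]].append((row[x_key], row[y_key]))

    models = {}
    for ratecode, points in groups.items():
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        if len(points) < 2 or sxx == 0:
            continue  # not enough variation in x to fit a line
        # Ordinary least squares for a single predictor.
        slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
        intercept = y_bar - slope * x_bar
        residuals = [y - (slope * x + intercept) for x, y in points]
        models[ratecode] = {
            "slope": slope,
            "intercept": intercept,
            "resid_sd": pstdev(residuals),
        }
    return models


def is_extreme_numeric(row, models, k=3.0,
                       x_key="trip_distance", y_key="fare_amount"):
    """Return True when the row's residual exceeds k residual std devs.

    Hypothetical stand-in for extreme-numeric?; k=3.0 is an assumed threshold.
    """
    model = models.get(row["ratecode_id"])
    if model is None:
        return False  # no fitted model for this ratecode; caller decides
    predicted = model["slope"] * row[x_key] + model["intercept"]
    return abs(row[y_key] - predicted) > k * model["resid_sd"]
```

With this structure, the same fare and distance can be ordinary under one ratecode_id and extreme under another, which is the conditioning the analysis points to. In practice the coefficients would come from the analysis rather than being refit at validation time.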
