[Discussion] Performance in search dates #853

@surkova

I'm working on a Flask app that does some markup parsing. Among other things, it parses strings like "Arriving tomorrow by 9pm" or "Delivered on Friday". All of the strings are short and in English. Today I bumped dateparser from 0.7.6 to 1.0.0, and this is what I saw in the latency distribution metrics (p50, p95, p99) of the function that calls search_dates (function abridged):

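import re

from dateparser.search import search_dates

STATUS_TEXT_DELIVERED = re.compile(r"delivered", re.IGNORECASE)

# `text` is the short English status string passed to the (abridged) function.
# Prefer past dates for "delivered" statuses and future dates otherwise.
settings = {
    "PREFER_DATES_FROM": "past" if STATUS_TEXT_DELIVERED.search(text) else "future",
}
search_results = search_dates(text, languages=["en"], settings=settings)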

[Screenshot, 2020-12-03: p50, p95, and p99 latency of the function across three deploys]
What strikes me most is the huge latency spikes when the app restarts on deploy, and how long it takes for latency to settle afterwards. This function is currently called around 20 times per minute, but we expect that to grow to at least 400 rpm. In the screenshot you can see three deploys (red stripes).

I have very limited insight into what performance instrumentation you use, so what would be the easiest way to pinpoint what happens in search_dates right after a cold start? And why does it take so long to reach a steady state?
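
For reference, here is a minimal sketch of how the cold-start cost could be isolated outside the Flask app, using only the standard library's `time.perf_counter` and `cProfile`. The helper names `time_calls`, `profile_first_call`, and `run_search` are hypothetical and not part of dateparser. The sketch also assumes the script runs in a fresh interpreter, so that the first call really is cold.

```python
import cProfile
import pstats
import time
from typing import Any, Callable, List


def time_calls(func: Callable[[], Any], repeats: int = 5) -> List[float]:
    """Call func `repeats` times and return each call's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def profile_first_call(func: Callable[[], Any], top: int = 15) -> pstats.Stats:
    """Profile one call and print the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    stats = pstats.Stats(profiler)
    stats.sort_stats("cumulative").print_stats(top)
    return stats


def run_search() -> None:
    """Hypothetical wrapper around the call from the snippet above."""
    # Imported lazily so that import cost is included in the first measured call.
    from dateparser.search import search_dates

    search_dates(
        "Arriving tomorrow by 9pm",
        languages=["en"],
        settings={"PREFER_DATES_FROM": "future"},
    )
```

Calling `profile_first_call(run_search)` in a fresh process and then `time_calls(run_search)` in the same process would show which functions dominate the first call and whether later calls are consistently faster. It would not by itself explain spikes that only appear under production traffic, for example when several worker processes each have to warm up separately.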
