E-B3rry commented Jan 11, 2024

Related issue: #18

The site_restricted_api will soon be shut down and can currently be used only by existing customers. The fix is simple: switch to the non-restricted API by editing the endpoint URL.
I tested a typical search engine on my end, restricted to all the websites the scrapers target, and all three tests passed.
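
For readers following along, the endpoint swap described above can be sketched roughly as follows. This assumes the PR refers to Google's Custom Search JSON API and its site-restricted variant; the two URLs and the helper name build_search_url are assumptions for illustration, not taken from this repository.

```python
from urllib.parse import urlencode

# Assumed endpoints of Google's Custom Search JSON API; the PR text does not spell them out.
SITE_RESTRICTED_ENDPOINT = "https://www.googleapis.com/customsearch/v1/siterestrict"
STANDARD_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, query, endpoint=STANDARD_ENDPOINT):
    """Build a search URL (hypothetical helper); only the endpoint differs between the two APIs."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder credentials; no request is sent.
    print(build_search_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "shape of you lyrics"))
```

Since the query parameters stay the same in this sketch, a change of this kind can be limited to the endpoint string.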


i-vis commented Dec 19, 2024

Thank you for this.
It also seems that the Genius scraper no longer works well. I fixed it by replacing line 35 in lyrics.py with all_extracts = self.source_code.select('div[class*="Lyrics-sc-"]'). With the changes in this PR plus that change, it works for me at the moment.
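
The selector in the comment above, div[class*="Lyrics-sc-"], matches any div whose class attribute contains the substring Lyrics-sc-, which tolerates generated class-name suffixes. The sketch below imitates that substring match with Python's standard html.parser so it runs without BeautifulSoup; it is a stand-in for illustration, not the project's code. The class ClassSubstringExtractor, the helper extract, and the sample class name are hypothetical.

```python
from html.parser import HTMLParser


class ClassSubstringExtractor(HTMLParser):
    """Collect the text of <div> elements whose class attribute contains a substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.depth = 0    # div nesting depth inside the current matching div
        self.chunks = []  # one list of text pieces per matching top-level div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # Nested div inside a match: keep counting so the right </div> closes it.
            self.depth += 1
        elif self.needle in (dict(attrs).get("class") or ""):
            self.depth = 1
            self.chunks.append([])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks[-1].append(data)


def extract(html, needle="Lyrics-sc-"):
    """Return the text of each matching div (hypothetical helper)."""
    parser = ClassSubstringExtractor(needle)
    parser.feed(html)
    parser.close()
    return ["".join(pieces).strip() for pieces in parser.chunks]


if __name__ == "__main__":
    # The class name below is made up for the example.
    sample = '<div class="Lyrics-sc-abc123 x">Hello <b>world</b></div><div class="other">skip</div>'
    print(extract(sample))
```

Unlike a CSS selector, this sketch folds a matching div nested inside another match into its parent instead of returning it separately.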

'lyricsmint': scraper_factory.lyricsmint_scraper,
}

headers = {

Why do you need to pass the User-Agent header? Is it only to make the websites treat the request as coming from a legitimate user, so that it isn't blocked by their firewalls?

Does it fail without this header?
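
As background to the question above, here is a minimal sketch of how a browser-like User-Agent is attached to a request, using the standard library's urllib rather than whatever HTTP client the project uses. The helper build_request and the BROWSER_UA string are assumptions for illustration. When no such header is set, urllib identifies itself as Python-urllib/<version>; whether the scraped sites actually reject that is not established in this thread.

```python
from urllib.request import Request

# Example browser-like value; the exact string used in the PR is not shown in this thread.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=None):
    """Build a Request (hypothetical helper), optionally overriding the User-Agent."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return Request(url, headers=headers)


if __name__ == "__main__":
    # No network call is made here; the request object is only constructed.
    req = build_request("https://example.com/", BROWSER_UA)
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

Answering "does it fail without this header?" would mean sending the same request once with and once without the override and comparing the responses.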

@RishabhAcodes

@i-vis Can you please raise a PR for this? I'll merge it from my other account.
