Django Techcrunch Scrapper
DjangoTechcrunchScrapper is a Django app that scrapes items from the Techcrunch.com website. The scraped data are authors, categories, and articles. The application was developed and tested with Django v4.2.
- Install all the packages and requirements with:

      pip install -r requirements.txt
- Install a message broker such as rabbitmq or redis.
- Set the specific and custom settings for your project in settings.py.
- Set the specific and custom settings for Celery under the CELERY namespace in settings.py:

      # CELERY-SECTION
      CELERY_BROKER_URL = 'amqp://localhost:your port'  # for rabbitmq
      CELERY_TIMEZONE = 'Your timezone'
      CELERY_TASK_TIME_LIMIT = 60 * 60
      CELERY_RESULT_BACKEND = 'django-db'
      CELERY_TASK_SERIALIZER = 'json'
      CELERY_RESULT_SERIALIZER = 'json'
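The `your port` placeholder in CELERY_BROKER_URL depends on the broker you installed. As a minimal sketch, assuming the broker runs locally on its default port (5672 for RabbitMQ, 6379 for Redis), the URL could be built by a small helper. `broker_url` is a hypothetical name used for illustration and is not part of this project:

```python
# Default ports of each broker; adjust if your installation differs.
DEFAULT_PORTS = {"rabbitmq": 5672, "redis": 6379}
# URL schemes Celery uses for each broker transport.
SCHEMES = {"rabbitmq": "amqp", "redis": "redis"}


def broker_url(kind, host="localhost", port=None):
    """Return a Celery broker URL for 'rabbitmq' or 'redis' (hypothetical helper)."""
    if kind not in SCHEMES:
        raise ValueError(f"unsupported broker: {kind!r}")
    port = port or DEFAULT_PORTS[kind]
    url = f"{SCHEMES[kind]}://{host}:{port}"
    # Redis URLs conventionally end with a database number; 0 is the default.
    return url + "/0" if kind == "redis" else url


# In settings.py this would replace the hand-written URL.
CELERY_BROKER_URL = broker_url("rabbitmq")
print(CELERY_BROKER_URL)
```

Passing `"redis"` instead of `"rabbitmq"` switches the scheme and port together, which avoids mismatched values when changing brokers.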
- Open a terminal, then create and apply the migrations for the models:

      python manage.py makemigrations
      python manage.py migrate
- Set the Celery beat schedule first. Open celery.py, find the schedule entry, and change its value (in seconds) to change how often the task runs:

      app.conf.beat_schedule = {
          'every-day-start-daily-scrape': {
              'task': 'techcrunch.tasks.daily_scrape_task',
              'schedule': 86400,  # One day
          },
      }
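Because the schedule value is a plain number of seconds, other intervals are easy to derive. The following is a minimal sketch using only the standard library. `seconds` is a hypothetical helper, and the dictionary is kept in a standalone variable rather than assigned to `app.conf.beat_schedule` so that the snippet runs without Celery installed:

```python
from datetime import timedelta


def seconds(**interval):
    """Convert a timedelta-style interval (days=..., hours=...) to whole seconds."""
    return int(timedelta(**interval).total_seconds())


# Same entry as in celery.py, with the interval spelled out instead of hard-coded.
beat_schedule = {
    'every-day-start-daily-scrape': {
        'task': 'techcrunch.tasks.daily_scrape_task',
        'schedule': seconds(days=1),  # 86400 seconds, i.e. one day
    },
}

print(beat_schedule['every-day-start-daily-scrape']['schedule'])  # 86400
```

For example, `seconds(hours=12)` would run the scrape twice a day.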
- You must be logged in to use some of the services, so create a superuser first:

      python manage.py createsuperuser
- Then log in at host:port/admin.
- After configuring Celery, start celery-beat and celery-worker side by side in two separate cmd terminals:

      celery -A techcrunch_scrapper_with_django worker -l INFO -P eventlet
      celery -A techcrunch_scrapper_with_django beat --loglevel=INFO
- Finally, start the Django server to run the app:

      python manage.py runserver
- Link descriptions:

      admin/ => admin panel
      manual_daily_search [name='manual_daily_search'] => manual daily scraping without celery beat
      search_keyword [name='search_keyword'] => search-by-keyword page
      diagrams/<slug:model_name> [name='diagrams'] => draw diagrams:
          diagrams/author => number of articles of each author
          diagrams/category => number of articles of each category
          diagrams/article => number of articles found by keyword search
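As an illustration of how the diagram route expands, the sketch below builds the three diagram URLs. It assumes the server runs at http://127.0.0.1:8000/ (the default address of `python manage.py runserver`) and that the routes are mounted at the site root. Both are assumptions, and `diagram_url` is a hypothetical helper:

```python
from urllib.parse import urljoin

BASE_URL = "http://127.0.0.1:8000/"  # assumed: default runserver address
MODEL_NAMES = ("author", "category", "article")  # slugs listed for diagrams/


def diagram_url(model_name, base_url=BASE_URL):
    """Return the URL of the diagram page for one of the listed model names."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model name: {model_name!r}")
    return urljoin(base_url, f"diagrams/{model_name}")


for name in MODEL_NAMES:
    print(diagram_url(name))
```

Remember that you need to be logged in before opening these pages in a browser.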
- The generated diagrams are saved in base directory / exports ...
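To check what has been generated, the exports folder can be listed from Python. The following is a minimal sketch, assuming the exports folder sits directly under the project's base directory and that the script is run from that directory. `list_exports` is a hypothetical helper and makes no assumption about file names or formats:

```python
from pathlib import Path


def list_exports(base_dir="."):
    """Return the files in <base_dir>/exports, newest first.

    Returns an empty list if the exports folder does not exist yet.
    """
    exports = Path(base_dir) / "exports"
    if not exports.is_dir():
        return []
    files = [p for p in exports.iterdir() if p.is_file()]
    # Sort by modification time so the most recent diagram comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


for path in list_exports():
    print(path.name)
```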