
Comprehensive migration plan covering:
- 11 simple mapper jobs → Cloud Run Jobs
- 4 MapReduce pipeline jobs → Cloud Run Jobs
- 2 custom Pipeline orchestrations → Cloud Workflows
- Utility module ports (fb_mapreduce, mr.py)
- Infrastructure setup (base image, job framework)
- Configuration updates (queue.yaml, Cloud Scheduler)

Organized by phase with complexity ratings and FB API requirements.
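
One of the plan items above is the Cloud Scheduler configuration. A minimal sketch of a
scheduler trigger for one migrated job, assuming the google-cloud-scheduler client;
project, region, schedule, and service-account values are placeholders, not values from
this repo:

```python
from google.cloud import scheduler_v1

PROJECT = "my-project"     # placeholder, not the real project ID
REGION = "us-central1"     # placeholder
JOB_NAME = "notify-users"  # one of the migrated Cloud Run Jobs

client = scheduler_v1.CloudSchedulerClient()
parent = client.common_location_path(PROJECT, REGION)

# HTTP endpoint Cloud Scheduler can call to start a Cloud Run Job execution.
run_uri = (
    f"https://{REGION}-run.googleapis.com/apis/run.googleapis.com"
    f"/v1/namespaces/{PROJECT}/jobs/{JOB_NAME}:run"
)

client.create_job(
    parent=parent,
    job=scheduler_v1.Job(
        name=f"{parent}/jobs/trigger-{JOB_NAME}",
        schedule="0 * * * *",  # hourly; the real cadence comes from the old cron config
        time_zone="America/Los_Angeles",
        http_target=scheduler_v1.HttpTarget(
            uri=run_uri,
            http_method=scheduler_v1.HttpMethod.POST,
            oauth_token=scheduler_v1.OAuthToken(
                service_account_email=f"scheduler@{PROJECT}.iam.gserviceaccount.com",
            ),
        ),
    ),
)
```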

This commit adds a complete Cloud Run Jobs framework that replaces the
legacy App Engine MapReduce and Pipeline libraries.

Infrastructure (Phase 1):
- dancedeets/jobs/base.py: Job, BatchJob, JobRunner base classes (sketched below)
- dancedeets/jobs/fb_utils.py: Facebook API token handling
- dancedeets/jobs/metrics.py: Counter/metrics tracking
- dancedeets/jobs/gcs_output.py: GCS output writer
- dancedeets/jobs/runner.py: CLI entry point
- Dockerfile.jobs: Container for Cloud Run Jobs
- requirements-jobs.txt: Job dependencies
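
A minimal sketch of the job framework named above (dancedeets/jobs/base.py and
runner.py). Only the Job, BatchJob, and JobRunner class names come from this commit;
every method name and signature below is an assumption:

```python
import argparse
from typing import Any, Dict, Iterable, Type


class Job:
    """One Cloud Run Jobs invocation: set up, process, report counters."""

    name = "base-job"

    def run(self) -> None:
        raise NotImplementedError


class BatchJob(Job):
    """Replaces a MapReduce mapper: iterate entities in batches, map each one."""

    batch_size = 100

    def get_entities(self) -> Iterable[Any]:
        raise NotImplementedError

    def map_entity(self, entity: Any) -> None:
        raise NotImplementedError

    def finish(self) -> None:
        """Hook for post-map aggregation (the old reduce step)."""

    def run(self) -> None:
        batch: list = []
        for entity in self.get_entities():
            batch.append(entity)
            if len(batch) >= self.batch_size:
                self._flush(batch)
                batch = []
        self._flush(batch)
        self.finish()

    def _flush(self, batch: list) -> None:
        for entity in batch:
            self.map_entity(entity)


class JobRunner:
    """CLI entry point (runner.py): `python -m dancedeets.jobs.runner <job-name>`."""

    def __init__(self, registry: Dict[str, Type[Job]]):
        self.registry = registry

    def main(self) -> None:
        parser = argparse.ArgumentParser(description="Run a Cloud Run batch job")
        parser.add_argument("job_name", choices=sorted(self.registry))
        args = parser.parse_args()
        self.registry[args.job_name]().run()
```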

Simple Mapper Jobs (Phase 2):
- notify_users.py: Push notifications by timezone
- post_japan_events.py: Post Japan events to social
- compute_rankings.py: City/country rankings
- compute_user_stats.py: User event statistics
- refresh_users.py: Refresh Facebook profiles
- send_weekly_emails.py: Weekly digest emails
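
A sketch of what one of these simple mappers might look like on top of the BatchJob
base sketched above. fetch_and_save_fb_user is retained per the cleanup notes below;
its signature, the User model import, and the fb_utils helper name are assumptions:

```python
from dancedeets.jobs.base import BatchJob
from dancedeets.jobs.fb_utils import get_fblookup           # assumed helper name
from dancedeets.users.user_tasks import fetch_and_save_fb_user
from dancedeets.users.users import User                     # assumed model path


class RefreshUsersJob(BatchJob):
    """jobs/refresh_users.py: re-fetch each user's Facebook profile."""

    name = "refresh-users"

    def get_entities(self):
        return User.query().iter()

    def map_entity(self, user):
        fbl = get_fblookup(user)            # per-user FB API token handling
        fetch_and_save_fb_user(fbl, user)   # assumed signature
```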

GCS Output Jobs (Phase 3):
- generate_sitemaps.py: XML sitemap generation
- dump_potential_events.py: Export potential events
- generate_training_data.py: ML training data
- classify_events_ml.py: ML event classification
- auto_add_events.py: Auto-add dance events
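
These jobs share the GCS output writer from Phase 1. A minimal sketch of that writer,
assuming the class name and bucket/object names (google-cloud-storage is the real
client library):

```python
from google.cloud import storage


class GcsOutputWriter:
    """dancedeets/jobs/gcs_output.py: buffer records, upload one GCS object."""

    def __init__(self, bucket_name: str, object_name: str):
        self.bucket_name = bucket_name
        self.object_name = object_name
        self.lines = []

    def write(self, line: str) -> None:
        self.lines.append(line)

    def close(self) -> None:
        blob = storage.Client().bucket(self.bucket_name).blob(self.object_name)
        blob.upload_from_string("\n".join(self.lines), content_type="text/plain")


# Example from a job like generate_sitemaps.py (names are illustrative):
# writer = GcsOutputWriter("dancedeets-sitemaps", "events/sitemap.xml")
# writer.write(sitemap_entry)
# writer.close()
```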

MapReduce Pipeline Replacements (Phase 4):
- count_unique_attendees.py: Unique RSVPs by city
- update_source_stats.py: Source quality metrics
- scrape_and_classify.py: Scrape and classify events
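
Where the old code ran a mapper stage plus a shuffle/reduce stage, a single Cloud Run
Job can do the aggregation in process. A sketch of that pattern for
count_unique_attendees.py, with the event model and field names as assumptions:

```python
from collections import defaultdict

from dancedeets.jobs.base import BatchJob


class CountUniqueAttendeesJob(BatchJob):
    """Count distinct RSVP'd attendees per city (old MapReduce pipeline)."""

    name = "count-unique-attendees"

    def __init__(self):
        super().__init__()
        # city -> set of attendee ids; stands in for the shuffle/reduce stage
        self.attendees_by_city = defaultdict(set)

    def get_entities(self):
        from dancedeets.events.eventdata import DBEvent   # assumed model path
        return DBEvent.query().iter()

    def map_entity(self, event):
        for attendee_id in getattr(event, "attendee_ids", None) or []:
            self.attendees_by_city[event.city_name].add(attendee_id)

    def finish(self):
        counts = {city: len(ids) for city, ids in self.attendees_by_city.items()}
        # Persist wherever rankings/stats expect these numbers.
        return counts
```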

Cloud Workflows (Phase 5):
- workflows/crawl_and_index_classes.yaml: Orchestration
- start_spiders.py: Start ScrapingHub spiders
- reindex_classes.py: Rebuild class index
- email_crawl_errors.py: Send error reports
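
The YAML workflow sequences these three jobs. For orientation, the same sequence
expressed with the google-cloud-run client (the real orchestration is the workflow;
project and region are placeholders):

```python
from google.cloud import run_v2

PROJECT = "my-project"   # placeholder
REGION = "us-central1"   # placeholder

client = run_v2.JobsClient()

for job_id in ("start-spiders", "reindex-classes", "email-crawl-errors"):
    name = client.job_path(PROJECT, REGION, job_id)
    # run_job returns a long-running operation; result() blocks until the
    # execution finishes, mirroring the sequential workflow steps.
    client.run_job(name=name).result()
```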

This cleanup removes the old mapreduce and pipeline code from the original
modules now that all batch processing has been migrated to Cloud Run Jobs.

Files cleaned up (mapreduce code removed, core functions retained):
- notifications/added_events.py - kept promote_events_to_user()
- sitemaps/events.py - kept generate_sitemap_entry()
- ml/gprediction.py - kept predict(), get_predict_service()
- users/user_event_tasks.py - kept update_user_qualities()
- users/user_tasks.py - kept fetch_and_save_fb_user()
- search/email_events.py - kept email_for_user()
- pubsub/pubsub_tasks.py - kept social handlers
- rankings/rankings.py - kept utility functions
- event_scraper/auto_add.py - kept classification logic
- event_scraper/thing_db.py - kept Source model
- event_scraper/thing_scraper2.py - deprecation stub
- classes/class_pipeline.py - deprecation stub
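
For the deprecation stubs, something along these lines, with the message and retained
symbol names as assumptions:

```python
"""classes/class_pipeline.py: deprecated, kept only so stale imports fail loudly."""


def _deprecated(*_args, **_kwargs):
    raise RuntimeError(
        "class_pipeline has been replaced by the reindex-classes Cloud Run Job "
        "(dancedeets/jobs/reindex_classes.py)."
    )


# Assumed legacy entry point; any remaining caller gets a clear error.
start_pipeline = _deprecated
```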

Files deleted (fully migrated):
- logic/mr_dump.py -> jobs/dump_potential_events.py
- logic/unique_attendees.py -> jobs/count_unique_attendees.py
- ml/mr_prediction.py -> jobs/classify_events_ml.py

The compat/ layer is retained with LEGACY_APIS_ENABLED=False for remaining
imports that use json_util.JsonProperty and other utilities.
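
A minimal sketch of how that flag might gate the compat shims; only
LEGACY_APIS_ENABLED and json_util come from this commit, the guard itself is an
assumption:

```python
# compat/__init__.py (hypothetical layout)
LEGACY_APIS_ENABLED = False


def require_legacy_apis(api_name):
    """Called by compat shims that still wrap App Engine-era APIs."""
    if not LEGACY_APIS_ENABLED:
        raise RuntimeError(
            "%s depends on the disabled legacy App Engine stack; "
            "use the Cloud Run Jobs framework instead." % api_name
        )
```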