@devin-ai-integration

Update extract_ecs.py to coexist with realtime note extraction (Issue #177)

Summary

Modified extract_ecs.py to work alongside the realtime note extraction lambda. Previously, the TSV extraction would only add new notes and skip existing ones. Now it updates existing records with complete TSV data, enabling the realtime lambda to create partial records and the daily TSV job to fill in the missing fields.

Key changes:

  • Changed time window from "all data since last note" to "last 2 days only" to reduce unnecessary processing
  • Modified logic to UPDATE existing RowNoteRecord entries instead of skipping them
  • Removed unused imports (timezone, RowPostRecord)
  • SQS enqueueing still happens only for new notes (updated notes were already processed by the realtime lambda)
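
The update-instead-of-skip behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in extract_ecs.py: `NoteRecord` is a hypothetical stand-in for `RowNoteRecord` with only two data fields, a plain dict stands in for the database lookup, and `merge_tsv_rows` and `ids_to_enqueue` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NoteRecord:
    """Hypothetical stand-in for RowNoteRecord; the real model has more columns."""
    note_id: str
    summary: Optional[str] = None
    classification: Optional[str] = None


def merge_tsv_rows(existing, tsv_rows):
    """Update records that already exist and collect the ones that are new.

    existing: dict mapping note_id -> NoteRecord (stands in for a DB lookup)
    tsv_rows: list of dicts parsed from the TSV, keyed by field name
    Returns (new_records, updated_records).
    """
    new_records, updated_records = [], []
    for row in tsv_rows:
        record = existing.get(row["note_id"])
        if record is None:
            # Note not seen before: create it and remember it as new.
            record = NoteRecord(**row)
            existing[record.note_id] = record
            new_records.append(record)
        else:
            # Note already present (e.g. a partial record): overwrite its fields.
            for field, value in row.items():
                if field != "note_id":
                    setattr(record, field, value)
            updated_records.append(record)
    return new_records, updated_records


def ids_to_enqueue(new_records):
    """Only new notes go to SQS; updated notes are assumed to be already processed."""
    return [record.note_id for record in new_records]
```

In this sketch the TSV value always wins, so a field left as `None` by an earlier partial insert is overwritten unconditionally. Whether the real job should also overwrite non-empty values is a design decision the PR description does not spell out.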

Review & Testing Checklist for Human

⚠️ IMPORTANT: This PR could not be tested locally due to infrastructure dependencies (PostgreSQL, Twitter API, AWS services). Please test carefully before merging.

  • End-to-end test: Run realtime lambda to create partial notes, then run extract_ecs.py and verify that existing notes are updated with complete TSV data (all fields populated)
  • Verify 2-day window: Confirm that processing only the last 2 days is sufficient and correct (was previously using settings.COMMUNITY_NOTE_DAYS_AGO = 3 days)
  • Check SQLAlchemy update behavior: Verify that the setattr() update pattern correctly commits changes to existing records (the rows_to_update list is created but relies on SQLAlchemy session tracking)
  • Confirm SQS behavior: Verify that updated notes should NOT be enqueued to SQS (only new notes need language/topic detection)
  • Test with edge cases: What happens when realtime lambda creates notes with null/empty fields? Does the update properly overwrite them?
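
To make the 2-day window check concrete, here is a small sketch of how such a cutoff could be computed. It assumes note creation times are stored as epoch milliseconds, which the PR description does not state; `WINDOW_DAYS`, `window_start_millis`, and `in_window` are hypothetical names, not identifiers from extract_ecs.py.

```python
from datetime import datetime, timedelta, timezone

WINDOW_DAYS = 2  # window length taken from the PR description


def window_start_millis(now=None, days=WINDOW_DAYS):
    """Return the start of the processing window as epoch milliseconds."""
    if now is None:
        now = datetime.now(timezone.utc)
    return int((now - timedelta(days=days)).timestamp() * 1000)


def in_window(created_at_millis, now=None, days=WINDOW_DAYS):
    """True if a note created at created_at_millis falls inside the window."""
    return created_at_millis >= window_start_millis(now, days)
```

One thing a reviewer may want to consider: with a fixed window, a partial record older than the cutoff would never be revisited, for example if the daily job fails to run for more than two days. Whether that matters depends on how the job is scheduled and monitored.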

Test Plan

  1. Manually trigger realtime lambda to create some partial RowNoteRecords
  2. Run extract_ecs.py for those same notes' dates
  3. Query database to verify existing records were updated with complete TSV data
  4. Check SQS queues to ensure only new notes were enqueued
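
Step 3 could be partly automated with a helper like the one below. This is only a sketch: it assumes the queried records have already been loaded as plain dicts, and the field names in the usage example are placeholders rather than the actual `RowNoteRecord` columns.

```python
def find_incomplete(records, required_fields):
    """Return {note_id: [missing field names]} for records that are still partial.

    A field counts as missing when it is absent, None, or an empty string.
    """
    incomplete = {}
    for record in records:
        missing = [
            field
            for field in required_fields
            if record.get(field) is None or record.get(field) == ""
        ]
        if missing:
            incomplete[record["note_id"]] = missing
    return incomplete


# Usage with placeholder field names: an empty result means every record is complete.
rows = [
    {"note_id": "1", "summary": "text", "classification": "X"},
    {"note_id": "2", "summary": "", "classification": None},
]
report = find_incomplete(rows, ["summary", "classification"])
```

Running this before and after extract_ecs.py on the same set of note IDs would show which partial records the daily job actually filled in.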

Notes

- Change time window to process only last 2 days instead of all data since last note
- Update existing RowNoteRecord entries with TSV data instead of skipping them
- Keep row_note_status_history logic as daily update (unchanged)
- Remove unused imports (timezone, RowPostRecord)

Fixes #177

Co-Authored-By: [email protected] <[email protected]>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@yu23ki14 merged commit be4478b into main on Oct 31, 2025
9 checks passed

Development

Successfully merging this pull request may close these issues.

Update extract_ecs.py, which fetches the TSV data