predup youtube playlists and tabs, and more... #374
base: master
Pull Request Overview
This PR adds functionality for processing YouTube playlists and tabs.
- Introduces a new database query in brozzler/ydl.py that uses psycopg to look up previously captured video URLs.
- Updates model and schema files to propagate an account_id field.
- Adds a psycopg dependency in pyproject.toml to support the new database operations.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pyproject.toml | Adds psycopg dependency for database connectivity |
| brozzler/ydl.py | Adds get_video_captures function and updates YouTube URL handling |
| brozzler/model.py | Propagates account_id to Site objects in job creation |
| brozzler/job_schema.yaml | Updates schema to include an optional account_id field |
Comments suppressed due to low confidence (1)
brozzler/ydl.py:428
- Using string concatenation with '+' in the SQL LIKE clause is not standard for PostgreSQL. Consider using the concatenation operator '||' (e.g., "... containing_page_url like '%' || %s || '%'") or the CONCAT function.
```python
pg_query = ("SELECT containing_page_url from video where account_id = %s and seed = %s and containing_page_url like '%'+%s+'%'", (account_id, seed, source,))
```
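One caveat on the suggested fix: when psycopg is given query parameters, a literal `%` in the SQL text has to be written as `%%`, so the `||` version would need to read `'%%' || %s || '%%'`. An alternative that keeps literal wildcards out of the SQL entirely is to build the whole LIKE pattern in Python and bind it as a single parameter. The sketch below does that; `like_prefix_pattern`, `like_contains_pattern` and `build_capture_query` are hypothetical helpers, not part of brozzler, and the escaping assumes PostgreSQL's default LIKE escape character (backslash).

```python
def _escape_like(value: str) -> str:
    # Escape the backslash first, then the LIKE wildcards, so that user
    # input containing % or _ is matched literally.
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def like_prefix_pattern(prefix: str) -> str:
    """Pattern matching any string that starts with `prefix`."""
    return _escape_like(prefix) + "%"


def like_contains_pattern(fragment: str) -> str:
    """Pattern matching any string that contains `fragment`."""
    return "%" + _escape_like(fragment) + "%"


def build_capture_query(account_id, seed, fragment):
    """Return a (sql, params) pair; the only placeholders are %s."""
    sql = (
        "SELECT containing_page_url FROM video"
        " WHERE account_id = %s AND seed = %s"
        " AND containing_page_url LIKE %s"
    )
    return sql, (account_id, seed, like_contains_pattern(fragment))
```

With a psycopg cursor this would be used as `cur.execute(*build_capture_query(account_id, seed, "youtube.com/watch"))`; since the SQL text contains no literal `%`, no `%%` escaping is needed.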
Force-pushed from 96aeeb7 to b5258f4.
brozzler/ydl.py (outdated)
```python
def get_video_captures(site, source="youtube"):
    if not VIDEO_DATA_SOURCE:
        return None

    if VIDEO_DATA_SOURCE.startswith("postgresql"):
        import psycopg
        from psycopg.rows import scalar_row

        account_id = site.account_id if site.account_id else None
        seed = site.metadata.ait_seed_id if site.metadata.ait_seed_id else None
        if source == "youtube":
            # yes, video data canonicalization uses "http"; the trailing "%"
            # makes LIKE a prefix match instead of an exact-string match
            containing_page_url_pattern = "http://youtube.com/watch%"
        else:  # support other sources here
            containing_page_url_pattern = None
        if account_id and seed and containing_page_url_pattern:
            pg_query = (
                "SELECT distinct(containing_page_url) from video where account_id = %s and seed = %s and containing_page_url like %s",
                (account_id, seed, containing_page_url_pattern),
            )
        elif seed and containing_page_url_pattern:
            pg_query = (
                "SELECT containing_page_url from video where seed = %s and containing_page_url like %s",
                (seed, containing_page_url_pattern),
            )
        else:
            return None
        with psycopg.connect(VIDEO_DATA_SOURCE) as conn:
            with conn.cursor(row_factory=scalar_row) as cur:
                # pg_query is a (sql, params) pair; unpack it so the
                # parameters are bound instead of passing the tuple as SQL
                cur.execute(*pg_query)
                return cur.fetchall()
    return None
```
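Judging from the PR title, the captures returned here are presumably used to pre-deduplicate YouTube playlist and tab entries, skipping videos already in the `video` table. A minimal sketch of that filtering step, assuming stored URLs look like `http://youtube.com/watch?v=<id>` (per the canonicalization comment in the function) and comparing by video id so that `https`, `www.` and extra query parameters such as `list=` do not defeat the match. `youtube_video_id` and `filter_uncaptured` are hypothetical names, not brozzler or yt-dlp API.

```python
from urllib.parse import parse_qs, urlsplit


def youtube_video_id(url):
    """Return the video id from a youtube.com/watch or youtu.be URL, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host == "youtu.be":
        return parts.path.lstrip("/") or None
    if (host == "youtube.com" or host.endswith(".youtube.com")) and parts.path == "/watch":
        ids = parse_qs(parts.query).get("v")
        return ids[0] if ids else None
    return None


def filter_uncaptured(entry_urls, captured_urls):
    """Keep only entries whose video id has not already been captured."""
    captured_ids = {youtube_video_id(u) for u in captured_urls}
    captured_ids.discard(None)  # entries with no recognizable id are kept
    return [u for u in entry_urls if youtube_video_id(u) not in captured_ids]
```

Comparing ids rather than full URLs is a design choice: it tolerates the `http` canonicalization used in the video table without re-canonicalizing every playlist entry.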
We should consider wrapping this call in a client interface:
- The client could query PG directly or make an API call.
- A client could keep connections open instead of reconnecting to PG on every check.
- We need unit tests before we ship this feature, and a client might make unit testing easier.
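The reviewer's suggestion might look roughly like the following sketch. `VideoCapturesClient`, `PostgresVideoCapturesClient`, `InMemoryVideoCapturesClient` and `captured_youtube_urls` are hypothetical names; the Postgres implementation assumes the `video` table and columns used by the query in this PR and keeps one connection open across calls, while the in-memory fake lets unit tests run without a database.

```python
from typing import List, Optional, Protocol, Sequence, Tuple


class VideoCapturesClient(Protocol):
    """Anything that can list captured containing_page_urls for a seed."""

    def get_video_captures(
        self, seed: object, account_id: Optional[object], url_pattern: str
    ) -> List[str]: ...


class PostgresVideoCapturesClient:
    """Queries PostgreSQL directly, reusing one connection across calls."""

    def __init__(self, dsn: str) -> None:
        self._dsn = dsn
        self._conn = None

    def _connection(self):
        import psycopg  # lazy import: tests using the fake don't need psycopg

        if self._conn is None or self._conn.closed:
            self._conn = psycopg.connect(self._dsn, autocommit=True)
        return self._conn

    def get_video_captures(self, seed, account_id, url_pattern):
        sql = (
            "SELECT DISTINCT containing_page_url FROM video"
            " WHERE seed = %s AND containing_page_url LIKE %s"
        )
        params = [seed, url_pattern]
        if account_id is not None:
            sql += " AND account_id = %s"
            params.append(account_id)
        with self._connection().cursor() as cur:
            cur.execute(sql, params)
            return [row[0] for row in cur.fetchall()]


class InMemoryVideoCapturesClient:
    """Test double over (account_id, seed, url) rows; only 'prefix%' patterns."""

    def __init__(self, rows: Sequence[Tuple[object, object, str]]) -> None:
        self._rows = list(rows)

    def get_video_captures(self, seed, account_id, url_pattern):
        prefix = url_pattern[:-1] if url_pattern.endswith("%") else url_pattern
        return sorted(
            {
                url
                for row_account, row_seed, url in self._rows
                if row_seed == seed
                and (account_id is None or row_account == account_id)
                and url.startswith(prefix)
            }
        )


def captured_youtube_urls(client: VideoCapturesClient, seed, account_id=None) -> List[str]:
    """Hypothetical caller that depends only on the interface, not on psycopg."""
    return client.get_video_captures(seed, account_id, "http://youtube.com/watch%")
```

Because brozzler runs many worker threads, a `psycopg_pool.ConnectionPool` may be a better fit than a single shared connection; an HTTP-backed client could implement the same method to cover the API option.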
recent updates include most of this work
Force-pushed from 46a44cc to 46e17c1.
Force-pushed from fe07f55 to f22fa8f.
Force-pushed from f22fa8f to 6a8c62f.
Force-pushed from 79fd88d to 8abb9cd.
No description provided.