@galgeek
Contributor

@galgeek galgeek commented Jun 21, 2025

No description provided.

@galgeek galgeek requested a review from Copilot June 21, 2025 00:35
@galgeek galgeek marked this pull request as draft June 21, 2025 00:35

Copilot AI left a comment

Pull Request Overview

This PR adds functionality for processing YouTube playlists and tabs.

  • Introduces a new database query in brozzler/ydl.py to capture video URLs via psycopg.
  • Updates model and schema files to propagate an account_id field.
  • Adds a psycopg dependency in pyproject.toml to support the new database operations.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File                      Description
pyproject.toml            Adds psycopg dependency for database connectivity
brozzler/ydl.py           Adds get_video_captures function and updates YouTube URL handling
brozzler/model.py         Propagates account_id to Site objects in job creation
brozzler/job_schema.yaml  Updates schema to include an optional account_id field
Comments suppressed due to low confidence (1)

brozzler/ydl.py:428

  • Using string concatenation with '+' in the SQL LIKE clause is not standard for PostgreSQL. Consider using the concatenation operator '||' (e.g., "... containing_page_url like '%' || %s || '%'") or the CONCAT function.
        pg_query = ("SELECT containing_page_url from video where account_id = %s and seed = %s and containing_page_url like '%'+%s+'%'", (account_id, seed, source,))
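
A minimal sketch of two ways to write the substring match with psycopg-style `%s` placeholders, assuming the table and column names from the query above. The function names are hypothetical and not part of the PR. One detail worth noting: in psycopg, a literal `%` in a query that takes parameters has to be written as `%%`, so the `||` form needs doubled percent signs. Putting the wildcards into the bound value avoids the issue entirely.

```python
def like_query_bound_wildcards(account_id, seed, source):
    """Build (sql, params) with the LIKE wildcards inside the bound value."""
    sql = (
        "SELECT containing_page_url FROM video "
        "WHERE account_id = %s AND seed = %s "
        "AND containing_page_url LIKE %s"
    )
    # The SQL text holds only placeholders; the pattern is ordinary data.
    params = (account_id, seed, f"%{source}%")
    return sql, params


def like_query_sql_concat(account_id, seed, source):
    """Build (sql, params) using the PostgreSQL || concatenation operator."""
    sql = (
        "SELECT containing_page_url FROM video "
        "WHERE account_id = %s AND seed = %s "
        # '%%' is how a literal percent sign is escaped in a psycopg query
        # that also carries parameters.
        "AND containing_page_url LIKE '%%' || %s || '%%'"
    )
    params = (account_id, seed, source)
    return sql, params
```

Either pair would be passed as `cur.execute(sql, params)`. Neither variant escapes `%` or `_` characters that may occur inside `source`, which LIKE also treats as wildcards.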

@galgeek galgeek self-assigned this Jun 23, 2025
@galgeek galgeek force-pushed the predup_type_playlist branch from 96aeeb7 to b5258f4 Compare June 23, 2025 23:51
brozzler/ydl.py Outdated
Comment on lines 423 to 457
def get_video_captures(site, source="youtube"):
    if not VIDEO_DATA_SOURCE:
        return None

    if VIDEO_DATA_SOURCE and VIDEO_DATA_SOURCE.startswith("postgresql"):
        import psycopg

        account_id = site.account_id if site.account_id else None
        seed = site.metadata.ait_seed_id if site.metadata.ait_seed_id else None
        if source == "youtube":
            containing_page_url_pattern = "http://youtube.com/watch"  # yes, video data canonicalization uses "http"
        # support other sources here
        else:
            containing_page_url_pattern = None
        if account_id and seed and source:
            pg_query = (
                "SELECT distinct(containing_page_url) from video where account_id = %s and seed = %s and containing_page_url like %s",
                (
                    account_id,
                    seed,
                    containing_page_url_pattern,
                ),
            )
        elif seed and source:
            pg_query = (
                "SELECT containing_page_url from video where seed = %s and containing_page_url like %s",
                (seed, containing_page_url_pattern),
            )
        else:
            return None
        with psycopg.connect(VIDEO_DATA_SOURCE) as conn:
            with conn.cursor(row_factory=psycopg.rows.scalar_row) as cur:
                cur.execute(pg_query)
                return cur.fetchall()
    return None
Contributor

We should consider wrapping this call behind a client interface:

  • This would allow the client to either query PG directly or make an API call.
  • A client could maintain connections instead of reconnecting to PG on every check.
  • We need to add unit tests before we ship this feature; a client might make unit testing easier.
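
One possible shape for such a client, as a rough sketch only. All class and method names here are hypothetical and not taken from the PR; the table and column names are assumed from the query above. The Postgres-backed variant keeps one connection open across calls, and the in-memory variant exists so that unit tests need no database.

```python
from typing import Optional, Protocol

YOUTUBE_PREFIX = "http://youtube.com/watch"  # canonicalized form used in the PR


class VideoCaptureClient(Protocol):
    """Hypothetical interface: anything that can list captured video pages."""

    def get_video_captures(
        self, seed: int, account_id: Optional[int] = None,
        url_prefix: str = YOUTUBE_PREFIX,
    ) -> list[str]: ...


class InMemoryVideoCaptureClient:
    """Test double backed by (account_id, seed, containing_page_url) tuples."""

    def __init__(self, rows):
        self._rows = list(rows)

    def get_video_captures(self, seed, account_id=None, url_prefix=YOUTUBE_PREFIX):
        found = []
        for row_account, row_seed, url in self._rows:
            if row_seed != seed:
                continue
            if account_id is not None and row_account != account_id:
                continue
            # De-duplicate while keeping first-seen order.
            if url.startswith(url_prefix) and url not in found:
                found.append(url)
        return found


class PostgresVideoCaptureClient:
    """Queries PostgreSQL via psycopg, reusing a single connection."""

    def __init__(self, conninfo):
        self._conninfo = conninfo
        self._conn = None

    def _connection(self):
        # Connect lazily, and reconnect only if the connection was closed.
        if self._conn is None or self._conn.closed:
            import psycopg  # imported lazily so tests can run without it

            self._conn = psycopg.connect(self._conninfo, autocommit=True)
        return self._conn

    def get_video_captures(self, seed, account_id=None, url_prefix=YOUTUBE_PREFIX):
        from psycopg.rows import scalar_row

        sql = (
            "SELECT DISTINCT containing_page_url FROM video "
            "WHERE seed = %s AND containing_page_url LIKE %s"
        )
        params = [seed, url_prefix + "%"]  # trailing wildcard: prefix match
        if account_id is not None:
            sql += " AND account_id = %s"
            params.append(account_id)
        with self._connection().cursor(row_factory=scalar_row) as cur:
            cur.execute(sql, params)  # query text and parameters passed separately
            return cur.fetchall()
```

With this split, code that calls `get_video_captures` would depend only on the interface, so an API-backed client could be substituted later without touching callers.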

Contributor Author

Recent updates include most of this work.

@galgeek galgeek force-pushed the predup_type_playlist branch 5 times, most recently from 46a44cc to 46e17c1 Compare July 1, 2025 00:34
@galgeek galgeek force-pushed the predup_type_playlist branch from fe07f55 to f22fa8f Compare August 15, 2025 18:52
@galgeek galgeek force-pushed the predup_type_playlist branch from f22fa8f to 6a8c62f Compare August 15, 2025 18:55
@galgeek galgeek force-pushed the predup_type_playlist branch from 79fd88d to 8abb9cd Compare August 15, 2025 19:47

2 participants