-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for raw text file as input #49
base: main
Are you sure you want to change the base?
Conversation
@souzatharsis @mobarski raw_text absolutely needs to be supported, I think local files already does support local files which are real urls in the end. but I think the signature of generate_podcast starts to look weird. As "urls" already do some magic by determining the parsing strategy, can we go one step further and replace Maybe this very simple content abstration would be useful in my design PR too @souzatharsis ? |
@@ -97,6 +103,9 @@ def main( | |||
transcript: typer.FileText = typer.Option( | |||
None, "--transcript", "-t", help="Path to a transcript file" | |||
), | |||
raw_file: typer.FileText = typer.Option( | |||
None, "--raw-file", "-r", help="File containing raw text" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit confusing: the user is passing raw text but the flag is raw file. It should be consistent. Consider renaming the flag to simply --text
@@ -182,6 +198,7 @@ def generate_podcast( | |||
Args: | |||
urls (Optional[List[str]]): List of URLs to process. | |||
url_file (Optional[str]): Path to a file containing URLs, one per line. | |||
raw_text (Optional[str]): Text to process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about calling it simply text, instead of raw_text
else: | ||
combined_content = "" # Empty string if no URLs provided | ||
|
||
# Generate Q&A content | ||
random_filename = f"transcript_{uuid.uuid4().hex}.txt" | ||
transcript_filepath = os.path.join(config.get('output_directories')['transcripts'], random_filename) | ||
qa_content = content_generator.generate_qa_content( | ||
combined_content, image_file_paths=image_paths or [], output_filepath=transcript_filepath | ||
combined_content, | ||
#image_file_paths=image_paths or [], # FIXME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't commenting this prevent podcastfy from generating audio from images?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor suggested update in CLI / package args
major question regarding line removed that would prevent podcastfy from generating audio from images
I've added the ability to use raw text file as the input as it makes podcastfy much more versatile.
Before we could either use a list of urls, for which the content generator was used, or the final transcript. There was to ability to use raw text for the content generator. This PR adds
raw_text
param to the process_content function andraw_file
to the CLI (I've tried to keep the naming convention).