
Refactoring enabling the creation of a lower level API with the Podcast Class #80

Open
wants to merge 54 commits into main
Conversation

@brumar (Collaborator) commented Oct 17, 2024

Creation of a Podcast class allowing a fine level of control over podcast creation as a step machine.

It works with TTS and LLM engines that conform to simple Abstract Base Classes, to ease future integrations.

TTS engines can be either async or sync. If an async TTS engine is used, audio processing takes place in parallel (and in threads for sync engines). The number of jobs can be controlled in both cases.
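The sync/async split described above can be sketched roughly as follows. All names here (`TTSBackend`, `AsyncTTSBackend`, `text_to_audio`, `render_segments`) are illustrative stand-ins, not the PR's actual API: the point is that a sync engine runs in worker threads while an async one is awaited directly, with a semaphore capping the number of concurrent jobs in both cases.

```python
# Hypothetical sketch of sync/async TTS engines behind a common interface.
from abc import ABC, abstractmethod
import asyncio

class TTSBackend(ABC):
    """Synchronous engine: calls are pushed to a worker thread."""
    @abstractmethod
    def text_to_audio(self, text: str) -> bytes: ...

class AsyncTTSBackend(ABC):
    """Asynchronous engine: calls are awaited concurrently."""
    @abstractmethod
    async def text_to_audio(self, text: str) -> bytes: ...

async def render_segments(backend, texts, n_jobs=4):
    """Process segments with at most n_jobs running at once."""
    sem = asyncio.Semaphore(n_jobs)
    async def one(text):
        async with sem:
            if isinstance(backend, AsyncTTSBackend):
                return await backend.text_to_audio(text)
            # sync engine: run in a thread so the event loop stays free
            return await asyncio.to_thread(backend.text_to_audio, text)
    return await asyncio.gather(*(one(t) for t in texts))
```

`asyncio.gather` preserves input order, so the audio segments come back in transcript order regardless of which job finishes first.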

Introduction of other classes such as Character, Transcript, and AudioManager.

The behavior of process_content is unchanged and the tests pass.

Created unit tests for the Podcast class (with mocked tts and llm engines)
Added tests for transcript handling, saving, loading, and cleaning.

Some code is made obsolete:

  • text_to_speech.py, whose business logic has been moved to different places
  • process_links, which would be replaced by the body of process_links_v2 (which uses the Podcast class). I preferred to wait until the last moment to commit their deletion.

Many things could be improved, but this PR is already too large and should be under discussion beforehand anyway.

@souzatharsis (Owner)

Further, did you fetch from main before submitting the PR?

@brumar (Collaborator, Author) commented Oct 17, 2024

Further, did you fetch from main before submitting the PR?

I just merged and pushed again.

@brumar (Collaborator, Author) commented Oct 17, 2024

Hi @brumar, thanks for this incredible PR. Two pytests are not passing, see above. Could you please fix them before we merge?

Thank you!

I see only one test failing, but I can't really work on it on my computer because I always hit "podcastfy.client:client.py:334 An error occurred: 429 Resource has been exhausted".

But maybe it's a side effect of the very same problem that makes this test fail? I wouldn't mind help on this one :).
Edit: I placed a breakpoint in my local tests just before self.chain.invoke to see if the content was empty or very short (because the failed tests on GitHub Actions said something along those lines), but found that prompt_params was correctly filled.

logger = logging.getLogger(__name__)


class DefaultPodcastifyTranscriptEngine(LLMBackend):
Owner

DefaultPodcastifyTranscriptEngine is a class 'hardcoded' in a file named 'gemini_langchain.py'.

What if we decide on another base LLM model as the default?

Further, the logic implemented by this class has nothing to do with Gemini or langchain, even though it lives in gemini_langchain.py.

It sounds like this file exists to stay backward compatible with the current version in main.py, when instead we should move to a unified version: generic LLM logic should reside under aiengines>llm, and podcast content generation logic (post-LLM) should live in content_generator.py.
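The separation being suggested could look roughly like this. The names (`LLMBackend`, `generate`, `TranscriptGenerator`) are hypothetical, chosen only to show the shape: a provider-agnostic interface on one side, podcast-specific prompt logic on the other, with neither knowing about Gemini or langchain.

```python
# Sketch of the proposed split: generic LLM interface vs. podcast logic.
from abc import ABC, abstractmethod

class LLMBackend(ABC):
    """Provider-agnostic interface (Gemini, OpenAI, ... would subclass)."""
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class TranscriptGenerator:
    """Podcast content logic: knows prompts, not providers."""
    def __init__(self, llm: LLMBackend):
        self.llm = llm

    def generate_transcript(self, content: str) -> str:
        # prompt composition lives here, not in any provider module
        prompt = f"Write a podcast transcript about:\n{content}"
        return self.llm.generate(prompt)
```

Swapping the default model then means changing which `LLMBackend` subclass is constructed, without touching the transcript logic.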

Collaborator Author

On the phone right now, but it seems that currently it's all about langchain and Gemini here? Yes, it's absolutely about being backward compatible and not forcing other abstractions on the project. I do think you want an abstraction at an intermediate level to easily swap the LLM engine while keeping most of the business logic in this class. But is that something we can do post-merge? The current naming and design of this class are not good, for sure. The real question is maybe whether you accept the current lowest-level API for the engines defined by the ABC. There will be another very interesting layer beneath, for sure!

Owner

To me it does not make sense to merge a refactor that we already know will need to be refactored.
Let's merge into main small but frequent PRs that are complete.


logger = setup_logger(__name__)

app = typer.Typer()

def create_characters(config: Dict[str, Any]) -> List[Character]:
Owner

Characters do not take the input conversation_config into account! This moves us backwards from a functionality perspective.

Collaborator Author

This is just a function to make the character handling backward compatible. The Character primitive is very promising for this project IMO.
But I don't think I really understand this comment: why does it remove current functionality?

communicate = edge_tts.Communicate(text, config.voice)
await communicate.save(str(output_path))

# register
Owner

Shouldn't you register OpenAI Async as well as EdgeTTS Sync?

Collaborator Author

Yes, thank you!

return [host, guest]


def create_tts_backends(config: Config) -> List[TTSBackend]:
Owner

Why should we instantiate all available backends and then later filter out all but the one the user wants? Shouldn't we instead only instantiate what the user needs? I recommend a TTS Factory design pattern.
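A minimal sketch of the factory pattern the reviewer recommends, so that only the requested backend is ever constructed. The registry, decorator, and class names here are hypothetical, not the project's real API:

```python
# Illustrative TTS factory: instantiate only the backend the user asked for.
from typing import Callable, Dict

_TTS_REGISTRY: Dict[str, Callable[..., object]] = {}

def register_tts(name: str):
    """Class decorator recording a backend under a short name."""
    def deco(cls):
        _TTS_REGISTRY[name] = cls
        return cls
    return deco

def create_tts_backend(name: str, **kwargs):
    """Build just the one backend the config selects."""
    try:
        return _TTS_REGISTRY[name](**kwargs)
    except KeyError:
        raise ValueError(f"Unknown TTS backend: {name!r}") from None

@register_tts("edge")
class EdgeTTSBackend:
    def __init__(self, voice: str = "en-US-JennyNeural"):
        self.voice = voice
```

With this shape, adding a new engine is one decorated class, and nothing is instantiated until `create_tts_backend` is called with the user's choice.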

@souzatharsis (Owner) left a comment

Hi, thanks for the massive PR!

I've added a couple of inline comments. But here's the summary recommendation:

This is a large PR; I would say this is not a merge but instead a full re-write. Besides solving the minor comments added now, I'd recommend the following steps:

  1. Sync up with main: There were several critical bugs solved in main that are still present in dev. Further, several tests were added which will increase our confidence with this re-write. Let's sync up the branches first.

  2. Let's break your changes down into smaller PRs so we can incrementally re-write the repo safely. Recommended components in order:

  • First, core + related tests. Here we make sure the new core abstraction works. This update is safe because it won't break the original code.
  • Second, LLMs + related tests + client.py
  • Third, TTS + related tests + client.py

Feel free to recommend a different way to break up the PR, since at this point you know this new proposed version better than I do.

With that approach we can more safely re-write the entire package.

What do you think?

@souzatharsis (Owner)

Also, I'd say step 1, i.e. the core/ objects, is really the true value add here in terms of abstraction. If we get that done, it's already a major release. It would be important to write a short Python notebook showcasing how developers can take advantage of that level of abstraction. It will be helpful even for me to learn this new way. After that, it will be straightforward to accomplish steps 2 and 3.
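Such a notebook might walk the "step machine" one stage at a time. The method names below (`build_transcript`, `build_audio`) and the state strings are hypothetical placeholders for whatever the PR's core module actually exposes; the point is the pausable, inspectable pipeline:

```python
# Hypothetical notebook-style walkthrough of a step-machine Podcast API.
class Podcast:
    """Minimal state machine: each step checks that the previous one ran."""
    def __init__(self, content: str):
        self.content = content
        self.state = "created"

    def build_transcript(self):
        if self.state != "created":
            raise RuntimeError(f"cannot build transcript from state {self.state}")
        self.transcript = f"Host: today we talk about {self.content}"
        self.state = "transcript_ready"

    def build_audio(self):
        if self.state != "transcript_ready":
            raise RuntimeError(f"cannot build audio from state {self.state}")
        self.audio = b"..."  # a real implementation would call the TTS backends
        self.state = "audio_ready"

podcast = Podcast("AI ethics")
podcast.build_transcript()   # pause here to inspect or edit podcast.transcript
podcast.build_audio()        # only then render the audio
```

The value for developers is the pause points: the transcript can be reviewed or edited between steps before any TTS cost is incurred.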

raise ValueError(f"TTS backend '{tts_name}' not configured for this character")
self.preferred_tts = tts_name

def to_prompt(self) -> str:
Owner

I think it should be the opposite. A Character shouldn't even be aware that LLMs exist around them; LLMs are just a technical solution. Instead, when we compose the prompt in the AI engine, character information should be passed in to dynamically compose the prompt.
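The inversion being suggested can be sketched as follows: Character holds plain data, and a `compose_prompt` helper living in the engine reads it. The field and function names are illustrative assumptions, not the PR's actual ones:

```python
# Sketch: Character is plain data; the AI engine composes the prompt from it.
from dataclasses import dataclass
from typing import List

@dataclass
class Character:
    name: str
    role: str
    description: str

def compose_prompt(characters: List[Character], topic: str) -> str:
    """Lives in the AI engine, not on Character."""
    cast = "\n".join(
        f"- {c.name} ({c.role}): {c.description}" for c in characters
    )
    return f"Write a podcast about {topic} with these participants:\n{cast}"
```

This keeps Character reusable by any future engine (or none at all), since no `to_prompt` method couples it to a particular prompting scheme.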

self.role = role
self.tts_configs = tts_configs
self.default_description_for_llm = default_description_for_llm
self.preferred_tts = next(iter(tts_configs.keys()), None) # Set first TTS as default, can be None
Owner

why should a character be aware of TTS models?
A Character should describe attributes and behavior of a Podcast participant.
I'd argue perhaps only the Transcript generation should be aware of Character information.



class AudioManager:
def __init__(self, tts_backends: Dict[str, TTSBackend], audio_format, n_jobs: int = 4, file_prefix: str = "", audio_temp_dir: str = None) -> None:
Owner

default values should live in config.yaml
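One hedged way to honor that suggestion, assuming a config mapping has already been loaded from config.yaml (the keys and the `config` parameter shown here are hypothetical):

```python
# Sketch: AudioManager pulls its defaults from loaded config, not its signature.
DEFAULTS = {"n_jobs": 4, "file_prefix": "", "audio_temp_dir": None}

class AudioManager:
    def __init__(self, tts_backends, audio_format, config=None):
        # user-provided config entries override the shipped defaults
        cfg = {**DEFAULTS, **(config or {})}
        self.tts_backends = tts_backends
        self.audio_format = audio_format
        self.n_jobs = cfg["n_jobs"]
        self.file_prefix = cfg["file_prefix"]
        self.audio_temp_dir = cfg["audio_temp_dir"]
```

With this shape, changing a default means editing config.yaml, not a constructor signature.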

n_jobs: int = 4
file_prefix: str = ""
audio_temp_dir: str = None

def _get_tts_backend(self, segment):
tts_backend = self.tts_backends.get(segment.speaker.preferred_tts)
if tts_backend is None:
# Take the first available TTS backend
Owner

why take the first and not the default or user defined?
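The reviewer's alternative, a fallback to an explicitly configured default backend instead of "whichever comes first", could look like this (function and parameter names are illustrative):

```python
# Sketch: prefer the speaker's backend, then a configured default, never
# an arbitrary "first" entry of the dict.
def get_tts_backend(backends: dict, preferred: str, default: str):
    if preferred in backends:
        return backends[preferred]
    if default in backends:
        return backends[default]
    raise ValueError(
        f"No TTS backend available (tried {preferred!r}, then {default!r})"
    )
```

Failing loudly when neither is configured is arguably safer than silently picking a backend the user never chose.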

"""
self.content = content
self.llm_backend = llm_backend
self.characters: Dict[str, Character] = {char.name: char for char in (characters or [Character("Host", "Podcast host", {}), Character("Guest", "Expert guest", {})])}
Owner

should take from conversation_config (default or user provided) instead of static values

@brumar (Collaborator, Author) commented Oct 29, 2024

Hello @souzatharsis, thanks a lot for the review! I agree with most of your points, including the fact that it should have been done in multiple PRs.
But I am kind of burned out and I am scared of the continuing drift. I need a way to close this source of anxiety, so I kind of need to cut my losses, if that makes sense. That could mean labeling this PR as an interesting but unmergeable experiment, and I could go on with my life.

You can also pick and choose the parts that you think are interesting, at your own pace.

The last option would be to bite the bullet and merge this, knowing that improvements are to be made, including some important changes to the abstractions suggested in this PR. I made a good effort to add tests and to ensure that this branch is backward compatible and that tests are passing (including new ones). If we go this route, I can spend (synchronously with you or not) a burst of energy to fix what you see as high-priority stuff and rebase again before merge.

But I won't be able to sustain the three-branches thing; I am totally certain of that. I am not sure I can really divide the work into multiple branches and focus on a single one, given that the Character thing is used almost everywhere. But if you see a path that works for you, I could help.

Whatever option you pick, no ill feelings on my side; working on it was a blast. I hope you will understand my position. I am sorry about the added workload my frenzy has created on your side.

@souzatharsis (Owner)

I have an idea. Let me create the mini PRs based on your code. And then you are my reviewer / approver.

That should reduce the burden on you while taking advantage of the great work you did.

What do you think?

@brumar (Collaborator, Author) commented Oct 29, 2024

Yeah, I think that's one of the most comfortable options for me right now, thank you! Edit: If you have made up your mind on this, maybe we could change the status of this one to draft?
