
Refactoring enabling the creation of a lower level API with the Podcast Class #80

Open
wants to merge 54 commits into main
Conversation

@brumar (Collaborator) commented Oct 17, 2024

Creation of a Podcast class allowing a fine level of control over podcast creation as a step machine.

It works with TTS and LLM engines that conform to simple Abstract Base Classes, to ease future integrations.

TTS engines can be either async or sync. If an async TTS engine is used, audio processing takes place in parallel (and in threads for sync engines). The number of jobs can be controlled in both cases.
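The sync/async split described above can be sketched roughly as follows. All names here (`TTSBackend`, `AsyncTTSBackend`, `text_to_audio`, `render_segments`) are illustrative stand-ins, not the PR's actual API: the point is that a sync engine runs in worker threads while an async one is awaited directly, with a semaphore capping the number of concurrent jobs in both cases.

```python
# Hypothetical sketch of sync/async TTS engines behind a common interface.
from abc import ABC, abstractmethod
import asyncio

class TTSBackend(ABC):
    """Synchronous engine: calls are pushed to a worker thread."""
    @abstractmethod
    def text_to_audio(self, text: str) -> bytes: ...

class AsyncTTSBackend(ABC):
    """Asynchronous engine: calls are awaited concurrently."""
    @abstractmethod
    async def text_to_audio(self, text: str) -> bytes: ...

async def render_segments(backend, texts, n_jobs=4):
    """Process segments with at most n_jobs running at once."""
    sem = asyncio.Semaphore(n_jobs)
    async def one(text):
        async with sem:
            if isinstance(backend, AsyncTTSBackend):
                return await backend.text_to_audio(text)
            # sync engine: run in a thread so the event loop stays free
            return await asyncio.to_thread(backend.text_to_audio, text)
    return await asyncio.gather(*(one(t) for t in texts))
```

`asyncio.gather` preserves input order, so the audio segments come back in transcript order regardless of which job finishes first.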

Introduction of other classes such as Character, Transcript, and AudioManager.

The behavior of process_content is unchanged and the tests pass.

Created unit tests for the Podcast class (with mocked tts and llm engines)
Added tests for transcript handling, saving, loading, and cleaning.

Some code is made obsolete:

  • text_to_speech.py, whose business logic has been moved to different places
  • process_links, which would be replaced by the body of process_links_v2 (which uses the Podcast class). I preferred to wait until the last moment to commit their deletion.

Many things could be improved, but this PR is already too large and should be under discussion beforehand anyway.

@souzatharsis (Owner)

Further, did you fetch from main before submitting the PR?

@brumar (Collaborator, Author) commented Oct 17, 2024

Further, did you fetch from main before submitting the PR?

I just merged and pushed again.

@brumar (Collaborator, Author) commented Oct 17, 2024

Hi @brumar, thanks for this incredible PR. Two pytests are not passing, see above. Could you please fix them before we merge?

Thank you!

I see only one test failing, but I can't really work on it on my computer because I always hit "podcastfy.client:client.py:334 An error occurred: 429 Resource has been exhausted".

But maybe it's a side effect of the very same problem that makes this test fail? I wouldn't mind help on this one :).
Edit: I placed a breakpoint in my local tests just before self.chain.invoke to see if the content was empty or very short (because the failed tests on GitHub Actions said something along those lines), but found that prompt_params was correctly filled.

logger = logging.getLogger(__name__)


class DefaultPodcastifyTranscriptEngine(LLMBackend):
Owner

DefaultPodcastifyTranscriptEngine is a class 'hardcoded' in a file named 'gemini_langchain.py'.

What if we decide on another base LLM model as the default?

Further, the logic implemented by this class has nothing to do with Gemini or langchain, even though it lives in gemini_langchain.py.

It sounds like this file exists to stay backward compatible with the current version in main.py, when instead we should move to a unified version: generic LLM logic should reside under aiengines>llm, and podcast content generation logic (post-LLM) should live in content_generator.py.
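The separation being suggested could look roughly like this. The names (`LLMBackend`, `generate`, `TranscriptGenerator`) are hypothetical, chosen only to show the shape: a provider-agnostic interface on one side, podcast-specific prompt logic on the other, with neither knowing about Gemini or langchain.

```python
# Sketch of the proposed split: generic LLM interface vs. podcast logic.
from abc import ABC, abstractmethod

class LLMBackend(ABC):
    """Provider-agnostic interface (Gemini, OpenAI, ... would subclass)."""
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class TranscriptGenerator:
    """Podcast content logic: knows prompts, not providers."""
    def __init__(self, llm: LLMBackend):
        self.llm = llm

    def generate_transcript(self, content: str) -> str:
        # prompt composition lives here, not in any provider module
        prompt = f"Write a podcast transcript about:\n{content}"
        return self.llm.generate(prompt)
```

Swapping the default model then means changing which `LLMBackend` subclass is constructed, without touching the transcript logic.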

Collaborator Author

On the phone right now, but it seems that currently it's all about langchain and Gemini here? Yes, it's absolutely about being backward compatible and not forcing other abstractions on the project. I do think you want an abstraction at an intermediate level to easily swap the LLM engine while keeping most of the business logic in this class. But is that something we can do post-merge? The current naming and design of this class are not good, for sure. The real question is maybe whether you accept the current lowest-level API for the engines defined by the ABC. There will be another very interesting layer beneath, for sure!

Owner

To me it does not make sense to merge a refactor that we already know will need to be refactored.
Let's merge into main small but frequent PRs that are complete.


logger = setup_logger(__name__)

app = typer.Typer()

def create_characters(config: Dict[str, Any]) -> List[Character]:
Owner

Characters do not take the input conversation_config into account! This moves us backwards from a functionality perspective.

Collaborator Author

This is just a function to make the character handling backward compatible. The Character primitive is very promising for this project IMO.
But I don't think I really understand this comment: why does it remove current functionality?

communicate = edge_tts.Communicate(text, config.voice)
await communicate.save(str(output_path))

# register
Owner

Shouldn't you register OpenAI Async as well as EdgeTTS Sync?

Collaborator Author

Yes, thank you!

return [host, guest]


def create_tts_backends(config: Config) -> List[TTSBackend]:
Owner

Why should we instantiate all available backends and then later filter out all but the one the user wants? Shouldn't we instead only instantiate what the user needs? I recommend a TTS Factory design pattern.
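A minimal sketch of the factory pattern the reviewer recommends, so that only the requested backend is ever constructed. The registry, decorator, and class names here are hypothetical, not the project's real API:

```python
# Illustrative TTS factory: instantiate only the backend the user asked for.
from typing import Callable, Dict

_TTS_REGISTRY: Dict[str, Callable[..., object]] = {}

def register_tts(name: str):
    """Class decorator recording a backend under a short name."""
    def deco(cls):
        _TTS_REGISTRY[name] = cls
        return cls
    return deco

def create_tts_backend(name: str, **kwargs):
    """Build just the one backend the config selects."""
    try:
        return _TTS_REGISTRY[name](**kwargs)
    except KeyError:
        raise ValueError(f"Unknown TTS backend: {name!r}") from None

@register_tts("edge")
class EdgeTTSBackend:
    def __init__(self, voice: str = "en-US-JennyNeural"):
        self.voice = voice
```

With this shape, adding a new engine is one decorated class, and nothing is instantiated until `create_tts_backend` is called with the user's choice.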

@souzatharsis (Owner) left a comment

Hi, thanks for the massive PR!

I've added a couple of inline comments. But here's the summary recommendation:

This is a large PR; I would say this is not a merge but instead a full re-write. Besides solving the minor comments added now, I'd recommend the following steps:

  1. Sync up with main: There were several critical bugs solved in main that are still present in dev. Further, several tests were added which will increase our confidence with this re-write. Let's sync up the branches first.

  2. Let's break your changes down into smaller PRs so we can incrementally re-write the repo safely. Recommended components in order:

  • First, core + related tests. Here we make sure the new core abstraction works. This update is safe because it won't break the original code.
  • Second, LLMs + related tests + client.py
  • Third, TTS + related tests + client.py

Feel free to recommend a different way to break up the PR, since at this point you know this new proposed version better than I do.

With that approach we can more safely re-write the entire package.

What do you think?

@souzatharsis (Owner)

Also, I'd say step 1, i.e. the core/ objects, is really the true value add here in terms of abstraction. If we get that done, it's already a major release. It would be important to write a short Python notebook showcasing how developers can take advantage of that level of abstraction. It will be helpful even for me to learn this new way. After that, it will be straightforward to accomplish steps 2 and 3.
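Such a notebook might walk the "step machine" one stage at a time. The method names below (`build_transcript`, `build_audio`) and the state strings are hypothetical placeholders for whatever the PR's core module actually exposes; the point is the pausable, inspectable pipeline:

```python
# Hypothetical notebook-style walkthrough of a step-machine Podcast API.
class Podcast:
    """Minimal state machine: each step checks that the previous one ran."""
    def __init__(self, content: str):
        self.content = content
        self.state = "created"

    def build_transcript(self):
        if self.state != "created":
            raise RuntimeError(f"cannot build transcript from state {self.state}")
        self.transcript = f"Host: today we talk about {self.content}"
        self.state = "transcript_ready"

    def build_audio(self):
        if self.state != "transcript_ready":
            raise RuntimeError(f"cannot build audio from state {self.state}")
        self.audio = b"..."  # a real implementation would call the TTS backends
        self.state = "audio_ready"

podcast = Podcast("AI ethics")
podcast.build_transcript()   # pause here to inspect or edit podcast.transcript
podcast.build_audio()        # only then render the audio
```

The value for developers is the pause points: the transcript can be reviewed or edited between steps before any TTS cost is incurred.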

raise ValueError(f"TTS backend '{tts_name}' not configured for this character")
self.preferred_tts = tts_name

def to_prompt(self) -> str:
Owner

I think it should be the opposite. A Character shouldn't even be aware that LLMs exist around them; LLMs are just a technical solution. Instead, when we compose the prompt in the AI engine, character information should be passed in to dynamically compose the prompt.
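The inversion being suggested can be sketched as follows: Character holds plain data, and a `compose_prompt` helper living in the engine reads it. The field and function names are illustrative assumptions, not the PR's actual ones:

```python
# Sketch: Character is plain data; the AI engine composes the prompt from it.
from dataclasses import dataclass
from typing import List

@dataclass
class Character:
    name: str
    role: str
    description: str

def compose_prompt(characters: List[Character], topic: str) -> str:
    """Lives in the AI engine, not on Character."""
    cast = "\n".join(
        f"- {c.name} ({c.role}): {c.description}" for c in characters
    )
    return f"Write a podcast about {topic} with these participants:\n{cast}"
```

This keeps Character reusable by any future engine (or none at all), since no `to_prompt` method couples it to a particular prompting scheme.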

self.role = role
self.tts_configs = tts_configs
self.default_description_for_llm = default_description_for_llm
self.preferred_tts = next(iter(tts_configs.keys()), None) # Set first TTS as default, can be None
Owner

why should a character be aware of TTS models?
A Character should describe attributes and behavior of a Podcast participant.
I'd argue perhaps only the Transcript generation should be aware of Character information.



class AudioManager:
def __init__(self, tts_backends: Dict[str, TTSBackend], audio_format, n_jobs: int = 4, file_prefix: str = "", audio_temp_dir: str = None) -> None:
Owner

default values should live in config.yaml
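One hedged way to honor that suggestion, assuming a config mapping has already been loaded from config.yaml (the keys and the `config` parameter shown here are hypothetical):

```python
# Sketch: AudioManager pulls its defaults from loaded config, not its signature.
DEFAULTS = {"n_jobs": 4, "file_prefix": "", "audio_temp_dir": None}

class AudioManager:
    def __init__(self, tts_backends, audio_format, config=None):
        # user-provided config entries override the shipped defaults
        cfg = {**DEFAULTS, **(config or {})}
        self.tts_backends = tts_backends
        self.audio_format = audio_format
        self.n_jobs = cfg["n_jobs"]
        self.file_prefix = cfg["file_prefix"]
        self.audio_temp_dir = cfg["audio_temp_dir"]
```

With this shape, changing a default means editing config.yaml, not a constructor signature.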

n_jobs: int = 4
file_prefix: str = ""
audio_temp_dir: str = None

def _get_tts_backend(self, segment):
tts_backend = self.tts_backends.get(segment.speaker.preferred_tts)
if tts_backend is None:
# Take the first available TTS backend
Owner

why take the first and not the default or user defined?
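The reviewer's alternative, a fallback to an explicitly configured default backend instead of "whichever comes first", could look like this (function and parameter names are illustrative):

```python
# Sketch: prefer the speaker's backend, then a configured default, never
# an arbitrary "first" entry of the dict.
def get_tts_backend(backends: dict, preferred: str, default: str):
    if preferred in backends:
        return backends[preferred]
    if default in backends:
        return backends[default]
    raise ValueError(
        f"No TTS backend available (tried {preferred!r}, then {default!r})"
    )
```

Failing loudly when neither is configured is arguably safer than silently picking a backend the user never chose.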

"""
self.content = content
self.llm_backend = llm_backend
self.characters: Dict[str, Character] = {char.name: char for char in (characters or [Character("Host", "Podcast host", {}), Character("Guest", "Expert guest", {})])}
Owner

should take from conversation_config (default or user provided) instead of static values

@brumar (Collaborator, Author) commented Oct 29, 2024

Hello @souzatharsis, thanks a lot for the review! I agree with most of your points, including the fact that it should have been done in multiple PRs.
But I am kind of burned out and I am scared of the continuing drift. I need a way to close this source of anxiety, so I kind of need to cut my losses, if that makes sense. That could mean labeling this PR as an interesting but unmergeable experiment, and I could go on with my life.

You can also pick and choose the parts that you think are interesting, at your own pace.

The last option would be to bite the bullet and merge this, knowing that improvements are to be made, including some important changes to the abstractions suggested in this PR. I made a good effort to add tests and to ensure that this branch is backward compatible and that tests are passing (including new ones). If we go this route, I can spend (synchronously with you or not) a burst of energy to fix what you see as high-priority stuff and rebase again before merge.

But I won't be able to sustain the three-branches thing; I am totally certain of that. I am not sure I can really divide the work into multiple branches and focus on a single one, given that the Character thing is used almost everywhere. But if you see a path that works for you, I could help.

Whatever option you pick, no ill feelings on my side; working on it was a blast. I hope you will understand my position. I am sorry about the added workload my frenzy has created on your side.

@souzatharsis (Owner)

I have an idea. Let me create the mini PRs based on your code. And then you are my reviewer / approver.

That should reduce the burden on you while taking advantage of the great work you did.

What do you think?

@brumar (Collaborator, Author) commented Oct 29, 2024

Yeah, I think that's one of the most comfortable options for me right now, thank you! Edit: If you have made up your mind on this, maybe we could change the status of this one to draft?
