Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storyboarding agent #90

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Storyboarding agent #90

wants to merge 2 commits into from

Conversation

0xrohitgarg
Copy link
Collaborator

@0xrohitgarg 0xrohitgarg commented Nov 29, 2024

  • Storyboarding agent
  • LLM default timeout to 120s

Summary by CodeRabbit

  • New Features

    • Introduced the StoryboardingAgent, enabling users to generate storyboard videos using generative AI models.
    • Added functionality to retrieve images from the video database.
  • Improvements

    • Extended the default timeout for language model requests from 30 to 120 seconds, enhancing responsiveness.
  • Documentation

    • Updated the get_audio method to include a descriptive docstring for clarity.

LLM default timeout to 120s
Copy link
Contributor

coderabbitai bot commented Nov 29, 2024

Walkthrough

The changes introduce a new StoryboardingAgent class in storyboarding.py, designed to create storyboard videos using generative AI models. This class includes methods for generating voiceover scripts, images, and combining assets into a video. Additionally, the ChatHandler is updated to incorporate the new agent. The default timeout for LLM requests is extended in base.py, and a new method for retrieving images is added to the VideoDBTool class in videodb_tool.py, alongside a docstring update for an existing method.

Changes

File Path Change Summary
backend/director/agents/storyboarding.py - Added StoryboardingAgent class.
- Implemented methods: run, generate_voiceover_scripts, prompt_voiceover_scripts, generate_image_flux, prompt_image_generation, combine_assets.
backend/director/handler.py - Imported StoryboardingAgent and added it to self.agents in ChatHandler.
backend/director/llm/base.py - Updated timeout default value from 30 to 120 in BaseLLMConfig.
backend/director/tools/videodb_tool.py - Added get_image method to VideoDBTool.
- Updated get_audio method with a docstring.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ChatHandler
    participant StoryboardingAgent
    participant VideoDBTool
    participant LLM
    participant AudioService
    participant VideoDatabase

    User->>ChatHandler: Request storyboard generation
    ChatHandler->>StoryboardingAgent: Run storyboard generation
    StoryboardingAgent->>LLM: Generate voiceover scripts
    LLM-->>StoryboardingAgent: Return scripts
    StoryboardingAgent->>VideoDBTool: Generate images
    VideoDBTool-->>StoryboardingAgent: Return images
    StoryboardingAgent->>AudioService: Generate audio
    AudioService-->>StoryboardingAgent: Return audio
    StoryboardingAgent->>VideoDatabase: Combine assets
    VideoDatabase-->>StoryboardingAgent: Return video URL
    StoryboardingAgent-->>ChatHandler: Return video URL
    ChatHandler-->>User: Provide video URL
Loading

🐰 "In the meadow where dreams take flight,
A storyboard blooms, oh what a sight!
With voice and image, tales unfold,
A rabbit's magic, bright and bold.
From scripts to videos, all in a dance,
Join us, dear friend, in this joyful chance!" 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (7)
backend/director/llm/base.py (1)

45-45: Consider making timeout configurable per operation type

While increasing the timeout to 120s might be necessary for storyboarding, this change affects all LLM operations since it's in the base configuration. Consider:

  1. Making timeout configurable per operation type or agent
  2. Using a reasonable default timeout (e.g., 30s) and allowing specific agents to override it
  3. Adding a separate configuration for long-running operations like storyboarding

This would provide better flexibility and prevent potential resource issues from unnecessarily long timeouts in simpler operations.

class BaseLLMConfig(BaseSettings):
    # ... other fields ...
-    timeout: int = 120
+    timeout: int = 30
+    operation_timeouts: Dict[str, int] = {"storyboarding": 120}
+
+    def get_timeout(self, operation_type: str = None) -> int:
+        """Get timeout for specific operation type or default timeout."""
+        return self.operation_timeouts.get(operation_type, self.timeout)
backend/director/handler.py (1)

LLM timeout configuration needs error handling improvement

The default timeout of 120 seconds is properly configured in backend/director/llm/base.py and propagated to LLM implementations. However, the chat handler's error handling needs improvement:

  • The current generic except Exception block in handler.py doesn't specifically handle timeout scenarios
  • Add specific handling for timeout exceptions before the generic catch-all to provide better error messages and potentially implement retry logic

Example locations that need attention:

  • backend/director/handler.py: Add specific timeout exception handling in the chat method's try-except block
  • Consider catching openai.error.Timeout and other LLM-specific timeout exceptions
🔗 Analysis chain

Line range hint 89-106: Verify LLM timeout configuration.

The PR objective mentions setting default LLM timeout to 120 seconds. Let's verify this configuration is properly propagated to the StoryboardingAgent:

The chat method should handle potential timeout scenarios gracefully. Consider adding specific error handling for timeout exceptions in the global try-catch block.


Let me verify the error handling for timeout scenarios by checking the exception handling in the codebase.


Let me check the exception handling in the chat handler specifically.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify LLM timeout configuration

# Look for timeout configuration in base agent or LLM wrapper
rg -A 5 "timeout.*=.*120|DEFAULT_TIMEOUT"

# Check for any existing timeout configurations
rg "timeout" --type py

Length of output: 734


Script:

#!/bin/bash
# Check for timeout-related exception handling
rg -B 2 -A 5 "except.*Timeout|except.*timeout" --type py

# Look for general exception handling in the chat method
ast-grep --pattern 'try {
  $$$
} except $_ as $_: {
  $$$
}'

Length of output: 122


Script:

#!/bin/bash
# Look for exception handling in handler.py
rg -B 5 -A 10 "try:" backend/director/handler.py

# Check for any error handling related to LLM calls
rg -B 2 -A 5 "except" backend/director/handler.py

Length of output: 1055

backend/director/agents/storyboarding.py (5)

96-96: Replace print statements with logging

Using print statements for debugging is not recommended in production code. Consider replacing print(steps) with a logging statement.

Apply this diff:

-print(steps)
+logger.debug(steps)

130-130: Replace print statements with logging

Using print statements for debugging is not recommended in production code. Consider replacing print(steps) with a logging statement.

Apply this diff:

-print(steps)
+logger.debug(steps)

144-146: Parameterize the voice_id in audio generation

The voice_id is hardcoded, which limits flexibility and reusability. Consider making the voice_id configurable or passing it as a parameter to support different voice options.

Apply this diff:

-            config={
-                "voice_id": "IsEXLHzSvLH9UMB6SLHj",
-            },
+            config={
+                "voice_id": self.voice_id,
+            },

And initialize self.voice_id in the __init__ method or pass it as an argument.


162-162: Replace print statements with logging

Using print statements for debugging is not recommended in production code. Consider replacing print(steps) with a logging statement.

Apply this diff:

-print(steps)
+logger.debug(steps)

284-286: Replace hardcoded base video URL with a configurable parameter

The base video URL is hardcoded, which may limit flexibility. Consider making the base video URL a configurable parameter or passing it as an argument to the method.

Apply this diff:

-        base_video = self.videodb_tool.upload(
-            "https://www.youtube.com/watch?v=4dW1ybhA5bM"
-        )
+        base_video_url = kwargs.get("base_video_url", "default_url")
+        base_video = self.videodb_tool.upload(base_video_url)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0908ce3 and 535a9e7.

📒 Files selected for processing (4)
  • backend/director/agents/storyboarding.py (1 hunks)
  • backend/director/handler.py (2 hunks)
  • backend/director/llm/base.py (1 hunks)
  • backend/director/tools/videodb_tool.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
backend/director/agents/storyboarding.py

147-147: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


219-219: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


261-261: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


290-290: Loop control variable index not used within loop body

Rename unused index to _index

(B007)


336-336: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

🔇 Additional comments (5)
backend/director/llm/base.py (1)

45-45: Verify the timeout requirements

The 120-second timeout seems unusually high for API calls. Let's verify:

  1. If this timeout value is based on actual measurements
  2. If there are any underlying performance issues that could be addressed
  3. The impact on system resources if multiple requests timeout at 120s
✅ Verification successful

120-second timeout is appropriate for LLM API calls

The timeout value of 120 seconds is actually reasonable in this context because:

  1. It's used for LLM API calls (OpenAI, Anthropic, VideoDB) which can take longer than typical web API calls, especially for:
    • Complex completions with large context windows
    • Streaming responses
    • Models that require more processing time
  2. The implementation includes proper error handling across all LLM clients
  3. The timeout is defined in the base class and consistently used across different LLM implementations
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for timeout-related patterns and configurations

# Look for timeout-related error handling or retry logic
echo "Checking for timeout handling patterns..."
rg -A 3 "timeout|ConnectionError|RequestException" 

# Look for any existing timeout configurations
echo "Checking for other timeout configurations..."
rg -A 2 "timeout.*=|TIMEOUT" 

# Look for comments or documentation about timeouts
echo "Checking for timeout-related documentation..."
rg -A 2 "timeout" -g "*.md"

Length of output: 1863


Script:

#!/bin/bash
# Let's check for any LLM-related error handling and actual API calls
# to understand the context of this timeout better

# Check for LLM API call implementations and error handling
echo "Checking LLM API call patterns..."
ast-grep --pattern 'async def $FUNC($$$) {
    $$$
    response = $$$
    $$$
}'

# Look for any retry mechanisms or error handling specifically in LLM files
echo "Checking retry mechanisms..."
rg -A 5 "retry|backoff|sleep|wait" backend/director/llm/

# Check for any configuration files that might explain the timeout value
echo "Looking for config files..."
fd -e yaml -e json -e toml -e ini . backend/director/ --exec rg -l "timeout"

Length of output: 427


Script:

#!/bin/bash
# Let's examine the actual LLM API calls to understand the timeout usage

# Look for completion/chat completion implementations
echo "Checking completion implementations..."
rg -A 10 "async def (chat_completion|completion)" backend/director/llm/

# Check for any HTTP client configurations or actual API calls
echo "Checking HTTP client usage..."
rg -A 5 "(aiohttp|httpx|requests|client\.)" backend/director/llm/

# Look for any comments or docstrings explaining the timeout
echo "Checking for timeout documentation in code..."
rg -B 2 -A 1 "timeout.*=.*120" backend/director/llm/

Length of output: 1925

backend/director/handler.py (3)

25-25: LGTM! Clean import addition.

The import statement follows the established pattern and is placed logically with other agent imports.


64-64: LGTM! Agent registration follows existing pattern.

The StoryboardingAgent is properly registered in the agents list.


Line range hint 36-65: Verify agent initialization and error handling.

The new StoryboardingAgent will be initialized along with other agents in the chat method. Let's verify its integration:

Consider implementing the following best practices:

  1. Ensure proper timeout handling in the StoryboardingAgent's run/execute methods
  2. Add appropriate error handling for storyboard generation failures
  3. Consider implementing a retry mechanism for transient failures
✅ Verification successful

StoryboardingAgent implementation follows proper error handling and base agent patterns

The StoryboardingAgent properly extends BaseAgent and implements comprehensive error handling:

  • Follows the base agent pattern with proper initialization and run method implementation
  • Implements thorough error handling with try-catch blocks at multiple levels:
    • Main run method has a top-level try-catch that handles any unexpected errors
    • Individual steps (voiceover scripts, image generation, audio generation, asset combination) have their own error handling
    • Each error case properly updates the output message status and returns appropriate AgentResponse
  • Includes proper status updates through OutputMessage for progress tracking
  • Validates required environment variables before proceeding
  • Returns standardized AgentResponse objects with appropriate status and messages

The implementation aligns well with other agents in the codebase like VideoGenerationAgent and ImageGenerationAgent, following the same patterns for error handling and status updates.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify StoryboardingAgent implementation and error handling

# Check if StoryboardingAgent properly extends BaseAgent
ast-grep --pattern 'class StoryboardingAgent($_) {
  $$$
}'

# Check for proper error handling in the agent
rg -A 5 "class StoryboardingAgent" | rg -A 10 "def run|def execute"

# Look for any existing error handling patterns in other agents
rg -l "try.*except" --type py

Length of output: 163


Script:

#!/bin/bash
# Let's try a different approach to verify the StoryboardingAgent implementation

# First, let's find where StoryboardingAgent is defined
fd -e py | xargs rg "class StoryboardingAgent"

# Check for imports of StoryboardingAgent
rg "from.*StoryboardingAgent|import.*StoryboardingAgent"

# Look for BaseAgent pattern to understand the expected structure
fd -e py | xargs rg "class.*BaseAgent"

# Check for common error handling patterns in agents
fd -e py | xargs rg "class.*Agent.*:.*\n.*def (run|execute).*\n.*try:"

Length of output: 2284


Script:

#!/bin/bash
# Let's check the StoryboardingAgent implementation and error handling patterns

# Check the StoryboardingAgent implementation
cat backend/director/agents/storyboarding.py

# Check the BaseAgent implementation to understand the required structure
cat backend/director/agents/base.py

# Let's look at a few other agents for error handling patterns
cat backend/director/agents/video_generation.py
cat backend/director/agents/image_generation.py

Length of output: 26906

backend/director/tools/videodb_tool.py (1)

77-83: Verify integration with StoryboardingAgent.

Let's verify how this method is used by the StoryboardingAgent to ensure the returned fields are sufficient for its needs.

✅ Verification successful

Let me gather more context about the ImageAsset usage to ensure we have a complete picture.


Integration with StoryboardingAgent is verified and working correctly

The minimal fields returned by get_image() are sufficient for the StoryboardingAgent's needs. The agent only uses the id field from the returned image object to create an ImageAsset, which is exactly what's being provided by the current implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of get_image in StoryboardingAgent
ast-grep --pattern 'get_image($_)' 

# Search for any direct references to VideoDBTool and image handling
rg -A 5 'VideoDBTool.*image|get_image' 

Length of output: 1575


Script:

#!/bin/bash
# Search for ImageAsset class definition and usage
ast-grep --pattern 'class ImageAsset'

# Search for additional context around ImageAsset instantiation
rg -B 5 -A 10 'ImageAsset\('

Length of output: 2613

Comment on lines +77 to +83
def get_image(self, image_id):
"""Get a image by ID."""
image = self.collection.get_image(image_id)
return {
"id": image.id,
"name": image.name,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider enhancing the image metadata and error handling.

The implementation could be improved in several ways:

  1. Add error handling for invalid image_id and uninitialized collection
  2. Return additional useful metadata fields (url, collection_id) for consistency with other getter methods

Consider applying this enhancement:

 def get_image(self, image_id):
     """Get a image by ID."""
+    if not self.collection:
+        raise ValueError("Collection not initialized")
     image = self.collection.get_image(image_id)
     return {
         "id": image.id,
         "name": image.name,
+        "collection_id": image.collection_id,
+        "url": image.url
     }

Committable suggestion skipped: line range outside the PR's diff.

return None
image_url = flux_output[0].url
return image_url
except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Remove unused variable e in exception handling

The variable e is assigned but never used in the except block. You can remove as e to clean up the code.

Apply this diff:

-        except Exception as e:
+        except Exception:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as e:
except Exception:
🧰 Tools
🪛 Ruff (0.8.0)

261-261: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

# Parse the response
step_data = json.loads(llm_response.content)["steps"]
return step_data
except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Remove unused variable e in exception handling

The variable e is assigned but never used in the except block. You can remove as e to clean up the code.

Apply this diff:

-            except Exception as e:
+            except Exception:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as e:
except Exception:
🧰 Tools
🪛 Ruff (0.8.0)

219-219: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

"voice_id": "IsEXLHzSvLH9UMB6SLHj",
},
)
except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Remove unused variable e in exception handling

The variable e is assigned but never used in the except block. You can remove as e to clean up the code.

Apply this diff:

-                except Exception as e:
+                except Exception:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as e:
except Exception:
🧰 Tools
🪛 Ruff (0.8.0)

147-147: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

# Generate the video
video_url = timeline.generate_stream()
return video_url
except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Remove unused variable e in exception handling

The variable e is assigned but never used in the except block. You can remove as e to clean up the code.

Apply this diff:

-        except Exception as e:
+        except Exception:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as e:
except Exception:
🧰 Tools
🪛 Ruff (0.8.0)

336-336: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

base_video_id = base_video["id"]

seeker = 0
for index, step in enumerate(steps):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Unused loop variable index

The loop variable index is not used within the loop body. You can remove it to clean up the code.

Apply this diff:

-        for index, step in enumerate(steps):
+        for step in steps:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for index, step in enumerate(steps):
for step in steps:
🧰 Tools
🪛 Ruff (0.8.0)

290-290: Loop control variable index not used within loop body

Rename unused index to _index

(B007)

Comment on lines +269 to +272
consistent_part = "Create a stippling black ballpoint pen illustration of a Nigerian woman with a tight afro, living in her minimalist New York apartment. Keep the illustration simple with minimal elements. Generate less than 600 characters"
variable_part = f"This illustration is a part of a storyboard to explain the user journey of an app built for {app_description}. This image will portray the '{step['step']}' stage in the app. Step description: {step['step_description']}. This illustration is meant for professional storyboarding, so understand the requirements accordingly and create a suitable illustration with the woman as a central character in the frame, but include other supporting props that can indicate that she's in the '{step['step']}' step in the user flow."
prompt = f"{consistent_part}\n- {variable_part}"
return prompt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Make image prompt generation more dynamic

The consistent_part of the prompt includes hardcoded details about a specific character and setting, which may not be appropriate for all applications. Consider making the prompt more dynamic to account for different user scenarios and app descriptions.

For example, you could incorporate character details as parameters or generate them based on the app_description.

-        consistent_part = "Create a stippling black ballpoint pen illustration of a Nigerian woman with a tight afro, living in her minimalist New York apartment. Keep the illustration simple with minimal elements. Generate less than 600 characters"
+        consistent_part = f"Create a stippling black ballpoint pen illustration relevant to {app_description}. Keep the illustration simple with minimal elements."

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)
backend/director/agents/storyboarding.py (2)

144-145: Move hardcoded voice_id to configuration

The voice_id should be configurable rather than hardcoded to allow for different voices without code changes.

+ VOICE_ID = os.getenv("ELEVENLABS_VOICE_ID", "IsEXLHzSvLH9UMB6SLHj")

def run(self, ...):
    # ...
    self.audio_gen_tool.text_to_speech(
        text=step["voiceover_script"],
        save_at=voiceover_path,
        config={
-           "voice_id": "IsEXLHzSvLH9UMB6SLHj",
+           "voice_id": VOICE_ID,
        },
    )

197-241: Consider making prompt templates configurable

The prompt structure is currently hardcoded. Consider moving it to a template file or configuration to allow for easier modifications and potential A/B testing of different prompt structures.

+ VOICEOVER_PROMPT_TEMPLATE = """
+ Generate a structured response for an app that {app_description}. Here are the steps involved in the user journey:
+ {steps_prompt}
+ Return a response in JSON format, with key 'steps', and value being a list of dicts, where each dict has two keys 'step_description' and 'voiceover_script', like this:
+ {json_format}
+ """

def prompt_voiceover_scripts(self, steps, app_description):
-   prompt = f"Generate a structured response..."
+   steps_prompt = self._generate_steps_prompt(steps)
+   return VOICEOVER_PROMPT_TEMPLATE.format(
+       app_description=app_description,
+       steps_prompt=steps_prompt,
+       json_format=self._get_json_format_example()
+   )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 535a9e7 and 70cd370.

📒 Files selected for processing (1)
  • backend/director/agents/storyboarding.py (1 hunks)
🔇 Additional comments (1)
backend/director/agents/storyboarding.py (1)

1-37: LGTM! Clean initialization with proper inheritance

The imports and class initialization are well-structured with appropriate base class inheritance and initialization of required attributes.

Comment on lines +38 to +196

# Step 2: Generate images using Flux
self.output_message.actions.append("Generating images...")
self.output_message.push_update()

for index, step in enumerate(steps):
image_url = self.generate_image_flux(step, app_description)
if not image_url:
video_content.status = MsgStatus.error
video_content.status_message = (
f"Error in generating image for step {index+1}."
)
self.output_message.publish()
error_message = f"Agent failed with error in generating image for step {index+1}."
return AgentResponse(
status=AgentStatus.ERROR, message=error_message
)

# Download webp image and convert to PNG
response = requests.get(image_url)
if response.status_code == 200:
image = Image.open(io.BytesIO(response.content))
os.makedirs(DOWNLOADS_PATH, exist_ok=True)
png_path = f"{DOWNLOADS_PATH}/{str(uuid.uuid4())}.png"
image.save(png_path, "PNG")
media = self.videodb_tool.upload(
png_path, source_type="file_path", media_type="image"
)
steps[index]["image_id"] = media["id"]
else:
logger.error("Failed to download image")
return None

print(steps)

# Step 3: Generate voiceover audio using ElevenLabs
self.output_message.actions.append("Generating voiceover audio...")
self.output_message.push_update()

for index, step in enumerate(steps):
try:
os.makedirs(DOWNLOADS_PATH, exist_ok=True)
voiceover_path = f"{DOWNLOADS_PATH}/{str(uuid.uuid4())}.mp3"
self.audio_gen_tool.text_to_speech(
text=step["voiceover_script"],
save_at=voiceover_path,
config={
"voice_id": "IsEXLHzSvLH9UMB6SLHj",
},
)
except Exception as e:
video_content.status = MsgStatus.error
video_content.status_message = (
f"Error in generating audio for step {step['step']}."
)
self.output_message.publish()
error_message = f"Agent failed with error in generating audio for step {index+1}. Error: {str(e)}"
return AgentResponse(
status=AgentStatus.ERROR, message=error_message
)
media = self.videodb_tool.upload(
voiceover_path, source_type="file_path", media_type="audio"
)
steps[index]["audio_id"] = media["id"]

print(steps)

# # Step 4: Combine assets using VideoDB
self.output_message.actions.append(
"Combining assets into storyboard video..."
)
self.output_message.push_update()

video_url = self.combine_assets(steps)
if not video_url:
video_content.status = MsgStatus.error
video_content.status_message = "Error in combining assets into video."
self.output_message.publish()
error_message = (
"Agent failed with error in combining assets into video."
)
return AgentResponse(status=AgentStatus.ERROR, message=error_message)

video_content.video = VideoData(stream_url=video_url)
video_content.status = MsgStatus.success
video_content.status_message = "Here is your generated storyboard video."
self.output_message.publish()
except Exception as e:
logger.exception(f"Error in {self.agent_name}")
video_content.status = MsgStatus.error
video_content.status_message = "Error in generating storyboard."
self.output_message.publish()
error_message = f"Agent failed with error {e}"
return AgentResponse(status=AgentStatus.ERROR, message=error_message)
return AgentResponse(
status=AgentStatus.SUCCESS,
message=f"Agent {self.agent_name} completed successfully.",
data={"video_url": video_url},
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider breaking down the run method into smaller, focused methods

The method is handling multiple responsibilities (script generation, image generation, audio generation, and video creation). Consider extracting these into separate private methods for better maintainability and testing.

Example refactor:

def run(self, collection_id: str, app_description: str, raw_steps: List[str], *args, **kwargs) -> AgentResponse:
-   # Current implementation
+   try:
+       self._initialize_tools(collection_id)
+       steps = self._prepare_steps(raw_steps)
+       
+       if not self._generate_scripts(steps, app_description):
+           return self._error_response("Error in generating voiceover scripts.")
+           
+       if not self._generate_images(steps, app_description):
+           return self._error_response("Error in generating images.")
+           
+       if not self._generate_audio(steps):
+           return self._error_response("Error in generating audio.")
+           
+       video_url = self._combine_assets(steps)
+       if not video_url:
+           return self._error_response("Error in combining assets.")
+           
+       return self._success_response(video_url)
+   except Exception as e:
+       return self._handle_exception(e)

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +266 to +324
def combine_assets(self, steps):
"""
Combines images and audio into a storyboard video using VideoDB.
:param steps: List of dicts containing image and audio URLs.
:return: URL of the generated video.
"""
# Initialize VideoDB client
timeline = self.videodb_tool.get_and_set_timeline()
# Upload base video
base_video = self.videodb_tool.upload(
"https://www.youtube.com/watch?v=4dW1ybhA5bM"
)
base_video_id = base_video["id"]

seeker = 0
for step in steps:
# Upload image and audio
audio = self.videodb_tool.get_audio(step["audio_id"])
image = self.videodb_tool.get_image(step["image_id"])

audio_duration = float(audio["length"])

image_asset = ImageAsset(
asset_id=image["id"],
duration=audio_duration,
x="(main_w-overlay_w)/2",
y="(main_h-overlay_h)/2",
height="w=iw/3",
width="h=ih/3",
)
audio_asset = AudioAsset(asset_id=audio["id"], disable_other_tracks=True)
text_asset = TextAsset(
step["step"],
duration=audio_duration,
style=TextStyle(
x="(w-text_w)/2",
y="(h-text_h*2)",
font="League Spartan",
fontsize="(h/20)",
fontcolor="Snow",
boxcolor="OrangeRed",
boxborderw=10,
),
)

# Add assets to timeline
timeline.add_overlay(seeker, image_asset)
timeline.add_overlay(seeker, audio_asset)
timeline.add_overlay(seeker, text_asset)

seeker += audio_duration

# Add base video to timeline
base_vid_asset = VideoAsset(base_video_id, end=seeker)
timeline.add_inline(base_vid_asset)

# Generate the video
video_url = timeline.generate_stream()
return video_url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Multiple improvements needed in asset combination

  1. The base video URL is hardcoded
  2. Text styling contains magic numbers
  3. Missing validation for asset existence
+ BASE_VIDEO_URL = os.getenv("STORYBOARD_BASE_VIDEO_URL", "https://www.youtube.com/watch?v=4dW1ybhA5bM")
+ TEXT_STYLE_CONFIG = {
+     "x": "(w-text_w)/2",
+     "y": "(h-text_h*2)",
+     "font": "League Spartan",
+     "fontsize": "(h/20)",
+     "fontcolor": "Snow",
+     "boxcolor": "OrangeRed",
+     "boxborderw": 10,
+ }

def combine_assets(self, steps):
+   def validate_asset(asset, asset_type):
+       if not asset:
+           raise ValueError(f"Missing {asset_type} asset")
+       return asset

    timeline = self.videodb_tool.get_and_set_timeline()
    base_video = self.videodb_tool.upload(
-       "https://www.youtube.com/watch?v=4dW1ybhA5bM"
+       BASE_VIDEO_URL
    )
    
    for step in steps:
-       audio = self.videodb_tool.get_audio(step["audio_id"])
-       image = self.videodb_tool.get_image(step["image_id"])
+       audio = validate_asset(self.videodb_tool.get_audio(step["audio_id"]), "audio")
+       image = validate_asset(self.videodb_tool.get_image(step["image_id"]), "image")

        text_asset = TextAsset(
            step["step"],
            duration=audio_duration,
-           style=TextStyle(
-               x="(w-text_w)/2",
-               y="(h-text_h*2)",
-               font="League Spartan",
-               fontsize="(h/20)",
-               fontcolor="Snow",
-               boxcolor="OrangeRed",
-               boxborderw=10,
-           ),
+           style=TextStyle(**TEXT_STYLE_CONFIG),
        )

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +242 to +264
def generate_image_flux(self, step, app_description):
"""
Generates an image using Flux.
:param step: Dict containing step information.
:param app_description: Description of the app.
:return: URL of the generated image.
"""
prompt = self.prompt_image_generation(step, app_description)
flux_output = flux_dev(prompt)
if not flux_output:
logger.error("Error in generating image with Flux")
return None
image_url = flux_output[0].url
return image_url

def prompt_image_generation(self, step, app_description):
"""
Generates the prompt for image generation.
"""
consistent_part = "Create a stippling black ballpoint pen illustration of a Nigerian woman with a tight afro, living in her minimalist New York apartment. Keep the illustration simple with minimal elements. Generate less than 600 characters"
variable_part = f"This illustration is a part of a storyboard to explain the user journey of an app built for {app_description}. This image will portray the '{step['step']}' stage in the app. Step description: {step['step_description']}. This illustration is meant for professional storyboarding, so understand the requirements accordingly and create a suitable illustration with the woman as a central character in the frame, but include other supporting props that can indicate that she's in the '{step['step']}' step in the user flow."
prompt = f"{consistent_part}\n- {variable_part}"
return prompt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Make character description configurable and improve error handling

  1. The character description is hardcoded, limiting the versatility of the agent.
  2. The flux_dev call lacks proper error handling.
+ CHARACTER_DESCRIPTION = os.getenv(
+     "STORYBOARD_CHARACTER_DESCRIPTION",
+     "Create a stippling black ballpoint pen illustration of a Nigerian woman with a tight afro, living in her minimalist New York apartment."
+ )

def generate_image_flux(self, step, app_description):
    prompt = self.prompt_image_generation(step, app_description)
-   flux_output = flux_dev(prompt)
-   if not flux_output:
+   try:
+       flux_output = flux_dev(prompt)
+       if not flux_output:
+           raise ValueError("Empty response from Flux")
+       return flux_output[0].url
+   except Exception as e:
        logger.error("Error in generating image with Flux")
+       logger.exception(e)
        return None
-   image_url = flux_output[0].url
-   return image_url

def prompt_image_generation(self, step, app_description):
-   consistent_part = "Create a stippling black ballpoint pen illustration..."
+   consistent_part = f"{CHARACTER_DESCRIPTION} Keep the illustration simple with minimal elements."

Committable suggestion skipped: line range outside the PR's diff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant