Merge pull request #6 from thibaultyou/feature/ai-powered-metadata-and-prompt-organization

♻️ Restructure library and improve metadata generation
thibaultyou · Sep 28, 2024 · 493c35a
2 parents dd8b724 + a97402a

Showing 21 changed files with 734 additions and 268 deletions.
diff --git a/...t_analyzer_and_output_generator/prompt.md → ...t_analyzer_and_output_generator/prompt.md b/...t_analyzer_and_output_generator/prompt.md → ...t_analyzer_and_output_generator/prompt.md
@@ -1,11 +1,11 @@
 You are an AI assistant tasked with analyzing an AI prompt and producing specific outputs related to it. The prompt will be provided to you, and you should generate the following:
 
-1. A filename for storing the prompt as a markdown file
-2. A list of tags
-3. A one-line concise description
-4. A quick description
-5. A markdown link for referencing the prompt
-6. A commit message for version control
+1. A directory name for storing the prompt
+2. A category in snake_case format
+3. A list of tags
+4. A one-line concise description
+5. A quick description
+6. A markdown link for referencing the prompt
 7. A list of variables that require user input
 
 Here's the AI prompt you need to analyze:
@@ -16,16 +16,18 @@ Here's the AI prompt you need to analyze:
 
 Now, follow these steps to generate the required outputs:
 
-1. Filename:
-Generate a filename for the prompt using the following convention:
+1. Directory name:
+Generate a directory name for the prompt using the following convention:
 
 - Convert the prompt's main topic or purpose to lowercase
 - Replace spaces with underscores
 - Remove any special characters
-- Add the .md extension
-- The filename should be concise but descriptive, ideally not exceeding 50 characters
+- The directory name should be concise but descriptive, ideally not exceeding 50 characters
 
-2. Tags:
+2. Category:
+Determine a simple and clear category for the prompt, formatted in snake_case.
+
+3. Tags:
 Create a list of 3-5 relevant tags for the prompt. These tags should:
 
 - Be single words or short phrases
@@ -34,48 +36,37 @@ Create a list of 3-5 relevant tags for the prompt. These tags should:
 - Accurately represent the main themes or applications of the prompt
 - Be useful for categorizing and searching for the prompt
 
-3. One-line description:
+4. One-line description:
 Write a concise, one-line description of the prompt that:
 
 - Captures the main purpose or function of the prompt
 - Is no longer than 100 characters
 - Starts with a verb in the present tense (e.g., "Creates," "Generates," "Analyzes")
 
-4. Quick description:
+5. Quick description:
 Provide a brief description of the prompt that:
 
 - Expands on the one-line description
 - Explains the key features or capabilities of the prompt
 - Is 2-3 sentences long
 - Gives the reader a clear understanding of what the prompt does
 
-5. Markdown link:
+6. Markdown link:
 Create a markdown link that can be used to reference the prompt:
 
 - Use the one-line description as the link text
-- Use the filename as the link URL
-- Format it as: [One-line description](filename)
-
-6. Commit message:
-Create a commit message for version control with the following format:
-
-- Start with an emoji that relates to the content or purpose of the prompt
-- Follow with a short, descriptive message about the addition or change
-- Use present tense and imperative mood
-- Keep it under 50 characters if possible
-Example: "✨ Add AI prompt analyzer and output generator"
+- Use the directory name as the link URL
+- Format it as: [One-line description](directory_name)
 
 7. User input variables:
 List all variables in the prompt that require user input or replacement. These should be in the format {{VARIABLE_NAME}} and listed one per line.
 
 Present your final output in the following format:
 
 <output>
-## metadata.yml
-
-```yml
 title: [Prompt's main topic or purpose]
-category: [Your determined category]
+category: [Your determined category in snake_case]
+directory: [Your generated directory name]
 tags:
   - [Tag 1]
   - [Tag 2]
@@ -87,17 +78,6 @@ variables:
   - "{{VARIABLE_1}}"
   - "{{VARIABLE_2}}"
   [Add more variables if necessary]
-additional_info:
-  filename: [Your generated filename]
-  commit_message: [Your commit message]
-```
-
-## prompt.md
-
-```md
-[The provided prompt]
-```
-
 </output>
 
 Remember to be accurate, concise, and consistent in your analysis and output generation.
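The revised analyzer prompt replaces the filename convention with a directory-name convention and links to the directory instead of a `.md` file. Those rules are mechanical enough to check in code. Below is a minimal sketch, assuming the model's reply wraps its answer in `<output>` tags as the prompt requests. The helpers `to_directory_name`, `markdown_link`, and `extract_output_block` are hypothetical names, not part of this repository, and collapsing repeated underscores and hard-truncating at 50 characters are assumptions layered on top of the prompt's wording.

```python
import re
from typing import Optional


def to_directory_name(topic: str, max_len: int = 50) -> str:
    """Apply the prompt's directory-name convention to a topic string."""
    name = topic.lower()                        # convert to lowercase
    name = re.sub(r"\s+", "_", name.strip())    # replace spaces with underscores
    name = re.sub(r"[^a-z0-9_]", "", name)      # remove special characters
    # Assumption: collapse underscores left behind by removed characters.
    name = re.sub(r"_+", "_", name).strip("_")
    # Assumption: the prompt says "ideally not exceeding 50 characters";
    # here we simply truncate as a fallback.
    return name[:max_len].rstrip("_")


def markdown_link(one_line_description: str, directory_name: str) -> str:
    """Format the link as [One-line description](directory_name)."""
    return f"[{one_line_description}]({directory_name})"


def extract_output_block(reply: str) -> Optional[str]:
    """Return the text between <output> and </output>, or None if absent."""
    start = reply.find("<output>")
    end = reply.find("</output>", start + 1)
    if start == -1 or end == -1:
        return None
    return reply[start + len("<output>"):end].strip()


if __name__ == "__main__":
    name = to_directory_name("AI Prompt Analyzer and Output Generator")
    print(name)  # ai_prompt_analyzer_and_output_generator
    print(markdown_link("Analyzes AI prompts and generates metadata", name))
```

Run on the topic "AI Prompt Analyzer and Output Generator", the helper yields `ai_prompt_analyzer_and_output_generator`, the same directory name that appears in `ANALYZER_PROMPT_PATH` in the script below.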
diff --git a/.github/scripts/generate_metadata.py b/.github/scripts/generate_metadata.py
@@ -0,0 +1,227 @@
+import hashlib
+import logging
+import os
+import shutil
+import yaml
+from anthropic import Anthropic
+
+# Set up logging
+logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+logger = logging.getLogger(__name__)
+
+# Path to the analyzer prompt file
+ANALYZER_PROMPT_PATH = '.github/prompts/ai_prompt_analyzer_and_output_generator/prompt.md'
+
+def load_analyzer_prompt():
+    """Load the content of the analyzer prompt file."""
+    logger.info(f"Loading analyzer prompt from {ANALYZER_PROMPT_PATH}")
+    with open(ANALYZER_PROMPT_PATH, 'r') as f:
+        content = f.read()
+    logger.info(f"Analyzer prompt loaded, length: {len(content)} characters")
+    return content
+
+def generate_metadata(prompt_content):
+    """Generate metadata for a given prompt content using the Anthropic API."""
+    logger.info("Starting metadata generation")
+
+    # Check for the presence of the API key
+    api_key = os.environ.get("ANTHROPIC_API_KEY")
+    if not api_key:
+        logger.error("ANTHROPIC_API_KEY is not set in the environment.")
+        raise EnvironmentError("ANTHROPIC_API_KEY is not set in the environment.")
+    else:
+        logger.info("ANTHROPIC_API_KEY is set.")
+
+    # Initialize the Anthropic client
+    client = Anthropic(api_key=api_key)
+    logger.info("Anthropic client initialized")
+
+    # Load the analyzer prompt
+    analyzer_prompt = load_analyzer_prompt()
+
+    # Create a message using the Anthropic API
+    logger.info("Sending request to Anthropic API")
+    message = client.messages.create(
+        model="claude-3-5-sonnet-20240620",
+        max_tokens=1000,
+        messages=[
+            {
+                "role": "user",
+                "content": analyzer_prompt.replace("{{PROMPT}}", prompt_content)
+            }
+        ]
+    )
+    logger.info("Received response from Anthropic API")
+
+    # Log the structure of the response
+    logger.info(f"Response structure: {type(message)}")
+    logger.info(f"Content structure: {type(message.content)}")
+
+    # Extract the YAML content from the AI response
+    content = message.content[0].text if isinstance(message.content, list) else message.content
+    logger.info(f"Extracted content: {content[:100]}...") # Log first 100 characters
+
+    output_start = content.find("<output>")
+    output_end = content.find("</output>")
+    if output_start != -1 and output_end != -1:
+        yaml_content = content[output_start + 8:output_end].strip()
+        logger.info(f"Extracted YAML content: {yaml_content[:100]}...") # Log first 100 characters
+        metadata = yaml.safe_load(yaml_content)
+        logger.info("YAML content parsed successfully")
+    else:
+        logger.error("Could not find metadata output in AI response")
+        raise ValueError("Could not find metadata output in AI response")
+
+    logger.info("Metadata generation completed successfully")
+    return metadata
+
+def should_update_metadata(prompt_file, metadata_file):
+    """Check if metadata should be updated based on content hash."""
+    # Generate hash of the prompt file content
+    with open(prompt_file, 'rb') as f:
+        prompt_content = f.read()
+    prompt_hash = hashlib.md5(prompt_content).hexdigest()
+
+    # If metadata file doesn't exist, update is needed
+    if not os.path.exists(metadata_file):
+        logger.info(f"Metadata file {metadata_file} does not exist. Update needed.")
+        return True, prompt_hash
+
+    # Read the stored hash from metadata file
+    with open(metadata_file, 'r') as f:
+        metadata_content = f.read()
+
+    # Extract the stored hash
+    stored_hash_line = next((line for line in metadata_content.split('\n') if line.startswith('content_hash:')), None)
+
+    if stored_hash_line:
+        stored_hash = stored_hash_line.split(':', 1)[1].strip()
+
+        # Compare the hashes
+        if prompt_hash != stored_hash:
+            logger.info(f"Content hash mismatch for {prompt_file}. Update needed.")
+            return True, prompt_hash
+    else:
+        # If no hash found in metadata, update is needed
+        logger.info(f"No content hash found in {metadata_file}. Update needed.")
+        return True, prompt_hash
+
+    logger.info(f"Content hash match for {prompt_file}. No update needed.")
+    return False, prompt_hash
+
+def update_metadata_hash(metadata_file, new_hash):
+    """Update or append the content hash in the metadata file."""
+    if os.path.exists(metadata_file):
+        with open(metadata_file, 'r') as f:
+            lines = f.readlines()
+
+        # Update or add the hash line
+        hash_updated = False
+        for i, line in enumerate(lines):
+            if line.startswith('content_hash:'):
+                lines[i] = f'content_hash: {new_hash}\n'
+                hash_updated = True
+                break
+
+        if not hash_updated:
+            lines.append(f'content_hash: {new_hash}\n')
+
+        with open(metadata_file, 'w') as f:
+            f.writelines(lines)
+        logger.info(f"Content hash updated in {metadata_file}")
+    else:
+        # If metadata file doesn't exist, create it with the hash
+        with open(metadata_file, 'w') as f:
+            f.write(f'content_hash: {new_hash}\n')
+        logger.info(f"New metadata file created with content hash: {metadata_file}")
+
+def update_prompt_metadata():
+    """Update metadata for all prompts in the 'prompts' directory."""
+    logger.info("Starting update_prompt_metadata process")
+    prompts_dir = 'prompts'
+
+    # Process the main prompt.md file in the prompts directory
+    main_prompt_file = os.path.join(prompts_dir, 'prompt.md')
+    if os.path.exists(main_prompt_file):
+        logger.info("Processing main prompt.md file")
+        with open(main_prompt_file, 'rb') as f:
+            prompt_content = f.read()
+        metadata = generate_metadata(prompt_content.decode('utf-8'))
+        new_dir_name = metadata['directory']
+        new_dir_path = os.path.join(prompts_dir, new_dir_name)
+
+        # Create new directory and move the prompt file
+        logger.info(f"Creating new directory: {new_dir_path}")
+        os.makedirs(new_dir_path, exist_ok=True)
+        new_prompt_file = os.path.join(new_dir_path, 'prompt.md')
+        logger.info(f"Moving {main_prompt_file} to {new_prompt_file}")
+        shutil.move(main_prompt_file, new_prompt_file)
+
+        # Save metadata
+        metadata_path = os.path.join(new_dir_path, 'metadata.yml')
+        logger.info(f"Saving metadata to {metadata_path}")
+        with open(metadata_path, 'w') as f:
+            yaml.dump(metadata, f, sort_keys=False)
+
+        # Update content hash
+        new_hash = hashlib.md5(prompt_content).hexdigest()
+        update_metadata_hash(metadata_path, new_hash)
+
+    # Process subdirectories
+    for item in os.listdir(prompts_dir):
+        item_path = os.path.join(prompts_dir, item)
+
+        if os.path.isdir(item_path):
+            logger.info(f"Processing directory: {item}")
+            prompt_file = os.path.join(item_path, 'prompt.md')
+            metadata_file = os.path.join(item_path, 'metadata.yml')
+
+            if os.path.exists(prompt_file):
+                should_update, new_hash = should_update_metadata(prompt_file, metadata_file)
+                if should_update:
+                    logger.info(f"Updating metadata for {item}")
+                    with open(prompt_file, 'r') as f:
+                        prompt_content = f.read()
+
+                    metadata = generate_metadata(prompt_content)
+                    new_dir_name = metadata['directory']
+
+                    # Rename directory if necessary
+                    if new_dir_name != item:
+                        new_dir_path = os.path.join(prompts_dir, new_dir_name)
+                        logger.info(f"Renaming directory from {item} to {new_dir_name}")
+                        if os.path.exists(new_dir_path):
+                            logger.warning(f"Directory {new_dir_name} already exists. Updating contents.")
+                            for file in os.listdir(item_path):
+                                src = os.path.join(item_path, file)
+                                dst = os.path.join(new_dir_path, file)
+                                if os.path.isfile(src):
+                                    shutil.copy2(src, dst)
+                            shutil.rmtree(item_path)
+                        else:
+                            os.rename(item_path, new_dir_path)
+                        item_path = new_dir_path  # Update item_path for the new location
+
+                    # Save updated metadata
+                    metadata_path = os.path.join(item_path, 'metadata.yml')
+                    logger.info(f"Saving updated metadata to {metadata_path}")
+                    with open(metadata_path, 'w') as f:
+                        yaml.dump(metadata, f, sort_keys=False)
+
+                    # Update content hash
+                    update_metadata_hash(metadata_path, new_hash)
+                else:
+                    logger.info(f"Metadata for {item} is up to date")
+            else:
+                logger.warning(f"No prompt.md file found in {item_path}")
+
+    logger.info("update_prompt_metadata process completed")
+
+if __name__ == '__main__':
+    logger.info("Script started")
+    try:
+        update_prompt_metadata()
+        logger.info("Script completed successfully")
+    except Exception as e:
+        logger.exception(f"An error occurred: {str(e)}")
+        raise
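`should_update_metadata` and `update_metadata_hash` implement a content-hash gate: the Anthropic API is called only when the MD5 digest of `prompt.md` differs from the `content_hash:` line stored in `metadata.yml`. The sketch below restates that logic as pure functions over strings so it can be exercised without touching the filesystem or the API. The names `stored_hash`, `needs_update`, and `upsert_content_hash` are hypothetical, and MD5 is used here, as in the script, only for change detection, not for security.

```python
import hashlib
from typing import Optional, Tuple

HASH_PREFIX = "content_hash:"


def stored_hash(metadata_text: Optional[str]) -> Optional[str]:
    """Return the value of the first column-0 'content_hash:' line, if any."""
    if metadata_text is None:  # metadata.yml does not exist yet
        return None
    for line in metadata_text.splitlines():
        if line.startswith(HASH_PREFIX):
            return line.split(":", 1)[1].strip()
    return None


def needs_update(prompt_bytes: bytes, metadata_text: Optional[str]) -> Tuple[bool, str]:
    """Mirror should_update_metadata: (update needed?, digest of the prompt)."""
    digest = hashlib.md5(prompt_bytes).hexdigest()
    # A missing file, a missing hash line, or a different hash all trigger regeneration.
    return stored_hash(metadata_text) != digest, digest


def upsert_content_hash(metadata_text: Optional[str], new_hash: str) -> str:
    """Mirror update_metadata_hash: replace the hash line, or append one."""
    lines = (metadata_text or "").splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith(HASH_PREFIX):
            lines[i] = f"{HASH_PREFIX} {new_hash}\n"
            return "".join(lines)
    # Small addition over the script: make sure the appended line starts on its own row.
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"
    lines.append(f"{HASH_PREFIX} {new_hash}\n")
    return "".join(lines)


if __name__ == "__main__":
    prompt = b"You are a helpful assistant."
    changed, digest = needs_update(prompt, None)        # no metadata yet
    meta = upsert_content_hash("title: Example\n", digest)
    print(changed, needs_update(prompt, meta)[0])       # True False
```

Because the check is a column-0 prefix match, it only sees a top-level `content_hash` key, which is where `update_metadata_hash` writes it after `yaml.dump` has serialized the rest of the metadata.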
diff --git a/.github/scripts/requirements.txt b/.github/scripts/requirements.txt
@@ -1,2 +1,3 @@
 pyyaml==6.0.2
-jinja2==3.1.4
+jinja2==3.1.4
+anthropic==0.34.2