The Startup Success Forecasting Framework (SSFF) is a pioneering approach designed to automate the evaluation of startup success potential. Leveraging a blend of traditional machine learning models, Large Language Models (LLMs), and real-time market data analysis, SSFF aims to transform the landscape of venture capital investment by providing deep, actionable insights into the viability of early-stage startups.
Link to Paper: https://arxiv.org/abs/2405.19456
```
project_root/
│
├── .streamlit/               # Configuration for Streamlit app
├── agents/                   # Core agent logic (founder, market, product, etc.)
├── algorithms/               # ML algorithms, embeddings, similarity calculations
├── data/                     # Raw and processed datasets (e.g., Merged_Successful_V2.csv)
│   ├── Successful/
│   └── Unsuccessful/
├── EDA/                      # Exploratory Data Analysis scripts and notebooks
├── experiments/              # Experiment configurations, logs, and raw results
│   ├── charts/
│   └── results/
├── models/                   # Trained model artifacts (e.g., .keras, .joblib)
├── plots/                    # Generated plots and visualizations
├── rebuttal_supplements/     # Supplementary materials, including statistical summaries
│   └── M1_Statistical_Summary/  # Detailed descriptive statistics and CSV outputs
├── utils/                    # Utility scripts, configuration, API wrappers
│
├── app.py                    # Main Streamlit application for web interface
├── main.py                   # Alternative main execution script (if applicable)
├── overallPipeline.py        # Core pipeline orchestration script
├── baseline_framework.py     # Scripts for baseline model implementations
├── ssff_framework.py         # Core SSFF framework logic
│
├── requirements.txt          # Project dependencies
├── .env                      # Environment variables (API keys - create from .env.example if provided)
├── README.md                 # This file
└── SETUP_GUIDE.md            # Detailed setup instructions
```
To set up the environment for this project, follow these steps:

1. Ensure you have Python 3.7+ installed on your system.

2. Clone the repository:

   ```bash
   git clone https://github.com/your-username/Startup-Success-Forecasting-Framework.git
   cd Startup-Success-Forecasting-Framework
   ```

3. Create a virtual environment:

   ```bash
   python -m venv myenv
   ```

4. Activate the virtual environment:

   - On Windows:

     ```bash
     myenv\Scripts\activate
     ```

   - On macOS and Linux:

     ```bash
     source myenv/bin/activate
     ```

5. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

6. Create a `.env` file in the project root (you can copy `.env.example` if it exists) and add your API keys:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   SERPAPI_API_KEY=your_serpapi_api_key_here
   ```

7. To deactivate the virtual environment when you're done:

   ```bash
   deactivate
   ```
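If you want the pipeline to fail fast on a missing key, the `.env` file from step 6 can be loaded and checked at startup. A minimal sketch, assuming plain `KEY=value` lines; the project may instead rely on a library such as `python-dotenv`, so treat `load_env` as an illustrative stand-in:

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=value lines from a .env file into os.environ.

    Blank lines and '#' comments are skipped. Existing environment
    variables are not overwritten (setdefault), mirroring the usual
    precedence of real shells over dotenv files.
    """
    loaded = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip()
            os.environ.setdefault(key.strip(), value.strip())
    return loaded

def require_keys(*names):
    """Raise early if any required API key is absent."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"missing environment variables: {missing}")
```

Calling `load_env()` followed by `require_keys("OPENAI_API_KEY", "SERPAPI_API_KEY")` before the pipeline runs surfaces configuration problems immediately instead of mid-analysis.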
- Data Exploration Files: Scripts and notebooks for exploring and understanding the dataset used in model training (see `EDA/` and `rebuttal_supplements/M1_Statistical_Summary/`).
- Model Training Files: Code for training the SSFF's underlying machine learning models, including random forest classifiers and neural networks (see `algorithms/` and specific model scripts).
- Pipeline Building Files: Implementation of the SSFF pipeline, integrating LLM analysis, feature extraction, and prediction models for comprehensive startup evaluation (see `overallPipeline.py`, `ssff_framework.py`, and `agents/`).
- Experiment Results: Outputs from various experimental runs can be found in `experiments/results/` and summarized in `rebuttal_supplements/`.
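The similarity calculations in `algorithms/` operate on embeddings, typically comparing a startup's embedding against those of known successful and unsuccessful companies. A minimal cosine-similarity sketch; the actual function names and vector sources in the repository may differ:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate zero vector: no meaningful direction
    return dot / (norm_a * norm_b)

def most_similar(query, candidates):
    """Return (index, score) of the candidate embedding closest to query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

In practice the embeddings would come from an embedding model (e.g., via the OpenAI API) and the candidate set from the `data/Successful/` and `data/Unsuccessful/` corpora.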
*(Figure: SSFF architecture diagram; original image omitted.)*

A web interface has been developed and is easy to interact with:

*(Figure: Web interface screenshot; original image omitted.)*

- Launch a Demo Interface
- Upgrade to Version 2
To execute the main pipeline and analyze a startup, run the Streamlit web application:

```bash
streamlit run app.py
```

Alternatively, use the main pipeline script (consult `main.py` or `overallPipeline.py` for direct script execution details):

```bash
python overallPipeline.py  # Example, verify correct script and arguments
```
- Simple Mode: Provides a quick evaluation based on predefined criteria.
- Advanced Mode: Offers an in-depth analysis incorporating external market data, founder-level segmentation, and custom LLM prompts for nuanced insights.
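The two modes can be thought of as a dispatch on analysis depth. A hypothetical sketch only: the real entry points live in `ssff_framework.py` and `agents/`, and the function names, scoring rule, and stubbed fields below are illustrative, not the project's API:

```python
def score_criteria(description):
    """Toy predefined criterion: count traction-related keywords."""
    keywords = ("revenue", "users", "growth", "funding")
    return sum(word in description.lower() for word in keywords)

def analyze_startup(description, mode="simple"):
    """Route a startup description to the requested analysis depth.

    'simple' scores against predefined criteria only; 'advanced' would
    additionally pull external market data and founder segmentation
    (stubbed here with placeholder strings).
    """
    if mode not in ("simple", "advanced"):
        raise ValueError(f"unknown mode: {mode!r}")

    report = {"mode": mode, "criteria_score": score_criteria(description)}
    if mode == "advanced":
        # In the real pipeline these would invoke the market agent and
        # the founder-segmentation model rather than return placeholders.
        report["market_context"] = "stub: real-time market data"
        report["founder_level"] = "stub: founder segmentation"
    return report
```

The design point is that advanced mode strictly extends simple mode's report, so downstream consumers can read the same base fields regardless of mode.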
For a detailed understanding of the methodologies and technologies underpinning SSFF, refer to our accompanying paper titled "An Automated Startup Evaluation Pipeline: Startup Success Forecasting Framework (SSFF)". This paper discusses the challenges of early-stage startup evaluation and presents our novel framework as a solution that combines machine learning, natural language processing, and real-time data analysis.
- Startup Evaluation Pipeline: Exploration of traditional and emerging approaches to startup evaluation, emphasizing the integration of qualitative assessments.
- LLM Agent and Prompting Techniques: Insight into how Large Language Models and sophisticated prompting techniques can enhance the evaluation process.
- Founder Level Segmentation: A novel approach to categorizing founders based on their experience and potential impact on startup success.
- External Knowledge Block: Utilization of real-time market data to enrich the analysis and provide a current view of market conditions and trends.
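To make the founder-segmentation idea concrete, a purely illustrative rule-based sketch is shown below. The thresholds and signals here are invented for illustration; the framework's actual segmentation is derived from richer, LLM-assessed signals described in the paper, not these rules:

```python
def segment_founder(prior_startups, exits, years_experience):
    """Assign a coarse founder level from simple career signals.

    Levels run from L1 (early-career, first-time founder) to L5
    (serial founder with successful exits). Thresholds are
    illustrative placeholders, not the paper's criteria.
    """
    if exits >= 1 and prior_startups >= 2:
        return "L5"  # serial founder with at least one exit
    if exits >= 1:
        return "L4"  # founder with a successful exit
    if prior_startups >= 1:
        return "L3"  # repeat founder, no exit yet
    if years_experience >= 5:
        return "L2"  # experienced operator, first-time founder
    return "L1"      # early-career, first-time founder
```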
The SSFF represents a significant step forward in automating and enhancing the accuracy of startup success evaluations. By combining cutting-edge AI technologies with a deep understanding of the startup ecosystem, SSFF empowers investors, entrepreneurs, and researchers with a powerful tool for making informed decisions.
This work was supported by contributions from project members and a partnering firm. Their invaluable insights and expertise have been instrumental in the development of the SSFF.