🚀 DataHorse is an open-source tool and Python library that simplifies data science for everyone. It lets users interact with data in plain English 📝, with no technical skills or tutorials 🎥 required. With DataHorse, you can create graphs 📊, modify data 🛠️, and even build machine learning models 🤖 to answer questions or make predictions. It is designed to help businesses and individuals 💼, whatever their technical background, understand their data quickly and make data-driven decisions. ✨
pip install datahorse
The examples here use the Iris flower dataset to demonstrate how DataHorse simplifies the analysis of real-world data.
Setup and usage examples are available in this Google Colab notebook.
import datahorse
df = datahorse.read('https://raw.githubusercontent.com/plotly/datasets/master/iris-data.csv')
df = df.chat('convert species names to numeric codes')
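For orientation, the request above corresponds roughly to the following plain pandas operation. This is a minimal sketch, not the code DataHorse generates: the small inline DataFrame stands in for the Iris CSV, and the column names `species` and `species_code` are assumptions made for illustration.

```python
import pandas as pd

# Small stand-in for the Iris data; the column name "species" is an assumption.
df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3, 4.9],
    "species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"],
})

# pd.factorize assigns integer codes 0, 1, 2, ... in order of first appearance
# and returns the unique labels so the mapping can be reversed later.
codes, labels = pd.factorize(df["species"])
df["species_code"] = codes

print(df)
print(list(labels))
```

Keeping `labels` around makes it possible to translate a numeric code back into the species name after modelling.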
seed=int: Ensures that the generated function is reproducible across different runs.
cache_req=True: Enables caching for the API request, ensuring that identical prompts won't trigger unnecessary API calls.
df = df.chat('convert species names to numeric codes', seed=42, cache_req=True)  # seed can be any integer
df.chat('train a classification model and save the model')
datahorse.test("path of the saved model",[["list of testing features"]])
git clone https://github.com/DeDolphins/DataHorse.git
cd DataHorseUI
pip install -r requirements.txt
streamlit run app.py
⭐️ Star DataHorse to increase our visibility
Found a bug or have an improvement in mind? Fantastic!
Got a solution ready? That's even better!
Ready to share it with us? We're all ears!
Start at the contributing guide!