compute_host is a process that communicates with the server over SocketIO. Starting host_agent registers the machine as a worker, and subsequent jobs can then be assigned to and executed on this worker.
Install the dependencies and set your API key:

```bash
pip install -r requirements.txt
export CLUSTROAI_API_KEY=<your_api_key>
```
Run host_agent; it creates a temporary worker and communicates with the server. There are three optional arguments (a sketch of how they might be parsed follows the Docker commands below):

- `--server_url` (default: `http://api.clustro.ai:5000`)
- `--local_service_port` (default: `8000`)
- `--gpu_limit` (default: `999`)

```bash
python host_agent.py
```
Alternatively, build and run it with Docker:

```bash
docker build -t host_agent:v1 .
docker run -d -e CLUSTROAI_API_KEY=<your_api_key> host_agent:v1
```
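The flag handling itself lives in host_agent.py; as a rough sketch of how the three optional arguments might be parsed (assuming argparse, with the defaults listed above):

```python
import argparse
import os

# Sketch only: the real host_agent.py may differ in structure and naming.
parser = argparse.ArgumentParser(description="ClustroAI host agent")
parser.add_argument("--server_url", default="http://api.clustro.ai:5000",
                    help="URL of the ClustroAI server")
parser.add_argument("--local_service_port", type=int, default=8000,
                    help="Port for the locally started model service")
parser.add_argument("--gpu_limit", type=int, default=999,
                    help="Maximum number of GPUs this host will expose")
args = parser.parse_args()

# The API key comes from the environment, as in the export step above.
api_key = os.environ["CLUSTROAI_API_KEY"]
```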
Under server, there are three sub-projects: backend, frontend, and landingPage.

- backend is the project backend, built primarily with Flask, PostgreSQL, and Redis.
- frontend is the current frontend project, built primarily with React, umijs/max, and Ant Design.
- landingPage is the old frontend project; it will be phased out over time and can be ignored.
You need a PostgreSQL database and Redis. It is recommended to run them locally with Docker:

```bash
docker pull postgres
docker run -d --name=postgres -e POSTGRES_PASSWORD=<your_password> -p 5432:5432 postgres
docker pull redis
docker run -d --name=redis -p 6379:6379 redis
```
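If you want to confirm that both containers accept connections before starting the backend, a quick check like the following works (it assumes the psycopg2-binary and redis Python packages, which are not part of the project itself):

```python
import psycopg2
import redis

# Sanity check that the local Postgres container accepts connections.
conn = psycopg2.connect(host="localhost", port=5432,
                        user="postgres", password="<your_password>")
conn.close()
print("PostgreSQL is reachable")

# Sanity check that the local Redis container responds to PING.
r = redis.Redis(host="localhost", port=6379)
r.ping()
print("Redis is reachable")
```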
Set the database environment variables and start the backend:

```bash
export DB_USER=postgres
export DB_PASSWORD=<your_password>
export DB_HOST=localhost
python run.py
```
Alternatively, build and run the backend with Docker (each environment variable needs its own `-e` flag):

```bash
docker build -t backend:v1 .
docker run -d -e DB_USER=postgres -e DB_PASSWORD=<your_password> -e DB_HOST=localhost backend:v1
```
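How the backend consumes DB_USER, DB_PASSWORD, and DB_HOST is defined in its own configuration; the snippet below is only a hedged illustration of how a Flask/SQLAlchemy-style setup might assemble the connection settings from these variables (the URI format, database name, and Redis URL are assumptions, not the project's actual config):

```python
import os

# Assumed sketch: build a Postgres URI from the environment variables used above.
DB_USER = os.environ.get("DB_USER", "postgres")
DB_PASSWORD = os.environ["DB_PASSWORD"]
DB_HOST = os.environ.get("DB_HOST", "localhost")

SQLALCHEMY_DATABASE_URI = (
    f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:5432/postgres"  # db name assumed
)
REDIS_URL = "redis://localhost:6379/0"  # assumed default local Redis
```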
Configure the frontend environment variables:

```bash
cp env_template .env
```

In `.env`, set REACT_APP_BACKEND_SERVER_URL to the URL where your backend is running.
Install the dependencies and start the development server:

```bash
npm install
npm run start
```

Alternatively, start the frontend with Docker Compose:

```bash
docker-compose up -d frontend
```
Visit http://localhost:3000 to view the page.
```bash
# Run all test cases
DB_USER=postgres DB_PASSWORD=<your_password> pytest

# Run a single test case
DB_USER=postgres DB_PASSWORD=<your_password> pytest tests/test_worker_routes.py::test_update_worker
```
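For orientation, a case such as test_update_worker typically drives a route through Flask's test client. The sketch below is illustrative only; the `client` fixture, route path, and payload are assumptions rather than the project's actual test code:

```python
# Illustrative only: the fixture, route, and payload are assumed, not taken from the repo.
def test_update_worker(client):
    resp = client.put(
        "/workers/some-worker-id",
        json={"name": "renamed-worker"},
    )
    assert resp.status_code == 200
```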
When the backend starts, it launches a SocketIO server. When worker_agent starts, it connects to the backend's SocketIO server and a namespace is created for communication, which allows inference jobs to be allocated to this worker. The specific flow is as follows (a minimal client-side sketch follows the list):
- worker_agent connects to the SocketIO server and initiates communication.
- The SocketIO server sends a request_worker_id request.
- worker_agent sends a provide_worker_id request.
- The SocketIO server sends a worker_session_established request.
- worker_agent then initiates a request to start communication.
- The SocketIO server sends prepare_model. At this point, worker_agent begins cloning the code from the git repository and starts the Flask app.
- An inference job is assigned to worker_agent.
- worker_agent runs execute_invocation and sends the results back to the SocketIO server.
- job_matching automatically assigns inference jobs to available idle workers.
- auto_scaler periodically scans inference jobs, ensuring that workers that have completed their jobs return to an idle state.
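Putting the handshake together, the worker side can be pictured with the python-socketio client. The sketch below follows the event names from the list above; the worker id value, the result event name, and the placeholder inference call are assumptions for illustration, not the actual worker_agent code:

```python
import socketio

sio = socketio.Client()
WORKER_ID = "temporary-worker-id"  # assumed; the real agent obtains this at registration


@sio.on("request_worker_id")
def on_request_worker_id(data=None):
    # Answer the server's identity request with our worker id.
    sio.emit("provide_worker_id", {"worker_id": WORKER_ID})


@sio.on("worker_session_established")
def on_session_established(data=None):
    print("Session established, waiting for work")


@sio.on("prepare_model")
def on_prepare_model(data=None):
    # In the real agent this clones the model's git repository and
    # starts the local Flask app that serves inference requests.
    print("Preparing model: cloning repo and starting the local Flask app")


@sio.on("execute_invocation")
def on_execute_invocation(data=None):
    # Run the inference locally and report the result back.
    result = {"output": "..."}  # placeholder for the local inference call
    sio.emit("invocation_result", result)  # result event name is assumed


sio.connect("http://api.clustro.ai:5000")  # matches the default --server_url above
sio.wait()
```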