
ldos-project/C3

Latest commit: ce0ec13 · Feb 20, 2025 (1 commit)

C3 Distributed

This repository contains the codebase for the distributed version of C3, built on Orca.

  1. Create a CloudLab experiment with 17 nodes (16 actor nodes and 1 learner node) using the orca profile, which provides the linux-learner image preinstalled once the nodes start.
  2. Run ./cloudlab/config.sh from YOUR local machine to set up the nodes.
  3. On CloudLab node0, run ./cloudlab/setup_params.sh to generate rl-module/params_distributed.json.
  4. Still on node0, start the training job with v9_multi_train.sh. Run it inside a tmux session so it survives disconnects.
  5. To monitor training, ssh into the actor nodes and check ~/actor_logs/, which holds the stdout and stderr of each actor. The training log is in ~/ConstrainedOrca/rl-module/training_log.
  6. Once training finishes, run scripts/collate_train_files.sh to collect all outputs into one folder. Back up the generated folder for future use.
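Checking the actor logs in step 5 means visiting 16 nodes by hand. The sketch below only prints one ssh command per actor node; it does not run them. It assumes, without confirmation from this README, that actor nodes are reachable by the hypothetical short hostnames node1 through node16 and that node0 hosts the learner (steps 3 and 4 run there). The helper names are hypothetical.

```python
"""Print ssh commands that tail every actor log on the C3 CloudLab cluster.

Assumptions (not from the repository): actor hosts are named node1..node16
and node0 is the learner. Adjust actor_nodes() for your experiment.
"""
import shlex
from typing import List

LOG_DIR = "~/actor_logs"  # per-actor stdout/stderr directory from step 5


def actor_nodes(total_nodes: int = 17, learner: str = "node0") -> List[str]:
    """Return hypothetical actor hostnames: every nodeN except the learner."""
    nodes = [f"node{i}" for i in range(total_nodes)]
    return [n for n in nodes if n != learner]


def tail_log_commands(nodes: List[str], lines: int = 20) -> List[List[str]]:
    """Build one ssh argv per node that tails every file in LOG_DIR.

    The remote command is a single string, so ~ and the * glob are
    expanded by the shell on the actor node, not on the local machine.
    """
    remote = f"tail -n {lines} {LOG_DIR}/*"
    return [["ssh", node, remote] for node in nodes]


if __name__ == "__main__":
    for argv in tail_log_commands(actor_nodes()):
        # shlex.join quotes the remote command so the line can be pasted into a shell
        print(shlex.join(argv))
```

Printing instead of executing keeps the sketch side-effect free; pipe the output into a shell, or wrap each argv in subprocess.run, once the hostnames match your experiment.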

Eval

  1. Move the desired checkpoint into ~/ConstrainedOrca/rl-module/train_dir/seed0/. Running ls inside seed0 should list exactly one directory, with a name similar to learner0-v9_actorNum256_multi_lambda0.0_ksymbolic5_k1_raw-sym_threshold25_seed0/.
  2. Run ./scripts/eval_orca.sh <model_name> <trace_dir> <results_dir> <start_run> <end_run> <constraints_id>.
  3. <trace_dir> is /proj/verifiedmlsys-PG0/ConstrainedOrca/sage_traces/traces for SAGE traces.
  4. <results_dir> is /proj/verifiedmlsys-PG0/sigcomm_results/new_result_dir/constraint_id_*.
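The eval steps can be sanity-checked before launching a run. This is a minimal sketch with hypothetical helper names: single_checkpoint() enforces the "exactly one directory in seed0" rule from step 1, checkpoint_fields() splits a checkpoint name into key/value tokens (a convention inferred only from the single example name above, not from the training code), and eval_command() assembles the eval_orca.sh arguments in the order given in step 2, defaulting <trace_dir> to the SAGE trace path.

```python
"""Sanity-check the eval setup and build the eval_orca.sh argument list.

Hypothetical helpers; the token parsing is inferred from one example
checkpoint name in the README and may not cover every naming variant.
"""
import os
import re
from typing import Dict, List, Union

SEED_DIR = os.path.expanduser("~/ConstrainedOrca/rl-module/train_dir/seed0")
SAGE_TRACE_DIR = "/proj/verifiedmlsys-PG0/ConstrainedOrca/sage_traces/traces"


def single_checkpoint(seed_dir: str = SEED_DIR) -> str:
    """Return the only subdirectory of seed_dir; raise if there is not exactly one."""
    subdirs = sorted(
        d for d in os.listdir(seed_dir) if os.path.isdir(os.path.join(seed_dir, d))
    )
    if len(subdirs) != 1:
        raise RuntimeError(f"expected one checkpoint in {seed_dir}, found {subdirs}")
    return subdirs[0]


def checkpoint_fields(name: str) -> Dict[str, str]:
    """Map '_'-separated tokens like 'lambda0.0' to {'lambda': '0.0'}.

    Tokens that are not letters followed by a number (e.g. 'multi',
    'raw-sym', or the leading 'learner0-v9') map to an empty string.
    """
    fields: Dict[str, str] = {}
    for token in name.rstrip("/").split("_"):
        match = re.fullmatch(r"([A-Za-z][A-Za-z-]*?)(\d+(?:\.\d+)?)", token)
        if match:
            fields[match.group(1)] = match.group(2)
        else:
            fields[token] = ""
    return fields


def eval_command(
    model_name: str,
    results_dir: str,
    start_run: Union[int, str],
    end_run: Union[int, str],
    constraints_id: Union[int, str],
    trace_dir: str = SAGE_TRACE_DIR,
) -> List[str]:
    """Return the argv for ./scripts/eval_orca.sh in the README's argument order."""
    return [
        "./scripts/eval_orca.sh",
        model_name,
        trace_dir,
        results_dir,
        str(start_run),
        str(end_run),
        str(constraints_id),
    ]
```

Passing single_checkpoint() as model_name assumes that <model_name> is the checkpoint directory name, which this README does not state explicitly; verify that against eval_orca.sh before relying on it.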

Plotting and analysis

  1. Run cd scripts && ./process_down_file.sh to trim the result files before plotting.
  2. Use ./scripts/plots/plot_thr.py to generate the motivation figures.
  3. Use ./scripts/plots/plot_thr_delay.py to generate throughput vs. delay plots.

Other useful info

  1. baseline_v4 is the baseline used for the NSDI submission.

License

MIT (see the LICENSE and copyright files)