- 1 single-GPU machine: for RM service only
- Multi-node, multi-GPU systems: for RL training (in our experiments, we used 3 nodes, each equipped with 8 A100 GPUs)
...