mirror of https://github.com/vale981/ray
synced 2025-03-09 04:46:38 -04:00
* add marvil policy graph * fix typo * add offline optimizer and enable running marwil * fix loss function * add maintaining the moving average of advantage norm * use sync replay optimizer for unifying * remove offline optimizer and use sync replay optimizer * format by yapf * add imitation learning objective * fix according to eric's review * format by yapf * revise * add test data * marwil |
||
---|---|---|
multi_node_tests | ||
multi_node_docker_test.py | ||
run_asv.sh | ||
run_multi_node_tests.sh | ||
run_rllib_asv.sh |