3783 commits

Author SHA1 Message Date
Yunzhi Zhang
816b84808d [Dashboard] Display memory usage of nodes and core workers (#6671) 2020-01-03 20:12:42 -08:00
micafan
fd379934b6 rm DirectActorTable (#6684) 2020-01-03 16:28:26 -08:00
Harrison Feng
ca876c1ecb Make sure dashboard link can be clicked directly. (#6683) 2020-01-03 16:17:16 -08:00
Robert Nishihara
80e77f7025 Revert accidental changes to test file. (#6681) 2020-01-03 14:23:45 -08:00
fangfengbin
b8669bc06c Add node resources methods to gcs server node info handler (#6685) 2020-01-03 20:06:49 +08:00
Ujval Misra
5b40408678 [tune] Remove py2.7-specific code (#6665)
* Remove backwards compatibility py2.7 code.

* Use exists_ok=True in ray

* nit

* nit

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-03 01:03:13 -08:00
micafan
970cd78701 [GCS] refactor the GCS Client Dynamic Resource Interface (#6266) 2020-01-03 14:07:37 +08:00
Ujval Misra
ca651af1d7 [tune] Async restores and S3/GCP-capable trial FT (#6376)
* Initial commit for asynchronous save/restore

* Set stage for cloud checkpointable trainable.

* Refactor log_sync and sync_client.

* Add durable trainable impl.

* Support delete in cmd based client

* Fix some tests and such

* Cleanup, comments.

* Use upload_dir instead.

* Revert files belonging to other PR in split.

* Pass upload_dir into trainable init.

* Pickle checkpoint at driver, more robust checkpoint_dir discovery.

* Cleanup trainable helper functions, fix tests.

* Addressed comments.

* Fix bugs from cluster testing, add parameterized cluster tests.

* Add trainable util test

* package_ref

* pbt_address

* Fix bug after running pbt example (_save returning dir).

* get cluster tests running, other bug fixes.

* raise_errors

* Fix deleter bug, add durable trainable example.

* Fix cluster test bugs.

* filelock

* save/restore bug fixes

* .

* Working cluster tests.

* Lint, revert to tracking memory checkpoints.

* Documentation, cleanup

* fixinitialsync

* fix_one_test

* Fix cluster test bug

* nit

* lint

* Revert tune md change

* Fix basename bug for directories.

* lint

* fix_tests

* nit_fix

* Add __init__ file.

* Move to utils package

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-02 20:40:53 -08:00
Harrison Feng
57061a15cf [docs] configure.rst with --num-cpus (#6678)
--num-cpus -> --num-gpus

Signed-off-by: Harrison Feng <feng.harrison@gmail.com>
2020-01-02 20:33:41 -08:00
Robert Nishihara
92e44a5dc8 Deprecate redis_address argument in favor of address. (#6654) 2020-01-02 20:18:34 -08:00
Jing Ge
d39e76f2ce rename interface and class for task assigner based on suitable pattern. (#6664) 2020-01-03 11:13:36 +08:00
Simon Mo
9fe90cdafc Fix async actor recursion limitation (#6672)
* Do not start threadpool when using async

* Turn function_executor into a generator

* Add new test for high concurrency and bump the default

* Set direct call
2020-01-02 19:45:13 -06:00
Robert Nishihara
39a3459886 Remove (object) from class declarations. (#6658) 2020-01-02 17:42:13 -08:00
Sven
f1b56fa5ee PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch). (#6650)
* Unifying the code for PGTrainer/Policy wrt tf vs torch.
Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch).

* Fix LINT line-len errors.

* Fix LINT errors.

* Fix `tf_pg_policy` imports (formerly: `pg_policy`).

* Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer.
Retire `PGAgent` class (use PGTrainer instead).

* - Move PG test into agents/pg/tests directory.
- All test cases will be located near the classes that are tested and
  then built into the Bazel/Travis test suite.

* Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c
the function is not a tf-specific one.

* Fix remaining import errors for agents/pg/...

* Fix circular dependency in pg imports.

* Add pg tests to Jenkins test suite.
2020-01-02 16:08:03 -08:00
Robert Nishihara
d206445caf Use Travis deploy v2. (#6674) 2020-01-02 16:00:51 -08:00
Yunzhi Zhang
8a0a30b5f0 [Dashboard] display actor status and infeasible tasks (#6652)
* expose actor status and protobuf message of infeasible tasks

* move infeasible tasks into actor tree

* add pytest for displaying infeasible tasks info

* fix base64 decoding

* fix race condition after #6629 merged
2020-01-02 14:27:59 -08:00
Eric Liang
895f2727fb Add experimental parallel iterators API (#6644) 2020-01-02 13:45:26 -08:00
Ion
3dddbef6d9 Release cpu blocked (#6611) 2020-01-02 13:43:25 -08:00
chenk008
3a2a4335b6 Ray operator go.mod file (#6660)
* change .gitignore for go.mod

* change gitignore and add go.mod for ray-operator
2020-01-02 11:55:16 -06:00
fangfengbin
a13781d70e Add actor checkpoint methods to gcs server actor info handler (#6663) 2020-01-02 19:31:54 +08:00
micafan
a7e9d63979 [GCS] Add actor checkpoint related methods to accessor (#6605) 2020-01-02 12:36:52 +08:00
fangfengbin
255aa0796a Add heartbeat methods to gcs server node info handler (#6647) 2020-01-02 12:36:23 +08:00
Robert Nishihara
9baa002069 Remove deprecated global state. (#6655) 2019-12-31 22:40:47 -08:00
chenk008
4150d444a1 ray-operator support bazel build (#6639)
* support bazel build

* add bazel gazelle script in README
2019-12-31 22:28:51 -08:00
Zhijun Fu
91a98d2295 [rpc] refactor GRPC client (#6637)
* refactor RPC client

* remove unused code

* format

* fix

* resolve comments

* format

* update

* fix

* fix python pb build failure

* lint
2019-12-31 22:28:25 -08:00
mehrdadn
f4b29dae9c Perform Bazel install directly in Windows CI (#6653) 2019-12-31 20:48:08 -08:00
Robert Nishihara
480206eef8 Remove some Python 2 compatibility code. (#6624) 2019-12-31 17:14:58 -08:00
Philipp Moritz
ecddaafd94 Add actor table to global state API (#6629) 2019-12-31 15:11:59 -08:00
mehrdadn
a4d64de39a Perform LLVM install directly inside Windows CI (#6588)
* Perform LLVM install directly inside Windows CI

* Pin the LLVM download version

Co-authored-by: GitHub Web Flow <noreply@github.com>
2019-12-31 13:23:19 -08:00
Robert Nishihara
d2c6457832 Remove public facing references to --redis-address. (#6631) 2019-12-31 13:21:53 -08:00
Michael Luo
1cb335487e SAC for Mujoco Environments (#6642) 2019-12-31 00:16:54 -08:00
micafan
cdc1ce4ebf [GCS]Add heartbeat methods to NodeInfoAccessor (#6604) 2019-12-31 14:17:35 +08:00
Yunzhi Zhang
65acb54553 [Dashboard] Logical view backend for dashboard (#6590) 2019-12-30 13:08:08 -08:00
Sven
8b16847c02 Get utils ready for better Agent torch support. (#6561) 2019-12-30 12:27:32 -08:00
Philipp Moritz
735f282494 Use 0.9.0.dev0 as the version tag (#6630) 2019-12-30 10:14:07 -08:00
Richard Liaw
646643a588 [doc] remove redundant PS example (#6634) 2019-12-29 20:54:42 -08:00
Edward Oakes
2a66529fb7 Add multiprocessing.Pool API (#6194) 2019-12-29 21:40:58 -06:00
Eric Liang
e2bc489a18 Port webui nits from original pr that enables it (#6628)
* backport changes

* Update test_webui.py
2019-12-29 19:19:43 -08:00
Mitchell Stern
3e0f07468f Make JSON schema for projects more explicit (#6550) 2019-12-29 16:41:53 -08:00
Qstar
10338fde0c Ray operator: controller code and guide to use (#6501) 2019-12-29 10:14:47 -06:00
Eric Liang
7c1e0e5715 Implement wait_local for wait (#6524) 2019-12-28 17:40:49 -08:00
Eric Liang
677004ee3d Add 'ray stat' command for debugging (#6622)
* wip

* wip

* wip

* iterate

* move

* fix thread safety
2019-12-28 14:40:32 -08:00
Robert Nishihara
92db13023c Fix unused variable compilation error. (#6625) 2019-12-28 12:50:14 -08:00
Eric Liang
022954ac09 [rllib] Tuple action dist tensors not reduced properly in eager mode (#6615) 2019-12-28 09:51:09 -08:00
fangfengbin
8a51efebfb Add gcs server object info handler (#6621) 2019-12-28 22:44:27 +08:00
Robert Nishihara
ff82613b66 Fix test_actor.py test_kill. (#6623) 2019-12-27 22:39:17 -08:00
alindkhare
a76fadb899 [Serve] Adding BackendConfig (#6541) 2019-12-27 23:34:50 -06:00
Robert Nishihara
96f2f8ff10 Stop testing Python 2.7 and building Python 2.7 wheels. (#6601) 2019-12-27 20:47:49 -08:00
Robert Nishihara
8724e5ffd5 Start WebUI by default. (#6493) 2019-12-27 13:49:07 -08:00
Zhijun Fu
088ce2d1e1 Fix hang on actor creation task failure (#6617) 2019-12-27 10:48:17 -08:00