Lixin Wei
859dbad155
Fix estimate_available_memory() in utils.py ( #6302 )
2020-01-08 15:22:47 +08:00
Michał Słapek
aaeb3c44a5
[tune] Add _change_working_directory to RayTrialExecutor ( #6228 ) ( #6320 )
* [tune] Add _switch_working_directory to RayTrialExecutor (#6228 )
* Make _switch_working_directory before warn_if_slow
* Rename _switch_working_directory to _change_working_directory
2020-01-07 01:51:04 -08:00
Robert Nishihara
5e43b25e8c
Document fault tolerance behavior. ( #6698 )
2020-01-06 22:34:06 -08:00
Ujval Misra
20ba7ef647
[tune] Move util to utils package ( #6682 )
* Move util.py to utils
* Fix import
2020-01-06 18:11:02 -08:00
Edward Oakes
2a4d2c6e9e
Basic reference counting & pinning ( #6554 )
2020-01-06 17:30:26 -06:00
Yunzhi Zhang
816b84808d
[Dashboard] Display memory usage of nodes and core workers ( #6671 )
2020-01-03 20:12:42 -08:00
Harrison Feng
ca876c1ecb
Make sure dashboard link can be clicked directly. ( #6683 )
2020-01-03 16:17:16 -08:00
Robert Nishihara
80e77f7025
Revert accidental changes to test file. ( #6681 )
2020-01-03 14:23:45 -08:00
Ujval Misra
5b40408678
[tune] Remove py2.7-specific code ( #6665 )
* Remove backwards compatibility py2.7 code.
* Use exists_ok=True in ray
* nit
* nit
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-03 01:03:13 -08:00
Ujval Misra
ca651af1d7
[tune] Async restores and S3/GCP-capable trial FT ( #6376 )
* Initial commit for asynchronous save/restore
* Set stage for cloud checkpointable trainable.
* Refactor log_sync and sync_client.
* Add durable trainable impl.
* Support delete in cmd based client
* Fix some tests and such
* Cleanup, comments.
* Use upload_dir instead.
* Revert files belonging to other PR in split.
* Pass upload_dir into trainable init.
* Pickle checkpoint at driver, more robust checkpoint_dir discovery.
* Cleanup trainable helper functions, fix tests.
* Addressed comments.
* Fix bugs from cluster testing, add parameterized cluster tests.
* Add trainable util test
* package_ref
* pbt_address
* Fix bug after running pbt example (_save returning dir).
* get cluster tests running, other bug fixes.
* raise_errors
* Fix deleter bug, add durable trainable example.
* Fix cluster test bugs.
* filelock
* save/restore bug fixes
* .
* Working cluster tests.
* Lint, revert to tracking memory checkpoints.
* Documentation, cleanup
* fixinitialsync
* fix_one_test
* Fix cluster test bug
* nit
* lint
* Revert tune md change
* Fix basename bug for directories.
* lint
* fix_tests
* nit_fix
* Add __init__ file.
* Move to utils package
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-02 20:40:53 -08:00
Robert Nishihara
92e44a5dc8
Deprecate redis_address argument in favor of address. ( #6654 )
2020-01-02 20:18:34 -08:00
Simon Mo
9fe90cdafc
Fix async actor recursion limitation ( #6672 )
* Do not start threadpool when using async
* Turn function_executor into a generator
* Add new test for high concurrency and bump the default
* Set direct call
2020-01-02 19:45:13 -06:00
Robert Nishihara
39a3459886
Remove (object) from class declarations. ( #6658 )
2020-01-02 17:42:13 -08:00
Sven
f1b56fa5ee
PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch). ( #6650 )
* Unifying the code for PGTrainer/Policy wrt tf vs torch.
Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch).
* Fix LINT line-len errors.
* Fix LINT errors.
* Fix `tf_pg_policy` imports (formerly: `pg_policy`).
* Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer.
Retire `PGAgent` class (use PGTrainer instead).
* - Move PG test into agents/pg/tests directory.
- All test cases will be located near the classes that are tested and
then built into the Bazel/Travis test suite.
* Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c
the function is not a tf-specific one.
* Fix remaining import errors for agents/pg/...
* Fix circular dependency in pg imports.
* Add pg tests to Jenkins test suite.
2020-01-02 16:08:03 -08:00
Yunzhi Zhang
8a0a30b5f0
[Dashboard] display actor status and infeasible tasks ( #6652 )
* expose actor status and protobuf message of infeasible tasks
* move infeasible tasks into actor tree
* add pytest for displaying infeasible tasks info
* fix base64 decoding
* fix race condition after #6629 merged
2020-01-02 14:27:59 -08:00
Eric Liang
895f2727fb
Add experimental parallel iterators API ( #6644 )
2020-01-02 13:45:26 -08:00
Ion
3dddbef6d9
Release cpu blocked ( #6611 )
2020-01-02 13:43:25 -08:00
Robert Nishihara
9baa002069
Remove deprecated global state. ( #6655 )
2019-12-31 22:40:47 -08:00
Zhijun Fu
91a98d2295
[rpc] refactor GRPC client ( #6637 )
* refactor RPC client
* remove unused code
* format
* fix
* resolve comments
* format
* update
* fix
* fix python pb build failure
* lint
2019-12-31 22:28:25 -08:00
Robert Nishihara
480206eef8
Remove some Python 2 compatibility code. ( #6624 )
2019-12-31 17:14:58 -08:00
Philipp Moritz
ecddaafd94
Add actor table to global state API ( #6629 )
2019-12-31 15:11:59 -08:00
Robert Nishihara
d2c6457832
Remove public facing references to --redis-address. ( #6631 )
2019-12-31 13:21:53 -08:00
Yunzhi Zhang
65acb54553
[Dashboard] Logical view backend for dashboard ( #6590 )
2019-12-30 13:08:08 -08:00
Philipp Moritz
735f282494
Use 0.9.0.dev0 as the version tag ( #6630 )
2019-12-30 10:14:07 -08:00
Edward Oakes
2a66529fb7
Add multiprocessing.Pool API ( #6194 )
2019-12-29 21:40:58 -06:00
Eric Liang
e2bc489a18
Port webui nits from original pr that enables it ( #6628 )
* backport changes
* Update test_webui.py
2019-12-29 19:19:43 -08:00
Mitchell Stern
3e0f07468f
Make JSON schema for projects more explicit ( #6550 )
2019-12-29 16:41:53 -08:00
Eric Liang
7c1e0e5715
Implement wait_local for wait ( #6524 )
2019-12-28 17:40:49 -08:00
Eric Liang
677004ee3d
Add 'ray stat' command for debugging ( #6622 )
* wip
* wip
* wip
* iterate
* move
* fix thread safety
2019-12-28 14:40:32 -08:00
Robert Nishihara
ff82613b66
Fix test_actor.py test_kill. ( #6623 )
2019-12-27 22:39:17 -08:00
alindkhare
a76fadb899
[Serve] Adding BackendConfig ( #6541 )
2019-12-27 23:34:50 -06:00
Robert Nishihara
96f2f8ff10
Stop testing Python 2.7 and building Python 2.7 wheels. ( #6601 )
2019-12-27 20:47:49 -08:00
Robert Nishihara
8724e5ffd5
Start WebUI by default. ( #6493 )
2019-12-27 13:49:07 -08:00
Zhijun Fu
088ce2d1e1
Fix hang on actor creation task failure ( #6617 )
2019-12-27 10:48:17 -08:00
Eric Liang
46acb02aa4
Fix verbose shutdown error and test_env_with_subprocesses ( #6614 )
2019-12-26 22:43:39 -08:00
Eric Liang
d3db9e9c1e
By default, reconstruction should only be enabled for actor creation. ( #6613 )
* wip
* fix
* fix
2019-12-26 19:57:50 -08:00
zhu-eric
65297e65f0
Experimental Actor Pool ( #6055 )
* mod_table
* Example fix for gallery
* lint
* nit
* nit
* fix
* gallery
* remove table for now
* training, object store, tune, actors, advanced
* start tf code
* first cut tf
* yapf
* pytorch
* add torch example
* torch
* parallel
* tune
* tuning
* reviewsready
* finetune
* fix
* move_code
* update conf
* compile
* init hyperparameter
* Start images
* overview
* extra
* fix
* works
* update-ps-example
* param_actor
* fix
* examples
* simple
* simplify_pong
* flake8 and run hyperopt
* add comments
* add comments
* add suggestion
* add suggestion
* suggestions
* add suggestion
* add suggestions
* fixed in wrong area
* last edit
* finish changes
* add line
* format
* reset
* tests and docs
* fix tests
* bazelify
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2019-12-26 14:35:10 -08:00
inventormc
0dd8a60679
[tune] Usability errors PBT ( #5972 )
* update with upstream master
* check for function args in hyperparam_mutations pbt
* fix style for pbt
* remove_checkpoint
* Update pbt.py
* Update pbt.py
* fix
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2019-12-26 14:27:07 -08:00
Zhijun Fu
d2bba596ab
Fix actor reconstruction with direct call ( #6570 )
2019-12-26 10:59:50 +08:00
Yuhao Yang
be23b3ac41
[sgd] show training result for examples ( #6552 )
2019-12-26 02:15:43 +01:00
Yuhao Yang
df4533c649
[tune] demo exporting trained models in pbt examples ( #6533 )
2019-12-26 02:14:49 +01:00
Richard Liaw
93e8c85e72
[tune] Avoid duplication in TrialRunner execution ( #6598 )
* avoid_duplication
* Update python/ray/tune/ray_trial_executor.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2019-12-26 02:13:55 +01:00
Yuhao Yang
8707a721d9
[tune] update params for optimizer in reset_config ( #6522 )
* reset config update lr
* add default
* Update pbt_dcgan_mnist.py
* Update pbt_convnet_example.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2019-12-26 02:10:09 +01:00
Richard Liaw
aa7b861332
[minor][tune] Support Type Hinting for py3 ( #6571 )
* fullargspec for new pyversion
* fi
2019-12-25 08:15:33 +01:00
Robert Nishihara
f89d81896a
Fix flaky test_gpu_ids test. ( #6579 )
2019-12-24 14:26:44 -08:00
Robert Nishihara
2f57391595
Fix bug when failing to import remote functions or actors with args and kwargs. ( #6577 )
2019-12-24 13:23:48 -08:00
Edward Oakes
6b1a57542e
Add actor.__ray_kill__() to terminate actors immediately ( #6523 )
2019-12-23 23:12:57 -06:00
Yunzhi Zhang
bac6f3b61e
[Dashboard] Collecting worker stats in node manager and implement webui display in the backend ( #6574 )
2019-12-22 17:50:23 -08:00
mehrdadn
50fb26de68
Fix FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result. ( #6568 )
2019-12-22 13:02:34 -08:00
Chaokun Yang
7bbfa85c66
[Streaming] Streaming data transfer java ( #6474 )
2019-12-22 10:56:05 +08:00