Simon Mo
5f527816fe
Fix async actor high cpu utilization when idle ( #6877 )
2020-01-22 16:07:08 -08:00
Simon Mo
4dd41844d0
Ignore blocking ray.wait if timeout is zero ( #6891 )
2020-01-22 16:05:34 -08:00
Richard Liaw
2b0e93586f
[autoscaler] Auto-replace "DEFAULT" with most recent DLAMI ( #6848 )
...
* try_this
* fix
* actual fix
* default
2020-01-21 13:54:04 -08:00
Richard Liaw
4edfaf2f38
[tune] Support callable objects in variant generation ( #6849 )
...
* minorcallable
* format
2020-01-21 10:24:25 -08:00
Stephanie Wang
815cd0e39a
Task and actor fate sharing with the owner process ( #6818 )
...
* Add test
* Kill workers leased by failed workers
* merge
* shorten test
* Add node failure test case
* Fix FromBinary for nil IDs, add assertions
* Test
* Fate sharing on node removal, fix owner address bug
* lint
* Update src/ray/raylet/node_manager.cc
Co-Authored-By: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>
* fix
* Remove unneeded test
* fix IDs
Co-authored-by: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>
2020-01-20 16:44:04 -08:00
Philipp Moritz
96e2c1ae74
[Projects] Add small tutorial for projects ( #6641 )
2020-01-20 09:33:41 -08:00
Robert Nishihara
c2cbb85a43
Fix flaky test test_feature_flag ( #6850 )
2020-01-19 20:59:03 -08:00
Richard Liaw
341ddd0a09
[tune] Default to TensorboardX and include in requirements. ( #6836 )
2020-01-19 01:49:33 -08:00
Richard Liaw
8a9bd18606
[tune] Remove keras dependency ( #6827 )
2020-01-18 23:24:42 -08:00
Yuhao Yang
9b1d2953de
[tune] set correct path when deleting checkpoint folder ( #6758 )
2020-01-17 23:11:03 -08:00
Mitchell Stern
763818b476
[Dashboard] Add static assets for speedscope v1.5.3 ( #6822 )
2020-01-17 20:53:53 -08:00
Yunzhi Zhang
3acf3c7675
[Dashboard] Add actor task counter ( #6820 )
2020-01-17 15:43:56 -08:00
Simon Mo
8f246c17b5
Initialize async plasma for async actors ( #6813 )
...
* Initialize async plasma for async actors
* Address comment
2020-01-17 14:58:06 -08:00
Ameer Haj Ali
9f9c3f5026
adding context parameter for pool with a warning for not being supported ( #6776 )
2020-01-17 16:57:18 -06:00
Edward Oakes
30776450a3
num_cpus=1 by default in Pool ( #6812 )
2020-01-17 13:28:25 -06:00
Qstar
0f3205af0b
[Projects] Delete pods associated with the project when running ray session stop ( #6787 )
2020-01-17 10:42:30 -08:00
Mitchell Stern
9f96091aef
[Dashboard] Add logical view displaying actor tree ( #6810 )
...
* [Dashboard] Add logical view displaying actor tree
* Fix key error in test_raylet_info_endpoint
2020-01-17 10:25:27 -08:00
Yuhao Yang
5f36e6eacb
[tune] get checkpoints paths for a trial after tuning ( #6643 )
2020-01-17 10:15:04 -08:00
Mitchell Stern
8e8b66a4b8
Add route for /favicon.ico to fix missing favicon ( #6815 )
2020-01-16 21:03:21 -06:00
Richard Liaw
232be5a058
[sgd] fault tolerance for pytorch + revamp documentation ( #6465 )
2020-01-16 18:38:27 -08:00
Mitchell Stern
05674c219f
Accept any port in test_get_webui in test_webui.py ( #6804 )
2020-01-15 23:16:35 -06:00
mehrdadn
fb8e3615d5
Use Boost.Process instead of pid_t ( #6510 )
...
* Use Boost.Process instead of pid_t
This will let us handle child processes (mostly) uniformly across platforms.
TODO: There is no SIGTERM on Windows; achieving something equivalent is fairly involved.
2020-01-15 20:05:02 -08:00
Ziyad Edher
c480d1d1e4
Treat static methods as class methods instead of instance methods in actors ( #6756 )
...
* Treat static methods as class methods rather than instance methods
* Add tests for static methods in actors
* Revert formatting changes
* Readd future imports
* Restructure static method check
* Documentation enhancements
* Fix linting issues
2020-01-15 19:38:41 -06:00
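A minimal sketch of how a static method can be told apart from an instance method on a plain Python class, which is the distinction the entry above is about. The class `Counter` and the helper `is_static_method` are hypothetical names chosen for illustration; this is not the code from the commit, only the standard-library mechanism such a check could rely on.

```python
import inspect


class Counter:
    """Plain class standing in for an actor class (hypothetical example)."""

    def __init__(self):
        self.value = 0

    def increment(self):
        # Instance method: needs `self`.
        self.value += 1
        return self.value

    @staticmethod
    def describe():
        # Static method: takes no implicit first argument.
        return "a simple counter"


def is_static_method(cls, name):
    """Return True if `name` is declared with @staticmethod on `cls` or a base."""
    # getattr_static bypasses the descriptor protocol, so the raw
    # staticmethod wrapper is returned instead of the unwrapped function.
    return isinstance(inspect.getattr_static(cls, name), staticmethod)
```

A caller that knows a method is static can invoke it on the class without creating an instance, which is why the two kinds need different handling.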
Edward Oakes
4227fd1b60
fix flaky test_wait ( #6791 )
2020-01-14 14:43:16 -06:00
Edward Oakes
3ea3b56eb1
Hotfix missing fields in multiprocessing.Pool ( #6784 )
2020-01-13 16:39:33 -06:00
Sven Mika
4ee566129f
Ignore io.UnsupportedOperation error when "Enabling nice stack traces on SIGSEGV etc." in worker.py::connect(). ( #6771 )
...
- Fixes RLlib tf-eager test cases for all agents when run locally on Ubuntu and Mac.
2020-01-13 14:31:13 -08:00
Philipp Moritz
a26431f587
Upgrade react-scripts to fix #6739 ( #6769 )
2020-01-13 11:58:21 -08:00
Edward Oakes
a950e95c7d
Use exit() in __kill_actor__ ( #6760 )
2020-01-13 11:37:59 -06:00
chaokunyang
4097d076d4
Package ray java jars into wheels ( #6600 )
2020-01-10 11:41:00 +08:00
Sven
60d4d5e1aa
Remove future imports ( #6724 )
...
* Remove all __future__ imports from RLlib.
* Remove (object) again from tf_run_builder.py::TFRunBuilder.
* Fix 2xLINT warnings.
* Fix broken appo_policy import (must be appo_tf_policy)
* Remove future imports from all other ray files (not just RLlib).
* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).
* Add two empty lines before Schedule class.
* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Eric Liang
69c5a2bc3c
Warn if OMP_NUM_THREADS is set ( #6729 )
2020-01-08 14:59:07 -08:00
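The entry above only gives the title, so the following is a sketch of what an environment-variable warning of this kind might look like, not the code from the commit. The helper name `warn_if_omp_num_threads_set`, the warning category, and the message text are all assumptions.

```python
import os
import warnings


def warn_if_omp_num_threads_set(environ=None):
    """Emit a warning when OMP_NUM_THREADS is present in the environment.

    Hypothetical helper; returns True if a warning was issued.
    """
    # Allow a mapping to be passed in so the check is easy to test.
    env = os.environ if environ is None else environ
    value = env.get("OMP_NUM_THREADS")
    if value is None:
        return False
    warnings.warn(
        f"OMP_NUM_THREADS={value} is set; this may limit the number of "
        "threads used by numerical libraries.",
        RuntimeWarning,
        stacklevel=2,
    )
    return True
```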
Eric Liang
a745886242
Disable HTTP proxy for gRPC connections ( #6744 )
...
* disable http proxy for grpc
* add test
2020-01-08 09:23:22 -08:00
Lixin Wei
859dbad155
Fix estimate_available_memory() in utils.py ( #6302 )
2020-01-08 15:22:47 +08:00
Michał Słapek
aaeb3c44a5
[tune] Add _change_working_directory to RayTrialExecutor ( #6228 ) ( #6320 )
...
* [tune] Add _switch_working_directory to RayTrialExecutor (#6228 )
* Make _switch_working_directory before warn_if_slow
* Rename _switch_working_directory to _change_working_directory
2020-01-07 01:51:04 -08:00
Robert Nishihara
5e43b25e8c
Document fault tolerance behavior. ( #6698 )
2020-01-06 22:34:06 -08:00
Ujval Misra
20ba7ef647
[tune] Move util to utils package ( #6682 )
...
* Move util.py to utils
* Fix import
2020-01-06 18:11:02 -08:00
Edward Oakes
2a4d2c6e9e
Basic reference counting & pinning ( #6554 )
2020-01-06 17:30:26 -06:00
Yunzhi Zhang
816b84808d
[Dashboard] Display memory usage of nodes and core workers ( #6671 )
2020-01-03 20:12:42 -08:00
Harrison Feng
ca876c1ecb
Make sure dashboard link can be clicked directly. ( #6683 )
2020-01-03 16:17:16 -08:00
Robert Nishihara
80e77f7025
Revert accidental changes to test file. ( #6681 )
2020-01-03 14:23:45 -08:00
Ujval Misra
5b40408678
[tune] Remove py2.7-specific code ( #6665 )
...
* Remove backwards compatibility py2.7 code.
* Use exist_ok=True in ray
* nit
* nit
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-03 01:03:13 -08:00
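For context on the Python 2.7 cleanup above: in the standard library the keyword is spelled `exist_ok`, and it is accepted by `os.makedirs` on Python 3 only. A small sketch contrasting the two idioms follows; the function names are illustrative and are not taken from the commit.

```python
import errno
import os
import tempfile


def makedirs_py27_style(path):
    """Python 2.7-compatible idiom: create `path`, ignoring 'already exists'."""
    try:
        os.makedirs(path)
    except OSError as exc:
        if exc.errno != errno.EEXIST:
            raise


def makedirs_py3_style(path):
    """Python 3 idiom: exist_ok=True suppresses the 'already exists' error."""
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "a", "b")
    makedirs_py27_style(target)
    makedirs_py3_style(target)  # Second call is a no-op rather than an error.
    created = os.path.isdir(target)
```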
Ujval Misra
ca651af1d7
[tune] Async restores and S3/GCP-capable trial FT ( #6376 )
...
* Initial commit for asynchronous save/restore
* Set stage for cloud checkpointable trainable.
* Refactor log_sync and sync_client.
* Add durable trainable impl.
* Support delete in cmd based client
* Fix some tests and such
* Cleanup, comments.
* Use upload_dir instead.
* Revert files belonging to other PR in split.
* Pass upload_dir into trainable init.
* Pickle checkpoint at driver, more robust checkpoint_dir discovery.
* Cleanup trainable helper functions, fix tests.
* Addressed comments.
* Fix bugs from cluster testing, add parameterized cluster tests.
* Add trainable util test
* package_ref
* pbt_address
* Fix bug after running pbt example (_save returning dir).
* get cluster tests running, other bug fixes.
* raise_errors
* Fix deleter bug, add durable trainable example.
* Fix cluster test bugs.
* filelock
* save/restore bug fixes
* .
* Working cluster tests.
* Lint, revert to tracking memory checkpoints.
* Documentation, cleanup
* fixinitialsync
* fix_one_test
* Fix cluster test bug
* nit
* lint
* Revert tune md change
* Fix basename bug for directories.
* lint
* fix_tests
* nit_fix
* Add __init__ file.
* Move to utils package
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-02 20:40:53 -08:00
Robert Nishihara
92e44a5dc8
Deprecate redis_address argument in favor of address. ( #6654 )
2020-01-02 20:18:34 -08:00
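One common way to deprecate a keyword argument in favor of a new one is sketched below. The function `connect` is a hypothetical stand-in, not the real entry point touched by the commit above, and the precedence rule (rejecting both arguments at once) is an assumption.

```python
import warnings


def connect(address=None, redis_address=None):
    """Hypothetical entry point that still accepts the old keyword."""
    if redis_address is not None:
        warnings.warn(
            "redis_address is deprecated; use address instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if address is not None:
            raise ValueError("Pass only one of address and redis_address.")
        # Fall back to the old argument so existing callers keep working.
        address = redis_address
    return address
```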
Simon Mo
9fe90cdafc
Fix async actor recursion limitation ( #6672 )
...
* Do not start threadpool when using async
* Turn function_executor into a generator
* Add new test for high concurrency and bump the default
* Set direct call
2020-01-02 19:45:13 -06:00
Robert Nishihara
39a3459886
Remove (object) from class declarations. ( #6658 )
2020-01-02 17:42:13 -08:00
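As background for the cleanup above: on Python 3 every class is a new-style class, so an explicit `(object)` base is redundant. The class names below are illustrative only.

```python
class WithObject(object):  # Python 2 style spelling
    pass


class WithoutObject:  # Python 3 style spelling
    pass


# Both classes end up with `object` as their only base on Python 3.
same_bases = WithObject.__bases__ == WithoutObject.__bases__ == (object,)
```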
Sven
f1b56fa5ee
PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch). ( #6650 )
...
* Unifying the code for PGTrainer/Policy wrt tf vs torch.
Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch).
* Fix LINT line-len errors.
* Fix LINT errors.
* Fix `tf_pg_policy` imports (formerly: `pg_policy`).
* Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer.
Retire `PGAgent` class (use PGTrainer instead).
* Move PG test into agents/pg/tests directory. All test cases will be located near the classes that are tested and then built into the Bazel/Travis test suite.
* Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c the function is not a tf-specific one.
* Fix remaining import errors for agents/pg/...
* Fix circular dependency in pg imports.
* Add pg tests to Jenkins test suite.
2020-01-02 16:08:03 -08:00
Yunzhi Zhang
8a0a30b5f0
[Dashboard] display actor status and infeasible tasks ( #6652 )
...
* expose actor status and protobuf message of infeasible tasks
* move infeasible tasks into actor tree
* add pytest for displaying infeasible tasks info
* fix base64 decoding
* fix race condition after #6629 merged
2020-01-02 14:27:59 -08:00
Eric Liang
895f2727fb
Add experimental parallel iterators API ( #6644 )
2020-01-02 13:45:26 -08:00
Ion
3dddbef6d9
Release cpu blocked ( #6611 )
2020-01-02 13:43:25 -08:00
Robert Nishihara
9baa002069
Remove deprecated global state. ( #6655 )
2019-12-31 22:40:47 -08:00