micafan
91a3fa0157
[GCS]access task reconstruction in TaskInfoAccessor ( #6688 )
* add task lease interface to TaskInfoAccessor
* impl of task lease
* support accessing task lease in TaskInfoAccessor
* update raylet usage of task lease
* add comment
* fix lint
* fix UT of TaskDependencyManager
* fix UT of ReconstructionPolicy
* rm useless code from UT
* add task reconstruction methods to gcs
* fix ut of RedisGcsClient
* update test
* update comments
2020-01-08 16:59:06 +08:00
Lixin Wei
859dbad155
Fix estimate_available_memory() in utils.py ( #6302 )
2020-01-08 15:22:47 +08:00
fangfengbin
303d1a959b
Add task lease method to task info handler ( #6710 )
* add task lease methods to task info handler
* rebase master
2020-01-08 14:25:55 +08:00
Tianyi Chen
9dacebec1a
[Streaming] Add configuration with owner config. ( #6687 )
2020-01-08 11:19:01 +08:00
Frithjof
872a3522aa
Add machinable to list of projects using Tune ( #6737 )
2020-01-07 15:10:17 -08:00
Edward Oakes
5f843cd998
Clean up stress_testing_config.yaml ( #6738 )
* Clean up stress_testing_config.yaml
* comment
2020-01-07 17:05:07 -06:00
Eric Liang
a6c8c342b7
Better document guarantees provided by par iter API ( #6726 )
* update
* Update doc/source/iter.rst
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
* Update doc/source/iter.rst
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-01-07 14:41:50 -08:00
Zhijun Fu
329b9440ba
fix missing override for HandleWaitForObjectEviction ( #6733 )
2020-01-07 13:20:35 -08:00
Zhijun Fu
72335dbe46
[rpc] refactor RPC server code ( #6661 )
* refactor RPC client
* remove unused code
* format
* fix
* resolve comments
* format
* update
* refactor rpc server
* update
* format
* fix
* fix
* Update src/ray/rpc/worker/core_worker_server.h
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
* resolve comments
* format
* update
* update
* add a comment
* fix
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-01-07 22:03:42 +08:00
Michał Słapek
aaeb3c44a5
[tune] Add _change_working_directory to RayTrialExecutor ( #6228 ) ( #6320 )
* [tune] Add _switch_working_directory to RayTrialExecutor (#6228 )
* Make _switch_working_directory before warn_if_slow
* Rename _switch_working_directory to _change_working_directory
2020-01-07 01:51:04 -08:00
Robert Nishihara
5e43b25e8c
Document fault tolerance behavior. ( #6698 )
2020-01-06 22:34:06 -08:00
Ujval Misra
20ba7ef647
[tune] Move util to utils package ( #6682 )
* Move util.py to utils
* Fix import
2020-01-06 18:11:02 -08:00
Edward Oakes
78d6290a65
Add kubectl to autoscaler docker image ( #6721 )
2020-01-06 17:30:51 -06:00
Edward Oakes
2a4d2c6e9e
Basic reference counting & pinning ( #6554 )
2020-01-06 17:30:26 -06:00
mehrdadn
c9855c9769
Remove std::move<std::shared_ptr>(...) to avoid bugs ( #6720 )
2020-01-06 17:17:26 -06:00
Eric Liang
63363e19be
Update bug_report.md ( #6704 )
2020-01-06 10:55:04 -08:00
Zhijun Fu
5bb20f6ac9
remove unused params in grpc macros ( #6677 )
* remove unused params in grpc macros
* format
* fix
* format
* fix
2020-01-06 21:35:40 +08:00
mehrdadn
76c986bdc7
Windows compatibility stubs ( #6706 )
2020-01-05 21:21:17 -08:00
mehrdadn
e6165cb14b
Fix master as it seems to have been broken via these conflicting commits: ( #6708 )
c51fbfb453
2228079481
Co-authored-by: GitHub Web Flow <noreply@github.com>
2020-01-06 12:29:21 +08:00
fangfengbin
1000e3322d
Add gcs server task info handler ( #6695 )
2020-01-06 11:09:32 +08:00
Lingxuan Zuo
c51fbfb453
[streaming] Message bundle use inplacement instance ( #6606 )
* streaming message bundle use inplacement instance
* fix typo & enable common test
* fix compiler warning
* block copy for serialization
* add reference
* remove streaming common test to travis script
2020-01-06 11:04:29 +08:00
mehrdadn
2228079481
Fix missing overrides ( #6703 )
2020-01-05 17:00:23 -08:00
Philipp Moritz
e15bd8ff1a
Run core worker tests in thread sanitizer and fix thread safety issues ( #6701 )
2020-01-05 16:18:21 -08:00
micafan
cc110ff1e3
[GCS]Add task lease methods to TaskInfoAccessor ( #6645 )
2020-01-05 13:54:33 +08:00
Simon Mo
6285851743
Add sphinx copy button ( #6694 )
* Add sphinx copy button
* Update requirements-doc.txt
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
2020-01-04 19:31:49 -06:00
Yunzhi Zhang
816b84808d
[Dashboard] Display memory usage of nodes and core workers ( #6671 )
2020-01-03 20:12:42 -08:00
micafan
fd379934b6
rm DirectActorTable ( #6684 )
2020-01-03 16:28:26 -08:00
Harrison Feng
ca876c1ecb
Make sure dashboard link can be clicked directly. ( #6683 )
2020-01-03 16:17:16 -08:00
Robert Nishihara
80e77f7025
Revert accidental changes to test file. ( #6681 )
2020-01-03 14:23:45 -08:00
fangfengbin
b8669bc06c
Add node resources methods to gcs server node info handler ( #6685 )
2020-01-03 20:06:49 +08:00
Ujval Misra
5b40408678
[tune] Remove py2.7-specific code ( #6665 )
* Remove backwards compatibility py2.7 code.
* Use exist_ok=True in ray
* nit
* nit
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-03 01:03:13 -08:00
micafan
970cd78701
[GCS] refactor the GCS Client Dynamic Resource Interface ( #6266 )
2020-01-03 14:07:37 +08:00
Ujval Misra
ca651af1d7
[tune] Async restores and S3/GCP-capable trial FT ( #6376 )
* Initial commit for asynchronous save/restore
* Set stage for cloud checkpointable trainable.
* Refactor log_sync and sync_client.
* Add durable trainable impl.
* Support delete in cmd based client
* Fix some tests and such
* Cleanup, comments.
* Use upload_dir instead.
* Revert files belonging to other PR in split.
* Pass upload_dir into trainable init.
* Pickle checkpoint at driver, more robust checkpoint_dir discovery.
* Cleanup trainable helper functions, fix tests.
* Addressed comments.
* Fix bugs from cluster testing, add parameterized cluster tests.
* Add trainable util test
* package_ref
* pbt_address
* Fix bug after running pbt example (_save returning dir).
* get cluster tests running, other bug fixes.
* raise_errors
* Fix deleter bug, add durable trainable example.
* Fix cluster test bugs.
* filelock
* save/restore bug fixes
* .
* Working cluster tests.
* Lint, revert to tracking memory checkpoints.
* Documentation, cleanup
* fixinitialsync
* fix_one_test
* Fix cluster test bug
* nit
* lint
* Revert tune md change
* Fix basename bug for directories.
* lint
* fix_tests
* nit_fix
* Add __init__ file.
* Move to utils package
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-02 20:40:53 -08:00
Harrison Feng
57061a15cf
[docs] configure.rst with --num-cpus ( #6678 )
--num-cpus -> --num-gpus
Signed-off-by: Harrison Feng <feng.harrison@gmail.com>
2020-01-02 20:33:41 -08:00
Robert Nishihara
92e44a5dc8
Deprecate redis_address argument in favor of address. ( #6654 )
2020-01-02 20:18:34 -08:00
Jing Ge
d39e76f2ce
rename interface and class for task assigner based on suitable pattern. ( #6664 )
2020-01-03 11:13:36 +08:00
Simon Mo
9fe90cdafc
Fix async actor recursion limitation ( #6672 )
* Do not start threadpool when using async
* Turn function_executor into a generator
* Add new test for high concurrency and bump the default
* Set direct call
2020-01-02 19:45:13 -06:00
Robert Nishihara
39a3459886
Remove (object) from class declarations. ( #6658 )
2020-01-02 17:42:13 -08:00
Sven
f1b56fa5ee
PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch). ( #6650 )
* Unifying the code for PGTrainer/Policy wrt tf vs torch.
Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch).
* Fix LINT line-len errors.
* Fix LINT errors.
* Fix `tf_pg_policy` imports (formerly: `pg_policy`).
* Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer.
Retire `PGAgent` class (use PGTrainer instead).
* - Move PG test into agents/pg/tests directory.
- All test cases will be located near the classes that are tested and
then built into the Bazel/Travis test suite.
* Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c
the function is not a tf-specific one.
* Fix remaining import errors for agents/pg/...
* Fix circular dependency in pg imports.
* Add pg tests to Jenkins test suite.
2020-01-02 16:08:03 -08:00
Robert Nishihara
d206445caf
Use Travis deploy v2. ( #6674 )
2020-01-02 16:00:51 -08:00
Yunzhi Zhang
8a0a30b5f0
[Dashboard] display actor status and infeasible tasks ( #6652 )
* expose actor status and protobuf message of infeasible tasks
* move infeasible tasks into actor tree
* add pytest for displaying infeasible tasks info
* fix base64 decoding
* fix race condition after #6629 merged
2020-01-02 14:27:59 -08:00
Eric Liang
895f2727fb
Add experimental parallel iterators API ( #6644 )
2020-01-02 13:45:26 -08:00
Ion
3dddbef6d9
Release cpu blocked ( #6611 )
2020-01-02 13:43:25 -08:00
chenk008
3a2a4335b6
Ray operator go.mod file ( #6660 )
* change .gitignore for go.mod
* change gitignore and add go.mod for ray-operator
2020-01-02 11:55:16 -06:00
fangfengbin
a13781d70e
Add actor checkpoint methods to gcs server actor info handler ( #6663 )
2020-01-02 19:31:54 +08:00
micafan
a7e9d63979
[GCS] Add actor checkpoint related methods to accessor ( #6605 )
2020-01-02 12:36:52 +08:00
fangfengbin
255aa0796a
Add heartbeat methods to gcs server node info handler ( #6647 )
2020-01-02 12:36:23 +08:00
Robert Nishihara
9baa002069
Remove deprecated global state. ( #6655 )
2019-12-31 22:40:47 -08:00
chenk008
4150d444a1
ray-operator support bazel build ( #6639 )
* support bazel build
* add bazel gazelle script in README
2019-12-31 22:28:51 -08:00
Zhijun Fu
91a98d2295
[rpc] refactor GRPC client ( #6637 )
* refactor RPC client
* remove unused code
* format
* fix
* resolve comments
* format
* update
* fix
* fix python pb build failure
* lint
2019-12-31 22:28:25 -08:00