4401 commits

Author SHA1 Message Date
Amog Kamsetty
da6f28d777
[Release] Add multi-node, multi-GPU SGD release test (#16046) 2021-05-31 16:23:04 -07:00
SongGuyang
17b5f4dcaa
[C++ worker] support config from RayConfig and command line(gflag) (#16086) 2021-05-31 11:56:02 +08:00
zhuangzhuang131419
0429882bbf
[autoscaler] Implement node provider for aliyun (#15712)
Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: zhuang <zhengchicheng.zcc@alibaba-inc.com>
Co-authored-by: chenk008 <kongchen28@gmail.com>
Co-authored-by: wuhua.ck <wuhua.ck@alibaba-inc.com>
2021-05-29 00:56:32 -07:00
Amog Kamsetty
38b657cb65
[Tune] Place remote tune.run on node running the client server (#16034)
* force placement on persistent node

* address comments

* doc
2021-05-28 18:32:57 -07:00
Amog Kamsetty
cfa2997b86
[XGBoost] Add test with Ray Client (#16103) 2021-05-28 16:13:06 -07:00
Ian Rodney
5ca1b297e4
[RayClient][Proxy] BugFixes (#16040) 2021-05-28 10:24:48 -07:00
Ian Rodney
ec46794767
[Client] Add ray.client().disconnect() (#16021) 2021-05-28 10:15:44 -07:00
Eric Liang
7f2e16fe8f
Make the spill test on unstable filesystem not so verbose (#16119)
* less logs

* update

* update
2021-05-27 20:32:48 -07:00
architkulkarni
cc7cc4fb9f
[Core] Allow specifying runtime_env conda and pip via filepath (#16073) 2021-05-27 17:58:47 -05:00
Clark Zinzow
a8ac383760
Decrease the number of nodes and actors started on each node in test_actor_multiple_gpus_from_multiple_tasks. (#16124) 2021-05-27 15:58:20 -07:00
Clark Zinzow
cd71d5e8ac
[Test] Ignore psutil.AccessDenied when gathering per-process memory info upon an OOM. (#16123) 2021-05-27 15:40:44 -07:00
Eric Liang
9c73591a4e
Revert "Fix tracing bug when actors are defined before connecting to … (#16120)
This reverts commit 6c1ea66611.
2021-05-27 11:50:36 -07:00
Amog Kamsetty
5d3cb295bd
[Tune] Add find_free_port Tune util (#16098) 2021-05-27 11:27:28 -07:00
Edward Oakes
90a76ad558
[Serve] use placement group by default (#16113) 2021-05-27 11:03:29 -07:00
SangBin Cho
d0dc9abdfc
[Plasma store] Improve the OOM logging message. (#16051) 2021-05-27 10:09:58 -07:00
Yi Cheng
5d0b302121
[core] Trigger global gc when plasma store is under pressure. (#15775) 2021-05-27 10:07:59 -07:00
Kathryn Zhou
6c1ea66611
Fix tracing bug when actors are defined before connecting to cluster (#16069) 2021-05-27 09:28:11 -07:00
architkulkarni
65eab8f376
Revert "Revert "[Core] Add "env_vars" field to runtime_env"" (#16107) 2021-05-27 10:16:33 -05:00
DK.Pino
ea0ee86063
[Placement Group] Fix actor scheduling with Placement Group bug. (#16006) 2021-05-26 22:16:38 -07:00
Ian Rodney
69d0e8e4fe
[Docs][ClientBuilder] Add ray.client() and ray.ClientBuilder to Experimental API docs (#16058) 2021-05-26 21:05:47 -07:00
architkulkarni
7cfe7f840c
Revert "[Core] Add "env_vars" field to runtime_env (#16075)" (#16099)
This reverts commit 1e245005c9.
2021-05-26 16:27:04 -07:00
architkulkarni
1e245005c9
[Core] Add "env_vars" field to runtime_env (#16075) 2021-05-26 12:11:19 -05:00
Kai Yang
853d650e29
Revert "Revert "[Object spilling] Avoid worker crash when an object is spille… (#15964)" (#16012)
This reverts commit 29aa336a4d.
2021-05-25 23:48:24 -07:00
Ian Rodney
3dbdd4eb46
[Client][Proxy] Track Num Clients in the proxy (#16038) 2021-05-25 22:17:43 -07:00
Richard Liaw
08de5a36e1
[Horovod] Test with Ray Client (#15996)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-05-25 20:21:58 -07:00
Simon Mo
2aee4ac40d
[Serve] Cleanup examples and tests (#16042) 2021-05-25 15:32:36 -07:00
Ian Rodney
113fd6e765
[Client][Proxy] Refactor RayClient Proxy to not use additional Threads. (#16057) 2021-05-25 10:07:19 -07:00
Eric Liang
ea6bdfb9c1
Prevent object store from allocating over the specified limit even if there is memory fragmentation (#15951) 2021-05-24 17:56:11 -07:00
Yi Cheng
643f739619
[flaky] Split test_failure_2 into 2 (#15688)
* split test failure

* format

* add bazel

* remove flaky tag

* format

* up

* rerun

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-05-24 16:52:44 -07:00
Ian Rodney
033c22d73b
[Tracing] Add Otel to setup.py Observability (#16028) 2021-05-24 13:15:41 -07:00
Howard Lau
225b3fda91
[tune] Fix ddp_mnist_torch example argument type (#16017)
* fix ddp_mnist_torch example argument type

* add period
2021-05-24 12:05:33 -07:00
dependabot[bot]
0fecfe10b8
[tune](deps): Bump sigopt in /python/requirements/tune (#16004)
Bumps [sigopt](https://sigopt.com/) from 5.7.0 to 7.4.0.

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-05-24 12:03:56 -07:00
Amog Kamsetty
d0c137c7f9
[Tune] Pin minimum tensorboardX version (#16022) 2021-05-24 11:14:23 -07:00
Amog Kamsetty
71926ad043
[Docs] Fix Dependencies for Pong example (#15978) 2021-05-24 10:25:16 -07:00
Eric Liang
810f5c803a
Disable flaky object spilling test on OSX & adjust test timeouts (#15986)
* blacklist

* move it

* adjust according to bazel timeouts

* fix build

* move to large

* Update BUILD
2021-05-24 09:49:59 -07:00
dependabot[bot]
7a425be59f
[tune](deps): Bump smart-open in /python/requirements/tune (#16001)
Bumps [smart-open](https://github.com/piskvorky/smart_open) from 4.2.0 to 5.0.0.
- [Release notes](https://github.com/piskvorky/smart_open/releases)
- [Changelog](https://github.com/RaRe-Technologies/smart_open/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/piskvorky/smart_open/compare/v4.2.0...v5.0.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-05-22 10:24:46 -07:00
Alex Wu
b81bdd8aee
[client] Local client builder (#15935) 2021-05-22 10:11:41 -07:00
Eric Liang
ba45e41b4d
Mark runtime_env_complicated, test_object_spilling as flaky, unmark test_scheduling 2021-05-21 16:16:11 -07:00
Amog Kamsetty
47cd075829
[Tune] Ray Client support for examples & CI (#15932)
Out of the box Ray Client support for Tune examples. Also add key examples with Ray Client to run in CI
2021-05-21 15:33:43 -07:00
architkulkarni
02f21653cc
[Core] Automatically inject ray and python version in runtime_env (#15958) 2021-05-21 17:15:52 -05:00
Ian Rodney
9bdd2cbe49
Revert "Revert "[Core] minor refactor for usability of runtime_env (#15965)" (#15985)" (#15987) 2021-05-21 14:57:44 -07:00
Eric Liang
ec522a2d46
Revert "[Core] minor refactor for usability of runtime_env (#15965)" (#15985)
This reverts commit 0a7ba95e42.
2021-05-21 11:57:38 -07:00
architkulkarni
0a7ba95e42
[Core] minor refactor for usability of runtime_env (#15965)
* minor refactor for usability of runtime_env

* remove job_config changes
2021-05-21 09:02:32 -07:00
Edward Oakes
b6a79445fe
[serve] Fix some test sizes to avoid bazel warnings (#15959) 2021-05-21 10:11:46 -05:00
Ian Rodney
6add438929
[client] Start Specific Server's in separate Conda environments (#15926)
* conda support

* test multiple ray.init called

* additional testing

* test for proxy_manager

* better error message

* pass in session_dir

* unit tests

* fix test_runtime_env_complicated

* clean up proxier

* respond to comments

* try finally blocks

* fix up test_client_proxy

* small modifications to tests

* additional test

* fix tests

* lintfix
2021-05-21 01:01:57 -07:00
Eric Liang
29aa336a4d
Revert "[Object spilling] Avoid worker crash when an object is spille… (#15964)
This reverts commit 061e3fbde3.
2021-05-20 21:17:59 -07:00
Edward Oakes
d339c8734e
Fix duplicate log messages when ray.shutdown() and ray.init() are called repeatedly (#15957) 2021-05-20 22:27:54 -05:00
Edward Oakes
82410f20b2
[serve] Add warning + docstring for anonymous namespaces (#15921) 2021-05-20 22:27:15 -05:00
architkulkarni
64fdac83a7
[Core] Add minimal support for pip in runtime env (#15927) 2021-05-20 20:47:16 -05:00
Kai Yang
061e3fbde3
[Object spilling] Avoid worker crash when an object is spilled right after being restored (#15903)
* Fix check failure when memory pressure is high

* Add test

* lint
2021-05-20 18:36:11 -07:00