Commit graph

9357 commits

Author SHA1 Message Date
Guyang Song
f104a5aad7
[docs] Fix cpp wheel description (#18386) 2021-09-07 15:45:04 -05:00
Lixin Wei
4f6b50dc46
[Core] Fix ServerCall Leaking (#17863)
* fix backpressure bug

* update comments

* stash

* add test

* add basic tests

* add fixture

* stash

* fix

* draft

* fix

* test added

* fixed

* fixed

* lint

* Update src/ray/rpc/test/grpc_server_test.cc

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* add copyright

* move test service to separate file

* add ClientCallManager timeout tests

* fix

* lint

* lint

* lint

* test windows CI

* fix

* lint

* lint

* retry windows

* retry windows

* fix mac

* lint

* lint

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2021-09-07 12:15:43 -07:00
xwjiang2010
64c2f86a22
[Tune] Respect default_resources during Trial.reset(). (#18209) 2021-09-07 19:14:44 +01:00
Clark Zinzow
26b2720915
Add test coverage for writing to fsspec filesystems. (#18394) 2021-09-07 10:16:59 -07:00
Ian Rodney
ec2110e470
[Codeowners] Add Chris & Mingwei to Ray Client proto (#18395) 2021-09-07 09:17:23 -07:00
Jiajun Yao
2740d28fad
[client] Increase timeout for ProxyManager.get_channel (#18350) 2021-09-07 11:06:17 -05:00
Qing Wang
d87441cda7
[Java] ConcurrencyGroup in Java local mode. (#18241)
* WIP

* Fix

* Fix test

* Refine

* Fix lint,

* WIP2

* WIP2

* Refine

* Put a default concurrency group.

* Fix submitting task with concurrency group name.

* Remove unnecessary changes.

* Update java/runtime/src/main/java/io/ray/runtime/task/LocalModeTaskSubmitter.java

Co-authored-by: Kai Yang <kfstorm@outlook.com>

Co-authored-by: Kai Yang <kfstorm@outlook.com>
2021-09-07 20:43:31 +08:00
Sven Mika
cabaa3b3c6
[RLlib Testing] Add A3C/APPO/BC/DDPPO/MARWIL/CQL/ES/ARS/TD3 to weekly learning tests. (#18381) 2021-09-07 11:48:41 +02:00
Jiajun Yao
64040a90a5
Datasets schema should match the columns selection for Parquet (#18361) 2021-09-07 00:41:26 -07:00
Sasha Sobol
f24ccf475e
[client] Add a grpc.ChannelCredentials argument to ray.init (#18365)
Co-authored-by: Thomas Desrosiers <thomas@anyscale.com>
2021-09-07 00:17:13 -07:00
Sven Mika
56f142cac1
[RLlib] Add support for evaluation_num_episodes=auto (run eval for as long as the parallel train step takes). (#18380) 2021-09-07 08:08:37 +02:00
Kai Fricke
f3a3a4bc92
[tune] Queue more than more actor/placement group (#18338) 2021-09-06 09:41:08 -07:00
Sven Mika
5292b70fc6
[RLlib] Add multi-GPU attention net tests to nightly test suite (+ R2D2 tests for LSTM and attention nets). (#18368) 2021-09-06 17:48:05 +02:00
Kai Fricke
d9552e6795
Update release process doc and checklist (#18336)
Co-authored-by: Qing Wang <kingchin1218@126.com>
2021-09-06 14:09:31 +01:00
Sven Mika
e3e6ed7aaa
[RLlib] Issues 17844, 18034: Fix n-step > 1 bug. (#18358) 2021-09-06 12:14:20 +02:00
Sven Mika
59f796edf3
[RLlib] Fix crash when using StochasticSampling exploration (most PG-style algos) w/ tf and numpy > 1.19.5 (#18366) 2021-09-06 12:14:00 +02:00
Guyang Song
5a89b47f56
[Event] support set event level (#18275)
Co-authored-by: Hao Chen <chenh1024@gmail.com>
2021-09-06 16:41:49 +08:00
Eric Liang
cbdafa0b63
[doc] Fix various workflow doc bugs (#18357) 2021-09-06 01:39:08 -07:00
Chen Shen
7c9d261dce
[Core][plasma] consolidate stats calculation for plasma store 2021-09-05 22:24:21 -07:00
Richard Liaw
0594deafdf
[tune] allow users to configure bootstrap for docker syncer (#17786) 2021-09-05 22:04:31 -07:00
Richard Liaw
93f7976215
[docs/deps] Clean up dependency ux/docs #18360
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-09-05 22:03:32 -07:00
qicosmos
1da05209b9
[C++ Worker]Add get actor API. (#17897)
* linkopts shared

* add get actor api

* fix

* improve

* reduce some duplicate code

* improve some
2021-09-06 11:46:46 +08:00
Sven Mika
ba58f5edb1
[RLlib] Strictly run evaluation_num_episodes episodes each evaluation run (no matter the other eval config settings). (#18335) 2021-09-05 15:37:05 +02:00
Sven Mika
a772c775cd
[RLlib] Set random seed (if provided) to Trainer process as well. (#18307) 2021-09-04 11:02:30 +02:00
Eric Liang
c4199a8054
Add more workflow comparisons (#18347) 2021-09-03 19:26:33 -07:00
Alex Wu
7912a8554c
[code owners] Add Hao to autoscaler compatibility (#18218) 2021-09-03 18:55:09 -07:00
Yi Cheng
23e9af0601
[test] Add x nodes y actors test to nightly tests (#18291) 2021-09-03 18:54:23 -07:00
Chen Shen
cf4fb4edb3
[Core][plasma] fix the data race issue (#18312) 2021-09-03 18:51:27 -07:00
Simon Mo
e61160d514
[Dashboard] Move gcs health check to a separate thread to avoid crashing due to excessive CPU usage. (#18236) 2021-09-03 14:23:56 -07:00
Jiajun Yao
e049d52d29
Retry application-level error by default for datasets (#18296) 2021-09-03 14:21:38 -07:00
ellimac54
772d25cc38
Add Initial Windows Dockerfile (#17474) 2021-09-03 11:41:06 -07:00
matthewdeng
26f73ebb0b
[sgd] Implement resources_per_worker (#18327)
* [sgd] add support for additional resources per worker

* [sgd] add support for additional resources per worker

* update test

* lint

* update comments for case-sensitivity
2021-09-03 11:10:46 -07:00
xwjiang2010
01adf030ec
[Tune] Raise Error when there are insufficient resources. (#17957) 2021-09-03 10:49:54 -07:00
Kai Fricke
ac5d255c9c
[rllib/docker] silent unzip of atari roms (#18340) 2021-09-03 17:55:03 +01:00
Edward Oakes
a11978ea42
[runtime_env] Remove unused serialized-runtime-env from worker args (#18295) 2021-09-03 10:57:01 -05:00
Edward Oakes
1f6705d35d
[runtime_env] Centralize runtime_env logic into ray._private.runtime_env submodule (#18310) 2021-09-03 10:19:00 -05:00
Kai Fricke
6aa8a4eddc
[release] prettier output of release test results and artifacts (#18337) 2021-09-03 14:00:55 +01:00
Sven Mika
9a8ca6a69d
[RLlib] Fix Atari learning test regressions (2 bugs) and 1 minor attention net bug. (#18306) 2021-09-03 13:29:57 +02:00
Kai Fricke
fb38d06cfb
Move RLLib GPU release test dependencies to ml docker (#18208) 2021-09-03 09:35:18 +01:00
gjoliver
336e79956a
[RLlib] Make MultiAgentEnv inherit gym.Env to avoid direct class type manipulation (#18156) 2021-09-03 08:02:05 +02:00
qicosmos
72739462a9
[C++ Worker]Add some api of placement group part1. (#17925)
* linkopts shared

* add some pg api

* add Wait for PlacementGroup
2021-09-03 13:32:28 +08:00
Alex Wu
fa961032e1
[workflow] object ref integration (#18128)
* notes

* notes

* .

* seems to work?

* .

* seems to work

* needs tests

* needs tests

* parallelize uploads

* fixed

* fixed

* .

* dumb test

* .

* .

* fix festsg

* .

* works

* .:

* .

* .

* Update common.py

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-09-02 19:59:45 -07:00
SangBin Cho
814095add6
Revert "Change instance type for some tests (#18248)" (#18320)
This reverts commit 34026a7bd5.
2021-09-02 17:45:02 -07:00
Amog Kamsetty
40b6d765df
[SGD] v2 tune checkpointing (#18179)
* wip

* wip

* wip

* wip

* fix test

* finish

* fix failing tests

* address comments

* wip

* address comments

* update

* fix

* fix fault tolerance checkpoint id

* lint

* updates

* updates

* add test

* updates

* update

* Update python/ray/util/sgd/v2/trainer.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Update python/ray/util/sgd/v2/trainer.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Update python/ray/util/sgd/v2/trainer.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Update python/ray/util/sgd/v2/backends/backend.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Update python/ray/util/sgd/v2/backends/backend.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Update python/ray/util/sgd/v2/backends/backend.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* lint

* fix

* fix test

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-09-02 17:44:37 -07:00
Jiajun Yao
d9538a958b
Avoid duplicate exports of functions (#18284) 2021-09-02 17:36:52 -07:00
Eric Liang
7dcae690b9
Mark datasets as still in alpha for now (#18321) 2021-09-02 17:07:33 -07:00
SangBin Cho
9b9eae1e86
Change misleading documentation from the placement group (#18257)
* Modify a doc

* completed
2021-09-02 16:40:48 -07:00
wanxing
60f84fa051
Abstract plasma store get request queue (#18064)
* begin

* build

* add test

* add first test

* add test

* fix build

* lint bazel

* fix build

* fix build

* fix crash

* fix some comment

* revert shared_ptr ObjectLifecycleManager

* fix RemoveGetRequest lost

* no defer

* fix lots of comments

* fix build

* fix data race

* fix comments

* Revert "fix data race"

This reverts commit 8f58e3d70b73af864566e056211ff1b70cab870c.

* refine

* fix mac build

* fix unit test

* fix unit test
2021-09-02 14:16:50 -07:00
Edward Oakes
549a8fa948
[runtime_env] [ray_client] Remove PrepRuntimeEnv RPC, upload working_dir before calling ray.init in server (#18240) 2021-09-02 14:02:39 -05:00
Antoni Baum
4c95ea6d0a
[client] Improve Ray Client connection timeout information (#18281)
* Improve Ray Client connection timeout information

* fix lint issue.

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-09-02 16:34:11 +03:00