Edward Oakes
f0555f88d6
[runtime_env] Move worker process startup logic to context ( #18341 )
2021-09-08 17:08:27 -05:00
Antoni Baum
dd6abed6ce
[tune] Fix an edge case where DurableTrainable would not delete checkpoints in remote storage ( #18318 )
...
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2021-09-08 15:00:09 -07:00
Ian Rodney
c91e0eb065
[Dashboard] Increase Actor Snapshot Size ( #18433 )
2021-09-08 12:06:33 -07:00
Sasha Sobol
f76f14fedf
[client] pass _credentials down from init ( #18425 )
2021-09-08 10:30:26 -07:00
Clark Zinzow
b30c41759d
[Datasets] Adds tensor column support (tensors-in-tables) via Pandas/Arrow extension types/arrays. ( #18301 )
2021-09-08 10:09:01 -07:00
mwtian
e427e4a467
Fix flakiness in test_proxy_manager_internal_kv ( #18416 )
2021-09-08 15:46:45 +03:00
Kai Fricke
dac3a8bc8e
[setup] Upstream conda patches ( #17575 )
...
Co-authored-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>
2021-09-08 10:37:17 +01:00
Edward Oakes
56adaa32f1
[serve] Better logging for exceptions in backend_state.update() ( #18402 )
2021-09-07 21:40:41 -05:00
Simon Mo
a29da81cfc
Revert "Revert "Fix tracing bug when actors are defined before connecting to …" ( #16122 )
2021-09-07 16:19:49 -07:00
Edward Oakes
f2afb08125
[runtime_env] Don't modify passed runtime_env dictionary when validating ( #18404 )
2021-09-07 16:14:28 -07:00
Lada Kunc
1a72c49009
[serve] Fix get_handle execution from threads ( #18198 )
2021-09-07 14:49:36 -07:00
Guyang Song
f104a5aad7
[docs] Fix cpp wheel description ( #18386 )
2021-09-07 15:45:04 -05:00
xwjiang2010
64c2f86a22
[Tune] Respect default_resources during Trial.reset(). ( #18209 )
2021-09-07 19:14:44 +01:00
Clark Zinzow
26b2720915
Add test coverage for writing to fsspec filesystems. ( #18394 )
2021-09-07 10:16:59 -07:00
Jiajun Yao
2740d28fad
[client] Increase timeout for ProxyManager.get_channel ( #18350 )
2021-09-07 11:06:17 -05:00
Sven Mika
cabaa3b3c6
[RLlib Testing] Add A3C/APPO/BC/DDPPO/MARWIL/CQL/ES/ARS/TD3 to weekly learning tests. ( #18381 )
2021-09-07 11:48:41 +02:00
Jiajun Yao
64040a90a5
Datasets schema should match the columns selection for Parquet ( #18361 )
2021-09-07 00:41:26 -07:00
Sasha Sobol
f24ccf475e
[client] Add a grpc.ChannelCredentials argument to ray.init ( #18365 )
...
Co-authored-by: Thomas Desrosiers <thomas@anyscale.com>
2021-09-07 00:17:13 -07:00
Kai Fricke
f3a3a4bc92
[tune] Queue more than one actor/placement group ( #18338 )
2021-09-06 09:41:08 -07:00
Eric Liang
cbdafa0b63
[doc] Fix various workflow doc bugs ( #18357 )
2021-09-06 01:39:08 -07:00
Richard Liaw
0594deafdf
[tune] allow users to configure bootstrap for docker syncer ( #17786 )
2021-09-05 22:04:31 -07:00
Richard Liaw
93f7976215
[docs/deps] Clean up dependency ux/docs #18360
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-09-05 22:03:32 -07:00
Eric Liang
c4199a8054
Add more workflow comparisons ( #18347 )
2021-09-03 19:26:33 -07:00
Simon Mo
e61160d514
[Dashboard] Move gcs health check to a separate thread to avoid crashing due to excessive CPU usage. ( #18236 )
2021-09-03 14:23:56 -07:00
Jiajun Yao
e049d52d29
Retry application-level error by default for datasets ( #18296 )
2021-09-03 14:21:38 -07:00
matthewdeng
26f73ebb0b
[sgd] Implement resources_per_worker ( #18327 )
...
* [sgd] add support for additional resources per worker
* [sgd] add support for additional resources per worker
* update test
* lint
* update comments for case-sensitivity
2021-09-03 11:10:46 -07:00
xwjiang2010
01adf030ec
[Tune] Raise Error when there are insufficient resources. ( #17957 )
2021-09-03 10:49:54 -07:00
Edward Oakes
a11978ea42
[runtime_env] Remove unused serialized-runtime-env from worker args ( #18295 )
2021-09-03 10:57:01 -05:00
Edward Oakes
1f6705d35d
[runtime_env] Centralize runtime_env logic into ray._private.runtime_env submodule ( #18310 )
2021-09-03 10:19:00 -05:00
Kai Fricke
fb38d06cfb
Move RLLib GPU release test dependencies to ml docker ( #18208 )
2021-09-03 09:35:18 +01:00
Alex Wu
fa961032e1
[workflow] object ref integration ( #18128 )
...
* notes
* notes
* .
* seems to work?
* .
* seems to work
* needs tests
* needs tests
* parallelize uploads
* fixed
* fixed
* .
* dumb test
* .
* .
* fix festsg
* .
* works
* .:
* .
* .
* Update common.py
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-09-02 19:59:45 -07:00
Amog Kamsetty
40b6d765df
[SGD] v2 tune checkpointing ( #18179 )
...
* wip
* wip
* wip
* wip
* fix test
* finish
* fix failing tests
* address comments
* wip
* address comments
* update
* fix
* fix fault tolerance checkpoint id
* lint
* updates
* updates
* add test
* updates
* update
* Update python/ray/util/sgd/v2/trainer.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* Update python/ray/util/sgd/v2/trainer.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* Update python/ray/util/sgd/v2/trainer.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* Update python/ray/util/sgd/v2/backends/backend.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* Update python/ray/util/sgd/v2/backends/backend.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* Update python/ray/util/sgd/v2/backends/backend.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* lint
* fix
* fix test
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-09-02 17:44:37 -07:00
Jiajun Yao
d9538a958b
Avoid duplicate exports of functions ( #18284 )
2021-09-02 17:36:52 -07:00
Edward Oakes
549a8fa948
[runtime_env] [ray_client] Remove PrepRuntimeEnv RPC, upload working_dir before calling ray.init in server ( #18240 )
2021-09-02 14:02:39 -05:00
Antoni Baum
4c95ea6d0a
[client] Improve Ray Client connection timeout information ( #18281 )
...
* Improve Ray Client connection timeout information
* fix lint issue.
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-09-02 16:34:11 +03:00
xwjiang2010
9fa7951171
[core] Log once when get_gpu_ids is called on driver. ( #18282 )
2021-09-01 16:47:00 -07:00
Stephanie Wang
d43d297d9a
[core] Attach call site to ObjectRefs, print on error ( #17971 )
...
* Attach call site to ObjectRef
* flag
* Fix build
* build
* build
* build
* x
* x
* skip on windows
* lint
2021-09-01 15:29:05 -07:00
Chris K. W
1a10108765
[core] Release function actor lock while waiting for actor class to be loaded by import thread ( #18175 )
2021-09-01 12:59:48 -07:00
Amog Kamsetty
9c2e7ffd97
[SGD] v2 Fault Tolerance ( #18090 )
...
* wip
* wip
* wip
* wip
* update
* finish
* remove
* fix
* update
* update
* update comment
* handle backend failures
* bump test timeout
* address comments
* fix
* fix
* address comments
* formatting
* add comment
* address comment
* fix failing test
* update error message
* Update python/ray/util/sgd/v2/trainer.py
* wip
* fix failing test
* formatting
* fix
2021-09-01 12:43:10 -07:00
Edward Oakes
0326bbb30a
[serve] Skip test_standalone namespace test on windows ( #18277 )
2021-09-01 12:58:59 -05:00
Jiajun Yao
fbb3ac6a86
Retry application-level errors ( #18176 )
...
* Retry application-level errors
* Retry application-level errors
* Push retry message to the driver
2021-09-01 10:53:06 -07:00
Edward Oakes
673bf35c1f
Refactor BackendState to be per-backend instead of global ( #18255 )
2021-09-01 09:46:22 -05:00
mwtian
be50c13251
[Client] Use a single RPC to fetch ClientObjectRefs passed in a list ( #16944 )
2021-08-31 16:31:13 -07:00
Edward Oakes
5d122cf7b7
[runtime_env] Move working dir setup to the agent ( #18170 )
2021-08-31 17:22:49 -05:00
matthewdeng
a3123b6860
[SGD] v2 Horovod backend ( #18047 )
...
* [SGD] add Horovod backend
* address comments: set CUDA_VISIBLE_DEVICES, refactor code
* fix gpu test
* fix lint/test import
* address comments, add example cluster config
* delay horovod imports
2021-08-31 12:54:59 -07:00
Wesley Gifford
6133a561e9
Dataset from modin ( #18122 )
2021-08-31 11:19:35 -07:00
Nikita Vemuri
c5b99ab590
[serve] Start RayInternalKVStore in controller namespace ( #18164 )
2021-08-31 13:09:33 -05:00
Edward Oakes
17dded543c
Support passing gcs_client to internal_kv ( #18235 )
2021-08-31 12:46:41 -05:00
Ryan L. Melvin
c081c68de7
[tune] Conditional search space example using hyperopt ( #18130 )
...
Co-authored-by: Ryan Melvin <rmelvin@uabmc.edu>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2021-08-31 17:06:22 +02:00
Kai Fricke
a8dbc44f9a
[ci] minimal dependency install test ( #18071 )
2021-08-31 15:26:25 +02:00