Michael Luo
59ccbc0fc7
[RLlib] Model Annotations: Tensorflow ( #11964 )
2020-11-12 12:18:50 +01:00
Michael Luo
b2984d1c34
[RLlib] Model Annotations to Torch Models ( #9749 )
2020-11-12 12:16:12 +01:00
Tao Wang
3fbd8be851
[Placement Group]Do not really subtract resources, just count ( #11894 )
...
* [Placement Group]Do not really subtract resources, just count
* add todo
2020-11-12 00:01:19 -08:00
SangBin Cho
f80d812799
[Object Spilling] Introduce SpillWorker & RestoreWorker Pool to avoid IO worker deadlock. ( #11885 )
2020-11-11 18:20:14 -08:00
Barak Michener
de6df51bd2
[redis, docs]: Bump redis and docs/Pillow dependencies ( #11371 )
2020-11-11 18:15:27 -08:00
Max Fitton
f545418c3f
[Dashboard] Fix dashboard regression caused by logCount and errCount being removed from worker payload ( #11954 )
2020-11-11 14:55:54 -08:00
Edward Oakes
73a1cb702b
Split _get_node_provider_cls off from _get_node_provider ( #11949 )
2020-11-11 16:10:46 -06:00
Ameer Haj Ali
85197deece
[autoscaler] Remove legacy autoscaler ( #11802 )
2020-11-11 13:36:48 -08:00
Sven Mika
72fc79740c
[RLlib] Issue with pickle versions (breaks rollout test cases in RLlib). ( #11939 )
2020-11-11 21:52:21 +01:00
dHannasch
396ae0b7c2
Add docstring for find_redis_address ( #11884 )
2020-11-11 12:24:36 -06:00
Sven Mika
291c172d83
[RLlib] Support Simplex action spaces for SAC (torch and tf). ( #11909 )
2020-11-11 18:45:28 +01:00
Kai Yang
4735c032ed
[Core] Fix C++ worker test ( #11941 )
2020-11-11 09:04:45 -08:00
Tao Wang
92286660e4
[Core] Lazy create node manager clients, and destroy them ( #11928 )
2020-11-11 08:51:40 -08:00
SangBin Cho
7b8bd15702
[Stalebot] Fix issues. ( #11930 )
2020-11-11 00:28:02 -08:00
Siyuan (Ryans) Zhuang
b8dda0e3d0
[Serialization] Fix buffer alignment issues ( #11888 )
...
* fix buffer alignment issues
* remove unused fields
* aligned memory allocation
* windows compat
* license. fix compiler warnings
* fix compilation error
* reinterpret_cast
2020-11-10 23:44:16 -08:00
chaokunyang
1979ea9c0a
fix disable javadoc lint ( #11907 )
2020-11-11 13:40:50 +08:00
dHannasch
29cb32539e
[Core] If failed to connect to redis, try to say why. ( #11916 )
2020-11-10 18:22:10 -08:00
fangfengbin
433e4f32da
[GCS]Reduce get operations of worker table ( #11599 )
...
* [GCS]Reduce get operations of worker table
* fix ut bug
* fix ut bug
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-10 18:11:25 -08:00
Alex Wu
8afd2acdc1
[Autoscaler] simulator placement groups ( #11777 )
2020-11-10 18:10:36 -08:00
Eric Liang
46f3652102
Remove repeat push timeout from object manager ( #11874 )
2020-11-10 16:26:53 -08:00
Keqiu Hu
0c1bdaef59
[tune] TensorFlow Distributed Trainable ( #11876 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-10 14:59:08 -08:00
Richard Liaw
50dbf1a307
[core] Support configurable number of "check for redis" attempts ( #11902 )
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-10 14:57:57 -08:00
Ian Rodney
1d158dda32
[serve] Rename to use replicas, not workers ( #11822 )
2020-11-10 11:36:15 -08:00
Eric Liang
9b8218aabd
[docs] Move all /latest links to /master ( #11897 )
...
* use master link
* remae
* revert non-ray
* more
* mre
2020-11-10 10:53:28 -08:00
fangfengbin
543f7809a6
[GCS]Add gcs dump log(Part1) ( #11727 )
...
* add part code
* fix compile bug
* Fix bug
* Add part code
* fix review comment
* fix review comment
* fix lint error
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-10 14:10:03 +08:00
Nikita Vemuri
aba9288615
[Autoscaler] Introduce callback system ( #11674 )
...
Co-authored-by: Nikita Vemuri <nikitavemuri@Nikitas-MacBook-Pro.local>
Co-authored-by: Xiayue Charles Lin <xcl@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-09 20:03:15 -08:00
Eric Liang
ee2da0cf45
[Core] PushManager for reliable broadcast ( #11869 )
2020-11-09 18:01:47 -08:00
Benjamin Black
1999266bba
Updated pettingzoo env to accommodate api changes and fixes ( #11873 )
...
* Updated pettingzoo env to accommodate api changes and fixes
* fixed test failure
* fixed linting issue
* fixed test failure
2020-11-09 16:09:49 -08:00
Eric Liang
a9cf0141a0
[autoscaler] Fix semantics of request_resources ( #11820 )
2020-11-09 14:57:40 -08:00
Edward Oakes
1c132f2ff8
[serve] Improve DEBUG logging for understanding perf ( #11838 )
2020-11-09 14:10:42 -06:00
architkulkarni
adcaabcd64
[Serve] Reconfigure backend class at runtime ( #11709 )
2020-11-09 14:04:51 -06:00
Kai Fricke
287aba6dc3
[tune] schedulers: Add test for context finalization ( #11889 )
2020-11-09 11:37:05 -08:00
Richard Liaw
a09e49ee94
[core] Add retry for reading session name ( #11844 )
2020-11-09 11:22:50 -08:00
Kai Fricke
88be1ea20b
[tune] Handle infinite and NaN values ( #11835 )
2020-11-09 11:18:31 -08:00
Kai Yang
904f48ebd9
[Core] Multi-tenancy: Pass job ID from Raylet to worker via env variable ( #11829 )
...
* Pass job ID from Raylet to worker via env variable
* fix
* fix
* fix
* lint
* fix
* fix test_object_spilling
* address comments
* lint
* fix
2020-11-09 11:02:15 -08:00
Tao Wang
77e3163630
[GCS]Only pass node id to node failure detector ( #11886 )
...
* [GCS]Only pass node id to node failure detector
* rename
2020-11-09 10:52:33 -08:00
Max Fitton
368b14a0da
Stop dashboard from erroring when an actor does not have a corresponding core worker ( #11870 )
2020-11-09 11:36:34 -06:00
Edward Oakes
2feba4409c
[serve] Fix long running failure test ( #11805 )
2020-11-09 11:21:03 -06:00
fangfengbin
407a212816
[GCS]Fix TestActorTableResubscribe bug ( #11830 )
...
* fix compile bug
* [GCS]Fix TestActorTableResubscribe bug
* rm unused code
* fix lint error
* fix review comment
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-08 23:50:05 -08:00
dHannasch
64ca30c060
[doc] Troubleshooting --dashboard-port ( #11816 )
2020-11-08 15:53:50 -08:00
Eric Liang
0932320eb3
Move test_joblib back to new_scheduler_broken category ( #11872 )
2020-11-07 20:08:41 -08:00
Stephanie Wang
61e41257e7
[Object spilling] Queue failed object creation requests until objects have been spilled ( #11796 )
...
* Queue creation requests
* Cleanup disconnected clients
* Remove unused
* todo
* FIFO order for create requests, remove warmup for IO workers
* test and lint
* disable test
* lint
* Skip on windows
2020-11-06 18:22:19 -05:00
Amog Kamsetty
900a48c19c
[Tune] Better warnings/exceptions for fail_fast='raise' ( #11842 )
2020-11-06 15:01:55 -08:00
Aaron Miller
045fed5cd2
[examples] comment out rsync_ settings for K8S ( #11862 )
2020-11-06 14:35:21 -08:00
SangBin Cho
e0ecf5d79d
Revert "[GCS]Open light heartbeat by default ( #11689 )" ( #11861 )
...
This reverts commit 612ddb2dd1.
2020-11-06 14:34:59 -08:00
Simon Mo
871cde989a
Re-Revert: [Serialization] Update CloudPickle to 1.6.0 ( #9694 ) ( #11837 )
2020-11-06 12:24:36 -08:00
Kishan Sagathiya
c5e6c90e1e
[Core] Add name of actor in the result of ray.actors() ( #11828 )
...
Added name field to `actor_info`
Fixes #11112
2020-11-06 10:45:44 -08:00
bermaker
12ae0f20c6
[Metrics] Fix prometheus configuration doc ( #11856 )
2020-11-06 10:34:33 -08:00
Eric Liang
6b7a4dfaa0
[rllib] Forgot to pass ioctx to child json readers ( #11839 )
...
* fix ioctx
* fix
2020-11-05 22:07:57 -08:00
Philipp Moritz
28e7439cf0
[doc] Add documentation for Ray debugger ( #11815 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-05 16:25:27 -08:00