Thomas Desrosiers
3e48df89f7
[Client] Fix mismatched debug log ID formats ( #17597 )
2021-08-13 13:28:20 -07:00
Huaiwei Sun
14365e111d
Update README.rst ( #17471 )
...
Add Slack channel info in the "Getting Involved" section
2021-08-13 13:25:51 -07:00
akern40
0cb2c602db
[rllib] Fixes typo in RolloutWorker.__init__ ( #17583 )
...
Fixes the typo in RolloutWorker.__init__, closes #17582
2021-08-13 13:17:36 -07:00
Amog Kamsetty
9f5dc5ec9f
[Docker] Downgrade to CUDA 11.0 ( #17806 )
2021-08-13 20:39:06 +02:00
architkulkarni
fcac416933
[Serve] [Dashboard] Add start times and replica tags to cluster snapshot ( #17749 )
2021-08-13 09:49:12 -07:00
Simon Mo
7d482fe099
[Doc] Update macos nightly wheel names ( #17813 )
2021-08-13 09:45:10 -07:00
Eric Liang
7ec52ca311
Make the namespace argument explicit instead of implicit in actor names ( #17758 )
2021-08-13 09:24:13 -07:00
Hasan Genc
a33dc75105
Shutdown clusters when large number of nodes ( #17642 )
...
* Allow clusters with over 1000 nodes to be shut down
* Add unit-test for terminating large number of nodes on AWS
* Fix lint
* Add max_terminate_nodes to the NodeProvider abstract class, and refactor terminate_nodes to reduce repetition
* lint
* Update comment
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
* lint
* lint
* Unit test previously required internet access. This commit removes that requirement.
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-08-13 17:09:19 +03:00
Kai Fricke
96b620bc01
[docker] Pin matplotlib, fix docker build ( #17819 )
2021-08-13 14:59:50 +01:00
Qing Wang
9d5c68ff55
[Java] Better log message when failed to invoke task. ( #17737 )
2021-08-13 17:31:58 +08:00
xwjiang2010
0be9f06ab6
[tune] Output insufficient resources warning msg when trials are in pending for extended amount of time. ( #17533 )
2021-08-13 01:37:56 -07:00
Hao Zhang
61de23cbae
[Collective] silence the pygloo warning as it is not commonly used ( #17792 )
2021-08-13 00:47:45 -07:00
qicosmos
a2a1c46c83
[C++ Worker] Fix for mac ( #17633 )
...
* linkopts shared
* replace gflags with absl flags
* fix
* add test option
* fix
* add cpp worker to mac ci
* fix
* support empty redis password;mod arc argv
* add encoding
* test
* ignore example test on mac
* support mac
* fix
* fix and update doc
* fix
* fix run.sh
* fix init
* fix typo
* fix run.sh
* fix lint
Co-authored-by: 久龙 <guyang.sgy@antfin.com>
2021-08-13 12:22:37 +08:00
SangBin Cho
21635b32e5
[Core] Fix the segfault ( #17772 )
2021-08-12 18:17:50 -07:00
Simon Mo
242a5d1a8d
[Serve] Add support for root_url ( #17765 )
2021-08-12 17:54:53 -07:00
Simon Mo
22b030d79f
[Serve] Remove serve.start(http_*) arguments ( #17762 )
2021-08-12 17:50:12 -07:00
Guyang Song
b97027ec64
[C++ API] support cpu gpu num 0 ( #17783 )
...
* support cpu gpu num 0
* support cpu gpu num 0
* fix
2021-08-13 08:45:33 +08:00
Robert Nishihara
f624ddae5f
Remove outdated link from readme ( #17788 )
2021-08-12 17:11:59 -07:00
Yi Cheng
e32d33f39c
Fix ray.init hanging due to failure. ( #17732 )
...
* up
* change to 30s
* up
* up
* format
2021-08-12 16:56:10 -07:00
wanxing
e4c8125c86
Make some function private ( #17729 )
...
* ReceiveObjectChunk
* more
2021-08-12 15:27:37 -07:00
Eric Liang
7fc62a1529
Support dataset union ( #17793 )
2021-08-12 14:01:40 -07:00
Lixin Wei
d287fc941b
[Core] Add Running Count to instrumented_io_context ( #17664 )
2021-08-12 13:56:40 -07:00
Chen Shen
9565fa549e
[Core][RFC] limit the total number of inlined bytes in task request rpc
...
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
2021-08-12 13:55:54 -07:00
Simon Mo
ec8409ff06
Add @architkulkarni to snapshot code owner ( #17785 )
2021-08-12 10:58:02 -07:00
Simon Mo
6879293b6b
[CI] Mark some tests exclusive ( #17650 )
2021-08-12 10:28:03 -07:00
Guyang Song
88b8de5904
[C++ API] support ray::IsInitialized ( #17780 )
...
* support ray::IsInitialized
* address comments
* fix
2021-08-13 00:51:26 +08:00
SangBin Cho
8fd7e025be
Skip raylet kill windows #17682 ( #17683 )
...
* Try fixing it?
* Done
* skip raylet signal
2021-08-12 09:35:44 -07:00
matthewdeng
55680a1f9e
[SGD] v2 initial checkpoint functionality ( #17632 )
...
* [SGD] initial checkpoint functionality
* remove thread implementation and merge with fetch_next_result
* Update comment
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
* address comments
* add additional tests
* fix imports
* load most recently saved checkpoint
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-08-12 08:52:04 -07:00
Clark Zinzow
d6eeb5dc70
[Datasets] Add local and S3 filesystem test coverage for file-based datasources. ( #17158 )
2021-08-12 08:39:31 -07:00
Guyang Song
e53aeca6bb
[C++ API] support set resources in RayConfig ( #17779 )
2021-08-12 22:53:42 +08:00
Guyang Song
5713a0be6c
[C++ API] add C++ API docs ( #17743 )
2021-08-12 22:40:09 +08:00
mguarin0
3e010c5760
[rllib] bug fix for rllib pettingzoo pistonball_v4 example ( #17701 )
...
* bug fix for rllib pettingzoo pistonball_v4 example
* adding test for PR 17701
* ran scripts/format.sh
* ok
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-12 00:25:00 -07:00
architkulkarni
00f6b30684
[Serve] [Dashboard] Support nondetached and multiple Serve instances in cluster snapshot ( #17747 )
2021-08-11 22:26:54 -05:00
Eric Liang
ce171f10a1
Remove legacy plasma unlimited and pull manager pinning flag ( #17753 )
2021-08-11 20:19:12 -07:00
Guyang Song
63f9ba2858
[C++ API][Fix] support ray::Init without RayConfig ( #17733 )
2021-08-12 10:59:21 +08:00
Kai Yang
ab53c5fc93
[Java] Update rolling logging configuration ( #17741 )
2021-08-12 10:15:27 +08:00
Qing Wang
6d6a1ea43e
Support reading system configs from native in Java. ( #17703 )
...
* Support reading system configs from native in Java.
* Fix lint
* Lint cpp
* Fix Java cases.
* Address comments.
* Address comments.
2021-08-12 10:06:01 +08:00
Clark Zinzow
623db7c47b
[Datasets] Add support for reading partitioned Parquet datasets. ( #17716 )
2021-08-11 15:55:49 -07:00
Jiao
3c64a1a3c1
Add micro benchmark to releaser repo ( #17727 )
2021-08-11 15:15:33 -07:00
architkulkarni
9a70e83e90
[hotfix] pin tensorflow==2.5.1 ( #17760 )
...
* pin tensorflow==1.5.1
* Update python/requirements.txt
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-11 15:15:22 -07:00
Yi Cheng
aa96e59faf
[workflow] Examples of function chaining ( #17715 )
2021-08-11 13:15:51 -07:00
Eric Liang
71b3183038
Add implicit init note to Ray docs & dataset version note ( #17751 )
2021-08-11 13:13:22 -07:00
Yi Cheng
02e79f3fe5
Revert "[Observability] Export useful metrics ( #17578 )" ( #17752 )
...
This reverts commit bd4db53df2.
2021-08-11 12:21:50 -07:00
Jiao
e38db5875b
Add serve external kv store ( #17622 )
2021-08-11 12:06:14 -07:00
Amog Kamsetty
ed24bae644
[SGD] Fail if num_workers is not greater than 0 ( #17723 )
2021-08-11 10:05:19 -07:00
Ian Rodney
97f7ae5e06
[Cluster Launcher] Allow attach/exec on uninitialized head node ( #17688 )
2021-08-11 09:43:23 -07:00
Sven Mika
7f2b3c0824
[RLlib] Issue 17667: CQL-torch + GPU not working (due to simple_optimizer=False; must use simple optimizer!). ( #17742 )
2021-08-11 18:30:21 +02:00
chenk008
f0fc26960d
[sgd] Wait for placement_group deletion when shutdown worker_group ( #17698 )
...
* fix
* fix ut
* delete sleep
* fix according to comment
* fix according to comment
* use pg in test_resize
* fix
2021-08-11 08:47:49 -07:00
Tricia Fu
24c4220bd7
[doc][serve] Update http-servehandle.rst ( #17680 )
2021-08-11 10:39:58 -05:00
Julius Frost
6891dee6ea
[RLlib] Better exceptions with traceback in TorchPolicy ( #17690 )
2021-08-11 15:01:07 +02:00