Yi Cheng
bd4db53df2
[Observability] Export useful metrics ( #17578 )
...
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* checkpoint
* up
* up
* up
* up
* fix
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* add comments
* up
* up
* up
* up
* add tests
2021-08-10 17:14:42 -07:00
architkulkarni
0c2c99b951
[Dashboard] [Serve] Make serve import conditional ( #17713 )
2021-08-10 17:06:00 -07:00
SongGuyang
63c15d7ced
[core] make 'PopWorker' to be an async function ( #17202 )
...
* make 'PopWorker' to be an async function
* pop worker async works
* fix
* address comments
* bugfix
* fix cluster_task_manager_test
* fix
* bugfix of detached actor
* address comments
* fix
* address comments
* fix aioredis
* Revert "fix aioredis"
This reverts commit 041b983eac95b105ab0e853e84c4cf2647008431.
* bug fix
* fix
* fix test_step_resources test
* format
* add unit test
* fix
* add test case PopWorkerStatus
* address commit
* fix lint
* address comments
* add python test
* address comments
* make an independent function
* Update test_basic_3.py
Co-authored-by: Hao Chen <chenh1024@gmail.com>
2021-08-10 17:03:17 -07:00
SangBin Cho
a3c5cce834
Add prepare for dask on ray 1tb sort. ( #17708 )
2021-08-10 16:26:05 -07:00
xwjiang2010
932f038644
[tune] Type hint TrialExecutor. Use Abstract Base Class. ( #17584 )
2021-08-10 14:17:22 -07:00
Clark Zinzow
78d23434e6
[Datasets] Fix write_json so roundtrip writing + reading works. ( #17691 )
...
* Write out dataset blocks as newline-delimited JSON.
* Add roundtrip JSON reading + writing test.
* Formatting.
2021-08-10 13:24:33 -07:00
SangBin Cho
705a7192b3
Unflake multi node 3 ( #17694 )
2021-08-10 13:16:52 -07:00
architkulkarni
febe54f422
[serve] [dashboard] Change empty serve cluster snapshot from empty list to empty dict ( #17655 )
2021-08-10 13:35:00 -05:00
Amog Kamsetty
0b8489dcc6
Revert "[RLlib] Add support for multi-GPU to DDPG. ( #17586 )" ( #17707 )
...
This reverts commit 0eb0e0ff58.
2021-08-10 10:50:21 -07:00
Amog Kamsetty
77f28f1c30
Revert "[RLlib] Fix Trainer.add_policy for num_workers>0 (self play example scripts). ( #17566 )" ( #17709 )
...
This reverts commit 3b447265d8.
2021-08-10 10:50:01 -07:00
SangBin Cho
6160c06c69
[Core] Fix a bug where get_actor crashes gcs if the actor is already killed. ( #17670 )
...
* Fix a bug where get_actor crashes gcs if the actor is already killed.
* Test the restart code path.
* Add an additional test
* Add a comment
* addressed code review.
2021-08-10 09:58:09 -07:00
yuduber
446ee1ad24
[autoscaler] Support Peloton node provider ( #17312 )
...
* modify updater to make it work with uber peloton node provider
* working solution for using NodeID as unique ID in peloton node provider but need run ray.init
* working solution of using resource cmd to pass in node_id
* cleanup
* cleanup 2
* removed updater.py change to make sure of the disable_node_updaters flag
* add newline to end of updater.py to undo all the change
* undo change in autoscaler.py
* use use_node_id_as_ip as field name in monitor
* lint
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
* fix-for-monitor-without-autoscaler
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-08-10 12:48:11 -04:00
Antoni Baum
13f39b2cb7
[SGD] v2 JSON logger callback & callback groundwork ( #17619 )
...
* finish session
* finish
* formatting
* tests
* wip
* remove pdb
* remove import
* add tests
* raise from None
* Address comments
* Exception
* remove from None
* fix test
* address comments
* SGDv2 JSON Logging Callback
* Revert testing change
* Prefix autofilled metrics
* Move env var check to local
* Fix exception
* Improve docstrings, default filename
* Add unit test
* Implement feedback
* SGDLoggingCallback to SGDSingleFileLoggingCallback
* Use env_integer
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-08-09 21:18:46 -07:00
Yi Cheng
473740b739
[gcs] Fix actor killing hang due to race condition ( #17634 )
...
* Revert "Revert "[gcs] Fix actor killing race condition (#17456 )" (#17599 )"
This reverts commit 381ffdb6d0.
* update
* format
* up
2021-08-09 21:11:26 -07:00
qicosmos
05da724521
[C++ Worker] Replace Ray::xxx with ray::xxx and update namespaces ( #17388 )
2021-08-10 11:17:59 +08:00
SangBin Cho
d05571af2d
Fix a progress bar issue and add it to the nightly ( #17627 )
...
* Fix a progress bar issue and add it to the nightly
* Trial
* in progress
* Fix issues.
2021-08-09 19:31:47 -07:00
Richard Liaw
898eea9a84
patch ( #17681 )
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-09 18:51:00 -07:00
Dmitri Gekhtman
c1b9f921a6
[autoscaler] Add option of returning node metadata from non_terminated_nodes. ( #17273 )
...
* Optional return from nonterminated
* format
* terminated node signature
* add return
2021-08-09 20:23:31 -04:00
Ian Rodney
6475fe1b82
[Autoscaler][Docker] Warn if a file is passed in Docker File Mounts ( #16515 )
...
Co-authored-by: Ian Rodney <ilr@anyscale.com>
2021-08-09 15:13:58 -07:00
Richard Liaw
bde14f2de6
[tune] add developer/stability annotations ( #17442 )
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-09 14:50:59 -07:00
Dmitri Gekhtman
07a42a5bdb
resolve ( #17495 )
2021-08-09 17:33:16 -04:00
Siyuan (Ryans) Zhuang
68e884ee43
[workflow] Test fault tolerance with storage ( #17641 )
...
* new test
* update storage
* enhance test
* fix s3
2021-08-09 11:19:14 -07:00
wanxing
8312628c30
Remove unused Spill function ( #17607 )
2021-08-09 10:10:03 -07:00
Dmitri Gekhtman
b6443f9ec8
Revert "Added support for the imagePullSecrets in helm chart ( #17520 )" ( #17678 )
...
This reverts commit 208d997414.
2021-08-09 12:11:58 -04:00
Simon Mo
7a0b8982f3
[serve] Return Client on serve.start() when connecting ( #17552 )
2021-08-09 10:55:05 -05:00
architkulkarni
bbcb06d45b
[doc] [runtime_env] Remove "experimental" label, add beta stability annotation ( #17651 )
2021-08-09 10:54:28 -05:00
SongGuyang
c62ce78be8
make C++ example more simpler ( #17609 )
2021-08-09 19:39:16 +08:00
Tao Wang
5990b60f8b
[Core]Cache named actor in local in case of getting them from GCS frequently. ( #17339 )
...
* [Core]Cache named actor in local in case of getting them from GCS frequently
* lint
* fix nullptr
* typo
* add namespace to cache
* lint
* lock, reference and others
* lint
* fix comments and add test
* lint
* lint
* optimize test
* add necessary fields in pub for caching
* add removing test
* fix test
2021-08-09 14:01:57 +08:00
SangBin Cho
1bcab9a7bb
[Object Spilling] Better error message for nightly test debugging ( #17645 )
...
* Fix
* Addressed code review.
* Addressed code review.
2021-08-08 20:44:49 -07:00
Hao Chen
0858f0e4f2
Change core worker C++ namespace to ray::core ( #17610 )
2021-08-08 23:34:25 +08:00
Simon Mo
c315596ed2
[Buildkite] Migrate macOS wheel builds ( #16913 )
2021-08-07 21:54:34 -07:00
SangBin Cho
654718902f
Fix ( #17660 )
2021-08-07 18:07:27 -07:00
Qing Wang
4cc34588db
[Core] Support ConcurrentGroup part1 ( #16795 )
...
* Core change and Java change.
* Fix void call.
* Address comments and fix cases.
* Fix asyncio
2021-08-07 22:41:33 +08:00
Kai Yang
9b3c0ad35b
Fix “argument type mismatch” when an exception occurs in chained tasks ( #17636 )
2021-08-07 17:47:43 +08:00
Tricia Fu
c415c26644
[serve] Update FastAPI documentation to make it runnable ( #17589 )
2021-08-06 17:46:19 -05:00
architkulkarni
f4c70be7f7
[Serve] Add replica tag to request counter and error counter ( #17613 )
2021-08-06 15:35:34 -07:00
architkulkarni
6d975b821b
[Serve] [Dashboard] Initial PR for exporting Serve data to cluster snapshot ( #17489 )
2021-08-06 15:03:29 -07:00
SangBin Cho
4616e8a03c
Fix wrong invariant pubsub ( #17620 )
...
* ip
* loose check failure
* Fix the bug properly.
* Fix comments.
2021-08-06 14:14:54 -07:00
Edward Oakes
57b190c987
[serve] Remove logic to automatically infer conda env name ( #17639 )
2021-08-06 13:27:23 -05:00
architkulkarni
b173b33934
[tests] Add runtime envs release test to nightly build script ( #17638 )
2021-08-06 13:18:25 -05:00
liuyang-my
12bd904594
[Serve] Define BackendConfig protobuf and adapt it in Java ( #17201 )
2021-08-06 09:50:45 -07:00
architkulkarni
ac9a1a20df
[core] [runtime_env] Use per-env async lock in agent ( #17542 )
...
Co-authored-by: Ed Oakes <ed.nmi.oakes@gmail.com>
2021-08-06 11:11:37 -05:00
Kai Fricke
2b520bafc5
[release/alert] less results ( #17637 )
2021-08-06 10:26:07 +01:00
Kai Fricke
bd2404e496
[release/rllib] fix learning test script ( #17635 )
2021-08-06 10:07:59 +01:00
Zhi Lin
82123123c4
[object store] Java API for Assign the object owner in Ray.put() ( #17237 )
...
Co-authored-by: Qing Wang <kingchin1218@126.com>
Co-authored-by: Kai Yang <kfstorm@outlook.com>
2021-08-06 15:26:59 +08:00
Amog Kamsetty
f0cca063ad
[SGD v2] Reduce time for HF smoke test ( #17623 )
...
* reduce
* switch back model
* Update python/ray/util/sgd/v2/BUILD
2021-08-05 21:04:34 -07:00
Stephanie Wang
a06d71477f
[core] Do not spill back tasks blocked on args to blocked nodes ( #17550 )
...
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-08-05 20:43:32 -07:00
Amog Kamsetty
add6ceb3ec
[Dependencies] Fix missing dependency UX ( #17420 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-05 20:18:42 -07:00
Amog Kamsetty
14b02c3341
Add ray.data symlink to setup-dev.py ( #17624 )
2021-08-05 19:51:15 -07:00
Chen Shen
920a4e3d56
[core] Improve fatal message for fallback allocation ( #17595 )
2021-08-05 17:58:45 -07:00