8886 commits

Author SHA1 Message Date
matthewdeng
3a1aed28b7
[torch] fix process group timedelta (#17468) 2021-07-30 15:47:33 -07:00
SangBin Cho
9a696cc66a
Pin aioredis version (#17472) 2021-07-30 12:04:14 -07:00
Jiao
d67c57007b
change placement group report size to 1k (#17216)
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-30 11:29:41 -07:00
Chen Shen
32803b53b0
Fix potential dead-lock (#17396) 2021-07-30 11:28:49 -07:00
Alex Wu
9e79301d35
Split scalability envelope + smoke tests (#17455)
* .

* done?

* done?

* sang comments

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-07-30 10:20:19 -07:00
Kai Fricke
b0f00b1b4b
[default] pin aioredis < 2 (#17465) 2021-07-30 17:57:17 +01:00
Eric Liang
cd13059691
[dataset] Implement random_shuffle() and split(equal=True) (#17448) 2021-07-30 09:51:21 -07:00
Patrick Ames
131710f9f9
[autoscaler] Add support for EC2 launch templates. (#17236) 2021-07-30 08:05:59 -07:00
Antoni Baum
23bdad01be
Fix XGBoost-Ray and LightGBM-Ray docs properly (#17433) 2021-07-30 15:47:41 +01:00
wanxing
705248f4ee
[CoreWorker]Remove plasma_objects_only parameter (#17384) 2021-07-30 14:48:36 +08:00
Qing Wang
b8baac3cb0
[Java] Filter error log for intentional system exit. (#17289) 2021-07-30 13:17:38 +08:00
Chen Shen
d856abb70d
[Test] increase memory for 5000 partitions shuffle (#17429) 2021-07-29 21:56:16 -07:00
matthewdeng
58c4fe727c
[SGD] TrainerV2 API interface (#17447)
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-07-29 19:39:39 -07:00
Eric Liang
0373c54b3e
Add warning if get_gpu_ids() is called on the driver. (#17436) 2021-07-29 19:39:22 -07:00
Siyuan (Ryans) Zhuang
17c25345d0
[Workflow] Virtual actor writer - Part 2 (#17336)
* virtual actor writer

pass step_type around

simplify readonly actor

return different thing for a virtual actor

return state and output

WorkflowExecutionResult

simplify workflow execution

initial virtual actor writer

workflow_ref deeper integration

resume a step of a workflow

cache step output

Support dynamic workflow ref

* fix recovery tests

* fix

* fix get_output

* better error message

* pressure test

* fix

* verbose error message

* verbose error message

* fix get_cached_step issue

* update tests

* simplify readonly virtual actor

* fix storage tests

* workflow.resume returns state of an actor

* fix verbose

* fix comment

* make it more clear by renaming

* comment

* test init error in virtual actor

* update docs

* update docs

* update test_actor_manager/list_all

* fix comment
2021-07-29 19:29:28 -07:00
Amog Kamsetty
ff04a923ea
[SGD] v2 prototype: BackendExecutor and TorchBackend implementation (#17357)
* wip

* formatting

* increase timeouts

* wip

* address comments

* comments

* fix

* address comments

* Update python/ray/util/sgd/v2/worker_group.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/sgd/v2/worker_group.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* address comments

* formatting

* fix

* wip

* finish

* fix

* formatting

* remove reporting

* split TorchBackend

* fix tests

* address comments

* add file

* more fixes

* remove default value

* update run method doc

* add comment

* minor doc fixes

* lint

* add args to BaseWorker.execute

* address comments

* remove extra parentheses

* properly instantiate backend

* fix some of the tests

* fix torch setup

* fix type hint

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-07-29 14:38:44 -07:00
Kai Fricke
2ae6b944a2
[release tests] limit number of results fetched for alerting (#17430) 2021-07-29 18:43:44 +01:00
Tao Wang
411c49746d
Remove deprecated HEARTBEAT table (#17405)
* Remove deprecated HEARTBEAT table

* incr by 1
2021-07-29 10:14:59 -07:00
Jiao
3dc49c0b79
[serve] Add multi deployment to serve nightly tests (#17411)
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-29 11:47:58 -05:00
Kai Fricke
44d209dd5f
[tune] re-enable tensorboardx without torch installed (#17403) 2021-07-29 10:39:38 +01:00
kimikuri
93172b535f
[doc][sgd] Broken Link in SGD's page. (#17404) (#17423) 2021-07-29 01:13:23 -07:00
xwjiang2010
93d12b1b5e
[CLI] Fix ray submit when --stop is supplied. (#17385)
* [cli] Fix `ray submit` when --stop is supplied.

* syntax sugar.
2021-07-29 00:01:25 -07:00
Eric Liang
7ed62ea0ad
Initial implementation of Dataset pipelining and docs (#17309) 2021-07-28 21:12:01 -07:00
SangBin Cho
65c0c3f3a4
[Test] Fix releaser bug (#17418)
* Fix a bug

* done
2021-07-28 18:15:00 -07:00
Eric Liang
9b4bcb3bc2
[hotfix] Fix merge conflict that caused test_dataset to fail. 2021-07-28 14:58:31 -07:00
matthewdeng
a2f23e5433
[docs] update docs with pip requirements (#17317) 2021-07-28 14:26:40 -07:00
Edward Oakes
7007c6271d
[runtime_env] Gracefully fail tasks when an environment fails to be set up (#17249) 2021-07-28 15:25:02 -05:00
Yi Cheng
72abf81900
[gcs] Fix GCS related issues: ByteSizeLong and redis connection (#17373) 2021-07-28 13:01:54 -07:00
kk-55
a7f8dc9d77
[RLlib] New and changed version of parametric actions cartpole example + small suggested update in policy_client.py (#15664) 2021-07-28 15:25:09 -04:00
Eric Liang
4ffa549041
Support schema on read for csv/json (#17354) 2021-07-28 10:59:52 -07:00
Julius Frost
d7a5ec1830
[RLlib] SAC tuple observation space fix (#17356) 2021-07-28 12:39:28 -04:00
Jiao
2618236167
[serve] Fix single deployment nightly test (#17368) 2021-07-28 11:38:06 -05:00
Simon Mo
db126b24b9
[Serve] Fix response_model for class based view routes as well (#17376) 2021-07-28 09:31:02 -07:00
amavilla
f2d9b1f2b9
[docs] Link broken in Tune's page (#17394) (#17407) 2021-07-28 09:27:54 -07:00
Sven Mika
0d8fce8fd8
[RLlib] Discussion 2294: Custom vector env example and fix. (#16083) 2021-07-28 10:40:04 -04:00
Rohan138
f30b444bac
[Rllib] set self._allow_unknown_config (#17335)
Co-authored-by: Sven Mika <sven@anyscale.io>
2021-07-28 11:48:41 +01:00
Antoni Baum
1f35470560
[autoscaler] GCP TPU VM autoscaler (#17278) 2021-07-27 21:24:29 -07:00
Sven Mika
58da5c1c9b
[RLlib] Discussion 3001: Fix comment on internal state shape (must be [B x S=state dim]). (#17341) 2021-07-27 21:41:53 -04:00
Amog Kamsetty
d01e1c15c8
[SGD] v2 prototype: `WorkerGroup` implementation (#17330)
* wip

* formatting

* increase timeouts

* address comments

* comments

* fix

* address comments

* Update python/ray/util/sgd/v2/worker_group.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/sgd/v2/worker_group.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* address comments

* formatting

* fix

* avoid race condition

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-07-27 17:36:38 -07:00
Simon Mo
4a4210a083
Support streaming output of runtime env setup to logger/driver (#17306) 2021-07-27 16:39:15 -07:00
Edward Oakes
7225f28fff
[serve] Add Ray API stability annotations (#17295) 2021-07-27 16:00:15 -05:00
DK.Pino
2699b0f3ab
[Placement Group] Fix resource index assignment between with bundle index and without bundle index pg (#17318) 2021-07-27 13:51:02 -07:00
SangBin Cho
e1cd8580a0
[Test] Add various fixes to the nightly dashboard to improve signals (#17351)
* Add various fixes to the nightly dashboard to improve signals

* Fix issues
2021-07-27 12:37:11 -07:00
Alex Wu
5879e3132e
[Dataset] Support compressed files (#17355)
* .

* lint

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-07-27 12:35:16 -07:00
Sven Mika
90b21ce27e
[RLlib] De-flake 3 test cases; Fix config.simple_optimizer and SampleBatch.is_training warnings. (#17321) 2021-07-27 14:39:06 -04:00
Eric Liang
e70d84953e
[hotfix] Dataset tests accidentally disabled 2021-07-27 10:40:15 -07:00
Jiao
9eb1bcd061
[serve] Multi & single deployment large scale test (#17310) 2021-07-27 10:46:45 -05:00
Frank Luan
a6e8497dc9
[Dataset] Sort (#17142) 2021-07-27 01:53:53 -07:00
fyrestone
57b9b1bb0f
[Dashboard] Use a dedicated RPC to check the GCS is alive (#16330)
* Dashboard check gcs is alive

* Fix dashboard hangs at exit

* ray health-check call GCS CheckAlive

* Minor fixes

Co-authored-by: 刘宝 <po.lb@antfin.com>
2021-07-27 14:05:44 +08:00
Richard Liaw
597dc08dfe
Revert "Revert "[core] remove opencensus/prometheus_exporter dependencies"" (#17254)
* Revert "Revert "[core] remove opencensus/prometheus_exporter dependencies" (#17251)"

This reverts commit 7b44dd8ecb.

* Lint

* Fix more imports

Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-07-26 21:09:25 -07:00