Commit graph

4401 commits

Author SHA1 Message Date
Kai Yang
81be461ba2
[Core] Limit starting workers with maximum_startup_concurrency per worker type (#16214) 2021-06-09 13:11:53 +08:00
Dmitri Gekhtman
41b2e569fb
[autoscaler] Don't rsync cluster state with local node provider (#16281) 2021-06-08 12:27:06 -07:00
Sven Mika
4b8dadccbd
[RLlib] Fix PR 16162: Having added sleep to _NextValueNotReady causes TD3 tests to become flakey. (#16309) 2021-06-08 07:27:02 -07:00
Chris K. W
c8e3ed9eec
[core] Use function_actor_manager.lock when deserializing (#16278)
* use function_actor_manager.lock when deserializing

* add comment and todo

* better comment

* fix comment
2021-06-08 00:13:42 -07:00
Alex Wu
6f5064b7ef
Use pytest not unittest (#16265)
* .

* done

* done

* .

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-07 12:26:56 -07:00
Alex Wu
9f8f108e3f
[deflek] Split test failure into test failure 4 (#16264)
* .

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-07 11:54:55 -07:00
Edward Oakes
418dd1e8b9
fix serve start namespace issue and add test (#16291) 2021-06-07 11:30:31 -07:00
Siyuan (Ryans) Zhuang
480e5e822e
Initial workflow API implementation (#16174) 2021-06-07 10:00:15 -07:00
architkulkarni
b88163f010
[Core] [runtime env] Fix injection of ray[default] (#16275) 2021-06-05 17:32:50 -05:00
architkulkarni
b3a0b97737
Revert "[Core] [runtime env] Inject ray[default] into pip dependencies (#16268)" (#16274)
This reverts commit e5fad4bc2d.
2021-06-05 21:26:19 +03:00
Eric Liang
ca861ee47f
update (#16270) 2021-06-05 11:16:01 -07:00
Dmitri Gekhtman
7d1e7a0d4f
[autoscaler] Fix local node provider (#16202)
* Don't override resources for local node provider.

* Wip

* Local node provider prep logic

* ../python/ray/autoscaler/local/defaults.yaml

* wip

* Fix example-full

* defaults comment

* wip

* head type max workers

* sync-state

* No docker

* Fix

* external head ip option

* wip

* move external_ip out of tags

* Update examples

* Update comment

* Skip local defaults

* Config test

* Test external ip

* Change ray start commands to what they were before

* missing yamls

* Fix test

* Remove scary Docker

* Fixes

* Extra test

* address comments

* fixes pre-single-node-type-attempt

* rewrite comment a bit

* One type

* fix

* get rid of pdb

* no placeholders

* fix

* worker nodes and head node optional during launch

* fix

* fix again

* config comment fixes

* mock -> aws, not local

* Update python/ray/autoscaler/_private/local/config.py

Co-authored-by: Ian Rodney <ian.rodney@gmail.com>

* second pop fixed

* Explanatory comments for config logic

* deprecation comments

* Update python/ray/autoscaler/_private/local/config.py

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>

* update test

* fix

* More descriptive name for local provider check

* Remove external-ip from example minimal and add a more detailed doc string.

* Make clearer the equivalence between a ray restart and non-empty ray-start commands

* extra comment

* Update python/ray/autoscaler/_private/local/node_provider.py

* Update python/ray/autoscaler/_private/commands.py

* Update python/ray/autoscaler/_private/commands.py

* Update python/ray/autoscaler/_private/util.py

* lint

* Update python/ray/autoscaler/_private/local/node_provider.py

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>

Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-06-05 19:29:19 +03:00
Chris K. W
2e11ac678f
[autoscaler] Additional Autoscaler Metrics (#16198) 2021-06-04 23:19:17 -07:00
architkulkarni
e5fad4bc2d
[Core] [runtime env] Inject ray[default] into pip dependencies (#16268)
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2021-06-05 00:22:33 -05:00
architkulkarni
6c99267972
[Core] [runtime_env] add get_release_wheel_url() (#16267) 2021-06-04 22:00:17 -05:00
Eric Liang
472fe46a75
skip on win (#16256) 2021-06-04 15:36:55 -07:00
Ian Rodney
ba14b1c538
skip test_delay_in_rewriting_environment (#16255) 2021-06-04 09:50:02 -07:00
Ian Rodney
799af7d7c0
[client] Better Error Messages (#16163) 2021-06-04 00:32:21 -07:00
Yi Cheng
dea178caac
[core] Convert the log from exception to warning for setup function (#16225) 2021-06-03 23:53:29 -05:00
architkulkarni
6be5ec8f39
[Core] [runtime env] Fix test_get_master_wheel_url (#16234) 2021-06-03 23:09:43 -05:00
architkulkarni
2d26c6ea5d
[Core] suppress unpickle error for function broadcasting (#16223) 2021-06-03 22:59:24 -05:00
architkulkarni
0dcd996475
[core] [runtime_env] Remove opentelemetry from pip dependencies in test (#16233) 2021-06-03 22:18:47 -05:00
architkulkarni
4efbd9680e
[Core] [runtime env] use ray instead of ray[all] when adding ray dependency (#16229) 2021-06-03 21:29:49 -05:00
Eric Liang
608991999c
Fix release resources race that leads to extra worker launches (#16184) 2021-06-03 18:35:45 -07:00
Eric Liang
a9db4e62cb
Unlimited plasma allocations by falling back to a filesystem allocator (off by default) (#16097) 2021-06-03 18:35:09 -07:00
Simon Mo
b376fd3458
[Serve] Create default namespace when ray is not initialized (#16227) 2021-06-03 17:14:52 -07:00
Ian Rodney
acfb484f5c
[Tracing] Delay Opentelemetry Import to Avoid Pickling (#16177) 2021-06-03 12:07:12 -07:00
Ian Rodney
22bd7cebeb
[Client][Proxy] Prevent Logstream from Timing Out when Delays in DataClient (#16180) 2021-06-03 11:59:52 -07:00
Ian Rodney
206802b96f
[client] Fix ClientBuilder for Local Clusters (#16204)
* Fix client builder

* Make tests actually run in CI (required marking a few Windows tests as flaky)
2021-06-03 08:27:37 -07:00
mwtian
51da90aa09
Rollforward #f14f197 (#16201) 2021-06-03 07:17:58 -07:00
Eric Liang
43c97c2afb
Disable timeline events collection in Ray by default (#15989) 2021-06-02 18:04:29 -07:00
architkulkarni
46acac03ae
[Core] [runtime env] Fix bug where None is added to pip dependencies when built from source (#16173) 2021-06-02 18:12:59 -05:00
Ian Rodney
513196d71b
[Client] Make dir() work for ClientActorHandle (#16157) 2021-06-02 14:51:17 -07:00
Ian Rodney
ca3737c6aa
[Client][Test] Make "test_client_references.py" medium (#16193) 2021-06-02 13:13:31 -07:00
Alex Wu
9942505b63
Revert "[Client] Make Client{ObjectRef,ActorRef} subclasses of their server-side counterparts (#16110)" (#16196)
This reverts commit f14f197d42.
2021-06-02 10:31:01 -07:00
SangBin Cho
611da62739
Fix atof bug (#16140) 2021-06-02 10:25:25 -07:00
Stephanie Wang
ce25d4e896
[core] Record Plasma object sources and dump on out of memory (#16179)
* debug

* lint, build

* clean up logs

* fix build
2021-06-02 10:04:15 -07:00
Ian Rodney
4116c8c3f1
[ClientBuilder] Verify Module has ClientBuilder Class (#16076) 2021-06-02 09:19:44 -07:00
Ian Rodney
2e365f8797
[ClientBuilder] Code takes precedence over environment (#16112)
* no override address

* correct ordering
2021-06-02 13:10:15 +03:00
mwtian
f14f197d42
[Client] Make Client{ObjectRef,ActorRef} subclasses of their server-side counterparts (#16110)
* Implement ClientObjectRef and ClientActorID in cython

* fix doc

* Remove unnecessary declaration.
Add basic unit tests.

* Fix quotes.

* Skip tests on Windows
2021-06-01 23:45:41 +03:00
Amog Kamsetty
65f1d67e9c
[SGD] Ray Client Support and tests (#16111)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-01 13:21:26 -07:00
Travis Addair
050a076de9
[k8s] Refactored k8s operator to use kopf for controller logic (#15787)
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-06-01 12:00:55 -07:00
Kai Fricke
153a8b8fec
[release] convert tune release tests (#15913) 2021-06-01 11:19:15 -07:00
matthewdeng
7637654557
[tune] populate internal configs when creating Trainable through DistributedTrainableCreator (#16128)
* [tune] populate internal configs when creating Trainable through DistributedTrainableCreator

* create DistributedTrainable class

* Fix tests and docs

* fix formatting

* Update python/ray/tune/trainable.py

* make call to DistributedTrainable explicit

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-01 09:59:37 -07:00
Amog Kamsetty
04863d158a
[Tune] MLflow with Ray Client (#16029)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-01 09:50:44 -07:00
Dmitri Gekhtman
27c2f570f1
[kubernetes] pin the K8s config yamls to ray:latest instead of ray1.3 (#15988) 2021-06-01 19:12:35 +03:00
Chris Bamford
1e3721ef4a
[RLlib] Remove bad spinlocks to allow pytorch GPU scheduler to interrupt. (#16162) 2021-06-01 16:40:28 +02:00
Alex Wu
de0f856b68
[namespaces] Isolation for named placement groups (#16000) 2021-06-01 05:50:19 -07:00
SangBin Cho
bfa8ebcae9
[Test] Fix flaky global gc test (#16154)
* fast global gc to fix flaky test

* lint
2021-06-01 00:17:03 -07:00
Chris K. W
31364ed9b4
[autoscaler] Autoscaler metrics (#16066)
Co-authored-by: Ian <ian.rodney@gmail.com>
2021-05-31 22:27:45 -07:00