Commit graph

6846 commits

Author SHA1 Message Date
Hao Zhang
7e52351ae5
[Collective] Some necessary abstraction of collective calls before introducing stream management (#13162) 2021-01-05 16:20:12 -08:00
Basu Jindal
4e569ee20b
Update multi_agent_independent_learning.py (#13196)
pettingzoo.utils.error.DeprecatedEnv: waterworld_v0 is now depreciated, use waterworld_v2 instead
2021-01-05 13:44:54 -08:00
Edward Oakes
dc101fd087
[serve] Move controller state into separate files (#13204) 2021-01-05 14:37:16 -06:00
Edward Oakes
d738610dc9
Disable atexit test on windows (#13207) 2021-01-05 14:33:51 -06:00
Kai Fricke
96c2d3d2b5
[tune] better signature check for tune.sample_from (#13171)
* [tune] better signature check for `tune.sample_from`

* Update python/ray/tune/sample.py

Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
2021-01-05 08:04:18 -08:00
Edward Oakes
e8162f1b1f
[serve] Merge ActorReconciler and BackendState (#13139) 2021-01-05 09:56:22 -06:00
Hao Zhang
4150970226
[Collective][PR 2/6] Driver program declarative interfaces (#12874)
* scaffold of the code

* some scratch and options change

* NCCL mostly done, supporting API#1

* interface 2.1 2.2 scratch

* put code into ray and fix some importing issues

* add an additional Rendezvous class to safely meet at named actor

* fix some small bugs in nccl_util

* some small fix

* add a Backend class to make Backend string more robust

* add several useful APIs

* add some tests

* added allreduce test

* fix typos

* fix several bugs found via unittests

* fix and update torch test

* changed back actor

* rearrange a bit before importing distributed test

* add distributed test

* remove scratch code

* auto-linting

* linting 2

* linting 2

* linting 3

* linting 4

* linting 5

* linting 6

* 2.1 2.2

* fix small bugs

* minor updates

* linting again

* auto linting

* linting 2

* final linting

* Update python/ray/util/collective_utils.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/collective_utils.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/collective_utils.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* added actor test

* lint

* remove local sh

* address most of richard's comments

* minor update

* remove the actor.option() interface to avoid changes in ray core

* minor updates

Co-authored-by: YLJALDC <dal177@ucsd.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-04 20:57:37 -08:00
Tao Wang
c617291b27
[build] Update description and add some keywords (#13163) 2021-01-05 11:34:03 +08:00
Tao Wang
a0bbf2bfc2
Notify listeners after registered node stored (#13069) 2021-01-05 11:18:03 +08:00
fangfengbin
88eaa87e3a
Remove unused file (object_manager_integration_test.cc) (#12989) 2021-01-05 11:09:36 +08:00
Barak Michener
9643e44af6
[ray_client]: Move from experimental to util (#13176)
Change-Id: I9f054881f0429092d265cd6944d89804cce9d946
2021-01-04 17:51:56 -08:00
Eric Liang
dfb326d4b5
Surface object store spilling statistics in ray memory (#13124) 2021-01-04 17:35:39 -08:00
Stephanie Wang
b765914a1b
Revert "Enabling the cancellation of non-actor tasks in a worker's queue (#12117)" (#13178)
This reverts commit b4d688b4a6.
2021-01-04 17:27:48 -08:00
Amog Kamsetty
e181515dff
[SGD] Fix Docstring for as_trainable (#13173) 2021-01-04 17:21:24 -08:00
Amog Kamsetty
15e86581bd
[XGBoost] Update Documentation (#13017)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-04 17:21:04 -08:00
Siyuan (Ryans) Zhuang
46cf433f0e
[Core] Remove Arrow dependencies (#13157)
* remove arrow ubsan

* remove arrow build depend

* remove arrow buffer
2021-01-04 11:19:09 -08:00
Max Fitton
d018212db5
[Release] Update Release Process Documentation (#13123) 2021-01-04 11:09:43 -08:00
Raed Shabbir
d632b0f0f7
[Serve] Bug in Serve node memory-related resources calculation #11198 (#13061) 2021-01-04 11:04:59 -08:00
Gabriele Oliaro
b4d688b4a6
Enabling the cancellation of non-actor tasks in a worker's queue (#12117)
* wrote code to enable cancellation of queued non-actor tasks

* minor changes

* bug fixes

* added comments

* rev1

* linting

* making ActorSchedulingQueue::CancelTaskIfFound raise a fatal error

* bug fix

* added two unit tests

* linting

* iterating through pending_normal_tasks starting from end

* fixup! iterating through pending_normal_tasks starting from end

* fixup! fixup! iterating through pending_normal_tasks starting from end

* post merge fixes

* added debugging instructions, pulled Accept() out of guarded loop

* removed debugging instructions, linting
2021-01-04 09:52:29 -08:00
Clark Zinzow
c2bff64699
[Core] Locality-aware leasing: Milestone 1 - Owned refs, pinned location (#12817)
* Locality-aware leasing for owned refs (pinned locations).

* LessorPicker --> LeasePolicy.

* Consolidate GetBestNodeIdForTask and GetBestNodeIdForObjects.

* Update comments.

* Turn on locality-aware leasing feature flag by default.

* Move local fallback logic to LeasePolicy, move feature flag check to CoreWorker constructor, add local-only lease policy.

* Add lease policy consulting assertions to the direct task submitter tests.

* Add lease policy tests.

* LocalityLeasePolicy --> LocalityAwareLeasePolicy.

* Add missing const declarations.

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Add RAY_CHECK for raylet address nullptr when creating lease client.

* Make the fact that LocalLeasePolicy always returns the local node more explicit.

* Flatten GetLocalityData conditionals to make it more readable.

* Add ReferenceCounter::GetLocalityData() unit test.

* Add data-intensive microbenchmarks for single-node perf testing.

* Add data-intensive microbenchmarks for simulated cluster perf testing.

* Remove redundant comment.

* Remove data-intensive benchmarks.

* Add locality-aware leasing Python test.

* Formatting changes in ray_perf.py.

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2021-01-04 09:49:08 -08:00
Dmitri Gekhtman
31453621ef
[kubernetes][docs][minor] Kubernetes version warning (#13161) 2021-01-04 10:29:17 -06:00
architkulkarni
a95275bdd9
[Serve] [Doc] Add existing web server integration ServeHandle tutorial (#13127) 2021-01-04 10:28:34 -06:00
Ameer Haj Ali
61c3b6d3bf
[docs] Small fix in C++ documentation. (#13154)
* prepare for head node

* move command runner interface outside _private

* remove space

* Eric

* flake

* min_workers in multi node type

* fixing edge cases

* eric not idle

* fix target_workers to consider min_workers of node types

* idle timeout

* minor

* minor fix

* test

* lint

* eric v2

* eric 3

* min_workers constraint before bin packing

* Update resource_demand_scheduler.py

* Revert "Update resource_demand_scheduler.py"

This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.

* reducing diff

* make get_nodes_to_launch return a dict

* merge

* weird merge fix

* auto fill instance types for AWS

* Alex/Eric

* Update doc/source/cluster/autoscaling.rst

* merge autofill and input from user

* logger.exception

* make the yaml use the default autofill

* docs Eric

* remove test_autoscaler_yaml from windows tests

* lets try changing the test a bit

* return test

* lets see

* edward

* Limit max launch concurrency

* commenting frac TODO

* move to resource demand scheduler

* use STATUS UP TO DATE

* Eric

* make logger of gc freed refs debug instead of info

* add cluster name to docker mount prefix directory

* grrR

* fix tests

* moving docker directory to sdk

* move the import to prevent circular dependency

* small fix

* ian

* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running

* small fix

* deflake test_joblib

* lint

* placement groups bypass

* remove space

* Eric

* first commit

* lint

* example

* documentation

* hmm

* file path fix

* fix test

* some format issue in docs

* modified docs

Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
Co-authored-by: root <root@ip-172-31-56-188.us-west-2.compute.internal>
2021-01-02 11:47:06 -08:00
fangfengbin
456d08ad40
Deprecate setResource java api (#13117) 2021-01-02 12:17:45 +08:00
Ameer Haj Ali
27cbac576d
[docs] Minor change to formatting C++ docs. (#13151) 2021-01-01 19:43:59 -08:00
Qing Wang
d3dd5b87ce
[Java] Support wasCurrentActorRestarted in actor task. (#13120)
* Remove check.

* Add test

* fix lint

* lint

* Fix spotless lint

* Address comments.

* Fix lint

Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
2021-01-02 11:31:08 +08:00
Ameer Haj Ali
710615c228
[docs] Documentation + example for the C++ language API (#13138) 2021-01-01 18:18:41 -08:00
Sven Mika
9eba1871bb
[RLlib] Support easy use_attention=True flag for using the GTrXL model. (#11698) 2021-01-01 14:06:23 -05:00
Dmitri Gekhtman
4ca64549e2
[docs][kubernetes][minor] Update K8s examples in docs (#13129) 2020-12-31 16:25:38 -06:00
Simon Mo
fece8db70d
[Serve] Use a small object to track requests (#13125) 2020-12-31 11:43:03 -08:00
Edward Oakes
ef6d859e9b
[dashboard] Fix RAY_RAYLET_PID KeyError on Windows (#12948) 2020-12-31 10:54:40 -06:00
Ian Rodney
acb082fc47
[serve] Async controller (#13111) 2020-12-31 10:51:33 -06:00
Amog Kamsetty
7120f3a6ab
[Tune] Update URL to fix 403 not found error in PBT transformers test case (#13131) 2020-12-31 10:45:57 -05:00
Qing Wang
f5412c0417
[Java] Avoid failure of serializing a user-defined unserializable exception. (#13119) 2020-12-31 19:47:35 +08:00
Sven Mika
8726521604
[RLlib] JAXPolicy prep PR #2 (move get_activation_fn (backward-compatibly), minor fixes and preparations). (#13091) 2020-12-30 22:30:52 -05:00
fyrestone
6a54897577
Job module without submission (#13081)
Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-12-31 11:12:17 +08:00
Sven Mika
391cdfae8c
[RLlib] Trajectory view API docs. (#12718) 2020-12-30 17:32:21 -08:00
Sven Mika
28ac4243f4
[RLlib] Deflake test case: 2-step game MADDPG. (#13121) 2020-12-30 18:37:37 -05:00
Max Fitton
25f7bdc0d8
[Bugfix][Dashboard] Fix undefined logCount, errorCount UI crash (#13113) 2020-12-30 14:19:56 -06:00
Michael Luo
42cd414e5b
[RLlib] New Offline RL Algorithm: CQL (based on SAC) (#13118) 2020-12-30 10:11:57 -05:00
chaokunyang
33089c44e2
Fix streaming ci failure (#12830) 2020-12-30 10:45:52 +08:00
Sumanth Ratna
59e9b80903
[Doc] Fix Sphinx.add_stylesheet deprecation (#13067) 2020-12-29 16:35:40 -08:00
Michael Luo
eae7a1f433
[RLlib] Readme.md Documentation for Almost All Algorithms in rllib/agents (#13035) 2020-12-29 18:45:55 -05:00
Sven Mika
d811d65920
[RLlib] run_regression_tests.py: --framework flag (instead of --torch). (#13097) 2020-12-29 15:27:59 -05:00
architkulkarni
032a6546d5
Serve metrics docs (#13096) 2020-12-29 14:03:34 -06:00
Ameer Haj Ali
44483f465c
[autoscaler] Make placement groups bypass max launch limit (#13089) 2020-12-29 10:06:11 -08:00
Eric Liang
5a4e50c9d9
Disable broken streaming tests (#13095) 2020-12-29 00:58:18 -08:00
Ian Rodney
7ad56826db
[docker] Fix restart behavior with Docker (#12898)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: ijrsvt <ilr@anyscale.com>
2020-12-28 18:56:28 -08:00
chaokunyang
d1dd3410c8
[Java] Format ray java code (#13056) 2020-12-29 10:36:16 +08:00
architkulkarni
cc1c2c3dc9
[Serve] Use ServeHandle in HTTP proxy (#12523) 2020-12-28 18:33:42 -08:00