Commit graph

8549 commits

Author SHA1 Message Date
Amog Kamsetty
36fd741e6f
Revert "Enable TryCreateImmediately to use the fallback allocation" (#16542)
This reverts commit 41cf2e3d50.
2021-06-18 12:22:18 -07:00
Amog Kamsetty
bd3cbfc56a
Revert "[RLlib] Allow policies to be added/deleted on the fly. (#16359)" (#16543)
This reverts commit e78ec370a9.
2021-06-18 12:21:49 -07:00
architkulkarni
54d66ac637
[Core] iterate over entire dispatch queue instead of returning when worker unavailable (#16535) 2021-06-18 13:25:45 -05:00
Frank Luan
7588938e3c
Sorting benchmark (#16327)
* [WIP] Sorting benchmark

* Separate num_mappers and num_reducers

* Add tests

* Fix tests

* flake8

* flake8

* yapf

* Skip test on Windows

* Fix OS X test failure; test Windows again

* oops
2021-06-18 10:54:18 -07:00
Eric Liang
41cf2e3d50
Enable TryCreateImmediately to use the fallback allocation 2021-06-18 10:49:34 -07:00
Simon Mo
38b5fe7e51
[Buildkite] Add rest of the Python tests (#16517) 2021-06-18 11:18:05 -05:00
Sven Mika
2900a06dd7
[RLlib] Issue 14503: SAC not allowing custom action distributions. (#16427) 2021-06-18 17:27:29 +02:00
architkulkarni
6498ca3995
[Core] [runtime env] Don't delete working_dir from runtime env (#16475) 2021-06-18 10:15:20 -05:00
Chris K. W
a2c842ee3c
[Client] Add separate error message if dataclient has disconnected before a request is sent (#16508)
* Add earlier error message

* Adjust error message
2021-06-18 08:06:25 -07:00
Kai Fricke
172d33be02
[tune] Use unbuffered training when checkpoint_at_end is used. (#16504) 2021-06-18 14:19:14 +01:00
Kai Fricke
aecc4c8d28
[release] fix sgd base image, microbenchmark timeout, revert xgboost train_small to not use connect (#16532) 2021-06-18 11:40:04 +01:00
Kai Fricke
e13f0a4d91
[tune] Add option to keep random values constant over grid search (#16501) 2021-06-18 11:30:27 +01:00
Chris K. W
c91a1b1f92
[Client] Add warnings when user schedules many tasks with ray client (#16454)
* Add warnings when user schedules many tasks with ray client

* add test_client_warnings to BUILD

* better variable names

* use util.debug.log_once()

* batching -> explanation of batching

* Switch to warnings.warn

* Add links to Ray Design Pattern doc with code snippets

* Cleaner linking and refer to sections directly

* Better testNoWarning

* add sys.exit(pytest.main(...))

* Update python/ray/util/client/worker.py

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>

* Update python/ray/util/client/worker.py

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>

* better error messages

* Switch links to new readthedocs sections

* Revert "Switch links to new readthedocs sections"

This reverts commit d3785bf50459d89fb3f13966a030e954799309a8.

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-06-18 13:17:37 +03:00
Sven Mika
e78ec370a9
[RLlib] Allow policies to be added/deleted on the fly. (#16359) 2021-06-18 10:31:30 +02:00
Antoni Baum
d71ec6e874
[docs] Add examples of new features to contribute (#16477) 2021-06-18 00:07:03 -07:00
Stephanie Wang
5eb51c8b26
[core] Make object directory robust to out-of-order updates (#16314)
* Sequence ops

* id

* fix

* lint
2021-06-17 20:40:35 -07:00
Hao Zhang
f47a0e1f27
[Collective] generate ray.util.collective doc (#16521) 2021-06-17 18:41:57 -07:00
architkulkarni
76d602363b
update URL for boost 1.71.0 in bazel setup (#15991) 2021-06-17 12:34:35 -07:00
Kai Fricke
5352b786b3
[docs] Add ray design patterns and antipatterns to docs (#16478) 2021-06-17 19:56:44 +01:00
Richard Liaw
ed093cebb0
Revert "[docs] readthedocs.yaml and remove requirements-rtd.txt (#16482)" (#16512)
This reverts commit c9537be5c1.
2021-06-17 11:50:30 -07:00
Alex Wu
6696c0c165
Revert "[Placement Group] Support infeasible placement groups for Placement Group. (#16188)" (#16509)
This reverts commit 7f91cfedd5.
2021-06-17 11:04:01 -07:00
Richard Liaw
c9537be5c1
[docs] readthedocs.yaml and remove requirements-rtd.txt (#16482) 2021-06-17 09:40:06 -07:00
architkulkarni
8d9a41af55
[Core] [runtime env] Merge actor/task's runtime env with JobConfig's runtime env (#16378) 2021-06-17 11:20:32 -05:00
Antoni Baum
f8e9f171df
[tune] Add add_evaluated_point method (#16485) 2021-06-17 11:30:48 +01:00
Michael Galarnyk
524517a14f
[docs][minor] Update ray-libraries.rst (#16387)
Alphabetical order is a good thing
2021-06-17 01:37:02 -07:00
Kai Fricke
e547a27944
[tune] Track live trials in a set in the TrialRunner to reduce linear scans (#15811) 2021-06-17 01:36:07 -07:00
Abhishek Malvankar
85bc1b2979
[docs] ray LSF integration (#16438) 2021-06-17 01:35:55 -07:00
Alex Wu
1df19a04fe
Job timestamp should always be in milliseconds (#16455)
* .

* .

* .

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-17 00:05:55 -07:00
Tao Wang
2523072a3d
[large scale]Use gcs client instead of redis client to increase job id (#16190)
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
2021-06-17 15:01:32 +08:00
SangBin Cho
6dc4032d19
Set the 500GB block device for a single node test (#16493) 2021-06-16 22:37:30 -07:00
DK.Pino
7f91cfedd5
[Placement Group] Support infeasible placement groups for Placement Group. (#16188)
* init

* update comment

* update logical

* ut failing

* compile passing

* add ut

* lint

* fix comment

* lint

* fix ut and typo

* fix ut and typo

* lint

* typo
2021-06-16 21:48:39 -07:00
Alex Wu
45357ff590
[core] Fix multi-node placement group/job config bugs (#16345)
* .

* .

* seems to work?

* seems to work?

* .

* implement delete

* implement delete

* .

* tests

* .

* .

* .

* fix

* .

* .

* .

* .

* fix

* fix

* bump timeout

* bump timeout

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-16 21:12:20 -07:00
Dmitri Gekhtman
74bd332d88
Sidestep temp permissions issue when writing cluster address (#16473)
* not-enough

* comments

* tweak

* fix

* add-test-why-not

* fix
2021-06-16 21:11:08 -07:00
Maxim Egorushkin
742da5e68b
[tune] get_checkpoints_paths .tune_metadata file search bug fixed (#16396)
Co-authored-by: Maxim Egorushkin <maxim.egorushkin@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-16 20:48:41 -07:00
Eric Liang
3209084213
Fix fd reuse errors with plasma fallback allocation (#16451) 2021-06-16 19:28:23 -07:00
Amog Kamsetty
b986938f0f
Revert "[Pubsub] Use a pubsub module for Ownership based object directory (#16407)" (#16486)
This reverts commit 90599d3562.
2021-06-16 15:38:11 -07:00
Tao Wang
1a1b0da8c9
Run fn in specified io service completely (#15539) 2021-06-16 14:53:17 -07:00
Kai Fricke
9352cb781c
[release tests] Fix microbenchmark base image, network overhead cluster wait time, add long running tests (#16355) 2021-06-16 21:37:17 +01:00
Jiao
c6436ba7d6
[Serve] Add ray serve's logging context manager (#16468)
* Add ray serve's logging context manager

* Add ray serve's logging context manager

run formatting script scripts/format.sh

* fix missing package-lock json file

* linter

Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-06-16 13:17:07 -07:00
Clark Zinzow
00eb833de2
[Core] Stopgap fix for async actor lost object bug, and adds reproduction as test. (#16414)
* Support asyncio with max_concurrency == 1.

* Added test that reproduces lost object error.

* Create a fiber thread per caller instead of sharing a fiber thread among all callers.

* Formatting.

* Remove debug print statement.

* Try to accommodate dumb stupid linter that apparently doesn't know that async list comprehensions landed in Python 3.6, let alone await in list literals.
2021-06-16 12:39:45 -07:00
SangBin Cho
5997d19a5a
[Test] Global gc unit test flakiness fix (#16471) 2021-06-16 09:26:04 -07:00
SangBin Cho
90599d3562
[Pubsub] Use a pubsub module for Ownership based object directory (#16407)
* in progress

* In progress 2

* progress

* OBOD pubsub done

* Fix

* Fix a bug.

* Clean up getObjectLocationOwner

* Fix a build issue.

* Lint issue.

* test fix in progress

* continue debugging

* in progress

* Fix issues again.

* Formatting

* formatting

* fix issues.

* Revert "fix issues."

This reverts commit 2da577e68abc6278e03d64a60e8b96c3136145bf.

* Fix a critical bug.

* Revert "Revert "fix issues.""

This reverts commit 6546ecbd1eb9798de0bf990b30b85a3ca3e5b4ad.

* Addressed code review.
2021-06-16 09:15:13 -07:00
mwtian
2f7d535253
[Test] Use Ray client in XGBoost train_small release test (#16319) 2021-06-16 14:39:32 +01:00
qicosmos
0f87eca3e9
[C++ Worker]Generate a template project for users (#16337) 2021-06-16 17:45:45 +08:00
Ian Rodney
90805d302f
[Client] Fix ArgParse (#16456)
Co-authored-by: Ian Rodney <ilr@anyscale.com>
2021-06-15 23:52:02 -07:00
Antoni Baum
ec7d7c8630
[Tune] Add soft imports test (#16450) 2021-06-15 18:50:21 -07:00
Eric Liang
5967cd3cf3
Make placement_group=None work as expected. (#16437)
* update

* add task test

* fix
2021-06-15 18:30:53 -07:00
Antoni Baum
2fb10e6730
[SGD] Add support for native Torch AMP in SGD (#16382)
* SGD native AMP initial commit

* SGD native amp second pass

* Update docs

* Update TorchTrainer doc

* Temp fix release test

* Update release/sgd_tests/sgd_gpu/sgd_gpu_app_config.yaml

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-06-15 17:48:21 -07:00
Amog Kamsetty
ca22df2367
[Dask] Re-enable scheduler on dask_shuffle example (#16405) 2021-06-15 17:47:57 -07:00
Amog Kamsetty
d23494d25a
[CI] Move test_shuffle to Medium tests (#16447)
* move

* unskip test
2021-06-15 17:45:54 -07:00