Commit graph

4902 commits

Author SHA1 Message Date
xwjiang2010
5d68657246
[Tune] Sanitize trial checkpoint filename. (#17985) 2021-08-24 10:08:36 -07:00
Chen Shen
3a04cb0d73
fix windows test failures (#18022) 2021-08-24 09:28:51 -07:00
Antoni Baum
1f8ce1ede8
[tune] Explicitly instantiate skopt categorical spaces (#18005) 2021-08-24 17:11:21 +02:00
Qing Wang
7c1f14ddd8
Do not connect in constructor to avoid potential risk. (#17916)
* Do not connect in ctor.

* Fix lint.

Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
2021-08-24 16:41:30 +08:00
dependabot[bot]
15adedc72c
[tune](deps): Bump sigopt in /python/requirements/tune (#17996)
Bumps [sigopt](https://sigopt.com/) from 7.4.0 to 7.5.0.

---
updated-dependencies:
- dependency-name: sigopt
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-23 15:18:59 -07:00
dependabot[bot]
f97a292867
[tune](deps): Bump dask[complete] in /python/requirements/tune (#17997)
Bumps [dask[complete]](https://github.com/dask/dask) from 2021.06.1 to 2021.8.1.
- [Release notes](https://github.com/dask/dask/releases)
- [Changelog](https://github.com/dask/dask/blob/main/docs/release-procedure.md)
- [Commits](https://github.com/dask/dask/compare/2021.06.1...2021.08.1)

---
updated-dependencies:
- dependency-name: dask[complete]
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-23 15:18:37 -07:00
Yi Cheng
5849f80e41
[Core] Fix typo of actor repr (#18011) 2021-08-23 14:33:51 -07:00
Edward Oakes
1a61082ed4
[serve] Remove deprecated endpoints code (#17989) 2021-08-23 13:53:09 -07:00
Amog Kamsetty
4c384df526
fix wheel links (#17973) 2021-08-23 13:43:34 -07:00
chenk008
b9978dd02b
[Core] revert: revert Unified worker starter (#18008) 2021-08-23 13:34:32 -07:00
Yi Cheng
fd71bde9b4
[client] Allow multiple client connections from one driver (#17942) 2021-08-23 13:01:58 -07:00
Dmitri Gekhtman
13d5d0f9ef
[autoscaler][hotfix] Update node list after terminating unhealthy nodes (#17992)
* Update nodes; update test.

* consistency

* lint
2021-08-22 18:22:10 -04:00
Clark Zinzow
5ca28b1cc8
[Core] Update Bazel (to 3.4.1), gRPC, boringssl, and absl as a precursor to gRPC streaming PR. (#17903)
* Update Bazel (to 3.4.1), gRPC, boringssl, absl.

* Always reinstall Bazel if needing to upgrade to a new Bazel version.

* Add patch for properly detecting Windows Python headers when building gRPC.

* Add minimum Bazel version check.

* Update docs with new Bazel version.
2021-08-21 11:33:11 -07:00
Edward Oakes
b969aa3c80
[dashboard] Don't start dashboard agent when missing dependencies (#17966) 2021-08-21 01:04:21 -07:00
Eric Liang
58e35a21b4
Add split_at_indices() (#17990) 2021-08-20 15:35:22 -07:00
Chen Shen
dac1ba632e
[usability][rfc] ray status show demand summary by default (#17892) 2021-08-20 15:29:37 -07:00
Chris K. W
e3fb9650b2
[Client] Skip client object ref, actor handle, and actor ref dealloc/del if client package has already been cleaned up (#17969) 2021-08-20 15:18:43 -07:00
Edward Oakes
3ea5c0dc6b
[serve] Remove deprecated routing code (ServeStarletteRouter) (#17986) 2021-08-20 16:56:45 -05:00
Simon Mo
8236b7412e
[Serve] Mark serve.start beta API (instead of stable) (#17956) 2021-08-20 16:36:48 -05:00
Chen Shen
3dbb2e0020
change the way test run (#17930) 2021-08-20 11:26:16 -07:00
Edward Oakes
30541025e5
[serve] Remove deprecated APIs from code & docs (#17754) 2021-08-20 11:59:45 -05:00
Stephanie Wang
b8fe776638
[core] Fix inlined nested ids (#17834)
* test

* Use ObjectRef instead of ObjectID in nested refs

* java

* doc

* java

* build

* build

* x

* lint

* simplify

* fix
2021-08-20 08:58:29 -07:00
Amog Kamsetty
9416fce91b
[SGD] v2 Tune integration + iterator API (#17839)
* [SGD] implement SGD Trainer.to_tune_trainable

* address some comments

* add RESULT_DUPLICATE

* extract trainable creation logic out of Trainer

* add 1 CPU for driver

* use class attribute to fix serialization issues

* add examples

* add test for tune error

* tune

* test tune_linear

* run_iterator

* add to build file

* Update python/ray/util/sgd/v2/trainer.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Update python/ray/util/sgd/v2/trainer.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* address comments

* fix tests & address comments

* resolve merge

* lint

* fix

* add team tag to tests

* fix tests

* lint

Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
2021-08-20 08:31:21 -07:00
simonsays1980
60aee4a330
[RLlib] Add example script for bare metal Policy with custom view_requirements. (#17896) 2021-08-20 12:17:13 +02:00
Jingyu-Peng
40330ca439
Fix loading dynamic functions/classes when using code_search_path (#17605) 2021-08-20 17:24:11 +08:00
Eric Liang
236b772465
Revert "[GCS] GCS Based Actor Scheduler (#16580)" (#17941)
This reverts commit a9b4545502.
2021-08-19 21:46:52 -07:00
Eric Liang
661ac4e37b
Remove last traces of ref-counting flag (#17932) 2021-08-19 21:08:13 -07:00
architkulkarni
36c26578a7
[runtime env] [test] Add nightly test to verify Ray wheel URLs are valid (#17938) 2021-08-19 15:48:37 -07:00
matthewdeng
d081ee9d87
[SGD v2] Save checkpoints to disk (#17807)
* [SGD] save checkpoints to disk

* fix test; add logs

* rename log_dir to logdir for consistency with tune

* address comments: add run level directories, add CheckpointConfig

* check for empty strings

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

* address comments - refactor CheckpointStrategy, remove run_dir and checkpoint_dir configurability

* fix Trainer docs

* Update python/ray/util/sgd/v2/checkpoint.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* remove construct_path_with_default

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-19 14:18:51 -07:00
Eric Liang
238941f857
Ray workflow comparison examples + add to tests (#17880) 2021-08-19 12:19:08 -07:00
architkulkarni
5ed3f0ce35
[Serve] [Dashboard] Add end times and DELETED state for endpoints (#17898) 2021-08-19 11:10:42 -05:00
souravraha
f5fcb3c576
Fixes bug #17424. (#17437) 2021-08-19 12:23:36 +02:00
Chong-Li
5e22257cec
[GCS] Fix: GCS Based Actor Scheduler (#17944) 2021-08-18 23:40:35 -07:00
Clark Zinzow
d958457d07
[Core] Second pass at privatizing APIs. (#17885)
* gcs_utils

* resource_spec

* profiling

* ray_perf and ray_cluster_perf

* test_utils
2021-08-18 20:56:33 -07:00
Simon Mo
b573864928
[CI] Add test owners (#17893) 2021-08-18 18:38:31 -07:00
Eric Liang
a9073d16f4
Revert "[Core] Unified worker initiators (#17401)" (#17935)
This reverts commit c3764ffd7d.
2021-08-18 18:06:24 -07:00
Chong-Li
a9b4545502
[GCS] GCS Based Actor Scheduler (#16580) 2021-08-18 13:44:59 -07:00
Yi Cheng
ddc2e59af5
[workflow] Simplify the workflow storage layer (#17883) 2021-08-18 13:26:50 -07:00
architkulkarni
7e109a3266
[hotfix] [runtime env] change MacOS wheel URL from 10_13 to 10_15 (#17902) 2021-08-18 09:16:09 +02:00
Eric Liang
5536c5fff6
Add namespace argument to Ray client get actor call (#17878) 2021-08-17 16:41:18 -07:00
SangBin Cho
4971e13941
[Build] Asan wheel test (#17685)
* in progress

* ASAN tests.

* d

* in progress

* in progress without the asan wheel

* Support the asan wheel.

* Support the asan wheels

* Not build a binary for asan

* Fix issues

* Remove a wrong build

* Separate out asan wheel build

* Try preparing more deps.

* ip

* Try different version

* done

* d

* Trial

* Another try

* Another try

* skip cpp build to see what happens

* add more des

* ip

* abc

* Try next

* completed

* try

* Try without static libasan

* dbg

* Try static link

* Fix issues

* abc
2021-08-17 10:21:41 -07:00
Hasan Genc
adc0c47b4f
Shutdown clusters on AWS with >1000 nodes (#17841)
* Revert "Revert "Shutdown clusters when large number of nodes (#17642)" (#17836)"

This reverts commit 6957ce66f6.

* Update unit test and fix terminate_nodes
2021-08-17 16:26:10 +03:00
chenk008
c3764ffd7d
[Core] Unified worker initiators (#17401)
* use setup_worker as starter

* use setup_worker as starter

* add java test

* fix

* fix

* lint

* sleep in ci

* sleep in ci

* fix ut

* fix

* fix

* fix

* fix

* fix

* fix

* change test size

* test

* fix

* fix

* fix ut

* restore sgd test

* change test size

* fix merge conflict

* restore cpp worker flag

* fix

* fix

* add worker-languange in setup_runtime_env.py

* lint

* fix java command

Co-authored-by: root <chenk008>
2021-08-17 19:37:26 +08:00
Guyang Song
8227e24424
[event] event framework integration in raylet, gcs server and core worker (#17671) 2021-08-17 11:21:23 +08:00
Hao Chen
ddb0dc8ad2
Fix client_test_enabled (#17699)
* Fix client_test_enabled

* fix

* trigger CI
2021-08-17 10:59:50 +08:00
Chen Shen
a9757a86b3
[Core] Fix nested ref count bug: add NestedIds to reference_counter once a task returns (#17802)
* add nested reference

* fix bug
2021-08-16 19:02:26 -07:00
Ian Rodney
8fe7111a7b
[Client] Bump Proto Version (#17879) 2021-08-16 17:08:36 -07:00
Yi Cheng
03a82d733a
Revert "Revert "Export useful metrics"" (#17755)
* Revert "Revert "[Observability] Export useful metrics (#17578)" (#17752)"

This reverts commit 02e79f3fe5.

* Update metric.h

* up

* up

* Update server_call.h

* Update test_metrics_agent.py

* up

* fix comment
2021-08-16 17:05:56 -07:00
Ian Rodney
2f200e5c2b
[Client] Pass ray.init() args to the remote server (#17776) 2021-08-16 12:34:01 -07:00
architkulkarni
7d690e7231
[serve] Add deployment and replica tags to logs (#17830) 2021-08-16 11:00:39 -05:00