Commit graph

5079 commits

Author SHA1 Message Date
Eric Liang
c9ca980c83
Check dataset pipeline is not read multiple times by accident (#18682) 2021-09-16 20:33:24 -07:00
Amog Kamsetty
84e958f330
[ML] Consolidate and upgrade Deep Learning Dependencies (#18574)
* wip

* upgrade requirements

* add file

* fix

* fixes

* Apply suggestions from code review

Try mlagents==0.21.0 for now (works with torch 1.9).

* Apply suggestions from code review

* wip

* wip

* fix

* fix

* upgrade lightning bolts

* address comment

Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-16 20:16:40 -07:00
Amog Kamsetty
de050e8187
[SGD] v2 Class API (#18571)
* wip

* wip

* add horovod example

* add example

* lint

* fix

* address comments

* updates

* lint

* update example

* address comment

* address comment

* update

* fix

* Update python/ray/util/sgd/v2/examples/horovod/horovod_stateful_example.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* address comments

* add back name mangling

* fix tests

* Update python/ray/util/sgd/v2/trainer.py

* fix

* lint

* fix

* fix docstring

* Update python/ray/util/sgd/v2/tests/test_trainer.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* update

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-09-16 12:33:38 -07:00
Simon Mo
eeaae5aa08
Revert "[Serve] Add InMemoryMetricsStore for Autoscaling (#18458)" (#18675)
This reverts commit a024effac7.
2021-09-16 11:37:31 -07:00
Simon Mo
a024effac7
[Serve] Add InMemoryMetricsStore for Autoscaling (#18458) 2021-09-16 11:08:42 -07:00
Simon Mo
317a34c523
[Serve] Use BackendConfig Protobuf (#17835) 2021-09-16 11:08:23 -07:00
Edward Oakes
e7ea1f9a82
[runtime_env] Remove global logger from working_dir code (#18605) 2021-09-16 10:37:45 -05:00
Jernej Makovsek
b5c5247ad4
Update example yaml file for running local clusters (#18530) 2021-09-16 02:24:45 -07:00
xwjiang2010
ea48b1227f
[Tune] Do not crash when resources are insufficient. (#18611) 2021-09-15 23:00:53 -07:00
Stephanie Wang
be7cb70c30
[core] Fix ref counting during actor construction (#18646)
* test

* fix

* cpp

* skip windows

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-09-15 22:16:53 -07:00
Chris K. W
7df3441ae9
[client] Fix credential generation when secure=True but no credentials provided (#18636)
* set self._credentials if not provided

* fix credential generation
2021-09-16 00:37:33 +03:00
Antoni Baum
7e95f330d5
[ci] Fix xgboost_ray install from git (#18640) 2021-09-15 18:07:15 +01:00
Antoni Baum
d50ff16ccf
[ci] Fix HEBO breaking Tune tests (#18629) 2021-09-15 10:01:29 -07:00
Kai Fricke
0223ae9605
[xgboost] Bump xgboost_ray requirements_upstream.txt version to 0.1.3 (#18632) 2021-09-15 18:01:15 +01:00
Edward Oakes
7736cdd91d
[dashboard] Rename "new_dashboard" -> "dashboard" (#18214) 2021-09-15 11:17:15 -05:00
Edward Oakes
7d0a2b39e3
[runtime_env] Remove dynamically imported setup_hook (#18601) 2021-09-15 10:19:55 -05:00
Antoni Baum
eeb67a42cc
pip install xgboost_ray -> xgboost_ray[default] (#18607)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-09-15 14:45:56 +01:00
Sven Mika
8a00154038
[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. (#18544) 2021-09-15 08:46:37 +02:00
SangBin Cho
0684531e22
[Test] Break down placement group tests (#18612) 2021-09-14 21:55:18 -07:00
Chris K. W
cc1d7b8174
[client] Refactors for Reconnect PR (#18484)
* add refactors

* add worker annotation

* Regenerate credentials by default

* use self._secure

* infer secure if credentials provided

* separate _shutdown
2021-09-14 16:13:35 -07:00
Eric Liang
15512c27c2
Revert "Revert "Route core worker ERROR/FATAL logs to driver logs (#1… (#18604) 2021-09-14 13:32:07 -07:00
SangBin Cho
31e1638fb3
[CLI] Improve ray status for placement groups (#18289) 2021-09-14 11:29:13 -07:00
Edward Oakes
644f7bd7fa
[runtime_env] Remove no-longer-used mock setup function (#18600) 2021-09-14 11:35:09 -05:00
matthewdeng
380a653787
[SGD] update SGDv2 user guide docs (#18270)
* [SGD] update SGDv2 user guide docs

* Update doc/source/raysgd/v2/user_guide.rst

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

* add new line

* update docs

* fix header line length

* lint

* lint

* lint

* lint

* fix remaining lint issues

* Update doc/source/raysgd/v2/user_guide.rst

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>

* Update doc/source/raysgd/v2/user_guide.rst

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>

* address comments

* address comments

* add TODO for iterator API

* Update doc/source/raysgd/v2/user_guide.rst

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>

* address comments

* address comments

* add tune doc

* restructure table of contents

* add examples; rename example files to include example suffix

* add quick start, porting code

* address comments

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-09-14 09:07:25 -07:00
mwtian
a3f399ef10
[Client] fix propagating errors to async calls during disconnect, and other cleanup (#18539)
* cleanup tests and errors for clients

* Fix lock and async get

* rerun

* Avoid running callback under lock. Make lock non-reentrant

* Add all necessary apis

* Removed unused APIs
2021-09-14 18:48:27 +03:00
Edward Oakes
7f8cdce67d
Revert "Route core worker ERROR/FATAL logs to driver logs (#18577)" (#18602)
This reverts commit 3e0ae38e11.
2021-09-14 10:41:10 -05:00
Antoni Baum
65d5deae60
[tests] Increase golden notebook test timeout to 20 mins (#18554) 2021-09-14 16:27:56 +01:00
Jiao
d3734d803d
[serve] Change nightly test docker image and enable micro benchmark (#18566) 2021-09-14 09:41:21 -05:00
Jiao
18bbf044a7
[serve] Add reconfigure with exception test and ensure it can rollback (#18568) 2021-09-14 08:39:46 -05:00
Ameer Haj Ali
e6807ecb43
Change tests owners for ml tests (#18417) 2021-09-14 01:04:52 -07:00
Eric Liang
3e0ae38e11
Route core worker ERROR/FATAL logs to driver logs (#18577) 2021-09-13 23:07:14 -07:00
Yi Cheng
7d1f408de9
[workflow] Move experimental/workflow to workflow (#18521) 2021-09-13 17:45:18 -07:00
Stephanie Wang
284dee493e
[core][usability] Disambiguate ObjectLostErrors for better understandability (#18292)
* Define error types, throw error for ObjectReleased

* x

* Disambiguate OBJECT_UNRECONSTRUCTABLE and OBJECT_LOST

* OwnerDiedError

* fix test

* x

* ObjectReconstructionFailed

* ObjectReconstructionFailed

* x

* x

* print owner addr

* str

* doc

* rename

* x
2021-09-13 16:16:17 -07:00
Alex Wu
6479a5fcfc
[Workflow] dedupe download on recovery (#18564)
* maybe works

* ?

* .

* seems to work

* seems to work

* .

* .

* lint

* address comments

* .

* test

* test

* .

* works?

* cleanup

* cleanup

* cleanup

* cleanup

* fix test + cleanup

* lint

* .

* lint

* lint

* lint

* lint

* fix tests

* lint

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-09-13 15:30:47 -07:00
Edward Oakes
766d5526bb
[serve] Revert changes to serve::test_runtime_env (#18558) 2021-09-13 16:29:41 -05:00
Yi Cheng
e4d36f749d
[deflaky] Fix workflow storage test (#18536) 2021-09-13 12:47:30 -07:00
Clark Zinzow
a0fcc311ec
[Datasets] URL-encode paths if they are URLs. (#18440) 2021-09-13 12:46:21 -07:00
Chris K. W
5db1b2395e
ensure queue order matches req_id order (#18504) 2021-09-13 12:27:49 -07:00
Edward Oakes
111a31d6a1
[runtime_env] Make Ray client server setup go through the runtime_env agent (#18478) 2021-09-13 14:16:35 -05:00
Edward Oakes
3869af551a
[serve] Fix missing f-string in rollback message (#18561) 2021-09-13 14:16:04 -05:00
Eric Liang
a0336578a9
Update API stability annotations (#18452) 2021-09-13 11:00:16 -07:00
Jiao
1e26ca83ed
[serve] Add ability to recover replica state from actor names (#18293) 2021-09-13 11:44:29 -05:00
Kai Fricke
092a8a92f2
[tune] Improve HyperOpt KeyError message when metric was not found (#18549) 2021-09-13 17:02:20 +01:00
Guyang Song
3bc5f0501f
fix WaitPlacementGroupReady API (#18464) 2021-09-13 14:07:40 +08:00
Jiajun Yao
ae10a80d5e
Fix async actor worker process leak after calling ray.actor.exit_actor() (#18526) 2021-09-12 11:09:12 -07:00
Yi Cheng
15d67aa775
Support workflows step names via decorator (#18520) 2021-09-11 13:39:07 -07:00
Qing Wang
371f03fa48
Remove dynamic resource from client side. (#18514) 2021-09-11 10:39:59 -07:00
Chong-Li
d314d0c10e
[GCS] Fix the Windows build of GCS actor scheduling (#18012) 2021-09-10 17:17:25 -07:00
Alex Wu
1587eb22f0
[workflow] Dedupe object reference uploads (#18438)
* maybe works

* ?

* .

* seems to work

* seems to work

* .

* .

* lint

* address comments

* .

* test

* test

* .

* works?

* cleanup

* cleanup

* cleanup

* cleanup

* fix test + cleanup

* lint

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-09-10 16:08:11 -07:00
dependabot[bot]
30012c990f
[tune](deps): Bump matplotlib in /python/requirements/tune (#18025)
Bumps [matplotlib](https://github.com/matplotlib/matplotlib) from 3.4.2 to 3.4.3.
- [Release notes](https://github.com/matplotlib/matplotlib/releases)
- [Commits](https://github.com/matplotlib/matplotlib/compare/v3.4.2...v3.4.3)

---
updated-dependencies:
- dependency-name: matplotlib
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-10 16:00:16 -07:00