architkulkarni
e8c25a2fa4
[Core] [runtime env] Merge child's runtime_env["env_vars"] with that of parent ( #16553 )
2021-06-24 12:13:13 -05:00
Simon Mo
aabdfe2989
[Serve] Fix HTTP headers ( #16647 )
2021-06-24 11:59:43 -05:00
Amog Kamsetty
53d16365b0
[Release] Convert Horovod and SGD release tests ( #15999 )
2021-06-24 15:56:02 +01:00
Kai Fricke
ef97bdd407
[release] Fix app config: Install latest releases. Bump xgboost-ray version ( #16581 )
2021-06-24 12:56:21 +01:00
Gabriele Oliaro
3e2f608145
Work stealing! ( #15475 )
...
* work_stealing one commit squash
* using random task id to request workers
* inlining methods in direct_task_transport.h
* faster checking for presence of stealable tasks in RequestNewWorkerIfNeeded
* linting
* fixup! using random task id to request workers
* estimating number of tasks to steal based only on tasks in flight
* linting
* fixup! linting
* backup of changes
* fixed issue in scheduling queue test after merge
* linting
* redesigned work stealing. compiles but not tested
* all tests passing locally
* fixup! all tests passing locally
* fixup! fixup! all tests passing locally
* fixed big bug in StealTasksIfNeeded
* rev1
* rev2 (before removing the work_stealing param)
* removed work_stealing flag, fixed existing unit tests
* added unit tests; need to figure out how to assign distinct worker ids in GrantWorkerLease
* fixed work stealing test
* revisions, added two more unit/regression tests
* test
2021-06-23 17:08:28 -07:00
Frank Luan
9249287a36
Object spilling threshold ( #16558 )
...
* Object spilling threshold
* clang-format
* Make tests more lenient
* Fix tests
* Fix tests
* Address comments
* Fix tests lint
* Refactor
* Fix tests
* Fix cpp tests
* Address comments
2021-06-23 16:54:41 -07:00
SangBin Cho
f816f613c7
[Test] Handle flaky tests ( #16602 )
...
* Handle flaky tests.
* lint
* tag more
* add test_scheduling
* Remove global gc
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-06-23 16:24:12 -07:00
Amog Kamsetty
b9e5ca4c18
[tune] Deflake mnist_ptl_mini ( #16555 )
2021-06-23 14:26:40 -07:00
Eric Liang
29afaa34b6
FetchOrReconstruct message can get re-ordered until after task finishes, leaking get bundles
2021-06-23 14:02:05 -07:00
SangBin Cho
ccb02dacb6
Mark the global gc test unflaky ( #16601 )
2021-06-23 13:38:32 -07:00
architkulkarni
9cb65d5e2f
[Core] Move wheel URL utils from test_utils to utils ( #16386 )
2021-06-23 13:41:02 -05:00
mwtian
48599aef9e
Roll forward to run train_small in client mode. ( #16610 )
2021-06-23 08:52:08 +01:00
Sven Mika
c95dea51e9
[RLlib] External env enhancements + more examples. ( #16583 )
2021-06-23 09:09:01 +02:00
chenk008
82d92d0d61
[Core]Use worker shim PID to check worker registration ( #16398 )
2021-06-22 21:12:53 -07:00
Kai Fricke
a1765ac627
[tune] move to local parameter registry for tune.with_parameters() ( #16611 )
2021-06-22 17:58:11 -07:00
Eric Liang
dd439dd108
fix seg ( #16620 )
2021-06-22 17:45:06 -07:00
Amog Kamsetty
e26c232954
[CI] Suppress output for Mac wheel build ( #16603 )
2021-06-22 09:03:50 -07:00
Chris K. W
b4f2cbce02
[Client] Disconnect on dataclient error ( #16588 )
...
* disconnect when main thread finds dataclient shut down, update error messages
* Add test_dataclient_disconnect to small tests
* drop unused var
* add __main__ section to test
* avoid direct ray import
* rerun
2021-06-22 16:46:10 +03:00
Tao Wang
d1db4744e3
[large scale]Get next job id from gcs instead of redis - python part ( #16528 )
2021-06-22 14:06:30 +08:00
Eric Liang
21b22da3dd
Fix race condition in using CreateRequestQueue for inbound chunks
2021-06-21 22:35:54 -07:00
Stephanie Wang
e7b752cf33
[core] Fix bug in task dependency management for duplicate args ( #16365 )
...
* Pytest
* Skip on windows
* C++
2021-06-21 22:32:04 -07:00
SangBin Cho
5efeb5334b
Revert "Same worker id in python and c++ ( #16568 )" ( #16600 )
...
This reverts commit 9b5c0c32da.
2021-06-21 18:58:31 -07:00
Tao Wang
2affe97f1a
[Core][Minor]Remove the hard check when disconnect GCS client ( #16572 )
2021-06-22 09:29:25 +08:00
SangBin Cho
497f6cee38
[Docs] [Dask on Ray] Specify version compatibility ( #16595 )
...
* Dask compat
* Update common.py
* Create common.py
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-06-21 18:15:20 -07:00
Ian Rodney
d3832ab2e1
[Client] Fix gRPC Timeout Options ( #16554 )
2021-06-21 14:25:41 -07:00
Alex Wu
9b5c0c32da
Same worker id in python and c++ ( #16568 )
...
* .
* .
* test
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-21 13:22:52 -07:00
mwtian
f5f23448fc
Support downloading and testing wheels for Python 3.9. ( #16586 )
2021-06-21 12:02:22 -07:00
Siyuan (Ryans) Zhuang
b7995f66a4
[Workflow] Sync mode fault tolerance ( #16282 )
2021-06-21 10:05:27 -07:00
mvindiola1
82a3ff795c
[RLlib] ensure curiosity exploration actions are passed in as tf tens… ( #15704 )
2021-06-21 10:03:17 -07:00
Benjamin D. Killeen
50049f86d0
[rllib] check if self.env is not None explicitly ( #15634 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-21 10:02:13 -07:00
jenhaoyang
aabd507ec7
[docs] Add docker run gpu note ( #15566 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-21 09:31:09 -07:00
Sven Mika
be6db06485
[RLlib] Re-do: Trainer: Support add and delete Policies. ( #16569 )
2021-06-21 13:46:01 +02:00
qicosmos
4da69174c8
[C++ Worker]Remove unused boost sub libs for the generated template project ( #16526 )
2021-06-21 14:46:48 +08:00
Qinghao Hu
d922a79385
[sgd] DataParallel after Apex init. ( #15645 )
...
* [FIX] DataParallel after Apex init.
* lint
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-20 22:44:15 -07:00
lanlin
e5b50fcc9d
[tune] allow to read trial results from json files in Analysis ( #15915 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-20 20:41:48 -07:00
Dmitri Gekhtman
cb878b6514
[doc][kubernetes] K8s doc updates ( #16570 )
2021-06-20 19:38:34 -07:00
Brandon
2ab1c74032
[docs] Add link for launching ray manually in quickstart ( #15384 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-20 17:47:12 -07:00
Eric Liang
a0da009645
Allocate inbound object chunks using CreateRequestQueue instead of immediate allocation ( #16523 )
2021-06-20 09:22:12 -07:00
Yorick van Zweeden
db7e2c8f21
Remove outdated code from PopulationBasedTrainingReplay ( #16564 )
...
Co-authored-by: Yorick van Zweeden <git@yorickvanzweeden.nl>
2021-06-20 15:22:52 +02:00
Amog Kamsetty
e6d9f0b393
[Dask] Support Dask 2021.06.1 ( #16547 )
2021-06-19 18:22:23 -07:00
Sven Mika
169ddabae7
[RLlib] Issue 15973: Trainer.with_updates(validate_config=...) behaves confusingly. ( #16429 )
2021-06-19 22:42:00 +02:00
Alex Wu
197dab0e2f
[docs] Deploying Ray ( #16538 )
...
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-19 10:07:15 -07:00
Ian Rodney
16d762aed0
[DocSprint] Ray Client Docs ( #16497 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-19 10:05:37 -07:00
Amog Kamsetty
33d798f8fc
[Docs] Add e2e guide on using Pytorch Lightning with Ray ( #16484 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-19 10:04:58 -07:00
Richard Liaw
669b7a2e8c
[docs] Update community libraries ( #16557 )
2021-06-19 09:01:40 -07:00
Sven Mika
79a9d6d517
[RLlib] Issues 16287 and 16200: RLlib not rendering custom multi-agent Envs. ( #16428 )
2021-06-19 08:57:53 +02:00
Chen Shen
853caea146
[tests]migrate test-many-tasks/test-dead-actors to nightly tests ( #16469 )
...
* init commit
* Update release/nightly_tests/nightly_tests.yaml
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
* Update release/nightly_tests/nightly_tests.yaml
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2021-06-18 18:43:25 -07:00
Achal Shah
eadee8aba7
[docs] Update API docs for ray.init ( #16533 )
...
The incorrect indentation caused the docs to render weirdly:
https://docs.ray.io/en/master/package-ref.html
2021-06-18 18:02:44 -07:00
Alex Wu
319d4fb164
Job timestamp should always be in milliseconds (fixed) ( #16548 )
...
* .
* Revert "Revert "Job timestamp should always be in milliseconds (#16455 )" (#16545 )"
This reverts commit 5030ed8588.
* .
* .
* .
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-18 17:07:21 -07:00
Amog Kamsetty
416cf3a2e7
Revert "Revert "Enable TryCreateImmediately to use the fallback allocation" ( #16542 )" ( #16544 )
...
This reverts commit 36fd741e6f.
2021-06-18 15:39:37 -07:00