8585 commits

Author SHA1 Message Date
architkulkarni
b9f6132c08
skip flaky conda env fixture on MacOS (#16710) 2021-06-28 09:38:17 -07:00
Tao Wang
38157a3166
[Core]support external redis address when starting ray processes (#13170)
* support external redis address when starting ray processes

* use a more general name

* add cli option

* handle some details

* fix set shards logic

* reuse --address instead of introduce a new one

* lint

* tiny

* lint and fix
2021-06-28 09:22:40 -07:00
Kai Fricke
04bfba1274
[tune] Move reporter detection to utility function (#16673)
Test failures seem unrelated
2021-06-28 12:55:05 +01:00
qicosmos
500891c1e0
[C++ Worker]Support windows (#16700) 2021-06-28 17:45:20 +08:00
Amog Kamsetty
54ce8092ab
[Tune] Update transformers to 4.6.1 (#16397)
* add examples

* update dask docs

* add build file

* formatting

* fix ci command

* fix

* Update python/ray/util/dask/BUILD

* newline

* fix pytest fixtures

* fixes

* formatting

* fix shuffle example

* update

* dont log to wandb
2021-06-26 14:10:47 -07:00
AnnaKosiorek
1e709771b2
[rllib][minor] clarification of the softmax axis in dqn_torch_policy (#16311)
pytorch nn.functional.softmax (unlike tf.nn.softmax) calculates softmax along zeroth dimension by default
2021-06-26 11:19:54 -07:00
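Note on 1e709771b2: the message above concerns which axis softmax normalizes over. The following is a minimal, framework-free sketch of why the axis matters. `softmax_1d` and `softmax_2d` are hypothetical helpers written for this illustration, not code from RLlib, PyTorch, or TensorFlow, and the sketch makes no claim about either library's default behavior.

```python
import math
from typing import List


def softmax_1d(values: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list (hypothetical helper)."""
    peak = max(values)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_2d(rows: List[List[float]], axis: int) -> List[List[float]]:
    """Softmax over a 2-D list: axis=0 normalizes down columns, axis=1 across rows."""
    if axis == 1:
        return [softmax_1d(row) for row in rows]
    if axis == 0:
        # Normalize each column, then transpose back to row-major layout.
        columns = [softmax_1d(list(col)) for col in zip(*rows)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("axis must be 0 or 1")


# Two identical rows of logits, e.g. a batch of 2 with 3 entries each.
logits = [[1.0, 2.0, 3.0],
          [1.0, 2.0, 3.0]]

across_rows = softmax_2d(logits, axis=1)   # each row sums to 1
down_columns = softmax_2d(logits, axis=0)  # each column sums to 1
```

With identical rows, normalizing down the columns makes every entry equal, so the choice of axis changes the result entirely. Passing the axis explicitly, as the commit title suggests, avoids depending on any framework's default.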
Eric Liang
aa882ed52d
Make it more convenient to develop ray.data by setting RAY_EXPERIMENTAL_DATA_API=1 (#16685)
* make it convenient to import ray.data

* update

* Update python/ray/experimental/data/read_api.py

Co-authored-by: Alex Wu <itswu.alex@gmail.com>
2021-06-26 09:17:30 -07:00
Eric Liang
6bfa97eed7
Check in the first iteration of an Arrow-based dataset api (#16648) 2021-06-25 18:45:13 -07:00
Eric Liang
3f5ce01949
Address leftover comments from https://github.com/ray-project/ray/pull/16394/files (#16684) 2021-06-25 16:45:50 -07:00
Dmitri Gekhtman
7b58ec9ae5
[autoscaler] rsync bootstrap flag (#16667) 2021-06-25 15:26:47 -07:00
Eric Liang
9b17c35bee
Fix PullManager handling of get requests and liveness issues (#16394) 2021-06-25 13:01:46 -07:00
Kai Fricke
696334ff08
[tune] Fix Tee utility class properties (#16674) 2021-06-25 18:19:01 +01:00
architkulkarni
06dfd8dddb
Revert "[Dashboard][event] Basic event module (#16283)" (#16676)
This reverts commit 5afa53aa64.
2021-06-25 09:38:18 -07:00
architkulkarni
35039869ee
Revert "[RLlib] Add some learning tests to rllib-flaky (#16604)" (#16677)
This reverts commit d1510911e0.
2021-06-25 09:37:58 -07:00
Lixin Wei
a9d6e93977
[scheduler] Rename TaskRequest to ResourceRequest (#16649) 2021-06-25 08:50:20 -07:00
architkulkarni
503641c2c2
[Core] [runtime env] add C++ test for caching workers by runtime env hash (#16664) 2021-06-25 09:38:37 -05:00
architkulkarni
b15ab2d60b
[Core] [runtime env] Support specifying runtime env in @ray.remote decorator (#16660) 2021-06-25 09:37:40 -05:00
SongGuyang
e74d9d3ded
[runtime env] Download runtime env(conda) in agent instead of setup_worker (#16525) 2021-06-25 19:39:05 +08:00
dependabot[bot]
2e3771cc29
[tune](deps): Bump tensorflow-probability in /python/requirements/tune (#16561)
Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-06-25 11:50:35 +01:00
fyrestone
5afa53aa64
[Dashboard][event] Basic event module (#16283) 2021-06-25 13:59:02 +08:00
mwtian
49b8b86488
Remove empty ClusterTaskManager::ScheduleInfeasibleTasks() (#16665) 2021-06-24 22:34:57 -07:00
Eric Liang
1c709cbeb3
Fix typing (#16668) 2021-06-24 22:06:33 -07:00
Chen Shen
c4d7b31a79
[Test] Placement group stress test (#16633) 2021-06-24 21:35:55 -07:00
Qing Wang
89b07572da
[Java] Upgrade log4j (#16657) 2021-06-24 21:01:27 -07:00
Alex Wu
bfe85326f2
[core] Cleanup dead pubsub related code (#16629) 2021-06-24 19:36:56 -07:00
Dmitri Gekhtman
ea23382919
[autoscaler][docs] Doc tweak (#16663)
* doc-tweak

* fix
2021-06-24 16:25:00 -07:00
Alex Wu
8ffaa8d3fa
Refactor pubsub to support GCS publisher/raylet client (#16624)
* .

* .

* .

* .

* .

* import error :(

* boop

* .

* fix tests

* fix tests

* .

* cleanup

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-24 15:30:42 -07:00
Amog Kamsetty
d1510911e0
[RLlib] Add some learning tests to rllib-flaky (#16604) 2021-06-25 00:28:54 +02:00
architkulkarni
8587f9d738
[Core] [runtime env] Fix conda/pip filepaths relative to working_dir (#16186) 2021-06-24 16:43:25 -05:00
Qing Wang
3272997b0d
[Java] Upgrade some deps to fix CVEs (#16650) 2021-06-24 10:56:20 -07:00
architkulkarni
4637298d36
Delete conda env before creating to deflake test_runtime_env_complicated (#16628) 2021-06-24 12:13:26 -05:00
architkulkarni
e8c25a2fa4
[Core] [runtime env] Merge child's runtime_env["env_vars"] with that of parent (#16553) 2021-06-24 12:13:13 -05:00
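Note on e8c25a2fa4: one plausible reading of merging a child's `env_vars` with the parent's is a dictionary union. The sketch below assumes the child's value wins when both define the same key. That precedence rule and the helper name `merge_env_vars` are assumptions made for illustration, not taken from the commit.

```python
from typing import Dict, Optional


def merge_env_vars(
    parent: Optional[Dict[str, str]],
    child: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Return a new dict of parent env vars overlaid with child env vars.

    Assumption: on a key conflict, the child's value takes precedence.
    Neither input dict is modified.
    """
    merged = dict(parent or {})
    merged.update(child or {})
    return merged


# Hypothetical runtime_env dicts for a parent and the task it launches.
parent_runtime_env = {"env_vars": {"A": "parent", "B": "parent"}}
child_runtime_env = {"env_vars": {"B": "child", "C": "child"}}

merged = merge_env_vars(
    parent_runtime_env.get("env_vars"),
    child_runtime_env.get("env_vars"),
)
```

Under the stated assumption, `merged` keeps `A` from the parent, takes `B` from the child, and adds `C`.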
Simon Mo
aabdfe2989
[Serve] Fix HTTP headers (#16647) 2021-06-24 11:59:43 -05:00
Amog Kamsetty
53d16365b0
[Release] Convert Horovod and SGD release tests (#15999) 2021-06-24 15:56:02 +01:00
Kai Fricke
ef97bdd407
[release] Fix app config: Install latest releases. Bump xgboost-ray version (#16581) 2021-06-24 12:56:21 +01:00
Gabriele Oliaro
3e2f608145
Work stealing! (#15475)
* work_stealing one commit squash

* using random task id to request workers

* inlining methods in direct_task_transport.h

* faster checking for presence of stealable tasks in RequestNewWorkerIfNeeded

* linting

* fixup! using random task id to request workers

* estimating number of tasks to steal based only on tasks in flight

* linting

* fixup! linting

* backup of changes

* fixed issue in scheduling queue test after merge

* linting

* redesigned work stealing. compiles but not tested

* all tests passing locally

* fixup! all tests passing locally

* fixup! fixup! all tests passing locally

* fixed big bug in StealTasksIfNeeded

* rev1

* rev2 (before removing the work_stealing param)

* removed work_stealing flag, fixed existing unit tests

* added unit tests; need to figure out how to assign distinct worker ids in GrantWorkerLease

* fixed work stealing test

* revisions, added two more unit/regression tests

* test
2021-06-23 17:08:28 -07:00
Frank Luan
9249287a36
Object spilling threshold (#16558)
* Object spilling threshold

* clang-format

* Make tests more lenient

* Fix tests

* Fix tests

* Address comments

* Fix tests lint

* Refactor

* Fix tests

* Fix cpp tests

* Address comments
2021-06-23 16:54:41 -07:00
SangBin Cho
f816f613c7
[Test] Handle flaky tests (#16602)
* Handle flaky tests.

* lint

* tag more

* add test_scheduling

* Remove global gc

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-06-23 16:24:12 -07:00
Amog Kamsetty
b9e5ca4c18
[tune] Deflake mnist_ptl_mini (#16555) 2021-06-23 14:26:40 -07:00
Eric Liang
29afaa34b6
FetchOrReconstruct message can get re-ordered until after task finishes, leaking get bundles 2021-06-23 14:02:05 -07:00
SangBin Cho
ccb02dacb6
Mark the global gc test unflaky (#16601) 2021-06-23 13:38:32 -07:00
architkulkarni
9cb65d5e2f
[Core] Move wheel URL utils from test_utils to utils (#16386) 2021-06-23 13:41:02 -05:00
mwtian
48599aef9e
Roll forward to run train_small in client mode. (#16610) 2021-06-23 08:52:08 +01:00
Sven Mika
c95dea51e9
[RLlib] External env enhancements + more examples. (#16583) 2021-06-23 09:09:01 +02:00
chenk008
82d92d0d61
[Core]Use worker shim PID to check worker registration (#16398) 2021-06-22 21:12:53 -07:00
Kai Fricke
a1765ac627
[tune] move to local parameter registry for tune.with_parameters() (#16611) 2021-06-22 17:58:11 -07:00
Eric Liang
dd439dd108
fix seg (#16620) 2021-06-22 17:45:06 -07:00
Amog Kamsetty
e26c232954
[CI] Suppress output for Mac wheel build (#16603) 2021-06-22 09:03:50 -07:00
Chris K. W
b4f2cbce02
[Client] Disconnect on dataclient error (#16588)
* disconnect when main thread finds dataclient shut down, update error messages

* Add test_dataclient_disconnect to small tests

* drop unused var

* add __main__ section to test

* avoid direct ray import

* rerun
2021-06-22 16:46:10 +03:00
Tao Wang
d1db4744e3
[large scale]Get next job id from gcs instead of redis - python part (#16528) 2021-06-22 14:06:30 +08:00