Commit graph

93 commits

Author SHA1 Message Date
Sven Mika
4cb23d1c95
[Tune; Testing] Revert to 3.7 (undone by accident by previous PR); + some minor comment cleanups. (#20031) 2021-11-04 10:58:34 +01:00
mwtian
f83195a1e1
[Build] Add GCS HA builds (#20008)
## Why are these changes needed?
Add builds for Python tests with GCS pubsub enabled.

## Related issue number
2021-11-03 11:58:16 -07:00
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPG too.

* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so we may have to rerun
all of the pendulum v1 benchmarks, or use another
environment instead. The same applies to frozen
lake v0 and frozen lake v1.

Lastly, all of the RLlib tests have
been moved to python 3.7.

* Add gym installation based on python version.

Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* migrate to new ale install method

Make parametric_actions_cartpole return float32 actions/obs

Adding type conversions if obs/actions don't match space

Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
Sven Mika
e6ae08f416
[RLlib] Optionally don't drop last ts in v-trace calculations (APPO and IMPALA). (#19601) 2021-11-03 10:01:34 +01:00
Sven Mika
2d24ef0d32
[RLlib] Add all simple learning tests as framework=tf2. (#19273)
* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so we may have to rerun
all of the pendulum v1 benchmarks, or use another
environment instead. The same applies to frozen
lake v0 and frozen lake v1.

Lastly, all of the RLlib tests and Tune tests have
been moved to python 3.7

* fix tune test_sampler::testSampleBoundsAx

* fix re-install ray for py3.7 tests

Co-authored-by: avnishn <avnishn@uw.edu>
2021-11-02 12:10:17 +01:00
mwtian
7afdfdc6dd
[CI] narrow down tests that run when files change (#19656) 2021-10-29 16:47:54 -07:00
matthewdeng
bfb0ef1b08
move jsonschema to core dependencies and update default AutoscalerPrometheusMetrics (#19831) 2021-10-28 13:04:22 -07:00
Amog Kamsetty
db863aafc0
Revert "Revert "[Docker] Support multiple CUDA Versions (#19505)" (#19756)" (#19763)
This reverts commit e58fcca404.
2021-10-26 17:32:56 -07:00
Amog Kamsetty
e58fcca404
Revert "[Docker] Support multiple CUDA Versions (#19505)" (#19756)
This reverts commit f0053d405b.
2021-10-26 12:55:20 -07:00
Avnish Narayan
ad87ddf93e
[rllib] Add deterministic test to gpu (#19306)
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-26 10:11:39 -07:00
Amog Kamsetty
f0053d405b
[Docker] Support multiple CUDA Versions (#19505)
* wip

* wip

* update

* finish

* deprecate

* debug

* fix and address comments

* try catch

* fix

* split tests

* force

* merge

* docs

* wip

* fix and check

* update readme

* fix

* fix

* fix sanity checking

* format
2021-10-25 18:57:05 -07:00
Jiajun Yao
256bf0bf3a
[Release] Bump up dask to latest compatible version 2021.9.1 (#19592)
* Bump up dask to latest compatible version 2021.9.1
2021-10-22 09:16:28 -07:00
Simon Mo
03805d4064
[Serve] Good error message when Serve not installed and ensure Serve installs ray[default] (#19570) 2021-10-21 13:47:29 -07:00
architkulkarni
b8941338d3
[runtime env] Raise error when creating runtime env when ray[default] is not installed (#19491) 2021-10-19 09:16:04 -05:00
matthewdeng
4674c78050
[Train] Rename Ray SGD v2 to Ray Train (#19436) 2021-10-18 22:27:46 -07:00
Kai Fricke
d8d8901192
[ci/tune] Remove deprecated jenkins_only tag from test tags (#19287) 2021-10-12 10:05:46 +01:00
SangBin Cho
0ef0d9a77d
Revert "[core] Assign tasks to the first available worker (#18167)" (#19180)
This reverts commit 545db13800.
2021-10-07 10:38:37 -07:00
Stephanie Wang
545db13800
[core] Assign tasks to the first available worker (#18167)
* Convert worker pool to queue

* Start up to backlog size more workers

* fixes

* Prestart workers according to num available CPUs

* lint

* x

* Update src/ray/raylet/worker_pool.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/raylet/worker_pool.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* dedicated workers

* Fix tests

* x

* fix

* asan

* asan

* Workers can only exec tasks with same job ID

* size_t for runtime env hash, fix unit tests

* include job ID in runtime env hash, remove from worker registration msg

* x

* conflict

* debug

* Schedule and dispatch periodically, skip if no new tasks

* Update src/ray/common/task/task_spec.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/raylet/scheduling/cluster_task_manager.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/raylet/worker_pool.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-10-05 13:45:50 -07:00
Kai Fricke
3dc176c42e
[ci/tune] Add SGD and Tune GPU pipeline step to CI (#18469)
* [ci/tune] Add Tune GPU pipeline step to CI

* cont.

* add sgd gpu tests

* format yaml, fix imports

* install horovod; fix line wrapping

* set GPU per worker to 0.5

* fix import

* move test to 4gpu machine

* fix lint

* lint

* set visible devices

* pull in tf gpu fix

* Fix Tune GPU pipeline step

* nit

* Disable GPU tests until we have some

* Re-add empty rllib tests

Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
2021-10-01 18:34:05 -07:00
architkulkarni
0f0b161ea1
Revert "Revert "[Serve] [doc] Improve runtime env doc"" (#18943)
* Revert "Revert "[Serve] [doc] Improve runtime env doc (#18782)" (#18935)"

This reverts commit e4f4c79252.
2021-09-30 13:28:44 -05:00
Yi Cheng
e4f4c79252
Revert "[Serve] [doc] Improve runtime env doc (#18782)" (#18935)
This reverts commit d4d71985d5.
2021-09-27 21:52:13 -07:00
architkulkarni
d4d71985d5
[Serve] [doc] Improve runtime env doc (#18782) 2021-09-27 16:12:03 -05:00
Chen Shen
35aa944ef4
Fix thread-safety in global state accessor (#18746) 2021-09-19 12:01:31 -07:00
mwtian
efdbfcfdfb
[Build] Generate Bazel config for compiling with clang and libc++ in CI (#18622)
* Add Bazel config for building with llvm. Upgrade C++ std to 17.

* Fix redis. Try fixing asan and tsan

* Fix asan and format

* Update comments.

Co-authored-by: Chen Shen <scv119@gmail.com>
2021-09-17 19:01:07 -07:00
Sven Mika
8a72824c63
[RLlib Testing] Split and unflake more CI tests (make sure all jobs are < 30min). (#18591) 2021-09-15 22:16:48 +02:00
Edward Oakes
7736cdd91d
[dashboard] Rename "new_dashboard" -> "dashboard" (#18214) 2021-09-15 11:17:15 -05:00
Simon Mo
497c5f56fa
[CI] Temporarily disable worker-in-container test (#18606)
* revert again

* disable tmp
2021-09-14 22:38:20 -07:00
mwtian
a3f399ef10
[Client] fix propagating errors to async calls during disconnect, and other cleanup (#18539)
* cleanup tests and errors for clients

* Fix lock and async get

* rerun

* Avoid running callback under lock. Make lock non-reentrant

* Add all necessary apis

* Removed unused APIs
2021-09-14 18:48:27 +03:00
Yi Cheng
7d1f408de9
[workflow] Move experimental/workflow to workflow (#18521) 2021-09-13 17:45:18 -07:00
Chen Shen
5f57079041
use clang for C++ debug testing (#18343) 2021-09-09 15:48:36 -07:00
Simon Mo
a29da81cfc
Revert "Revert "Fix tracing bug when actors are defined before connecting to …" (#16122) 2021-09-07 16:19:49 -07:00
matthewdeng
a3123b6860
[SGD] v2 Horovod backend (#18047)
* [SGD] add Horovod backend

* address comments: set CUDA_VISIBLE_DEVICES, refactor code

* fix gpu test

* fix lint/test import

* address comments, add example cluster config

* delay horovod imports
2021-08-31 12:54:59 -07:00
Kai Fricke
a8dbc44f9a
[ci] minimal dependency install test (#18071) 2021-08-31 15:26:25 +02:00
Sven Mika
599e589481
[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065) 2021-08-31 14:56:53 +02:00
Antoni Baum
5be6bda4cf
[tests] Add Ludwig CI test (#18126) 2021-08-30 12:27:39 -07:00
Amog Kamsetty
3b77840c1b
PyTorch Lightning Updates (#17876) 2021-08-27 23:15:51 -07:00
Chen Shen
7e3e0d1535
[Test] Add C++ tsan test (#17875) 2021-08-24 00:57:32 -07:00
Chen Shen
880797d5c2
[Core][Test] Add ubsan support for C++ tests (#17812)
* support ubsan

* update
2021-08-17 10:22:03 -07:00
SangBin Cho
4971e13941
[Build] Asan wheel test (#17685)
* in progress

* ASAN tests.

* d

* in progress

* in progress without the asan wheel

* Support the asan wheel.

* Support the asan wheels

* Not build a binary for asan

* Fix issues

* Remove a wrong build

* Separate out asan wheel build

* Try preparing more deps.

* ip

* Try different version

* done

* d

* Trial

* Another try

* Another try

* skip cpp build to see what happens

* add more des

* ip

* abc

* Try next

* completed

* try

* Try without static libasan

* dbg

* Try static link

* Fix issues

* abc
2021-08-17 10:21:41 -07:00
Sven Mika
f3bbe4ea44
[RLlib] Test cases/BUILD cleanup; split "everything else" (longest running one rn) tests in 2. (#17640) 2021-08-16 22:01:01 +02:00
Clark Zinzow
d6eeb5dc70
[Datasets] Add local and S3 filesystem test coverage for file-based datasources. (#17158) 2021-08-12 08:39:31 -07:00
Chen Shen
0fd3f761b9
[ci][rfc] build debug wheels and run python test on debug build (#17399)
* enable debug mode

* add

* :upload debug wheels

* upload debug wheels

* add

* fix bug

* add dbg

* Update python/setup.py

Co-authored-by: Simon Mo <simon.mo@hey.com>

* skip windows

Co-authored-by: Simon Mo <simon.mo@hey.com>
2021-08-05 17:58:19 -07:00
Eric Liang
d4f9d3620e
Move ray.data out of experimental (#17560) 2021-08-04 13:31:10 -07:00
Sven Mika
5231fdd996
[Testing] Split RLlib example scripts CI tests into 4 jobs (from 2). (#17331) 2021-07-26 10:52:55 -04:00
matthewdeng
fdbeef6046
[SGD] RaySGD v2 skeleton code (#17300)
* [SGD] RaySGD v2 skeleton code

* add build file

* move file

* empty

* rename

* address comments

* add method interfaces

* move BUILD file out of tests dir

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-07-25 17:39:24 -07:00
mwtian
b8e71f641c
[Build] Ray Docker image for Python 3.9. (#16571) 2021-07-22 13:38:57 -07:00
Amog Kamsetty
5c589debfa
Revert "Set runs_per_test_detect_flakes for core tests on master (#16863)" (#16936)
This reverts commit 44042519af.
2021-07-07 10:25:46 -07:00
Eric Liang
44042519af
Set runs_per_test_detect_flakes for core tests on master (#16863) 2021-07-06 18:46:48 -07:00
Clark Zinzow
52da2cce68
[Dataset] Adds JSON, CSV, Pandas, and Dask IO layers, and adds the write side of the Parquet IO layer. (#16724) 2021-07-01 11:57:40 -07:00
chenk008
06c7db7dca
[Core] Rename container option and ray-nest-container (#16771)
* rename container_option to container

* rename ray-nest-container to ray-worker-container

* lint

Co-authored-by: wuhua.ck <wuhua.ck@alibaba-inc.com>
2021-07-01 13:12:26 +08:00