Commit graph

217 commits

Author SHA1 Message Date
Yuhao Yang
ad4da17899 [Tune] Add example and tutorial for DCGAN (#6400) 2019-12-13 14:15:44 -08:00
Eric Liang
be5dd8eb5e
Enable direct calls by default (#6367)
* wip

* add

* timeout fix

* const ref

* comments

* fix

* fix

* Move actor state into actor handle

* comments 2

* enable by default

* temp reorder

* some fixes

* add debug code

* tmp

* fix

* wip

* remove dbg

* fix compile

* fix

* fix check

* remove non direct tests

* Increment ref count before resolving value

* rename

* fix another bug

* tmp

* tmp

* Fix object pinning

* build change

* lint

* ActorManager

* tmp

* ActorManager

* fix test component failures

* Remove old code

* Remove unused

* fix

* fix

* fix resources

* fix advanced

* eric's diff

* blacklist

* blacklist

* cleanup

* annotate

* disable tests for now

* remove

* fix

* fix

* clean up verbosity

* fix test

* fix concurrency test

* Update .travis.yml

* Update .travis.yml

* Update .travis.yml

* split up analysis suite

* split up trial runner suite

* fix detached direct actors

* fix

* split up advanced tesT

* lint

* fix core worker test hang

* fix bad check fail which breaks test_cluster.py in tune

* fix some minor diffs in test_cluster

* less workers

* make less stressful

* split up test

* retry flaky tests

* remove old test flags

* fixes

* lint

* Update worker_pool.cc

* fix race

* fix

* fix bugs in node failure handling

* fix race condition

* fix bugs in node failure handling

* fix race condition

* nits

* fix test

* disable heartbeatS

* disable heartbeatS

* fix

* fix

* use worker id

* fix max fail

* debug exit

* fix merge, and apply [PATCH] fix concurrency test

* [patch] fix core worker test hang

* remove NotifyActorCreation, and return worker on completion of actor creation task

* remove actor diied callback

* Update core_worker.cc

* lint

* use task manager

* fix merge

* fix deadlock

* wip

* merge conflits

* fix

* better sysexit handling

* better sysexit handling

* better sysexit handling

* check id

* better debug

* task failed msg

* task failed msg

* retry failed tasks with delay

* retry failed tasks with delay

* clip deps

* fix

* fix core worker tests

* fix task manager test

* fix all tests

* cleanup

* set to 0 for direct tests

* dont check worker id for ownership rpc

* dont check worker id for ownership rpc

* debug messages

* add comment

* remove debug statements

* nit

* check worker id

* fix test

* owner

* fix tests
2019-12-13 13:58:04 -08:00
Edward Oakes
032e8553c7
use numpy in long-running tests (#6448) 2019-12-11 17:53:30 -08:00
alindkhare
76e678d775 [Serve] Added deadline awareness (#6442)
* [Serve] Added deadline awareness

Added deadline awareness while enqueuing a query
Using Blist sorted-list implementation (ascending order) to get queries according to their specified deadlines. [buffer_queues]
Exposed slo_ms via handle/http request
Added slo example 
The queries in example will be executed in almost the opposite order of which they are fired
Added slo pytest
Added check for slo_ms to not be negative
Included the changes suggested

* Linting Corrections

* Adding the code changes suggested by format.sh

* Added the suggested changes

Added justification for blist
Added blist in travis/ci/install-dependencies.sh

* Fixed linting issues

* Added blist to ray/doc/requirements-doc.txt
2019-12-11 16:41:54 -08:00
Simon Mo
c61db84b8d Bump dev6->dev7 for two files not changed yet. (#6428) 2019-12-10 20:58:14 -08:00
Chaokun Yang
6272907a57 [Streaming] Streaming data transfer and python integration (#6185) 2019-12-10 20:33:24 +08:00
Victor Le
4e24c805ee AlphaZero and Ranked reward implementation (#6385) 2019-12-07 12:08:40 -08:00
Edward Oakes
f63b64310a
Bump version to 0.8.0.dev7 (#6303) 2019-12-05 18:33:54 -08:00
Philipp Moritz
a454c815f1
Fix long running stress tests (#6374) 2019-12-05 18:29:41 -08:00
Philipp Moritz
dd27bfbb75
Rename .rayproject to ray-project (#6278) 2019-12-05 16:15:42 -08:00
Eric Liang
4c6739476b
[rllib] Raise an error if GPUs are enabled but not tf.test.is_gpu_available() (#6365) 2019-12-05 10:13:54 -08:00
Simon Mo
31113aeded
Use rayproject repo (#6353) 2019-12-03 22:36:40 -08:00
Eric Liang
e5863d7914
Force tune tests to run in direct call mode (#6301)
* force tune direct mode

* force tune

* fix

* Update run_multi_node_tests.sh
2019-11-27 19:58:33 -08:00
Simon Mo
dd80c6e6d4 Hotfix make docker images building optional (#6309)
* Make docker build optional

* Fix syntax error
2019-11-27 20:52:21 -06:00
Simon Mo
22b305223a
Build Docker Containers for Linux Wheels (#6233) 2019-11-27 17:05:36 -08:00
Edward Oakes
141d667cee
Fix bash syntax error in test-wheels.sh (#6290) 2019-11-26 13:15:54 -06:00
Edward Oakes
7f8de61441 [hotfix] Remove python/ray/tests/__init__.py (#6279)
* Remove python/ray/tests/__init__.py for bazel

* Comment out checks
2019-11-25 17:04:20 -08:00
Eric Liang
64a3a7239e
Set RAY_FORCE_DIRECT=1 for run_rllib_tests, test_basic (#6171) 2019-11-25 14:12:11 -08:00
Eric Liang
7917bbef78
Set progress report interval for bazel explicitly (#6262)
* set progress internval

* add keep alive

* add keepalive

* remove cat

* smaller time

* squash error

* reduce log spam
2019-11-24 22:37:59 -08:00
Eric Liang
53641f1f74
Move more unit tests to bazel (#6250)
* move more unit tests to bazel

* move to avoid conflict

* fix lint

* fix deps

* seprate

* fix failing tests

* show tests

* ignore mismatch

* try combining bazel runs

* build lint

* remove tests from install

* fix test utils

* better config

* split up

* exclusive

* fix verbosity

* fix tests class

* cleanup

* remove flaky

* fix metrics test

* Update .travis.yml

* no retry flaky

* split up actor

* split basic test

* split up trial runner test

* split stress

* fix basic test

* fix tests

* switch to pytest runner for main

* make microbench not fail

* move load code to py3

* test is no longer package

* bazel to end
2019-11-24 11:43:34 -08:00
Simon Mo
9f0d005ce6
Use jobs 50 (#6255) 2019-11-24 00:32:38 -08:00
Simon Mo
f53f576120
Quiet Wget (#6244) 2019-11-22 14:32:14 -08:00
Simon Mo
c4132b501b [CI] Add Remote Caching (#6210) 2019-11-21 11:36:36 -08:00
Eric Liang
f3f86385d6
Minimal implementation of direct task calls (#6075) 2019-11-12 11:45:28 -08:00
Philipp Moritz
ccbcc4bafa
Use GRCP and Bazel 1.0 (#6002) 2019-11-08 15:58:28 -08:00
daiyaanarfeen
8f6d73a93a [sgd] Extend distributed pytorch functionality (#5675)
* raysgd

* apply fn

* double quotes

* removed duplicate TimerStat

* removed duplicate find_free_port

* imports in pytorch_trainer

* init doc

* ray.experimental

* remove resize example

* resnet example

* cifar

* Fix up after kwargs

* data_dir and dataloader_workers args

* formatting

* loss

* init

* update code

* lint

* smoketest

* better_configs

* fix

* fix

* fix

* train_loader

* fixdocs

* ok

* ok

* fix

* fix_update

* fix

* fix

* done

* fix

* fix

* fix

* small

* lint

* fix

* fix

* fix_test

* fix

* validate

* fix

* fi
2019-11-05 11:16:46 -08:00
Richard Liaw
e94bebb1de
[tune] Fix Jenkins tests (#6028) 2019-11-01 16:42:04 -07:00
Simon Mo
c8d7065bf3
[CI] Use rerunfailures instead of flaky (#6061)
* Use rerunfailures instead of flaky

* Lint
2019-11-01 13:59:03 -07:00
Philipp Moritz
f7455839bf
Expose raylet info to dashboard (#6045) 2019-10-31 17:36:59 -07:00
Simon Mo
4c4342c165
Bring back pytest-sugar (#6038)
* Add cloudpickle as doc requirements

* Bring back pytest-sugar

* Revert "Add cloudpickle as doc requirements"

This reverts commit 2206e9e62ee20d93638e115f07a3fc933cbad9a3.
2019-10-28 20:24:28 -07:00
Stephanie Wang
eb41c945a1 Add gRPC endpoint to raylet to expose metrics (#6005) 2019-10-26 16:37:39 -07:00
Richard Liaw
48ba484640
[tune] Test TF2.0, TF1.14, TF1.12 Tensorboard support (#5931) 2019-10-18 13:50:42 -07:00
Richard Liaw
d52a4983af
Update TF documentation (#5918) 2019-10-16 01:31:27 -07:00
Richard Liaw
9f23620412
[tune] tf2.0 mnist example (#5898)
* tfmnistexample

* tfmnist

* add_to_ci

* format

* exampledownlaod

* fix
2019-10-15 22:25:01 -07:00
Edward Oakes
abbfe7392f
Bump dev version to 0.8.0.dev6 (#5906) 2019-10-14 11:36:13 +01:00
Richard Liaw
1650f7b174
[tune] Remove TF MNIST example + add TrialRunner hook to execut… (#5868)
* remove test

* add trial runner

* remvoerestore

* Remove other mnist examples

* tunetest

* revert

* v1

* Revert "v1"

This reverts commit c8bddaf2db7a8270c43c02021cac0e75df15ed20.

* Revert "revert"

This reverts commit b58f56884a0c288d3a6f997d149ab4d496ddd7a3.

* errors

* format
2019-10-13 20:33:56 -07:00
Robert Nishihara
523c764c25
Python 2 compatibility. (#5887) 2019-10-10 19:09:25 -07:00
Eric Liang
04e997fe0d
Fix TF2 / rllib test (#5846) 2019-10-07 14:25:16 -07:00
Simon Mo
e8570874b6
[Serve] Implement flask_request and named python request (#5849)
* Implement flask_request and named python request

* Forgot to include missing files

* Address comment

* Add flask to requirements for doc (lint failed)

* Update doc requirement so lint will build

* Install flask in CI

* Fix typo in .travis.yml
2019-10-06 15:12:30 -07:00
Anthony Yu
b99cdf4e39 [tune] PBT + Memnn example (#5723)
* Add example file

* Move into train function

* Somewhat working example of MemNN, still has some failed trials

* Reorganize into a class

* Small fixes

* Iteration decrease and fix hyperparam_mutations

* Add example file

* Move into train function

* Somewhat working example of MemNN, still has some failed trials

* Reorganize into a class

* Small fixes

* Iteration decrease and fix hyperparam_mutations

* Some style edits

* Address PR changes without modifying learning rate

* Add configs and hyperparameter mutations

* Add tune test

* Modify import locations

* Some parameter changes for testing

* Update memnn example

* Add tensorboard support and address PR comment

* Final changes

* lint

* generator
2019-10-05 09:22:37 -07:00
Edward Oakes
972dddd776
[autoscaler] Kubernetes autoscaler backend (#5492)
* Add Kubernetes NodeProvider to autoscaler

* Split off SSHCommandRunner

* Add KubernetesCommandRunner

* Cleanup

* More config options

* Check if auth present

* More auth checks

* Better output

* Always bootstrap config

* All working

* Add k8s-rsync comment

* Clean up manual k8s examples

* Fix up submit.yaml

* Automatically configure permissisons

* Fix get_node_provider arg

* Fix permissions

* Fill in empty auth

* Remove ray-cluster from this PR

* No hard dep on kubernetes library

* Move permissions into autoscaler config

* lint

* Fix indentation

* namespace validation

* Use cluster name tag

* Remove kubernetes from setup.py

* Comment in example configs

* Same default autoscaling config as aws

* Add Kubernetes quickstart

* lint

* Revert changes to submit.yaml (other PR)

* Install kubernetes in travis

* address comments

* Improve autoscaling doc

* kubectl command in setup

* Force use_internal_ips

* comments

* backend env in docs

* Change namespace config

* comments

* comments

* Fix yaml test
2019-10-03 10:17:00 -07:00
Edward Oakes
ef1a61ab57
Log output in test_dead_actors.py (#5831) 2019-10-02 14:40:55 -07:00
Edward Oakes
443feb75f0 Fix test (#5810) 2019-09-30 19:39:53 -07:00
Richard Liaw
baf85c6665
[tune/sgd] Fix Jenkins (#5765) 2019-09-27 09:59:08 -07:00
Eric Liang
b5da32df78 Bump Ray version in documentation to dev5 (#5794) 2019-09-27 00:19:17 -07:00
Philipp Moritz
57a5871ea6
Convert long running stress tests to projects (#5641) 2019-09-26 11:25:09 -07:00
Eric Liang
5ecb02fb80
Release 0.7.5 updates (#5727) 2019-09-26 10:30:37 -07:00
Richard Liaw
10f21fa313
[docs] Convert Examples to Gallery (#5414) 2019-09-24 15:46:56 -07:00
Mitchell Stern
98dcc1d440 [Dashboard] Add initial version of new dashboard (#5730) 2019-09-23 08:50:40 -07:00
Robert Nishihara
1cfadf032e
Properly test Python wheels in Travis. (#5749) 2019-09-21 18:03:10 -07:00