Commit graph

384 commits

Author SHA1 Message Date
Sven
f1b56fa5ee PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch). (#6650)
* Unifying the code for PGTrainer/Policy wrt tf vs torch.
Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch).

* Fix LINT line-len errors.

* Fix LINT errors.

* Fix `tf_pg_policy` imports (formerly: `pg_policy`).

* Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer.
Retire `PGAgent` class (use PGTrainer instead).

* - Move PG test into agents/pg/tests directory.
- All test cases will be located near the classes that are tested and
  then built into the Bazel/Travis test suite.

* Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c
the function is not a tf-specific one.

* Fix remaining import errors for agents/pg/...

* Fix circular dependency in pg imports.

* Add pg tests to Jenkins test suite.
2020-01-02 16:08:03 -08:00
mehrdadn
f4b29dae9c Perform Bazel install directly in Windows CI (#6653) 2019-12-31 20:48:08 -08:00
Robert Nishihara
d2c6457832 Remove public facing references to --redis-address. (#6631) 2019-12-31 13:21:53 -08:00
Philipp Moritz
735f282494 Use 0.9.0.dev0 as the version tag (#6630) 2019-12-30 10:14:07 -08:00
Robert Nishihara
96f2f8ff10 Stop testing Python 2.7 and building Python 2.7 wheels. (#6601) 2019-12-27 20:47:49 -08:00
Robert Nishihara
eb0813ea35 Re-enable UI tests for wheels. (#6602) 2019-12-26 22:34:56 -08:00
Philipp Moritz
eaee672b7f Revert "Perform Bazel install directly in Windows CI (#6529)" (#6593)
This reverts commit c5f141013b.
2019-12-24 16:39:07 -08:00
micafan
687de41273 [GCS] refactor the GCS Client Node Interface (#6010) 2019-12-24 20:36:37 +08:00
mehrdadn
c5f141013b Perform Bazel install directly in Windows CI (#6529) 2019-12-22 16:14:51 -08:00
Chaokun Yang
7bbfa85c66 [Streaming] Streaming data transfer java (#6474) 2019-12-22 10:56:05 +08:00
Simon Mo
26ec500ef9 Implement async get for direct actor call (#6339) 2019-12-18 11:50:21 -08:00
Eric Liang
6725a61bda Release 0.8.0 test logs (#6512) 2019-12-17 15:56:50 -08:00
Eric Liang
1a1324d2a2 Bump version from 0.8.0.dev6 -> 0.9.0.dev (#6508) 2019-12-16 23:57:42 -08:00
Edward Oakes
b1e83d83d1 Print summaries for stress tests (#6498) 2019-12-16 14:14:48 -08:00
Mitchell Stern
1531c21dbd [Dashboard] Add remaining features from old dashboard (#6489)
* [Dashboard] Add remaining features from old dashboard

* Fix linting errors

* Set cluster uptime statistic to N/A

* Use proper singular or plural words for workers column

* Ignore .js, .jsx, .ts, .tsx files in check-git-clang-format-output.sh

* Fix bash quote issue
2019-12-16 11:21:18 -08:00
Richard Liaw
5719a05757 [sgd] Add support for multi-model multi-optimizer training (#6317) 2019-12-15 15:19:45 -08:00
Philipp Moritz
f5d10eea0b [Projects] Refactor cluster specification (#6488) 2019-12-14 22:43:06 -08:00
Yuhao Yang
ad4da17899 [Tune] Add example and tutorial for DCGAN (#6400) 2019-12-13 14:15:44 -08:00
Eric Liang
be5dd8eb5e Enable direct calls by default (#6367)
* wip

* add

* timeout fix

* const ref

* comments

* fix

* fix

* Move actor state into actor handle

* comments 2

* enable by default

* temp reorder

* some fixes

* add debug code

* tmp

* fix

* wip

* remove dbg

* fix compile

* fix

* fix check

* remove non direct tests

* Increment ref count before resolving value

* rename

* fix another bug

* tmp

* tmp

* Fix object pinning

* build change

* lint

* ActorManager

* tmp

* ActorManager

* fix test component failures

* Remove old code

* Remove unused

* fix

* fix

* fix resources

* fix advanced

* eric's diff

* blacklist

* blacklist

* cleanup

* annotate

* disable tests for now

* remove

* fix

* fix

* clean up verbosity

* fix test

* fix concurrency test

* Update .travis.yml

* Update .travis.yml

* Update .travis.yml

* split up analysis suite

* split up trial runner suite

* fix detached direct actors

* fix

* split up advanced test

* lint

* fix core worker test hang

* fix bad check fail which breaks test_cluster.py in tune

* fix some minor diffs in test_cluster

* less workers

* make less stressful

* split up test

* retry flaky tests

* remove old test flags

* fixes

* lint

* Update worker_pool.cc

* fix race

* fix

* fix bugs in node failure handling

* fix race condition

* fix bugs in node failure handling

* fix race condition

* nits

* fix test

* disable heartbeats

* disable heartbeats

* fix

* fix

* use worker id

* fix max fail

* debug exit

* fix merge, and apply [PATCH] fix concurrency test

* [patch] fix core worker test hang

* remove NotifyActorCreation, and return worker on completion of actor creation task

* remove actor died callback

* Update core_worker.cc

* lint

* use task manager

* fix merge

* fix deadlock

* wip

* merge conflicts

* fix

* better sysexit handling

* better sysexit handling

* better sysexit handling

* check id

* better debug

* task failed msg

* task failed msg

* retry failed tasks with delay

* retry failed tasks with delay

* clip deps

* fix

* fix core worker tests

* fix task manager test

* fix all tests

* cleanup

* set to 0 for direct tests

* dont check worker id for ownership rpc

* dont check worker id for ownership rpc

* debug messages

* add comment

* remove debug statements

* nit

* check worker id

* fix test

* owner

* fix tests
2019-12-13 13:58:04 -08:00
Edward Oakes
032e8553c7 use numpy in long-running tests (#6448) 2019-12-11 17:53:30 -08:00
alindkhare
76e678d775 [Serve] Added deadline awareness (#6442)
* [Serve] Added deadline awareness

Added deadline awareness while enqueuing a query
Using Blist sorted-list implementation (ascending order) to get queries according to their specified deadlines. [buffer_queues]
Exposed slo_ms via handle/http request
Added slo example 
The queries in example will be executed in almost the opposite order of which they are fired
Added slo pytest
Added check for slo_ms to not be negative
Included the changes suggested

* Linting Corrections

* Adding the code changes suggested by format.sh

* Added the suggested changes

Added justification for blist
Added blist in travis/ci/install-dependencies.sh

* Fixed linting issues

* Added blist to ray/doc/requirements-doc.txt
2019-12-11 16:41:54 -08:00
Simon Mo
c61db84b8d Bump dev6->dev7 for two files not changed yet. (#6428) 2019-12-10 20:58:14 -08:00
Chaokun Yang
6272907a57 [Streaming] Streaming data transfer and python integration (#6185) 2019-12-10 20:33:24 +08:00
Victor Le
4e24c805ee AlphaZero and Ranked reward implementation (#6385) 2019-12-07 12:08:40 -08:00
Edward Oakes
f63b64310a Bump version to 0.8.0.dev7 (#6303) 2019-12-05 18:33:54 -08:00
Philipp Moritz
a454c815f1 Fix long running stress tests (#6374) 2019-12-05 18:29:41 -08:00
Philipp Moritz
dd27bfbb75 Rename .rayproject to ray-project (#6278) 2019-12-05 16:15:42 -08:00
Eric Liang
4c6739476b [rllib] Raise an error if GPUs are enabled but not tf.test.is_gpu_available() (#6365) 2019-12-05 10:13:54 -08:00
Simon Mo
31113aeded Use rayproject repo (#6353) 2019-12-03 22:36:40 -08:00
Eric Liang
e5863d7914 Force tune tests to run in direct call mode (#6301)
* force tune direct mode

* force tune

* fix

* Update run_multi_node_tests.sh
2019-11-27 19:58:33 -08:00
Simon Mo
dd80c6e6d4 Hotfix make docker images building optional (#6309)
* Make docker build optional

* Fix syntax error
2019-11-27 20:52:21 -06:00
Simon Mo
22b305223a Build Docker Containers for Linux Wheels (#6233) 2019-11-27 17:05:36 -08:00
Edward Oakes
141d667cee Fix bash syntax error in test-wheels.sh (#6290) 2019-11-26 13:15:54 -06:00
Edward Oakes
7f8de61441 [hotfix] Remove python/ray/tests/__init__.py (#6279)
* Remove python/ray/tests/__init__.py for bazel

* Comment out checks
2019-11-25 17:04:20 -08:00
Eric Liang
64a3a7239e Set RAY_FORCE_DIRECT=1 for run_rllib_tests, test_basic (#6171) 2019-11-25 14:12:11 -08:00
Eric Liang
7917bbef78 Set progress report interval for bazel explicitly (#6262)
* set progress interval

* add keep alive

* add keepalive

* remove cat

* smaller time

* squash error

* reduce log spam
2019-11-24 22:37:59 -08:00
Eric Liang
53641f1f74 Move more unit tests to bazel (#6250)
* move more unit tests to bazel

* move to avoid conflict

* fix lint

* fix deps

* separate

* fix failing tests

* show tests

* ignore mismatch

* try combining bazel runs

* build lint

* remove tests from install

* fix test utils

* better config

* split up

* exclusive

* fix verbosity

* fix tests class

* cleanup

* remove flaky

* fix metrics test

* Update .travis.yml

* no retry flaky

* split up actor

* split basic test

* split up trial runner test

* split stress

* fix basic test

* fix tests

* switch to pytest runner for main

* make microbench not fail

* move load code to py3

* test is no longer package

* bazel to end
2019-11-24 11:43:34 -08:00
Simon Mo
9f0d005ce6 Use jobs 50 (#6255) 2019-11-24 00:32:38 -08:00
Simon Mo
f53f576120 Quiet Wget (#6244) 2019-11-22 14:32:14 -08:00
Simon Mo
c4132b501b [CI] Add Remote Caching (#6210) 2019-11-21 11:36:36 -08:00
Eric Liang
f3f86385d6 Minimal implementation of direct task calls (#6075) 2019-11-12 11:45:28 -08:00
Philipp Moritz
ccbcc4bafa Use GRCP and Bazel 1.0 (#6002) 2019-11-08 15:58:28 -08:00
daiyaanarfeen
8f6d73a93a [sgd] Extend distributed pytorch functionality (#5675)
* raysgd

* apply fn

* double quotes

* removed duplicate TimerStat

* removed duplicate find_free_port

* imports in pytorch_trainer

* init doc

* ray.experimental

* remove resize example

* resnet example

* cifar

* Fix up after kwargs

* data_dir and dataloader_workers args

* formatting

* loss

* init

* update code

* lint

* smoketest

* better_configs

* fix

* fix

* fix

* train_loader

* fixdocs

* ok

* ok

* fix

* fix_update

* fix

* fix

* done

* fix

* fix

* fix

* small

* lint

* fix

* fix

* fix_test

* fix

* validate

* fix

* fi
2019-11-05 11:16:46 -08:00
Richard Liaw
e94bebb1de [tune] Fix Jenkins tests (#6028) 2019-11-01 16:42:04 -07:00
Simon Mo
c8d7065bf3 [CI] Use rerunfailures instead of flaky (#6061)
* Use rerunfailures instead of flaky

* Lint
2019-11-01 13:59:03 -07:00
Philipp Moritz
f7455839bf Expose raylet info to dashboard (#6045) 2019-10-31 17:36:59 -07:00
Simon Mo
4c4342c165 Bring back pytest-sugar (#6038)
* Add cloudpickle as doc requirements

* Bring back pytest-sugar

* Revert "Add cloudpickle as doc requirements"

This reverts commit 2206e9e62ee20d93638e115f07a3fc933cbad9a3.
2019-10-28 20:24:28 -07:00
Stephanie Wang
eb41c945a1 Add gRPC endpoint to raylet to expose metrics (#6005) 2019-10-26 16:37:39 -07:00
Richard Liaw
48ba484640 [tune] Test TF2.0, TF1.14, TF1.12 Tensorboard support (#5931) 2019-10-18 13:50:42 -07:00
Richard Liaw
d52a4983af Update TF documentation (#5918) 2019-10-16 01:31:27 -07:00