Commit graph

3387 commits

Author SHA1 Message Date
Siyuan (Ryans) Zhuang
95241f6686 Fix the incorrect serialization behavior with pickle (#5960) 2019-10-22 18:08:36 -07:00
Philipp Moritz
b6e7ed20ce Fix random numbers on linux wheel build (#5975) 2019-10-22 17:52:12 -07:00
Eric Liang
f7bda0abad [rllib] Fix rnn shape with multi-dimensional data (#5939)
* fix shape

* add test

* Update rnn_sequencing.py
2019-10-22 11:07:26 -07:00
Richard Liaw
81dd0dfb0a [tune] fix conditional identifier (#5971)
* fix conditional identifier

* fix

* doc
2019-10-22 02:00:49 -07:00
Leo Sklyut
832b5ce1f6 [docs] fix code block display (#5967) 2019-10-22 00:45:38 -07:00
Richard Liaw
252a5d13ed [sgd/tune][minor] more tf ports (#5953) 2019-10-21 16:46:16 -07:00
Mitchell Stern
235dec8aa3 [Dashboard] Remove token authentication from dashboard (#5888) 2019-10-21 12:48:48 -07:00
Richard Liaw
26a724c5e6 [core] Support kwargs and positionals in Ray remote calls (#5606) 2019-10-20 22:40:54 -07:00
Edward Oakes
fc56872012 Send active object IDs to the raylet (#5803)
* Send active object IDs to the raylet

* comment

* comments

* dedup

* signed int in config

* comments

* Remove object ID from monitor

* Fix test

* re-add check

* fix cast

* check if core worker

* Add comment

* Reservoir sampling

* Fix lint

* Pointer return

* tmp

* Fix merge

* Initialize object ids properly

* Fix lint
2019-10-20 22:05:28 -07:00
Zhuohan Li
f286356e06 [docs] add pages about examples on training language models with fairseq (#5755)
* add pages about examples on training language models with fairseq and ray autoscaler

* better format

* update ray_train.sh

* Move EFS to the autoscaler file

* nits

* add comments to the code & use a new way to implement checkpoint hook

* small bug fix

* polish the doc

* fix formatting

* yaml

* update docs

* fix the bugs and add preprocess.sh

* fix lint

* Reduce batch size & fix lint

* shorttitle
2019-10-20 20:28:16 -07:00
Simon Mo
6b36ef1138 [Serve] Ensure strict traffic splitting (#5929)
* [Serve] Ensure strict traffic splitting

* Fix test
2019-10-20 20:18:14 -07:00
Stephanie Wang
bc4a0de4da Fix multiple drivers for named actors and add test (#5956) 2019-10-20 16:04:21 -07:00
Richard Liaw
74852c80cb [docs] Improve more serialization Errors (#5658) 2019-10-20 14:06:00 -07:00
Richard Liaw
91acecc9f9 [tune][minor] gpu warning (#5948)
* gpu

* format

* defaults

* format_and_check

* better registration

* fix

* fix

* trial

* format

* tune
2019-10-19 17:09:48 -07:00
Philipp Moritz
d23696de17 Introduce flag to use pickle for serialization (#5805) 2019-10-18 22:29:36 -07:00
Philipp Moritz
29eee7f970 Forward multiple ports for autoscaler (#5893) 2019-10-18 16:50:46 -07:00
Richard Liaw
48ba484640 [tune] Test TF2.0, TF1.14, TF1.12 Tensorboard support (#5931) 2019-10-18 13:50:42 -07:00
Stephanie Wang
697f765efc Refactor CoreWorker to remove TaskInterface (#5924)
* Remove TaskInterface

* Remove Status return value

* Remove CActorHandle, some return values, TaskSubmitter

* lint

* doc

* doc

* fix build

* lint

* Return Status, guarded by annotation, fail tasks for RECONSTRUCTING actors

* fix

* move annotation

* revert

* Fix core worker test

* nits
2019-10-18 00:03:57 -04:00
Stephanie Wang
3ac8592dcf Remove actor handle IDs (#5889)
* Remove actor handle ID from main ActorHandle constructor

* Set the actor caller ID when calling submit task instead of in the actor handle

* Remove ActorHandle::Fork, remove actor handle ID from protobuf

* Make inner actor handle const, remove new_actor_handles

* Move caller ID into the common task spec, start refactoring raylet

* Some fixes for forking actor handles

* Store ActorHandle state in CoreWorker, only expose actor ID to Python

* Remove some unused fields

* lint

* doc

* fix merge

* Remove ActorHandleID from python/cpp

* doc

* Fix core worker test

* Move actor table subscription to CoreWorker, reset actor handles on actor failure

* lint

* Remove GCS client from direct actor

* fix tests

* Fix

* Fix tests for raylet codepath

* Fix local mode

* Fix multithreaded test

* Fix AsyncSubscribe issue...

* doc

* fix serve

* Revert bazel
2019-10-17 12:36:34 -04:00
Stefan Otte
d70abcfd70 Fix typo in examples/centralized_critic.py (#5943)
`opp_ops` should be `opp_obs`.
2019-10-17 08:42:50 -07:00
Alexander Scammon
4d08d3c188 Add dependencies for dashboard to installation.rst (#5942)
Updating the docs to include pip installing `aiohttp` and `psutil`, both of which the dashboard requires.  Since the whole dashboard section is optional, I thought I'd just add it in the docs rather than make it an explicit requirement of the project.  Tell me if you'd prefer them as requirements in the `setup.py`, though.
2019-10-17 00:39:56 -07:00
Philipp Moritz
32b2907457 Update max resource label and give better error message (#5916) 2019-10-16 22:37:01 -07:00
Peter Schafhalter
6c11b534c8 [Autoscaler] Update AWS Deep Learning AMI to version 24.3 (#5932) 2019-10-16 16:50:54 -07:00
Richard Liaw
d52a4983af Update TF documentation (#5918) 2019-10-16 01:31:27 -07:00
Richard Liaw
9f23620412 [tune] tf2.0 mnist example (#5898)
* tfmnistexample

* tfmnist

* add_to_ci

* format

* exampledownload

* fix
2019-10-15 22:25:01 -07:00
Eric Liang
6843a01a7f Automatically create custom node id resource (#5882)
* node id

* comment

* comments

* fix tests
2019-10-15 21:31:11 -07:00
Richard Liaw
c52bb0621d [tune] Support TF2.0 on Keras Callback (#5912) 2019-10-15 10:49:50 -07:00
Eric Liang
69d5c1b53a remove evil redirects (#5919) 2019-10-14 19:41:04 -07:00
Philipp Moritz
5382a26c2e Deactivate bazel caching for linux wheels (#5915) 2019-10-14 15:48:23 -07:00
Camille Couturier
320cba313f [tune] Explicitly set scheduler in run() (#5871)
* Explicitly set scheduler in run()

* Better formatting/indentation (after running format.sh)

* Remove accidental paste in parameters definitions.

* format
2019-10-14 15:44:59 -07:00
Richard Liaw
7f4141df4e [docs] Pictures for all the Examples (#5859)
* image

* plot resnet

* hyperparam

* fixup_pictures

* custom_direct
2019-10-14 14:18:52 -07:00
Philipp Moritz
8fd23c0c3f Add back TensorFlow test (#5885) 2019-10-14 11:26:02 -07:00
Richard Liaw
20c0cdee4f [autoscaler] Worker-Head termination + Better Scale-up message (#5909) 2019-10-14 10:37:50 -07:00
Edward Oakes
abbfe7392f Bump dev version to 0.8.0.dev6 (#5906) 2019-10-14 11:36:13 +01:00
Richard Liaw
1650f7b174 [tune] Remove TF MNIST example + add TrialRunner hook to execut… (#5868)
* remove test

* add trial runner

* removerestore

* Remove other mnist examples

* tunetest

* revert

* v1

* Revert "v1"

This reverts commit c8bddaf2db7a8270c43c02021cac0e75df15ed20.

* Revert "revert"

This reverts commit b58f56884a0c288d3a6f997d149ab4d496ddd7a3.

* errors

* format
2019-10-13 20:33:56 -07:00
Richard Liaw
52e5c9b22d [tune] CPU-Only Head Node support (#5900)
* trialqueue

* add tests
2019-10-13 20:31:42 -07:00
Eric Liang
2cbc67f3d5 Fix test_dying_worker_get (#5908) 2019-10-13 18:06:28 -07:00
Richard Liaw
0f24509c30 [autoscaler] uptime redirect fix (#5907)
* small change

* comment
2019-10-13 23:25:15 +01:00
Edward Oakes
6eaa8e31fa [autoscaler] Revert to double-spawning updater threads (#5903)
* [autoscaler] Revert to double-spawning threads

* Use log prefix

* add comment
2019-10-13 20:00:06 +01:00
Simon Mo
97a786cf11 [Serve] Remove handle passing in tail recursion (#5894)
* Remove handle pass in tail recursion

* Quick fix

* Fix worker timeout issue
2019-10-12 20:13:20 -07:00
Matthew A. Wright
0110941de5 rllib: use pytorch's fn to see if gpu is available (#5890) 2019-10-12 00:13:00 -07:00
Richard Liaw
898652837c [minor][docs] Remove example link (#5880) 2019-10-11 11:49:18 -07:00
Eric Liang
0e8c3c0346 Don't wrap RayError with RayTaskError (#5870) 2019-10-11 11:00:08 -07:00
Edward Oakes
779f91523b [autoscaler] Fix quoting (#5891) 2019-10-11 00:40:26 -07:00
Simon Mo
4b99cb429e [Serve] Hotfix: Fix actor handle hashing in metric monitoring (#5886) 2019-10-11 00:31:42 -07:00
Robert Nishihara
523c764c25 Python 2 compatibility. (#5887) 2019-10-10 19:09:25 -07:00
Eric Liang
c3b2ae26c5 Fix str of RayTaskError (#5878)
* fix key error

* fix
2019-10-10 16:53:18 -07:00
Philipp Moritz
1100556ba2 Fix linux wheel build (#5881) 2019-10-10 16:15:26 -07:00
Mitchell Stern
195ca43e9c [Dashboard] Improve handling of logs and errors in dashboard backend (#5857)
* Improve handling of logs and errors in dashboard backend

* Update nested dict comprehension for clarity
2019-10-10 11:59:54 -07:00
Eric Liang
1a8ac3db46 Implement fair task queueing to prevent task starvation (#5851)
* initial commit

* lint

* clarify

* add feature flag

* comment

* add timeout to test

* fix print

* comment

* use id for scheduling class

* lint

* dad warn

* flake
2019-10-08 21:04:25 -07:00