1461 commits

Author SHA1 Message Date
Eric Liang
5ab5017c67
[rllib] Fix impala stress test (#5101)
* add copy

* upgrade to tf 1.14

* update

* reduce count to work around https://github.com/ray-project/ray/issues/5125

* Update impala.py

* placeholder

* comments

* update
2019-07-09 20:22:30 -07:00
Eric Liang
5aec750107
Add warning/error if object store memory exceeds available memory (#4893)
* exclude

* format

* add warning

* hatch

* reduce mem usage

* reduce object store mem

* set obj mem
2019-07-08 21:37:08 -07:00
Stefan Pantic
dfc94ce7bc [rllib] Add entropy coeff decay (#5043) 2019-07-08 18:30:32 -07:00
Daniel Edgecumbe
eeb67db861 [autoscaler] Log AWS NodeProvider create_instances (#4998)
* autoscaler: Log on AWS NodeProvider create_instances

* logging
2019-07-08 13:22:26 -07:00
Hao Chen
8a30b93e42
Define common data structures with protobuf. (#5121) 2019-07-08 22:41:37 +08:00
Sam Toyer
7ad854d4c6 [tune] Use traceback.format_tb() (fixes #5135) (#5136) 2019-07-08 01:13:06 -07:00
Eric Liang
893744b3be
[rllib] Revert "use make template" which seems to break DQN/Atari (#5134)
* Revert "use make template"

This reverts commit 291e9e0031c6e315fe24e5b4973dea375fe73918.

* debug vars
2019-07-07 19:51:26 -07:00
Morgan Giraud
7e020e7183 [tune] tune.run keep_checkpoints_num (#5117)
* Add missing argument keep_checkpoints_num to tune

* expose keep checkpoints
2019-07-07 17:14:56 -07:00
Edward Oakes
8f53364097 Improve local_mode (#5060) 2019-07-07 17:10:50 -07:00
Eric Liang
932d6b2517
[rllib] Port IMPALA to ModelV2/build_tf_policy (#5130)
* port vtrace

* fix vf

* fix vs

* fix the example

* wip ddpg

* fix tests

* fix tests

* remove ddpg model

* comments

* set vf share layers True by default

* typo

* fix test
2019-07-07 15:06:41 -07:00
Richard Liaw
6a14f1a540 [autoscaler] Small fixes for local cluster usability (#4864) 2019-07-06 21:55:18 -07:00
Richard Liaw
1798d4f077 [autoscaler] Add hard kill and monitor commands (#5082)
* Add hard kill and monitor commands

* better_commands

* Update python/ray/scripts/scripts.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2019-07-06 21:52:55 -07:00
Eric Liang
445bcb29b0
[hotfix] fix backward compat with older yaml libraries 2019-07-06 20:41:28 -07:00
Eric Liang
c15ed3ac55
[rllib] Shuffle RNN sequences in PPO as well (#5129)
* shuffle seq

* fix test
2019-07-06 20:40:49 -07:00
Brandon Bertelsen
c04b69902c Updates for #5072 (#5091) 2019-07-06 16:05:50 -07:00
Eric Liang
0448847a02
Update protobuf version (#5128) 2019-07-06 15:59:55 -07:00
Aleksei Petrenko
09bde397c9 Multiagent experiment resume (#5102)
* Fixed problem with multiagent experiment resume

* Applied format script

* fix lint
2019-07-06 11:38:17 -07:00
Dušan Josipović
e9b88dcbed [wingman -> tune] Add system performance tracking (#4924) 2019-07-06 00:57:35 -07:00
Richard Liaw
c3e9d94b18
[tune][minor] Reduce checkpointing frequency (#4859) 2019-07-06 00:54:24 -07:00
Kim Jeong Ju
4b56a5eb27 [tune] missing torch.load in mnist_pytorch_trainable.py (#5103) 2019-07-06 00:14:41 -07:00
Philipp Moritz
c5253cc300 Add job table to state API (#5076) 2019-07-06 00:05:48 -07:00
Richard Liaw
53d5a8a45f
[tune] Fix sort (#5111)
* fix sort

* fix tune list-experiments

* Update python/ray/tune/tests/test_commands.py
2019-07-05 16:05:10 -07:00
Robert Nishihara
9cc4cc6a52
Fail format.sh if yapf/flake8 versions are incorrect. (#5083) 2019-07-04 23:22:01 -07:00
ztangent
41a16c55ef [tune] Fixed bug with joining experiment_path twice. (#5106) 2019-07-03 22:48:07 -07:00
Patrick
1a543a6571 [serve] add missing __init__.py file under serve/utils (#4609)
* bugfix: add missing serve/utils __init__.py file

* Update __init__.py

* lint
2019-07-03 17:27:59 -07:00
Richard Liaw
0dbb6c4911
[tune] PBT perturbing after first iteration (#5097) 2019-07-03 17:27:26 -07:00
Eric Liang
34d054ff19
[rllib] ModelV2 API (#4926) 2019-07-03 15:59:47 -07:00
Kristian Hartikainen
9e0192bc0b [tune] Change the log syncing behavior (#4450)
* Change the log syncing behavior

* fix up abstractions for syncer

* Finished checkpoint syncing

* Code

* Set of changes to get things running

* Fixes for log syncing

* Fix parts

* Lint and other fixes

* fix some test

* Remove extra parsing functionality

* some test fixes

* Fix up cloud syncing

* Another thing to do

* Fix up tests and local sync

Changes LogSync into a mixin, and adds tests for different
functionalities.

* Fix up tests, start on local migration

* fix distributed migrations

* comments

* formatting

* Better checkpoint directory handling

* fix tests

* fix tests

* fix click

* comments

* formatting comments

* formatting and comments

* sync function deprecations

* syncfunction

* Add documentation for Syncing and Uploading

* nit

* BaseSyncer as base for Mixin in edge case

* more docs

* clean up assertions

* validate

* nit

* Update test_cluster.py

* betterdoc

* Update tune-usage.rst

* cleanup

* nit
2019-07-02 20:46:00 -07:00
Eric Liang
904dcf081d
Switch cluster longevity tests to DLAMI, fix ray up verbosity (#5084)
* fix

* add branch commit

* comments

* Update ci/long_running_tests/.gitignore

Co-Authored-By: Robert Nishihara <robertnishihara@gmail.com>
2019-07-02 00:19:05 -07:00
Simon Mo
d7ccfbe46b Bump version to 0.8.0.dev2 (#5069) 2019-06-29 23:30:26 -07:00
Simon Mo
b5d473847c bump version to 0.7.2 (#5066) 2019-06-29 19:06:51 -07:00
Joey Jiang
d6bbbdef35 Use gRPC to handle communication and data transmission between object manager (#4996) 2019-06-28 10:56:34 +08:00
Qing Wang
62e4b591e3
[ID Refactor] Rename DriverID to JobID (#5004)
* WIP

WIP

WIP

Rename Driver -> Job

Fix compilation

Fix

Rename in Java

In py

WIP

Fix

WIP

Fix

Fix test

Fix

Fix C++ linting

Fix

* Update java/runtime/src/main/java/org/ray/runtime/config/RayConfig.java

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Update src/ray/core_worker/core_worker.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Address comments

* Fix

* Fix CI

* Fix cpp linting

* Fix py lint

* Fix

* Address comments and fix

* Address comments

* Address

* Fix import_threading
2019-06-28 00:44:51 +08:00
Qing Wang
d9768c1cd2
[hotfix] Fix master's linting (#5049)
The linting in CI on master always fails.
2019-06-27 20:21:32 +08:00
Hao Chen
a1156754e9
Fix test_task_forward (#5040) 2019-06-27 14:37:00 +08:00
Daniel Edgecumbe
49c6e81de2 autoscaler/monitor: Kill workers on exception (#4997) 2019-06-26 17:59:12 -07:00
Robert Nishihara
a17c08faa4 Lengthen buffer in resource test. (#4961) 2019-06-26 09:54:04 -07:00
Richard Liaw
b1827d5fbe
[tune] Update MNIST Example (#4991) 2019-06-25 22:50:15 -07:00
Philipp Moritz
bbe3e5b4ed [rllib] Give error if sample_async is used with pytorch for A3C (#5000)
* give error if sample_async is used with pytorch

* update

* Update a3c.py
2019-06-25 22:06:35 -07:00
Eric Liang
aa5fc52e32
[rllib] Add QMIX mixer parameters to optimizer param list (#5014)
* add mixer params

* Update qmix_policy.py
2019-06-25 19:02:40 -07:00
Hao Chen
0131353d42 [gRPC] Migrate gcs data structures to protobuf (#5024) 2019-06-25 14:31:19 -07:00
Qing Wang
e33d0eac68
Add dynamic worker options for worker command. (#4970)
* Add fields for fbs

* WIP

* Fix compilation errors

* Add java part

* Fix

* Fix

* Fix

* Fix lint

* Refine API

* address comments and add test

* Fix

* Address comment.

* Address comments.

* Fix linting

* Refine

* Fix lint

* WIP: address comment.

* Fix java

* Fix py

* Refine

* Fix

* Fix

* Fix linting

* Fix lint

* Address comments

* WIP

* Fix

* Fix

* minor refine

* Fix lint

* Fix raylet test.

* Fix lint

* Update src/ray/raylet/worker_pool.h

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/runtime/src/main/java/org/ray/runtime/AbstractRayRuntime.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comments.

* Address comments.

* Fix test.

* Update src/ray/raylet/worker_pool.h

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comments.

* Address comments.

* Fix

* Fix lint

* Fix lint

* Fix

* Address comments.

* Fix linting
2019-06-23 18:08:33 +08:00
Joey Jiang
a7f84b536f Fix no cpus test (#5009) 2019-06-21 17:08:25 +08:00
Richard Liaw
31b6da12f9
[tune] Tutorial UX Changes (#4990)
* add integration, iris, ASHA, recursive changes, set reuse_actors=True, and enable Analysis as a return object

* docstring

* fix up example

* fix

* cleanup tests

* experiment analysis
2019-06-21 12:59:49 +08:00
Eric Liang
1d17125333 temp fix for build (#5006) 2019-06-20 18:07:44 -07:00
Andrew Berger
e59e8074dd fix handling of non-integral timeout values in signal.receive (#5002) 2019-06-20 15:33:40 -07:00
Hao Chen
2bf92e02e2
[gRPC] Use gRPC for inter-node-manager communication (#4968) 2019-06-17 19:00:50 +08:00
Simon Mo
05e2748070 Inherit Function Docstrings and other metadata (#4985) 2019-06-15 11:01:27 -07:00
Eric Liang
fa1d4c9807
[rllib] Fix DDPG example (#4973) 2019-06-13 15:07:46 -07:00
Robert Nishihara
d2f5b71c3b Remove typing from setup.py install_requirements. (#4971) 2019-06-12 15:02:12 -07:00