Eric Liang
893744b3be
[rllib] Revert "use make template" which seems to break DQN/Atari ( #5134 )
...
* Revert "use make template"
This reverts commit 291e9e0031c6e315fe24e5b4973dea375fe73918.
* debug vars
2019-07-07 19:51:26 -07:00
Morgan Giraud
7e020e7183
[tune] tune.run keep_checkpoints_num ( #5117 )
...
* Add missing argument keep_checkpoints_num to tune
* expose keep checkpoints
2019-07-07 17:14:56 -07:00
Edward Oakes
8f53364097
Improve local_mode ( #5060 )
2019-07-07 17:10:50 -07:00
Eric Liang
932d6b2517
[rllib] Port IMPALA to ModelV2/build_tf_policy ( #5130 )
...
* port vtrace
* fix vf
* fix vs
* fix the example
* wip ddpg
* fix tests
* fix tests
* remove ddpg model
* comments
* set vf share layers True by default
* typo
* fix test
2019-07-07 15:06:41 -07:00
Richard Liaw
6a14f1a540
[autoscaler] Small fixes for local cluster usability ( #4864 )
2019-07-06 21:55:18 -07:00
Richard Liaw
1798d4f077
[autoscaler] Add hard kill and monitor commands ( #5082 )
...
* Add hard kill and monitor commands
* better_commands
* Update python/ray/scripts/scripts.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2019-07-06 21:52:55 -07:00
Eric Liang
445bcb29b0
[hotfix] fix backward compat with older yaml libraries
2019-07-06 20:41:28 -07:00
Eric Liang
c15ed3ac55
[rllib] Shuffle RNN sequences in PPO as well ( #5129 )
...
* shuffle seq
* fix test
2019-07-06 20:40:49 -07:00
Brandon Bertelsen
c04b69902c
Updates for #5072 ( #5091 )
2019-07-06 16:05:50 -07:00
Eric Liang
0448847a02
Update protobuf version ( #5128 )
2019-07-06 15:59:55 -07:00
Aleksei Petrenko
09bde397c9
Multiagent experiment resume ( #5102 )
...
* Fixed problem with multiagent experiment resume
* Applied format script
* fix lint
2019-07-06 11:38:17 -07:00
Dušan Josipović
e9b88dcbed
[wingman -> tune] Add system performance tracking ( #4924 )
2019-07-06 00:57:35 -07:00
Richard Liaw
c3e9d94b18
[tune][minor] Reduce checkpointing frequency ( #4859 )
2019-07-06 00:54:24 -07:00
Kim Jeong Ju
4b56a5eb27
[tune] missing torch.load in mnist_pytorch_trainable.py ( #5103 )
2019-07-06 00:14:41 -07:00
Philipp Moritz
c5253cc300
Add job table to state API ( #5076 )
2019-07-06 00:05:48 -07:00
Richard Liaw
53d5a8a45f
[tune] Fix sort ( #5111 )
...
* fix sort
* fix tune list-experiments
* Update python/ray/tune/tests/test_commands.py
2019-07-05 16:05:10 -07:00
Joey Jiang
4183303a2f
Add bazel build options for plasma to use glog ( #5108 )
2019-07-05 19:00:19 +08:00
Robert Nishihara
9cc4cc6a52
Fail format.sh if yapf/flake8 versions are incorrect. ( #5083 )
2019-07-04 23:22:01 -07:00
Zhijun Fu
54d5969cea
[grpc] Add grpc server to worker ( #5054 )
...
* refactor grpc server
* format
* change GetTask() to PushTask()
* change PushTask to AssignTask
* format
* update
* fix test
* format
* Update src/ray/rpc/worker_client.h
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
* Update BUILD.bazel
* Update src/ray/core_worker/task_execution.cc
Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>
* update
* format
* address comments
* format
* Update src/ray/rpc/worker/worker_server.h
Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>
* Update src/ray/protobuf/worker.proto
Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>
* format
* fix
* format
2019-07-04 20:16:42 +08:00
ztangent
41a16c55ef
[tune] Fixed bug with joining experiment_path twice. ( #5106 )
2019-07-03 22:48:07 -07:00
Patrick
1a543a6571
[serve] add missing __init__.py file under serve/utils ( #4609 )
...
* bugfix: add missing serve/utils __init__.py file
* Update __init__.py
* lint
2019-07-03 17:27:59 -07:00
Richard Liaw
0dbb6c4911
[tune] PBT perturbing after first iteration ( #5097 )
2019-07-03 17:27:26 -07:00
Eric Liang
34d054ff19
[rllib] ModelV2 API ( #4926 )
2019-07-03 15:59:47 -07:00
Kristian Hartikainen
9e0192bc0b
[tune] Change the log syncing behavior ( #4450 )
...
* Change the log syncing behavior
* fix up abstractions for syncer
* Finished checkpoint syncing
* Code
* Set of changes to get things running
* Fixes for log syncing
* Fix parts
* Lint and other fixes
* fix some test
* Remove extra parsing functionality
* some test fixes
* Fix up cloud syncing
* Another thing to do
* Fix up tests and local sync
Changes LogSync into a mixin, and adds tests for different
functionalities.
* Fix up tests, start on local migration
* fix distributed migrations
* comments
* formatting
* Better checkpoint directory handling
* fix tests
* fix tests
* fix click
* comments
* formatting comments
* formatting and comments
* sync function deprecations
* syncfunction
* Add documentation for Syncing and Uploading
* nit
* BaseSyncer as base for Mixin in edge case
* more docs
* clean up assertions
* validate
* nit
* Update test_cluster.py
* betterdoc
* Update tune-usage.rst
* cleanup
* nit
2019-07-02 20:46:00 -07:00
Stephanie Wang
71d4637b75
[core worker] Refactor CoreWorker member classes ( #5062 )
...
* Move store client mutex inside CoreWorkerPlasmaStoreProvider
* Move PlasmaClient inside CoreWorkerStoreProvider
* Remove CoreWorkerObjectInterface's ref to CoreWorker
* Remove WorkerLanguage
* Remove CoreWorkerTaskInterface's ref to CoreWorker
* Remove CoreWorkerTaskExecutionInterface's ref to CoreWorker
* lint
* move comment
* Fix build
* Fix build
2019-07-02 15:30:30 -07:00
Kai Yang
1cf7728f35
[Core worker] Serialize ActorHandle in core worker. Make ActorHandle thread safe. ( #5034 )
...
* Serialize ActorHandle in core worker. Make ActorHandle thread safe.
* Address comments
* Address comments
* Address comments
* Address comments
* lint
* Address comments
* Address comments
* Address comments
* Address comments
* Minor update
* Address comments
* lint
2019-07-02 16:48:43 +08:00
Eric Liang
904dcf081d
Switch cluster longevity tests to DLAMI, fix ray up verbosity ( #5084 )
...
* fix
* add branch commit
* comments
* Update ci/long_running_tests/.gitignore
Co-Authored-By: Robert Nishihara <robertnishihara@gmail.com>
2019-07-02 00:19:05 -07:00
Qing Wang
247f95b3ff
Refine RegisterClientRequest message to make it clearer. ( #5057 )
...
* Transfer driver task id explicitly
* Refine
* Fix and add comment.
* add more
* Fix
* Fix
* Add comments
* Fix
2019-07-02 14:26:19 +08:00
Philipp Moritz
a6a02fccd0
Do not compile redis twice ( #5074 )
2019-07-01 15:42:54 -07:00
Philipp Moritz
4e82313891
Update to latest arrow ( #5011 )
2019-06-30 20:36:36 -07:00
Simon Mo
0c4dd3c401
Use bazel disk cache with travis ( #5068 )
2019-06-30 17:57:48 -07:00
Simon Mo
6c4c1d444d
Update VersionKey in stats ( #5070 )
2019-06-30 18:23:12 +08:00
Simon Mo
d7ccfbe46b
Bump version to 0.8.0.dev2 ( #5069 )
2019-06-29 23:30:26 -07:00
Simon Mo
b5d473847c
bump version to 0.7.2 ( #5066 )
2019-06-29 19:06:51 -07:00
Robert Nishihara
bcc379556b
Make some fixes to long running stress tests. ( #5056 )
2019-06-28 15:42:54 -07:00
Kai Yang
4ccb7b05cc
[Core worker] Add metadata support in object interface ( #5031 )
2019-06-28 11:35:03 -07:00
Hao Chen
cefbb0c94c
Fix driver id in TaskInfo ( #5055 )
2019-06-28 12:56:48 +08:00
Kai Yang
a39982e676
[Core worker] Task execution passes TaskInfo struct to executor ( #5032 )
2019-06-28 10:59:45 +08:00
Joey Jiang
d6bbbdef35
Use gRPC to handle communication and data transmission between object managers ( #4996 )
2019-06-28 10:56:34 +08:00
Qing Wang
62e4b591e3
[ID Refactor] Rename DriverID to JobID ( #5004 )
...
* WIP
WIP
WIP
Rename Driver -> Job
Fix compilation
Fix
Rename in Java
In py
WIP
Fix
WIP
Fix
Fix test
Fix
Fix C++ linting
Fix
* Update java/runtime/src/main/java/org/ray/runtime/config/RayConfig.java
Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>
* Update src/ray/core_worker/core_worker.cc
Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>
* Address comments
* Fix
* Fix CI
* Fix cpp linting
* Fix py lint
* Fix
* Address comments and fix
* Address comments
* Address
* Fix import_threading
2019-06-28 00:44:51 +08:00
Qing Wang
d9768c1cd2
[hotfix] Fix master's linting ( #5049 )
...
The linting in CI on master always fails.
2019-06-27 20:21:32 +08:00
Hao Chen
a1156754e9
Fix test_task_forward ( #5040 )
2019-06-27 14:37:00 +08:00
Hao Chen
469ae41013
Fix memory leak in rpc ServerCall and ClientCall ( #5046 )
2019-06-27 13:19:47 +08:00
Daniel Edgecumbe
49c6e81de2
autoscaler/monitor: Kill workers on exception ( #4997 )
2019-06-26 17:59:12 -07:00
Stephanie Wang
1a8d0af814
Remove debug check for uncommitted lineage ( #5038 )
2019-06-26 11:21:00 -07:00
Robert Nishihara
a17c08faa4
Lengthen buffer in resource test. ( #4961 )
2019-06-26 09:54:04 -07:00
Richard Liaw
b1827d5fbe
[tune] Update MNIST Example ( #4991 )
2019-06-25 22:50:15 -07:00
Philipp Moritz
bbe3e5b4ed
[rllib] Give error if sample_async is used with pytorch for A3C ( #5000 )
...
* give error if sample_async is used with pytorch
* update
* Update a3c.py
2019-06-25 22:06:35 -07:00
Zhijun Fu
bb8e75b532
[grpc] refactor rpc server to support multiple io services ( #5023 )
2019-06-25 19:08:09 -07:00
Eric Liang
aa5fc52e32
[rllib] Add QMIX mixer parameters to optimizer param list ( #5014 )
...
* add mixer params
* Update qmix_policy.py
2019-06-25 19:02:40 -07:00