mirror of https://github.com/vale981/ray synced 2025-04-01 07:49:12 -04:00
Commit graph

8589 commits

Author SHA1 Message Date
Stephanie Wang
66edebce3a
Spillback scheduling for direct task calls ()
* add dac

* remove cachign

* rename return buffer

* cleanup

* add tests

* add perf

* fix

* flip

* remove

* remove it

* lint

* remove fork safety

* lint

* comments

* s/core/client

* wip

* remove

* fmt

* consistently return direct naming

* basic pass by ref

* fix bugs

* wip

* wip

* wip

* wip

* add test

* works now

* fix constructor

* fix merge

* add todo for perf

* fix single client test

* use lower n

* bazel

* faster

* fix core worker test

* init

* fix tests

* no plasma for direct call

* Update worker.py

* add order test

* fixes

* comments

* remove old assert

* lint

* add test

* Very wip

* wip

* add options for tasks

* add test

* fmt

* add backpressure

* remove idle prof event

* lint

* Fix 0 returns

* Set memcopy threads globally

* add benchmark

* Fix object exists

* Fix reference

* Remove return_buffer

* Add check

* add exit handler

* update benchmarks

* Fix compile error

* Fix NoReturn

* Use is instead of == for NoReturn

* fix

* Remove list comprehension

* Fix core worker test

* comment

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* fix merge error

* lint

* wip

* fix merge

* wip

* finish

* lint

* task interface

* add file

* add

* wip

* now works!

* updated

* wip

* dep resolution

* remove remote dep handling

* comments

* fix test_multithreading

* fix merge

* fix exit handling

* fix merge

* comments

* get fallback fetch working

* handle contains

* fix typo

* Skeleton for SubmitTask proto

* Update src/ray/common/id.h

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* comments

* rename to core worker service

* lint

* fix compile

* wip

* update

* error code

* fix up and rename

* clean up call manager

* comments

* add test and cleanup deserialization

* fix pickle

* fix comments, lint

* test todo

* comments

* use shared ptr

* rename

* Update src/ray/protobuf/gcs.proto

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* require transport type for ids; lint

* cleanup

* comments 1

* use worker available for real

* wip

* fix test

* resolve local dependencies test

* add num pending metric

* client factory

* unit test task submission

* wip

* fix bug

* rename

* Pass through node manager port, connect in raylet client

* finish rename

* Switch submit task to grpc

* fix crash

* Check port in use

* fix merge

* comments more

* doc

* Remove default port, set port randomly from driver

* add unique_ptr comment about TaskSpec

* lint

* fix test

* update

* fix lint

* GetMessageMutable should not be const

* iwyu

* fix const

* Update direct_task_transport_test.cc

* fix segfault

* Fix test

* Add RpcAddress, set in actor table data

* fix serialization

* fix lint

* Pass through task caller address

* Fix object manager test

* RpcAddress -> Address

* merge

* Port WorkerLease to grpc

* wip

* fix test

* add mem test

* update

* comments

* fix core worker tests

* fix

* remove old worker lease code

* First pass on spillback

* lint

* crash?

* Debug

* Fix task spec copy, extend test basic

* lint

* Port return worker to grpc

* lint

* Return worker to the correct raylet

* Only request worker if queued tasks

* A bit better failure handling

* Fix unit test

* Add unit test for spillback

* fix

* python test multinode

* update

* updates

* fix
2019-11-17 20:29:32 -08:00
Philipp Moritz
fc655acfee
Fix linting on master branch () 2019-11-16 10:02:58 -08:00
Eric Liang
a68cda0a33
[rllib] remove exists call () 2019-11-15 21:59:40 -08:00
Danyang Zhuo
30e2b6b91b Microbenchmark for inter-node object transfer () 2019-11-15 21:39:06 -08:00
Adam Gleave
e8cce3fdd4 [autoscaler]: automatically pull new docker image ()
* Docker: automatically pull new image

* Fix missing value in schema

* Address review comments
2019-11-15 21:26:28 -08:00
Ion
1b80675206 Scheduling ids () 2019-11-15 16:04:16 -08:00
Edward Oakes
dee696577f
Fix passing object ids in local mode () 2019-11-15 15:46:39 -08:00
Edward Oakes
33040d734f
Disable stopgap GC by default ()
* disable stopgap gc by default

* fix gc testss
2019-11-15 15:42:59 -08:00
Hersh Godse
7aa06fb25c [tune] ExperimentalAnalysis in-memory cache () 2019-11-15 12:47:50 -08:00
Eric Liang
7d33e9949b
Integrate ref count module into local memory store () 2019-11-15 10:52:19 -08:00
Richard Liaw
62cbc043b4
[tune] tbx logger ()
* tbx

* add_hparams

* fix_hparams

* ok

* ok

* fix

* ok

* fix
2019-11-15 08:45:44 -08:00
Eric Liang
8ff393a7bd
Handle exchange of direct call objects between tasks and actors () 2019-11-14 17:32:04 -08:00
Edward Oakes
385783fcec
Ray on YARN + Skein Documentation () 2019-11-14 15:06:05 -08:00
Edward Oakes
2758cd0b34
Make log message debug () 2019-11-14 15:05:36 -08:00
Edward Oakes
e3b95dafeb
Fix sigterm_handler () 2019-11-14 13:41:50 -08:00
Eric Liang
243b1b7281
[rllib] Add microbatch optimizer with A2C example () 2019-11-14 12:14:00 -08:00
Eric Liang
0a3623ded6
Fix memory store wait () 2019-11-14 10:17:30 -08:00
Stephanie Wang
bbadde57e0
Pass through caller address when submitting a task ()
* Add RpcAddress, set in actor table data

* Pass through task caller address

* RpcAddress -> Address

* update

* fix

* lint

* fix cc tests
2019-11-14 09:14:08 -08:00
Ujval Misra
e3e3ad4b25 Add timeout param to ray.get () 2019-11-14 00:50:04 -08:00
waldroje
e4c0843f60 Allow EntropyCoeffSchedule to accept custom schedule ()
* modify tf_policy to enable EntropyCoeffSchedule to handle list, and avoid negative values under current implementation

* Update custom_metrics_and_callbacks.py

* Update tf_policy.py
2019-11-14 00:45:43 -08:00
Eric Liang
e4565c9cc6
Reduce RLlib log verbosity () 2019-11-13 18:50:45 -08:00
Edward Oakes
51e76151d6
Use shared_ptr for gcs client in profiler () 2019-11-13 15:24:01 -08:00
Philipp Moritz
f24d96ec4f Revert "Try to enable dashboard (again) ()" ()
This reverts commit 4044af8520.
2019-11-13 12:32:12 -08:00
Eric Liang
b924299833
Add large scale regression test for RLlib () 2019-11-13 12:22:55 -08:00
Eric Liang
f3f86385d6
Minimal implementation of direct task calls () 2019-11-12 11:45:28 -08:00
Stephanie Wang
35d177f459
Use grpc for communication from worker to local raylet (task submission and direct actor args only) ()
* Skeleton for SubmitTask proto

* Pass through node manager port, connect in raylet client

* Switch submit task to grpc

* Check port in use

* doc

* Remove default port, set port randomly from driver

* update

* Fix test

* Fix object manager test
2019-11-11 21:17:25 -08:00
Siyuan (Ryans) Zhuang
f48293f96d
Fix deprecated warning () 2019-11-11 17:49:15 -08:00
Simon Mo
c75ada9e04
[Autoscaler][K8s] Enforce memory limit in k8s yaml ()
* Enforce memory limit in k8s yaml

* Update python/ray/autoscaler/kubernetes/example-full.yaml

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Line wrap
2019-11-11 14:06:34 -08:00
Adi Zimmerman
776b071f3b [tune] Let Search Algorithms use early stopped trials () 2019-11-11 09:38:14 -08:00
Edward Oakes
5780ec1b62
Refresh ObjectIDs in raylet for stopgap GC () 2019-11-10 23:12:59 -08:00
Philipp Moritz
decaa65cd6
Use pickle by default for serialization () 2019-11-10 18:12:18 -08:00
Adam Gleave
01aee8d970 [autoscaler] Retry creating EC2 instances in new AZ () 2019-11-09 19:44:27 -08:00
Miguel Morales
d17ae5ad7a Update hyperband-cartpole.yaml ()
Typo
2019-11-09 19:39:03 -08:00
Adam Gleave
c157e93ba1 [tune] Retry failed tasks with checkpointing disabled ()
* Allow recovery for failed tasks without checkpointing

* Update docs
2019-11-09 19:35:27 -08:00
Philipp Moritz
ccbcc4bafa
Use GRCP and Bazel 1.0 () 2019-11-08 15:58:28 -08:00
Eric Liang
afca6d3d87
Object store full with cyclic python references () 2019-11-08 14:08:24 -08:00
Edward Oakes
83378a8610
Improve flaky test_warning_monitor_died () 2019-11-08 12:11:15 -08:00
Eric Liang
4044af8520
Try to enable dashboard (again) ()
* Revert "Revert "Enable the Ray dashboard by default ()" ()"

This reverts commit 1a3e97cf23.

* fix tests that assume the dashboard isn't a job

* travis
2019-11-08 10:48:48 -08:00
Philipp Moritz
5a05eaaa54 Fix compilation on master () 2019-11-07 22:38:42 -08:00
Eric Liang
4a28306186
Allow large returns from direct actor calls () 2019-11-07 21:28:55 -08:00
Edward Oakes
ca53af4d0f
Add pending task dependencies to ObjectID ref counting () 2019-11-07 18:37:10 -08:00
Eric Liang
1f043daf69
[rllib] Fix and add test for LR annealing config 2019-11-07 12:17:27 -08:00
Simon Mo
fcb6bdbc39
[Doc] Document Actor.options API ()
* Document Actor.options API

* Undocument _remote
2019-11-06 23:12:23 -08:00
Edward Oakes
9820c10a09 Simplify gRPC service definition for the worker () 2019-11-06 13:00:39 -08:00
David Bignell
3f83b2daa9 [rllib] Rollout extensions ()
* Rollout improvements

* Make info-saving optional, to avoid breaking change.

* Store generating ray version in checkpoint metadata

* Keep the linter happy

* Add small rollout test

* Terse.

* Update test_io.py
2019-11-05 20:34:18 -08:00
Eric Liang
2a0225dd25
[rllib] RLlib chooses wrong neural network model for Atari in 0.7.5 () 2019-11-05 11:36:29 -08:00
daiyaanarfeen
8f6d73a93a [sgd] Extend distributed pytorch functionality ()
* raysgd

* apply fn

* double quotes

* removed duplicate TimerStat

* removed duplicate find_free_port

* imports in pytorch_trainer

* init doc

* ray.experimental

* remove resize example

* resnet example

* cifar

* Fix up after kwargs

* data_dir and dataloader_workers args

* formatting

* loss

* init

* update code

* lint

* smoketest

* better_configs

* fix

* fix

* fix

* train_loader

* fixdocs

* ok

* ok

* fix

* fix_update

* fix

* fix

* done

* fix

* fix

* fix

* small

* lint

* fix

* fix

* fix_test

* fix

* validate

* fix

* fi
2019-11-05 11:16:46 -08:00
Mitchell Stern
82be14f943 Move gRPC calls outside of Raylet stats lock () 2019-11-05 00:47:15 -08:00
mehrdadn
e312f3d282 Compatibility issues ()
* Pass -f - to tar to force stdin on Windows

* Quote paths that may contain spaces (causes issues on Windows)

* Copy over Windows code from Arrow for glog signal handle uninstall

* Add missing COPTS to build rules since we'll need them for Windows compatibility

* Begin adding COPTS for Windows compatibility

* Disable glog on Arrow until we change WIN32 to _WIN32 there

* Missing header files that cause problems on Windows

* WORD typedef conflicts with Windows; remove it

* uint -> unsigned int wherever we're dealing with milliseconds (signed version is already int)

* uint -> unsigned int for enums

* uint -> size_t, wherever we're dealing with sizes or indices into arrays

* Work around Boost 1.68 bug in detecting clang-cl (revert this after upgrading)

* Missing #include <unistd.h>

* Add check for signal handler uninstallation failure

* Linting issue
2019-11-05 00:08:14 -08:00
Philipp Moritz
fefe050a58
Fix running out of file descriptors in the WebUI () 2019-11-04 21:17:36 -08:00