Commit graph

2449 commits

Author SHA1 Message Date
Robert Nishihara
f2b5500642 Add ordered_set container. (#3352)
* Add ordered_set container.

* Fix

* Linting

* Constructors

* Remove O(n) call to list.size().

* Fix.

* Add documentation.

* Add iterators to ordered_set container implementation.

* iterator_type -> iterator

* Make typedefs private

* Add const_iterator
2018-11-19 17:01:18 -08:00
Eric Liang
d4dbd27e0d Don't retry IPC connect an absurd number of times (#3355) 2018-11-19 16:23:59 -08:00
Eric Liang
e4bb5d8d16
Fix logging when ray cluster utils is used 2018-11-18 21:49:27 -08:00
Eric Liang
61e3bbbfee
Update stale example links 2018-11-17 15:40:38 -08:00
Robert Nishihara
5cbc597494 Suppress duplicate pre-emptive object pushes. (#3276)
* Suppress duplicate pre-emptive object pushes.

* Add test.

* Fix linting

* Remove timer and inline recent_pushes_ into local_objects_.

* Improve test.

* Fix

* Fix linting

* Enable retrying pull from same object manager. Randomize object manager.

* Speed up test

* Linting

* Add test.

* Minor

* Lengthen pull timeout and reissue pull every time a new object becomes available.

* Increase pull timeout in test.

* Wait for nodes to start in object manager test.

* Wait longer for nodes to start up in test.

* Small fixes.

* _submit -> _remote

* Change assert to warning.
2018-11-16 23:02:45 -08:00
Wenting Shen
ab1e0f5c2f support home path and relative path for temp-dir (#3329) 2018-11-16 17:41:10 -08:00
Robert Nishihara
60b22d9a72 Don't unsubscribe dependencies for infeasible tasks. (#3338)
* Make scheduling queues RemoveTasks return task states as well.

* Add test

* Don't unsubscribe for infeasible tasks when spilling over.

* Linting

* Address comments.
2018-11-16 11:33:00 -08:00
Eric Liang
e0bf9d7305 Add debug string to raylet (#3317)
* initial debug string

* format

* wip debug string

* fix compile

* fix

* update

* finished

* to file

* logs dir

* use temp root

* fix

* override
2018-11-15 21:47:50 -08:00
Robert Nishihara
d10cb570ab Rename _submit -> _remote. (#3321) 2018-11-15 15:30:18 -08:00
Robert Nishihara
98edf752a9 Note requirement cython==0.27.3 in installation instructions. (#3322) 2018-11-15 15:27:19 -08:00
Philipp Moritz
1be1455d86 Fix redis crash when duplicate messages are appended to log. (#3316) 2018-11-15 15:09:39 -08:00
Eric Liang
5723291db6 Raise exception if the node is nearly out of memory (#3323)
* wip

* add

* comment

* escape hatch

* update

* object store too

* .2
2018-11-15 12:55:25 -08:00
Philipp Moritz
b6a12d1f97 Fix socket retry message (#3325) 2018-11-15 12:14:19 -08:00
Lewis Belcher
5319fd044c Update redis version in setup.py (#3333)
* `redis` has released a new version (https://github.com/andymccurdy/redis-py/releases/tag/3.0.0)
* `ray` is not compatible with this version
* This PR adds the "compatible release" operator for `redis` version 2.10.6.
2018-11-15 10:40:08 -08:00
Eric Liang
706dc1d473
[rllib] Add test for multi-agent support and fix IMPALA multi-agent (#3289)
IMPALA support for multiagent was broken since IMPALA has a requirement that batch sizes be of a certain length. However multi-agent envs can create variable-length batches.

Fix this by adding zero-padding as needed (similar to the RNN case).
2018-11-14 14:14:07 -08:00
andrewztan
57c7b4238e KL Divergence Metrics (#3300)
* added KL divergence metrics

* fix
2018-11-13 23:12:35 -08:00
Eric Liang
1660c9d627
Kill actor child processes on shutdown (#3297)
* example

* add env

* test pg

* change to test

* add atexit test

* Update rllib-env.rst

* comment

* revert unnecessary file

* fix title when actor is idle

* Update python/ray/actor.py

Co-Authored-By: ericl <ekhliang@gmail.com>
2018-11-13 19:16:42 -08:00
Stephanie Wang
577c1dda74 Release sender connections as soon as WriteMessageAsync completes (#3313) 2018-11-13 21:32:24 -05:00
Wang Qing
9d4847ad2d [hot-fix] Fix error when calling Ray.init() twice. (#3314) 2018-11-13 21:21:54 -05:00
Eric Liang
65c27c70cf [rllib] Clean up agent resource configurations (#3296)
Closes #3284
2018-11-13 18:00:03 -08:00
Philipp Moritz
d4fad222e1 Update profiling instructions for raylet (#3311) 2018-11-13 17:48:33 -05:00
Richard Liaw
97f423781b Clean up Ray processes after cluster util exits (#3278) 2018-11-13 13:18:12 -08:00
Richard Liaw
c3a2c7ebed [tune] Doc: Autofilled, StatusReporter (#3294)
* autofill and revise doc page for things

* lint

* comments
2018-11-13 13:15:56 -08:00
Eric Liang
6ee7a3b571
[rllib] Raise worker TF intra_op threads to 2, lower driver intra_op threads to 8 (#3299) 2018-11-13 11:41:58 -08:00
Richard Liaw
c0423db05c [core] Add Global State Test for multi-node setting (#3239)
* add test for adding node

* multinode test fixes

* First pass at allowing updatable values

* Fix compilation issues

* Add config file parsing

* Full initialization

* Wrote a good test

* configuration parsing and stuff

* docs

* write some tests, make it good

* fixed init

* Add all config options and bring back stress tests.

* Update python/ray/worker.py

* Update python/ray/worker.py

* Fix internalization

* some last changes

* Linting and Java fix

* add docstring

* Fix test, add assertions

* pytest ext

* lint

* lint
2018-11-13 10:35:24 -08:00
Eric Liang
d90f365394 [rllib] Add self-supervised loss to model (#3291)
# What do these changes do?

Allow self-supervised losses to be easily defined in custom models. Add this to the reference policy graphs.
2018-11-12 18:55:24 -08:00
Philipp Moritz
ce6e01b988 enable incremental builds (#3292) 2018-11-12 21:49:09 -05:00
Eric Liang
bd0dbde149
[rllib] Rename ServingEnv => ExternalEnv (#3302) 2018-11-12 16:31:27 -08:00
Richard Liaw
e37891d79d
[tune] Fix default handling for timesteps (#3293)
This PR fixes an issue where previously if timesteps_this_iter = 0,
then it would render as "None".

Closes #3057.
2018-11-12 15:52:17 -08:00
Eric Liang
49e2085d78
[rllib] Don't reset envs when possible (#3290)
* laz

* better errors
2018-11-11 01:45:37 -08:00
Eric Liang
463511f8a6
[tune] Track and warn on low memory (#3298) 2018-11-11 00:29:45 -08:00
Eric Liang
53489d2f85
[sgd] Document and add simple MNIST example (#3236) 2018-11-10 21:52:20 -08:00
Ion
d681893b0f Speed up task dispatch. (#3234)
* speed up task dispatch

* minor changes

* improved comments

* improved comments

* change argument of DispatchTasks to list of tasks

* dispatch only tasks whose dependencies have been fullfiled

* some updated comments

* refactored DispatchQueue() and Assigntask() to avoid the copy of the ready list

* minor fixes

* some more minor fixes

* some more minor fixes

* added more comments

* better comments?

* fixed all feedback comments, minus making the argument of AssignTask() const

* Assigntask() now taskes a const argument

* Do the task copy outside of the callback

* fix linting
2018-11-10 09:55:12 -08:00
Richard Liaw
29c182d449 [tune] Support "None" for upload_dir 2018-11-09 22:02:08 -08:00
Eric Liang
a51d618d88
[autoscaler] missing example-full.yaml file in the latest wheel for provider type "local" 2018-11-09 21:25:15 -08:00
Eric Liang
9dd3eedbac [rllib] rollout.py should reduce num workers (#3263)
## What do these changes do?

Don't create an excessive amount of workers for rollout.py, and also fix up the env wrapping to be consistent with the internal agent wrapper.

## Related issue number

Closes #3260.
2018-11-09 12:29:16 -08:00
Richard Liaw
22113be04c
[tune] Annotated Example Page and showcase Tutorials (#3267)
Adds an example page and link in codebase.

Closes #2728.
2018-11-08 23:45:05 -08:00
Eric Liang
588705b6fa [autoscaler] Add option to allow private ips only (#3270)
* merge

* update

* upd

* Update python/ray/autoscaler/autoscaler.py

Co-Authored-By: ericl <ekhliang@gmail.com>

* Update python/ray/autoscaler/autoscaler.py

Co-Authored-By: ericl <ekhliang@gmail.com>

* Update python/ray/autoscaler/aws/config.py

Co-Authored-By: ericl <ekhliang@gmail.com>

* fix
2018-11-08 17:07:31 -08:00
Philipp Moritz
8894883153 Force kill web UI in ray stop (#3257) 2018-11-08 00:05:32 -08:00
Eric Liang
9b2794101d
[minor] Change chunk already exists to DEBUG, add flags for rllib multi node testing (#3228) 2018-11-08 00:04:20 -08:00
Stephanie Wang
d950e92f63
Allow multiple threads to call ray.get and ray.wait (#3244)
* Handle multiple threads calling ray.get

* Multithreaded ray.wait

* Pass in current task ID in java backend

* Add multithreaded actor to tests, add warning messages to worker for multithreaded ray.get

* Fix test

* Some cleanups

* Improve error message

* Add assertion

* Cleanup, throw error in HandleTaskUnblocked if task not actually blocked

* lint

* Fix python worker reset

* Fix references to reconstruct_objects

* Linting

* java lint

* Fix java

* Fix iterator
2018-11-07 22:39:28 -08:00
Richard Liaw
0bab8ed95c
Expose internal config parameters for starting Ray (#3246)
## What do these changes do?

This PR exposes the CL option for using a config parameter. This is important for certain tests (i.e., FT tests that removing nodes) to run quickly.

Note that this is bad practice and should be replaced with GFLAGS or some equivalent as soon as possible.

#3239 depends on this.

TODO:
 - [x] Add documentation to method arguments before merging.
 - [x] Add test to verify this works?

## Related issue number
2018-11-07 21:46:02 -08:00
Eric Liang
43df405d07 [rllib] Add some debug logs during agent setup (#3247) 2018-11-07 14:54:28 -08:00
Richard Liaw
cf9e838326
[tune] Raise Error when overstepping (#3235) 2018-11-07 14:27:09 -08:00
Eric Liang
29e3362905
Better errors on process deaths (#3252) 2018-11-07 14:08:16 -08:00
Robert Nishihara
1dd5d92789 Enable timeline visualizations of object transfers. (#3255)
* Plot object transfers.

* Linting
2018-11-07 12:45:59 -08:00
Philipp Moritz
4182b85611 Cache resources in SchedulingQueue (#3232)
* cache resources

* fix

* documentation and remove old code

* fix PR

* update documentation

* linting
2018-11-06 21:23:31 -08:00
Eric Liang
2e04ffe00c Change dict serialization warning to debug (#3230) 2018-11-06 21:23:07 -08:00
Stephanie Wang
ca585703b2 Refactor ObjectDirectory to reduce and fix callback usage (#3227) 2018-11-06 20:33:10 -08:00
eugenevinitsky
344b4ef0ff [rllib] Fix filter sync for ES and ARS (#2918) 2018-11-06 19:09:34 -08:00