Commit graph

1536 commits

Author SHA1 Message Date
Philipp Moritz
7193107f32 fix build on macOS (#1687) 2018-03-08 23:23:21 -08:00
Philipp Moritz
5ef0892236 Compile boost from source to fix macOS wheels (#1688) 2018-03-08 23:22:23 -08:00
Alexey Tumanov
91464a56dd [XRay] Raylet node and object manager unification/backend redesign. (#1640)
* directory for raylet

* some initial class scaffolding -- in progress

* node_manager build code and test stub files.

* class scaffolding for resources, workers, and the worker pool

* Node manager server loop

* raylet policy and queue - wip checkpoint

* fix dependencies

* add gen_nm_fbs as target.

* object manager build, stub, and test code.

* Start integrating WorkerPool into node manager

* fix build on mac

* tmp

* adding LsResources boilerplate

* add/build Task spec boilerplate

* checkpoint ActorInformation and LsQueue

* Worker pool maintains started and removed workers

* todos for e2e task assignment

* fix build on mac

* build/add lsqueue interface

* channel resource config through from NodeServer to LsResources; prep LsResources to replace/provide worker_pool

* progress on LsResources class: resource availability check implementation

* Read task submission messages from a client

* Submit tasks from the client to the local scheduler

* Assign a task to a worker from the WorkerPool

* change the way node_manager is built to prevent build issues for object_manager.

* add namespaces. fix build.

* Move ClientConnection message handling into server, remove reference to WorkerPool

* Add raw constructors for TaskSpecification

* Define TaskArgument by reference and by value

* Flatbuffer serialization for TaskSpec

* expand resource implementation

* Start integrating TaskExecutionSpecification into Task

* Separate WorkerPool from LsResources, give ownership to NodeServer

* checkpoint queue and resource code

* resolving merge conflicts

* lspolicy::schedule ; adding lsqueue and lspolicy to the nodeserver

* Implement LsQueue RemoveTasks and QueueReadyTasks

* Fill in some LsQueue code for assigning a task

* added support for test_asio

* Implement LsQueue queue tasks methods, queue running tasks

* calling into policy from nodeserver; adding cluster resource map

* Feedback and Testing.
Incorporate Alexey's feedback. Actually test some code. Clean up callback imp.

* end to end task assignment

* Decouple local scheduler from node server

* move TODO

* Move local scheduler to separate file

* Add scaffolding for reconstruction policy, task dependency manager, and object manager

* fix

* asio for store client notifications.
added asio for plasma store connection.
added tests for store notifications.
encapsulate store interaction under store_messenger.

* Move Worker inside of ClientConnection

* Set the assigned task ID in the worker

* Several changes toward object manager implementation.
Store client integration with asio.
Complete OM/OD scaffolding.

* simple simulator to estimate number of retry timeouts

* changing dbclientid --> clientid

* fix build (include sandbox after it's fixed).

* changes to object manager, adding lambdas to the interface

* changing void * callbacks to std::function typed callbacks

* remove use namespace std from .h files.
use ray:: for Status everywhere.

* minor

* lineage cache interfaces

* TODO for object IDs

* Interface for the GCS client table

* Revert "Set the assigned task ID in the worker"

This reverts commit a770dd31048a289ef431c56d64e491fa7f9b2737.

* Revert "Move Worker inside of ClientConnection"

This reverts commit dfaa0d662a76976c05be6d76b214b45d88482818.

* OD/OM: ray::Status

* mock gcs integration.

* gcs mock clientinfo assignment

* Allow lookup of a Worker in the WorkerPool

* Split out Worker and ClientConnection source files

* Allow assignment of a task ID to a worker, skeleton for finishing a task

* integrate mock gcs with om tests.

* added tcp connection acceptor

* integrated OM with NM.
integrated GcsClient with NM.
Added multi-node integration tests.

* OM to receive incoming tcp connections.

* implemented object manager connection protocol.

* Added todos.

* slight adjustment to add/remove handler invocation on object store client.

* Simplify Task interface for getting dependencies

* Remove unused object manager file

* TaskDependencyManager tracks missing task dependencies and processes object add notifications

* Local scheduler queues tasks according to argument availability

* Fill in TaskSpecification methods to get arguments

* Implemented push.

* Queue tasks that have been scheduled but that are waiting for a worker

* Pull + mock gcs cleanup.

* OD/OM/GCS mock code review, fixing unused-result issues, eliminating copy ctor

* Remove unique_ptr from object_store_client

* Fix object manager Push memory error

* Pull task arguments in task dependency manager

* Add a demo script for remote task dependencies

* Some comments for the TaskDependencyManager

* code cleanup; builds on mac

* Make ClientConnection a templated type based on the connection protocol

* Add gmock to build

* Add WorkerPool unit tests

* clean up.

* clean up connection code.

* instantiate a template instance in the module

* Virtual destructors

* Document public api.

* Separate read and write buffers in ClientConnection; documentation

* Remove ObjectDirectory from NodeServer constructor, make directory InitGcs call a separate constructor

* Convert NodeServer Terminate to a destructor

* NodeServer documentation

* WorkerPool documentation

* TaskDependencyManager doc

* unifying naming conventions

* unifying naming conventions

* Task cleanup and documentation

* unifying naming conventions

* unifying naming conventions

* code cleanup and naming conventions

* code cleanup

* Rename om --> object_manager

* Merge with master

* SchedulingQueue doc

* Docs and implementation skeleton for ClientTable

* Node manager documentation

* ReconstructionPolicy doc

* Replace std::bind with lambda in TaskDependencyManager

* lineage cache doc

* Use \param style for doc

* documentation for scheduling policy and resources

* minor code cleanup

* SchedulingResources class documentation + code cleanup

* referencing ray/raylet directory; doxygen documentation

* updating trivial policy

* Fix bug where event loop stops after task submission

* Define entry point for ClientManager for handling new connections

* Node manager to node manager protocol, heartbeat protocol

* Fix flatbuffer

* Fix GCS flatbuffer naming conflict

* client connection moved to common dir.

* rename based on feedback.

* Added google style and 90 char lines clang-format file under src/ray.

* const ref ClientID.

* Incorporated feedback from PR.

* raylet: includes and namespaces

* raylet/om/gcs logging/using

* doxygen style

* camel casing, comments, other style; DBClientID -> ClientID

* object_manager : naming, defines, style

* consistent caps and naming; misc style

* cleaning up client connection + other stylistic fixes

* cmath, std::nan

* more style polish: OM, Raylet, gcs tables

* removing sandbox (moved to ray-project/sandbox)

* raylet linting

* object manager linting

* gcs linting

* all other linting


Co-authored-by: Melih <elibol@gmail.com>
Co-authored-by: Stephanie <swang@cs.berkeley.edu>
2018-03-08 12:53:24 -08:00
Eric Liang
d85274a12e [docs] update to expose libraries + landing page (#1642) 2018-03-08 09:18:09 -08:00
Eric Liang
75e825177f [rllib] Move Ape-X metrics behind a debug flag and remove some of them (#1656) 2018-03-08 00:48:49 -08:00
Robert Nishihara
b0510ee461 Give error when actor is created before ray.init. (#1666) 2018-03-07 10:36:49 -08:00
Philipp Moritz
a9acfab3a6 Start chain replicated GCS with Ray (#1538) 2018-03-07 10:18:58 -08:00
James Lamb
6dbf4f6318 Remove vim from base-deps container and reduce number of build layers (#1667) 2018-03-07 10:16:08 -08:00
Rohan Singh
0abebb0975 [Dataframes] Implement .__len__(), .__contains__(), .first_valid_index(), and .last_valid_index() (#1664)
* added len, contains, first_valid_index, last_valid_index

* fixed contains test cases

* test files updated for PR
2018-03-06 23:56:11 -08:00
Devin Petersohn
4af42d5bb6 [DataFrame] Adding error checking for pandas version (#1662)
* Adding error checking for pandas version

* Addressing comments
2018-03-06 09:57:49 -08:00
Stephanie Wang
0a6edb55a8 Implement the Subscribe call for the new GCS API (#1652)
* Implement the Subscribe call for the new GCS API

* Document tests

* Upper case function name

* Fix build errors

* lint
2018-03-06 09:56:12 -08:00
butchcom
936bebef99 [rllib] Upgrade to OpenAI Gym 0.10.3 (#1601) 2018-03-06 00:31:02 -08:00
Richard Liaw
162d063f0d [autoscaler/tune] Optional YAML Fields + Fix Pretty Printing for Tune (#1541) 2018-03-04 23:35:58 -08:00
Richard Liaw
061e435411 [rllib] Fix eval.py -> rollout.py (#1650) 2018-03-04 14:59:16 -08:00
Philipp Moritz
a683cf2c70 Gcs Asio integration (#1633) 2018-03-04 14:51:04 -08:00
Richard Liaw
78716094b5 [tune] Async Hyperband (#1595) 2018-03-04 14:05:56 -08:00
Eric Liang
ecb811c26e [rllib] Ape-X implementation and DQN refactor to handle replay in policy optimizer (#1604)
* minimal apex checkin

* cleanup dqn options

* actor utils

* Sun Feb 25 17:39:54 PST 2018

* update

* compression refactor

* fix

* add test

* fix models

* Sun Feb 25 21:46:27 PST 2018

* Wed Feb 28 10:26:34 PST 2018

* Wed Feb 28 10:28:09 PST 2018

* Wed Feb 28 10:42:59 PST 2018

* refactor

* Wed Feb 28 11:17:19 PST 2018

* Wed Feb 28 11:42:08 PST 2018

* Wed Feb 28 11:42:13 PST 2018

* Wed Feb 28 11:59:02 PST 2018

* Wed Feb 28 11:59:58 PST 2018

* Wed Feb 28 12:00:08 PST 2018

* Wed Feb 28 12:02:19 PST 2018

* Wed Feb 28 13:44:31 PST 2018

* Wed Feb 28 17:01:20 PST 2018

* Sat Mar  3 14:55:59 PST 2018

* make optimizer construction explicit

* Sat Mar  3 18:23:08 PST 2018

* Sat Mar  3 18:24:28 PST 2018

* Sat Mar  3 18:49:28 PST 2018

* Sat Mar  3 18:50:42 PST 2018

* Sat Mar  3 18:56:10 PST 2018
2018-03-04 12:25:25 -08:00
Eric Liang
9b33f3a7b7 [autoscaler] Bad error message when dict field omitted (#1632)
* Wed Feb 28 23:22:55 PST 2018

* Wed Feb 28 23:24:07 PST 2018
2018-03-03 20:25:58 -08:00
Eric Liang
75293a0ba0 [rllib] Basic regression tests on CartPole (#1608)
* Sun Feb 25 21:36:22 PST 2018

* Sun Feb 25 21:42:09 PST 2018

* Sun Feb 25 21:44:30 PST 2018

* fix lint

* Wed Feb 28 12:41:49 PST 2018
2018-03-03 16:27:56 -08:00
Eric Liang
80d7def9dc [autoscaler] [tune] More doc fixes (#1560)
* Fri Feb 16 13:53:50 PST 2018

* Sat Feb 17 15:32:08 PST 2018

* Sat Feb 17 15:44:59 PST 2018

* fix

* Sun Feb 18 14:46:24 PST 2018

* Sun Feb 18 14:46:37 PST 2018

* Sun Feb 18 14:55:52 PST 2018

* Sun Feb 18 15:14:32 PST 2018

* Wed Feb 21 17:34:17 PST 2018

* Sun Feb 25 17:51:17 PST 2018

* Sun Feb 25 22:18:40 PST 2018

* Wed Feb 28 13:19:05 PST 2018

* Wed Feb 28 13:22:13 PST 2018

* Wed Feb 28 13:33:29 PST 2018

* Wed Feb 28 13:35:33 PST 2018

* add ex

* Fri Mar  2 12:50:17 PST 2018

* Fri Mar  2 12:54:31 PST 2018
2018-03-03 13:01:49 -08:00
Richard Liaw
96d7938fc4 [tune] Hyperband Max Iter Fix (#1620)
* nits

* cumul r

* docs

* min
2018-03-03 13:00:55 -08:00
Kunal Gosar
6685d4c446 fix tail and finish repr and str (#1628) 2018-03-02 15:26:54 -08:00
Zhenyu Guo
f1e5789c26 restructure how to organize 3rd party libs (#1630)
* restructure how to organize 3rd party libs

* Minor whitespace changes.

* Fix compilation on Linux.

* Pass around Python executable so that the correct version of Python is used.
2018-03-01 14:29:56 -08:00
Robert Nishihara
ec9dfe7748 Allow setting INCLUDE_UI=0 to disable building the UI. (#1618) 2018-03-01 02:17:15 -08:00
Robert Nishihara
1222d09224 Fix dataframe test linting and test. (#1629) 2018-02-28 15:21:49 -08:00
Robert Nishihara
0fcceef772 Update logging and check macros. (#1627)
* Update logging and check macros.

* Fix linting.

* Fix RAY_DCHECK and unused variable.

* Fix linting
2018-02-28 15:13:00 -08:00
Devin Petersohn
e7df293946 [DataFrames] Updating Error messages to encourage contribution. (#1623) 2018-02-27 21:44:33 -08:00
Kunal Gosar
4a15c2c65c [Dataframes] Call ray.init() on ray.dataframe import (#1626)
* ray.init on dataframe import

* wrapping ray.init in a try/except

* removing ray.init calls from test code

* resolving flake8
2018-02-27 16:11:23 -08:00
Kunal Gosar
34664dbf76 [DataFrame] Pass lengths to _default_index instead of df (#1621)
* Pass lengths to remote function over DataFrame

* Increasing performance by moving length to remote
2018-02-27 02:38:26 -08:00
Simon Mo
4ab16d7fb3 [DataFrame] Implement loc, iloc (#1612)
* Add parquet-cpp to gitignore

* Add read_csv and read_parquet

* Gitignore pytest_cache

* Fix flake8

* Add io to __init__

* Changing Index. Currently running tests, but so far untested.

* Removing issue of reassigning DF in from_pandas

* Fixing lint

* Fix bug

* Fix bug

* Fix bug

* Better performance

* Fixing index issue with sum

* Address comments

* Update io with index

* Updating performance and implementation. Adding tests

* Fixing off-by-1

* Fix lint

* Address Comments

* Make pop compatible with new to_pandas

* Format Code

* Cleanup some index issue

* Bug fix: assigned reset_index back

* Implement loc and iloc

* Revert whitespace

* Format code

* Address comments
2018-02-27 01:57:52 -08:00
Richard Liaw
b79597dc00 [rllib] PPO Thread Limit (#1568) 2018-02-26 22:22:05 -08:00
Kunal Gosar
f43328f332 moved _default_index to remote fn (#1617) 2018-02-26 21:12:04 -08:00
Kunal Gosar
48bd7b147d [DataFrame] Added Implementations for equals, query, and some other operations (#1610)
* Implemented Dataframe __abs__ and __iter__

* implemented __neg__

* implemented query

* Implemented equals

* Implemented __eq__ and __ne__ operators

* Added method level comments

* resolved flake8 comments

* resolving devin's comments
2018-02-26 18:31:00 -08:00
Simon Mo
d78a22f94c [DataFrame] Implement IO for ray_df (#1599)
* Add parquet-cpp to gitignore

* Add read_csv and read_parquet

* Gitignore pytest_cache

* Fix flake8

* Add io to __init__

* Changing Index. Currently running tests, but so far untested.

* Removing issue of reassigning DF in from_pandas

* Fixing lint

* Fix bug

* Fix bug

* Fix bug

* Better performance

* Fixing index issue with sum

* Address comments

* Update io with index

* Updating performance and implementation. Adding tests

* Fixing off-by-1

* Fix lint

* Address Comments

* Make pop compatible with new to_pandas

* Format Code

* Cleanup some index issue

* Bug fix: assigned reset_index back

* Remove unused debug line
2018-02-26 18:26:38 -08:00
Eric Liang
87e107edd8 [tune] Sync logs from workers and improve tensorboard reporting (#1567) 2018-02-26 11:35:51 -08:00
Richard Liaw
aefefcb0cd Upload wheels to 'latest' folder (#1606) 2018-02-26 10:26:38 -08:00
Devin Petersohn
1fa59f1887 [DataFrame] Adding insert, set_axis, set_index, reset_index and tests (#1603) 2018-02-26 08:58:15 -08:00
Richard Liaw
c2ad800cbf [rllib] Registry fix for DQN Replay Evaluators (#1593) 2018-02-25 22:30:11 -08:00
Robert Nishihara
ba1ce85f58 Download Redis and flatbuffers differently. (#1602)
* Download Redis differently.

* Get flatbuffers with curl
2018-02-25 20:32:33 -08:00
Robert Nishihara
00ef2f6285 Only pass in --allow-root if the user is root. (#1594) 2018-02-25 13:40:51 -08:00
Devin Petersohn
529397b35e [DataFrames] Updating Index implementation, performance improvements (#1598) 2018-02-25 13:39:28 -08:00
Richard Liaw
31fefa20b7 [tune] HyperBand Fixes (#1586) 2018-02-25 13:26:58 -08:00
Philipp Moritz
2026c147ec say which port is local and which one is remote (#1591) 2018-02-25 10:19:12 -08:00
Robert Nishihara
5859a2d249 Replace python setup.py install with pip install -e . (#1460) 2018-02-22 11:15:03 -08:00
Robert Nishihara
f4b1881fec Update arrow to use updated pandas serializer. (#1582) 2018-02-22 11:10:52 -08:00
Robert Nishihara
330159d8bd Allow setting redis shard ports through ray start (also object store memory). (#1581)
* Allow passing in --object-store-memory to ray start.

* Allow setting ports for the redis shards.

* Reorder arguments and infer number of shards from ports.

* Move code block into only the head node case.

* Add test.
2018-02-22 11:05:37 -08:00
Robert Nishihara
a3b44309dd Pass --allow-root into jupyter notebook. (#1571) 2018-02-21 22:42:19 -08:00
Robert Nishihara
6a8661a014 Reduce verbosity of log monitor. (#1569) 2018-02-21 22:40:16 -08:00
Richard Liaw
e62ad7007d [autoscaler] Improve UX for Autoscaler (#1558) 2018-02-21 22:19:04 -08:00
Devin Petersohn
de6fa02c85 [DataFrame] Fix transpose with nan values and add functionality needed for Index (#1545) 2018-02-21 08:46:37 -08:00