2084 commits

Author SHA1 Message Date
zhu-eric
65297e65f0 Experimental Actor Pool (#6055)
* mod_table

* Example fix for gallery

* lint

* nit

* nit

* fix

* gallery

* remove table for now

* training, object store, tune, actors, advanced

* start tf code

* first cut tf

* yapf

* pytorch

* add torch example

* torch

* parallel

* tune

* tuning

* reviewsready

* finetune

* fix

* move_code

* update conf

* compile

* init hyperparameter

* Start images

* overview

* extra

* fix

* works

* update-ps-example

* param_actor

* fix

* examples

* simple

* simplify_pong

* flake8 and run hyperopt

* add comments

* add comments

* add suggestion

* add suggestion

* suggestions

* add suggestion

* add suggestions

* fixed in wrong area

* last edit

* finish changes

* add line

* format

* reset

* tests and docs

* fix tests

* bazelify

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2019-12-26 14:35:10 -08:00
inventormc
0dd8a60679 [tune] Usability errors PBT (#5972)
* update with upstream master

* check for function args in hyperparam_mutations pbt

* fix style for pbt

* remove_checkpoint

* Update pbt.py

* Update pbt.py

* fix

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2019-12-26 14:27:07 -08:00
Zhijun Fu
d2bba596ab Fix actor reconstruction with direct call (#6570) 2019-12-26 10:59:50 +08:00
Yuhao Yang
be23b3ac41 [sgd] show training result for examples (#6552) 2019-12-26 02:15:43 +01:00
Yuhao Yang
df4533c649 [tune] demo exporting trained models in pbt examples (#6533) 2019-12-26 02:14:49 +01:00
Richard Liaw
93e8c85e72 [tune] Avoid duplication in TrialRunner execution (#6598)
* avoid_duplication

* Update python/ray/tune/ray_trial_executor.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2019-12-26 02:13:55 +01:00
Yuhao Yang
8707a721d9 [tune] update params for optimizer in reset_config (#6522)
* reset config update lr

* add default

* Update pbt_dcgan_mnist.py

* Update pbt_convnet_example.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2019-12-26 02:10:09 +01:00
Richard Liaw
aa7b861332 [minor][tune] Support Type Hinting for py3 (#6571)
* fullargspec for new pyversion

* fi
2019-12-25 08:15:33 +01:00
Robert Nishihara
f89d81896a Fix flaky test_gpu_ids test. (#6579) 2019-12-24 14:26:44 -08:00
Robert Nishihara
2f57391595 Fix bug when failing to import remote functions or actors with args and kwargs. (#6577) 2019-12-24 13:23:48 -08:00
Edward Oakes
6b1a57542e Add actor.__ray_kill__() to terminate actors immediately (#6523) 2019-12-23 23:12:57 -06:00
Yunzhi Zhang
bac6f3b61e [Dashboard] Collecting worker stats in node manager and implement webui display in the backend (#6574) 2019-12-22 17:50:23 -08:00
mehrdadn
50fb26de68 Fix FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result. (#6568) 2019-12-22 13:02:34 -08:00
Chaokun Yang
7bbfa85c66 [Streaming] Streaming data transfer java (#6474) 2019-12-22 10:56:05 +08:00
Edward Oakes
e50aa99be1 Reference counting for direct call submitted tasks (#6514)
Co-authored-by: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>
2019-12-20 17:06:33 -08:00
Eric Liang
de22cdb233 Reduce reporter CPU (#6553)
* wip

* remove

* Update ray_constants.py
2019-12-19 22:21:30 -08:00
Eric Liang
e556b729c2 [direct call] Fix max_calls interaction with background tasks. (#6536) 2019-12-19 13:48:32 -08:00
Edward Oakes
41fa2e9604 Remove object id translation (#6531) 2019-12-19 12:47:49 -08:00
Simon Mo
d807d0bab6 Serve small fixes (#6539)
* Tmp db

* Lint

* Turn on direct call for serve tests
2019-12-18 23:08:59 -08:00
alindkhare
d78a1062db [Serve] Pluggable Queueing Policy (#6492) 2019-12-18 21:28:38 -08:00
Yunzhi Zhang
c507859a83 [Dashboard] Node resource display fix (#6521) 2019-12-18 12:07:37 -08:00
Simon Mo
26ec500ef9 Implement async get for direct actor call (#6339) 2019-12-18 11:50:21 -08:00
Ujval Misra
ec4f8d5311 [hotfix][tune] fix get_sync_client. (#6525) 2019-12-17 23:59:43 -08:00
Simon Mo
e530c37b0e Use localhost and set redis password by default (#6481) 2019-12-17 19:41:19 -08:00
Philipp Moritz
4d71ab83cf require packaging (#6517) 2019-12-17 12:01:14 -08:00
Ujval Misra
81197e47c7 [tune] Refactor syncer (#6496)
* Refactor syncer and log_sync.

* Fix documentation.

* Remove delete from api

* Rename to get_node_syncer
2019-12-17 05:25:16 -08:00
Yunzhi Zhang
166560e428 [Dashboard] displays resources row (#6516) 2019-12-17 01:05:57 -08:00
Eric Liang
1a1324d2a2 Bump version from 0.8.0.dev6 -> 0.9.0.dev (#6508) 2019-12-16 23:57:42 -08:00
Mitchell Stern
6cb34b699e Expose extra node info from raylet stats (#6511) 2019-12-16 18:22:37 -08:00
Yunzhi Zhang
ce1c9a87a7 Expand dashboard by default (#6505) 2019-12-16 17:17:29 -08:00
Mitchell Stern
b7d23405fe [Dashboard] Change default port from 8080 to 8265 (#6503)
* [Dashboard] Change default port from 8080 to 8265

* Revise order of imports in pip install setup command
2019-12-16 14:25:23 -08:00
Mitchell Stern
1531c21dbd [Dashboard] Add remaining features from old dashboard (#6489)
* [Dashboard] Add remaining features from old dashboard

* Fix linting errors

* Set cluster uptime statistic to N/A

* Use proper singular or plural words for workers column

* Ignore .js, .jsx, .ts, .tsx files in check-git-clang-format-output.sh

* Fix bash quote issue
2019-12-16 11:21:18 -08:00
Ujval Misra
e38b25edfb Fix duplicate progress output. (#6497) 2019-12-15 21:53:24 -08:00
Richard Liaw
5719a05757 [sgd] Add support for multi-model multi-optimizer training (#6317) 2019-12-15 15:19:45 -08:00
Philipp Moritz
f5d10eea0b [Projects] Refactor cluster specification (#6488) 2019-12-14 22:43:06 -08:00
Philipp Moritz
afae8406da Make sure numpy >= 1.16.0 is installed for fast pickling support (#6486)
* Make sure numpy >= 1.16.0 is installed

* Works for 1.15.4

* lint

* formatting

* update

* put check into the right place

* lint
2019-12-14 16:36:49 -08:00
Tim Gates
ac8f8143e7 Fix simple typo: verion -> version (#6485)
Closes #6484
2019-12-14 15:37:55 -08:00
Edward Oakes
e2b7459bfc Fix worker exit cleanup (#6450)
* working but ugly

* comments

* proper but hanging in grpc server destructor

* grpc server shutdown deadline

* fix disconnect

* lint

* shutdown_only in test

* replace shutdown
2019-12-13 16:52:50 -08:00
Yuhao Yang
ad4da17899 [Tune] Add example and tutorial for DCGAN (#6400) 2019-12-13 14:15:44 -08:00
Eric Liang
be5dd8eb5e Enable direct calls by default (#6367)
* wip

* add

* timeout fix

* const ref

* comments

* fix

* fix

* Move actor state into actor handle

* comments 2

* enable by default

* temp reorder

* some fixes

* add debug code

* tmp

* fix

* wip

* remove dbg

* fix compile

* fix

* fix check

* remove non direct tests

* Increment ref count before resolving value

* rename

* fix another bug

* tmp

* tmp

* Fix object pinning

* build change

* lint

* ActorManager

* tmp

* ActorManager

* fix test component failures

* Remove old code

* Remove unused

* fix

* fix

* fix resources

* fix advanced

* eric's diff

* blacklist

* blacklist

* cleanup

* annotate

* disable tests for now

* remove

* fix

* fix

* clean up verbosity

* fix test

* fix concurrency test

* Update .travis.yml

* Update .travis.yml

* Update .travis.yml

* split up analysis suite

* split up trial runner suite

* fix detached direct actors

* fix

* split up advanced test

* lint

* fix core worker test hang

* fix bad check fail which breaks test_cluster.py in tune

* fix some minor diffs in test_cluster

* less workers

* make less stressful

* split up test

* retry flaky tests

* remove old test flags

* fixes

* lint

* Update worker_pool.cc

* fix race

* fix

* fix bugs in node failure handling

* fix race condition

* fix bugs in node failure handling

* fix race condition

* nits

* fix test

* disable heartbeats

* disable heartbeats

* fix

* fix

* use worker id

* fix max fail

* debug exit

* fix merge, and apply [PATCH] fix concurrency test

* [patch] fix core worker test hang

* remove NotifyActorCreation, and return worker on completion of actor creation task

* remove actor died callback

* Update core_worker.cc

* lint

* use task manager

* fix merge

* fix deadlock

* wip

* merge conflicts

* fix

* better sysexit handling

* better sysexit handling

* better sysexit handling

* check id

* better debug

* task failed msg

* task failed msg

* retry failed tasks with delay

* retry failed tasks with delay

* clip deps

* fix

* fix core worker tests

* fix task manager test

* fix all tests

* cleanup

* set to 0 for direct tests

* dont check worker id for ownership rpc

* dont check worker id for ownership rpc

* debug messages

* add comment

* remove debug statements

* nit

* check worker id

* fix test

* owner

* fix tests
2019-12-13 13:58:04 -08:00
Richard Liaw
3754effafc Make setup-dev.py more resilient (#6467)
* fix_tests

* link_tests
2019-12-13 11:32:04 -08:00
Stephanie Wang
c57dcc82d1 Port actor creation to use direct calls (#6375) 2019-12-12 19:50:51 -08:00
Philipp Moritz
74b454c614 Fix overriding of params dictionary (#6445) 2019-12-12 19:15:13 -08:00
Eric Liang
5a5c94939f [direct call] Retry failed tasks with delay (#6453)
* retry failed tasks with delay

* set to 0 for direct tests
2019-12-12 17:12:38 -08:00
alindkhare
76e678d775 [Serve] Added deadline awareness (#6442)
* [Serve] Added deadline awareness

Added deadline awareness while enqueuing a query
Using Blist sorted-list implementation (ascending order) to get queries according to their specified deadlines. [buffer_queues]
Exposed slo_ms via handle/http request
Added slo example 
The queries in the example will be executed in almost the reverse of the order in which they are fired
Added slo pytest
Added check for slo_ms to not be negative
Included the changes suggested

* Linting Corrections

* Adding the code changes suggested by format.sh

* Added the suggested changes

Added justification for blist
Added blist in travis/ci/install-dependencies.sh

* Fixed linting issues

* Added blist to ray/doc/requirements-doc.txt
2019-12-11 16:41:54 -08:00
Robert Nishihara
240e8f5279 Fix error message when failing to start UI if grpcio not installed. (#6433) 2019-12-11 14:56:13 -08:00
Eric Liang
b3eb374817 [tune] Really disable retries by default 2019-12-11 13:12:28 -08:00
Edward Oakes
82f7dbc7a7 Increase TaskID size by 2 bytes, taken from JobID (#6425)
* Increase TaskID size by 2 bytes, taken from JobID

* comments

* check max job id

* fix doc

* fix local mode
2019-12-11 10:45:14 -08:00
Yuhao Yang
3db8faab0d [tune] fix log dir race condition (#6420) 2019-12-10 21:00:19 -08:00
Simon Mo
c61db84b8d Bump dev6->dev7 for two files not changed yet. (#6428) 2019-12-10 20:58:14 -08:00