1873 commits

Author SHA1 Message Date
Yunzhi Zhang
bac6f3b61e [Dashboard] Collecting worker stats in node manager and implement webui display in the backend (#6574) 2019-12-22 17:50:23 -08:00
mehrdadn
50fb26de68 Fix FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result. (#6568) 2019-12-22 13:02:34 -08:00
Chaokun Yang
7bbfa85c66 [Streaming] Streaming data transfer java (#6474) 2019-12-22 10:56:05 +08:00
Edward Oakes
e50aa99be1
Reference counting for direct call submitted tasks (#6514)
Co-authored-by: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>
2019-12-20 17:06:33 -08:00
Eric Liang
de22cdb233
Reduce reporter CPU (#6553)
* wip

* remove

* Update ray_constants.py
2019-12-19 22:21:30 -08:00
Eric Liang
e556b729c2
[direct call] Fix max_calls interaction with background tasks. (#6536) 2019-12-19 13:48:32 -08:00
Edward Oakes
41fa2e9604 Remove object id translation (#6531) 2019-12-19 12:47:49 -08:00
Simon Mo
d807d0bab6
Serve small fixes (#6539)
* Tmp db

* Lint

* Turn on direct call for serve tests
2019-12-18 23:08:59 -08:00
alindkhare
d78a1062db [Serve] Pluggable Queueing Policy (#6492) 2019-12-18 21:28:38 -08:00
Yunzhi Zhang
c507859a83 [Dashboard] Node resource display fix (#6521) 2019-12-18 12:07:37 -08:00
Simon Mo
26ec500ef9
Implement async get for direct actor call (#6339) 2019-12-18 11:50:21 -08:00
Ujval Misra
ec4f8d5311 [hotfix][tune] fix get_sync_client. (#6525) 2019-12-17 23:59:43 -08:00
Simon Mo
e530c37b0e
Use localhost and set redis password by default (#6481) 2019-12-17 19:41:19 -08:00
Philipp Moritz
4d71ab83cf require packaging (#6517) 2019-12-17 12:01:14 -08:00
Ujval Misra
81197e47c7 [tune] Refactor syncer (#6496)
* Refactor syncer and log_sync.

* Fix documentation.

* Remove delete from api

* Rename to get_node_syncer
2019-12-17 05:25:16 -08:00
Yunzhi Zhang
166560e428 [Dashboard] displays resources row (#6516) 2019-12-17 01:05:57 -08:00
Eric Liang
1a1324d2a2
Bump version from 0.8.0.dev6 -> 0.9.0.dev (#6508) 2019-12-16 23:57:42 -08:00
Mitchell Stern
6cb34b699e Expose extra node info from raylet stats (#6511) 2019-12-16 18:22:37 -08:00
Yunzhi Zhang
ce1c9a87a7 Expand dashboard by default (#6505) 2019-12-16 17:17:29 -08:00
Mitchell Stern
b7d23405fe [Dashboard] Change default port from 8080 to 8265 (#6503)
* [Dashboard] Change default port from 8080 to 8265

* Revise order of imports in pip install setup command
2019-12-16 14:25:23 -08:00
Mitchell Stern
1531c21dbd [Dashboard] Add remaining features from old dashboard (#6489)
* [Dashboard] Add remaining features from old dashboard

* Fix linting errors

* Set cluster uptime statistic to N/A

* Use proper singular or plural words for workers column

* Ignore .js, .jsx, .ts, .tsx files in check-git-clang-format-output.sh

* Fix bash quote issue
2019-12-16 11:21:18 -08:00
Ujval Misra
e38b25edfb Fix duplicate progress output. (#6497) 2019-12-15 21:53:24 -08:00
Richard Liaw
5719a05757
[sgd] Add support for multi-model multi-optimizer training (#6317) 2019-12-15 15:19:45 -08:00
Philipp Moritz
f5d10eea0b
[Projects] Refactor cluster specification (#6488) 2019-12-14 22:43:06 -08:00
Philipp Moritz
afae8406da Make sure numpy >= 1.16.0 is installed for fast pickling support (#6486)
* Make sure numpy >= 1.16.0 is installed

* Works for 1.15.4

* lint

* formatting

* update

* put check into the right place

* lint
2019-12-14 16:36:49 -08:00
Tim Gates
ac8f8143e7 Fix simple typo: verion -> version (#6485)
Closes #6484
2019-12-14 15:37:55 -08:00
Edward Oakes
e2b7459bfc
Fix worker exit cleanup (#6450)
* working but ugly

* comments

* proper but hanging in grpc server destructor

* grpc server shutdown deadline

* fix disconnect

* lint

* shutdown_only in test

* replace shutdown
2019-12-13 16:52:50 -08:00
Yuhao Yang
ad4da17899 [Tune] Add example and tutorial for DCGAN (#6400) 2019-12-13 14:15:44 -08:00
Eric Liang
be5dd8eb5e
Enable direct calls by default (#6367)
* wip

* add

* timeout fix

* const ref

* comments

* fix

* fix

* Move actor state into actor handle

* comments 2

* enable by default

* temp reorder

* some fixes

* add debug code

* tmp

* fix

* wip

* remove dbg

* fix compile

* fix

* fix check

* remove non direct tests

* Increment ref count before resolving value

* rename

* fix another bug

* tmp

* tmp

* Fix object pinning

* build change

* lint

* ActorManager

* tmp

* ActorManager

* fix test component failures

* Remove old code

* Remove unused

* fix

* fix

* fix resources

* fix advanced

* eric's diff

* blacklist

* blacklist

* cleanup

* annotate

* disable tests for now

* remove

* fix

* fix

* clean up verbosity

* fix test

* fix concurrency test

* Update .travis.yml

* Update .travis.yml

* Update .travis.yml

* split up analysis suite

* split up trial runner suite

* fix detached direct actors

* fix

* split up advanced test

* lint

* fix core worker test hang

* fix bad check fail which breaks test_cluster.py in tune

* fix some minor diffs in test_cluster

* less workers

* make less stressful

* split up test

* retry flaky tests

* remove old test flags

* fixes

* lint

* Update worker_pool.cc

* fix race

* fix

* fix bugs in node failure handling

* fix race condition

* fix bugs in node failure handling

* fix race condition

* nits

* fix test

* disable heartbeats

* disable heartbeats

* fix

* fix

* use worker id

* fix max fail

* debug exit

* fix merge, and apply [PATCH] fix concurrency test

* [patch] fix core worker test hang

* remove NotifyActorCreation, and return worker on completion of actor creation task

* remove actor died callback

* Update core_worker.cc

* lint

* use task manager

* fix merge

* fix deadlock

* wip

* merge conflicts

* fix

* better sysexit handling

* better sysexit handling

* better sysexit handling

* check id

* better debug

* task failed msg

* task failed msg

* retry failed tasks with delay

* retry failed tasks with delay

* clip deps

* fix

* fix core worker tests

* fix task manager test

* fix all tests

* cleanup

* set to 0 for direct tests

* dont check worker id for ownership rpc

* dont check worker id for ownership rpc

* debug messages

* add comment

* remove debug statements

* nit

* check worker id

* fix test

* owner

* fix tests
2019-12-13 13:58:04 -08:00
Richard Liaw
3754effafc
Make setup-dev.py more resilient (#6467)
* fix_tests

* link_tests
2019-12-13 11:32:04 -08:00
Stephanie Wang
c57dcc82d1 Port actor creation to use direct calls (#6375) 2019-12-12 19:50:51 -08:00
Philipp Moritz
74b454c614
Fix overriding of params dictionary (#6445) 2019-12-12 19:15:13 -08:00
Eric Liang
5a5c94939f
[direct call] Retry failed tasks with delay (#6453)
* retry failed tasks with delay

* set to 0 for direct tests
2019-12-12 17:12:38 -08:00
alindkhare
76e678d775 [Serve] Added deadline awareness (#6442)
* [Serve] Added deadline awareness

Added deadline awareness while enqueuing a query
Using Blist sorted-list implementation (ascending order) to get queries according to their specified deadlines. [buffer_queues]
Exposed slo_ms via handle/http request
Added slo example 
The queries in example will be executed in almost the opposite order of which they are fired
Added slo pytest
Added check for slo_ms to not be negative
Included the changes suggested

* Linting Corrections

* Adding the code changes suggested by format.sh

* Added the suggested changes

Added justification for blist
Added blist in travis/ci/install-dependencies.sh

* Fixed linting issues

* Added blist to ray/doc/requirements-doc.txt
2019-12-11 16:41:54 -08:00
Robert Nishihara
240e8f5279 Fix error message when failing to start UI if grpcio not installed. (#6433) 2019-12-11 14:56:13 -08:00
Eric Liang
b3eb374817
[tune] Really disable retries by default 2019-12-11 13:12:28 -08:00
Edward Oakes
82f7dbc7a7
Increase TaskID size by 2 bytes, taken from JobID (#6425)
* Increase TaskID size by 2 bytes, taken from JobID

* comments

* check max job id

* fix doc

* fix local mode
2019-12-11 10:45:14 -08:00
Yuhao Yang
3db8faab0d [tune] fix log dir race condition (#6420) 2019-12-10 21:00:19 -08:00
Simon Mo
c61db84b8d Bump dev6->dev7 for two files not changed yet. (#6428) 2019-12-10 20:58:14 -08:00
Edward Oakes
044527adb8
Remove ref counting dependencies on ray.get() (#6412)
* Remove ref counting dependencies on Get()

* comment

* don't send IDs when disabled

* pass through internal config

* fix

* allow reinit

* remove flag
2019-12-10 18:11:34 -08:00
Ujval Misra
4e1d1ed00d [tune] Report trials by state fairly (#6395)
* Fairly represented trial states.

* filter test

* Indent

* Add test to BUILD

* Address Eric's comments (show truncation by state).

* Sort trials, only show 20.

* Fix lint
2019-12-10 14:56:54 -08:00
Philipp Moritz
16be483af7
[Projects] Return parameters for a command (#6409) 2019-12-10 10:25:01 -08:00
Chaokun Yang
6272907a57 [Streaming] Streaming data transfer and python integration (#6185) 2019-12-10 20:33:24 +08:00
Rong Rong
c1d4ab8bb4 Move top level RayletClient to ray::raylet::RayletClient (#6404) 2019-12-09 21:08:59 -08:00
Eric Liang
304b4f0d3d
Shard unit tests into medium sized files for test stability (#6398) 2019-12-09 13:15:29 -08:00
Eric Liang
a6bc2b1842
Misc direct call fixes from unit tests (#6394) 2019-12-08 19:34:02 -08:00
visatish
e2ba8c1898 [tune] Fixed bug in PBT where initial trial result is empty. (#6351)
* Fixed bug in tune pbt where initial result is empty.

* Updated mock trial executor in test suite.

* Added comment.
2019-12-06 15:30:27 -08:00
Zhijun Fu
b88b8202cc fix java build failure (#6062) 2019-12-06 14:38:43 +08:00
Ion
1c638a11a7 Refactor helper methods for new scheduler integration (#6354) 2019-12-05 18:49:25 -08:00
Edward Oakes
f63b64310a
Bump version to 0.8.0.dev7 (#6303) 2019-12-05 18:33:54 -08:00