Commit graph

1612 commits

Author SHA1 Message Date
Alex Wu
d9c68fca5c
[Core] Logging improvements (#10625)
* other stuff

* lint

* .

* .

* lint

* comment

* lint

* .
2020-09-08 20:58:05 -07:00
SangBin Cho
b7040f1310
Revert "[Streaming] fix streaming ci (#9675)" (#10656)
This reverts commit 3645a05644.
2020-09-08 19:07:21 -07:00
SangBin Cho
dcb9e03fde
[Placement Group] Atomic Creation using 2 phase protocol part 2. (#10599)
* In progress.

* In Progress

* Basic done.

* Fix build issues.

* Addressed code review.

* Change the confusing test name.

* Fix comments.

* Addressed code review.
2020-09-08 13:11:11 -07:00
chaokunyang
bbfbc98a41
[Core] Allow users to specify the classpath and import path (#10560)
* move job resource path to job config

* job resource path support list

* job resource path support for python

* fix job_resource_path support

* fix worker command

* fix job config

* use jar file instead of parent path

* fix job resource path

* add test to test.sh

* lint

* Update java/runtime/src/main/resources/ray.default.conf

Co-authored-by: Kai Yang <kfstorm@outlook.com>

* fix testGetFunctionFromLocalResource

* lint

* fix rebase

* add jars in resource path to classloader

* add job_resource_path to worker

* add ray stop

* rename job_resource_path to resource_path

* fix resource_path

* refine resource_path comments

* rename job resource path to code search path

* Add instruction about starting a cross-language cluster

* fix ClassLoaderTest.java

* add code-search-path to RunManager

* refine comments for code-search-path

* rename resourcePath to codeSearchPath

* Update doc

* fix

* rename resourcePath to codeSearchPath

* update doc

* filter out empty path

* fix comments

* fix comments

* fix tests

* revert pom

* lint

* fix doc

* update doc

* Apply suggestions from code review

* lint

Co-authored-by: Kai Yang <kfstorm@outlook.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
2020-09-09 00:46:32 +08:00
chaokunyang
3645a05644
[Streaming] fix streaming ci (#9675)
2020-09-08 22:20:58 +08:00
Kai Yang
ca8792e4ff
[Java] Disable the multi-worker feature by default (#10593)
2020-09-08 13:10:46 +08:00
kisuke95
b7003839bd
[Core] Use core worker options to initialize (#10467)
* fix

* fix

* .
2020-09-07 16:36:43 -07:00
Stephanie Wang
4f02ad4ef9
[core] Disable GCS reconnect (#10579)
* Set default GCS retries to 1

* Fix cc test
2020-09-05 13:14:07 -07:00
Kai Yang
5f5160ead9
[Core] Multi-tenancy: Worker capping (#10500)
2020-09-04 20:34:06 +08:00
SangBin Cho
2a7f56e429
[Placement group] Fix Logging issues. (#10557)
2020-09-03 23:55:10 -07:00
chaokunyang
cf3875bd8c
[Java] add exitActor API for java (#10496)
2020-09-04 10:11:42 +08:00
Edward Oakes
ead30ca655
[Core] fix named actor bug (#10550)
2020-09-03 17:48:31 -07:00
Clark Zinzow
0c0b0d0a73
[Core] Added support for submission-time task names. (#10449)
* Added support for submission-time task names.

* Suggestions from code review: add missing consts

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Add num_returns arg to actor method options docstring example.

* Add process name line and proctitle assertion to submission-time task name section of advanced docs.

* Add submission-time task name --> proctitle test for Python worker.

* Added Python actor options tests for num_returns and name.

* Added Java test for submission-time task names.

* Add dashboard image to task name docs section.

* Move to fstrings.

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2020-09-03 11:45:24 -07:00
Edward Oakes
71274954d1
Remove unnecessary output when connecting to a cluster. (#10512)
2020-09-03 13:30:33 -05:00
Sven Mika
715ee8dfc9
[RLlib] Issue 10469: Callbacks should receive env idx ... (#10477)
2020-09-03 17:27:05 +02:00
SangBin Cho
dc7fe1a4c5
[Placement Group] Atomic Placement Group Part 1, Basic Structure. (#10482)
* Write a test.

* Basic structure done.

* Reduce flakiness of tests.

* Addressed code review.

* Skipping tests because it is flaky for now.

* Fix linting issues.

* Increase sleep time to see lint messages.

* Lint issue fixed.
2020-09-02 18:14:46 -07:00
chaokunyang
f10a5a40b0
[Java] Simplify ray cmd params (#10394)
2020-09-02 19:47:52 +08:00
Ian Rodney
283f4d1060
[docker] Use tmp paths for rsync and fix file_mounts on docker (#10368)
2020-09-01 13:14:35 -07:00
chaokunyang
d584a4e5c4
Fix java ci break (#10472)
2020-09-01 19:57:03 +08:00
chaokunyang
ba3bd6b225
Fix java ci break (#10470)
2020-09-01 19:33:23 +08:00
SangBin Cho
a0c7907d88
[Placement Group] Leasing context refactoring part 2 (#10413)
* In progress.

* Refactoring done, but still failing tests.

* Fix issues.

* Addressed code review.

* Addressed code review.
2020-08-31 15:54:34 -07:00
Gabriele Oliaro
05fe6dc278
Keeping pipelines full (#10225)
* requesting new workers only when pipelines to existing ones are full

* linting

* added unit testing & linting

* finished refactoring to consolidate all the fields that belong to a SchedulingKey into a single hashmap

* linting

* fixed bugs introduced by rebasing from new upstream master

* changes as part of the PR review process

* Fix typo in src/ray/core_worker/transport/direct_task_transport.cc

Co-authored-by: fangfengbin <869218239a@zju.edu.cn>

* Fixed comment in src/ray/core_worker/transport/direct_task_transport.cc

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>

* second revision, with linting. all tests are passing locally

* Renamed SafeToDeleteEntry method in SchedulingKeyEntry

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>

* all new revisions but the memory leak check. performed linting.

* added checks to make sure scheduling_key_entries does not leak memory

* linting. all checks passing locally

* edited CheckNoSchedulingKeyEntries function

* linting

* fixed build error on mac

* created public version of CheckNoSchedulingKeyEntries to acquire the lock

* linting

Co-authored-by: fangfengbin <869218239a@zju.edu.cn>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2020-08-30 18:49:25 -07:00
fyrestone
e9b046306a
[Dashboard] Dashboard basic modules (#10303)
* Improve reporter module

* Add test_node_physical_stats to test_reporter.py

* Add test_class_method_route_table to test_dashboard.py

* Add stats_collector module for dashboard

* Subscribe actor table data

* Add log module for dashboard

* Only enable test module in some test cases

* CI run all dashboard tests

* Reduce test timeout to 10s

* Use fstring

* Remove unused code

* Remove blank line

* Fix dashboard tests

* Fix asyncio.create_task not available in py36; Fix lint

* Add format_web_url to ray.test_utils

* Update dashboard/modules/reporter/reporter_head.py

Co-authored-by: Max Fitton <mfitton@berkeley.edu>

* Add DictChangeItem type for Dict change

* Refine logger.exception

* Refine GET /api/launch_profiling

* Remove disable_test_module fixture

* Fix test_basic may fail

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <mfitton@berkeley.edu>
2020-08-29 23:09:34 -07:00
Stephanie Wang
9a31166050
Option to disable profiling and task timeline (#10414)
2020-08-29 11:35:22 -07:00
Lixin Wei
eb66db3199
[Build] bug fixed for logging (#10364)
2020-08-28 09:17:08 -07:00
SangBin Cho
d206fbbc99
[Placement group] Scheduler map refactoring part 1. (#10381)
* In Progress

* done.

* Address code review.
2020-08-28 00:57:09 -07:00
SongGuyang
cb70864c04
[cpp worker] support cluster mode and object Put/Get works (#9682)
2020-08-28 13:53:36 +08:00
SangBin Cho
17f465d5c1
[Core] Improve raylet failure error msg (#10345)
* Improve error message.

* Lint.

* Addressed code review.
2020-08-27 12:53:18 -07:00
Clark Zinzow
0178d6318e
[Core] Expand job ID to 4 bytes by removing object flag bytes. (#10187)
2020-08-27 14:08:17 -05:00
Stephanie Wang
f75dfd60a3
[api] API deprecations and cleanups for 1.0 (internal_config and Checkpointable actor) (#10333)
* remove

* internal config updates, remove Checkpointable

* Lower object timeout default

* remove json

* Fix flaky test

* Fix unit test
2020-08-27 10:19:53 -07:00
Edward Oakes
60665fc936
Clean up task dependency and scheduler metrics (#10340)
2020-08-26 22:56:03 -05:00
Lixin Wei
4b856fa416
[Core]Async updating issue fixed for actor's num_restart (#10176)
* bug fixed for num_restart updating

* add log

* log updated

* lint

* fixed

* Update src/ray/gcs/gcs_server/gcs_actor_manager.cc

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>

* bug fixed

* bug fixed

* test passed

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2020-08-26 11:49:26 -07:00
Edward Oakes
c35ad8237d
[metrics] Clean up object manager stats (#10316)
2020-08-26 13:43:06 -05:00
Edward Oakes
916a19363f
Clean up actor metrics (#10317)
2020-08-26 10:21:15 -05:00
Edward Oakes
cbd9632f3a
Fix wait timeout logic (#10199)
2020-08-25 22:41:39 -05:00
fyrestone
08adbb371f
Cross language exception (#10023)
2020-08-26 10:46:05 +08:00
Edward Oakes
1e99b814f0
Remove unused scheduler states (#10318)
* remove unused state

* remove unused states
2020-08-25 18:56:21 -07:00
Stephanie Wang
d4537ac1ce
[core] Try to schedule tasks locally before spilling over to remote nodes (#10302)
* Regression test

* Spillback

* Remove check for actor tasks
2020-08-25 15:01:59 -07:00
kisuke95
24a7a8a04d
[Streaming] Build fix (#10233)
2020-08-25 11:37:21 -07:00
fyrestone
05c103af94
[Dashboard] Start the new dashboard (#10131)
* Use new dashboard if environment var RAY_USE_NEW_DASHBOARD exists; new dashboard startup

* Make fake client/build/static directory for dashboard

* Add test_dashboard.py for new dashboard

* Travis CI enable new dashboard test

* Update new dashboard

* Agent manager service

* Add agent manager

* Register agent to agent manager

* Add a new line to the end of agent_manager.cc

* Fix merge; Fix lint

* Update dashboard/agent.py

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Update dashboard/head.py

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Fix bug

* Add tests for dashboard

* Fix

* Remove const from Process::Kill() & Fix bugs

* Revert error check of execute_after

* Raise exception from DashboardAgent.run

* Add more tests.

* Fix compile on Linux

* Use dict comprehension instead of dict(generator)

* Fix lint

* Fix windows compile

* Fix lint

* Test Windows CI

* Revert "Test Windows CI"

This reverts commit 945e01051ec95cff5fcc1c0bc37045b46e7ad9a6.

* Fix ParseWindowsCommandLine bug

* Update src/ray/util/util.cc

Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
2020-08-24 13:24:23 -07:00
Kai Yang
07f6cb17e4
[Core] Multi-tenancy: Refine worker env variable passing (#10191)
* Resolve issues with environment variable handling

* fix

* fix warning

* lint

Co-authored-by: Mehrdad <noreply@github.com>
2020-08-24 09:04:22 -07:00
fangfengbin
b61a79efd7
[Placement Group]Fix SigSegv bug (#10262)
* fix SigSegv bug

* fix review comments

* fix ut bug

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-23 11:33:40 -07:00
Ian Rodney
32ed1a18b7
[hotfix] Fix lint in master (#10254)
2020-08-21 20:53:05 -07:00
Alex Wu
136c8ff19e
[NewScheduler] Pass test_basic.py (#10059)
* .

* .

* Cleanup

* .

* whoops

* Update src/ray/raylet/scheduling/cluster_task_manager.h

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/raylet/scheduling/cluster_task_manager.h

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>

* CR

* .

* .

* done

* .

* Unit tests

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2020-08-21 15:00:08 -07:00
Barak Michener
f03caa4532
rpc: Follow-up by sharing the core worker client pool within the core worker. (#10206)
* Share CoreWorkerClientPool

* Format
2020-08-21 11:01:22 -07:00
Stephanie Wang
85e57a7a98
[Object spilling] Look up the location of the primary raylet from the owner's metadata (#10197)
* Get the primary copy from the owner, python test, some node manager fixes

* fixes and todo

* update

* lint

* fix build
2020-08-20 14:46:59 -07:00
fangfengbin
a462ae2747
[Placement Group]Add strict spread strategy (#10174)
* support STRICT_SPREAD strategy

* fix review comments

* rebase master

* fix lint error

* fix lint error

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-20 10:18:58 -07:00
SangBin Cho
224933b5e4
[Placement Group] Remove API part 2 (#10215)
* Initial progress done.

* Fix mistake.

* Addressed code review.

* Fix cpp build issue.

* Addressed code review.
2020-08-20 09:50:13 -07:00
fangfengbin
9734dbca3e
[Placement Group]Reschedule bundles when the node of bundles is dead (#10021)
2020-08-19 13:24:42 -07:00
SangBin Cho
263df6163c
[Placement Group] Placement group remove api part 1 (#10063)
* Added basic rpc calls.

* fix issues.

* Fix the gcs server not getting request issue.

* In Progress.

* Basic logic done. Tests are required.

* In progress.

* In progress in refactoring context.

* Revert "In progress in refactoring context."

This reverts commit 38236256cf1306c60dd203e75d45ceb4509c8106.

* Working now.

* Python test works.

* Lint.

* Addressed code review.

* Addressed code review.

* Lint.

* Added unit tests.

* Done, but one of unit tests fail

* Addressed code review.

* Addressed the last code review.

* Fix the wrong test case.
2020-08-18 12:44:00 -07:00