9914 commits

Author SHA1 Message Date
matthewdeng
19eabd7a55
[train] remove default num_workers (#19518)
* [train] remove default num_workers

* fix tests
2021-10-19 13:53:23 -07:00
Akash Patel
7e10f6a876
add missing <limits> header for prometheus_cpp (#19108) 2021-10-19 13:33:31 -07:00
gjoliver
2bd7932830
Add a script to analyze python module dependencies using static analysis (#18965) 2021-10-19 13:33:02 -07:00
mwtian
098ff36faa
[CI] Remove config that disables Bazel test result cache (#18701) 2021-10-19 13:31:42 -07:00
Edward Oakes
a596d59863
[serve] Modify serve debugger example to use current APIs (#19513) 2021-10-19 13:21:56 -07:00
matthewdeng
56e46c3c23
[train] add callbacks package compatibility (#19519) 2021-10-19 12:56:49 -07:00
Kai Fricke
3e8587644b
[ci/release] wrap all release test pip github installs in quotation marks (#19521) 2021-10-19 20:55:02 +01:00
Edward Oakes
4645893a5f
Add prototype of ray.serve.pipeline (#19278) 2021-10-19 11:36:49 -07:00
xwjiang2010
a6f9c93db0
Revert "[Datasets] Add support for slicing Arrow blocks that contain tensor columns. (#19494)" (#19517)
This reverts commit ad03917b8f.
2021-10-19 11:35:04 -07:00
Duarte OC
5af6152e76
[Serve] [Doc] Update docs with import missing (#19469) 2021-10-19 11:23:50 -07:00
Tao He
1dde588702
[Dataset] Support dataset from a single dataframe/table. (#18205) 2021-10-19 10:27:43 -07:00
Alex Wu
a819e417ac
Revert "[Hotfix] Revert "[Workflow] workflow.delete"" (#19248)
* Revert "Revert "[Workflow] workflow.delete (#19178)" (#19247)"

This reverts commit b59317520d.

* fix

* .

* .

* .

* Revert "."

This reverts commit 423b9b8e7e83f07cb0942b04e568e37ea0c62ba8.

* .

* .

* done?

* 4real

Co-authored-by: Alex <alex@anyscale.com>
2021-10-19 09:47:56 -07:00
Gagandeep Singh
cc00ab74da
[Windows] Fix test_fair_queueing and test_wait_timing (#19456)
* modified timeout in test_fair_queueing

* bump bounds to pass the tests
2021-10-19 09:27:04 -07:00
mwtian
3260330e45
Disable clang-tidy until ergonomic issues are resolved (#19499)
Why are these changes needed?
Currently, clang-tidy does not run inside scripts/format.sh, and it can produce false-positive warnings. This change disables clang-tidy until those ergonomic issues are resolved.
2021-10-19 08:45:25 -07:00
architkulkarni
b8941338d3
[runtime env] Raise error when creating runtime env when ray[default] is not installed (#19491) 2021-10-19 09:16:04 -05:00
Jiajun Yao
805ce453dd
[Java] Remove auto-generated pom.xml files. (#19475) 2021-10-19 17:35:37 +08:00
matthewdeng
4674c78050
[Train] Rename Ray SGD v2 to Ray Train (#19436) 2021-10-18 22:27:46 -07:00
Guyang Song
46b4c7464d
runtime env eager install by default (#19449) 2021-10-19 11:31:14 +08:00
Clark Zinzow
ad03917b8f
[Datasets] Add support for slicing Arrow blocks that contain tensor columns. (#19494) 2021-10-18 20:07:06 -07:00
Simon Mo
6f2eb1f9fa
[Serve] Use ray core metrics for autoscaling (#19038) 2021-10-18 19:32:49 -07:00
Chen Shen
b38ebd368c
[Dataset][nightly-test] spend less money (#19488)
Reduce the epoch and ensure everything runs in the same datacenter.
2021-10-18 18:53:50 -07:00
Gagandeep Singh
0b82135d2d
Use 127.0.0.1 in win32 as node ip addr (#19362) 2021-10-18 15:51:15 -07:00
gjoliver
e9f66cc394
Reduce success criteria for a few learning tests. (#19484) 2021-10-18 15:44:38 -07:00
Ian Rodney
74db390d15
[Docker] Fix Rsync (#19020)
* rsync down

* Rsync up, but not delete

* test fixes

* Explicit rsync -e

* Better copy check

* quick comment

* Additional fix to rsync_up
2021-10-18 14:35:22 -07:00
Kai Fricke
6798bdbb5d
Revert "Revert "[RLlib](deps): Bump tensorflow from 2.5.0 to 2.6.0 in /python/requirements/rllib"" (#19352)
This reverts commit bde9e058da.
2021-10-18 22:29:16 +01:00
Simon Mo
a081579f68
[Dashboard] Fix gRPC GCS healthcheck thread (#19360) 2021-10-18 13:18:06 -07:00
Eric Liang
1bb2b1fc49
[hotfix] Pin pyspark dep to 3.1.2 2021-10-18 13:10:06 -07:00
Jiajun Yao
4d9585773f
[Release] Remove release process doc (#19312) 2021-10-18 11:24:03 -07:00
Yi Cheng
f47f69d31e
[nightly] Add decision_tree_autoscaling_20_runs to nightly test 2021-10-18 11:19:40 -07:00
Kai Fricke
ad94eb03c6
[ci/release] wrap pip github installs in quotation marks to prevent comment errors (#19464) 2021-10-18 18:55:56 +01:00
mwtian
9742abb749
[Debugging] Print Python stack trace in addition to C++ stack trace, when Python worker crashes (#19423)
Why are these changes needed?
Right now, the failure signal handler registered in the Python worker is skipped on crashes such as segfaults, because the C++ core worker overrides it and does not call the previously registered handler. This prevents the Python stack trace from being printed on crashes. The fix is to make the C++ failure signal handler call the handler previously registered in Python. For example, take the script below, which segfaults:

import ray
ray.init()

@ray.remote
def f():
    import ctypes
    ctypes.string_at(0)

ray.get(f.remote())
Ray currently only prints the following stack trace:

(pid=26693) *** SIGSEGV received at time=1634418743 ***
(pid=26693) PC: @     0x7fff203d9552  (unknown)  _platform_strlen
(pid=26693) [2021-10-16 14:12:23,331 E 26693 12194577] logging.cc:313: *** SIGSEGV received at time=1634418743 ***
(pid=26693) [2021-10-16 14:12:23,331 E 26693 12194577] logging.cc:313: PC: @     0x7fff203d9552  (unknown)  _platform_strlen
With this change, the Python stack trace is printed in addition to the stack trace above:

(pid=26693) Fatal Python error: Segmentation fault
(pid=26693)
(pid=26693) Stack (most recent call first):
(pid=26693)   File "/Users/mwtian/opt/anaconda3/envs/ray/lib/python3.7/ctypes/__init__.py", line 505 in string_at
(pid=26693)   File "stack.py", line 7 in f
(pid=26693)   File "/Users/mwtian/work/ray-project/ray/python/ray/worker.py", line 425 in main_loop
(pid=26693)   File "/Users/mwtian/work/ray-project/ray/python/ray/workers/default_worker.py", line 212 in <module>
This should make debugging crashes in the Python worker easier for users and Ray developers.

Also, try to initialize the symbolizer in the GCS, Raylet and core worker. This is a no-op on macOS and some Linux environments (e.g. Ray on Ubuntu 20.04 already produces symbolized stack traces), but should make Ray more likely to have symbolized stack traces on other platforms.
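
The handler-chaining idea described in this commit message can be sketched in pure Python. This is only an illustration of the pattern, not the C++ change made in the PR: the names `install_chained`, `first_handler`, `second_handler` and `calls` are hypothetical, and the sketch assumes a POSIX platform where `signal.SIGUSR1` exists and Python 3.8+ for `signal.raise_signal`.

```python
import signal

calls = []  # records the order in which handlers run (illustrative only)


def first_handler(signum, frame):
    # Stands in for the handler that was registered first
    # (the Python-side handler in the commit's scenario).
    calls.append("first")


def second_handler(signum, frame):
    # Stands in for the handler installed later
    # (the C++ core worker handler in the commit's scenario).
    calls.append("second")


def install_chained(signum, new_handler):
    """Install new_handler, but keep calling the previously registered one."""
    previous = signal.getsignal(signum)

    def chained(sig, frame):
        new_handler(sig, frame)
        # SIG_DFL and SIG_IGN are not callable, so only real handlers are chained.
        if callable(previous):
            previous(sig, frame)

    signal.signal(signum, chained)
    return previous


signal.signal(signal.SIGUSR1, first_handler)
install_chained(signal.SIGUSR1, second_handler)
signal.raise_signal(signal.SIGUSR1)
```

Without the `previous(sig, frame)` call, installing the second handler would silently replace the first one, which is the behavior the commit message describes as the cause of the missing Python stack trace.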
2021-10-18 09:05:08 -07:00
Kai Fricke
eee05505b1
[ci/release] Add separate timeout parameter for prepare commands (#19459) 2021-10-18 16:29:25 +01:00
Kai Fricke
57fe405120
[ci/release] Bump long running release test timeouts to 6 minutes (#19458) 2021-10-18 16:27:53 +01:00
Chen Shen
9dba5e0ead
[dataset][nightly-test] fix pipeline ingest test (#19437) 2021-10-18 11:31:24 +01:00
Kai Fricke
6c6639a0d7
[ci/release] hotfix for undefined local variable (#19460) 2021-10-18 11:28:33 +01:00
matthewdeng
caa42d753c
[release] pin modin>=0.11.0 due to ray.services being removed (#19446) 2021-10-18 11:23:05 +01:00
Kai Fricke
c10d434713
[release] Allow commit hashes instead of URLs, add bisection utility (#19398) 2021-10-18 10:44:29 +01:00
Guyang Song
c04fb62f1d
[C++ worker] set native library path for shared library search (#19376) 2021-10-18 16:03:49 +08:00
Qing Wang
1047914ee0
[Java] Skip javadoc when deploying. (#19428) 2021-10-17 15:21:13 +08:00
Hao Zhang
c96c2e9b5f
[Collective] Enhance the collective group GC a bit (#19402) 2021-10-15 18:47:54 -07:00
Yi Cheng
a3dc07b1ee
[core] Fix some legacy issues (#19392)
## Why are these changes needed?
There are some issues left from previous PRs.

- Put the gcs_actor_scheduler_mock_test back
- Add comment for named actor creation behavior
- Fix the comment for some flags. 

2021-10-15 18:06:01 -07:00
Chen Shen
a9c34d55e3
Throw if infinite (#19418) 2021-10-15 18:01:53 -07:00
Gagandeep Singh
d226cbf21a
Added StartupToken to identify a process at startup (#19014)
* Added StartupToken to identify a process at startup

* Applied linting formats

* Addressed reviews

* Fixing worker_pool_test

* Fixed worker_pool_test

* Applied linting formatting

* Added documentation for StartupToken

* Fixed linting

* Reordered initialisation of WorkerPool members

* Fixed Python docs

* Fixing bugs in cluster_mode_test

* Fixing Java tests

* Create and set shim process after verifying startup_token

* shim_process.GetId() -> worker_shim_pid

* Improvements in startup token and modifying java files

* update io_ray_runtime_RayNativeRuntime.h

* Fixed java tests by adding startup-token to conf

* Applied linting

* Increased arg count for startup_token

* Attempt to fix streaming tests

* Type correction

* applied linting

* Corrected index of startup token arg

* Modified, mock_worker.cc to accept startup tokens

* Applied linting

* Applied linting changes from CI

* Removed override from worker.h

* Applied linting from scripts/format.sh

* Addressed reviews and applied scripts/format.sh

* Applied linting script from ci/travis

* Removed unrequired methods from public scope

* Applied linting
2021-10-15 15:13:13 -07:00
Chen Shen
acfbf4c170
Fix from Dask bug in Datasets (#19409) 2021-10-15 15:04:52 -07:00
Gagandeep Singh
07064cddf9
Re-enabling tests from test_basic (#19384)
Why are these changes needed?
Related issue number
#19177

Quoting #19177 (comment) here,

The following tests fail when not skipped,

=================================== short test summary info ====================================
FAILED python\ray\tests\test_basic.py::test_user_setup_function - subprocess.CalledProcessErro...
FAILED python\ray\tests\test_basic.py::test_disable_cuda_devices - subprocess.CalledProcessErr...
FAILED python\ray\tests\test_basic.py::test_wait_timing - assert (1634209333.6099107 - 1634209...

Results (395.22s):
      36 passed
       3 failed
         - ray\tests/test_basic.py:197 test_user_setup_function
         - ray\tests/test_basic.py:220 test_disable_cuda_devices
         - ray\tests/test_basic.py:265 test_wait_timing
=================================== short test summary info ====================================
FAILED python\ray\tests\test_basic_3.py::test_fair_queueing - AssertionError: 23

Results (198.33s):
       1 failed
         - ray\tests/test_basic_3.py:169 test_fair_queueing
The following test passed when not skipped. Opening a PR to verify that.

def test_oversized_function(ray_start_shared_local_modes)
2021-10-15 14:02:57 -07:00
Kai Fricke
bb38c5cb1f
[tune] Fix result buffering case check (fixes bug introduced in #19140) (#19399) 2021-10-15 10:43:34 +01:00
Siyuan (Ryans) Zhuang
0d4b0ded27
[Serialization] Update cloudpickle to v2.0.0 (#19383)
* update cloudpickle to v2.0.0
2021-10-15 02:37:29 -07:00
Hao Zhang
4b92f34ada
[Collective] Remove an unnecessary cuda.stream.synchronize (#19400) 2021-10-14 21:33:59 -07:00
SangBin Cho
9bfe43198f
Use cleaner code for the map (#19386) 2021-10-14 21:18:42 -07:00
Matti Picus
f372bb07aa
Enable dashboard on Windows (#19319) 2021-10-14 14:42:22 -07:00