architkulkarni
9cb65d5e2f
[Core] Move wheel URL utils from test_utils to utils ( #16386 )
2021-06-23 13:41:02 -05:00
chenk008
82d92d0d61
[Core]Use worker shim PID to check worker registration ( #16398 )
2021-06-22 21:12:53 -07:00
Kai Fricke
a1765ac627
[tune] move to local parameter registry for tune.with_parameters() ( #16611 )
2021-06-22 17:58:11 -07:00
Chris K. W
b4f2cbce02
[Client] Disconnect on dataclient error ( #16588 )
...
* disconnect when main thread finds dataclient shut down, update error messages
* Add test_dataclient_disconnect to small tests
* drop unused var
* add __main__ section to test
* avoid direct ray import
* rerun
2021-06-22 16:46:10 +03:00
Tao Wang
d1db4744e3
[large scale]Get next job id from gcs instead of redis - python part ( #16528 )
2021-06-22 14:06:30 +08:00
Stephanie Wang
e7b752cf33
[core] Fix bug in task dependency management for duplicate args ( #16365 )
...
* Pytest
* Skip on windows
* C++
2021-06-21 22:32:04 -07:00
SangBin Cho
5efeb5334b
Revert "Same worker id in python and c++ ( #16568 )" ( #16600 )
...
This reverts commit 9b5c0c32da.
2021-06-21 18:58:31 -07:00
Ian Rodney
d3832ab2e1
[Client] Fix gRPC Timeout Options ( #16554 )
2021-06-21 14:25:41 -07:00
Alex Wu
9b5c0c32da
Same worker id in python and c++ ( #16568 )
...
* .
* .
* test
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-21 13:22:52 -07:00
Siyuan (Ryans) Zhuang
b7995f66a4
[Workflow] Sync mode fault tolerance ( #16282 )
2021-06-21 10:05:27 -07:00
Qinghao Hu
d922a79385
[sgd] DataParallel after Apex init. ( #15645 )
...
* [FIX] DataParallel after Apex init.
* lint
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-20 22:44:15 -07:00
lanlin
e5b50fcc9d
[tune] allow to read trial results from json files in Analysis ( #15915 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-20 20:41:48 -07:00
Dmitri Gekhtman
cb878b6514
[doc][kubernetes] K8s doc updates ( #16570 )
2021-06-20 19:38:34 -07:00
Eric Liang
a0da009645
Allocate inbound object chunks using CreateRequestQueue instead of immediate allocation ( #16523 )
2021-06-20 09:22:12 -07:00
Yorick van Zweeden
db7e2c8f21
Remove outdated code from PopulationBasedTrainingReplay ( #16564 )
...
Co-authored-by: Yorick van Zweeden <git@yorickvanzweeden.nl>
2021-06-20 15:22:52 +02:00
Amog Kamsetty
e6d9f0b393
[Dask] Support Dask 2021.06.1 ( #16547 )
2021-06-19 18:22:23 -07:00
Achal Shah
eadee8aba7
[docs] Update API docs for ray.init ( #16533 )
...
The incorrect indentation caused the docs render weirdly:
https://docs.ray.io/en/master/package-ref.html
2021-06-18 18:02:44 -07:00
Alex Wu
319d4fb164
Job timestamp should always be in milliseconds (fixed) ( #16548 )
...
* .
* Revert "Revert "Job timestamp should always be in milliseconds (#16455 )" (#16545 )"
This reverts commit 5030ed8588.
* .
* .
* .
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-18 17:07:21 -07:00
Amog Kamsetty
416cf3a2e7
Revert "Revert "Enable TryCreateImmediately to use the fallback allocation" ( #16542 )" ( #16544 )
...
This reverts commit 36fd741e6f.
2021-06-18 15:39:37 -07:00
Jiao
39cc81c633
[serve] Fix ray serve shutdown to properly go through controller ( #16524 )
2021-06-18 17:18:04 -05:00
architkulkarni
3ba1cb851e
[Core] [runtime env] Print message on driver when installing conda or pip ( #16516 )
2021-06-18 16:02:46 -05:00
Amog Kamsetty
e6fa8c0015
[Hotfix] [Dask] Fix Dask Pin ( #16552 )
...
* dask-pin-36
* fix
2021-06-18 13:31:50 -07:00
Amog Kamsetty
904232b4f8
[Dask] Pin dask version to 2021.06.0 ( #16546 )
2021-06-18 12:40:14 -07:00
Alex Wu
5030ed8588
Revert "Job timestamp should always be in milliseconds ( #16455 )" ( #16545 )
...
This reverts commit 1df19a04fe.
2021-06-18 12:37:05 -07:00
Amog Kamsetty
36fd741e6f
Revert "Enable TryCreateImmediately to use the fallback allocation" ( #16542 )
...
This reverts commit 41cf2e3d50.
2021-06-18 12:22:18 -07:00
Frank Luan
7588938e3c
Sorting benchmark ( #16327 )
...
* [WIP] Sorting benchmark
* Separate num_mappers and num_reducers
* Add tests
* Fix tests
* flake8
* flake8
* yapf
* Skip test on Windows
* Fix OS X test failure; test Windows again
* oops
2021-06-18 10:54:18 -07:00
Eric Liang
41cf2e3d50
Enable TryCreateImmediately to use the fallback allocation
2021-06-18 10:49:34 -07:00
architkulkarni
6498ca3995
[Core] [runtime env] Don't delete working_dir from runtime env ( #16475 )
2021-06-18 10:15:20 -05:00
Chris K. W
a2c842ee3c
[Client] Add separate error message if dataclient has disconnected before a request is sent ( #16508 )
...
* Add earlier error message
* Adjust error message
2021-06-18 08:06:25 -07:00
Kai Fricke
172d33be02
[tune] Use unbuffered training when checkpoint_at_end is used. ( #16504 )
2021-06-18 14:19:14 +01:00
Kai Fricke
e13f0a4d91
[tune] Add option to keep random values constant over grid search ( #16501 )
2021-06-18 11:30:27 +01:00
Chris K. W
c91a1b1f92
[Client] Add warnings when user schedules many tasks with ray client ( #16454 )
...
* Add warnings when user schedules many tasks with ray client
* add test_client_warnings to BUILD
* better variable names
* use util.debug.log_once()
* batching -> explanation of batching
* Switch to warnings.warn
* Add links to Ray Design Pattern doc with code snippets
* Cleaner linking and refer to sections directly
* Better testNoWarning
* add sys.exit(pytest.main(...))
* Update python/ray/util/client/worker.py
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
* Update python/ray/util/client/worker.py
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
* better error messages
* Switch links to new readthedocs sections
* Revert "Switch links to new readthedocs sections"
This reverts commit d3785bf50459d89fb3f13966a030e954799309a8.
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-06-18 13:17:37 +03:00
Alex Wu
6696c0c165
Revert "[Placement Group] Support infeasible placement groups for Placement Group. ( #16188 )" ( #16509 )
...
This reverts commit 7f91cfedd5.
2021-06-17 11:04:01 -07:00
architkulkarni
8d9a41af55
[Core] [runtime env] Merge actor/task's runtime env with JobConfig's runtime env ( #16378 )
2021-06-17 11:20:32 -05:00
Antoni Baum
f8e9f171df
[tune] Add add_evaluated_point method ( #16485 )
2021-06-17 11:30:48 +01:00
Kai Fricke
e547a27944
[tune] Track live trials in a set in the TrialRunner to reduce linear scans ( #15811 )
2021-06-17 01:36:07 -07:00
Alex Wu
1df19a04fe
Job timestamp should always be in milliseconds ( #16455 )
...
* .
* .
* .
* .
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-17 00:05:55 -07:00
DK.Pino
7f91cfedd5
[Placement Group] Support infeasible placement groups for Placement Group. ( #16188 )
...
* init
* update comment
* update logical
* ut failing
* compile passing
* add ut
* lint
* fix comment
* lint
* fix ut and typo
* fix ut and typo
* lint
* typo
2021-06-16 21:48:39 -07:00
Alex Wu
45357ff590
[core] Fix multi-node placement group/job config bugs ( #16345 )
...
* .
* .
* seems to work?
* seems to work?
* .
* implement delete
* implement delete
* .
* tests
* .
* .
* .
* fix
* .
* .
* .
* .
* fix
* fix
* bump timeout
* bump timeout
* .
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-16 21:12:20 -07:00
Dmitri Gekhtman
74bd332d88
Sidestep temp permissions issue when writing cluster address ( #16473 )
...
* not-enough
* comments
* tweak
* fix
* add-test-why-not
* fix
2021-06-16 21:11:08 -07:00
Maxim Egorushkin
742da5e68b
[tune] get_checkpoints_paths .tune_metadata file search bug fixed ( #16396 )
...
Co-authored-by: Maxim Egorushkin <maxim.egorushkin@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-16 20:48:41 -07:00
Eric Liang
3209084213
Fix fd reuse errors with plasma fallback allocation ( #16451 )
2021-06-16 19:28:23 -07:00
Amog Kamsetty
b986938f0f
Revert "[Pubsub] Use a pubsub module for Ownership based object directory ( #16407 )" ( #16486 )
...
This reverts commit 90599d3562.
2021-06-16 15:38:11 -07:00
Kai Fricke
9352cb781c
[release tests] Fix microbenchmark base image, network overhead cluster wait time, add long running tests ( #16355 )
2021-06-16 21:37:17 +01:00
Jiao
c6436ba7d6
[Serve] Add ray serve's logging context manager ( #16468 )
...
* Add ray serve's logging context manager
* Add ray serve's logging context manager
run formatting script scripts/format.sh
* fix missing package-lock json file
* linter
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-06-16 13:17:07 -07:00
Clark Zinzow
00eb833de2
[Core] Stopgap fix for async actor lost object bug, and adds reproduction as test. ( #16414 )
...
* Support asyncio with max_concurrency == 1.
* Added test that reproduces lost object error.
* Create a fiber thread per caller instead of sharing a fiber thread among all callers.
* Formatting.
* Remove debug print statement.
* Try to accommodate dumb stupid linter that apparently doesn't know that async list comprehensions landed in Python 3.6, let alone await in list literals.
2021-06-16 12:39:45 -07:00
SangBin Cho
5997d19a5a
[Test] Global gc unit test flakiness fix ( #16471 )
2021-06-16 09:26:04 -07:00
SangBin Cho
90599d3562
[Pubsub] Use a pubsub module for Ownership based object directory ( #16407 )
...
* in progress
* In progress 2
* progress
* OBOD pubsub done
* Fix
* Fix a bug.
* Clean up getObjectLocationOwner
* Fix a build issue.
* Lint issue.
* test fix in progress
* continue debugging
* in progress
* Fix issues again.
* Formatting
* formating
* fix issues.
* Revert "fix issues."
This reverts commit 2da577e68abc6278e03d64a60e8b96c3136145bf.
* Fix a critical bug.
* Revert "Revert "fix issues.""
This reverts commit 6546ecbd1eb9798de0bf990b30b85a3ca3e5b4ad.
* Addressed code review.
2021-06-16 09:15:13 -07:00
Ian Rodney
90805d302f
[Client] Fix ArgParse ( #16456 )
...
Co-authored-by: Ian Rodney <ilr@anyscale.com>
2021-06-15 23:52:02 -07:00
Antoni Baum
ec7d7c8630
[Tune] Add soft imports test ( #16450 )
2021-06-15 18:50:21 -07:00