Amog Kamsetty
7dbd0ff824
fix example ( #10964 )
2020-09-23 10:33:19 -07:00
SangBin Cho
390107b6cb
[Core] Allow to pass node ip address to gcs server. ( #10946 )
...
* Allow to pass node ip address to gcs server.
* Fix.
* Addressed code review.
* Fixed an error.
* Addressed code review.
2020-09-23 01:52:26 -07:00
Lee moon soo
df4c3abe30
[autoscaler] Staroid node provider ( #10956 )
2020-09-22 21:25:29 -07:00
Simon Mo
7fbe076813
[Serve] Unwrap Flask Request from ServeHandle ( #10845 )
2020-09-22 12:44:23 -07:00
Kai Yang
864d1d2b59
[Core] Multi-tenancy: Kill idle workers in FIFO order ( #10597 )
...
* Kill idle workers in FIFO order
* Update test
* minor update
* Address comments
* fix after merge
* fix worker_pool_test
2020-09-22 10:59:11 -07:00
architkulkarni
67c653c053
[Serve] Only install dataclasses on Python 3.6 ( #10936 )
2020-09-22 09:39:39 -07:00
Kai Fricke
2d16ab2e16
[tune] Remove unnecessary wandb group parameter ( #10950 )
2020-09-22 09:36:51 -07:00
Sumanth Ratna
770c3633f0
Update max_failures kwarg docstring ( #10953 )
2020-09-22 09:02:15 -07:00
SangBin Cho
9d205cda86
[Tests] Fix flaky test_failures by using semaphore instead of sleep ( #10942 )
...
* Use semaphore to avoid timing issues.
* Addressed code review.
2020-09-21 23:39:49 -07:00
Kai Fricke
6247740b94
[tune] sort running trials to top in status table ( #10926 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-09-21 13:39:51 -07:00
Kai Fricke
50d63b8077
[tune] update pt tutorial docs ( #10925 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-09-21 13:33:37 -07:00
Amog Kamsetty
d1d4743702
[Ray SGD] FP16 Hotfix ( #10931 )
2020-09-21 13:10:10 -07:00
Sumanth Ratna
9da7bdcc8e
Use master for links to docs in source ( #10866 )
2020-09-19 00:30:45 -07:00
Eric Liang
6a227ae501
[autoscaler] Split autoscaler interface public private ( #10898 )
2020-09-18 18:16:23 -07:00
Alex Wu
9a07c7b963
[1.0] Remove args from ray start ( #10659 )
2020-09-18 16:41:23 -07:00
Alex Wu
e56f2b8586
[autoscaler] hotfix calculate_node_resources ( #10874 )
2020-09-18 13:39:00 -07:00
architkulkarni
102c498653
[Doc] Fix RayServeHandle doc ( #10896 )
2020-09-18 10:29:21 -07:00
Kai Fricke
508cfa3540
[tune] Support `yield` and `return` statements ( #10857 )
...
* Support `yield` and `return` statements in Tune trainable functions
* Support anonymous metric with ``tune.report(value)``
* Raise on invalid return/yield value
* Fix end to end reporter test
2020-09-17 20:18:35 -07:00
SongGuyang
5cbc411e38
[cpp worker] support cluster mode ( #9977 )
2020-09-18 11:08:18 +08:00
Eric Liang
1b295a17cb
Add accelerator-type to multi node type example YAML ( #10871 )
2020-09-17 17:09:35 -07:00
Ian Rodney
a159ae72b3
[autoscaler] Ensure `run_init` happens in all cases ( #10820 )
...
* taking a stab at the horror that is NodeUpdaterThread
* added tests
2020-09-17 11:23:56 -07:00
Ian Rodney
47d7d83b6f
[docker] Fix GPU support for tensorflow ( #10779 )
2020-09-17 10:56:58 -07:00
fyrestone
269e1f0b98
Fix push_error_to_driver_through_redis ( #10848 )
...
Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-09-17 10:50:44 -07:00
Keqiu Hu
8a77cf925a
[cli][ray] update ray cli message ( #10823 )
2020-09-17 09:26:55 -07:00
Kai Fricke
ee99c919e3
[tune] lazy trials ( #10802 )
...
* Lazily fill trial queue
* Update interface
* Update end to end reporter test
* Removed `next_trials()` method
* Lint
* Print total number of samples to be generated in progress reporter. Allow infinite samples.
* Nit check
2020-09-17 08:51:46 -07:00
SangBin Cho
fe4c6ab778
[Core] Remove unused credis related code. ( #10849 )
...
* Done.
* Lint.
2020-09-16 23:34:54 -07:00
Alex Wu
6f479d4697
[hotfix] CPU Detection ( #10821 )
2020-09-16 21:02:52 -07:00
Richard Liaw
d3feb83053
[tune] check for running session ( #10840 )
2020-09-16 18:55:11 -07:00
Ameer Haj Ali
54c616e9f3
[autoscaler] Enforce CommandRunnerInterface abstractions ( #10822 )
...
Co-authored-by: root <root@ip-172-31-28-155.us-west-2.compute.internal>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
2020-09-16 16:56:03 -07:00
SangBin Cho
1fdb7ef6c3
[docs] Placement group documentation ( #10555 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-09-16 16:07:55 -07:00
Alex Wu
9d0a9e73de
Add java yaml example ( #10835 )
...
* java example
* Update python/ray/autoscaler/aws/example-java.yaml
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-09-16 14:35:10 -07:00
SangBin Cho
d7d4e1c87b
[Doc] Document options method ( #10830 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-09-16 14:23:24 -07:00
fyrestone
50784e2496
[Dashboard] Dashboard node grouping ( #10528 )
...
* Add RAY_NODE_ID environment var to agent
* Node related data use node id as key
* ray.init() return node id; Pass test_reporter.py
* Fix lint & CI
* Fix comments
* Minor fixes
* Fix CI
* Add const to ClientID in AgentManager::Options
* Use fstring
* Add comments
* Fix lint
* Add test_multi_nodes_info
Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-09-16 10:17:29 -07:00
Tao Wang
eb891e6c94
[TEST]make retry counts ( #10463 )
...
* make retry counts
* lint
* use function
* move method inside
2020-09-16 10:12:30 -07:00
Kai Fricke
c9fafe7733
[tune] added type hints ( #10806 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-09-15 21:03:56 -07:00
Yiran Wang
c198869721
[Autoscaler] Change poll interval to 5 sec when checking VMs status ( #10462 )
2020-09-15 17:23:29 -07:00
Amog Kamsetty
d5a7c53908
[Ray SGD] use_local flag + Worker group abstraction ( #10539 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-09-15 11:58:57 -07:00
Kai Fricke
0865d68466
[tune] convert fallback representation to numbers in wandb integration ( #10799 )
2020-09-15 11:47:11 -07:00
Max Fitton
334e11c704
[Dashboard] Fix a number of console warnings caused by incorrect usage of react keys ( #10749 )
...
* Fix a number of console warnings caused by incorrect usage of react keys
* lint
* lint
Co-authored-by: Max Fitton <max@semprehealth.com>
2020-09-15 11:21:44 -07:00
Siyuan (Ryans) Zhuang
c2dff126aa
[Core] Warn when failed to get the exact version of pickle5 ( #10731 )
...
* warn when failed to get the exact version of pickle5
* add missing spaces
2020-09-15 11:21:27 -07:00
Ameer Haj Ali
6edacb22b8
Fix abstraction violations in command_runner interface ( #10715 )
...
* Fix abstraction violations in command_runner interface
* user guide
* lint
* breaking abstraction in commands
* extra initialization commands
* more cleanup
* small fixes
* fix test_integration_kubernetes.py
* lint
Co-authored-by: root <root@ip-172-31-28-155.us-west-2.compute.internal>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
2020-09-14 20:28:38 -07:00
Hao Chen
1a5cfe0b79
[Java] rename config ray.redis.address to ray.address ( #10772 )
2020-09-15 11:13:19 +08:00
Eric Liang
ea3e4d622e
Restore plasma directory option ( #10784 )
2020-09-14 19:05:55 -07:00
Richard Liaw
e3fd5eceec
[minor] fix warning about docker cpus ( #10768 )
2020-09-14 09:08:34 -07:00
Kai Yang
4c03f7ca2f
[Core] Multi-tenancy: Reject worker registration if job has finished ( #10569 )
2020-09-14 14:49:31 +08:00
Ian Rodney
5bc2ba38fd
[docker] Detect CPUs in container correctly ( #10507 )
...
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
2020-09-13 23:40:48 -07:00
Richard Liaw
660aee6311
[cli] make test failure less verbose + print ssh ( #10767 )
2020-09-13 23:37:10 -07:00
Alex Wu
9795356ac0
[hotfix] Autoscaler's K8 support ( #10766 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-09-13 16:43:57 -07:00
Eric Liang
00fefba72c
[autoscaler] Usability improvements in logging ( #10764 )
2020-09-13 13:54:44 -07:00
Alex Wu
d0b73647b4
[Autoscaler] Unmanaged nodes ( #10513 )
2020-09-13 11:58:47 -07:00