Eric Liang
37053443b4
Restore set omp ( #7051 )
2020-02-04 15:02:23 -08:00
Simon Mo
dd095c476a
Move serve and asyncio tests to bazel ( #6979 )
2020-02-04 08:29:16 -08:00
Edward Oakes
844f607c93
Collect contained ObjectIDs during deserialization ( #7029 )
2020-02-03 22:49:14 -08:00
Simon Mo
5e8ded344a
[Serve] Fix flaky test with nursery double init ( #6982 )
2020-02-03 21:32:12 -08:00
Edward Oakes
984490d2be
Collect object IDs during serialization ( #6946 )
2020-02-03 18:38:11 -08:00
SangBin Cho
ca5a9c6739
Exclude test profiling info endpoint ( #7030 )
...
* Skip test_profiling_info_endpoint when pytest running locally
* Fixed formatting.
* Fixed the reason for skipping the test based on pr comments
2020-02-03 16:49:03 -08:00
Siyuan (Ryans) Zhuang
42cbf801e1
workaround for python3.5 fast numpy serialization ( #6675 )
2020-02-03 13:08:18 -08:00
Mitchell Stern
271de9b04d
[Dashboard] Remove files used by previous dashboard ( #7028 )
2020-02-03 11:51:09 -08:00
Eric Liang
740bd00651
Use 100k for memory limit ( #7013 )
2020-02-02 22:48:59 -08:00
Eric Liang
f939cb39ee
always set it ( #7006 )
2020-02-02 22:48:29 -08:00
Richard Liaw
52c33b53f7
[minor][core] fix gpu ids for SLURM ( #7014 )
...
* fix gpu ids
* fix
2020-02-02 16:09:22 -08:00
Frank Röder
9d04f6617a
[tune] Align scheduler mode with search algorithm in example of… ( #7012 )
2020-02-02 15:06:39 -08:00
Philipp Moritz
cc43c9c1a2
Increase limit for autoscaler keys ( #7007 )
2020-02-01 22:29:40 -08:00
Eric Liang
8b4b49662b
Force OMP_NUM_THREADS=1 if unset ( #6998 )
...
* force omp
* update
* set
* workers
* link
2020-02-01 11:46:11 -08:00
Edward Oakes
92525f35d1
Remove raylet client from Python worker ( #6018 )
2020-01-31 18:23:01 -08:00
Edward Oakes
341a921d81
Remove vanilla pickle serialization for task arguments ( #6948 )
2020-01-31 16:52:43 -08:00
Simon Mo
4e2c4302e8
Remove test_gather_benchmark ( #6983 )
2020-01-31 09:42:05 -08:00
Maksim Smolin
64c8996a43
[raysgd] Update to fix examples out of the box ( #6966 )
...
* Update tf-example-sgd dependencies, AMI, and instance type
* Make PyTorch dependency optional
* Re-implement optional torch import
* Update tensorflow_train_example
* Setup tf-example-sgd config for SGD development
* Document the MultiWorkerMirroredStrategy behavior
* Run scripts/format
* Undo GPU default for CI
* Remove dev deploy file_mounts
* Update docs on tf_runner and tf_trainer
* Fix formatting
* Remove the debug file-mounts again
* Disable cifar example GPU usage by default so CI runs properly
* Mark failing PyTorch test as flaky
* Clarify the tf SGD sanity check
* Run format script
* Update tf-example-sgd.yaml
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-31 01:16:57 -08:00
roireshef
dc7a555260
[rllib] Feature/histograms in tensorboard ( #6942 )
...
* Added histogram functionality to custom metrics infrastructure (another tab in tensorboard)
* updated example to include histogram metric
* added histograms to TBXLogger
* add episode rewards
* lint
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-01-30 22:02:53 -08:00
SangBin Cho
df518849ed
Remove ray.wait timeout warning for milliseconds ( #6980 )
2020-01-30 19:07:52 -08:00
Amog Kamsetty
c8bf0715a6
[Parallel Iterator] Local Shuffle ( #6921 )
...
* adding local shuffle and corresponding tests
* fix quotes
* addressing comments and adding seed argument
* formatting
* fix formatting issues
* change test size from small to medium
* addressing comments
2020-01-30 12:27:38 -08:00
Ameer Haj Ali
b8135da122
Adding dependencies for scikit-learn in travis ( #6969 )
...
* Revert "Revert "Support of scikit-learn with ray joblib backend (#6925 )" (#6957 )"
This reverts commit 86100bc119.
* adding scikit-learn to dependencies
2020-01-30 09:46:54 -08:00
Simon Mo
660eef6502
[Serve] Async Router ( #6873 )
2020-01-30 09:34:47 -08:00
Simon Mo
1e3a34b223
Rewrite the async api documentation ( #6936 )
...
* Rewrite the async api documentation
* Apply suggestions from code review
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
* clarify comment
* Add quickstart
* Add reference for async in ray.get ray.wait docstring
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-01-30 09:34:09 -08:00
Richard Liaw
5ab395236b
[tune] Experiment stopping API ( #6886 )
2020-01-30 00:34:08 -08:00
Eric Liang
86100bc119
Revert "Support of scikit-learn with ray joblib backend ( #6925 )" ( #6957 )
...
This reverts commit a7ecda6017.
2020-01-29 14:56:09 -08:00
ijrsvt
ff706660d2
Kill Actor UI addition ( #6955 )
2020-01-29 14:32:19 -08:00
Edward Oakes
c2be794f10
Remove try/except import asyncio for python 2 ( #6947 )
2020-01-29 09:17:07 -08:00
Richard Liaw
037aa2b961
[sgd] Refactor PyTorch SGD Documentation. ( #6910 )
...
* Refactor documentation and directory structure
* update loss
* more examples
* fix comments
* more code
* svgs
* formatting
* more_docs
* more writing
* comments ready
* move
* whitespace
* examples
* fix
* bold
* pytorch
* batch
* fix
* fix test
* Apply suggestions from code review
* quarantinegp
* tests/
* fix missing
2020-01-29 08:51:01 -08:00
Simon Mo
26d749bc18
[Dashboard] Render HTML inline ( #6932 )
2020-01-28 10:39:22 -08:00
Eric Liang
e659699ca9
[tune] Fix directory naming regression ( #6839 )
2020-01-27 15:53:40 -08:00
Alex Wu
d9a2294298
Ssh identities only ( #6931 )
2020-01-27 17:01:21 -06:00
Richard Liaw
e0078a0d78
[autoscaler][minor] default -> latest_dlami ( #6922 )
...
* config
* latest
* Update python/ray/autoscaler/aws/config.py
2020-01-27 14:34:07 -08:00
Ameer Haj Ali
a7ecda6017
Support of scikit-learn with ray joblib backend ( #6925 )
2020-01-27 15:00:00 -06:00
Simon Mo
396d7fafc8
UI improvement for asyncio ( #6905 )
2020-01-27 12:45:51 -08:00
mehrdadn
bde575b8dd
Revert "Use Boost.Process instead of pid_t ( #6510 )" ( #6909 )
...
This reverts commit fb8e3615d5.
2020-01-26 10:26:44 -06:00
Eric Liang
2fb53396ad
[rllib] [experimental] Decentralized Distributed PPO for torch (DD-PPO) ( #6918 )
2020-01-25 22:36:43 -08:00
hyggan
552156f22d
[tune] Handles nan case for AsyncHyperBand ( #6916 )
2020-01-25 17:26:30 -08:00
Ujval Misra
ed9de8b2fa
[tune] Expose progress reporter to users ( #6915 )
...
* Pluggable progress reporter
* Fix types
* Fix bug, address comments
* lint
* Add convenience function and test
* lint
* Use trials instead of trial_runner
* Add docs
* Update docs
* Fix doc examples
* More doc updates
* Address comments, add configurable frequency
* use reward
2020-01-25 12:28:05 -08:00
Yunzhi Zhang
aa5427ca78
[Dashboard] Kill actor ( #6906 )
2020-01-24 17:21:44 -08:00
Mitchell Stern
33423627ca
[Dashboard] Add profiling button to logical view ( #6901 )
2020-01-24 11:52:14 -08:00
Daniel Edgecumbe
e516c50745
[autoscaler]: Kill workers if the monitor raises an exception ( #3977 )
...
Co-authored-by: CJosephides <cjosephides@gmail.com>
2020-01-23 14:12:52 -06:00
Ujval Misra
1558307ac4
[tune] Prevent MEMORY checkpoints from breaking trial FT ( #6691 )
...
* Prevent MEMORY checkpoints from breaking FT
* Add save/pause/resume/restore test
* change checkpoint return value based on status
* Fix test_checkpoint_manager_tests.
* Fix test + checkpoint manager bug
* lint
* Add docstring
* Add docstring to checkpoint_manager constructor
* Change variable name for clarity
* Revert on_checkpoint docstring wording
* Break after success
* nit: more informative warning
* Quarantine test
2020-01-22 23:17:09 -08:00
Yunzhi Zhang
0834bda8c1
[Dashboard] Display actor task execution info ( #6705 )
...
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
2020-01-22 22:33:55 -08:00
Sven Mika
ae9a3a2237
[RLlib] from_config util method for framework agnostic components; start moving RLlib tests into Bazel. ( #6865 )
2020-01-22 17:02:58 -08:00
Simon Mo
5f527816fe
Fix async actor high cpu utilization when idle ( #6877 )
2020-01-22 16:07:08 -08:00
Simon Mo
4dd41844d0
Ignore blocking ray.wait if timeout is zero ( #6891 )
2020-01-22 16:05:34 -08:00
Richard Liaw
2b0e93586f
[autoscaler] Auto-replace "DEFAULT" with most recent DLAMI ( #6848 )
...
* try_this
* fix
* actual fix
* default
2020-01-21 13:54:04 -08:00
Richard Liaw
4edfaf2f38
[tune] Support callable objects in variant generation ( #6849 )
...
* minorcallable
* format
2020-01-21 10:24:25 -08:00
Stephanie Wang
815cd0e39a
Task and actor fate sharing with the owner process ( #6818 )
...
* Add test
* Kill workers leased by failed workers
* merge
* shorten test
* Add node failure test case
* Fix FromBinary for nil IDs, add assertions
* Test
* Fate sharing on node removal, fix owner address bug
* lint
* Update src/ray/raylet/node_manager.cc
Co-Authored-By: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>
* fix
* Remove unneeded test
* fix IDs
Co-authored-by: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>
2020-01-20 16:44:04 -08:00