Antoni Baum
c40555c82b
[tune] Add define-by-run support to OptunaSearcher ( #17464 )
2021-08-03 16:11:58 +01:00
Kai Fricke
81d3d8705e
[tune] fix docs example for tune qloguniform ( #17539 )
2021-08-03 14:48:22 +01:00
Ian Rodney
7b1c207be3
[Dashboard] Allow agent to bind to Wildcard address. ( #17393 )
2021-08-03 02:03:19 -07:00
Antoni Baum
df2fce9ab6
[tune] Allow to pass searcher/scheduler string names to tune.run ( #17517 )
2021-08-03 09:28:03 +01:00
Eric Liang
fbd3f11533
OBOD log source error properly
2021-08-02 20:57:01 -07:00
Jiao
b13892e82a
Add initial 1.5.0 benchmark ( #17513 )
...
* add initial 1.5.0 benchmark
* add more logs
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-08-02 18:19:02 -07:00
Eric Liang
f9552765cb
Avoid re-exporting same function repeatedly in dataset ( #17522 )
2021-08-02 18:15:25 -07:00
SangBin Cho
f1ccadbb27
Skip flaky windows object spilling tests ( #17510 )
2021-08-02 15:53:07 -07:00
matthewdeng
e89195bfb9
[SGD] add SGDv2 Trainer prototype implementation ( #17440 )
...
* wip
* formatting
* increase timeouts
* wip
* address comments
* comments
* fix
* address comments
* Update python/ray/util/sgd/v2/worker_group.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Update python/ray/util/sgd/v2/worker_group.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* address comments
* formatting
* fix
* wip
* finish
* fix
* formatting
* remove reporting
* split TorchBackend
* fix tests
* address comments
* add file
* more fixes
* remove default value
* update run method doc
* add comment
* minor doc fixes
* lint
* add args to BaseWorker.execute
* address comments
* remove extra parentheses
* properly instantiate backend
* fix some of the tests
* fix torch setup
* fix type hint
* [SGD] add SGDv2 Trainer prototype implementation
* add fashion mnist test
* add HuggingFace example
* format
* formatting
* address comment
* address comments
* update comment
* Update python/ray/util/sgd/v2/examples/transformers/cluster.yaml
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
* update huggingface transformers
* update hugging face transformers
* fix shutdown on worker failure
* Update python/requirements/tune/requirements_tune.txt
* Update python/requirements/tune/requirements_tune.txt
* Update python/requirements/tune/requirements_tune.txt
* Update python/requirements/tune/requirements_tune.txt
* address comment and fix test
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2021-08-02 15:27:42 -07:00
Sven Mika
8a844ff840
[RLlib] Issues: 17397, 17425, 16715, 17174. When on driver, Torch|TFPolicy should not use ray.get_gpu_ids() (b/c no GPUs assigned by ray). ( #17444 )
2021-08-02 17:29:59 -04:00
Alex Wu
af880378da
Lower threshold on scalability envelope many tasks ( #17511 )
2021-08-02 11:50:08 -07:00
Eric Liang
748cbbb23d
[hotfix] Parquet S3 reads broken due to pyarrow.lib.ArrowInvalid: S3 subsystem not initialized ( #17492 )
2021-08-02 11:48:48 -07:00
Ian Rodney
acde351cba
[GCP][GPU] Specify GPU name, not the full URL ( #17409 )
...
* convert gpu_name to URL
* update examples
* comment about scheduling
* fix node.py
* add test
2021-08-02 11:01:24 -04:00
Qingyun Wu
7678503d84
[Tune][docs] Correct reference name to CFO example ( #17503 )
2021-08-02 14:46:10 +01:00
Ian Rodney
b26ba7ba9e
[Dashboard] Allow Agent HTTP listening port to be specified. ( #17392 )
2021-08-02 02:09:50 -07:00
Richard Liaw
ecc7cf4c5e
[sgd] v2 documentation draft ( #17253 )
...
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-08-02 01:47:14 -07:00
Eric Liang
e812691909
Support top-level tensor values in dataset ( #17439 )
2021-08-01 22:45:21 -07:00
Lixin Wei
6f4c8ebdb2
[Core] Remove the GetActorInfo RPC for Current Actor When Creating Actors ( #17334 )
2021-08-01 22:10:40 -07:00
Chen Shen
1b89fa8624
[object store refactor 2/n] More refactor on PlasmaAllocator, and add unit tests
2021-08-01 22:10:03 -07:00
Alex Wu
d9cd3800c7
Dataset speed up read ( #17435 )
2021-08-01 18:03:46 -07:00
Ivorius
6703091cdc
[Docs] Update example-full.yaml for ulimits as supported by docker. ( #17408 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-01 01:36:16 -07:00
Chen Shen
96c69f8c77
[object store refactor 1/n] Introduce IAllocator and PlasmaAllocator ( #17307 )
...
* initial commit
* address comments
2021-07-30 19:08:20 -07:00
Stephanie Wang
c9a2046287
[core] Update error message for hanging ray.get ( #17449 )
...
* Update error message
* x
2021-07-30 17:57:10 -07:00
Jin Dong
7197b26a3c
Update streaming example to use wait
2021-07-30 16:14:30 -07:00
matthewdeng
3a1aed28b7
[torch] fix process group timedelta ( #17468 )
2021-07-30 15:47:33 -07:00
SangBin Cho
9a696cc66a
Pin aioredis version ( #17472 )
2021-07-30 12:04:14 -07:00
Jiao
d67c57007b
change placement group report size to 1k ( #17216 )
...
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-30 11:29:41 -07:00
Chen Shen
32803b53b0
Fix potential dead-lock ( #17396 )
2021-07-30 11:28:49 -07:00
Alex Wu
9e79301d35
Split scalability envelope + smoke tests ( #17455 )
...
* .
* done?
* done?
* sang comments
* .
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-07-30 10:20:19 -07:00
Kai Fricke
b0f00b1b4b
[default] pin aioredis < 2 ( #17465 )
2021-07-30 17:57:17 +01:00
Eric Liang
cd13059691
[dataset] Implement random_shuffle() and split(equal=True) ( #17448 )
2021-07-30 09:51:21 -07:00
Patrick Ames
131710f9f9
[autoscaler] Add support for EC2 launch templates. ( #17236 )
2021-07-30 08:05:59 -07:00
Antoni Baum
23bdad01be
Fix XGBoost-Ray and LightGBM-Ray docs properly ( #17433 )
2021-07-30 15:47:41 +01:00
wanxing
705248f4ee
[CoreWorker] Remove plasma_objects_only parameter ( #17384 )
2021-07-30 14:48:36 +08:00
Qing Wang
b8baac3cb0
[Java] Filter error log for intentional system exit. ( #17289 )
2021-07-30 13:17:38 +08:00
Chen Shen
d856abb70d
[Test] increase memory for 5000 partitions shuffle ( #17429 )
2021-07-29 21:56:16 -07:00
matthewdeng
58c4fe727c
[SGD] TrainerV2 API interface ( #17447 )
...
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-07-29 19:39:39 -07:00
Eric Liang
0373c54b3e
Add warning if get_gpu_ids() is called on the driver. ( #17436 )
2021-07-29 19:39:22 -07:00
Siyuan (Ryans) Zhuang
17c25345d0
[Workflow] Virtual actor writer - Part 2 ( #17336 )
...
* virtual actor writer
pass step_type around
simplify readonly actor
return different thing for a virtual actor
return state and output
WorkflowExecutionResult
simplify workflow execution
initial virtual actor writer
workflow_ref deeper integration
resume a step of a workflow
cache step output
Support dynamic workflow ref
* fix recovery tests
* fix
* fix get_output
* better error message
* pressure test
* fix
* verbose error message
* verbose error message
* fix get_cached_step issue
* update tests
* simplify readonly virtual actor
* fix storage tests
* workflow.resume returns state of an actor
* fix verbose
* fix comment
* make it more clear by renaming
* comment
* test init error in virtual actor
* update docs
* update docs
* update test_actor_manager/list_all
* fix comment
2021-07-29 19:29:28 -07:00
Amog Kamsetty
ff04a923ea
[SGD] v2 prototype: BackendExecutor and TorchBackend implementation ( #17357 )
...
* wip
* formatting
* increase timeouts
* wip
* address comments
* comments
* fix
* address comments
* Update python/ray/util/sgd/v2/worker_group.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Update python/ray/util/sgd/v2/worker_group.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* address comments
* formatting
* fix
* wip
* finish
* fix
* formatting
* remove reporting
* split TorchBackend
* fix tests
* address comments
* add file
* more fixes
* remove default value
* update run method doc
* add comment
* minor doc fixes
* lint
* add args to BaseWorker.execute
* address comments
* remove extra parentheses
* properly instantiate backend
* fix some of the tests
* fix torch setup
* fix type hint
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-07-29 14:38:44 -07:00
Kai Fricke
2ae6b944a2
[release tests] limit number of results fetched for alerting ( #17430 )
2021-07-29 18:43:44 +01:00
Tao Wang
411c49746d
Remove deprecated HEARTBEAT table ( #17405 )
...
* Remove deprecated HEARTBEAT table
* incr by 1
2021-07-29 10:14:59 -07:00
Jiao
3dc49c0b79
[serve] Add multi deployment to serve nightly tests ( #17411 )
...
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-29 11:47:58 -05:00
Kai Fricke
44d209dd5f
[tune] re-enable tensorboardx without torch installed ( #17403 )
2021-07-29 10:39:38 +01:00
kimikuri
93172b535f
[doc][sgd] Broken Link in SGD's page. ( #17404 ) ( #17423 )
2021-07-29 01:13:23 -07:00
xwjiang2010
93d12b1b5e
[CLI] Fix `ray submit` when --stop is supplied. ( #17385 )
...
* [cli] Fix `ray submit` when --stop is supplied.
* syntax sugar.
2021-07-29 00:01:25 -07:00
Eric Liang
7ed62ea0ad
Initial implementation of Dataset pipelining and docs ( #17309 )
2021-07-28 21:12:01 -07:00
SangBin Cho
65c0c3f3a4
[Test] Fix releaser bug ( #17418 )
...
* Fix a bug
* done
2021-07-28 18:15:00 -07:00
Eric Liang
9b4bcb3bc2
[hotfix] Fix merge conflict that caused test_dataset to fail.
2021-07-28 14:58:31 -07:00
matthewdeng
a2f23e5433
[docs] update docs with pip requirements ( #17317 )
2021-07-28 14:26:40 -07:00