Simon Mo
59867dad75
Move Jenkins test to Github action (#7342)
2020-04-09 10:27:19 -07:00
Richard Liaw
24bf6ad607
[raysgd] Improve raysgd examples (#7818)
* better_example
* test
* improve some usability things
* submit
* fix
* flake
* Update python/ray/util/sgd/torch/training_operator.py
* trythis
* fix
* fix
* smoke
* fail
* fix
* fix
2020-04-01 08:58:39 -07:00
Ujval Misra
6022eb53c4
[tune] Use newest checkpoint in normal operation (#7563)
* Use persistent checkpoint for failures
* Fix test
* Add unpause test
* move test
* Fix tests
* remove debug statement
* Mark test as flaky
2020-03-12 22:21:42 -07:00
Richard Liaw
d192ef0611
[raysgd] Cleanup User API (#7384)
* Init fp16
* fp16 and schedulers
* scheduler linking and fp16
* to fp16
* loss scaling and documentation
* more documentation
* add tests, refactor config
* moredocs
* more docs
* fix logo, add test mode, add fp16 flag
* fix tests
* fix scheduler
* fix apex
* improve safety
* fix tests
* fix tests
* remove pin memory default
* rm
* fix
* Update doc/examples/doc_code/raysgd_torch_signatures.py
* fix
* migrate changes from other PR
* ok thanks
* pass
* signatures
* lint
* Update python/ray/experimental/sgd/pytorch/utils.py
* Apply suggestions from code review
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
* should address most comments
* comments
* fix this ci
* first_pass
* add overrides
* override
* fixing up operators
* format
* sgd
* constants
* rm
* revert
* save
* failures
* fixes
* trainer
* run test
* operator
* code
* op
* ok done
* operator
* sgd test fixes
* ok
* trainer
* format
* Apply suggestions from code review
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
* Update doc/source/raysgd/raysgd_pytorch.rst
* docstring
* dcgan
* doc
* commits
* nit
* testing
* revert
* Start renaming pytorch to torch
* Rename PyTorchTrainer to TorchTrainer
* Rename PyTorch runners to Torch runners
* Finish renaming API
* Rename to torch in tests
* Finish renaming docs + tests
* Run format + fix DeprecationWarning
* fix
* move tests up
* benchmarks
* rename
* remove some args
* better metrics output
* fix up the benchmark
* benchmark-yaml
* horovod-benchmark
* benchmarks
* Remove benchmark code for cleanups
* makedatacreator
* relax
* metrics
* autosetsampler
* profile
* movements
* OK
* smoothen
* fix
* nitdocs
* loss
* comments
* fix
* fix
* runner_tests
* codes
* example
* fix_test
* fix
* tests
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Maksim Smolin <maximsmol@gmail.com>
2020-03-10 08:41:42 -07:00
Anthony Yu
89ec4adb72
[tune] Dragonfly Optimizer (#5955)
* Add sample example
* Copy relevant lines of ask from inherited Optimizer
* Ignore strategy
* Additional changes
* Add DragonflySearch for tune connector for Dragonfly
* Add example and fix small errors
* lint
* Remove skopt references
* Update example based off of Dragonfly changes
* Edit example for final Dragonfly edits
* Formatting and documentation edits
* Add documentation and add to test pipeline
* Address PR comments
* Fix Jenkins test
* Adjust Dragonfly to PR#7366
* Lint
* fix_tests
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-10 08:40:36 -07:00
Maksim Smolin
3a134c7224
[RaySGD] Rename PyTorch API endpoints to start with Torch (#7425)
* Start renaming pytorch to torch
* Rename PyTorchTrainer to TorchTrainer
* Rename PyTorch runners to Torch runners
* Finish renaming API
* Rename to torch in tests
* Finish renaming docs + tests
* Run format + fix DeprecationWarning
* fix
* move tests up
* rename
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-03 16:44:42 -08:00
Eric Liang
5df801605e
Add ray.util package and move libraries from experimental (#7100)
2020-02-18 13:43:19 -08:00
mehrdadn
3bd82d0bcd
Fix various issues/warnings that come up on Jenkins (#7147)
* Avoid warning about swap being unlimited
Currently we get the following message on Jenkins:
"Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."
Since we're not limiting swap anyway, we might as well avoid trying to.
https://docs.docker.com/config/containers/resource_constraints/#--memory-swap-details
* Fix escaping in re.search()
* Fix escaping in _noisy_layer()
* Raise a more descriptive error when dashboard data isn't found
* Don't error on dashboard files not being found when webui isn't required
* Change dashboard error to a warning instead
2020-02-17 16:08:55 -08:00
Richard Liaw
94e2fcea2e
[sgd] fp16 (apex) and scheduler support + move examples page (#7061)
* Init fp16
* fp16 and schedulers
* scheduler linking and fp16
* to fp16
* loss scaling and documentation
* more documentation
* add tests, refactor config
* moredocs
* more docs
* fix logo, add test mode, add fp16 flag
* fix tests
* fix scheduler
* fix apex
* improve safety
* fix tests
* fix tests
* remove pin memory default
* rm
* fix
* Update doc/examples/doc_code/raysgd_torch_signatures.py
* fix
* migrate changes from other PR
* ok thanks
* pass
* signatures
* lint
* Update python/ray/experimental/sgd/pytorch/utils.py
* Apply suggestions from code review
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
* should address most comments
* comments
* fix this ci
* fix tests
* testmode
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-02-16 19:04:08 -08:00
Richard Liaw
037aa2b961
[sgd] Refactor PyTorch SGD Documentation. (#6910)
* Refactor documentation and directory structure
* update loss
* more examples
* fix comments
* more code
* svgs
* formatting
* more_docs
* more writing
* comments ready
* move
* whitespace
* examples
* fix
* bold
* pytorch
* batch
* fix
* fix test
* Apply suggestions from code review
* quarantinegp
* tests/
* fix missing
2020-01-29 08:51:01 -08:00
Richard Liaw
5719a05757
[sgd] Add support for multi-model multi-optimizer training (#6317)
2019-12-15 15:19:45 -08:00
Yuhao Yang
ad4da17899
[Tune] Add example and tutorial for DCGAN (#6400)
2019-12-13 14:15:44 -08:00
Eric Liang
be5dd8eb5e
Enable direct calls by default (#6367)
* wip
* add
* timeout fix
* const ref
* comments
* fix
* fix
* Move actor state into actor handle
* comments 2
* enable by default
* temp reorder
* some fixes
* add debug code
* tmp
* fix
* wip
* remove dbg
* fix compile
* fix
* fix check
* remove non direct tests
* Increment ref count before resolving value
* rename
* fix another bug
* tmp
* tmp
* Fix object pinning
* build change
* lint
* ActorManager
* tmp
* ActorManager
* fix test component failures
* Remove old code
* Remove unused
* fix
* fix
* fix resources
* fix advanced
* eric's diff
* blacklist
* blacklist
* cleanup
* annotate
* disable tests for now
* remove
* fix
* fix
* clean up verbosity
* fix test
* fix concurrency test
* Update .travis.yml
* Update .travis.yml
* Update .travis.yml
* split up analysis suite
* split up trial runner suite
* fix detached direct actors
* fix
* split up advanced test
* lint
* fix core worker test hang
* fix bad check fail which breaks test_cluster.py in tune
* fix some minor diffs in test_cluster
* less workers
* make less stressful
* split up test
* retry flaky tests
* remove old test flags
* fixes
* lint
* Update worker_pool.cc
* fix race
* fix
* fix bugs in node failure handling
* fix race condition
* fix bugs in node failure handling
* fix race condition
* nits
* fix test
* disable heartbeats
* disable heartbeats
* fix
* fix
* use worker id
* fix max fail
* debug exit
* fix merge, and apply [PATCH] fix concurrency test
* [patch] fix core worker test hang
* remove NotifyActorCreation, and return worker on completion of actor creation task
* remove actor died callback
* Update core_worker.cc
* lint
* use task manager
* fix merge
* fix deadlock
* wip
* merge conflicts
* fix
* better sysexit handling
* better sysexit handling
* better sysexit handling
* check id
* better debug
* task failed msg
* task failed msg
* retry failed tasks with delay
* retry failed tasks with delay
* clip deps
* fix
* fix core worker tests
* fix task manager test
* fix all tests
* cleanup
* set to 0 for direct tests
* dont check worker id for ownership rpc
* dont check worker id for ownership rpc
* debug messages
* add comment
* remove debug statements
* nit
* check worker id
* fix test
* owner
* fix tests
2019-12-13 13:58:04 -08:00
Eric Liang
e5863d7914
Force tune tests to run in direct call mode (#6301)
* force tune direct mode
* force tune
* fix
* Update run_multi_node_tests.sh
2019-11-27 19:58:33 -08:00
daiyaanarfeen
8f6d73a93a
[sgd] Extend distributed pytorch functionality (#5675)
* raysgd
* apply fn
* double quotes
* removed duplicate TimerStat
* removed duplicate find_free_port
* imports in pytorch_trainer
* init doc
* ray.experimental
* remove resize example
* resnet example
* cifar
* Fix up after kwargs
* data_dir and dataloader_workers args
* formatting
* loss
* init
* update code
* lint
* smoketest
* better_configs
* fix
* fix
* fix
* train_loader
* fixdocs
* ok
* ok
* fix
* fix_update
* fix
* fix
* done
* fix
* fix
* fix
* small
* lint
* fix
* fix
* fix_test
* fix
* validate
* fix
* fi
2019-11-05 11:16:46 -08:00
Richard Liaw
e94bebb1de
[tune] Fix Jenkins tests (#6028)
2019-11-01 16:42:04 -07:00
Richard Liaw
48ba484640
[tune] Test TF2.0, TF1.14, TF1.12 Tensorboard support (#5931)
2019-10-18 13:50:42 -07:00
Richard Liaw
9f23620412
[tune] tf2.0 mnist example (#5898)
* tfmnistexample
* tfmnist
* add_to_ci
* format
* example download
* fix
2019-10-15 22:25:01 -07:00
Richard Liaw
1650f7b174
[tune] Remove TF MNIST example + add TrialRunner hook to execut… (#5868)
* remove test
* add trial runner
* remove restore
* Remove other mnist examples
* tunetest
* revert
* v1
* Revert "v1"
This reverts commit c8bddaf2db7a8270c43c02021cac0e75df15ed20.
* Revert "revert"
This reverts commit b58f56884a0c288d3a6f997d149ab4d496ddd7a3.
* errors
* format
2019-10-13 20:33:56 -07:00
Anthony Yu
b99cdf4e39
[tune] PBT + Memnn example (#5723)
* Add example file
* Move into train function
* Somewhat working example of MemNN, still has some failed trials
* Reorganize into a class
* Small fixes
* Iteration decrease and fix hyperparam_mutations
* Add example file
* Move into train function
* Somewhat working example of MemNN, still has some failed trials
* Reorganize into a class
* Small fixes
* Iteration decrease and fix hyperparam_mutations
* Some style edits
* Address PR changes without modifying learning rate
* Add configs and hyperparameter mutations
* Add tune test
* Modify import locations
* Some parameter changes for testing
* Update memnn example
* Add tensorboard support and address PR comment
* Final changes
* lint
* generator
2019-10-05 09:22:37 -07:00
Richard Liaw
baf85c6665
[tune/sgd] Fix Jenkins (#5765)
2019-09-27 09:59:08 -07:00
Richard Liaw
e00071721a
[tune] tf2.0 testing and supporting callables (#5738)
2019-09-21 17:01:14 -07:00
Richard Liaw
cdc9227f1b
[tune] ASHA xgboost and lightgbm examples (#5500)
2019-08-22 10:37:59 -07:00
Richard Liaw
d7b309223b
[tune] MLFlow Logger (#5438)
2019-08-14 15:58:18 -07:00
Lisa Dunlap
b7d0733362
[tune] Implement BOHB (#5382)
2019-08-13 12:32:07 -07:00
Richard Liaw
1eaa57c98f
[tune] Distributed example + walkthrough (#5157)
2019-08-02 09:17:20 -07:00
Richard Liaw
0b540ab492
[tune] Test example checkpointing (#4728)
2019-07-10 01:58:26 -07:00
Richard Liaw
b1827d5fbe
[tune] Update MNIST Example (#4991)
2019-06-25 22:50:15 -07:00
Richard Liaw
bd8aceb896
[ci] Change Jenkins to py3 (#5022)
* conda3
* integration
* add nevergrad, remotedata
* pytest 0.3.1
* otherdockers
* setup
* tune
2019-06-24 21:50:37 -07:00
Eric Liang
d5f4698305
[tune] Avoid scheduler blocking, add reuse_actors optimization (#4218)
2019-03-12 23:49:31 -07:00
Eric Liang
437459f40a
[build] Make travis logs not as long (#4213)
* clean it up
* Update .travis.yml
* Update .travis.yml
* update
* fix example
* suppress
* timeout
* print periodic progress
* Update suppress_output
* Update run_silent.sh
* Update suppress_output
* Update suppress_output
* manually do timeout
* sleep 300
* fix test
* Update run_silent.sh
* Update suppress_output
* Update .travis.yml
2019-03-07 12:09:03 -08:00
Richard Liaw
a27cb225b6
Modularize Tune tests from multi-node tests (#4204)
2019-03-02 19:21:08 -08:00