Richard Liaw
037aa2b961
[sgd] Refactor PyTorch SGD Documentation. ( #6910 )
...
* Refactor documentation and directory structure
* update loss
* more examples
* fix comments
* more code
* svgs
* formatting
* more_docs
* more writing
* comments ready
* move
* whitespace
* examples
* fix
* bold
* pytorch
* batch
* fix
* fix test
* Apply suggestions from code review
* quarantinegp
* tests/
* fix missing
2020-01-29 08:51:01 -08:00
Richard Liaw
5719a05757
[sgd] Add support for multi-model multi-optimizer training ( #6317 )
2019-12-15 15:19:45 -08:00
Yuhao Yang
ad4da17899
[Tune] Add example and tutorial for DCGAN ( #6400 )
2019-12-13 14:15:44 -08:00
Eric Liang
be5dd8eb5e
Enable direct calls by default ( #6367 )
...
* wip
* add
* timeout fix
* const ref
* comments
* fix
* fix
* Move actor state into actor handle
* comments 2
* enable by default
* temp reorder
* some fixes
* add debug code
* tmp
* fix
* wip
* remove dbg
* fix compile
* fix
* fix check
* remove non direct tests
* Increment ref count before resolving value
* rename
* fix another bug
* tmp
* tmp
* Fix object pinning
* build change
* lint
* ActorManager
* tmp
* ActorManager
* fix test component failures
* Remove old code
* Remove unused
* fix
* fix
* fix resources
* fix advanced
* eric's diff
* blacklist
* blacklist
* cleanup
* annotate
* disable tests for now
* remove
* fix
* fix
* clean up verbosity
* fix test
* fix concurrency test
* Update .travis.yml
* Update .travis.yml
* Update .travis.yml
* split up analysis suite
* split up trial runner suite
* fix detached direct actors
* fix
* split up advanced test
* lint
* fix core worker test hang
* fix bad check fail which breaks test_cluster.py in tune
* fix some minor diffs in test_cluster
* less workers
* make less stressful
* split up test
* retry flaky tests
* remove old test flags
* fixes
* lint
* Update worker_pool.cc
* fix race
* fix
* fix bugs in node failure handling
* fix race condition
* fix bugs in node failure handling
* fix race condition
* nits
* fix test
* disable heartbeats
* disable heartbeats
* fix
* fix
* use worker id
* fix max fail
* debug exit
* fix merge, and apply [PATCH] fix concurrency test
* [patch] fix core worker test hang
* remove NotifyActorCreation, and return worker on completion of actor creation task
* remove actor died callback
* Update core_worker.cc
* lint
* use task manager
* fix merge
* fix deadlock
* wip
* merge conflicts
* fix
* better sysexit handling
* better sysexit handling
* better sysexit handling
* check id
* better debug
* task failed msg
* task failed msg
* retry failed tasks with delay
* retry failed tasks with delay
* clip deps
* fix
* fix core worker tests
* fix task manager test
* fix all tests
* cleanup
* set to 0 for direct tests
* dont check worker id for ownership rpc
* dont check worker id for ownership rpc
* debug messages
* add comment
* remove debug statements
* nit
* check worker id
* fix test
* owner
* fix tests
2019-12-13 13:58:04 -08:00
Eric Liang
e5863d7914
Force tune tests to run in direct call mode ( #6301 )
...
* force tune direct mode
* force tune
* fix
* Update run_multi_node_tests.sh
2019-11-27 19:58:33 -08:00
daiyaanarfeen
8f6d73a93a
[sgd] Extend distributed pytorch functionality ( #5675 )
...
* raysgd
* apply fn
* double quotes
* removed duplicate TimerStat
* removed duplicate find_free_port
* imports in pytorch_trainer
* init doc
* ray.experimental
* remove resize example
* resnet example
* cifar
* Fix up after kwargs
* data_dir and dataloader_workers args
* formatting
* loss
* init
* update code
* lint
* smoketest
* better_configs
* fix
* fix
* fix
* train_loader
* fixdocs
* ok
* ok
* fix
* fix_update
* fix
* fix
* done
* fix
* fix
* fix
* small
* lint
* fix
* fix
* fix_test
* fix
* validate
* fix
* fi
2019-11-05 11:16:46 -08:00
Richard Liaw
e94bebb1de
[tune] Fix Jenkins tests ( #6028 )
2019-11-01 16:42:04 -07:00
Richard Liaw
48ba484640
[tune] Test TF2.0, TF1.14, TF1.12 Tensorboard support ( #5931 )
2019-10-18 13:50:42 -07:00
Richard Liaw
9f23620412
[tune] tf2.0 mnist example ( #5898 )
...
* tfmnistexample
* tfmnist
* add_to_ci
* format
* exampledownload
* fix
2019-10-15 22:25:01 -07:00
Richard Liaw
1650f7b174
[tune] Remove TF MNIST example + add TrialRunner hook to execut… ( #5868 )
...
* remove test
* add trial runner
* removerestore
* Remove other mnist examples
* tunetest
* revert
* v1
* Revert "v1"
This reverts commit c8bddaf2db7a8270c43c02021cac0e75df15ed20.
* Revert "revert"
This reverts commit b58f56884a0c288d3a6f997d149ab4d496ddd7a3.
* errors
* format
2019-10-13 20:33:56 -07:00
Anthony Yu
b99cdf4e39
[tune] PBT + Memnn example ( #5723 )
...
* Add example file
* Move into train function
* Somewhat working example of MemNN, still has some failed trials
* Reorganize into a class
* Small fixes
* Iteration decrease and fix hyperparam_mutations
* Add example file
* Move into train function
* Somewhat working example of MemNN, still has some failed trials
* Reorganize into a class
* Small fixes
* Iteration decrease and fix hyperparam_mutations
* Some style edits
* Address PR changes without modifying learning rate
* Add configs and hyperparameter mutations
* Add tune test
* Modify import locations
* Some parameter changes for testing
* Update memnn example
* Add tensorboard support and address PR comment
* Final changes
* lint
* generator
2019-10-05 09:22:37 -07:00
Richard Liaw
baf85c6665
[tune/sgd] Fix Jenkins ( #5765 )
2019-09-27 09:59:08 -07:00
Richard Liaw
e00071721a
[tune] tf2.0 testing and supporting callables ( #5738 )
2019-09-21 17:01:14 -07:00
Richard Liaw
cdc9227f1b
[tune] ASHA xgboost and lightgbm examples ( #5500 )
2019-08-22 10:37:59 -07:00
Richard Liaw
d7b309223b
[tune] MLFlow Logger ( #5438 )
2019-08-14 15:58:18 -07:00
Lisa Dunlap
b7d0733362
[tune] Implement BOHB ( #5382 )
2019-08-13 12:32:07 -07:00
Richard Liaw
1eaa57c98f
[tune] Distributed example + walkthrough ( #5157 )
2019-08-02 09:17:20 -07:00
Richard Liaw
0b540ab492
[tune] Test example checkpointing ( #4728 )
2019-07-10 01:58:26 -07:00
Richard Liaw
b1827d5fbe
[tune] Update MNIST Example ( #4991 )
2019-06-25 22:50:15 -07:00
Richard Liaw
bd8aceb896
[ci] Change Jenkins to py3 ( #5022 )
...
* conda3
* integration
* add nevergrad, remotedata
* pytest 0.3.1
* otherdockers
* setup
* tune
2019-06-24 21:50:37 -07:00
Eric Liang
d5f4698305
[tune] Avoid scheduler blocking, add reuse_actors optimization ( #4218 )
2019-03-12 23:49:31 -07:00
Eric Liang
437459f40a
[build] Make travis logs not as long ( #4213 )
...
* clean it up
* Update .travis.yml
* Update .travis.yml
* update
* fix example
* suppress
* timeout
* print periodic progress
* Update suppress_output
* Update run_silent.sh
* Update suppress_output
* Update suppress_output
* manually do timeout
* sleep 300
* fix test
* Update run_silent.sh
* Update suppress_output
* Update .travis.yml
2019-03-07 12:09:03 -08:00
Richard Liaw
a27cb225b6
Modularize Tune tests from multi-node tests ( #4204 )
2019-03-02 19:21:08 -08:00