ray/python
Maksim Smolin e95455b7d7
[RaySGD] Add tqdm logging to TorchTrainer (#7588)
* Update issue templates

* Init fp16

* fp16 and schedulers

* scheduler linking and fp16

* to fp16

* loss scaling and documentation

* more documentation

* add tests, refactor config

* moredocs

* more docs

* fix logo, add test mode, add fp16 flag

* fix tests

* fix scheduler

* fix apex

* improve safety

* fix tests

* fix tests

* remove pin memory default

* rm

* fix

* Update doc/examples/doc_code/raysgd_torch_signatures.py

* fix

* migrate changes from other PR

* ok thanks

* pass

* signatures

* lint

* Update python/ray/experimental/sgd/pytorch/utils.py

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* should address most comments

* comments

* fix this ci

* first_pass

* add overrides

* override

* fixing up operators

* format

* sgd

* constants

* rm

* revert

* Checkpoint the basics

* End of day checkpoint

* Checkpoint log-to-head implementation

* Checkpoint

* Add actor-based batch log reporting, currently segfaults

* Work around progress segfault

* Fix some stuff in quicktorch

* Make things more customizable

* Quality of life fixes

* More quality of life

* Move tqdm logic to training_operator

* Update examples

* Fix some minor bugs

* Fix merge

* Fix small things, add pbar to dcgan

* Run format.sh

* Fix missing epoch number for batch pbar

* Address PR comments

* Fix float is not subscriptable

* Add train_loss to pbar by default

* Isolate tqdm code into a handler system

* Format

* Remove the batch_logs_reporter from distributed runner as well

* Check if the train_loss is available before using it

* Enable tqdm in the dcgan example

* Fix a crash in no-handler trainers

* Fix

* Allow not calling set_reporters for tests

Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-03-24 23:43:56 -07:00
ray [RaySGD] Add tqdm logging to TorchTrainer (#7588) 2020-03-24 23:43:56 -07:00
asv.conf.json [asv] Pushing to s3 (#2246) 2018-06-20 10:43:44 -07:00
build-wheel-macos.sh Add __commit__ field to ray package in wheels (#7305) 2020-02-26 17:54:22 -08:00
build-wheel-manylinux1.sh Add __commit__ field to ray package in wheels (#7305) 2020-02-26 17:54:22 -08:00
MANIFEST.in [autoscaler] Replace cluster yaml validation with json schema v… (#7261) 2020-03-10 18:58:55 -07:00
README-building-wheels.md fix wheel building doc (#4360) 2019-03-13 23:11:30 -07:00
setup.py [tune] Cancel Experiment via Client (#7719) 2020-03-24 20:30:12 -07:00