2220 commits

Author SHA1 Message Date
Edward Oakes
d8f5b52265
[serve] Don't use mixin class for class-based backends (#7957) 2020-04-10 12:01:14 -05:00
marload
e3ffb8ac28
[tune] Refactoring: Deduplicate (#7918)
* refactoring: Deduplication

* refactoring: Deduplication

* refactoring: Deduplication

* refactoring: Deduplication

* lint fix: Variable naming case

* fix: Remove White Space

* fix_lint

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-04-09 20:19:04 -07:00
Edward Oakes
305eb74a86
[serve] Make HTTP proxy fault tolerant (#7936) 2020-04-09 17:07:22 -05:00
Simon Mo
870271d51f
[Serve] Call serve.init in function handler (#7947) 2020-04-09 11:46:15 -07:00
Simon Mo
59867dad75
Move Jenkins test to Github action (#7342) 2020-04-09 10:27:19 -07:00
David Chan
6521e92a95
[RaySGD] Honor the use_gpu flag (#7942) 2020-04-08 20:20:09 -07:00
ijrsvt
44825d81e9
Change Proctitle to IDLE after an Error (#7863) 2020-04-08 11:33:43 -07:00
fyrestone
fc6259a656
Cross language serialization for primitive types (#7711)
* Cross language serialization for Java and Python

* Use strict types when Python serializing

* Handle recursive objects in Python; Pin msgpack >= 0.6.0, < 1.0.0

* Disable gc for optimizing msgpack loads

* Fix merge bug

* Java call Python use returnType; Fix ClassLoaderTest

* Fix RayMethodsTest

* Fix checkstyle

* Fix lint

* prepare_args raises an exception when trying to transfer a non-deserializable object to another language

* Fix CrossLanguageInvocationTest.java; Python msgpack treats float as double

* Minor fixes

* Fix compile error on linux

* Fix lint in java/BUILD.bazel

* Fix test_failure

* Fix lint

* Class<?> to Class<T>; Refine metadata bytes.

* Rename FST to Fst; sort java dependencies

* Change Class<?>[] to Optional<Class<?>>; sort requirements in setup.py

* Improve CrossLanguageInvocationTest

* Refactor MessagePackSerializer.java

* Refactor MessagePackSerializer.java; Refine CrossLanguageInvocationTest.java

* Remove unnecessary dependencies for Java; Add getReturnType() for RayFunction in Java

* Fix bug

* Remove custom cross language type support

* Replace Serializer.Meta with MutableBoolean

* Remove @SuppressWarnings support from checkstyle.xml; Add null test in CrossLanguageInvocationTest.java

* Refine MessagePackSerializer.pack

* Ray.get support RayObject as input

* Improve comments and error info

* Remove classLoader argument from serializer

* Separate msgpack from pickle5 in Python

* Pair<byte[], MutableBoolean> to Pair<byte[], Boolean>

* Remove public static <T> T get(RayObject<T> object), use RayObject.get() instead

* Refine test

* small fixes

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
2020-04-08 21:10:57 +08:00
Edward Oakes
85481d635d
[serve] Call serve.init() before initializing backends (#7922) 2020-04-07 17:22:52 -05:00
Edward Oakes
1be87c7fbb
[serve] Remove global state, instead access the master actor directly (#7914)
* Move _scale() to master actor

* move create_backend

* Move set_backend_config

* Move get_backend_config

* Remove backend_table from global_state

* Remove global_state, just access master directly

* Remove accidental addition
2020-04-07 15:21:40 -05:00
Edward Oakes
d3c310f408
[serve] Only access backend_table in master actor (#7913) 2020-04-07 10:12:39 -05:00
Kai Yang
48b48cc8c2
Support multiple core workers in one process (#7623) 2020-04-07 11:01:47 +08:00
Richard Liaw
f63b4c1110
[sgd] make ddp optional (#7875)
* loosen

* devices

* tryitout

* fix

* fix

* fix

* easy

* test

* fix

* fix

* better visibility

* fix
2020-04-06 11:41:36 -07:00
SangBin Cho
73fd78316d
[Dashboard] Authentication (#7888)
* Change authentication schema.

Authentication implementation.

* Formatting.

* Fix a minor style.

* Fix tests.

* Removed url validation.
2020-04-04 19:40:54 -07:00
Allen
3c91ff1f63
[autoscaler] Allowing users to provide extra configs for AWS (#7844)
* Allowing users to provide custom key names & security group inbound rules

* linting

* getting aws credentials passed in

* one more thing

* one more thing part 2

* formatting

* addressing comments

* update

* update

* update

* update

* update

* update

* remove tests

* rerun tests

Co-authored-by: Allen Yin <allenyin@anyscale.io>
2020-04-04 18:36:51 -07:00
acxz
7827d2c2de
Add wheel build dependency (#7877) 2020-04-03 18:10:34 -07:00
ijrsvt
e03f687b84
Cleaning up remaining Local Mode Code (#7865) 2020-04-03 19:54:15 -05:00
Markus Cozowicz
b853df7a3b
[autoscaler] Switch to ARM for Azure deployment (#7717)
* switch to ARM templates for config and VMs

* switch to ARM templates for config and VMs

* auto-formatting

* addressed Scott's comment

* added missing imports

* fixed gpu templates
fixed wheel reference

* added missing reference

* cleanup wording and yamls

* Update doc/source/autoscaling.rst

Co-Authored-By: Scott Graham <5720537+gramhagen@users.noreply.github.com>

Co-authored-by: Ubuntu <marcozo@marcozodev2.zqvgrdyupqrudayw1il1agipig.jx.internal.cloudapp.net>
Co-authored-by: Scott Graham <5720537+gramhagen@users.noreply.github.com>
2020-04-03 15:51:56 -07:00
SangBin Cho
1d532d1cb8
[Dashboard] Action Implementation. (#7826) 2020-04-02 18:02:37 -07:00
Edward Oakes
7f9ddfcfd8
Only access route_table and policy_table in master actor (#7835) 2020-04-02 14:44:53 -07:00
Edward Oakes
cbe494ab13
[flaky test] Fix flaky test_heartbeats_single (#7857) 2020-04-02 16:23:28 -05:00
ijrsvt
9bfc2c4b54
Moving Local Mode to C++ (#7670) 2020-04-01 15:50:57 -05:00
mehrdadn
65054a2c7c
Python 3.8 compatibility (#7754) 2020-04-01 10:03:23 -07:00
Richard Liaw
24bf6ad607
[raysgd] Improve raysgd examples (#7818)
* better_example

* test

* improve some usability things

* submit

* fix

* flake

* Update python/ray/util/sgd/torch/training_operator.py

* trythis

* fix

* fix

* smoke

* fail

* fix

* fix
2020-04-01 08:58:39 -07:00
Edward Oakes
f4239d27fa
[serve] Create all other actors in master actor (#7791) 2020-04-01 10:15:04 -05:00
Robert Nishihara
b011c604d7
Remove ray.tasks() from API. (#7807) 2020-04-01 10:10:40 -05:00
SangBin Cho
c23e56ce9a
Metrics Export Service (#7809) 2020-03-30 23:28:32 -07:00
mehrdadn
8958728139
Windows bug fixes (#7740) 2020-03-30 20:39:23 -05:00
Simon Mo
dc9b62e007
Deserialize Args in Event Loop Thread (#7806) 2020-03-30 18:28:13 -07:00
Richard Liaw
fbf02fa7f7
[Hotfix] Lint for Documentation (#7817) 2020-03-30 11:49:05 -07:00
Richard Liaw
18327254b6
[docs] Fix readthedocs rendering (#7810) 2020-03-30 11:40:08 -07:00
Richard Liaw
86cff17e7e
[tune/raysgd] Tune API for TorchTrainer + Fix State Restoration (#7547) 2020-03-30 12:58:49 -05:00
Edward Oakes
3a53ea60d9
[Serve] Push route table updates to HTTP proxy (#7774) 2020-03-30 09:53:05 -07:00
Philipp Moritz
eb61036ba2
Revert "Pyarrow Segfault Regression Test (#7568)" (#7805)
This reverts commit 57599f075c.
2020-03-29 20:59:05 -07:00
ijrsvt
57599f075c
Pyarrow Segfault Regression Test (#7568) 2020-03-29 16:15:24 -07:00
Simon Mo
353d7e107f
[Serve] Improve Serialization (#7688) 2020-03-29 14:57:19 -07:00
mehrdadn
fc23f79f82
Windows process issues (#7739) 2020-03-29 12:48:32 -07:00
Edward Oakes
d87563937e
Revert "[Dashboard] Metrics Export Service. (#7728)" (#7789) 2020-03-28 19:27:34 -07:00
Maksim Smolin
7b27ce2b23
[RaySGD] Convert the head worker to a local model (#7746)
Why are these changes needed?

Running a worker on head (locally, not as a Ray actor) allows for easier handling of stateful stuff like logging and for easier debugging.
2020-03-27 20:19:15 -07:00
Mitchell Stern
090a8474b0
[Dashboard] Update dependencies and add linting rules (#7779) 2020-03-27 16:53:49 -07:00
SangBin Cho
86e19959a5
[Dashboard] Tune dashboard bug fix (#7766)
* Figured out why Tune was unavailable.

* Minor fix.
2020-03-27 09:02:30 -07:00
SangBin Cho
7a0befb0a7
[Dashboard] Metrics Export Service. (#7728) 2020-03-26 14:03:00 -07:00
hhoke
af3a5705ca
--redis-address -> --address (#7760)
The exception tells the user to use --redis-address, but that flag is deprecated. This change tells the user to use the current --address instead.
2020-03-26 13:52:39 -07:00
Cloud Han
c1b05b720d
Calling register_custom_serializer requires Ray to be initialized (#7752) 2020-03-26 10:24:06 -07:00
fangfengbin
e196fcdbaf
Add gcs_service_enabled function to avoid getting environment variable directly (#7742) 2020-03-26 22:02:53 +08:00
Richard Liaw
ca6eabc9cb
[tune] Fail Fast (#7528)
* pytest

* init cancel

* testing

* Update python/ray/tune/tests/test_tune_server.py

Co-Authored-By: Richard Liaw <rliaw@berkeley.edu>

* change-test

* Apply suggestions from code review

* Apply suggestions from code review

* finished

* set_finished

* tune

* fix

Co-authored-by: ijrsvt <ian.rodney@gmail.com>
2020-03-26 00:04:09 -07:00
Eric Liang
23b6fdcda1
ray memory should collect statistics from all nodes (#7721) 2020-03-25 16:31:31 -07:00
Stephanie Wang
46404d8a0b
[core] Pin lineage of plasma objects that are still in scope (#7690)
* Fix deadlock in DrainAndShutdown

* Revert "[core] Revert lineage pinning (#7499) (#7692)"

This reverts commit ba86a02b37.

* debug rllib

* debug rllib

* turn on all rllib tests again

* debug rllib

* Fix drain bug, check number of pending tasks

* revert rllib debug

* remove todo

* Trigger rllib tests

* revert rllib debug commit
2020-03-25 09:29:32 -07:00
Richard Liaw
82b792be33
[tune] IP Check, Flatten Results for TBX (#7705)
* support_flattened

* loggers

* Format logger changes

Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-03-25 09:18:03 +00:00
Maksim Smolin
e95455b7d7
[RaySGD] Add tqdm logging to TorchTrainer (#7588)
* Update issue templates

* Init fp16

* fp16 and schedulers

* scheduler linking and fp16

* to fp16

* loss scaling and documentation

* more documentation

* add tests, refactor config

* moredocs

* more docs

* fix logo, add test mode, add fp16 flag

* fix tests

* fix scheduler

* fix apex

* improve safety

* fix tests

* fix tests

* remove pin memory default

* rm

* fix

* Update doc/examples/doc_code/raysgd_torch_signatures.py

* fix

* migrate changes from other PR

* ok thanks

* pass

* signatures

* lint

* Update python/ray/experimental/sgd/pytorch/utils.py

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* should address most comments

* comments

* fix this ci

* first_pass

* add overrides

* override

* fixing up operators

* format

* sgd

* constants

* rm

* revert

* Checkpoint the basics

* End of day checkpoint

* Checkpoint log-to-head implementation

* Checkpoint

* Add actor-based batch log reporting, currently segfaults

* Work around progress segfault

* Fix some stuff in quicktorch

* Make things more customizable

* Quality of life fixes

* More quality of life

* Move tqdm logic to training_operator

* Update examples

* Fix some minor bugs

* Fix merge

* Fix small things, add pbar to dcgan

* Run format.sh

* Fix missing epoch number for batch pbar

* Address PR comments

* Fix float is not subscriptable

* Add train_loss to pbar by default

* Isolate tqdm code into a handler system

* Format

* Remove the batch_logs_reporter from distributed runner as well

* Check if the train_loss is available before using it

* Enable tqdm in the dcgan example

* Fix a crash in no-handler trainers

* Fix

* Allow not calling set_reporters for tests

Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-03-24 23:43:56 -07:00