hiro/ray

Mirror of https://github.com/vale981/ray, synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Markus Cozowicz	b853df7a3b	[autoscaler] Switch to ARM for Azure deployment (#7717 ) * switch to ARM templates for config and VMs * switch to ARM templates for config and VMs * auto-formatting * addressed Scotts comment * added missing imports * fixed gpu templates fixed wheel reference * added missing reference * cleanup wording and yamls * Update doc/source/autoscaling.rst Co-Authored-By: Scott Graham <5720537+gramhagen@users.noreply.github.com> Co-authored-by: Ubuntu <marcozo@marcozodev2.zqvgrdyupqrudayw1il1agipig.jx.internal.cloudapp.net> Co-authored-by: Scott Graham <5720537+gramhagen@users.noreply.github.com>	2020-04-03 15:51:56 -07:00
mehrdadn	65054a2c7c	Python 3.8 compatibility (#7754 )	2020-04-01 10:03:23 -07:00
SangBin Cho	c23e56ce9a	Metrics Export Service (#7809 )	2020-03-30 23:28:32 -07:00
Edward Oakes	d87563937e	Revert "[Dashboard] Metrics Export Service. (#7728 )" (#7789 )	2020-03-28 19:27:34 -07:00
SangBin Cho	7a0befb0a7	[Dashboard] Metrics Export Service. (#7728 )	2020-03-26 14:03:00 -07:00
Richard Liaw	54a892bb84	[tune] Cancel Experiment via Client (#7719 ) * init cancel * testing * Update python/ray/tune/tests/test_tune_server.py Co-Authored-By: Richard Liaw <rliaw@berkeley.edu> * Apply suggestions from code review * Apply suggestions from code review * finished * set_finished Co-authored-by: ijrsvt <ian.rodney@gmail.com>	2020-03-24 20:30:12 -07:00
Robert Nishihara	2b80310e6f	Remove setup.py dependence on packaging. (#7714 )	2020-03-23 16:21:17 -07:00
Robert Nishihara	ee8c9ff732	Remove six and cloudpickle from setup.py. (#7694 )	2020-03-23 11:42:05 -07:00
Robert Nishihara	1a0c9228d0	Remove pytest from setup.py and other minor changes. (#7700 )	2020-03-23 08:46:56 -07:00
Robert Nishihara	4d722bf003	Remove dependence on funcsigs. (#7701 )	2020-03-22 21:37:24 -07:00
Clark Zinzow	c37f6e745a	Remove duplicate jsonschema from setup.py (#7665 )	2020-03-19 13:12:47 -07:00
Scott Graham	37e4d29f87	[autoscaler] Adding Azure Support (#7080 ) * adding directory and node_provider entry for azure autoscaler * adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating * adding todos and switching to auth file for service principal authentication * adding role / scope to service principal * resolving issues with app credentials * adding retry for setting service principal role * typo and adding retry to nic creation * adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing * linting * updating cleanup and fixing bugs * adding directory and node_provider entry for azure autoscaler * adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating * adding todos and switching to auth file for service principal authentication * adding role / scope to service principal * resolving issues with app credentials * adding retry for setting service principal role * typo and adding retry to nic creation * adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing * linting * updating cleanup and fixing bugs * minor fixes * first working version :) * added tag support * added msi identity intermediate * enable MSI through user managed identity * updated schema * extend yaml schema remove service principal code add re-use of managed user identity * fix rg_id * fix logging * replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs) * run linting * updating yaml configs and formatting * updating yaml configs and formatting * typo in example config * pulling default config from example-full * resetting min, init worker prop * adding docs for azure autoscaler and fixing status * add azure to docs, fix config for spot instances, update azure provider to avoid 
caching issues during deployment * fix for default subscription in azure node provider * vm dev image build * minor change * keeping example-full.yaml in autoscaler/azure, updating azure example config * linting azure config * extending retries on azure config * lint * support for internal ips, fix to azure docs, and new azure gpu example config * linting * Update python/ray/autoscaler/azure/node_provider.py Co-Authored-By: Richard Liaw <rliaw@berkeley.edu> * revert_this * remove_schema * updating configs and removing ssh keygen, tweak azure node provider terminate * minor tweaks Co-authored-by: Markus Cozowicz <marcozo@microsoft.com> Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-15 14:48:27 -07:00
Markus Cozowicz	ea99063c10	added json schema to setup.py (#7554 )	2020-03-11 09:53:21 -07:00
Markus Cozowicz	49439611f1	[autoscaler] Replace cluster yaml validation with json schema v… (#7261 ) * replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs) - run linting - moved schema to ray/autoscaler - fixed typo - remove importlib dependency * Update python/ray/autoscaler/autoscaler.py * read * restrict allowed properties * added unit test for invalid yaml added ray[test] package (remove pytest from default dependencies) * updated autoscaler test to use ValidationError exception * add missing dependency * added pytest * replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs) - run linting - moved schema to ray/autoscaler - fixed typo - remove importlib dependency * Update python/ray/autoscaler/autoscaler.py * read * restrict allowed properties * added unit test for invalid yaml added ray[test] package (remove pytest from default dependencies) * updated autoscaler test to use ValidationError exception * add missing dependency * added pytest * removed parameterized dependency reverted ray[test] intro * removed parameterized * fix_tests * format Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-10 18:58:55 -07:00
Sven Mika	510c850651	[RLlib] SAC add discrete action support. (#7320 ) * Exploration API (+EpsilonGreedy sub-class). * Exploration API (+EpsilonGreedy sub-class). * Cleanup/LINT. * Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents). * Add `error` option to deprecation_warning(). * WIP. * Bug fix: Get exploration-info for tf framework. Bug fix: Properly deprecate some DQN config keys. * WIP. * LINT. * WIP. * Split PerWorkerEpsilonGreedy out of EpsilonGreedy. Docstrings. * Fix bug in sampler.py in case Policy has self.exploration = None * Update rllib/agents/dqn/dqn.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * Update rllib/agents/trainer.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * Change requests. * LINT * In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set * Completely obsolete syn_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps). * Update rllib/evaluation/worker_set.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Review fixes. * Fix default value for DQN's exploration spec. * LINT * Fix recursion bug (wrong parent c'tor). * Do not pass timestep to get_exploration_info. * Update tf_policy.py * Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs. * Bug fix tf-action-dist * DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG). * Switch off exploration when getting action probs from off-policy-estimator's policy. * LINT * Fix test_checkpoint_restore.py. * Deprecate all SAC exploration (unused) configs. * Properly use `model.last_output()` everywhere. Instead of `model._last_output`. * WIP. * Take out set_epsilon from multi-agent-env test (not needed, decays anyway). * WIP. * Trigger re-test (flaky checkpoint-restore test). * WIP. * WIP. 
* Add test case for deterministic action sampling in PPO. * bug fix. * Added deterministic test cases for different Agents. * Fix problem with TupleActions in dynamic-tf-policy. * Separate supported_spaces tests so they can be run separately for easier debugging. * LINT. * Fix autoregressive_action_dist.py test case. * Re-test. * Fix. * Remove duplicate py_test rule from bazel. * LINT. * WIP. * WIP. * SAC fix. * SAC fix. * WIP. * WIP. * WIP. * FIX 2 examples tests. * WIP. * WIP. * WIP. * WIP. * WIP. * Fix. * LINT. * Renamed test file. * WIP. * Add unittest.main. * Make action_dist_class mandatory. * fix * FIX. * WIP. * WIP. * Fix. * Fix. * Fix explorations test case (contextlib cannot find its own nullcontext??). * Force torch to be installed for QMIX. * LINT. * Fix determine_tests_to_run.py. * Fix determine_tests_to_run.py. * WIP * Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function). * Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function). * Rename some stuff. * Rename some stuff. * WIP. * update. * WIP. * Gumbel Softmax Dist. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP * WIP. * WIP. * Hypertune. * Hypertune. * Hypertune. * Lock-in. * Cleanup. * LINT. * Fix. * Update rllib/policy/eager_tf_policy.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/agents/sac/sac_policy.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/agents/sac/sac_policy.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/models/tf/tf_action_dist.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/models/tf/tf_action_dist.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Fix items from review comments. * Add dm_tree to RLlib dependencies. * Add dm_tree to RLlib dependencies. * Fix DQN test cases ((Torch)Categorical). 
* Fix wrong pip install. Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>	2020-03-06 10:37:12 -08:00
chaokunyang	8b6784de06	[Streaming] Streaming Python API (#6755 )	2020-02-25 10:33:33 +08:00
Simon Mo	b804d40c04	Stop vendoring pyarrow (#7233 )	2020-02-19 19:01:26 -08:00
Simon Mo	7bef7031c2	Revert "Revert "Revert "Removing Pyarrow dependency (#7146 )" (#7209 ) (#7214 )" (#7232 )	2020-02-19 13:35:29 -08:00
Simon Mo	e8941b1b79	Revert "Revert "Removing Pyarrow dependency (#7146 )" (#7209 ) (#7214 )	2020-02-19 10:08:52 -08:00
Eric Liang	0aa9373d62	Revert "Removing Pyarrow dependency (#7146 )" (#7209 ) This reverts commit `2116fd3bca`.	2020-02-18 14:12:06 -08:00
ijrsvt	2116fd3bca	Removing Pyarrow dependency (#7146 )	2020-02-17 18:00:13 -08:00
Qing Wang	94a286ef1d	[Java] Add `session_dir` as temp_dir for logs, socket files like Python (#7044 ) * Support * Add gcs_server support * Fix ut * Fix * Remove unused py code * Fix linting * Fix cross language ci * Fix CI * Add docstring * Fix * Fix linting * Add a singleton for config * Refine * fix * Fix * linting * Remove FileUnit * Fix * Fix * Fix * Update java/runtime/src/main/java/org/ray/runtime/config/RayConfig.java Co-Authored-By: Hao Chen <chenh1024@gmail.com> * Fix streaming singleprocess CI * Fix checkstyle Co-authored-by: Hao Chen <chenh1024@gmail.com>	2020-02-13 17:49:52 +08:00
ijrsvt	0826f95e1c	Including psutil & setproctitle (#7031 )	2020-02-05 14:16:58 -08:00
fangfengbin	ade7ebfc0c	Add service based gcs client (#6686 )	2020-02-05 12:06:25 +08:00
Richard Liaw	341ddd0a09	[tune] Default to TensorboardX and include in requirements. (#6836 )	2020-01-19 01:49:33 -08:00
Mitchell Stern	763818b476	[Dashboard] Add static assets for speedscope v1.5.3 (#6822 )	2020-01-17 20:53:53 -08:00
chaokunyang	4097d076d4	Package ray java jars into wheels (#6600 )	2020-01-10 11:41:00 +08:00
Sven	60d4d5e1aa	Remove future imports (#6724 ) * Remove all __future__ imports from RLlib. * Remove (object) again from tf_run_builder.py::TFRunBuilder. * Fix 2xLINT warnings. * Fix broken appo_policy import (must be appo_tf_policy) * Remove future imports from all other ray files (not just RLlib). * Remove future imports from all other ray files (not just RLlib). * Remove future import blocks that contain `unicode_literals` as well. Revert appo_tf_policy.py to appo_policy.py (belongs to another PR). * Add two empty lines before Schedule class. * Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.	2020-01-09 00:15:48 -08:00
Chaokun Yang	7bbfa85c66	[Streaming] Streaming data transfer java (#6474 )	2019-12-22 10:56:05 +08:00
Philipp Moritz	4d71ab83cf	require packaging (#6517 )	2019-12-17 12:01:14 -08:00
Philipp Moritz	afae8406da	Make sure numpy >= 1.16.0 is installed for fast pickling support (#6486 ) * Make sure numpy >= 1.16.0 is installed * Works for 1.15.4 * lint * formatting * update * put check into the right place * lint	2019-12-14 16:36:49 -08:00
alindkhare	76e678d775	[Serve] Added deadline awareness (#6442 ) * [Serve] Added deadline awareness Added deadline awareness while enqueuing a query Using Blist sorted-list implementation (ascending order) to get queries according to their specified deadlines. [buffer_queues] Exposed slo_ms via handle/http request Added slo example The queries in example will be executed in almost the opposite order of which they are fired Added slo pytest Added check for slo_ms to not be negative Included the changes suggested * Linting Corrections * Adding the code changes suggested by format.sh * Added the suggested changes Added justification for blist Added blist in travis/ci/install-dependencies.sh * Fixed linting issues * Added blist to ray/doc/requirements-doc.txt	2019-12-11 16:41:54 -08:00
Chaokun Yang	6272907a57	[Streaming] Streaming data transfer and python integration (#6185 )	2019-12-10 20:33:24 +08:00
Simon Mo	22b305223a	Build Docker Containers for Linux Wheels (#6233 )	2019-11-27 17:05:36 -08:00
Philipp Moritz	decaa65cd6	Use pickle by default for serialization (#5978 )	2019-11-10 18:12:18 -08:00
Philipp Moritz	f7455839bf	Expose raylet info to dashboard (#6045 )	2019-10-31 17:36:59 -07:00
Ujval Misra	a851d7eb87	[tune] Readable trial progress output (#5822 ) * Cleaner, tabulated progress output. * Minor HTML changes, trial ID instead of name * Revert basic variant changes * Cleanup, address richard's comments, add progress_reporter.py * Add tabulate dependency * Added more info to table, auto-hide columns with no data. * lint * Address comments * Replace experiment tag w/ trial ID * Fixed tests. * Fixed test * Added requirement * Fix formatting	2019-10-08 16:38:39 -07:00
Simon Mo	9bb3633cd9	[Serve] Implement metric interface (#5852 ) * Implement metric interface * Address comment: made actor_handles a dict * Fix iteration * Lint * Mark lightweight actors as num_cpus=0 to prevent resource starvation * Be more explicit about the readiness condition * Make task_runner non-blocking * Lint	2019-10-07 09:29:26 -07:00
Simon Mo	e8570874b6	[Serve] Implement flask_request and named python request (#5849 ) * Implement flask_request and named python request * Forgot to include missing files * Address comment * Add flask to requirements for doc (lint failed) * Update doc requirement so lint will build * Install flask in CI * Fix typo in .travis.yml	2019-10-06 15:12:30 -07:00
Edward Oakes	972dddd776	[autoscaler] Kubernetes autoscaler backend (#5492 ) * Add Kubernetes NodeProvider to autoscaler * Split off SSHCommandRunner * Add KubernetesCommandRunner * Cleanup * More config options * Check if auth present * More auth checks * Better output * Always bootstrap config * All working * Add k8s-rsync comment * Clean up manual k8s examples * Fix up submit.yaml * Automatically configure permissisons * Fix get_node_provider arg * Fix permissions * Fill in empty auth * Remove ray-cluster from this PR * No hard dep on kubernetes library * Move permissions into autoscaler config * lint * Fix indentation * namespace validation * Use cluster name tag * Remove kubernetes from setup.py * Comment in example configs * Same default autoscaling config as aws * Add Kubernetes quickstart * lint * Revert changes to submit.yaml (other PR) * Install kubernetes in travis * address comments * Improve autoscaling doc * kubectl command in setup * Force use_internal_ips * comments * backend env in docs * Change namespace config * comments * comments * Fix yaml test	2019-10-03 10:17:00 -07:00
Mitchell Stern	b03147e7bf	Update call to py-spy to conform to new API (#5758 )	2019-09-23 14:52:23 -07:00
Mitchell Stern	98dcc1d440	[Dashboard] Add initial version of new dashboard (#5730 )	2019-09-23 08:50:40 -07:00
Philipp Moritz	a6dd794818	[Projects] Fix template path (#5716 )	2019-09-16 19:58:54 -07:00
Simon Mo	5f88823c49	[Serve] Rewrite Ray.Serve From Scratch (#5562 ) * Commit and format files * address stylistic concerns * Replcae "Usage" by "Example" in doc * Rename srv to serve * Add serve to CI process; Fix 3.5 compat * Improve determine_tests_to_run.py * Quick cosmetic for determien_tests * Address comments * Address comments * Address comment * Fix typos and grammar Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update python/ray/experimental/serve/global_state.py Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Use __init__ for Query and WorkIntent class * Remove dataclasses dependency * Rename oid to object_id for clarity * Rename produce->enqueue_request, consume->dequeue_request * Address last round of comment	2019-09-13 21:36:56 -07:00
Devin Petersohn	c33d6662ce	Remove Modin from Ray wheels. (#5647 ) There are several reasons for this: * We no longer support python2 * There should be only 1 way of installing Modin * Issue management on these wheels * I have never heard of anyone using this feature * It is rarely kept up to date * Modin depends on specific versions of Ray because of past API changes	2019-09-05 23:46:27 -07:00
Simon Mo	d9b45cceec	[Project] Implementing Project CLI (#5397 )	2019-08-08 21:28:25 -07:00
Philipp Moritz	e8d9cfc1f1	Ray projects schema and validation (#5329 )	2019-08-06 14:36:04 -07:00
Simon Mo	196495a4de	Fix Redis Test (#5302 )	2019-07-30 00:22:16 -07:00
Hao Chen	8a30b93e42	Define common data structures with protobuf. (#5121 )	2019-07-08 22:41:37 +08:00
Eric Liang	0448847a02	Update protobuf version (#5128 )	2019-07-06 15:59:55 -07:00

285 commits