hiro/ray

Mirror of https://github.com/vale981/ray (synced 2025-03-08 11:31:40 -05:00)

Author	SHA1	Message	Date
Richard Liaw	acf4d53b55	[autoscaler] Fix redirects, fix submit (#4085 )	2019-02-20 21:35:33 -08:00
Yuhong Guo	3549cd8195	Add the Delete function in GCS (#4081 ) * Add the Delete function in GCS * Unify BatchDelete and Delete * Fix comment * Lint * Refine according to comments * Unify test. * Address comment * C++ lint * Update ray_redis_module.cc	2019-02-21 13:33:37 +08:00
Yuhong Guo	1f864a02bc	Add option of load_code_from_local which is required in cross-language ray call. (#3675 )	2019-02-21 12:37:17 +08:00
Eric Liang	e3066d1fa5	[autoscaler] Try making GCP node provider thread-safe	2019-02-20 16:35:20 -08:00
Csordás Róbert	b2677fabc0	[tune] Fix not saving a checkpoint in certain cases (issue #4041 ) (#4053 ) ## What do these changes do? It saves checkpoint if needed regardless of what the scheduler have returned. Until now, it have not saved the checkpoint when scheduler returned TrialScheduler.PAUSE, which caused PopulationBasedTraining preventing to save any checkpoints in certain cases. See issue #4041 for more details. ## Related issue number #4041	2019-02-20 11:54:28 -08:00
mika	64c95aea85	[rllib] Update README.md for qmix (#4101 ) ## What do these changes do? Fixed PyMARL repository path. ## Related issue number N/A	2019-02-20 10:21:08 -08:00
Robert Nishihara	e7651b1117	Fix excessive buffering of worker stdout/stderr. (#4094 ) * Start workers with 'python -u' to prevent buffering of prints. * Set sys.stdout and sys.stderr. * Add comment.	2019-02-19 20:20:47 -08:00
Eric Liang	e9ee38ace2	More compact format for worker logs (#4092 )	2019-02-19 19:53:43 -08:00
Robert Nishihara	c92a867c8b	Fix log monitor CPU utilization. (#4091 )	2019-02-19 12:19:21 -08:00
Wang Qing	794a093249	Add runtime_context to get some runtime fields in worker (#4065 )	2019-02-19 15:57:30 +08:00
Wang Qing	7574757391	Fix crash for Java task's `task.argument()` in state. (#4063 )	2019-02-19 12:46:07 +08:00
Philipp Moritz	cfc7e2c5a9	Fix modin test (#4069 )	2019-02-18 12:17:36 -08:00
Eric Liang	6e46d75554	[tune] Remove slow gzip of checkpoints; ignore jupyter stop errors (#4076 ) * fix gzip * ignore jupyter	2019-02-18 01:30:13 -08:00
Eric Liang	f8bef004da	[rllib] Improve error message for bad envs, add remote env docs (#4044 ) * commit * fix up rew	2019-02-18 01:28:19 -08:00
Philipp Moritz	f51969964d	Fix linting on master (#4077 )	2019-02-17 13:55:40 -08:00
Megan Kawakami	346885068c	[rllib] add torch pg (#3857 ) * add torch pg * add torch imports * added torch pg * working torch pg implementation * add pg pytorch * Update a3c.py * Update a3c.py * Update torch_policy_graph.py * Update torch_policy_graph.py	2019-02-16 19:54:14 -08:00
Zekun Shi	a708ab66f5	Add simplex action space and dirichlet action distribution (#4070 ) * add simplex action space and dirichlet action distribution * Update and rename spaces.py to extra_spaces.py * Update __init__.py * Update catalog.py * Fix python 2 * Update extra_spaces.py * change Simplex.contains() to return False	2019-02-16 12:44:59 -08:00
Kristian Hartikainen	0cc5c88075	[tune] Add number of trials to the trial runner logger (#4068 )	2019-02-16 01:12:59 -08:00
Yu Kobayashi	d2d66c576e	Support non ascii characters in the source code (#4047 )	2019-02-16 11:45:44 +08:00
Hao Chen	de17443dc2	Propagate backend error to worker (#4039 )	2019-02-16 11:39:15 +08:00
Robert Nishihara	2d07df7f3f	Replace '__main__' with "__main__". (#4055 )	2019-02-15 13:32:43 -08:00
Robert Nishihara	5f71751891	API cleanups. Remove worker argument. Remove some deprecated arguments. (#4025 ) * Remove worker argument from API methods. * Remove deprecated arguments and deprecate redirect_output and redirect_worker_output. * Fix	2019-02-15 10:49:16 -08:00
Hao Chen	042ad84573	Simplify Cython ID types and fix bug of ActorCheckpointID (#4045 )	2019-02-15 20:15:16 +08:00
Richard Liaw	bb7c4ce9c4	[tune] Improve error message when Ray crashes (#3795 )	2019-02-15 01:04:17 -08:00
Richard Liaw	7cf62a10cd	[tune] Fix TF checkpointing example (#4043 ) Closes #3912, closes #3963.	2019-02-15 00:30:27 -08:00
Eric Liang	0c0bd4d41c	[rllib] Use model.value_function() in MARWIL (#4036 ) * fix marwil * add ph * fix	2019-02-14 19:35:21 -08:00
Philipp Moritz	077ffd99bf	Bump version from 0.6.3 to 0.7.0.dev0 in docs and .yaml (#4042 )	2019-02-14 12:08:48 -08:00
Si-Yuan	2de31eb489	minor fix (#4040 )	2019-02-13 17:22:45 -08:00
Eric Liang	2dccf383dd	[rllib] Basic infrastructure for off-policy estimation (IS, WIS) (#3941 )	2019-02-13 16:25:05 -08:00
Kristian Hartikainen	729d0b2825	[autoscaler] docker run options (#3921 ) Adds support for docker options, allowing for use of nvidia-docker. Closes #2657.	2019-02-13 12:26:28 -08:00
bjg2	0e37ac6d1d	[wingman -> rllib] Remote and entangled environments (#3968 ) * added all our environment changes * fixed merge request comments and remote env * fixed remote check * moved remote_worker_envs to correct config section * lint * auto wrap impl * fix * fixed the tests	2019-02-13 10:08:26 -08:00
Hao Chen	f31a79f3f7	Implement actor checkpointing (#3839 ) * Implement Actor checkpointing * docs * fix * fix * fix * move restore-from-checkpoint to HandleActorStateTransition * Revert "move restore-from-checkpoint to HandleActorStateTransition" This reverts commit 9aa4447c1e3e321f42a1d895d72f17098b72de12. * resubmit waiting tasks when actor frontier restored * add doc about num_actor_checkpoints_to_keep=1 * add num_actor_checkpoints_to_keep to Cython * add checkpoint_expired api * check if actor class is abstract * change checkpoint_ids to long string * implement java * Refactor to delay actor creation publish until checkpoint is resumed * debug, lint * Erase from checkpoints to restore if task fails * fix lint * update comments * avoid duplicated actor notification log * fix unintended change * add actor_id to checkpoint_expired * small java updates * make checkpoint info per actor * lint * Remove logging * Remove old actor checkpointing Python code, move new checkpointing code to FunctionActionManager * Replace old actor checkpointing tests * Fix test and lint * address comments * consolidate kill_actor * Remove __ray_checkpoint__ * fix non-ascii char * Loosen test checks * fix java * fix sphinx-build	2019-02-13 19:39:02 +08:00
Andrew Tan	57dcd3033e	[tune] Trial reporter fix (#3951 ) Fixes #3949.	2019-02-13 01:03:54 -08:00
William Ma	e1a479b137	Add teardown_module to test_queue.py (#4012 )	2019-02-12 22:43:09 -08:00
Si-Yuan	21472b890a	Integrate "tempfile_service" into "ray.node.Node" (#3953 )	2019-02-12 17:34:04 -08:00
Adi Zimmerman	dac1969647	[tune] Add Nevergrad to Tune (#3985 )	2019-02-12 11:00:04 -08:00
Wang Qing	c523bc04ad	Enable redis password in Java worker (#3943 ) * Support Java redis password * Fix * Refine * Fix lint.	2019-02-12 13:11:25 +08:00
Adi Zimmerman	9797028a91	[tune] Add scikit-optimize to Tune (#3924 )	2019-02-11 17:06:02 -08:00
Eric Liang	8df772867c	[rllib] rename compute_apply to learn_on_batch	2019-02-11 15:22:15 -08:00
Eric Liang	c4182463f6	[rllib] Add helper to iterate over envs in a vectorized environment (#4001 ) * add foreach env func * fix * add test	2019-02-11 10:40:47 -08:00
Ion	3c32343c63	Ray signal (#3624 )	2019-02-11 10:14:48 -08:00
Zhijun Fu	7097ba393b	protect raylet against bad messages (#4003 ) * protect raylet against bad messages * address comments * linting and regression test	2019-02-12 00:39:38 +08:00
Philipp Moritz	ab809bd927	update ray version to 0.7.0dev (#3995 )	2019-02-10 19:56:42 -08:00
Eric Liang	8e9f2c923f	[autoscaler] Use RLock in addition to FileLock	2019-02-10 19:16:43 -08:00
Yuhong Guo	5fb1efd60d	Fix CI test failures (#4007 )	2019-02-11 11:01:14 +08:00
bjg2	e703b9f49d	[wingman -> rllib] Improved stats changes in AsyncSamplesOptimizer (#3966 ) * added stats changes to optimizer * changes timers * fix python 2 compat * improved optimizer throughput stats * Update async_samples_optimizer.py * fix python2 compat	2019-02-10 01:25:22 -08:00
Eric Liang	29322c7389	[rllib] Replay buffer for IMPALA should default to 0 slots. (#3971 ) * disable replay * make lq configurable * leak test * Update run_multi_node_tests.sh	2019-02-08 10:03:11 -08:00
Robert Nishihara	6a32b410bb	Update versions from 0.6.2 -> 0.6.3 in the documentation. (#3981 )	2019-02-07 20:57:37 -08:00
Robert Nishihara	ef527f84ab	Stream logs to driver by default. (#3892 ) * Stream logs to driver by default. * Fix from rebase * Redirect raylet output independently of worker output. * Fix. * Create redis client with services.create_redis_client. * Suppress Redis connection error at exit. * Remove thread_safe_client from redis. * Shutdown driver threads in ray.shutdown(). * Add warning for too many log messages. * Only stop threads if worker is connected. * Only stop threads if they exist. * Remove unnecessary try/excepts. * Fix * Only add new logging handler once. * Increase timeout. * Fix tempfile test. * Fix logging in cluster_utils. * Revert "Increase timeout." This reverts commit b3846b89040bcd8e583b2e18cb513cb040e71d95. * Retry longer when connecting to plasma store from node manager and object manager. * Close pubsub channels to avoid leaking file descriptors. * Limit log monitor open files to 200. * Increase plasma connect retries. * Add comment.	2019-02-07 19:53:50 -08:00
Philipp Moritz	0aa74fb1fd	Update cloudpickle to 0.8.0.dev0 (#3964 )	2019-02-07 15:24:06 -08:00

1284 commits