hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Eric Liang	59f4743f20	[rllib] Run simple regressions tests for all algs in jenkins (#3498 )	2018-12-11 17:21:53 -08:00
Richard Liaw	e0fbb68e47	[tune] Custom Logging, Trial Name (#3465 ) Adds support for custom loggers, custom trial strings, and custom sync commands. Closes #3034, #2985, and #3390.	2018-12-11 13:41:59 -08:00
Robert Nishihara	74c3370bd5	Show slowest tests in travis. (#3507 )	2018-12-11 11:25:04 -08:00
Eric Liang	52df4dfc6f	[rllib] Fix multiagent_two_trainer test (#3509 ) * update * fix * dict order * fix * fix	2018-12-11 00:16:39 -08:00
Richard Liaw	1f4a01cff6	[tune] Fix PyTorch example after PyTorch v1 (#3500 ) * [tune] * fix * lint * fix	2018-12-10 12:00:53 -08:00
Eric Liang	962f18756b	[autoscaler] Use fixed timestamp to check against health timeouts (#3503 )	2018-12-10 14:58:27 -05:00
Yuhong Guo	abd781d607	Make stress test time shorter. (#3506 )	2018-12-10 14:46:40 -05:00
Eric Liang	ce388a45cf	[rllib] Learner should not see clipped actions (#3496 )	2018-12-09 21:57:11 -08:00
Philipp Moritz	87c0d24579	[sgd] Add file lock to protect compilation of sgd op (#3486 ) * add file lock to protect compilation of sgd op * lint * update * fix * fix * lint * update * rebase on arrow * Update sgd_worker.py	2018-12-09 13:52:40 -08:00
Eric Liang	cffe8f9806	Add option to evict keys LRU from the sharded redis tables (#3499 ) * wip * wip * format * wip * note * lint * fix * flag * typo * raise timeout * fix * optional get * fix flag * increase timeout in test * update docs * format	2018-12-09 05:48:52 -08:00
Yuhong Guo	0136af5aac	Add return value for reconstruction RPC. (#3493 ) * Add return value for reconstruct RPC. * Fix comment function name	2018-12-09 00:08:44 -08:00
Eric Liang	7aec357501	[rllib] Multi-GPU support for Multi-Agent PPO (#3479 ) * wip * fix * remove check * fix null * revert * lint and kl * also fix rollout	2018-12-08 18:02:33 -08:00
Eric Liang	8b5827b9da	[rllib] Better document which methods are abstract and which ones are overrides (#3480 )	2018-12-08 16:28:58 -08:00
Eric Liang	462e6ef066	[rllib] Use smoothed version of collect metrics for DQN (#3491 ) * fix * lint	2018-12-07 18:36:23 -08:00
Tianming Xu	f6490f9bef	Resolve no handlers could be found for logger 'ray.worker' when importing ray (#3483 )	2018-12-06 20:46:53 -08:00
Eric Liang	8395523f81	[rllib] Copy data before passing to Ape-X learner thread (fixes transient plasma crashes) (#3484 )	2018-12-06 18:01:11 -08:00
Si-Yuan	c2c501bbe6	Experimental asyncio support (#2015 ) * Init commit for async plasma client * Create an eventloop model for ray/plasma * Implement a poll-like selector base on `ray.wait`. Huge improvements. * Allow choosing workers & selectors * remove original design * initial implementation of epoll-like selector for plasma * Add a param for `worker` used in `PlasmaSelectorEventLoop` * Allow accepting a `Future` which returns object_id * Do not need `io.py` anymore * Create a basic testing model * fix: `ray.wait` returns tuple of lists * fix a few bugs * improving performance & bug fixing * add test * several improvements & fixing * fix relative import * [async] change code format, remove old files * [async] Create context wrapper for the eventloop * [async] fix: context should return a value * [async] Implement futures grouping * [async] Fix bugs & replace old functions * [async] Fix bugs found in tests * [async] Implement `PlasmaEpoll` * [async] Make test faster, add tests for epoll * [async] Fix code format * [async] Add comments for main code. * [async] Fix import path. * [async] Fix test. * [async] Compatibility. * [async] less verbose to not annoy the CI. * [async] Add test for new API * [async] Allow showing debug info in some of the test. * [async] Fix test. * [async] Proper shutdown. * [async] Lint~ * [async] Move files to experimental and create API * [async] Use async/await syntax * [async] Fix names & styles * [async] comments * [async] bug fixing & use pytest * [async] bug fixing & change tests * [async] use logger * [async] add tests * [async] lint * [async] type checking * [async] add more tests * [async] fix bugs on waiting a future while timeout. Add more docs. * [async] Formal docs. * [async] Add typing info since these codes are compatible with py3.5+. * [async] Documents. * [async] Lint. * [async] Fix deprecated call. * [async] Fix deprecated call. * [async] Implement a more reasonable way for dealing with pending inputs. * [async] Fix docs * [async] Lint * [async] Fix bug: Type for time * [async] Set our eventloop as the default eventloop so that we can get it through `asyncio.get_event_loop()`. * [async] Update test & docs. * [async] Lint. * [async] Temporarily print more debug info. * [async] Use `Poll` as a default option. * [async] Limit resources. * new async implementation for Ray * implement linked list * bug fix * update * support seamless async operations * update * update API * fix tests * lint * bug fix * refactor names * improve doc * properly shutdown async_api * doc * Change the table on the index page. * Adjust table size. * Only keeps `as_future`. * change how we init connection * init connection in `ray.worker.connect` * doc * fix * Move initialization code into the module. * Fix docs & code * Update pyarrow version. * lint * Restore index.rst * Add known issues. * Apply suggestions from code review Co-Authored-By: suquark <suquark@gmail.com> * rename * Update async_api.rst * Update async_api.py * Update async_api.rst * Update async_api.py * Update worker.py * Update async_api.rst * fix tests * lint * lint * replace the magic number	2018-12-06 17:39:05 -08:00
Devin Petersohn	970babf31a	Removing the check about the size re: ray-project/ray#3450 (#3464 ) * Removing the check about the size re: ray-project/ray#3450 * Addressing comments * Update services.py	2018-12-06 16:59:24 -08:00
Eugene Vinitsky	7a7c6e53c8	[tune/rllib] Use cloudpickle to dump config (#3462 ) JSON Logger now uses cloudpickle to dump the configs as well, which pickles the functions needed for multi-agent replay.	2018-12-06 15:52:44 -08:00
Yuhong Guo	b9e1977fae	Fix failure of test_free_objects_multi_node (#3481 ) `test_free_objects_multi_node` fails intermittently: running it 20 times may produce at least one failure. The cause is that the test is based on function tasks. One raylet may create more than one worker to execute the tasks, so flush operations may be spread across several workers and not clean all the worker objects held by the plasma client. This PR changes the function tasks to actor tasks, which guarantees that all the tasks are executed in one worker of a raylet.	2018-12-06 15:55:49 -05:00
Eric Liang	412aaa5195	[tune] Deprecate ambiguous function values (use tune.function / tune.sample_from instead) (#3457 ) * wip * exclude	2018-12-06 11:35:20 -08:00
Eric Liang	d864f299d7	[rllib] fixes from dogfooding multi-agent (#3456 ) * auto wrap multi-agent dict and tuple spaces by keeping a policy -> preprocessor in the sampler * add some Q-learning debug stats * report min, max of custom metrics * better errors	2018-12-05 23:31:45 -08:00
shane	7a79b7f62c	increase container memory and shm to 20G (#3475 ) * increase container memory and shm to 20G * variables are POWERFUL	2018-12-05 14:59:07 -08:00
Si-Yuan	2e6f9bedf2	Add the extra fallback for serialization (#3468 ) * Add the extra fallback for serialization. * Better comments & warnings. quotes. * Update test/runtest.py Co-Authored-By: suquark <suquark@gmail.com> * Update test/runtest.py Co-Authored-By: suquark <suquark@gmail.com> * linting * Don't hijack too much errors. * simplify the test * Update runtest.py * simplify	2018-12-05 13:09:08 -08:00
Philipp Moritz	06f6431765	Make test_actor_multiple_gpus_from_multiple_tasks less stressful in travis	2018-12-04 17:44:33 -08:00
Eric Liang	93a9d32288	[docs] Switch docs to use rllib train instead of train.py	2018-12-04 17:36:06 -08:00
Richard Liaw	9d0bd50e78	[tune] Component notification on node failure + Tests (#3414 ) Changes include: - Notify Components on Requeue - Slight refactoring of Node Failure handling - Better tests	2018-12-04 14:47:31 -08:00
Eric Liang	ce355d13d4	[rllib] Allow envs to be auto-registered; add on_train_result callback with curriculum example (#3451 ) * train step and docs * debug * doc * doc * fix examples * fix code * integration test * fix * ... * space * instance * Update .travis.yml * fix test	2018-12-03 23:15:43 -08:00
Kristian Hartikainen	be6567e6fd	Tweak/exec attach info (#3447 ) * Add custom cluster name to exec info * Update submit info to match exec info	2018-12-03 21:39:43 -08:00
Eric Liang	d8205976e8	[rllib] Auto clip actions to Box space range; deprecate squash_to_range (#3426 ) * fix clip * tweak wording * remove squash entirely * Update rllib-models.rst * fix argument order * Apply suggestions from code review Co-Authored-By: ericl <ekhliang@gmail.com>	2018-12-03 19:55:25 -08:00
Eric Liang	7abfbfd2f7	[rllib] Better error message for unsupported non-atari image observation sizes (#3444 )	2018-12-03 01:24:36 -08:00
Stephanie Wang	4abafd7e62	Fix bug in ray.wait (#3445 ) ray.wait depends on callbacks from the GCS to decide when an object has appeared in the cluster. The raylet crashes if a callback is received for a wait request that has already completed, but this actually can happen, depending on the order of calls. More precisely: 1. Objects A and B are put in the cluster. 2. Client calls ray.wait([A, B], num_returns=1). 3. Client subscribes to locations for A and B. Locations are cached for both, so callbacks are posted for each. 4. Callback for A fires. The wait completes and the request is removed. 5. Callback for B fires. The wait request no longer exists and raylet crashes.	2018-12-01 19:40:33 -08:00
Eric Liang	13c8ce4d84	Update README.rst with 0.6.0 version number. (#3453 )	2018-12-01 19:16:45 -08:00
Philipp Moritz	c5b5cdae33	Upgrade Arrow to include Plasma TensorFlow Op release fix (#3448 ) This includes a fix so the TensorFlow op releases memory properly (https://github.com/apache/arrow/pull/3061) and the possibility to store arrow data structures in plasma (https://github.com/apache/arrow/pull/2832). https://github.com/ray-project/ray/issues/3404	2018-12-01 16:15:09 -08:00
Hao Chen	abd37df41e	Add stress test for Java worker (#3424 )	2018-12-01 16:11:09 -08:00
Robert Nishihara	0603e0b73a	Bump version from 0.5.3 to 0.6.0. (#3420 )	2018-12-01 11:39:36 -08:00
Devin Petersohn	57512616e1	Update readme to contain logo (#3443 ) * Adding logo to readme * Updating link * Add badge * Addressing comments * Moving logo * Change align * Move image	2018-11-30 18:28:35 -08:00
GiliR4t1qbit	454d3aa07d	[docs] Snippet did not have a code-block tag above it (#3442 )	2018-11-30 16:39:40 -08:00
Stephanie Wang	447604a9fe	Use actor ID for the dummy object (#3437 )	2018-11-29 22:31:04 -08:00
Eric Liang	07d8cbf414	[rllib] Support batch norm layers (#3369 ) * batch norm * lint * fix dqn/ddpg update ops * bn model * Update tf_policy_graph.py * Update multi_gpu_impl.py * Apply suggestions from code review Co-Authored-By: ericl <ekhliang@gmail.com>	2018-11-29 13:33:39 -08:00
Devin Petersohn	4d2010a852	Ship Modin with Ray. (#3109 )	2018-11-29 20:05:24 +01:00
Stephanie Wang	48a5935224	Fault tolerance for actor creation (#3422 ) * Add regression test * Request actor creation if no actor location found * Comments * Address comments * Increase test timeout * Trigger test	2018-11-29 10:48:35 -08:00
Chunyang Wen	fd7e494344	Remove: duplicate feed_dict constructing (#3431 )	2018-11-29 10:21:46 -08:00
Kristian Hartikainen	7e319dbf0c	Automatically indent tune logger params (#3399 )	2018-11-29 00:15:50 -08:00
Eric Liang	c46ea2ff4b	Click 0.7 changes the naming convention for commands; fix this	2018-11-28 14:59:58 -08:00
Tianming Xu	139fbf7884	Initialize client_id_ in ObjectManager constructor that takes user-defined ObjectDirectory (#3403 )	2018-11-27 23:51:18 -08:00
Robert Nishihara	82863b5251	[autoscaler] Update autoscaler to use heartbeat batches. (#3409 )	2018-11-27 23:46:27 -08:00
Eric Liang	f0df97db6f	[rllib] example and docs on how to use parametric actions with DQN / PG algorithms (#3384 )	2018-11-27 23:35:19 -08:00
Eric Liang	c2108ca64f	Don't put entire actor registry in debug string since it's too long (#3395 )	2018-11-27 16:48:12 -08:00
Eric Liang	0d56fc10cc	Move setproctitle to ray[debug] package (#3415 )	2018-11-27 09:50:59 -08:00

2322 commits
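
The race described in commit 4abafd7e62 (Fix bug in ray.wait) can be illustrated with a small stand-alone sketch. This is not the raylet's code: `WaitManager`, `on_object_located`, and the wait id `"w1"` are hypothetical names chosen for illustration, and ignoring the late callback is only one plausible way to tolerate it. The sketch replays the commit message's scenario, in which a wait on A and B with `num_returns=1` completes on the callback for A and the callback for B then arrives for a request that no longer exists.

```python
class WaitManager:
    """Toy stand-in for wait bookkeeping (hypothetical, not Ray's implementation)."""

    def __init__(self):
        self.pending = {}    # wait_id -> {"needed": int, "found": list}
        self.completed = []  # (wait_id, found object ids), in completion order

    def wait(self, wait_id, object_ids, num_returns):
        # Register a wait request; callbacks for each object arrive later.
        self.pending[wait_id] = {"needed": num_returns, "found": []}

    def on_object_located(self, wait_id, object_id):
        request = self.pending.get(wait_id)
        if request is None:
            # Late callback for a wait that already completed:
            # ignore it instead of treating it as a fatal error.
            return False
        request["found"].append(object_id)
        if len(request["found"]) >= request["needed"]:
            # Enough objects found: complete the wait and remove the request.
            self.completed.append((wait_id, list(request["found"])))
            del self.pending[wait_id]
        return True


manager = WaitManager()
manager.wait("w1", ["A", "B"], num_returns=1)
first = manager.on_object_located("w1", "A")   # completes the wait
second = manager.on_object_located("w1", "B")  # late callback, ignored
```

Here the second callback returns `False` rather than raising, which corresponds to step 5 of the commit message no longer being fatal.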