hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-09 12:56:46 -04:00

Author	SHA1	Message	Date
Robert Nishihara	301e0b0db8	Bump version to 0.1.1 in preparation for uploading wheels to PyPI. (#630 )	2017-06-03 02:17:39 +00:00
Robert Nishihara	2694337c0f	Fix large memory tests. (#632 ) * Log the driver ID in hex instead of binary. * Fix large memory test and add more tests to it. * Remove tests that are too stressful.	2017-06-03 01:12:56 +00:00
Robert Nishihara	23b0c80967	Rename linux wheels so they can be uploaded to PyPI. (#629 )	2017-06-02 20:20:34 +00:00
Robert Nishihara	1a682e2807	Enable starting and stopping ray with "ray start" and "ray stop". (#628 ) * Install start_ray and stop_ray scripts in setup.py. * Update documentation. * Fix docker tests. * Implement stop_ray script in python. * Fix linting.	2017-06-02 20:17:48 +00:00
Robert Nishihara	a4d8e13094	Suppress excess warning messages related to intentional actor deaths. (#627 ) * Don't submit the actor destructor tasks when the job is exiting. * Don't propagate error messages to the driver when an actor exits intentionally.	2017-06-01 20:10:40 +00:00
Robert Nishihara	d0bfc0a849	Clean up actor workers when actor handle goes out of scope. (#617 )	2017-06-01 07:02:43 +00:00
Robert Nishihara	bcaab78908	Add script for building MacOS wheels. (#601 ) * Add script for building MacOS wheels. * Small cleanups to script. * Fix setting of PATH before building wheel. * Create symbolic link to correct Python executable so Ray installation finds the right Python. * Address comments. * Rename readme.	2017-06-01 00:30:46 +00:00
Richard Shin	609b5c1a4c	Add script to build manylinux1 .whl files (#600 ) * Add manylinux setup * Switch to cp27mu * python/MANIFEST.in * Fix MANIFEST.in * Add build-wheel-manylinux1.sh * Update readme * Install correct version of numpy * Fix typo in README-manylinux1.md * Don't install cmake * Remove commented line from setup.py * Delete unused manylinux1.sh * Run setup.py bdist_wheel twice * Don't use package_data and MANIFEST.in. * Small aesthetic change. * Trigger build_ext in setup.py. * Remove nonexistent file from MANIFEST.in. * Manually copy files in MANIFEST.in to where Python expects them in order to prevent setup.py from having to be run twice. * Only run setup.py once when building wheels. * Aesthetic change to readme. * Copy generated flatbuffer Python files in build_ext. * Fix permission denied error by making sure to preserve executableness when copying files. * Remove unnecessary argument to setup.py. * Remove MANIFEST.in and move files to include into list in setup.py. * Fix numpy version when building wheels and replace rm with git clean.	2017-05-27 21:35:48 -07:00
Chelsea Finn	f97d0393cc	Fix to json decoding bug (#597 ) * fix json decoding bug * Fix linting error.	2017-05-25 18:48:39 -07:00
Robert Nishihara	997aa35721	Remove cloudpickle customization and just use plain cloudpickle. (#588 ) * Remove augmentations of cloudpickle. * Entirely remove cloudpickle modifications. Just use plain cloudpickle.	2017-05-24 20:22:28 -07:00
Robert Nishihara	c5bc76193f	Remove Ray environment variables from codebase. (#590 )	2017-05-24 18:29:40 -07:00
Robert Nishihara	c647dd5f6c	Make it possible to use actor definitions within remote functions and other actors. (#587 ) * Enable remote function and actor definitions to close over actor definitions. * Give better error message if actor objects are pickled. * Add tests for closing over actor definitions. * Fix linting.	2017-05-24 15:43:32 -07:00
Robert Nishihara	c440010cbd	Bump version to 0.1.0. (#581 )	2017-05-20 23:25:01 -07:00
Stephanie Wang	ee08c8274b	Shard Redis. (#539 ) * Implement sharding in the Ray core * Single node Python modifications to do sharding * Do the sharding in redis.cc * Pipe num_redis_shards through start_ray.py and worker.py. * Use multiple redis shards in multinode tests. * first steps for sharding ray.global_state * Fix problem in multinode docker test. * fix runtest.py * fix some tests * fix redis shard startup * fix redis sharding * fix * fix bug introduced by the map-iterator being consumed * fix sharding bug * shard event table * update number of Redis clients to be 64K * Fix object table tests by flushing shards in between unit tests * Fix local scheduler tests * Documentation * Register shard locations in the primary shard * Add plasma unit tests back to build * lint * lint and fix build * Fix * Address Robert's comments * Refactor start_ray_processes to start Redis shard * lint * Fix global scheduler python tests * Fix redis module test * Fix plasma test * Fix component failure test * Fix local scheduler test * Fix runtest.py * Fix global scheduler test for python3 * Fix task_table_test_and_update bug, from actor task table submission race * Fix jenkins tests. * Retry Redis shard connections * Fix test cases * Convert database clients to DBClient struct * Fix race condition when subscribing to db client table * Remove unused lines, add APITest for sharded Ray * Fix * Fix memory leak * Suppress ReconstructionTests output * Suppress output for APITestSharded * Reissue task table add/update commands if initial command does not publish to any subscribers. * fix * Fix linting. * fix tests * fix linting * fix python test * fix linting	2017-05-18 17:40:41 -07:00
Philipp Moritz	28f0882387	Expose function table to python global control state API (#542 ) * expose function table to python global control state API * fix * fix linting * add test for function table	2017-05-16 20:06:13 -07:00
Robert Nishihara	5572561704	Do not start web UI by default, and remove web UI from documentation. (#554 )	2017-05-16 19:29:07 -07:00
Robert Nishihara	ec2534422b	Remove register_class from API. (#550 ) * Perform ray.register_class under the hood. * Fix bug. * Release worker lock when waiting for imports to arrive in get. * Remove calls to register_class from examples and tests. * Clear serialization state between tests. * Fix bug and add test for multiple custom classes with same name. * Fix failure test. * Fix linting and cleanups to python code. * Fixes to documentation. * Implement recursion depth for recursively registering classes. * Fix linting. * Push warning to user if waiting for class for too long. * Fix typos. * Don't export FunctionToRun if pickling the function fails. * Don't broadcast class definition when pickling class.	2017-05-16 18:38:52 -07:00
Philipp Moritz	08e988aee5	Modernize plasma store (C to C++ changes). (#546 )	2017-05-15 01:19:44 -07:00
Robert Nishihara	9f91eb8c91	Change API for remote function declaration, actor instantiation, and actor method invocation. (#541 ) * Direction substitution of @ray.remote -> @ray.task. * Changes to make '@ray.task' work. * Instantiate actors with Class.remote() instead of Class(). * Convert actor instantiation in tests and examples from Class() to Class.remote(). * Change actor method invocation from object.method() to object.method.remote(). * Update tests and examples to invoke actor methods with .remote(). * Fix bugs in jenkins tests. * Fix example applications. * Change @ray.task back to @ray.remote. * Changes to make @ray.actor -> @ray.remote work. * Direct substitution of @ray.actor -> @ray.remote. * Fixes. * Raise exception if @ray.actor decorator is used. * Simplify ActorMethod class.	2017-05-14 00:01:20 -07:00
Robert Nishihara	22c6a22f28	Add flatbuffers dependency to setup.py. (#540 )	2017-05-11 23:39:34 -07:00
Robert Nishihara	b4788ae518	Only export actor classes once. (#510 ) * Only export actor classes once. * Fix linting. * Fixes after rebase.	2017-05-09 19:49:23 -07:00
Robert Nishihara	1f991b6389	Change /tmp/raylogs permissions so multiple users can log there. (#532 )	2017-05-09 12:15:31 -07:00
Robert Nishihara	f32368bcbe	Prevent actors from being placed on removed nodes or nodes with no CPUs. (#527 ) * Make note about bug in which actor creation notification message is not received. * Prevent actors from being created on removed nodes. * Prevent actors from being created on nodes with no CPUs. * Fix linting. * Add test for scheduling actors on local schedulers with no CPUs. * Improve error message when actors created before ray.init called.	2017-05-08 20:39:43 -07:00
Robert Nishihara	c688a64235	Expose GPU IDs to remote functions. (#496 ) * Change local scheduler bookkeeping to use GPU IDs. * Update actor test. * Add tests for actors and tasks simultaneously using GPUs. * Add additional task GPU ID test. * Fix linting. * Make redis GPU assignment ignore GPU IDs. * Small fix.	2017-05-07 13:03:49 -07:00
Robert Nishihara	35dbdcc4f5	Make all export IDs unique. (#522 ) * Make all export IDs unique. * Work around test failure.	2017-05-06 21:17:25 -07:00
Philipp Moritz	1dddd5336a	Fix actor bug arising from overwriting task specifications in the local scheduler (#513 ) * copy task specifications put into the actor task cache so it won't get overwritten when the scheduler receives the next task * cleanup * cleanup and fix * linting * fix jenkins test * fix linting	2017-05-06 17:39:35 -07:00
Robert Nishihara	8532ba4272	Serialize lambdas, sets, and types with pickle by default. (#511 ) * Serialize lambdas with pickle by default. * Serialize sets with pickle by default. * Serialize types with pickle by default. * Small update to documentation. * Update tests.	2017-05-04 00:16:35 -07:00
Robert Nishihara	245c8ab888	Make sure user seeding does not affect actor ID generation. (#506 ) * Make sure user seeding does not affect actor ID generation. * Fix linting. * Add test.	2017-05-03 16:29:55 -07:00
Robert Nishihara	1627f89945	Fix problem in which actors and workers running tasks are not killed by driver exit. (#490 ) * Augment test to verify that relevant workers and actors are killed during driver cleanup. * Fix bug in which we were only killing one worker when a driver exited. * Fix remove driver test. * Fix and augment test.	2017-04-26 15:13:39 -07:00
Robert Nishihara	0ac125e9b2	Clean up when a driver disconnects. (#462 ) * Clean up state when drivers exit. * Remove unnecessary field in ActorMapEntry struct. * Have monitor release GPU resources in Redis when driver exits. * Enable multiple drivers in multi-node tests and test driver cleanup. * Make redis GPU allocation a redis transaction and small cleanups. * Fix multi-node test. * Small cleanups. * Make global scheduler take node_ip_address so it appears in the right place in the client table. * Cleanups. * Fix linting and cleanups in local scheduler. * Fix removed_driver_test. * Fix bug related to vector -> list. * Fix linting. * Cleanup. * Fix multi node tests. * Fix jenkins tests. * Add another multi node test with many drivers. * Fix linting. * Make the actor creation notification a flatbuffer message. * Revert "Make the actor creation notification a flatbuffer message." This reverts commit af99099c8084dbf9177fb4e34c0c9b1a12c78f39. * Add comment explaining flatbuffer problems.	2017-04-24 18:10:21 -07:00
Robert Nishihara	3a2eb1467b	Fix failure to propagate error message. (#479 )	2017-04-23 16:12:25 -07:00
Robert Nishihara	c802e51d36	Re-enable recursive remote functions in a limited form. (#453 ) * Re-enable recursive remote functions in a limited form. * Fix linting.	2017-04-13 01:47:33 -07:00
Robert Nishihara	f4c1adae17	Unify function signature handling between remote functions and actor … (#441 ) * Unify function signature handling between remote functions and actor methods. * Fixes. * Fix tests.	2017-04-08 21:34:13 -07:00
Alexey Tumanov	b6c4ae82c0	Increase redis client pubsub buffer size. (#442 )	2017-04-08 15:24:07 -07:00
Robert Nishihara	7cd00741b1	Suppress irrelevant Redis connection errors. (#434 ) * Suppress error messages in worker import thread when Redis terminates. * Suppress some warnings from one of the tests.	2017-04-07 23:19:24 -07:00
Robert Nishihara	05fd4c2c37	Changes to local scheduler client protocol. (#435 ) * Make local scheduler clients receive reply upon registration. * Fix tests and linting.	2017-04-07 23:03:37 -07:00
Robert Nishihara	7af6f462fb	Add API for querying global control state. (#431 ) * Add API for querying global control state. * Fix linting. * Fix errors in Python 2. * Fix bug in test. * Fix bug in test.	2017-04-06 23:51:12 -07:00
Robert Nishihara	320109a5bd	By default, start a number of workers equal to the number of CPUs. (#430 ) * By default, start a number of workers equal to the number of CPUs. * Fix stress tests.	2017-04-06 00:02:58 -07:00
Stephanie Wang	93679df724	Stopped nodes can rejoin immediately (#428 ) * Ignore deleted clients when reading address info from Redis * Remove self from db_client table when exiting cleanly * Fix valgrind test * Do not call plasma_perform_release when disconnecting	2017-04-05 23:50:38 -07:00
Philipp Moritz	4043769ba2	Make putting large objects work. (#411 ) * putting large objects * add more checks * support large objects * fix test * fix linting * upgrade to latest arrow version * check malloc return code * print mmap file sizes * printing * revert to dlmalloc * add prints * more prints * add printing * printing * fix * update * fix * update * print * initialization * temp * fix * update * fix linting * comment out object_store_full tests * fix test * fix test * evict objects if dlmalloc fails * fix stresstests * Fix linting. * Uncomment large-memory tests. * Increase memory for docker image for jenkins tests. * Reduce large memory tests. * Further reduce large memory tests.	2017-04-05 01:04:05 -07:00
Robert Nishihara	0925e11c48	Exclude function source from function ID hash in Python interpreter. (#395 ) * Exclude function source code from function ID hash in Python interpreter. * Remove try except block.	2017-03-25 11:31:21 -07:00
Robert Nishihara	ba02fc0eb0	Run flake8 in Travis and make code PEP8 compliant. (#387 )	2017-03-21 12:57:54 -07:00
Stephanie Wang	083e7a28ad	Push an error to the driver when the workload hangs on `ray.put` reconstruction (#382 ) * Fix worker blocked bug * tmp * Push an error to the driver on ray.put for non-driver tasks * Fix result table tests * Fix test, logging * Address comments * Fix suppression bug * Fix redis module test * Edit error message * Get values in chunks during reconstruction * Test case for driver ray.put errors * Error for evicting ray.put objects from the driver * Fix tests * Reduce verbosity * Documentation	2017-03-21 00:16:48 -07:00
Stephanie Wang	12c9618c0c	Plasma and worker node failure. (#373 ) * Failing test case * Local scheduler exits cleanly after plasma store dies * Tolerate one plasma store failure * Tolerate plasma store failures on all nodes except head node * Plasma manager heartbeats * Component failure tests * Don't run the helper for Python testing * Fix C test * Fix hanging plasma transfer test * Fix python3 * Consolidate ClientConnection code * Fix valgrind test * fix c test * We can restart worker nodes! * Fix flatbuffers bug * Address comments * Only register actual workers with the local scheduler * Fix bug * Fix segfaults * Add test case that tests for driver liveness, fix local scheduler bug * Clean up after tests * Allocate retry info on the stack * Send SIGKILL before waiting * Relax unit test conditions * Driver liveness test case and documentation	2017-03-17 17:03:58 -07:00
Robert Nishihara	f1d4dda8cb	Put all log files in redis and visualize them in UI. (#350 ) * Start process for monitoring log files and push changes to redis. * Display log files in UI. * Bug fix for recent tasks. * Use flatbuffers to parse local scheduler heartbeats.	2017-03-16 15:27:00 -07:00
Robert Nishihara	3333e1d6b9	Fix bug in parsing of tasks in monitor. (#372 )	2017-03-15 20:32:23 -07:00
Robert Nishihara	3b7788bf88	Disallow calling ray.put on an object ID. (#353 )	2017-03-11 12:09:28 -08:00
Robert Nishihara	53dffe0bf2	Use flatbuffers for some messages from Redis. (#341 ) * Compile the Ray redis module with C++. * Redo parsing of object table notifications with flatbuffers. * Update redis module python tests. * Redo parsing of task table notifications with flatbuffers. * Fix linting. * Redo parsing of db client notifications with flatbuffers. * Redo publishing of local scheduler heartbeats with flatbuffers. * Fix linting. * Remove usage of fixed-width formatting of scheduling state in channel name. * Reply with flatbuffer object to task table queries, also simplify redis string to flatbuffer string conversion. * Fix linting and tests. * fix * cleanup * simplify logic in ReplyWithTask	2017-03-10 18:35:25 -08:00
Wapaul1	c66178bcd7	Resnet Adapted to Ray (#229 ) * Initial conversion * Further changes * fixes * some changes * Fixes * Added data pipeline * Added updates to cifar * Currently borken need sep pr * Added test for retriving variables from an optimizer * Removed FlAG ref in environment variables * Added comments to test * Addressed comments * Added updates * Made further changes for tfutils * Fixed finalized bug * Removed ipython * Added accuracy printing * Temp commit * added fixes * changes * Added writing to file * Fixes for gpus * Cleaned up code * Temp commit * Gpu support fully implemented * Updated to use num_gpus for actors * Finished testing gpus implementation * Changed to be more in line with origin implementation * Updated test to use actors * Added support for cpu only systems * Now works with no cpus * Minor changes and some documentation.	2017-03-07 01:07:32 -08:00
Stephanie Wang	da06b4db82	Warn the user when a nondeterministic task is detected. (#339 ) * WARN instead of FATAL for object hash mismatches, push error to driver * Document the callback signature for object_table_add/remove * Error table * Wait for all errors in python test * Fix doc * Fix state test	2017-03-07 00:32:15 -08:00

3894 commits