hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-05 10:01:43 -05:00

Author	SHA1	Message	Date
fangfengbin	c17404918c	[GCS]Add gcs table storage interface (#7949 )	2020-04-15 10:48:12 +08:00
ZhuSenlin	4a81793ba5	GCS-Based actor management implementation (#6763 ) * add gcs actor manager * fix test_metrics.py * fix TestTaskInfo * fix comment * fix comment * fix comment * fix comment * fix comment * fix comment * fix compile error * fix merge error Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>	2020-04-13 09:48:48 -07:00
micafan	c222d64ca1	[GCS] Add MessagePublisher to GCS (#7771 )	2020-04-13 19:32:28 +08:00
mehrdadn	07002825aa	Proper command-line parsing (#7603 ) * Command-line parsing functions * Work around bug in MSVCRT for passing command-lines to programs * Polishing * Fix std::regex_replace() overload compatibility issue with GCC 4.8.x * Try to work around linker error * Implement ScanToken() * Parse command-lines via ScanToken * Merge src/ray/util.cc and src/ray/url.cc Co-authored-by: Mehrdad <noreply@github.com>	2020-04-11 23:07:07 -07:00
Stephanie Wang	d7eef808b8	[core] Reconstruction for lost plasma objects (#7733 ) * Add a lineage_ref_count to References * Refactor TaskManager to store TaskEntry as a struct * Refactor to fix deadlock between TaskManager and ReferenceCounter Add references to task specs * Pin TaskEntries and References in the lineage of any ObjectIDs in scope * Fix deadlock, convert num_plasma_returns to a set of object IDs * fix unit tests * Feature flag * Do not release lineage for objects that were promoted to plasma * fix build * fix build * Remove num executions * Remove num executions * Add pinned locations to ReferenceCounter, empty handler for node death * Fix num returns for actor tasks, fix Put return value * Add regression test * Clear pinned locations and callbacks on node removal * Clear pinned locations and callbacks on node removal * Simplify num return values * Remove unused * doc * tmp * Set num returns * Move lineage pinning flag to ReferenceCounter * comments * Recover from plasma failures by pinning a new copy * Basic object reconstruction, no concurrent reqs yet * reconstruction test suite and a few fixes: - fix for disabling lineage - fix for updating submitted task refs * Handle concurrent attempts to recover the same object * Fix deadlock in DrainAndShutdown * Revert "[core] Revert lineage pinning (#7499) (#7692)" This reverts commit `ba86a02b37`. * debug rllib * debug rllib * turn on all rllib tests again * debug rllib * Fix drain bug, check number of pending tasks * revert rllib debug * remove todo * Trigger rllib tests * revert rllib debug commit * Split out logic into ObjectRecoveryManager * Fix python tests * Refactor to remove dependency on gcs client * Unit tests * Move pinned at node ID to direct memory store * Unit test fixes and lint * simplify and more tests * Add ResubmitTask test for TaskManager * Doc * fix build * comments * Fix * debug * Update * fix * Fix * Fix bad status handling, unit test * Fix build	2020-04-11 16:52:57 -07:00
Kai Yang	48b48cc8c2	Support multiple core workers in one process (#7623 )	2020-04-07 11:01:47 +08:00
micafan	e91595f955	[GCS] Add ObjectLocator to gcs server (#7557 )	2020-04-07 10:37:24 +08:00
micafan	780c1c3b08	[GCS] impl RedisStoreClient for GCS Service (#7675 )	2020-04-01 21:18:19 +08:00
SangBin Cho	c23e56ce9a	Metrics Export Service (#7809 )	2020-03-30 23:28:32 -07:00
mehrdadn	f86e623095	Fix & improve GitHub Actions CI builds (#7784 )	2020-03-30 16:29:54 -07:00
SongGuyang	c195dc8f88	Basic C++ worker implementation (#6125 )	2020-03-27 23:01:08 +08:00
mehrdadn	e69664b74b	Miscellaneous Windows compatibility bugfixes (#7658 ) * Windows compatibility bug fixes * Use WSASend/WSARecv as WSASendMsg/WSARecvMsg do not work with TCP sockets * Clean up some TODOs * Fix duplicate compilations * RedisAsioClient boost::asio::error::connection_reset Co-authored-by: Mehrdad <noreply@github.com>	2020-03-19 19:32:53 -07:00
Scott Graham	37e4d29f87	[autoscaler] Adding Azure Support (#7080 ) * adding directory and node_provider entry for azure autoscaler * adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating * adding todos and switching to auth file for service principal authentication * adding role / scope to service principal * resolving issues with app credentials * adding retry for setting service principal role * typo and adding retry to nic creation * adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing * linting * updating cleanup and fixing bugs * adding directory and node_provider entry for azure autoscaler * adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating * adding todos and switching to auth file for service principal authentication * adding role / scope to service principal * resolving issues with app credentials * adding retry for setting service principal role * typo and adding retry to nic creation * adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing * linting * updating cleanup and fixing bugs * minor fixes * first working version :) * added tag support * added msi identity intermediate * enable MSI through user managed identity * updated schema * extend yaml schema remove service principal code add re-use of managed user identity * fix rg_id * fix logging * replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs) * run linting * updating yaml configs and formatting * updating yaml configs and formatting * typo in example config * pulling default config from example-full * resetting min, init worker prop * adding docs for azure autoscaler and fixing status * add azure to docs, fix config for spot instances, update azure provider to avoid caching issues during deployment * fix for default subscription in azure node provider * vm dev image build * minor change * keeping example-full.yaml in autoscaler/azure, updating azure example config * linting azure config * extending retries on azure config * lint * support for internal ips, fix to azure docs, and new azure gpu example config * linting * Update python/ray/autoscaler/azure/node_provider.py Co-Authored-By: Richard Liaw <rliaw@berkeley.edu> * revert_this * remove_schema * updating configs and removing ssh keygen, tweak azure node provider terminate * minor tweaks Co-authored-by: Markus Cozowicz <marcozo@microsoft.com> Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-15 14:48:27 -07:00
mehrdadn	a87199d240	Fix cyclic dependency between ray/util and ray/common (#7581 ) * Fix cyclic dependency Headers in ray/util should not depend on those in ray/common * Move random generations to ray/common/test_util.h * Add license header Co-authored-by: Mehrdad <noreply@github.com> Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>	2020-03-14 12:44:53 -07:00
mehrdadn	fc76586518	Redis on Windows (#7509 ) * Switch hiredis on Windows to that of the Windows port of Redis * Use boost::asio::ip::tcp::socket::native_handle_type * Use normal hiredis instead of Windows-specific one * Finish up using normal hiredis Co-authored-by: Mehrdad <noreply@github.com>	2020-03-09 18:49:54 -07:00
mehrdadn	5fb5be0ba5	Some bug fixes for Windows (#7374 ) * Fix MAP_SHARED check in sys/mman.h * Fix missing :platform_shims dependency for ray_util * dlmalloc patch for Arrow	2020-02-28 10:22:32 -08:00
mehrdadn	0efaa9b310	Use Redis for Windows (#7364 )	2020-02-28 10:18:56 -08:00
mehrdadn	8730996682	Windows changes (#7315 )	2020-02-27 15:14:10 -08:00
fangfengbin	ba494b5281	Fix gcs client rpc operation disorder bug (#7283 )	2020-02-26 19:24:24 +08:00
Edward Oakes	d190e73727	Use our own implementation of parallel_memcopy (#7254 )	2020-02-21 11:03:50 -08:00
Eric Liang	5df801605e	Add ray.util package and move libraries from experimental (#7100 )	2020-02-18 13:43:19 -08:00
mehrdadn	e09f63ad65	Fix build errors and add more targets to Windows builds (#6811 ) * Fix common.fbs rename (due to apache/arrow/commit/bef9a1c251397311a6415d3dc362ef419d154caa) * Add missing COPTS * Use socketpair(AF_INET) if boost::asio::local is unavailable (e.g. on Windows) * Fix compile bug in service_based_gcs_client_test.cc (fix build breakage in #6686) * Work around googletest/gmock inability to specify override to avoid -Werror,-Winconsistent-missing-override * Fix missing override on IsPlasmaBuffer() * Fix missing libraries for streaming * Factor out install-toolchains.sh * Put some Bazel flags into .bazelrc * Fix jni_md.h missing inclusion * Add ~/bin to PATH for Bazel * Change echo $$(date) > $@ to date > $@ * Fix lots of unquoted paths * Add system() call checks for Windows Co-authored-by: GitHub Web Flow <noreply@github.com>	2020-02-11 16:49:33 -08:00
mehrdadn	83c4e947c7	Make Cython rules more consistent for Bazel (#6840 )	2020-02-10 10:45:54 -08:00
mehrdadn	ad4ac9aa70	Add clang-iwyu (#7081 ) * Add iwyu Co-authored-by: GitHub Web Flow <noreply@github.com>	2020-02-07 16:19:46 -08:00
fangfengbin	ade7ebfc0c	Add service based gcs client (#6686 )	2020-02-05 12:06:25 +08:00
mehrdadn	bde575b8dd	Revert "Use Boost.Process instead of pid_t (#6510 )" (#6909 ) This reverts commit `fb8e3615d5`.	2020-01-26 10:26:44 -06:00
Yunzhi Zhang	aa5427ca78	[Dashboard] Kill actor (#6906 )	2020-01-24 17:21:44 -08:00
Yunzhi Zhang	0834bda8c1	[Dashboard] Display actor task execution info (#6705 ) Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>	2020-01-22 22:33:55 -08:00
mehrdadn	139bf8908e	Replace UNIX sockets with TCP sockets in Ray on Windows (#6823 ) * Replace UNIX sockets with TCP sockets in Ray	2020-01-20 17:28:11 -08:00
mehrdadn	fb8e3615d5	Use Boost.Process instead of pid_t (#6510 ) * Use Boost.Process instead of pid_t This will let us handle child processes (mostly) uniformly across platforms. TODO: There is no SIGTERM on Windows; achieving something equivalent is fairly involved.	2020-01-15 20:05:02 -08:00
mehrdadn	76c986bdc7	Windows compatibility stubs (#6706 )	2020-01-05 21:21:17 -08:00
micafan	970cd78701	[GCS] refactor the GCS Client Dynamic Resource Interface (#6266 )	2020-01-03 14:07:37 +08:00
micafan	a492333f4e	[GCS] refactor the GCS Client Object Interface (#5695 )	2019-12-27 15:18:54 +08:00
micafan	b98b288ffd	[GCS] Change GCS Test to cc_test (#6596 )	2019-12-26 14:34:35 +08:00
Chaokun Yang	7bbfa85c66	[Streaming] Streaming data transfer java (#6474 )	2019-12-22 10:56:05 +08:00
fangfengbin	3c0164419b	Add gcs server job info & actor info handler (#6469 )	2019-12-20 14:28:04 +08:00
mehrdadn	7a24144bfd	Polish Bazel build scripts (#6424 ) * Polish Bazel build scripts * Remove glog references from streaming_logging.cc * Move out COPTS and reference them * Disable streaming on Windows * Remove -fno-gnu-unique	2019-12-17 02:38:36 -08:00
mehrdadn	74b2e871b7	Tentative workaround for some forks and signals on Windows (#6362 ) * Platform shims for Windows * Tentative workaround for some forks and signals on Windows * Rewrite WorkerPool::StartProcess by moving spawnvp wrapper to a separate function * Separate spawnvp the wrappers for POSIX and Windows * Fix rv use	2019-12-16 16:57:49 -08:00
ZhuSenlin	6c0531683f	Add gcs server as well as the unit test (#6401 )	2019-12-15 13:23:42 +08:00
micafan	8c1520d18e	[GCS] refactor the GCS Client Job Interface (#5503 )	2019-12-12 16:57:32 +08:00
Chaokun Yang	6272907a57	[Streaming] Streaming data transfer and python integration (#6185 )	2019-12-10 20:33:24 +08:00
micafan	668ce47360	[GCS]Add abstract interface of actor to GCS Client (#6269 )	2019-12-05 13:38:29 +08:00
mehrdadn	75cc994e0a	Update various build options relating to Windows (#6315 ) * Update .bazelrc for Windows compatibility * Block inclusion of (legacy) WinSock.h to avoid errors * Suppress warnings for Windows code * Include boost::asio in includes so that it is passed as -isystem to avoid warnings * Link with -lpthread only on non-Windows * Undefine BOOST_FALLTHROUGH, which is unnecessary and causes macro redefinition warnings * Define RAY_STATIC and ARROW_STATIC to compile for Windows * Add WinSock import library for Arrow	2019-12-01 15:05:50 -08:00
mehrdadn	b8cfdba752	Bazelify hiredis (#6203 )	2019-11-29 15:32:45 -08:00
Stephanie Wang	f6a0408173	Track pending tasks with TaskManager (#6259 ) * TaskStateManager to track and complete pending tasks * Convert actor transport to use task state manager * Refactor direct actor transport to use TaskStateManager * rename * Unit test * doc * IsTaskPending * Fix? * Shared ptr * HUH? * Update src/ray/core_worker/task_manager.cc Co-Authored-By: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com> * Revert "HUH?" This reverts commit f80f0ba204ff4da5e0b03191fa0d5a4d9f552434. * Fix memory issue * oops	2019-11-25 16:37:26 -08:00
Eric Liang	53641f1f74	Move more unit tests to bazel (#6250 ) * move more unit tests to bazel * move to avoid conflict * fix lint * fix deps * seprate * fix failing tests * show tests * ignore mismatch * try combining bazel runs * build lint * remove tests from install * fix test utils * better config * split up * exclusive * fix verbosity * fix tests class * cleanup * remove flaky * fix metrics test * Update .travis.yml * no retry flaky * split up actor * split basic test * split up trial runner test * split stress * fix basic test * fix tests * switch to pytest runner for main * make microbench not fail * move load code to py3 * test is no longer package * bazel to end	2019-11-24 11:43:34 -08:00
Ion	68ac08332b	Initial commit of new cluster resource scheduler (#6178 )	2019-11-22 11:14:46 -08:00
mehrdadn	ba86c75c21	Patch Cython in grpc to use our COPTS (#6223 )	2019-11-21 15:32:48 -08:00
Simon Mo	29ba6bfc64	Basic Async Actor Call (#6183 ) * Start trying to figure out where to put fibers * Pass is_async flag from python to context * Just running things in fiber works * Yield implemented, need some debugging to make it work * It worked! * Remove debug prints * Lint * Revert the clang-format * Remove unnecessary log * Remove unncessary import * Add attribution * Address comment * Add test * Missed a merge conflict * Make test pass and compile * Address comment * Rename async -> asyncio * Move async test to py3 only * Fix ignore path	2019-11-21 11:56:46 -08:00
Stephanie Wang	c0be9e6738	Resolve dependencies locally before submitting direct actor tasks (#6191 ) * Priority queue in direct actor transport by task number * Move LocalDependencyResolver out to separate file, share with direct actor transport * works * Test case for ordering * Cleanups * Remove priority queue * comment * Share ClientFactoryFn with direct actor transport * Unit test * fix	2019-11-20 16:45:19 -08:00

106 commits