Commit graph

4393 commits

Author SHA1 Message Date
Sven Mika
6043ce710d
Fix old exploration configs. (#7240) 2020-02-20 08:39:16 -08:00
chaokunyang
1ae7c03e86
fix concurrently extract file (#7225) 2020-02-20 20:38:51 +08:00
chaokunyang
d7f8d18a86
fix symbols conflict and add symbols check (#7227) 2020-02-20 19:31:16 +08:00
Siyuan (Ryans) Zhuang
0d210a99c3
Ensure deserialized numpy arrays are immutable (#7181)
* ensure numpy arrays are immutable when deserialized from the memory buffer
2020-02-19 23:30:10 -08:00
Stephanie Wang
7e3819a27a
[core] Eagerly evict objects that are no longer in scope (#7220)
* Batch free requests, and free when object is unpinned

* rename

* note
2020-02-19 20:51:38 -08:00
Simon Mo
b804d40c04
Stop vendoring pyarrow (#7233) 2020-02-19 19:01:26 -08:00
Siyuan (Ryans) Zhuang
48c06f5042
Enhance the serialization refcount test for dynamic classes (#7222)
* enhance the test for dynamic classes
2020-02-19 18:35:35 -08:00
Eric Liang
46af992efd
[rllib] [experimental] custom RL training pipelines (PG_pl, A2C_pl) (#7213) 2020-02-19 16:07:37 -08:00
Simon Mo
7bef7031c2
Revert "Revert "Revert "Removing Pyarrow dependency (#7146)" (#7209) (#7214)" (#7232) 2020-02-19 13:35:29 -08:00
Sven Mika
d537e9f0d8
[RLlib] Exploration API: merge deterministic flag with exploration classes (SoftQ and StochasticSampling). (#7155) 2020-02-19 12:18:45 -08:00
Eric Liang
399424c418
[rllib] Fix broken check in eval mode for IMPALA #7217 2020-02-19 11:54:30 -08:00
Simon Mo
e8941b1b79
Revert "Revert "Removing Pyarrow dependency (#7146)" (#7209) (#7214) 2020-02-19 10:08:52 -08:00
Stephanie Wang
f76ce836b2
Distributed ref counting for serialized ObjectIDs (#6945)
* Skeleton plus a unit test for simple borrower case

* First unit test passes - forward an ID and task returns with 1 submitted task pending on the inner ID

* Invariant for contained_in

* Unit test passes for testing task return without creating a borrower

* Wrap ref count functionality in test case

* Fix bad delete

* Unit test and fix for borrowers creating more borrowers

* Unit test and fix for simple borrowing, but owner sends call after borrower's ref count goes to 0

* Refactor:
- keep a sentinel ref count for task argument IDs
- keep contained_in_borrowed in addition to contained_in_owned

* Unit test for nested IDs passes

* Refactor so that an object ID can only be contained in 1 borrowed ID at a time

* Add check

* Fix

* Unit test (passes) to test nesting object IDs but no borrowers created

* Unit test for nested objects from different owners passes, refactor to unset contained_in when popping refs

* Unit tests for borrowers receiving an ObjectID from multiple sources,
skip adding ownership info if we already have it to handle duplicate
refs

* Unit test for returning object ID passes

* More unit tests for returning object IDs pass

* Add serialized ID tests

* fix serialization issue

* remove swap

* It builds!

* debugging and some fixes:
- register handler for WaitForRefRemoved
- don't create a python reference for arg IDs
- pass in client factory into ReferenceCounter
- fix bad decrement in PopBorrowerRefs

* Fix accounting for serialized IDs:
- don't decrement for IDs on dependency resolution, wait until task finished
- add object IDs that were inlined when building the arguments to the task spec, pin these on the task executor until task finishes

* mu_ -> mutex_

* lint

* fix build

* clear outer_object_id

* add direct call type check

* Fix test for direct call IDs and return IDs for actor calls

* Fix CoreWorkerClient.Addr()

* Remove unneeded lock

* Remove unnecessary ObjectID refs

* Fix worker holding serialized refs test

* Fix hex IDs

* fix

* fix tests

* fix tests

* refactor and cleanups

* lint

* Put inlined Ids in task args and some cleanup

* Add back gc.collect() line for test case

* Refactor and fixes:
- store inlined IDs in RayObject
- allow storing objects with inlined IDs in memory store
- pin objects that were promoted to plasma

* oops

* make sure worker ID is set in address, pass in rpc::Address to CoreWorkerClient

* todos

* cleanups and test builds

* Fix tests

* Add feature flag

* cleanups

* address comments and some cleanups

* cleanup

* fix recursive test

* Comments for tests

* Turn off ref counting by default

* Skip tests

* Fix some bugs for test_array.py, java build

* Don't include nested objects in the ref count when the feature flag is off

* C++ feature flag does not work...

* Remove

* Turn on python tests and add a warning when plasma objects are evicted before being pinned

* Fix build and remove irrelevant test

* Fix for java

* Revert "Fix build and remove irrelevant test"

This reverts commit 056cca9b263ed05b0f9ab2250907338edcbca2d5.

* Fix ray.internal.free

* Fixes and skip some flaky tests

* fix java build

* fix windows build

* Add IDs contained in owned objects

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.cc

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.h

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.h

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.cc

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* update

* Try to fix ::test_direct_call_serialized_id_eviction

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-02-18 18:21:34 -08:00
mehrdadn
4a12243336
Use Process instead of pid_t (round 2) (#6882)
* Revert "Revert "Use Boost.Process instead of pid_t (#6510)" (#6909)"

This reverts commit bde575b8dd.

* Process wrapper, using Boost.Process on Windows

- Reverts bde575b8dd.
- Re-applies fb8e3615d5 after some refactoring.

* Remove Boost.Process dependency

* Don't open /proc file on Linux

* Change FATAL to ERROR and modify error message when process doesn't exist
2020-02-18 17:44:46 -08:00
Eric Liang
0aa9373d62
Revert "Removing Pyarrow dependency (#7146)" (#7209)
This reverts commit 2116fd3bca.
2020-02-18 14:12:06 -08:00
Eric Liang
5df801605e
Add ray.util package and move libraries from experimental (#7100) 2020-02-18 13:43:19 -08:00
Eric Liang
fae99ecb8e
[core] Make sure to unsubscribe get dependencies for direct task calls. (#7201)
* fix

* remove assert
2020-02-17 18:35:25 -08:00
ijrsvt
2116fd3bca
Removing Pyarrow dependency (#7146) 2020-02-17 18:00:13 -08:00
mehrdadn
3bd82d0bcd
Fix various issues/warnings that come up on Jenkins (#7147)
* Avoid warning about swap being unlimited

Currently we get the following message on Jenkins:
"Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."

Since we're not limiting swap anyway, we might as well avoid trying to.
https://docs.docker.com/config/containers/resource_constraints/#--memory-swap-details

* Fix escaping in re.search()

* Fix escaping in _noisy_layer()

* Raise a more descriptive error when dashboard data isn't found

* Don't error on dashboard files not being found when webui isn't required

* Change dashboard error to a warning instead
2020-02-17 16:08:55 -08:00
Alex Wu
734629b4ea
Ssh command format (#7176) 2020-02-17 14:15:42 -08:00
Alind Khare
c6d768be14
[Serve] Added support for no http route services (#7010) 2020-02-17 11:31:30 -08:00
Eric Liang
42aea966ff
[rllib] Convert torch state arrays to tensors during compute actions (#7162)
* convert to tensor

* normalize fix
2020-02-17 10:26:58 -08:00
fyrestone
a6b8bd47b0
[xlang] Cross language serialize ActorHandle (#7134) 2020-02-17 20:44:56 +08:00
Edward Oakes
b079787c59
Fix flaky test_get_with_timeout (#7175) 2020-02-16 21:10:16 -08:00
Richard Liaw
94e2fcea2e
[sgd] fp16 (apex) and scheduler support + move examples page (#7061)
* Init fp16

* fp16 and schedulers

* scheduler linking and fp16

* to fp16

* loss scaling and documentation

* more documentation

* add tests, refactor config

* moredocs

* more docs

* fix logo, add test mode, add fp16 flag

* fix tests

* fix scheduler

* fix apex

* improve safety

* fix tests

* fix tests

* remove pin memory default

* rm

* fix

* Update doc/examples/doc_code/raysgd_torch_signatures.py

* fix

* migrate changes from other PR

* ok thanks

* pass

* signatures

* lint'

* Update python/ray/experimental/sgd/pytorch/utils.py

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* should address most comments

* comments

* fix this ci

* fix tests'

* testmode

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-02-16 19:04:08 -08:00
Sven Mika
f0e62d733f
Bazel exclude rllib-option fix. (#7185) 2020-02-16 11:26:03 -08:00
Eric Liang
b7016504e8
[rllib] Only run one set of tests unless rllib or tune dirs are changed. (#7179)
* full filter

* lint
2020-02-16 08:52:49 -08:00
Siyuan (Ryans) Zhuang
6745459f96
Apply cpython patch bpo-39492 for the reference counting issue in pickle5 (#7177)
* apply cpython patch bpo-39492 for the reference count issue
2020-02-15 21:16:13 -08:00
Eric Liang
b6233dff3c
[rllib] Fix bad sample count assert 2020-02-15 17:22:23 -08:00
Sven Mika
2e60f0d4d8
[RLlib] Move all jenkins RLlib-tests into bazel (rllib/BUILD). (#7178)
* commit

* comment
2020-02-15 14:50:44 -08:00
Edward Oakes
dc5a27dac0
Move ray.experimental.multiprocessing to ray.util.multiprocessing (#7149) 2020-02-14 16:17:05 -08:00
Richard Liaw
52d9189d5d
[autoscaler] port-forward for attach + redis_port (#7145)
* port-forward

* fixport

* force redis port in init mode

* test

* Update python/ray/tests/test_ray_init.py
2020-02-14 15:17:00 -08:00
Simon Mo
30de1286bd
Use pip install setup.py (#7158) 2020-02-14 13:53:36 -08:00
Adrian O'Grady
fe6ce714a0
[rllib] - TaskPool.completed_prefetch() no longer returns stale object ids after an error (#7139) 2020-02-13 22:30:44 -08:00
Qing Wang
f3703bafa3
[Java] Support concurrent actor calls API. (#7022)
* WIP

Temp change

Attach native thread to jvm

* Fix run mode

* Address comments.
2020-02-14 13:02:39 +08:00
Alex Wu
0d3687a10d
No warning for docker memory > system memory (#7151) 2020-02-13 15:21:44 -08:00
Edward Oakes
b81b93a9c0
Convert stress tests to projects (#6495) 2020-02-13 09:19:24 -08:00
Qing Wang
94a286ef1d
[Java] Add session_dir as temp_dir for logs, socket files like Python (#7044)
* Support

* Add gcs_server support

* Fix ut

* Fix

* Remove unused py code

* Fix linting

* Fix cross language ci

* Fix CI

* Add docstring

* Fix

* Fix linting

* Add a singleton for config

* Refine

* fix

* Fix

* linting

* Remove FileUnit

* Fix

* Fix

* Fix

* Update java/runtime/src/main/java/org/ray/runtime/config/RayConfig.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Fix streaming singleprocess CI

* Fix checkstyle

Co-authored-by: Hao Chen <chenh1024@gmail.com>
2020-02-13 17:49:52 +08:00
Sven Mika
5518a738b3
[RLlib] Fix erroneous use of LinearSchedule (in DDPG's exploration annealing). (#7125)
* Fix erroneous use of LinearSchedule (in DDPG's exploration annealing).
Erase schedules_obsoleted.py.

* Trigger re-test.

* Re-test.
2020-02-12 23:46:49 -08:00
wanxing
9fc3e2e50f
[Streaming]Add RefreshChannelInfo to support flow-control (#7071)
* add RefreshChannelInfo

* fix name

* add override

* fix

* fix return value
2020-02-13 09:30:56 +08:00
Edward Oakes
e904711e74
Add python tests for serialized object ID reference counting (#7038) 2020-02-12 16:52:07 -08:00
Edward Oakes
d91d3ea936
Split half of test_actor into test_actor_advanced (#7143) 2020-02-12 15:17:25 -08:00
Simon Mo
0e94e1dc2a
[Asyncio] Increase recursion limit manually (#7142) 2020-02-12 14:15:36 -08:00
Sven Mika
f41a9b9813
[RLlib] Fix KL method of MultiCategorial tf distribution (issue #7009). (#7119)
* Fix KL method of MultiCategorial tf distribution.

* Fix KL method of MultiCategorial tf distribution.

* Merge AsyncReplayOptimizer fixes into this branch.
2020-02-12 12:46:15 -08:00
Edward Oakes
275fd343fb
Change CI to properly list python3.6 (#7126) 2020-02-12 11:15:46 -08:00
Mitchell Stern
5dda0b66bf
[Dashboard] Refactor dialogs to use parent component state instead of routes (#7129) 2020-02-12 10:59:47 -08:00
aannadi
d941ac6c89
Updating package-lock.json with latest npm (#7128) 2020-02-12 09:54:20 -08:00
Richard Liaw
fc9352c588
[docs] Make walkthrough and starting Ray materials clear (#7099)
* make starting ray a separate page

* concept

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* more fics

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-02-11 23:17:30 -08:00
Eric Liang
305eaaabe9
Fix hang if actor object id is returned from a task that exits (#6885) 2020-02-11 20:28:13 -08:00
mehrdadn
e09f63ad65
Fix build errors and add more targets to Windows builds (#6811)
* Fix common.fbs rename (due to apache/arrow/commit/bef9a1c251397311a6415d3dc362ef419d154caa)

* Add missing COPTS

* Use socketpair(AF_INET) if boost::asio::local is unavailable (e.g. on Windows)

* Fix compile bug in service_based_gcs_client_test.cc (fix build breakage in #6686)

* Work around googletest/gmock inability to specify override to avoid -Werror,-Winconsistent-missing-override

* Fix missing override on IsPlasmaBuffer()

* Fix missing libraries for streaming

* Factor out install-toolchains.sh

* Put some Bazel flags into .bazelrc

* Fix jni_md.h missing inclusion

* Add ~/bin to PATH for Bazel

* Change echo $$(date) > $@ to date > $@

* Fix lots of unquoted paths

* Add system() call checks for Windows

Co-authored-by: GitHub Web Flow <noreply@github.com>
2020-02-11 16:49:33 -08:00