We fixed the groupby issue in CUJ2; this syncs the change into the nightly test. The test doesn't need a GPU at all: it returns soon after data ingestion finishes.
This PR does two things:
- Merges the latest groupby-based filtering into CUJ2.
- Adds a debug mode that runs only a dummy trainer, so we can measure data processing performance.
This PR mostly implements a "fixture" for the nightly test. Note that the current fixture implementation is not great; we can probably improve it in the future after refactoring e2e.py.
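As a rough illustration of the debug mode described above, here is a minimal sketch of a dummy trainer that only consumes the ingested data and reports how long that took. The names `dummy_train` and `run`, the `debug` flag, and the batch representation are hypothetical and not taken from the actual test code.

```python
import time
from typing import Callable, Iterable, Sequence, Tuple


def dummy_train(batches: Iterable[Sequence]) -> Tuple[int, float]:
    """Consume every batch without training; return (rows_seen, elapsed_seconds)."""
    start = time.perf_counter()
    rows_seen = 0
    for batch in batches:
        # No model and no GPU: only touch the data so ingestion cost is measured.
        rows_seen += len(batch)
    return rows_seen, time.perf_counter() - start


def run(batches: Iterable[Sequence], debug: bool,
        real_train: Callable[[Iterable[Sequence]], object]):
    """Hypothetical switch: in debug mode, skip the real trainer entirely."""
    if debug:
        return dummy_train(batches)
    return real_train(batches)
```

With `debug=True` the function returns as soon as the last batch has been read, which matches the "returns soon after data ingestion finishes" behavior.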
The ray-ml image depends on numpy ~=1.19.2 via the tensorflow==2.6 requirement. Unfortunately, that version is incompatible with Dataset (see #20258 (comment)).
This PR upgrades the numpy dependency only for the nightly test.
## Why are these changes needed?
In the past, there was a regression where placement group creation got slower over time. I believe the issue is fixed in master, but this PR verifies that it is actually fixed.
This PR adds a long-running test for placement groups. The test has two purposes:
- Make sure placement group creation / removal doesn't get slower over time. The test measures the P50 creation time over the first 20 iterations, then runs for many more iterations. After all iterations finish, it checks that the P50 creation time is not too slow compared to the initial round.
- Make sure placement group removal / creation works consistently over a long period without issues.
Q: Should we make it a real long-running test (one that runs for a day)?
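The P50 comparison described above can be sketched roughly as follows. This is not the test's actual code: `create` and `remove` are hypothetical callables standing in for placement group creation and removal, and the `max_slowdown` threshold is an assumed value.

```python
import statistics
import time
from typing import Callable, List, Sequence


def time_creations(create: Callable[[], object],
                   remove: Callable[[object], None],
                   iterations: int) -> List[float]:
    """Create and remove a resource repeatedly, timing only the creation."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        handle = create()
        times.append(time.perf_counter() - start)
        remove(handle)
    return times


def creation_slowed_down(initial_times: Sequence[float],
                         final_times: Sequence[float],
                         max_slowdown: float = 5.0) -> bool:
    """Return True if the final P50 exceeds max_slowdown times the initial P50."""
    initial_p50 = statistics.median(initial_times)
    final_p50 = statistics.median(final_times)
    return final_p50 > max_slowdown * initial_p50
```

A test along these lines would call `time_creations` once for the first 20 iterations, run the long loop, and then assert that `creation_slowed_down` is false for the last round of samples.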
* use nightly
* switch ml cpu to ray cpu
* fix
* add pytest
* add more pytest
* add constraint
* add tensorflow
* fix merge conflict
* add tblib
* fix
* add back uninstall
## Why are these changes needed?
In the nightly test we see:
```
Command returned non-success status: 1; Command logs:
Traceback (most recent call last):
  File "dask_on_ray/large_scale_test.py", line 17, in <module>
    from ray._private.test_utils import monitor_memory_usage
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/test_utils.py", line 18, in <module>
    import pytest
ModuleNotFoundError: No module named 'pytest'
```
This PR fixes this error.
## Related issue number
## Why are these changes needed?
We are concerned that gRPC-based broadcasting might negatively impact placement-group-related workloads. This test ensures that they run well before merging.
## Related issue number
#19438
## Why are these changes needed?
There are two issues fixed in this PR:
- Make sure the wait-for-session step counts only alive nodes.
- Upgrade the machine to match what's tested in OSS Ray.
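To illustrate the first fix, here is a minimal sketch of a wait loop that counts only alive nodes. It assumes node records are dicts with an `"Alive"` field (the shape that `ray.nodes()` returns, to my knowledge); `get_nodes`, `wait_for_alive_nodes`, and the timeout values are hypothetical names and defaults, not the test's actual code.

```python
import time
from typing import Callable, Dict, List


def count_alive_nodes(nodes: List[Dict]) -> int:
    """Count only nodes whose record is marked alive; dead nodes are ignored."""
    return sum(1 for node in nodes if node.get("Alive"))


def wait_for_alive_nodes(get_nodes: Callable[[], List[Dict]],
                         expected: int,
                         timeout_s: float = 600.0,
                         poll_interval_s: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until at least `expected` nodes are alive, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        alive = count_alive_nodes(get_nodes())
        if alive >= expected:
            return alive
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {alive} of {expected} nodes alive after {timeout_s}s")
        sleep(poll_interval_s)
```

Passing `get_nodes` in as a callable keeps the sketch independent of a running cluster; in a real test it could be `ray.nodes`.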
## Related issue number
https://github.com/ray-project/ray/issues/19084
When testing, we should minimize unnecessary env vars (it's better to work with the default config). This PR removes the unnecessary env vars that were set.
* Revert "[nightly] Deflaky nightly test many_nodes_actor_test (#18582)"
This reverts commit fc6a739e4b.
* move to large test
Co-authored-by: Yi Cheng <chengyidna@gmail.com>
* in progress
* ASAN tests.
* d
* in progress
* in progress without the asan wheel
* Support the asan wheel.
* Support the asan wheels
* Not build a binary for asan
* Fix issues
* Remove a wrong build
* Separate out asan wheel build
* Try preparing more deps.
* ip
* Try different version
* done
* d
* Trial
* Another try
* Another try
* skip cpp build to see what happens
* add more des
* ip
* abc
* Try next
* completed
* try
* Try without static libasan
* dbg
* Try static link
* Fix issues
* abc