* use a dedicated thread to handle heartbeats
* separate signal thread
* use work to avoid exiting when task is underway
* protect shared data structure to avoid deadlock
* add comments
* decrease io service num
* minor changes
* fix test
* address Stephanie's review comments
* use single io service instead of 1-size io service pool
* typo
* Revert "[dist] swap mac/linux wheel build order (#9746)"
This reverts commit a9340565ff.
* Revert "Fix package and upload ray jar (#9742)"
This reverts commit c290c308fe.
* ray worker metrics gauge init
* ray java metric mapping
* add jni source files for gauge and tagkey
* mapping all metric classes to stats object
* check non-null for tags and name
* lint
* add symbol for native metric JNI
* use extern "C" for symbol
* add tests for all metrics
* Update Metric.java
use metricNativePointer instead.
* unify metric native stuff to one class
* fix jni file
* add comments for metric transform function in jni utils
* move metric function to native metric file
* remove unused disconnect jni
* Add a metric registry for Java metrics
* Restore install-bazel.sh
* Add some comments for metric registry
* Fix thread-safety problem in metrics
* Fix metric tests and remove sleep code from tests
* Fix comments of metrics
Co-authored-by: lingxuan.zlx <skyzlxuan@gmail.com>
* Separate file_mounts contents hashing into its own hash
Add an option to continuously sync file_mounts from the head node to worker nodes:
monitor.py will re-sync file mounts whenever their contents change, but will only run setup_commands if the config also changes
* add test and default value for file_mounts_sync_continuously
* format code
* Update comments
* Add a param to skip setup commands when only file_mounts content changed during monitor.py's update tick
Ensure setup commands still run when `ray up` is run and file_mounts content has changed
* Refactor so that runtime_hash retains previous behavior
runtime_hash is almost identical to before this PR. It is used to determine whether setup_commands need to run.
file_mounts_contents_hash is an additional hash of the file_mounts content, used to detect when only file syncing has to occur.
Note: the runtime_hash value differs from before this PR because, as a performance optimization, we hash the hash of the file_mounts contents rather than the contents themselves.
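A minimal Python sketch of how the two hashes could drive a per-node update decision. It assumes a hypothetical `config_digest` field (so that "only contents changed" can be detected) and a hypothetical `skip_setup_on_contents_change` flag standing in for the monitor.py param; `hash_contents`, `compute_hashes`, and `plan_update` are illustrative names, not the autoscaler's actual functions.

```python
import hashlib
import json
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Hashes:
    config_digest: str  # hypothetical: digest of the config alone
    contents_hash: str  # digest of the file_mounts contents
    runtime_hash: str   # digest of config_digest + contents_hash


def hash_contents(file_mounts):
    """Digest every local path in file_mounts (remote path -> local path)."""
    hasher = hashlib.sha1()
    for remote in sorted(file_mounts):
        local = os.path.expanduser(file_mounts[remote])
        hasher.update(remote.encode("utf-8"))
        if os.path.isdir(local):
            for dirpath, dirnames, filenames in os.walk(local):
                dirnames.sort()  # visit subdirectories in a stable order
                for name in sorted(filenames):
                    path = os.path.join(dirpath, name)
                    hasher.update(os.path.relpath(path, local).encode("utf-8"))
                    with open(path, "rb") as f:
                        hasher.update(f.read())
        else:
            with open(local, "rb") as f:
                hasher.update(f.read())
    return hasher.hexdigest()


def compute_hashes(config, file_mounts):
    config_digest = hashlib.sha1(
        json.dumps(config, sort_keys=True).encode("utf-8")).hexdigest()
    contents_hash = hash_contents(file_mounts)
    # Fold in the short contents digest ("hash the hash") instead of
    # feeding every file's bytes into the runtime hash a second time.
    runtime = hashlib.sha1()
    runtime.update(config_digest.encode("utf-8"))
    runtime.update(contents_hash.encode("utf-8"))
    return Hashes(config_digest, contents_hash, runtime.hexdigest())


def plan_update(prev, curr, skip_setup_on_contents_change):
    """Decide what an update pass should do for one node."""
    if curr.runtime_hash == prev.runtime_hash:
        return "up_to_date"
    only_contents_changed = (curr.config_digest == prev.config_digest
                             and curr.contents_hash != prev.contents_hash)
    if only_contents_changed and skip_setup_on_contents_change:
        return "sync_only"  # e.g. a monitor.py update tick
    return "sync_and_setup"  # e.g. `ray up`
```

With the flag set, a contents-only change yields "sync_only"; without it, or when the config itself changed, setup commands run as before.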
* fix issue with hashing a hash
* fix bug where the contents hash was set even when it had not been generated
* Fix lint error
Fix bug in command_runner where check_output was no longer returning the output of the command
* clear out the provider between tests to get rid of flakiness
* reduce chance of race condition from node_launcher launching a node in the middle of an autoscaler.update call