Commit graph

405 commits

Author SHA1 Message Date
Simon Mo
feb8c29063
Revert "Revert "Revert "use an agent-id rather than the process PID (#24968)"… (#25376)" (#25669)
This reverts commit cb151d5ad6.
2022-06-13 09:22:52 -07:00
SangBin Cho
856bea31fb
[State Observability] Ray log CLI / API (#25481)
This PR implements the basic log APIs. Higher-level APIs (such as ray logs actors) will be implemented after the internal API review is done.

# If there's only 1 match, print a file content. Otherwise, print all files that match glob.
ray logs [glob_filter] --node-id=[head node by default]

Args:
    --tail: Tail the last X lines
    --follow: Follow the new logs
    --actor-id: The actor id
    --pid --node-ip: For worker logs
    --node-id: The node id of the log
    --interval: When --follow is specified, logs are printed with this interval. (should we remove it?)
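
The single-match rule in the comment above can be sketched in Python as follows. This is an illustration only: `resolve_log_request` is a hypothetical helper, not a function from the PR, and the actual CLI may match files differently.

```python
from fnmatch import fnmatch
from typing import List, Tuple


def resolve_log_request(available: List[str], glob_filter: str = "*") -> Tuple[str, List[str]]:
    """Return ("print", [file]) for exactly one match, otherwise ("list", matches)."""
    matches = sorted(name for name in available if fnmatch(name, glob_filter))
    if len(matches) == 1:
        # Exactly one file matched, so its content would be printed.
        return "print", matches
    # Zero or several matches: only the matching file names would be listed.
    return "list", matches
```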
2022-06-13 05:52:57 -07:00
mwtian
65d7a610ab
[Core] Push message to driver when a Raylet dies (#25516)
Currently, when Raylets die, it is hard to figure out:

- whether a Raylet died at all in a cluster. Usually we have to check the nodes where a number of workers died and see whether the Raylet has died as well.
- the reason for the Raylet's death.

With this PR, if a Raylet dies for a reason other than SIGTERM, the dashboard agent will report the failure along with the last 20 lines of the Raylet log.
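
A minimal sketch of the reporting rule described above, assuming the exit reason is available as a string. `raylet_failure_message` and the message format are hypothetical and not taken from the PR.

```python
from collections import deque
from typing import Iterable, Optional


def raylet_failure_message(exit_reason: str, log_lines: Iterable[str], count: int = 20) -> Optional[str]:
    """Build a failure report with the last `count` log lines, or None for SIGTERM."""
    if exit_reason == "SIGTERM":
        # A SIGTERM is treated as an intentional shutdown, so nothing is reported.
        return None
    # deque(maxlen=...) keeps only the final lines without storing the whole log.
    tail = deque(log_lines, maxlen=count)
    header = f"Raylet died ({exit_reason}). Last {len(tail)} log lines:"
    return "\n".join([header, *tail])
```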
2022-06-09 05:54:34 -07:00
shrekris-anyscale
f3c2bd6718
[Serve] Make REST API deployments inherit top-level runtime_env (#25502) 2022-06-08 15:58:00 -07:00
Archit Kulkarni
6d2806f951
[Jobs] [Test] Add integration tests to cover runtime_env inheritance with working_dir and with Tune (#25562)
The current inheritance behavior for runtime_envs enables the following workflow for Jobs:  A working_dir can be set in the Jobs API, and then inside the driver script, if a new per-task runtime_env is defined, it will automatically inherit the driver's working_dir.

There is an ongoing discussion about the best approach for runtime_env inheritance going forward: https://github.com/ray-project/ray/issues/25484, in which we noted that there were no tests covering this behavior.

This PR adds integration tests for the above behavior. If we ultimately decide to abandon the current inheritance behavior and instead have child runtime envs completely overwrite the parent runtime env, this test will fail, reminding us to do the following:

- Update the internal runtime_env usage in Ray Tune to use the `ray.get_runtime_context().runtime_env.update` API
- Update the documentation for Ray Jobs telling users to use `ray.get_runtime_context().runtime_env.update` and update this test
2022-06-08 13:54:06 -07:00
mwtian
1ce0ab7b7c
[Core] Export additional metrics for workers and Raylet memory (#25418)
Add visibility into the following to help Ray users and developers debug performance and OOM issues:

    Raylet memory usage broken down by USS vs remaining RSS.
    Total workers' count, CPU percentage usage, and memory usage.
2022-06-06 10:58:14 -07:00
SangBin Cho
00e3fd75f3
[State Observability] Ray log alpha API (#24964)
This PR implements the server side of ray log. It continues from #24068.

The PR supports two endpoints:

/api/v0/logs # list logs of the node id filtered by the given glob. 
/api/v0/logs/{[file | stream]}?filename&pid&actor_id&task_id&interval&lines # Stream the requested file log. The filename can be inferred by pid/actor_id/task_id
Some tests need to be rewritten; I will do it soon.

As a follow-up, there will be 2 PRs:

- PR to add the actual CLI
- PR to remove in-memory cached logs and query actor/worker logs on demand
2022-06-04 05:10:23 -07:00
SangBin Cho
54496d7705
[State Observability API] Support Filtering (#25281)
This PR adds filtering support. Filtering is done on the API server side (not on the source side). Source-side filtering is harder to implement elegantly, and we will handle it in the future (no optimization for alpha APIs).

We will also support limited types of columns for each API.

The API is as follows:

ray list [resources] --filter [key] [value] => keep only data where key==value.
In the future, we can also support more complex filtering such as !=, And, Or, etc.
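
The key==value semantics can be sketched as a server-side pass over already-fetched rows. `apply_filters` is a hypothetical helper; comparing values as strings is an assumption made here because CLI arguments arrive as strings.

```python
from typing import Any, Dict, List, Sequence, Tuple


def apply_filters(rows: List[Dict[str, Any]], filters: Sequence[Tuple[str, str]]) -> List[Dict[str, Any]]:
    """Keep rows where every (key, value) pair matches; values are compared as strings."""
    return [
        row
        for row in rows
        if all(key in row and str(row[key]) == value for key, value in filters)
    ]
```

With no filters the rows pass through unchanged, which matches the behavior one would expect from an optional flag.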
2022-06-03 17:17:30 -07:00
shrekris-anyscale
16bdfe6a39
Restore "[Serve] Deploy Serve deployment graphs via REST API" (#25073) (#25333) 2022-06-02 11:06:53 -07:00
SangBin Cho
cb151d5ad6
Revert "Revert "use an agent-id rather than the process PID (#24968)"… (#25376) 2022-06-01 16:28:48 -07:00
Simon Mo
61099faa58
[CI] Fix dashboard tests broken due to dep version upgrade (#25357) 2022-06-01 12:14:49 -07:00
Eric Liang
905258dbc1
Clean up docstyle in python modules and add LINT rule (#25272) 2022-06-01 11:27:54 -07:00
Eric Liang
517f78e2b8
[minor] Add a job submission hook by env var (#25343) 2022-06-01 11:15:43 -07:00
SangBin Cho
3385d19cbb
Revert "use an agent-id rather than the process PID (#24968)" (#25342)
This reverts commit 02f220b755.

## Why are these changes needed?

Looks like this commit makes `test_ray_shutdown` much flakier. cc @mattip for further investigation after the revert.
2022-06-01 05:35:30 -07:00
shrekris-anyscale
7754645c83
Revert "[Serve] Deploy Serve deployment graphs via REST API (#25073)" (#25330)
This reverts commit 47709b3300.
2022-05-31 15:37:55 -07:00
shrekris-anyscale
47709b3300
[Serve] Deploy Serve deployment graphs via REST API (#25073) 2022-05-31 10:57:08 -07:00
Matti Picus
02f220b755
use an agent-id rather than the process PID (#24968)
When using Ray inside a virtualenv on Windows, python.exe as reported by sys.executable is a PEP 397 launcher for the actual Python process, whose PID is reported by os.getpid():

>>> import sys, os, psutil
>>> print(sys.executable)
C:\temp\issue24361\Scripts\python.exe
>>> os.getpid()
2208
>>> child = psutil.Process(2208)
>>> child.cmdline()
['C:\\oss\\CPython38\\python.exe']
>>> child.parent().cmdline()
['C:\\temp\\issue24361\\Scripts\\python.exe']
>>> child.parent().pid
6424
When the agent_manager launches the agent process via Process::Process(), it gets the PID of the launcher process (6424), which is what is expected as an ID when registering the agent in the gRPC callback. But inside agent.py, the child process reports the PID via os.getpid(), which is 2208, and this is the wrong PID to register the agent.

The solution proposed here is another version of #24905: it creates an int agent_id = rand(); before starting the Python process and passes the agent_id to that process.
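
The idea can be sketched in Python: the parent picks an ID before launching the child, and the child reports that ID instead of its own PID. The `--agent-id` flag and `CHILD_CODE` stand-in are assumptions for illustration, not the actual agent.py interface.

```python
import random
import subprocess
import sys
from typing import Tuple

# Stand-in for agent.py: it reports the ID it was handed, never os.getpid().
CHILD_CODE = """
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--agent-id", type=int, required=True)
print(parser.parse_args().agent_id)
"""


def launch_agent() -> Tuple[int, int]:
    """Generate an ID up front, pass it to the child, and return (expected, reported)."""
    agent_id = random.randint(0, 2**31 - 1)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, "--agent-id", str(agent_id)],
        capture_output=True,
        text=True,
        check=True,
    )
    return agent_id, int(result.stdout.strip())
```

Because the ID is chosen by the parent, it stays the same no matter how many launcher processes sit between the parent and the real interpreter.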
2022-05-26 22:10:35 -07:00
mwtian
fa32cb7c40
Revert "[core] Resubscribe GCS in python when GCS restarts. (#24887)" (#25168)
This reverts commit 7cf4233858.
2022-05-24 18:13:40 -07:00
mwtian
f79b826f31
[Dashboard] avoid showing disk info when it is unavailable (#24992) 2022-05-24 17:13:47 -07:00
Philipp Moritz
323605d169
Support file:// for runtime_env working directories in jobs (#25062)
This makes it possible to use an NFS file system that is shared on a cluster for runtime_env working directories.

Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-05-24 16:17:18 -07:00
shrekris-anyscale
8b3451318c
[Serve] Update Serve status formatting and processing (#24839) 2022-05-24 11:07:41 -07:00
Edward Oakes
65d21b7ae6
[job submission] Handle env_vars: None case properly in supervisor runtime_env logic (#25087) 2022-05-24 11:01:19 -05:00
SangBin Cho
a7e759317b
[State Observability API] Error handling (#24413)
This improves error handling per https://docs.google.com/document/d/1IeEsJOiurg-zctOcBjY-tQVbsCmURFSnUCTkx_4a7Cw/edit#heading=h.pdzl9cil9e8z (the RPC part).

Semantics
If all queries to the source failed, raise a RayStateApiException.

If only some queries failed, warnings.warn the partial failure when print_api_stats=True. It is True for the CLI. It is False when used from the Python API or when json / yaml output is required.
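
These semantics can be sketched as follows. `StateQueryError` and `merge_results` are hypothetical stand-ins (the PR's own exception is RayStateApiException), and representing a failed query as `None` is an assumption made for the example.

```python
import warnings
from typing import Dict, List, Optional


class StateQueryError(Exception):
    """Hypothetical stand-in for the exception raised when every source fails."""


def merge_results(replies: Dict[str, Optional[List[dict]]], print_api_stats: bool) -> List[dict]:
    """`replies` maps a source name to its rows, or to None if that query failed."""
    failed = [source for source, rows in replies.items() if rows is None]
    if replies and len(failed) == len(replies):
        # Every query failed: there is nothing useful to return.
        raise StateQueryError(f"all {len(failed)} source queries failed")
    if failed and print_api_stats:
        # Partial failure: surface it only when stats printing is enabled.
        warnings.warn(f"{len(failed)}/{len(replies)} source queries failed: {failed}")
    return [row for rows in replies.values() if rows is not None for row in rows]
```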
2022-05-24 03:56:49 -07:00
Yi Cheng
7cf4233858
[core] Resubscribe GCS in python when GCS restarts. (#24887)
This is a follow-up PR to https://github.com/ray-project/ray/pull/24813 and https://github.com/ray-project/ray/pull/24628

Unlike the change in the cpp layer, where GCS broadcasts a request to raylet/core_worker and the client side then resubscribes, in the Python layer we detect the failure on the client side.

In case of a failure, the protocol is:

1. call subscribe
2. if resubscribing times out, throw an exception, which will crash the system. This is ok because when GCS has been down for longer than expected, we expect the ray cluster to be down.
3. continue to poll once subscribe succeeds.
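
The three steps above can be sketched as a polling loop. This is a simplified illustration: `subscribe` and `poll` are caller-supplied callables, and using `ConnectionError` to signal a GCS failure is an assumption, not the actual client API.

```python
import time
from typing import Any, Callable, List


def poll_with_resubscribe(
    subscribe: Callable[[], None],
    poll: Callable[[], Any],
    timeout_s: float,
    max_polls: int,
    retry_interval_s: float = 0.01,
) -> List[Any]:
    """Collect `max_polls` messages, resubscribing whenever a poll fails."""
    messages: List[Any] = []
    subscribe()
    while len(messages) < max_polls:
        try:
            messages.append(poll())
        except ConnectionError:
            deadline = time.monotonic() + timeout_s
            while True:
                try:
                    subscribe()  # step 1: call subscribe again
                    break  # step 3: subscribe succeeded, resume polling
                except ConnectionError:
                    if time.monotonic() >= deadline:
                        # step 2: give up; the caller is expected to crash
                        raise TimeoutError("resubscribe timed out")
                    time.sleep(retry_interval_s)
    return messages
```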

However, there is an extreme case where things might be broken: the client might miss detecting a failure.

This could happen if the long poll has returned and the Python layer is doing its own work, and GCS restarts and recovers before the next long poll is sent.

Here we are not going to take care of this case because:
1. usually GCS is going to take several seconds to be up and the python layer's work is simply pushing data into a queue (sync version). For the async version, it's only used in Dashboard which is not a critical component.
2. pubsub in python layer is not doing critical work: it handles logs/errors for ray job;
3. for the dashboard, it can just restart to fix the issue.


A known issue here is that we might miss logs in case of GCS failure due to the following reasons:

- py's pubsub is only doing best effort publishing. If it failed too many times, it'll skip publishing the message (lose messages from producer side)
- if message is pushed to GCS, but the worker hasn't done resubscription yet, the pushed message will be lost (lose messages from consumer side)

We think it's reasonable and valid behavior given that the logs are not defined to be a critical component and we'd like to simplify the design of pubsub in GCS.

Another concern is `run_functions_on_all_workers`. We plan to stop using it within ray core and deprecate it in the longer term. But it won't cause a problem for the current cases because:

1. It's only set in driver and we don't support creating a new driver when GCS is down.
2. When GCS is down, we don't support starting new ray workers.

And `run_functions_on_all_workers` is only used when we initialize driver/workers.
2022-05-23 13:06:33 -07:00
Archit Kulkarni
a67c8a0739
[runtime_env] Add temporary URI reference to prevent URI deletion before job starts (#24719)
Packages are uploaded to the GCS for `runtime_env`.  These packages are garbage collected when their refcount becomes zero.

The problem is the reference doesn't get incremented until the job starts, which happens after the package is uploaded.  It's possible for the package's refcount to go to zero in between the upload and when the job starts, causing the package to be deleted before it's needed by the job.  It's likely the cause of https://github.com/ray-project/ray/issues/23423.

We can't just increment the refcount at the time of upload, because if the script is killed before the job is started (e.g. via Ctrl-C) then the reference will never be decremented and the package will never be deleted.

The solution in this PR is to increment the refcount at the time of upload, but automatically decrement after a configurable timeout (default 30s).  This should be enough time for the job to start.  When the job starts, it increments the refcount as usual and decrements it when the job finishes or is killed.
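
A minimal sketch of the temporary-reference idea, using an in-process counter and `threading.Timer`. `UriRefCounter` and its method names are hypothetical; the real reference table lives in the GCS and is not shown here.

```python
import threading
from collections import defaultdict
from typing import Dict, List


class UriRefCounter:
    """Toy reference counter that records a URI as deleted when its count hits zero."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Dict[str, int] = defaultdict(int)
        self.deleted: List[str] = []

    def increment(self, uri: str) -> None:
        with self._lock:
            self._counts[uri] += 1

    def decrement(self, uri: str) -> None:
        with self._lock:
            self._counts[uri] -= 1
            if self._counts[uri] <= 0:
                del self._counts[uri]
                self.deleted.append(uri)  # stands in for garbage collection

    def add_temporary_reference(self, uri: str, timeout_s: float = 30.0) -> threading.Timer:
        """Hold a reference at upload time and release it automatically after timeout_s."""
        self.increment(uri)
        timer = threading.Timer(timeout_s, self.decrement, args=(uri,))
        timer.daemon = True
        timer.start()
        return timer
```

If the job starts within the timeout, its own reference keeps the package alive after the temporary one expires. If the script is killed before the job starts, the timer still fires and the package becomes collectable.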

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2022-05-23 10:25:04 -05:00
SangBin Cho
ec653e3196
[Nightly test] Move two line downloads to one line. (#25061)
This fixes the mysterious error where every cluster env build fails when pip uninstall / pip install is written on 2 lines. The root cause will be fixed later.
2022-05-22 00:07:03 -07:00
Edward Oakes
cb7bcbd651
[job submission] Fix address defaulting behavior (#24970)
Per the discussion in https://github.com/ray-project/ray/issues/24858:

- If an address without a port is provided, don't append a port.
- Default to `http://localhost:8265` if nothing is provided.
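
The two rules can be sketched as a small helper. `resolve_address` is a hypothetical name, not the function changed in the PR.

```python
from typing import Optional

DEFAULT_ADDRESS = "http://localhost:8265"


def resolve_address(address: Optional[str] = None) -> str:
    """Fall back to the default only when no address is given; never append a port."""
    if not address:
        return DEFAULT_ADDRESS
    # An explicit address is used exactly as provided, with or without a port.
    return address
```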
2022-05-20 14:10:36 -05:00
SangBin Cho
b9c30529d8
[Core/Observability 1/N] Add a "running" state to task status (#24651)
This PR adds 2 more states to TaskStatus:

enum TaskStatus {
  // The task is scheduled properly and waiting for execution.
  // It includes time to deliver the task to the remote worker + queueing time
  // from the execution side.
  WAITING_FOR_EXECUTION = 5;
  // The task that is running.
  RUNNING = 6;
}
2022-05-16 05:39:05 -07:00
Jiajun Yao
628f886af4
Don't show usage stats prompt in dashboard if prompt is disabled (#24700) 2022-05-12 07:55:28 -07:00
Qing Wang
259661042c
[runtime env] [java] Support jars in runtime env for Java (#24170)
This PR supports setting the jars for an actor in Ray API. The API looks like:
```java
class A {
    public boolean findClass(String className) {
      try {
        Class.forName(className);
      } catch (ClassNotFoundException e) {
        return false;
      }
      return true;
    }
}

RuntimeEnv runtimeEnv = new RuntimeEnv.Builder()
    .addJars(ImmutableList.of("https://github.com/ray-project/test_packages/raw/main/raw_resources/java-1.0-SNAPSHOT.jar"))
    .build();
ActorHandle<A> actor1 = Ray.actor(A::new).setRuntimeEnv(runtimeEnv).remote();
boolean ret = actor1.task(A::findClass, "io.testpackages.Foo").remote().get();
System.out.println(ret); // true
```
2022-05-12 09:34:40 +08:00
Jiajun Yao
1daad65568
[Doc] Add doc for usage stats collection (#24522) 2022-05-10 17:18:49 -07:00
Edward Oakes
4c1f27118a
[job submission] Don't set CUDA_VISIBLE_DEVICES in job driver (#24546)
Currently job drivers cannot use GPUs due to `CUDA_VISIBLE_DEVICES` being set (no resource request for job driver's supervisor actor). This is a regression from `ray submit`.

This is a temporary workaround -- in the future we should support a resource request for the job supervisor actor.
2022-05-10 11:43:04 -05:00
Kai Yang
4a999777fa
[Core] Allow accepting gRPC HTTP proxy via env variable (#23526) 2022-05-10 11:30:46 +08:00
Dmitri Gekhtman
6d09244a7e
[Dashboard][K8s] Add toggle to enable showing node disk usage on K8s (#24416)
https://github.com/ray-project/ray/pull/14676 disabled the disk usage/total display for Ray nodes on K8s, because Ray nodes on K8s are run as pods, which in general do not use up the entire machine.

However, in some situations, it is useful to run one Ray pod per K8s node and report the disk usage.

This PR adds a flag to enable displaying disk usage in those situations.
2022-05-03 10:58:05 -05:00
SangBin Cho
2bce07d4ce
[State API] List runtime env API (#24126)
This PR adds support for the list runtime env API.
2022-05-02 14:01:00 -07:00
Sihan Wang
59debac670
[Serve] Move deployment clean up under serve.run() api (#24306)
Currently, ServeHead talks to both the Serve API and the controller to handle deployment and clean up. This PR moves the deployment clean up logic into serve.run(), which keeps the code cleaner and makes future refactoring easier.
2022-05-02 12:10:11 -05:00
SangBin Cho
6f192b6e17
[Metrics] Allow to completely disable metrics collection (#24333)
This PR allows Ray to completely disable metrics collection. This was already possible with RAY_enable_metrics_collection, but it didn't fully disable collection because metrics collection in the agent wasn't properly disabled. This PR also adds tests.
2022-05-02 05:33:03 -07:00
Philipp Moritz
27917f570d
[runtime_env] Extend runtime_env hook to also cover jobs (#24328)
This extends https://github.com/ray-project/ray/pull/24036 to also cover job submission.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-04-30 09:15:51 -07:00
Archit Kulkarni
1b67e6a8ae
[Jobs] [Dashboard] Add job submission id as field to job snapshot (#24303)
Closes https://github.com/ray-project/ray/issues/24300

Adds a field to the job submission snapshot that matches the job name in the existing snapshot.  Before this PR, the job submission name was camelcased because all snapshot keys are automatically camelcased.  This PR allows jobs from the old job field to be linked to ones in the new job submission snapshot.
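
To illustrate why camelcased keys cannot be matched back to the original names, here is one snake_case to camelCase conversion that turns a submission ID such as raysubmit_9BSeJ1rTXQqEtXuP into the snapshot key raysubmit9Bsej1Rtxqqetxup. `to_camel_case` is a hypothetical sketch; the dashboard's own helper may be implemented differently.

```python
def to_camel_case(snake_str: str) -> str:
    """Join underscore-separated parts, title-casing every part after the first."""
    first, *rest = snake_str.split("_")
    # str.title() also changes the case of letters inside each part, so the
    # original mixed-case ID cannot be recovered from the converted key.
    return first + "".join(part.title() for part in rest)
```

The conversion is lossy, which is why a separate field carrying the unmodified ID is needed to link the two snapshots.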
2022-04-29 10:10:24 -05:00
Jiajun Yao
8fdde12e9e
Delay 1 minutes for the first usage stats report (#24291)
Delay the first report by 1 minute so that the system is likely set up and we can gather the information to report.
2022-04-28 22:53:33 -07:00
Archit Kulkarni
cc864401fb
[Dashboard] Add environment variable flag to skip dashboard log processing (#24263) 2022-04-27 15:33:08 -07:00
Archit Kulkarni
12b9383d52
[Jobs] Reenable test_backwards_compatibility using Ray 1.12 (#24124)
Closes https://github.com/ray-project/ray/issues/23258
2022-04-26 13:53:51 -05:00
Archit Kulkarni
27e7c284ee
[Jobs] Change jobs start_time end_time from seconds to ms for consistency (#24123)
In the snapshot, all timestamps are given in ms except for Jobs:

```
wget -q -O - http://127.0.0.1:8265/api/snapshot

{
   "result":true,
   "msg":"hello",
   "data":{
      "snapshot":{
         "jobs":{
            "01000000":{
               "status":null,
               "statusMessage":null,
               "isDead":false,
               "startTime":1650315791249,
               "endTime":0,
               "config":{
                  "namespace":"_ray_internal_dashboard",
                  "metadata":{
                     
                  },
                  "runtimeEnv":{
                     
                  }
               }
            }
         },
         "jobSubmission":{
            "raysubmit9Bsej1Rtxqqetxup":{
               "status":"SUCCEEDED",
               "message":"Job finished successfully.",
               "errorType":null,
               "startTime":1650315925,
               "endTime":1650315926,
               "metadata":{
                  "creatorId":"usr_f6tgCaaFBJC6tZz1ZVzzAVf4"
               },
               "runtimeEnv":{
                  "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
               },
               "entrypoint":"ls"
            },
            "raysubmitEibragqkyg16Hpcj":{
               "status":"SUCCEEDED",
               "message":"Job finished successfully.",
               "errorType":null,
               "startTime":1650316039,
               "endTime":1650316041,
               "metadata":{
                  "creatorId":"usr_f6tgCaaFBJC6tZz1ZVzzAVf4"
               },
               "runtimeEnv":{
                  "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
               },
               "entrypoint":"echo hi"
            },
            "raysubmitSh1U7Grdsbqrf6Je":{
               "status":"SUCCEEDED",
               "message":"Job finished successfully.",
               "errorType":null,
               "startTime":1650316354,
               "endTime":1650316355,
               "metadata":{
                  "creatorId":"usr_f6tgCaaFBJC6tZz1ZVzzAVf4"
               },
               "runtimeEnv":{
                  "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
               },
               "entrypoint":"echo hi"
            }
         },
         "actors":{
            "8c8e28e642ba2cfd0457d45e01000000":{
               "jobId":"01000000",
               "state":"DEAD",
               "name":"_ray_internal_job_actor_raysubmit_9BSeJ1rTXQqEtXuP",
               "namespace":"_ray_internal_dashboard",
               "runtimeEnv":{
                  "uris":{
                     "workingDirUri":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
                  },
                  "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
               },
               "startTime":1650315926620,
               "endTime":1650315927499,
               "isDetached":true,
               "resources":{
                  "node:172.31.73.39":0.001
               },
               "actorClass":"JobSupervisor",
               "currentWorkerId":"9628b5eb54e98353601413845fbca0a8c4e5379d1469ce95f3dfbace",
               "currentRayletId":"61ab3958258c82266b222f4691a53e71b6315e312408a21cb3350bc7",
               "ipAddress":"172.31.73.39",
               "port":10003,
               "metadata":{
                  
               }
            },
            "a7fd8354567129910c44298401000000":{
               "jobId":"01000000",
               "state":"DEAD",
               "name":"_ray_internal_job_actor_raysubmit_sh1u7grDsBQRf6je",
               "namespace":"_ray_internal_dashboard",
               "runtimeEnv":{
                  "uris":{
                     "workingDirUri":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
                  },
                  "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
               },
               "startTime":1650316355718,
               "endTime":1650316356620,
               "isDetached":true,
               "resources":{
                  "node:172.31.73.39":0.001
               },
               "actorClass":"JobSupervisor",
               "currentWorkerId":"f07fd7a393898bf7d9027a5de0b0f566bb64ae80c0fcbcc107185505",
               "currentRayletId":"61ab3958258c82266b222f4691a53e71b6315e312408a21cb3350bc7",
               "ipAddress":"172.31.73.39",
               "port":10005,
               "metadata":{
                  
               }
            },
            "19ca9ad190f47bae963592d601000000":{
               "jobId":"01000000",
               "state":"DEAD",
               "name":"_ray_internal_job_actor_raysubmit_eibRAGqKyG16HpCj",
               "namespace":"_ray_internal_dashboard",
               "runtimeEnv":{
                  "uris":{
                     "workingDirUri":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
                  },
                  "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
               },
               "startTime":1650316041089,
               "endTime":1650316041978,
               "isDetached":true,
               "resources":{
                  "node:172.31.73.39":0.001
               },
               "actorClass":"JobSupervisor",
               "currentWorkerId":"50b8e7e9a6981fe0270afd7f6387bc93788356822c9a664c2988f5ba",
               "currentRayletId":"61ab3958258c82266b222f4691a53e71b6315e312408a21cb3350bc7",
               "ipAddress":"172.31.73.39",
               "port":10004,
               "metadata":{
                  
               }
            }
         },
         "deployments":{
            
         },
         "sessionName":"session_2022-04-18_13-49-44_814862_139",
         "rayVersion":"1.12.0",
         "rayCommit":"f18fc31c7562990955556899090f8e8656b48d2d"
      }
   }
}
```

This PR fixes the inconsistency by changing Jobs start/end timestamps to ms.
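
The unit change amounts to the following conversion. `seconds_to_ms` is a hypothetical helper shown only to make the difference between the two timestamp styles in the snapshot concrete.

```python
def seconds_to_ms(timestamp_s: float) -> int:
    """Convert a Unix timestamp in seconds to integer milliseconds."""
    return int(timestamp_s * 1000)


# A jobSubmission startTime from the snapshot above, originally in seconds.
start_time_ms = seconds_to_ms(1650315925)
```

After conversion the value has the same 13-digit scale as the actor timestamps in the same snapshot.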
2022-04-26 08:37:41 -07:00
Jiajun Yao
3fb63847e2
Show usage stats prompt (#23822)
Show usage stats prompt when it's enabled.

The current UX is:

* The usage stats enabled or disabled message is shown every time in both terminal and dashboard.
* If users don't explicitly enable or disable usage stats, the first time they start a ray cluster interactively, they will be asked to confirm, and collection will be enabled if there is no user action within 10s. If it's non-interactive, collection is enabled by default without confirmation.
* ray.init() doesn't collect usage stats
* Usage stats can be disabled via three approaches: 1. RAY_USAGE_STATS_ENABLED env var, 2. ray xxx --disable-usage-stats, 3. ray disable-usage-stats
2022-04-25 16:01:24 -07:00
SangBin Cho
73ed67e9e6
[State API] State api limit + Removing unnecessary modules (#24098)
This PR does the following:

- Move all routes into the same module, state_head.py
- Support a limit feature.
2022-04-22 15:59:46 -07:00
SangBin Cho
30ab5458a7
[State Observability] Tasks and Objects API (#23912)
This PR implements ray list tasks and ray list objects APIs.

NOTE: You can ignore the merge conflict for now. It is because the first PR was reverted. There's a fix PR open now.
2022-04-21 18:45:03 -07:00
shrekris-anyscale
b51d0aa8b1
[serve] Introduce context.py and client.py (#24067)
Serve stores context state, including the `_INTERNAL_REPLICA_CONTEXT` and the `_global_client` in `api.py`. However, these data structures are referenced throughout the codebase, causing circular dependencies. This change introduces two new files:

* `context.py`
    * Intended to expose process-wide state to internal Serve code as well as `api.py`
    * Stores the `_INTERNAL_REPLICA_CONTEXT` and the `_global_client` global variables
* `client.py`
    * Stores the definition for the Serve `Client` object, now called the `ServeControllerClient`
2022-04-21 18:35:09 -05:00
jon-chuang
ddcc252b51
[Core] Ray logs API (1/n) (#23435)
Expose HTTP endpoint to retrieve logs from ray cluster
2022-04-20 23:11:02 -07:00
Chu Xiangyang
6f74040b15
[Job] Fix typo in job sdk docstring (#23940) 2022-04-20 12:30:32 -05:00
SangBin Cho
082baa2342
[Test] Fix test_log (#24004)
The test verifies that bytes 43~51 of the first line are "dashboard".

But due to a recent code addition to head.py, the line number where the log is written grew from 2 digits to 3 digits.

Previously,
2022-04-18 23:23:56,946	INFO head.py:[less than 100] -- Dashboard head grpc address: 127.0.0.1:57208
 
Now
2022-04-18 23:23:56,946	INFO head.py:101 -- Dashboard head grpc address: 127.0.0.1:57208
So we should increase the byte range.
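
The offset shift can be reproduced with a short sketch. It assumes the timestamp and log level are separated by a tab, uses 99 as a stand-in for any 2-digit line number, and compares case-insensitively; `make_line` and `keyword_span` are hypothetical helpers.

```python
from typing import Tuple


def make_line(line_number: int) -> str:
    """Build a log line shaped like the ones above for a given head.py line number."""
    return (
        f"2022-04-18 23:23:56,946\tINFO head.py:{line_number} -- "
        "Dashboard head grpc address: 127.0.0.1:57208"
    )


def keyword_span(line: str, keyword: str = "dashboard") -> Tuple[int, int]:
    """Return the inclusive (start, end) offsets of the keyword, ignoring case."""
    start = line.lower().find(keyword)
    return start, start + len(keyword) - 1


two_digit_span = keyword_span(make_line(99))     # (43, 51)
three_digit_span = keyword_span(make_line(101))  # (44, 52)
```

A fixed byte window of 43~51 therefore misses the final character once the line number reaches 3 digits.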
2022-04-19 04:59:30 -07:00