
Add object detection pipeline #243

Open · RUFFY-369 wants to merge 26 commits into main

Conversation


@RUFFY-369 commented Oct 28, 2024

What does this PR do?

Towards closing livepeer/bounties#61

This PR adds a new object-detection pipeline for a real real-time 😉 detection and tracking with reduced latency and high accuracy, tested locally with the PekingU/rtdetr_r50vd model. After building the Docker image, the setup was tested by running it locally on a Uvicorn server.
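For context, single-frame inference with that checkpoint typically looks like the following with Hugging Face `transformers` (a minimal sketch, not necessarily the exact pipeline code in this PR; `frame.png` stands in for one decoded video frame):

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

processor = AutoImageProcessor.from_pretrained("PekingU/rtdetr_r50vd")
model = AutoModelForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd")

image = Image.open("frame.png")  # placeholder: one decoded video frame
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes into labeled detections above a confidence threshold.
results = processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.5
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 2), box.tolist())
```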

Expected Results:

Original video:

[video: football]

Expected merged pipeline output:

[video: football_out]

Bonus expected merged pipeline output for a high-speed chase 😉 🥂:

[video: chase]

cc @rickstaa

@RUFFY-369 RUFFY-369 marked this pull request as draft October 28, 2024 12:30
@RUFFY-369 RUFFY-369 marked this pull request as ready for review October 29, 2024 16:02
@ad-astra-video (Collaborator) commented Nov 26, 2024

I guess my initial comments did not post here! Not sure what happened, sorry about that.

Overall this PR looks good, but I'd like to adjust a few things to return more data that may assist with analysis of the detected objects and to speed up the processing.

1. The conversion of frames to base64-encoded strings takes a long time. The timing results from the test video posted in the go-livepeer ObjectDetection PR are below. I think it would be better to return a base64-encoded video file from the ai-runner, which the ai-worker (likely on the same machine, so bandwidth delays should be negligible) converts to a binary URL download when it posts the results back to the Orchestrator.

   ```
   2024-11-26 21:58:04,036 - app.routes.object_detection - INFO - Decoded video in 3.14 seconds
   2024-11-26 21:58:18,198 - app.routes.object_detection - INFO - Detections processed in 14.16 seconds
   2024-11-26 22:00:07,278 - app.routes.object_detection - INFO - Annotated frames converted to data URLs in 109.08 seconds, frame count: 266
   2024-11-26 22:00:08,900 INFO:     172.17.0.1:52420 - "POST /object-detection HTTP/1.1" 200 OK
   ```
2. Please add a parameter to the request that makes returning the video encoded with the annotated frames optional. Annotating the frames and re-encoding takes the most time, so I think it should be optional, with the default being not to return the annotated frames. Naming can be something like `return_annotated_video`, accepting true/false.

3. Add the detection boxes and the frame pts to the data returned (I think the pts is available in `frame.pts` when decoding).

4. Why were the confidence scores and detection labels returned in separate lists instead of a list of detections like the example below? I am not sure what value this provides other than just counting the objects detected in the segment. EDIT: this is kind of vague, apologies. I was curious why separate lists were chosen instead of one list, and the second sentence was driving at the fact that without time information on the detections it is not possible to know where the detections happened in a segment that can be 2-30+ seconds long.

"detections": [
     {
        "label": "ball",
        "confidence": .95
     },
     {
        "label": "person",
        "confidence": .98
     }
     ...
]
5. I had some test videos fail, which I think may be PyAV related. I will test again tomorrow morning and add notes in a separate comment. Found in testing that adding `mode="r"` to `av.open` fixed my test videos. (A sketch of how points 1-3 and 5 could fit together follows this list.)
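For reference, a minimal sketch (not the PR's actual route) of how points 1-3 and 5 could fit together in the FastAPI runner; `run_detector` and `encode_annotated_video` are hypothetical stand-ins for the RT-DETR inference and the annotate-and-re-encode steps:

```python
import base64

import av
from fastapi import FastAPI, Form, UploadFile

app = FastAPI()

@app.post("/object-detection")
async def object_detection(
    video: UploadFile,
    return_annotated_video: bool = Form(False),  # point 2: off by default
):
    frames = []
    with av.open(video.file, mode="r") as container:  # point 5: explicit read mode
        for frame in container.decode(video=0):
            detections = run_detector(frame.to_image())  # hypothetical RT-DETR call
            frames.append({
                "pts": frame.pts,          # point 3: per-frame timestamp
                "detections": detections,  # [{"label", "confidence", "box"}, ...]
            })
    response = {"frames": frames}
    if return_annotated_video:
        # Point 1: one base64-encoded video file instead of per-frame data URLs.
        annotated = encode_annotated_video(frames)  # hypothetical, returns bytes
        response["annotated_video"] = base64.b64encode(annotated).decode("ascii")
    return response
```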

@RUFFY-369 (Author)

> I guess my initial comments did not post here! Not sure what happened, sorry about that.
>
> Overall this PR looks good, but I'd like to adjust a few things to return more data that may assist with analysis of the detected objects and to speed up the processing.
>
> 1. The conversion of frames to base64-encoded strings takes a long time. The timing results from the test video posted in the go-livepeer ObjectDetection PR are below. I think it would be better to return a base64-encoded video file from the ai-runner, which the ai-worker (likely on the same machine, so bandwidth delays should be negligible) converts to a binary URL download when it posts the results back to the Orchestrator.
>
>    ```
>    2024-11-26 21:58:04,036 - app.routes.object_detection - INFO - Decoded video in 3.14 seconds
>    2024-11-26 21:58:18,198 - app.routes.object_detection - INFO - Detections processed in 14.16 seconds
>    2024-11-26 22:00:07,278 - app.routes.object_detection - INFO - Annotated frames converted to data URLs in 109.08 seconds, frame count: 266
>    2024-11-26 22:00:08,900 INFO:     172.17.0.1:52420 - "POST /object-detection HTTP/1.1" 200 OK
>    ```
>
> 2. Please add a parameter to the request that makes returning the video encoded with the annotated frames optional. Annotating the frames and re-encoding takes the most time, so I think it should be optional, with the default being not to return the annotated frames. Naming can be something like `return_annotated_video`, accepting true/false.
> 3. Add the detection boxes and the frame pts to the data returned (I think the pts is available in `frame.pts` when decoding).

For points 1-3, I have pushed the commits.

> 4. Why were the confidence scores and detection labels returned in separate lists instead of a list of detections like the example below? I am not sure what value this provides other than just counting the objects detected in the segment. EDIT: this is kind of vague, apologies. I was curious why separate lists were chosen instead of one list, and the second sentence was driving at the fact that without time information on the detections it is not possible to know where the detections happened in a segment that can be 2-30+ seconds long.
>
>    ```
>    "detections": [
>        {
>            "label": "ball",
>            "confidence": 0.95
>        },
>        {
>            "label": "person",
>            "confidence": 0.98
>        },
>        ...
>    ]
>    ```

Regarding point no. 4, you are correct that it will be difficult to associate individual detections with their counterpart labels and bounding boxes, but I think returning them as individual lists may make batch processing simpler for users. It may also reduce the JSON payload overhead and memory consumption, since the data structure is simple lists rather than repeated key-value pairs.

What do you think @ad-astra-video?
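For what it's worth, converting between the two shapes is cheap on either side, so whichever format is returned, the other is one comprehension away (values below are illustrative):

```python
# Parallel lists (the current PR's shape), illustrative values
labels = ["ball", "person"]
confidences = [0.95, 0.98]

# Parallel lists -> one list of detection objects
detections = [{"label": l, "confidence": c} for l, c in zip(labels, confidences)]

# One list of detection objects -> parallel lists
labels_back = [d["label"] for d in detections]
confidences_back = [d["confidence"] for d in detections]
```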

> 5. I had some test videos fail, which I think may be PyAV related. I will test again tomorrow morning and add notes in a separate comment. Found in testing that adding `mode="r"` to `av.open` fixed my test videos.

I added `mode="r"` in this code as well in the latest commit, to avoid possible failures.
