Merge branch 'main' of https://github.com/rhymes-ai/Aria
Showing 22 changed files with 3,292 additions and 29 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,3 @@ | ||
[settings] | ||
profile=black | ||
profile=black | ||
skip=datasets |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,17 @@ | ||
***This document provides examples to fine-tune Aria on three different datasets: single-image data, multi-image data and video data.*** | ||
|
||
# Single-Image SFT | ||
# Fine-tune on single-image dataset | ||
We use a 30k subset of the [RefCOCO dataset](https://arxiv.org/pdf/1608.00272) as an example. | ||
RefCOCO is a visual grounding task. Given an image and a description of the reference object as input, the model is expected to output the corresponding bounding box. For a given bounding box, we normalize its coordinates to `[0,1000)` and format it as "(x1,y1), (x2,y2)". Please refer to [RefCOCO_Example](./refcoco/README.md) for more details! | ||
|
||
|
||
|
||
# Multi-Image SFT | ||
# Fine-tune on multi-image dataset | ||
We use the [NLVR2 dataset](https://arxiv.org/abs/1811.00491) as an example. | ||
NLVR2 (Natural Language for Visual Reasoning) is a task in which, given two images, the model must determine whether a claim is true by answering yes or no. Please refer to [NLVR2_Example](./nlvr2/README.md) for details! | ||
|
||
|
||
# Video SFT | ||
# Fine-tune on video dataset | ||
We use the [NextQA dataset](https://arxiv.org/abs/2105.08276) as an example. | ||
NextQA requires the model to select an answer from several options according to the video input and question. The model is expected to output the correct option's character. Please refer to [NextQA_Example](./nextqa/README.md) for details! | ||
|
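
The bounding-box convention described in the single-image section of the README above (coordinates normalized to `[0,1000)` and written as "(x1,y1), (x2,y2)") can be sketched as follows. This is a minimal illustration, not the repository's preprocessing code: the function name `format_bbox`, the pixel-space `(x1, y1, x2, y2)` input layout, the truncating rounding, and the clamping of edge values to 999 are all assumptions.

```python
def format_bbox(box, width, height, scale=1000):
    """Format a pixel-space box as "(x1,y1), (x2,y2)" with values in [0, scale).

    Hypothetical helper: `box` is assumed to be (x1, y1, x2, y2) in pixels,
    and `width`/`height` are the image dimensions in pixels.
    """
    x1, y1, x2, y2 = box

    def norm(value, size):
        # Scale to [0, scale], truncate to an integer, then clamp so that a
        # coordinate on the far image edge maps to scale - 1 rather than scale.
        scaled = int(value * scale / size)
        return min(max(scaled, 0), scale - 1)

    nx1, ny1 = norm(x1, width), norm(y1, height)
    nx2, ny2 = norm(x2, width), norm(y2, height)
    return f"({nx1},{ny1}), ({nx2},{ny2})"


if __name__ == "__main__":
    # A box covering the region from (64, 48) to (320, 240) in a 640x480 image.
    print(format_bbox((64, 48, 320, 240), 640, 480))
```

Clamping is one way to honor the half-open interval `[0,1000)`; the actual dataset scripts may round or clip differently, so check [RefCOCO_Example](./refcoco/README.md) before relying on exact values.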
580 changes: 580 additions & 0 deletions
inference/notebooks/01_single_image_understanding.ipynb
Large diffs are not rendered by default.