MQA is an interactive Multi-modal Query Answering system, powered by MUST and the latest LLMs. It comprises five core components: Data Preprocessing, Vector Representation, Index Construction, Query Execution, and Answer Generation. A dedicated coordinator orchestrates them to ensure smooth data flow from input to answer generation.
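To make the coordinator idea concrete, here is a minimal sketch of five stages run in order over a shared state. It is an illustration only, not MQA's actual code: the `Coordinator` class, the `make_stage` helper, and the stage keys are hypothetical names, and each stage is a placeholder that only records that it ran.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical stage signature: take the running state, return the updated state.
Stage = Callable[[dict[str, Any]], dict[str, Any]]

# Stage names mirror the five components listed above.
PIPELINE = [
    "data_preprocessing",
    "vector_representation",
    "index_construction",
    "query_execution",
    "answer_generation",
]


@dataclass
class Coordinator:
    """Runs registered stages in order, passing one state dict along."""

    stages: list[tuple[str, Stage]] = field(default_factory=list)

    def register(self, name: str, stage: Stage) -> None:
        self.stages.append((name, stage))

    def run(self, state: dict[str, Any]) -> dict[str, Any]:
        for name, stage in self.stages:
            state = stage(state)
            # Record the order in which stages ran.
            state.setdefault("trace", []).append(name)
        return state


def make_stage(key: str) -> Stage:
    """Build a placeholder stage; a real stage would do actual work here."""

    def stage(state: dict[str, Any]) -> dict[str, Any]:
        return {**state, key: f"<{key} output>"}

    return stage


coordinator = Coordinator()
for name in PIPELINE:
    coordinator.register(name, make_stage(name))

result = coordinator.run({"query": "a ripe apple"})
print(result["trace"])
```

Keeping the stages behind one `run` call means a stage can be swapped or skipped without touching the others, which is the main appeal of a coordinator design.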
We use the Anaconda package manager to avoid dependency and reproducibility problems.
- Clone the repository:
    git clone https://github.com/ZJU-DAILY/MQA
- Install Python dependencies using conda:
    conda env create --name mqa --file environment.yml
    conda activate mqa
  Or install them manually:
    conda create -n mqa -y python=3.11.5
    conda activate mqa
    conda install flask=2.2.5
    conda install tqdm=4.65.0
    conda install -y -c pytorch pytorch=2.1.2 torchvision=0.16.2
    pip install openai==1.14.0
    pip install git+https://gitclone.com/github.com/openai/CLIP.git
    pip install pillow==10.2.0
- Compile the C++ code for indexing and searching:
    cd ./indexing_and_search
    git clone https://github.com/ChunelFeng/CGraph.git
    cmake build
    make
- Install npm and Node.js (version 18.0 or later).
- To use OpenAI's LLMs, set up your API key first.
- Launch the Flask server as the backend:
    python app.py
- Launch the frontend in another terminal:
    cd ./frontend
    npm install
    npm run dev
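Before launching, it can help to confirm the prerequisites from the steps above. The following is a minimal preflight sketch, not part of MQA. It assumes the API key is supplied through the `OPENAI_API_KEY` environment variable, which is the variable the `openai` Python client reads by default; MQA itself may expect the key elsewhere. The `preflight` and `parse_node_major` helpers are hypothetical names.

```python
import os
import re
import shutil
import subprocess

MIN_NODE_MAJOR = 18  # minimum Node.js major version from the install steps


def parse_node_major(version_text: str) -> int:
    """Extract the major version from output such as 'v18.17.1'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version: {version_text!r}")
    return int(match.group(1))


def preflight(env=None) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    env = os.environ if env is None else env
    problems = []

    # Assumption: the key is provided via OPENAI_API_KEY.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    node = shutil.which("node")
    if node is None:
        problems.append("node was not found on PATH")
    else:
        out = subprocess.run(
            [node, "--version"], capture_output=True, text=True
        ).stdout
        try:
            if parse_node_major(out) < MIN_NODE_MAJOR:
                problems.append(f"Node.js {out.strip()} is older than 18")
        except ValueError as exc:
            problems.append(str(exc))

    if shutil.which("npm") is None:
        problems.append("npm was not found on PATH")

    return problems


for problem in preflight():
    print("problem:", problem)
```

Run it in the same shell you will use for `python app.py`, so that it sees the same environment variables and `PATH`.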
To properly work with the MIT-States dataset, the following structure is required:
MQA_base_path
├─dataset
│  ├─base
│  ├─meta
│  ├─MitStates
│  │  └─images
│  │     ├─adj aluminum
│  │     ├─adj animal
│  │     ├─adj apple
│  │     ├─...
We provide the pre-processed data via Google Drive in case you lack GPU resources or simply want to save time. Download the files and move them to /dataset/base and /dataset/meta, as shown in the directory tree above. If you skip the download, MQA creates these files itself during use.
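A quick way to confirm the layout is to check for the required directories. This is a minimal sketch, not part of MQA: `missing_dirs` is a hypothetical helper, and the relative paths are read off the directory tree above.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
REQUIRED_DIRS = [
    Path("dataset/base"),
    Path("dataset/meta"),
    Path("dataset/MitStates/images"),
]


def missing_dirs(base_path) -> list[Path]:
    """Return the required directories that do not exist under base_path."""
    base = Path(base_path)
    return [rel for rel in REQUIRED_DIRS if not (base / rel).is_dir()]


# Check the current directory; pass your MQA base path instead if needed.
for rel in missing_dirs("."):
    print(f"missing: {rel}")
```

An empty result only shows that the directories exist; it does not verify that the pre-processed files or images inside them are complete.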
@manual{MQA,
author = {Mengzhao Wang and Haotian Wu and Xiangyu Ke and Yunjun Gao and Xiaoliang Xu and Lu Chen},
title = {An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models},
url = {https://github.com/ZJU-DAILY/MQA},
year = {2024}
}
@inproceedings{MUST_ICDE24,
title={{MUST}: An Effective and Scalable Framework for Multimodal Search of Target Modality},
author={Mengzhao Wang and Xiangyu Ke and Xiaoliang Xu and Lu Chen and Yunjun Gao and Pinpin Huang and Runkai Zhu},
booktitle={IEEE International Conference on Data Engineering (ICDE)},
year={2024}
}
- CGraph: A cross-platform Directed Acyclic Graph framework based on pure C++ without any 3rd-party dependencies.