
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos

Abstract: We introduce LifelongMemory, a new framework for accessing long-form egocentric videographic memory through natural language question answering and retrieval. LifelongMemory generates concise descriptions of the camera wearer's activities and leverages the reasoning and contextual understanding capabilities of pretrained large language models to produce precise answers. A confidence and refinement module further improves the quality of the final answers. Our approach achieves state-of-the-art performance on the EgoSchema benchmark for question answering and is highly competitive on the natural language query (NLQ) challenge of Ego4D.

View Paper Website

Quick start

Captions (LaViLa captions for every 2-second video clip, plus a caption digest): Google Drive link

Ego4D NLQ

python scripts/llm_reason.py \
    --task NLQ \
    --annotation_path <the path to the official NLQ annotation file> \
    --caption_path  <the path to the csv containing captions from the egocentric videos, which contains 4 columns: cid, vid, timestamp, caption> \
    --output_path <the path to the csv containing the responses from the LLM> \
    --openai_model <gpt model name (check https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo)> \
    --openai_key <your OpenAI API key> 
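
The caption file is a plain CSV with the four columns listed above (cid, vid, timestamp, caption). Below is a minimal sketch for sanity-checking such a file before running the script, assuming pandas is available; the file name nlq_captions.csv is only an example and not part of the repository.

import pandas as pd

# Expected columns per the description above: cid, vid, timestamp, caption.
captions = pd.read_csv("nlq_captions.csv")

missing = {"cid", "vid", "timestamp", "caption"} - set(captions.columns)
if missing:
    raise ValueError(f"caption csv is missing columns: {missing}")

# Inspect the captions of one video in temporal order.
first_vid = captions["vid"].iloc[0]
print(captions[captions["vid"] == first_vid].sort_values("timestamp").head())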

If you are using Azure OpenAI:

python scripts/llm_reason.py \
    --task NLQ \
    --annotation_path <the path to the official NLQ annotation file> \
    --caption_path  <the path to the csv containing captions from the egocentric videos> \
    --output_path <the path to the csv containing the responses from the LLM> \
    --azure \
    --openai_endpoint  <your OpenAI endpoint e.g. https://xxxxxxxx.openai.azure.com/> \
    --openai_model <gpt model name> \
    --openai_key <your Azure OpenAI API key>
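
For reference, these Azure flags correspond to the standard Azure client of the openai Python package. The sketch below is illustrative only; llm_reason.py constructs the client for you, and the api_version value here is an assumption rather than something taken from this repository.

from openai import AzureOpenAI  # requires openai>=1.0

client = AzureOpenAI(
    azure_endpoint="https://xxxxxxxx.openai.azure.com/",  # --openai_endpoint
    api_key="<your Azure OpenAI API key>",                # --openai_key
    api_version="2024-02-01",                             # assumption: use the version of your deployment
)
response = client.chat.completions.create(
    model="<gpt model name>",  # --openai_model, i.e. your Azure deployment name
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)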

If you are using Vicuna, check its documentation on OpenAI-compatible RESTful APIs, start the local server, and point the script at that endpoint:

python scripts/llm_reason.py \
    --task NLQ \
    --annotation_path <the path to the official NLQ annotation file> \
    --caption_path  <the path to the csv containing captions from the egocentric videos> \
    --output_path <the path to the csv containing the responses from the LLM> \
    --openai_endpoint http://localhost:8000/v1 \
    --openai_model vicuna-7b-v1.5 
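
Note that --openai_key is omitted here because a local OpenAI-compatible server typically does not require one. Conceptually, this setup just points the OpenAI client at the local endpoint; a minimal sketch under that assumption (the placeholder key "EMPTY" is only a convention for local servers):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="vicuna-7b-v1.5",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)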

EgoSchema (Video QA)

python scripts/llm_reason.py \
    --task QA \
    --annotation_path <the path to the official EgoSchema question file> \
    --caption_path <the path to the csv containing captions from the egocentric videos, which contains 3 columns: q_uid, timestamp, caption> \
    --output_path <the path to the csv containing the responses from the LLM> \
    --openai_model <gpt model name (check https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo)> \
    --openai_key <your OpenAI API key>
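
For EgoSchema the caption file is keyed by q_uid rather than by a video id. A minimal sketch, assuming pandas and an illustrative file name, for checking how many 2-second clip captions are available per question:

import pandas as pd

# Expected columns per the description above: q_uid, timestamp, caption.
captions = pd.read_csv("egoschema_captions.csv")

# Number of per-clip captions available for each EgoSchema question.
coverage = captions.groupby("q_uid")["caption"].count()
print(coverage.describe())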
