ACM Multimedia2020 University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization 🚁 annotates 1652 buildings in 72 universities around the world.

layumi/University1652-Baseline


University1652-Baseline

Python 3.6+ License: MIT

VideoDemo

[Paper] [Slide] [Explore Drone-view Data] [Explore Satellite-view Data] [Explore Street-view Data] [Video Sample] [Chinese Introduction] [Building Name List] [Latitude and Longitude] [Flight Path]

Download [University-1652] upon request (I usually reply within 5 minutes). You may use the request template.

This repository contains the dataset link and the code for our paper University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization, ACM Multimedia 2020. The official paper link is https://dl.acm.org/doi/10.1145/3394171.3413896. We collected 1652 buildings from 72 universities around the world. Thank you for your kind attention.

Task 1: Drone-view target localization. (Drone -> Satellite) Given one drone-view image or video, the task aims to find the most similar satellite-view image to localize the target building in the satellite view.

Task 2: Drone navigation. (Satellite -> Drone) Given one satellite-view image, the drone intends to find the most relevant place (drone-view images) that it has passed by. According to its flight history, the drone could be navigated back to the target place.
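Both tasks reduce to cross-view retrieval: embed the query and every gallery image, then rank the gallery by similarity. The following is a minimal sketch of that ranking step, assuming feature vectors have already been extracted and that cosine similarity is the comparison measure. The function name `rank_gallery` and the toy vectors are illustrative and are not taken from this repository.

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Return gallery indices sorted from most to least similar to the query."""
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    scores = g @ q
    # Negate the scores so that argsort yields descending similarity.
    return np.argsort(-scores)

# Toy example: one query (e.g. a drone-view feature) against three
# gallery features (e.g. satellite-view features). Values are made up.
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.9, 0.1])
order = rank_gallery(query, gallery)
print(order)  # [0 2 1]
```

For Task 1 the query would be a drone-view feature and the gallery satellite-view features; for Task 2 the roles are swapped.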

1. ACM MM UAVM Workshop

23 Apr 2024 We will hold the 2nd workshop at ACM MM 2024. Please see https://www.zdzheng.xyz/ACMMM2024Workshop-UAV/ for details.

2. ACM ICMR Workshop

12 Jan 2024 We are holding a workshop at ACM ICMR 2024 on Multimedia Object Re-ID. You are welcome to share your insights. See you in Phuket, Thailand!😃 The workshop link is https://www.zdzheng.xyz/MORE2024/ . The submission deadline is 15 April 2024.

2023 Workshop and Special Session

1. IEEE ITSC Special Session

We host a special session at the IEEE Intelligent Transportation Systems Conference (ITSC), covering object re-identification and point clouds. The paper deadline is May 15, 2023, and notifications are sent on June 30, 2023. Please select the session code "w7r4a" during submission. More details can be found at the Special Session Website.

2. Remote Sensing Special Issue

We are organizing a special issue of Remote Sensing (IF=5.3), open until 16 Dec 2023 (extended from 16 June 2023). You are welcome to submit your manuscript at https://www.mdpi.com/journal/remotesensing/special_issues/EMPK490239, but please keep the open-access fee in mind.

3. ACM Multimedia Workshop

We are holding a workshop at ACM Multimedia 2023 on Aerial-view Imaging. Call for papers / Chinese introduction

4. CodaLab Challenge

We also provide a challenging cross-view geo-localization dataset, called University160k, and workshop attendees may consider participating in the competition. The motivation is to simulate the real-world geo-localization scenario, in which we usually face an extremely large satellite-view pool. In particular, University160k extends the current University-1652 dataset with an extra 167,486 satellite-view gallery distractors. We have released University160k on the challenge page and set up a public leaderboard. (More details are at https://codalab.lisn.upsaclay.fr/competitions/12672)

About Dataset

The dataset split is as follows:

Split              #imgs    #buildings  #universities
Training           50,218   701         33
Query_drone        37,855   701         39
Query_satellite    701      701         39
Query_ground       2,579    701         39
Gallery_drone      51,355   951         39
Gallery_satellite  951      951         39
Gallery_ground     2,921    793         39

More detailed file structure:

├── University-1652/
│   ├── readme.txt
│   ├── train/
│       ├── drone/                   /* drone-view training images
│           ├── 0001
│           ├── 0002
│           ...
│       ├── street/                  /* street-view training images
│       ├── satellite/               /* satellite-view training images
│       ├── google/                  /* noisy street-view training images (collected from Google Image)
│   ├── test/
│       ├── query_drone/  
│       ├── gallery_drone/  
│       ├── query_street/  
│       ├── gallery_street/ 
│       ├── query_satellite/  
│       ├── gallery_satellite/ 
│       ├── 4K_drone/

Note that the 33 universities in the training set and the 39 universities in the test set do not overlap.
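Given the layout above, a quick sanity check after download is to count the images in each building folder of a view. The sketch below assumes every view directory (for example `train/drone/`) holds one subfolder per building ID, as shown for `drone/`, and that images use common suffixes. The helper names `count_images` and `empty_buildings` and the suffix set are illustrative assumptions, not part of this repository.

```python
from pathlib import Path

# Assumed image suffixes; adjust to whatever the downloaded dataset contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def count_images(view_dir):
    """Map each building folder name (e.g. '0001') to its number of image files."""
    counts = {}
    for building in sorted(Path(view_dir).iterdir()):
        if building.is_dir():
            counts[building.name] = sum(
                1 for f in building.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts

def empty_buildings(view_dir):
    """List building folders that contain no images at all."""
    return [name for name, n in count_images(view_dir).items() if n == 0]

# Example usage (the path is hypothetical):
#   counts = count_images("University-1652/train/drone")
#   print(len(counts), "buildings,", sum(counts.values()), "images")
#   print("empty folders:", empty_buildings("University-1652/train/google"))
```

Listing empty folders can be useful for views such as `google/`, where a building may have no images.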

News

26 Nov 2024 Drone to BEV? You may check our new paper "Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization" at https://arxiv.org/abs/2411.13610.

2 Jul 2024 Text-guided Geo-localization is accepted by ECCV 2024. Code is available.

26 Jan 2023 The 1652 Building Name List is available here.

10 Jul 2022 Rainy? Night? Foggy? Snowy? You may check our new paper "Multiple-environment Self-adaptive Network for Aerial-view Geo-localization" at https://github.com/wtyhub/MuseNet (accepted by Pattern Recognition'24)

1 Dec 2021 Fixed an issue caused by the latest torchvision, which does not allow empty subfolders. Note that some buildings do not have google images.

3 March 2021 GeM Pooling is added. Enable it with --pool gem.

21 January 2021 The GPU-Re-Ranking, a GNN-based real-time post-processing code, is available here.

21 August 2020 The transfer learning code for Oxford and Paris is available here.

27 July 2020 The metadata of the 1652 buildings, such as latitude and longitude, is now available on Google Drive. (You can open the kml file with Google Earth Pro, or inspect the values with vim.)
We also provide the spiral flight tour file on Google Drive. (You can open the kml file in Google Earth Pro to enable the flight camera.)

26 July 2020 The paper is accepted by ACM Multimedia 2020.

12 July 2020 I made the triplet-loss baseline (with soft margin) on University-1652 publicly available here.

12 March 2020 I added the state-of-the-art page for geo-localization and a tutorial, which will be updated soon.
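For readers unfamiliar with the GeM pooling mentioned in the 3 March 2021 entry, the following sketch shows the general idea of generalized-mean pooling in NumPy. It is not the repository's implementation: the function name `gem_pool`, the default exponent `p=3.0`, and the clamping constant `eps` are assumptions chosen for illustration.

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling over the spatial axes of a (C, H, W) feature map.

    p = 1 gives average pooling; larger p moves the result toward max pooling.
    """
    # Clamp to a small positive value so the fractional power is well defined.
    x = np.clip(feature_map, eps, None)
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)

# One channel with a 2x2 spatial grid (made-up values).
fmap = np.array([[[1.0, 2.0], [3.0, 4.0]]])
avg_like = gem_pool(fmap, p=1.0)   # equals the plain average, 2.5
gem3 = gem_pool(fmap, p=3.0)       # lies between the average and the maximum
```

In deep-learning frameworks the exponent is often a learnable parameter; here it is a fixed argument to keep the example short.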

Code Features

We currently support:

  • Float16 to save GPU memory based on apex
  • Multiple Query Evaluation
  • Re-Ranking
  • Random Erasing
  • ResNet/VGG-16
  • Visualize Training Curves
  • Visualize Ranking Result
  • Linear Warm-up
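As an illustration of the "Linear Warm-up" item above, a warm-up schedule ramps a scale factor from a small starting value to 1.0 over the first training iterations. The sketch below is a generic version under stated assumptions: the name `warmup_factor`, the starting value `0.1`, and whether the factor multiplies the learning rate or the loss are all hypothetical and may differ from what train.py does.

```python
def warmup_factor(iteration, warmup_iters, start=0.1):
    """Scale factor rising linearly from `start` to 1.0 over `warmup_iters` steps."""
    if warmup_iters <= 0 or iteration >= warmup_iters:
        return 1.0
    return start + (1.0 - start) * iteration / warmup_iters

# Example: apply the factor to a base learning rate during the first 10 steps.
base_lr = 0.01
schedule = [base_lr * warmup_factor(i, 10) for i in range(12)]
```

After `warmup_iters` steps the factor stays at 1.0, so the regular schedule takes over unchanged.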

Prerequisites

  • Python 3.6+
  • GPU Memory >= 8G
  • Numpy > 1.12.1
  • Pytorch 0.3+
  • [Optional] apex (for float16)

Getting started

Installation

pip install -r requirement.txt
  • [Optional] You may skip this step. Install apex from source:
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext
  • [Optional] Torchvision usually comes with PyTorch. To install it from source, please check its README; installing it directly via Anaconda also works.
git clone https://github.com/pytorch/vision # Please check the version to match Pytorch.
cd vision
python setup.py install

Dataset & Preparation

Download [University-1652] upon request. You may use the request template.

Or download CVUSA / CVACT.

For CVUSA, I follow the training/test split in (https://github.com/Liumouliu/OriCNN).

Train & Evaluation

Train & Evaluation University-1652

python train.py --name three_view_long_share_d0.75_256_s1_google  --extra --views 3  --droprate 0.75  --share  --stride 1 --h 256  --w 256 --fp16; 
python test.py --name three_view_long_share_d0.75_256_s1_google

Default setting: Drone -> Satellite. If you want to try other evaluation settings, you may change these lines: https://github.com/layumi/University1652-Baseline/blob/master/test.py#L217-L225
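Retrieval results of this kind are commonly scored with Recall@K and average precision (AP). The sketch below shows one straightforward way to compute both from a ranked list of gallery labels. The function names are illustrative, and the repository's own evaluation script may compute AP differently (for example with interpolation), so treat this as an explanation of the metrics rather than a replacement for it.

```python
import numpy as np

def recall_at_k(ranked_labels, query_label, k):
    """1.0 if a true match appears among the first k ranked gallery items, else 0.0."""
    return float(query_label in list(ranked_labels[:k]))

def average_precision(ranked_labels, query_label):
    """Mean of the precision values measured at the rank of each true match."""
    hits = np.asarray(ranked_labels) == query_label
    if not hits.any():
        return 0.0
    ranks = np.flatnonzero(hits) + 1                  # 1-based ranks of the matches
    precisions = np.arange(1, len(ranks) + 1) / ranks
    return float(precisions.mean())

# Toy example: gallery building labels in ranked order for a query of building 7.
ranked = [3, 7, 7, 1]
r1 = recall_at_k(ranked, 7, 1)      # 0.0, the top-1 result is building 3
r2 = recall_at_k(ranked, 7, 2)      # 1.0, a match appears at rank 2
ap = average_precision(ranked, 7)   # (1/2 + 2/3) / 2
```

Averaging these values over all queries gives the dataset-level Recall@K and mean AP.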

Ablation Study: Satellite & Drone only

python train_no_street.py --name two_view_long_no_street_share_d0.75_256_s1  --share --views 3  --droprate 0.75  --stride 1 --h 256  --w 256  --fp16; 
python test.py --name two_view_long_no_street_share_d0.75_256_s1

This setting still uses three views (--views 3) but sets the loss weight on street images to zero.
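The idea of zeroing the street-view loss weight can be sketched as a weighted sum of per-view losses. This is a conceptual illustration only: the function `combine_view_losses`, the dictionary keys, and the numeric values are hypothetical and do not mirror the code in train_no_street.py.

```python
def combine_view_losses(losses, weights):
    """Weighted sum of per-view losses; a zero weight removes that view's contribution."""
    return sum(weights[view] * loss for view, loss in losses.items())

# Made-up per-view loss values for one training step.
losses = {"satellite": 0.8, "street": 1.2, "drone": 0.5}

three_view = combine_view_losses(losses, {"satellite": 1.0, "street": 1.0, "drone": 1.0})
no_street = combine_view_losses(losses, {"satellite": 1.0, "street": 0.0, "drone": 1.0})
```

With the street weight at zero, the street branch still runs in the forward pass but contributes nothing to the total loss.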

Train & Evaluation CVUSA

python prepare_cvusa.py
python train_cvusa.py --name usa_vgg_noshare_warm5_lr2 --warm 5 --lr 0.02 --use_vgg16 --h 256 --w 256  --fp16 --batchsize 16;
python test_cvusa.py  --name usa_vgg_noshare_warm5_lr2 

Show the retrieved Top-10 result

python test.py --name three_view_long_share_d0.75_256_s1_google # after test
python demo.py --query_index 0 # which image you want to query in the query set 

It will save an image named `show.png` containing the top-10 retrieval results in the current folder.

Trained Model

You can download the trained model from GoogleDrive or OneDrive. After downloading, please put the model folders under ./model/.

Citation

The following paper uses and reports the result of the baseline model. You may cite it in your paper.

@article{zheng2020university,
  title={University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization},
  author={Zheng, Zhedong and Wei, Yunchao and Yang, Yi},
  journal={ACM Multimedia},
  year={2020}
}
@inproceedings{zheng2023uavm,
  title={UAVM'23: 2023 Workshop on UAVs in Multimedia: Capturing the World from a New Perspective},
  author={Zheng, Zhedong and Shi, Yujiao and Wang, Tingyu and Liu, Jun and Fang, Jianwu and Wei, Yunchao and Chua, Tat-seng},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  pages={9715--9717},
  year={2023}
}

Instance loss is defined in

@article{zheng2017dual,
  title={Dual-Path Convolutional Image-Text Embeddings with Instance Loss},
  author={Zheng, Zhedong and Zheng, Liang and Garrett, Michael and Yang, Yi and Xu, Mingliang and Shen, Yi-Dong},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
  doi={10.1145/3383184},
  volume={16},
  number={2},
  pages={1--23},
  year={2020},
  publisher={ACM New York, NY, USA}
}

Related Work

  • Instance Loss Code
  • Person re-ID from Different Viewpoints Code
  • Lending Orientation to Neural Networks for Cross-view Geo-localization Code
  • Predicting Ground-Level Scene Layout from Aerial Imagery Code