A PyTorch implementation of the paper: [UrbanGPT: Spatio-Temporal Large Language Models]
Zhonghang Li, Lianghao Xia, Jiabin Tang, Yong Xu, Lei Shi, Long Xia, Dawei Yin, Chao Huang* (*Correspondence)
Data Intelligence Lab@University of Hong Kong, South China University of Technology, Baidu Inc
• 🌐 Chinese Blog
This repository hosts the code, data, and model weights of UrbanGPT.
- 🚀🔥 [2024.05] 🎯🎯📢📢 Exciting News! We are thrilled to announce that our 🌟UrbanGPT🌟 has been accepted by KDD'2024! 🎉🎉🎉 Thanks to all the team members 🤗
🎯🎯📢📢 We have uploaded the models and data used in UrbanGPT to 🤗 Hugging Face. Please refer to the table below for details:
| 🤗 Hugging Face Address | 🎯 Description |
|---|---|
| https://huggingface.co/bjdwh/UrbanGPT | The UrbanGPT checkpoint, based on Vicuna-7B-v1.5-16k and tuned on the instruction data (train-data). |
| https://huggingface.co/datasets/bjdwh/ST_data_urbangpt | A portion of the instruction dataset, released for evaluation. |
| https://huggingface.co/datasets/bjdwh/UrbanGPT_ori_stdata | The original dataset used in UrbanGPT. |
- [2024.02.23] 🚀🚀 Release the code of UrbanGPT.
- [2024.02.29] Add video.
- [2024.03.05] Release the full paper.
- [2024.03.11] Upload the new checkpoint of our UrbanGPT.
- [2024.06.07] Release the instruction generation code and the original dataset used in UrbanGPT.
- Release baseline code.
- ...
In this work, we present a spatio-temporal large language model that generalizes well across a wide range of downstream urban tasks. To this end, we introduce UrbanGPT, which integrates a spatio-temporal dependency encoder with the instruction-tuning paradigm. This integration enables large language models (LLMs) to comprehend the complex inter-dependencies across time and space, supporting more comprehensive and accurate predictions under data scarcity. Extensive experiments highlight the potential of building LLMs for spatio-temporal learning, particularly in zero-shot scenarios.
Demo video: urbangpt_1.mp4
- 1. Code Structure
- 2. Environment
- 3. Training UrbanGPT
- 4. Evaluating UrbanGPT
- 5. Instructions Generation
1. Code Structure
.
|   README.md
|   urbangpt_eval.sh
|   urbangpt_train.sh
|
+---checkpoints
|   \---st_encoder
|           pretrain_stencoder.pth
|
+---playground
|   |   inspect_conv.py
|   |
|   +---test_embedding
|   |       README.md
|   |       test_classification.py
|   |       test_semantic_search.py
|   |       test_sentence_similarity.py
|   |
|   \---test_openai_api
|           anthropic_api.py
|           openai_api.py
|
+---tests
|       test_openai_curl.sh
|       test_openai_langchain.py
|       test_openai_sdk.py
|
\---urbangpt
    |   constants.py
    |   conversation.py
    |   utils.py
    |   __init__.py
    |
    +---eval
    |   |   run_urbangpt.py # evaluation
    |   |   run_vicuna.py
    |   |
    |   \---script
    |           run_model_qa.yaml
    |
    +---model
    |   |   apply_delta.py
    |   |   apply_lora.py
    |   |   builder.py
    |   |   compression.py
    |   |   convert_fp16.py
    |   |   make_delta.py
    |   |   model_adapter.py
    |   |   model_registry.py
    |   |   monkey_patch_non_inplace.py
    |   |   STLlama.py # model
    |   |   utils.py
    |   |   __init__.py
    |   |
    |   \---st_layers
    |           args.py
    |           ST_Encoder.conf
    |           ST_Encoder.py # ST-Encoder
    |           __init__.py
    |
    +---protocol
    |       openai_api_protocol.py
    |
    +---serve
    |   |   api_provider.py
    |   |   bard_worker.py
    |   |   cacheflow_worker.py
    |   |   cli.py
    |   |   controller.py
    |   |   controller_graph.py
    |   |   gradio_block_arena_anony.py
    |   |   gradio_block_arena_named.py
    |   |   gradio_css.py
    |   |   gradio_patch.py
    |   |   gradio_web_server.py
    |   |   gradio_web_server_graph.py
    |   |   gradio_web_server_multi.py
    |   |   huggingface_api.py
    |   |   inference.py
    |   |   model_worker.py
    |   |   model_worker_graph.py
    |   |   openai_api_server.py
    |   |   register_worker.py
    |   |   test_message.py
    |   |   test_throughput.py
    |   |   __init__.py
    |   |
    |   +---examples
    |   |       extreme_ironing.jpg
    |   |       waterview.jpg
    |   |
    |   +---gateway
    |   |       nginx.conf
    |   |       README.md
    |   |
    |   \---monitor
    |           basic_stats.py
    |           clean_battle_data.py
    |           elo_analysis.py
    |           hf_space_leaderboard_app.py
    |           monitor.py
    |
    \---train
            llama2_flash_attn_monkey_patch.py
            llama_flash_attn_monkey_patch.py
            stchat_trainer.py
            train_lora.py
            train_mem.py
            train_st.py # train
2. Environment
Please first clone the repo and install the required environment, which can be done by running the following commands:
conda create -n urbangpt python=3.9.13
conda activate urbangpt
# Torch with CUDA 11.7
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
# To support vicuna base model
pip3 install "fschat[model_worker,webui]"
# To install pyg and pyg-relevant packages
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.1+cu117.html
# Clone our UrbanGPT or download it
git clone https://github.com/HKUDS/UrbanGPT.git
cd UrbanGPT
# Install required libraries
# (We recommend installing these packages separately, as follows)
pip install deepspeed
pip install ray
pip install einops
pip install wandb
# (There is a version compatibility issue between "flash-attn" and "transformers". Please refer to the flash-attn [GitHub URL](https://github.com/Dao-AILab/flash-attention) for more information.)
pip install flash-attn==2.3.5 # or download from (https://github.com/Dao-AILab/flash-attention/releases, e.g. flash_attn-2.3.5+cu117torch2.0cxx11abiFALSE-cp39-cp39-linux_x86_64.whl)
pip install transformers==4.34.0
# (or you can install according to the requirements file.)
pip install -r requirements.txt
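As a quick sanity check after installation, the pinned versions above can be compared against what is actually installed. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `check_pins` are illustrative and not part of this repository.

```python
from importlib import metadata

# Pins taken from the install commands above.
PINNED = {
    "torch": "2.0.1",
    "transformers": "4.34.0",
    "flash-attn": "2.3.5",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Local build tags such as "+cu117" are ignored in the comparison.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Print one line per package whose installed version differs from the pin.
    for package, (expected, found) in check_pins(PINNED).items():
        print(f"{package}: expected {expected}, found {found}")
```

An empty result means every pinned package matches; any printed line points at a package worth reinstalling before training.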
3. Training UrbanGPT
3.1. Preparing Pre-trained Checkpoint
UrbanGPT is trained on top of the following existing models and data. Please follow the instructions below to prepare the checkpoints.
- Vicuna: Prepare our base model Vicuna, an instruction-tuned chatbot that serves as the base model in our implementation. Please download its weights here. We generally use the v1.5 and v1.5-16k models with 7B parameters. You should also update Vicuna's 'config.json'; for example, the 'config.json' for v1.5-16k can be found in config.json.
- Spatio-temporal Encoder: We employ a simple TCN-based spatio-temporal encoder to encode spatio-temporal dependencies. The st_encoder weights are pre-trained on a typical multi-step spatio-temporal prediction task.
- Spatio-temporal Train Data: We use pre-training data consisting of New York City taxi, bike, and crime data, including spatio-temporal statistics, recorded timestamps, and information about regional points of interest (POIs). These data are organized in train_data. Please download it and place it at ./UrbanGPT/ST_data_urbangpt/train_data.
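Before launching training, it can help to confirm that the files from the steps above are where the scripts expect them. This is a minimal sketch, assuming the layout shown in the code structure and in the example urbangpt_train.sh (the Vicuna directory name is taken from that example and may differ on your machine). `missing_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Paths are relative to the repository root. They are assumptions based on the
# code structure and the example training script in this README.
EXPECTED = [
    "checkpoints/st_encoder/pretrain_stencoder.pth",
    "checkpoints/vicuna-7b-v1.5-16k/config.json",
    "ST_data_urbangpt/train_data/multi_NYC.json",
    "ST_data_urbangpt/train_data/multi_NYC_pkl.pkl",
]


def missing_files(repo_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when everything is in place.
    for rel in missing_files("."):
        print(f"missing: {rel}")
```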
3.2. Instruction Tuning
- Start tuning: After completing the steps above, you can start instruction tuning by filling in the blanks in urbangpt_train.sh. An example is shown below:
# Fill in the following paths to run UrbanGPT!
model_path=./checkpoints/vicuna-7b-v1.5-16k
instruct_ds=./ST_data_urbangpt/train_data/multi_NYC.json
st_data_path=./ST_data_urbangpt/train_data/multi_NYC_pkl.pkl
pretra_ste=ST_Encoder
output_model=./checkpoints/UrbanGPT
wandb offline
python -m torch.distributed.run --nnodes=1 --nproc_per_node=8 --master_port=20001 \
urbangpt/train/train_mem.py \
--model_name_or_path ${model_path} \
--version v1 \
--data_path ${instruct_ds} \
--st_content ./TAXI.json \
--st_data_path ${st_data_path} \
--st_tower ${pretra_ste} \
--tune_st_mlp_adapter True \
--st_select_layer -2 \
--use_st_start_end \
--bf16 True \
--output_dir ${output_model} \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2400 \
--save_total_limit 1 \
--learning_rate 2e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to wandb
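To make the batch and learning-rate settings in the example concrete, the sketch below computes the effective global batch size and a linear-warmup, cosine-decay learning-rate curve. It is a simplified illustration of what `--warmup_ratio` and `--lr_scheduler_type "cosine"` typically select, not the trainer's own code. `total_steps` is a hypothetical value, since the real number depends on the size of the instruction dataset.

```python
import math


def effective_batch_size(num_processes, per_device_batch, grad_accum_steps):
    """Number of samples contributing to one optimizer step."""
    return num_processes * per_device_batch * grad_accum_steps


def cosine_lr(step, total_steps, base_lr, warmup_ratio):
    """Linear warmup from 0 to base_lr, then cosine decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Values from the example script: 8 processes, batch size 4, no accumulation.
print(effective_batch_size(8, 4, 1))  # 32

# Hypothetical schedule length, chosen only for illustration.
total_steps = 1000
for step in (0, 15, 30, 500, 1000):
    lr = cosine_lr(step, total_steps, base_lr=2e-3, warmup_ratio=0.03)
    print(f"step {step:4d}: lr = {lr:.6f}")
```

If you train on fewer GPUs, raising `--gradient_accumulation_steps` by the same factor keeps the effective batch size unchanged.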
4. Evaluating UrbanGPT
4.1. Preparing Checkpoints and Data
- Checkpoints: You can evaluate UrbanGPT using either your own model or our released checkpoints.
- Data: We split out test sets from the NYC-taxi dataset and built the corresponding instruction data for evaluation. Please refer to the evaluation data.
4.2. Running Evaluation
You can start the evaluation by filling in the blanks in urbangpt_eval.sh. An example is shown below:
# Fill in the following paths to run the evaluation!
output_model=./checkpoints/tw2t_multi_reg-cla-gird
datapath=./ST_data_urbangpt/NYC_taxi_cross-region/NYC_taxi.json
st_data_path=./ST_data_urbangpt/NYC_taxi_cross-region/NYC_taxi_pkl.pkl
res_path=./result_test/cross-region/NYC_taxi
start_id=0
end_id=51920
num_gpus=8
python ./urbangpt/eval/run_urbangpt.py --model-name ${output_model} --prompting_file ${datapath} --st_data_path ${st_data_path} --output_res_path ${res_path} --start_id ${start_id} --end_id ${end_id} --num_gpus ${num_gpus}
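With `num_gpus` workers, the sample range given by `start_id` and `end_id` has to be divided among them. How run_urbangpt.py does this is not shown here; the sketch below illustrates one common approach, contiguous near-equal shards, and assumes a half-open range [start_id, end_id). `shard_range` is a hypothetical helper.

```python
def shard_range(start_id, end_id, num_gpus):
    """Split [start_id, end_id) into num_gpus contiguous, near-equal shards."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total = end_id - start_id
    base, extra = divmod(total, num_gpus)
    shards = []
    lo = start_id
    for rank in range(num_gpus):
        # The first `extra` shards take one additional sample each.
        size = base + (1 if rank < extra else 0)
        shards.append((lo, lo + size))
        lo += size
    return shards


# Values from the example script above.
for rank, (lo, hi) in enumerate(shard_range(0, 51920, 8)):
    print(f"GPU {rank}: samples {lo} to {hi - 1}")
```

The same helper can be used to evaluate a subset first, for example by passing a smaller `end_id`, before committing to the full run.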
4.3. Evaluation Metric Calculation
You can use result_test.py to calculate the performance metrics of the predicted results.
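The metrics computed by result_test.py are not listed here. For regression-style predictions such as taxi flow, MAE and RMSE are common choices; the sketch below shows how they are computed and is an assumption about the kind of metric involved, not a copy of result_test.py. The sample values are made up for illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Made-up ground truth and predictions for four time steps.
y_true = [10, 12, 8, 15]
y_pred = [11, 12, 6, 15]
print("MAE:", mae(y_true, y_pred))  # 0.75
print("RMSE:", rmse(y_true, y_pred))
```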
5. Instructions Generation
You can use the code in instruction_generate.py to generate the specific instructions you need. For example:
-dataset_name: Choose the dataset. # NYC_multi (for training); NYC_taxi, NYC_bike, NYC_crime1, NYC_crime2, CHI_taxi (for testing)
# Only one of the following options can be set to True
-for_zeroshot: whether to generate instructions for zero-shot prediction.
-for_supervised: whether to generate instructions for supervised prediction.
-for_ablation: whether to generate instructions for the ablation study.
# Create the instruction data for training
python instruction_generate.py -dataset_name NYC_multi
# Create instruction data for the NYC_taxi dataset to facilitate testing in the zero-shot setting of UrbanGPT
python instruction_generate.py -dataset_name NYC_taxi -for_zeroshot True
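The rule that only one of the mode options may be True can be enforced at parse time. The following is a hypothetical sketch of such a command-line interface using argparse; it mirrors the option names above but is not the actual code of instruction_generate.py.

```python
import argparse


def str2bool(value):
    """Convert strings such as 'True' or 'false' to a bool."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_args(argv=None):
    """Parse the options and reject more than one active mode."""
    parser = argparse.ArgumentParser(description="Instruction generation (sketch)")
    parser.add_argument("-dataset_name", required=True)
    parser.add_argument("-for_zeroshot", type=str2bool, default=False)
    parser.add_argument("-for_supervised", type=str2bool, default=False)
    parser.add_argument("-for_ablation", type=str2bool, default=False)
    args = parser.parse_args(argv)

    # At most one of the three mode options may be True.
    modes = [args.for_zeroshot, args.for_supervised, args.for_ablation]
    if sum(modes) > 1:
        raise ValueError(
            "set at most one of -for_zeroshot, -for_supervised, -for_ablation"
        )
    return args


# Same arguments as the zero-shot example above.
args = parse_args(["-dataset_name", "NYC_taxi", "-for_zeroshot", "True"])
print(args.dataset_name, args.for_zeroshot)
```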
If you find UrbanGPT useful in your research or applications, please kindly cite:
@misc{li2024urbangpt,
title={UrbanGPT: Spatio-Temporal Large Language Models},
author={Zhonghang Li and Lianghao Xia and Jiabin Tang and Yong Xu and Lei Shi and Long Xia and Dawei Yin and Chao Huang},
year={2024},
eprint={2403.00813},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
You may refer to the related work that serves as the foundation for our framework and code repository, Vicuna. We also drew partial inspiration from GraphGPT. The design of our website and README.md was inspired by NExT-GPT, and the design of our system deployment was inspired by gradio and Baize. Thanks for their wonderful work.