All trained models, training sets, and artifacts generated by the models have been uploaded to Zenodo. The files are publicly accessible at: https://zenodo.org/records/10642388. All files are released under the CC-BY 4.0 license.
Each file listed below can be downloaded using the download.py script. For example, to download cifs_v1_val.pkl.gz:
python bin/download.py cifs_v1_val.pkl.gz
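Every table entry below lists an MD5 checksum next to the file size. As a minimal sketch (not part of the repository), a downloaded file can be checked against that value with Python's standard hashlib module. The helper names md5_of_file and verify_download are hypothetical, and the path in the usage comment assumes the file was saved to the current directory, which the source does not state.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the listed checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Usage (assumed local path; the checksum is the one listed for this file):
# ok = verify_download("cifs_v1_val.pkl.gz", "df7f9d8e3ef0dc8e25cc80eba55fc01a")
```

Reading in chunks keeps memory use flat, which matters for the multi-gigabyte model archives.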

Name | Description | Download Link |
---|---|---|
cifs_v1_orig.tar.gz | The original CIF file dataset containing 3,551,492 symmetrized CIF files. md5:f5d2f99835be1c6a73147fcc48a64d46 • 722.6 MB | download ↓ |
cifs_v1_orig.pkl.gz | The contents of cifs_v1_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:7a52dd67de3d26c80062d97766206741 • 644.9 MB | download ↓ |
cifs_v1_dedup.tar.gz | The deduplicated original CIF dataset, containing 2,285,914 symmetrized CIF files. md5:46fd42bc6a9b7b5c0a533206816b6aa4 • 466.5 MB | download ↓ |
cifs_v1_dedup.pkl.gz | The contents of cifs_v1_dedup.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:44c07eabcffa5c69f03ff602187dd727 • 418.1 MB | download ↓ |
cifs_v1_prep.tar.gz | The deduplicated and pre-processed original CIF dataset, containing 2,285,719 CIF files. md5:463b96dba247ec41a7041318ff9f783c • 336.9 MB | download ↓ |
cifs_v1_prep.pkl.gz | The contents of cifs_v1_prep.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:87d40346f4b236714444b3bafbec0e37 • 296.3 MB | download ↓ |
cifs_v1_train.tar.gz | The training split of the main dataset, containing 2,047,889 CIF files. md5:a0cdbc4f73186c02bce3770be6b2c36e • 326.3 MB | download ↓ |
cifs_v1_train.pkl.gz | The contents of cifs_v1_train.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:54f695b7862b79ce653338f4161a120d • 286.2 MB | download ↓ |
cifs_v1_val.tar.gz | The validation split of the main dataset, containing 227,544 CIF files. md5:19aed46ee52616ca9988bd0ebdb5f00b • 36.2 MB | download ↓ |
cifs_v1_val.pkl.gz | The contents of cifs_v1_val.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:df7f9d8e3ef0dc8e25cc80eba55fc01a • 31.8 MB | download ↓ |
cifs_v1_test.tar.gz | The test split of the main dataset, containing 10,286 CIF files. md5:903655eef90d46fe74a1c67162bfe621 • 1.6 MB | download ↓ |
cifs_v1_test.pkl.gz | The contents of cifs_v1_test.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:14b82a67ae2605bfad5facbabfb3c910 • 1.4 MB | download ↓ |
tokens_v1_all.tar.gz | The tokens of the complete main dataset. md5:94bb09b7a44636df5a9a638ad0744ca9 • 231.8 MB | download ↓ |
tokens_v1_train_val.tar.gz | The tokens of the training and validation sets of the main dataset. md5:49474e2b7af83a33200aba8786ff1264 • 230.8 MB | download ↓ |
starts_v1_train.pkl | The start indices for the tokenized training set structures of the main dataset. md5:9e35547108299313dc8cb9d854811d6d • 10.2 MB | download ↓ |
starts_v1_val.pkl | The start indices for the tokenized validation set structures of the main dataset. md5:27f167ca6ed5fa8f7cd0c462b5bb0bd2 • 1.1 MB | download ↓ |
challenge_set_v1.zip | The structures of the challenge set. md5:9951749c3af9b27812052c9f68337085 • 287.9 kB | download ↓ |
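
The .pkl.gz files above are described as serialized Python lists of (ID, CIF) 2-tuples. A minimal sketch of reading one, assuming the serialization is a gzip-compressed pickle (the file extension suggests this, but the source does not spell it out). The function names load_cif_pickle and index_by_id are hypothetical. Since unpickling can execute arbitrary code, only load files whose checksum matches the listed value.

```python
import gzip
import pickle


def load_cif_pickle(path):
    """Load a gzip-compressed pickle assumed to hold a list of (ID, CIF) 2-tuples."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


def index_by_id(entries):
    """Build a dict mapping each ID to its CIF text for direct lookup."""
    return {entry_id: cif for entry_id, cif in entries}


# Usage (assumed local path):
# entries = load_cif_pickle("cifs_v1_val.pkl.gz")
# first_id, first_cif = entries[0]
```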

Name | Description | Download Link |
---|---|---|
crystallm_v1_small.tar.gz | Model with small architecture trained on the full main dataset. md5:0221fbcd166bddb17f75be8a610892f3 • 285.0 MB | download ↓ |
crystallm_v1_large.tar.gz | Model with large architecture trained on the full main dataset. md5:7229ae633f832a935eb30dc7f58a8830 • 2.2 GB | download ↓ |
crystallm_perov_5_small.tar.gz | Model with small architecture trained on the Perov-5 training set only. md5:243378b326e7292cb7a39a1bd0aa179b • 285.9 MB | download ↓ |
crystallm_perov_5_large.tar.gz | Model with large architecture trained on the Perov-5 training set only. md5:dbd0cfddfc1a2341ebbeb10ce828b378 • 2.3 GB | download ↓ |
crystallm_carbon_24_small.tar.gz | Model with small architecture trained on the Carbon-24 training set only. md5:8b30dd04ded754fb532c85bc4f50f264 • 284.6 MB | download ↓ |
crystallm_carbon_24_large.tar.gz | Model with large architecture trained on the Carbon-24 training set only. md5:b4eaec93761b0906294b03779b4261e6 • 2.3 GB | download ↓ |
crystallm_mp_20_small.tar.gz | Model with small architecture trained on the MP-20 training set only. md5:132bab875f673851c3a13d884dc8a264 • 284.3 MB | download ↓ |
crystallm_mp_20_large.tar.gz | Model with large architecture trained on the MP-20 training set only. md5:2096d75a309fbe3eddf569232422b9a8 • 2.3 GB | download ↓ |
crystallm_mpts_52_small.tar.gz | Model with small architecture trained on the MPTS-52 training set only. md5:a24ea9f14c143f3d2e88a9f6238ad3e9 • 284.4 MB | download ↓ |
crystallm_mpts_52_large.tar.gz | Model with large architecture trained on the MPTS-52 training set only. md5:99e427f1eeb154e243ce637fd180b90a • 2.3 GB | download ↓ |
crystallm_v1_minus_mpts_52_small.tar.gz | Model with small architecture trained on the full main dataset minus the MPTS-52 test and validation sets. md5:e0fa119d78fadadbe22a0a0875a388ea • 285.0 MB | download ↓ |

Name | Description | Download Link |
---|---|---|
perov_5_train_orig.tar.gz | The original CIF files of the Perov-5 training set (symmetrized). md5:f483e48ee0e5885800ad38cc0ee2b908 • 1.0 MB | download ↓ |
perov_5_train_orig.pkl.gz | The contents of perov_5_train_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:c4a03cd42729f5534e09f00874c9164d • 858.5 kB | download ↓ |
perov_5_train_prep.pkl.gz | The pre-processed CIF files of the Perov-5 training set. md5:c324b2b76f57a02dec057d79bce1a048 • 862.5 kB | download ↓ |
perov_5_val_orig.tar.gz | The original CIF files of the Perov-5 validation set (symmetrized). md5:e4727034425ed38185f0bbe4c1068b50 • 337.2 kB | download ↓ |
perov_5_val_orig.pkl.gz | The contents of perov_5_val_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:ab841f403de7c3ba989bc2472a82e7c9 • 287.4 kB | download ↓ |
perov_5_val_prep.pkl.gz | The pre-processed CIF files of the Perov-5 validation set. md5:597aef92d9e666d312f45a91e71b8b08 • 289.8 kB | download ↓ |
perov_5_test_orig.tar.gz | The original CIF files of the Perov-5 test set (symmetrized). md5:c405369029780690481444a45a68ffe6 • 335.1 kB | download ↓ |
perov_5_test_orig.pkl.gz | The contents of perov_5_test_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:5a8484a6f264bee945d56c32cafeac1b • 287.4 kB | download ↓ |
perov_5_test_prep.pkl.gz | The pre-processed CIF files of the Perov-5 test set. md5:3e9445cccc0ba33c40219d73e2a699cd • 288.1 kB | download ↓ |
tokens_perov_5.tar.gz | The tokens of the Perov-5 training and validation sets. md5:7c35b6852a06f7fbf6d18daf92d719d8 • 998.3 kB | download ↓ |
starts_perov_5_train.pkl | The start indices for the tokenized training set structures of the Perov-5 training set. md5:ac91edb62d58c985a919ee2e1563aa29 • 56.5 kB | download ↓ |
prompts_perov_5_test.tar.gz | Text files containing prompts derived from the Perov-5 test set. md5:3fc9d0624bd319cacabc69e7e48236f4 • 64.8 kB | download ↓ |

Name | Description | Download Link |
---|---|---|
carbon_24_train_orig.tar.gz | The original CIF files of the Carbon-24 training set (symmetrized). md5:dcab5867560883033fa7db714daac41e • 1.2 MB | download ↓ |
carbon_24_train_orig.pkl.gz | The contents of carbon_24_train_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:01b6e2b909eb1c7b5ed02694cbcf45db • 1.0 MB | download ↓ |
carbon_24_train_prep.pkl.gz | The pre-processed CIF files of the Carbon-24 training set. md5:0d3802992b8f39f3409c79634717105d • 587.1 kB | download ↓ |
carbon_24_val_orig.tar.gz | The original CIF files of the Carbon-24 validation set (symmetrized). md5:661322f67f8cef28df89b922f9004320 • 388.0 kB | download ↓ |
carbon_24_val_orig.pkl.gz | The contents of carbon_24_val_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:2c831de6c3a6fee518bd6ca8f00b6cc0 • 342.7 kB | download ↓ |
carbon_24_val_prep.pkl.gz | The pre-processed CIF files of the Carbon-24 validation set. md5:812767bca290928dd75510c3fdd09878 • 196.2 kB | download ↓ |
carbon_24_test_orig.tar.gz | The original CIF files of the Carbon-24 test set (symmetrized). md5:23688c4556a4c6dc516a15fdf7eff64a • 389.4 kB | download ↓ |
carbon_24_test_orig.pkl.gz | The contents of carbon_24_test_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:68d1a175b2e3c65e527cb91c2035f27f • 342.9 kB | download ↓ |
carbon_24_test_prep.pkl.gz | The pre-processed CIF files of the Carbon-24 test set. md5:c4c15d20142b47f56fa4d69fa9a49965 • 195.9 kB | download ↓ |
tokens_carbon_24.tar.gz | The tokens of the Carbon-24 training and validation sets. md5:96d9c02ecb573e867479d1b343c24d17 • 706.6 kB | download ↓ |
starts_carbon_24_train.pkl | The start indices for the tokenized training set structures of the Carbon-24 training set. md5:b7ca9fabecc41684fdbdb7b93bafae6b • 30.1 kB | download ↓ |
prompts_carbon_24_test.tar.gz | Text files containing prompts derived from the Carbon-24 test set. md5:457f19ea1560dd83bb7fc85109d6cbd3 • 37.4 kB | download ↓ |

Name | Description | Download Link |
---|---|---|
mp_20_train_orig.tar.gz | The original CIF files of the MP-20 training set (symmetrized). md5:0db4e1afb628a5ed40403f047837e460 • 5.8 MB | download ↓ |
mp_20_train_orig.pkl.gz | The contents of mp_20_train_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:8ae21099f9c9324d27ddb3fb0bb43f0a • 5.0 MB | download ↓ |
mp_20_train_prep.pkl.gz | The pre-processed CIF files of the MP-20 training set. md5:4f74def3d03d4668d6ab2c0b6a49fd56 • 3.4 MB | download ↓ |
mp_20_val_orig.tar.gz | The original CIF files of the MP-20 validation set (symmetrized). md5:72a6f605e31439b12799f096adb3372b • 1.9 MB | download ↓ |
mp_20_val_orig.pkl.gz | The contents of mp_20_val_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:8746bb8cc3e89b51b8728b3b447ea740 • 1.7 MB | download ↓ |
mp_20_val_prep.pkl.gz | The pre-processed CIF files of the MP-20 validation set. md5:1ffcbb1864e6589e06f05f63fced031f • 1.1 MB | download ↓ |
mp_20_test_orig.tar.gz | The original CIF files of the MP-20 test set (symmetrized). md5:e3c839dd72bb078793e568438e3a7338 • 1.9 MB | download ↓ |
mp_20_test_orig.pkl.gz | The contents of mp_20_test_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:0a28dd93b9ab05efd506a9fd3007c7bd • 1.6 MB | download ↓ |
mp_20_test_prep.pkl.gz | The pre-processed CIF files of the MP-20 test set. md5:2510049cbd4d38fcb5d38745ba7c6784 • 1.1 MB | download ↓ |
tokens_mp_20.tar.gz | The tokens of the MP-20 training and validation sets. md5:8a0878a547e2a872b1460cc8de24e081 • 4.1 MB | download ↓ |
starts_mp_20_train.pkl | The start indices for the tokenized training set structures of the MP-20 training set. md5:d626c5889822136b747d110c86e294c3 • 135.4 kB | download ↓ |
prompts_mp_20_test.tar.gz | Text files containing prompts derived from the MP-20 test set. md5:279b2dd59e5a3f1d81f1875472001ead • 195.5 kB | download ↓ |

Name | Description | Download Link |
---|---|---|
mpts_52_train_orig.tar.gz | The original CIF files of the MPTS-52 training set (symmetrized). md5:ebf0d6083d6be2003ca5c5fbb081638c • 6.2 MB | download ↓ |
mpts_52_train_orig.pkl.gz | The contents of mpts_52_train_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:771a9c34f8680e0b6f641bcdb1c634bd • 5.5 MB | download ↓ |
mpts_52_train_prep.pkl.gz | The pre-processed CIF files of the MPTS-52 training set. md5:1661fe47efa9fe56b7848478045a2252 • 3.7 MB | download ↓ |
mpts_52_val_orig.tar.gz | The original CIF files of the MPTS-52 validation set (symmetrized). md5:3b491718acef2fb4d29def70e71c93f3 • 1.2 MB | download ↓ |
mpts_52_val_orig.pkl.gz | The contents of mpts_52_val_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:37a9f04f52bed5ec1fd17ed5e59a8628 • 1.1 MB | download ↓ |
mpts_52_val_prep.pkl.gz | The pre-processed CIF files of the MPTS-52 validation set. md5:09e3304fad4e1f2f3ac0ffeeafa08c37 • 849.4 kB | download ↓ |
mpts_52_test_orig.tar.gz | The original CIF files of the MPTS-52 test set (symmetrized). md5:1a28002b36cce07a9d80814e237918a6 • 2.1 MB | download ↓ |
mpts_52_test_orig.pkl.gz | The contents of mpts_52_test_orig.tar.gz as a serialized Python list of 2-tuples: (ID, CIF). md5:e36697fc6c668402f664f552c8028f31 • 1.9 MB | download ↓ |
mpts_52_test_prep.pkl.gz | The pre-processed CIF files of the MPTS-52 test set. md5:ddd9fc4ebb28c9564597218696da638a • 1.4 MB | download ↓ |
tokens_mpts_52.tar.gz | The tokens of the MPTS-52 training and validation sets. md5:b2f053ac4f642c99dec74a60035910a0 • 4.4 MB | download ↓ |
tokens_v1_minus_mpts_52.tar.gz | The tokens of the full main dataset minus the MPTS-52 validation and test sets. md5:eb9cb102f4c927164b97ec2b2850ed7f • 215.9 MB | download ↓ |
starts_mpts_52_train.pkl | The start indices for the tokenized training set structures of the MPTS-52 training set. md5:fba4e7f2490d77639c1922014afd281c • 136.7 kB | download ↓ |
prompts_mpts_52_test.tar.gz | Text files containing prompts derived from the MPTS-52 test set. md5:9edde7e656e9cd19eda82dccb200943d • 171.3 kB | download ↓ |

Name | Description | Download Link |
---|---|---|
gen_perov_5_small_raw.tar.gz | CIF files generated with the Perov-5 small model starting from the Perov-5 test set prompts (n=20). md5:0e4de11f181c0ef0dce9e845e897b320 • 3.5 MB | download ↓ |
gen_perov_5_small.tar.gz | Pre-processed CIF files generated with the Perov-5 small model starting from the Perov-5 test set prompts (n=20). md5:5cb0db181b4889407869f9b3ef9322ad • 3.7 MB | download ↓ |
gen_perov_5_large_raw.tar.gz | CIF files generated with the Perov-5 large model starting from the Perov-5 test set prompts (n=20). md5:2d262a5090b7a91003f48e3a504a628d • 3.3 MB | download ↓ |
gen_perov_5_large.tar.gz | Pre-processed CIF files generated with the Perov-5 large model starting from the Perov-5 test set prompts (n=20). md5:6915d51c95bec568a4d5e4865a952602 • 3.5 MB | download ↓ |
gen_carbon_24_small_raw.tar.gz | CIF files generated with the Carbon-24 small model starting from the Carbon-24 test set prompts (n=20). md5:4431635a299f43daf3ced0909b700ddc • 3.7 MB | download ↓ |
gen_carbon_24_small.tar.gz | Pre-processed CIF files generated with the Carbon-24 small model starting from the Carbon-24 test set prompts (n=20). md5:82ae422dc64dfead56c9cc1981b50052 • 4.4 MB | download ↓ |
gen_carbon_24_large_raw.tar.gz | CIF files generated with the Carbon-24 large model starting from the Carbon-24 test set prompts (n=20). md5:1cdc7bcba908bc7ad28a36051d0ff14b • 3.7 MB | download ↓ |
gen_carbon_24_large.tar.gz | Pre-processed CIF files generated with the Carbon-24 large model starting from the Carbon-24 test set prompts (n=20). md5:e501ff21c5fad04c3c8268cd830ff28e • 4.4 MB | download ↓ |
gen_mp_20_small_raw.tar.gz | CIF files generated with the MP-20 small model starting from the MP-20 test set prompts (n=20). md5:20d9d71ec25a45a7e02bb89f5ec02364 • 11.4 MB | download ↓ |
gen_mp_20_small.tar.gz | Pre-processed CIF files generated with the MP-20 small model starting from the MP-20 test set prompts (n=20). md5:c488b9f4f71f7c193e775b463ea2e195 • 15.4 MB | download ↓ |
gen_mp_20_large_raw.tar.gz | CIF files generated with the MP-20 large model starting from the MP-20 test set prompts (n=20). md5:85b66c1c2263233f37db74ee6628eb93 • 9.0 MB | download ↓ |
gen_mp_20_large.tar.gz | Pre-processed CIF files generated with the MP-20 large model starting from the MP-20 test set prompts (n=20). md5:e3bdfe7077360e2a03ae315eee086d8d • 12.8 MB | download ↓ |
gen_mpts_52_large_raw.tar.gz | CIF files generated with the MPTS-52 large model starting from the MPTS-52 test set prompts (n=20). md5:10c7d53cf9109b180542bafb4018db8e • 17.4 MB | download ↓ |
gen_mpts_52_large.tar.gz | Pre-processed CIF files generated with the MPTS-52 large model starting from the MPTS-52 test set prompts (n=20). md5:84b7d7c4748f70cfedf72a03f61d060a • 19.8 MB | download ↓ |
gen_v1_minus_mpts_52_small_raw.tar.gz | CIF files generated with the small model trained on the full dataset minus the MPTS-52 test and validation sets, starting from the MPTS-52 test set prompts (n=20). md5:1d45428d4e3fbf7b6dc6f79b4bd834f5 • 19.5 MB | download ↓ |
gen_v1_minus_mpts_52_small.tar.gz | Pre-processed CIF files generated with the small model trained on the full dataset minus the MPTS-52 test and validation sets, starting from the MPTS-52 test set prompts (n=20). md5:7a066559bf0ce2a625aa3e06c31f87cb • 21.8 MB | download ↓ |
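
The .tar.gz archives of CIF files listed above can be read without unpacking them to disk. The sketch below uses Python's standard tarfile module and streams each member as text. It assumes the archive members are regular files with a .cif extension and UTF-8 content; the source does not describe the internal layout, so adjust the suffix if the archives differ. The function name iter_cifs_from_tar is hypothetical.

```python
import tarfile


def iter_cifs_from_tar(path, suffix=".cif"):
    """Yield (member_name, text) for each regular file in a .tar.gz whose name ends with suffix."""
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            # Skip directories, links, and files that do not look like CIFs.
            if member.isfile() and member.name.endswith(suffix):
                fh = tar.extractfile(member)
                yield member.name, fh.read().decode("utf-8")


# Usage (assumed local path):
# for name, cif_text in iter_cifs_from_tar("gen_perov_5_small.tar.gz"):
#     print(name, len(cif_text))
```

Streaming members one at a time avoids writing millions of small files when only a pass over the data is needed.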