trailofbits/ml-file-formats

List of ML File Formats

This repository lists file formats used in ML/AI systems. It can be used as a resource for tool development and vulnerability research. We aim to keep this list as up-to-date and accurate as possible. If you find a missing file format or an inaccuracy, or if you have more details to contribute, please raise an issue or submit a pull request.

| Name | ML-specific | Framework/Organization (if applicable) | Identification Tooling | Extensions | Additional Notes |
| --- | --- | --- | --- | --- | --- |
| PyTorch v1.3 | Yes | PyTorch | Fickling | .pt, .pth, .bin | Description: ZIP file containing data.pkl (1 pickle file) |
| PyTorch v0.1.1 | Yes | PyTorch | Fickling | .pt, .pth, .bin | Description: TAR file with sys_info, pickle, storages, and tensors |
| PyTorch v0.1.10 | Yes | PyTorch | Fickling | .pt, .pth, .bin | Description: Stacked pickle files |
| TorchScript v1.4 | Yes | PyTorch | Fickling | .pt, .pth, .bin | Description: ZIP file with data.pkl, constants.pkl, and version (2 pickle files and a folder) |
| TorchScript v1.3 (deprecated) | Yes | PyTorch | Fickling | .pt, .pth, .bin | Description: ZIP file with data.pkl and constants.pkl (2 pickle files) |
| TorchScript v1.1 (deprecated) | Yes | PyTorch | Fickling | .pt, .pth, .bin | Description: ZIP file with model.json and attributes.pkl (a JSON file and a pickle file) |
| TorchScript v1.0 (deprecated) | Yes | PyTorch | Fickling | .pt, .pth, .bin | Description: ZIP file with model.json |
| PyTorch model archive format [ZIP] | Yes | PyTorch | Fickling | .mar | Description: ZIP file that includes Python code files and pickle files |
| PyTorch model archive format [TAR] | Yes | PyTorch | - | .mar | Description: TAR file that includes Python code files and pickle files |
| PyTorch Package | Yes | PyTorch | - | .pt, .pth, .bin | Description: ZIP file that includes a pickled model, user files represented as a Python package, and framework files including serialized tensor data |
| ExecuTorch | Yes | PyTorch | - | .pte | Description: Modified binary flatbuffer file with optional data segments appended |
| Torch.export | Yes | PyTorch | - | .pt2 | Description: ZIP file with JSON files and a Python code file |
| PyTorch Mobile | Yes | PyTorch | - | .ptl | Description: Modified binary flatbuffer file |
| Safetensors | Yes | - | PolyFile | .safetensors | Refer to our audit |
| ONNX | Yes | - | - | .onnx | Refer to LobotoMI |
| Keras native file format | Yes | Keras | - | .keras | Description: ZIP archive with 2 JSON files and 1 h5 file |
| TensorFlow Saved Models | Yes | TensorFlow | - | .pb | Description: Custom Protobuf format. Can result in arbitrary code execution. |
| TensorFlow Checkpoint | Yes | TensorFlow | - | .ckpt | Description: Custom Protobuf format. Can result in arbitrary code execution. |
| TFLite | Yes | TensorFlow | - | .tflite | Description: Modified binary flatbuffer file |
| TFJS | Yes | TensorFlow | - | - | Description: JSON file and binary file with weights. Technically not a single file format. |
| TF1 Hub format (deprecated) | Yes | TensorFlow | - | - | Description: Custom Protobuf format. |
| Tensorizer | Yes | CoreWeave | - | - | Not uncommon, especially in private production systems |
| TFRecords | Yes | TensorFlow | - | .tfrecords | Description: Wrapper around a Protocol Buffer |
| NPY | Yes | NumPy | - | .npy | Previously allowed pickled objects to be loaded by default as well. |
| NPZ | Yes | NumPy | - | .npz | Description: ZIP file of NPY files |
| GGUF | Yes | llama.cpp/GGML | - | .gguf | - |
| GGML | Yes | llama.cpp/GGML | - | .ggml | - |
| GGMF (deprecated) | Yes | llama.cpp/GGML | - | .ggmf | - |
| GGJT (deprecated) | Yes | llama.cpp/GGML | - | .ggjt | - |
| NetCDF | Yes | - | - | .nc | - |
| PMML | Yes | - | - | - | - |
| MLeap | Yes | Spark | - | .mleap | - |
| CoreML | Yes | Apple | - | .coreml | - |
| MLFlow Format | Yes | MLFlow | - | - | - |
| MLFlow TensorSpec input format | Yes | MLFlow | - | - | - |
| SurrealML | Yes | SurrealDB | - | .surml | - |
| Llamafile | Yes | - | - | .llamafile | - |
| .prompt | Yes | HumanLoop | - | .prompt | - |
| Pickle | No | Python | PolyFile | .pkl | Refer to Fickling |
| Joblib | No | - | PolyFile | - | - |
| Nemo | Yes | NVIDIA | - | - | - |
| Riva | Yes | NVIDIA | - | - | - |
| AVRO | No | - | - | - | - |
| PARQUET | No | - | - | - | - |
| ORC | No | - | - | - | - |
| JSON | No | - | PolyFile | - | - |
| CSV | No | - | - | - | - |
| Protocol Buffers | No | - | - | - | Usually an underlying file format |
| HDF5 | No | - | - | .h5 | - |
| Caffe | Yes | Caffe | - | .caffemodel & .prototxt | Description: Protobuf-based file format |
| ArmNN Flatbuffers | Yes | ArmNN | - | - | - |
| Cambricon | Yes | - | - | - | - |
| Circle | Yes | - | - | - | - |
| ZIP | No | - | PolyFile | - | Usually an underlying file format |
| CNTK v1 (deprecated) | Yes | Microsoft Cognitive Toolkit | - | - | - |
| CNTK v2 | Yes | Microsoft Cognitive Toolkit | - | - | Description: Protobuf-based file format |
| Darknet | Yes | Hank.ai Darknet | - | - | - |
| DL4J | Yes | DL4J | - | - | Description: ZIP-based file format |
| Deep Learning Container (DLC) | Yes | Qualcomm Neural Processing SDK | - | .dlc | - |
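
Because several extensions in the table (.pt, .pth, .bin) map to more than one on-disk layout, an extension alone may not be enough to tell the formats apart. The following is a minimal Python sketch of how a tool might first guess the underlying container from its leading bytes and then use ZIP member names to narrow the result. The helper names `sniff_container` and `zip_layout_hint` are hypothetical and not part of Fickling or PolyFile, and the member-name rules are a rough heuristic derived only from the descriptions in the table, so several ZIP-based formats listed there would fall through to the last case.

```python
import io
import zipfile

# Well-known magic numbers found at offset 0 for a few containers in the table.
MAGIC_PREFIXES = [
    (b"PK\x03\x04", "zip"),
    (b"GGUF", "gguf"),
    (b"\x93NUMPY", "npy"),
    (b"\x89HDF\r\n\x1a\n", "hdf5"),
]


def sniff_container(data: bytes) -> str:
    """Guess the underlying container from leading bytes (hypothetical helper)."""
    for magic, name in MAGIC_PREFIXES:
        if data.startswith(magic):
            return name
    # POSIX and GNU tar headers carry "ustar" at offset 257 of the first block.
    if data[257:262] == b"ustar":
        return "tar"
    # Pickle protocol 2 and later starts with the PROTO opcode (0x80)
    # followed by the protocol version byte.
    if len(data) >= 2 and data[0] == 0x80 and 2 <= data[1] <= 5:
        return "pickle"
    return "unknown"


def zip_layout_hint(data: bytes) -> str:
    """Map ZIP member names onto the table's descriptions (heuristic only)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Members may sit under a top-level folder, so compare basenames
        # and skip directory entries.
        names = {
            n.rsplit("/", 1)[-1] for n in zf.namelist() if not n.endswith("/")
        }
    if {"data.pkl", "constants.pkl"} <= names:
        return "TorchScript v1.3 or later"
    if "data.pkl" in names:
        return "PyTorch v1.3"
    if "model.json" in names:
        return "TorchScript v1.0 or v1.1"
    if names and all(n.endswith(".npy") for n in names):
        return "NPZ"
    return "unrecognized ZIP layout"
```

A caller would read the file into memory (or just its first few hundred bytes for `sniff_container`), and call `zip_layout_hint` only when the first step reports `"zip"`. Neither function unpickles anything, which matters because loading an untrusted pickle can execute arbitrary code.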