Running the code below downloads a model - does anyone know which folder it gets downloaded to?

!pip install -q transformers

from transformers import pipeline
model = pipeline('fill-mask')

Best Answer


Update 2023-05-02: The cache location has changed again, and is now ~/.cache/huggingface/hub/, as reported by @Victor Yan. Notably, the subfolders in the hub/ directory are now named after the cloned model path instead of a SHA hash, as in previous versions.
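
To confirm this on your own setup, you can print the cache path that huggingface_hub itself resolves. A minimal sketch; the constant is named HF_HUB_CACHE in recent huggingface_hub releases and HUGGINGFACE_HUB_CACHE in older ones, so the getattr fallback covers both:

from huggingface_hub import constants

# Resolved hub cache directory, e.g. /home/you/.cache/huggingface/hub
print(getattr(constants, "HF_HUB_CACHE", constants.HUGGINGFACE_HUB_CACHE))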


Update 2021-03-11: The cache location has changed and is now ~/.cache/huggingface/transformers, as also detailed in the answer by @victorx.


This post should shed some light on it (plus some investigation of my own, since the post is already a bit older).

As mentioned there, the default location on a Linux system is ~/.cache/torch/transformers/ (I'm currently using transformers v2.7, but this is unlikely to change anytime soon). The cryptic folder names in this directory seemingly correspond to Amazon S3 hashes.

Also note that the pipeline tasks are just a "rerouting" to other models. To know which one you are currently loading, see the task-to-model mapping in the transformers source. For your specific case, pipeline('fill-mask') actually loads a distilroberta-base model.
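
If you would rather check programmatically which checkpoint a pipeline picked, something like the sketch below should work; _name_or_path on the loaded config is an internal attribute in recent transformers versions, so treat it as an assumption:

from transformers import pipeline

fill = pipeline('fill-mask')
# Show which checkpoint the task was rerouted to, e.g. 'distilroberta-base'
print(fill.model.config._name_or_path)

# Or pin the model explicitly instead of relying on the task default:
fill = pipeline('fill-mask', model='distilroberta-base')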

On Windows 10, replace ~ with C:\Users\username, or in cmd run cd /d "%HOMEDRIVE%%HOMEPATH%".

So the full path will be: C:\Users\username\.cache\huggingface\transformers
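
If you need this path in code without caring about the OS, Path.home() resolves ~ on both Linux and Windows. A minimal sketch for the pre-4.22 layout described here:

from pathlib import Path

# ~/.cache/huggingface/transformers on Linux,
# C:\Users\username\.cache\huggingface\transformers on Windows
cache_dir = Path.home() / ".cache" / "huggingface" / "transformers"
print(cache_dir)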

As of Transformers version 4.3, the cache location has been changed.

The exact place is defined in this code section: https://github.com/huggingface/transformers/blob/master/src/transformers/file_utils.py#L181-L187

On Linux, it is at ~/.cache/huggingface/transformers.

The file names there are basically SHA hashes of the original URLs the files were downloaded from. The corresponding .json files can help you figure out what the original file names are.
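
For instance, you could map the hashed names back to their source URLs by reading those sidecar files. A sketch assuming each .json metadata file in that cache format stores the download URL under a "url" key:

import json
from pathlib import Path

cache = Path.home() / ".cache" / "huggingface" / "transformers"
for meta in cache.glob("*.json"):
    with open(meta) as f:
        info = json.load(f)
    # Each sidecar records where its similarly-named blob was downloaded from
    print(meta.stem, "->", info.get("url"))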

As of transformers 4.22, the path appears to be (tested on CentOS):

~/.cache/huggingface/hub/

On Windows, it is C:\Users\USER\.cache\huggingface\hub

from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="sentence-transformers/all-MiniLM-L6-v2", filename="config.json")
ls -lrth ~/.cache/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2/snapshots/7dbbc90392e2f80f3d3c277d6e90027e55de9125/
total 4.0K
lrwxrwxrwx 1 alex alex 52 Jan 25 12:15 config.json -> ../../blobs/72b987fd805cfa2b58c4c8c952b274a11bfd5a00
lrwxrwxrwx 1 alex alex 76 Jan 25 12:15 pytorch_model.bin -> ../../blobs/c3a85f238711653950f6a79ece63eb0ea93d76f6a6284be04019c53733baf256
lrwxrwxrwx 1 alex alex 52 Jan 25 12:30 vocab.txt -> ../../blobs/fb140275c155a9c7c5a3b3e0e77a9e839594a938
lrwxrwxrwx 1 alex alex 52 Jan 25 12:30 special_tokens_map.json -> ../../blobs/e7b0375001f109a6b8873d756ad4f7bbb15fbaa5
lrwxrwxrwx 1 alex alex 52 Jan 25 12:30 tokenizer_config.json -> ../../blobs/c79f2b6a0cea6f4b564fed1938984bace9d30ff0
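
Rather than following the blobs/snapshots symlinks by hand, huggingface_hub also ships a cache scanner that understands this layout. A sketch; scan_cache_dir and these attributes exist in reasonably recent huggingface_hub releases:

from huggingface_hub import scan_cache_dir

info = scan_cache_dir()  # scans ~/.cache/huggingface/hub by default
for repo in info.repos:
    # e.g. 'sentence-transformers/all-MiniLM-L6-v2', its size on disk, and its cache path
    print(repo.repo_id, repo.size_on_disk, repo.repo_path)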

I have some code on my Fedora Linux box, like

from langchain.embeddings import HuggingFaceEmbeddings  # import added; the exact path varies by langchain version

embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    model_kwargs={"device": "cuda"},
)

which, when executed, downloads the model to ~/.cache/torch/sentence_transformers when using sentence_transformers model(s).
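
If you want the files somewhere other than that default, HuggingFaceEmbeddings exposes a cache_folder argument that it passes through to sentence-transformers. This is a sketch; the kwarg name and the /data/models path are assumptions to illustrate the override:

from langchain.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    cache_folder="/data/models",  # assumed kwarg; check your langchain version
    model_kwargs={"device": "cuda"},
)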

You may want to list the cache entries and sort them by size:

cd ~/.cache; du -sk -- * | sort -nr
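
A Python equivalent of that one-liner, in case you want to reuse it in a script. A minimal sketch that sums file sizes under each top-level cache entry:

from pathlib import Path

cache = Path.home() / ".cache"
sizes = []
for entry in cache.iterdir():
    if entry.is_dir():
        # Recursively sum regular-file sizes below this entry
        total = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
    else:
        total = entry.stat().st_size
    sizes.append((total, entry.name))

for total, name in sorted(sizes, reverse=True):
    print(f"{total // 1024:>12} K  {name}")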