Loading weights

The second tool 🤗 Accelerate introduces is a function, load_checkpoint_and_dispatch(), that allows you to load a checkpoint inside your empty model. This supports full checkpoints (a single file containing the whole state dict) as well as sharded checkpoints (for example, first_state_dict.bin containing the weights for "linear1.weight" and "linear1.bias", and second_state_dict.bin the ones for "linear2.weight" and "linear2.bias"). It will also automatically dispatch those weights across the devices you have available (GPUs, CPU RAM), so if you are loading a sharded checkpoint, the maximum RAM usage will be the size of the biggest shard.

If you want to use big model inference with 🤗 Transformers models, check out this documentation.

Here is how we can use this to load the GPT2-1.5B model.
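A minimal sketch of that workflow, assuming the GPT2-1.5B weights have already been downloaded to a local path ("path/to/checkpoint" below is a hypothetical placeholder): the model skeleton is first instantiated with empty weights via init_empty_weights(), then load_checkpoint_and_dispatch() fills it and spreads the layers over the available devices.

```python
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# Instantiate the model skeleton with empty (meta-device) weights:
# no RAM is allocated for the parameters at this point.
config = AutoConfig.from_pretrained("gpt2-xl")  # GPT2-1.5B
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Load the checkpoint into the empty model and dispatch the weights
# across the available devices. "path/to/checkpoint" is a placeholder:
# point it at a directory holding a sharded checkpoint (weight files
# plus index) or at a single file containing the whole state dict.
model = load_checkpoint_and_dispatch(
    model,
    checkpoint="path/to/checkpoint",
    device_map="auto",
    # Keep each Transformer block on a single device so that residual
    # connections are never split across devices.
    no_split_module_classes=["GPT2Block"],
)
```

With device_map="auto", Accelerate fills the available GPUs first, then spills over to CPU RAM, and finally offloads to disk if the weights still do not fit.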