Running OPT 175B with different hardware configurations
Created by: sachit-menon
❓ Questions and Help
Before asking:
- search the issues.
- search the docs.
What is your question?
How can I get the 175B model running for inference on a hardware setup as described below? Is it possible on one node with 8 A6000s with 51GB each, perhaps with DeepSpeed or similar? I know there are multiple other similar issues, but I'm wondering if the requirements can be somewhat relaxed for inference only (and my hardware setup is a bit different), so I thought I'd throw my question into the ring :).
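For context, a back-of-the-envelope memory estimate (my own rough numbers, assuming fp16 weights only and ignoring activations, KV cache, and framework overhead):

```python
# Rough feasibility check: does OPT-175B fit in 8x A6000 for inference?
# Assumptions (not confirmed anywhere): fp16 weights, no activation or
# KV-cache overhead, even sharding across GPUs.
params = 175e9          # parameter count of OPT-175B
bytes_per_param = 2     # fp16
weight_gb = params * bytes_per_param / 1e9
print(f"Weights alone: {weight_gb:.0f} GB")          # 350 GB

gpus, gb_per_gpu = 8, 51
total_gb = gpus * gb_per_gpu
print(f"Aggregate GPU memory: {total_gb} GB")        # 408 GB
print(f"Per-GPU weight shard: {weight_gb / gpus:.1f} GB")  # 43.8 GB
```

So the weights alone would just about fit on one node if sharded evenly, but the headroom per GPU (~7 GB) seems tight once activations and the KV cache are counted, which is why I'm asking whether inference-only requirements can be relaxed.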
What's your environment?
- metaseq Version (e.g., 1.0 or master): master
- PyTorch Version (e.g., 1.0): 1.10.1+cu113
- OS (e.g., Linux): Linux
- How you installed metaseq (pip, source): per instructions in https://github.com/facebookresearch/metaseq/blob/main/docs/setup.md
- Build command you used (if compiling from source):
- Python version: 3.9
- CUDA/cuDNN version: 11.3
- GPU models and configuration: (potentially 2 nodes of) 8x NVIDIA RTX A6000 51GB RAM
- Any other relevant information: