Train and release OPT-350m with layer norm
Created by: suchenzang
It currently seems that OPT-350m may not have been trained with layer norms, as raised in https://github.com/facebookresearch/metaseq/issues/383. The diff in https://github.com/facebookresearch/metaseq/commit/c4b33ba6e2cd9b33539bbb5a35d831096bde3282 shows that decoder_normalize_before defaulted to True only in the model parallel case (and to False otherwise).
We should re-train OPT-350m with model parallel 2 to match the rest of the OPT models, and update the paper results as well.
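
For illustration, here is a minimal sketch of how a flag like `decoder_normalize_before` can end up with different defaults depending on the code path. This is not metaseq's actual code: the function names `plain_defaults` and `model_parallel_defaults` are hypothetical, and the `getattr`-with-fallback pattern is only assumed to resemble how the defaults were filled in.

```python
from argparse import Namespace


def plain_defaults(args: Namespace) -> Namespace:
    # Hypothetical non-model-parallel path: the flag falls back to False
    # unless the caller set it explicitly.
    args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False)
    return args


def model_parallel_defaults(args: Namespace) -> Namespace:
    # Hypothetical model-parallel path: the same flag falls back to True.
    args.decoder_normalize_before = getattr(args, "decoder_normalize_before", True)
    return args


# Same (empty) user config, two different effective settings.
single = plain_defaults(Namespace())
parallel = model_parallel_defaults(Namespace())

# Setting the flag explicitly removes the dependence on the code path.
explicit = plain_defaults(Namespace(decoder_normalize_before=True))

print(single.decoder_normalize_before)    # False
print(parallel.decoder_normalize_before)  # True
print(explicit.decoder_normalize_before)  # True
```

Under these assumptions, a run that never sets the flag silently inherits whichever default its code path provides, which is why training with model parallel 2 (or passing the flag explicitly) would bring the 350m configuration in line with the other models.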