Created by: suchenzang
Continuing to break down https://github.com/facebookresearch/metaseq/pull/197
- Removed `scale_fc`, `scale_attn`, and `scale_heads` flags that were brought in as part of NormFormer - might have to bring `scale_fc` back later, but hoping to clean up configuration / flags before then.
- Removed `sync_ln_variance`, which was brought in to speed up NormFormer when we went tensor parallel.