OPT results not matching for HellaSwag dataset

Created by: Hritikbansal

Hi,

I tried reproducing the OPT results for various datasets using the LM-eval-harness framework.

I observe that the OPT Accuracy scores on HellaSwag do not match the ones reported in Figure 6 of the OPT paper. However, the Accuracy-norm scores do appear to match for this task.

For the remaining tasks, the regular accuracy scores match those plotted in Figure 6.
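For context, my understanding is that the two metrics differ only in how the predicted ending is picked from the per-ending log-likelihoods: Accuracy takes the ending with the highest total log-likelihood, while Accuracy-norm first divides each log-likelihood by the character length of that ending. Because HellaSwag endings vary a lot in length, the two can disagree often on this task. A minimal sketch, assuming that normalization convention; the function name `acc_and_acc_norm` and the log-likelihood values below are hypothetical, for illustration only, and are not taken from the harness source or from real OPT outputs:

```python
def acc_and_acc_norm(logliks, endings, gold):
    """Hypothetical helper: score one multiple-choice item.

    logliks -- total log-likelihood the model assigns to each ending
    endings -- the candidate ending strings
    gold    -- index of the correct ending
    Returns (acc, acc_norm) as 0.0/1.0 values.
    """
    indices = range(len(endings))
    # Accuracy: pick the ending with the largest total log-likelihood.
    pred = max(indices, key=lambda i: logliks[i])
    # Accuracy-norm (assumed convention): divide by the ending's character length,
    # so longer endings are not penalized just for having more tokens.
    pred_norm = max(indices, key=lambda i: logliks[i] / len(endings[i]))
    return float(pred == gold), float(pred_norm == gold)


# Made-up example: the longer (correct) ending has a lower total log-likelihood,
# so Accuracy marks it wrong while Accuracy-norm marks it right.
endings = ["a short ending.", "a much longer and more detailed ending that fits."]
print(acc_and_acc_norm([-10.0, -15.0], endings, gold=1))  # (0.0, 1.0)
```

Averaging these per-item values over the validation set gives the two reported numbers, so a mismatch in only one of them points at which selection rule the paper's plot used.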

Here is the table for HellaSwag:

@stephenroller