[Submitted on 1 Sep 2023]

Click here to download a PDF of the paper titled “Taken out of context: On measuring situational awareness in LLMs,” written by Lukas Berglund and 7 other authors: Download PDF

Abstract: The purpose of this study is to gain a better understanding of the emergence of “situational awareness” in large language models (LLMs). Situational awareness refers to the model’s awareness of its own nature as a model and its ability to recognize whether it is being tested or deployed. Currently, LLMs are tested for safety and alignment before deployment. However, there is a concern that LLMs could exploit situational awareness to perform well on safety tests but act harmfully after deployment. Situational awareness may unexpectedly arise as a byproduct of scaling up the model. To anticipate this emergence, we propose conducting scaling experiments on abilities related to situational awareness. One such ability is “out-of-context reasoning,” which we investigate experimentally. In this experiment, we fine-tune an LLM on a test description without providing any examples or demonstrations. During the test, we evaluate whether the model can pass the test. Surprisingly, we find that LLMs are successful in this out-of-context reasoning task. However, their success is influenced by the training setup and requires data augmentation. Performance improves with model size for both GPT-3 and LLaMA-1. These findings provide a starting point for further empirical research in predicting and potentially controlling the emergence of situational awareness in LLMs. The code for this study is available at: https://github.com/AsaCooperStickland/situational-awareness-evals.

Submission history

From: Asa Cooper Stickland [view email]

[v1]
Fri, 1 Sep 2023 17:27:37 UTC (6,858 KB)