From 977d716a92d96c0bc3825b7aaa184770914be2f3 Mon Sep 17 00:00:00 2001 From: Geoffrey Irving Date: Mon, 27 Nov 2023 02:51:59 -0800 Subject: [PATCH] Fix a few README typos PiperOrigin-RevId: 585588590 Change-Id: I556fbe550b1f65fa4e1debadd9683cd992e7971f --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 198a593..8a760b2 100644 --- a/README.md +++ b/README.md @@ -3,20 +3,20 @@ Stochastic doubly-efficient debate formalization **Summary:** We formalize the correctness of the main stochastic oracle doubly-efficient debate protocol from -[Brown-Cohen, Irving, Piliouras (2023), Scalable AI safety via doubly-efficient debate.](https://arxiv.org/abs/2311.14125) +[Brown-Cohen, Irving, Piliouras (2023), Scalable AI safety via doubly-efficient debate](https://arxiv.org/abs/2311.14125) in Lean 4. [Irving, Christiano, Amodei (2018), AI safety via debate](https://arxiv.org/abs/1805.00899) is one approach to AI alignment of strong agents, using two agents ("provers") competing in a zero-sum game to convince a human judge ("verifier") of the truth -or falsify of a claim. Theoretically, if we model the judge as a polynomial +or falsity of a claim. Theoretically, if we model the judge as a polynomial time Turing machine, optimal play in the debate game can convince the judge of any statement in PSPACE. However, this theoretical model is limited in several ways: the agents are assumed to have unbounded computational power, which is not a realistic assumption for ML agents, and the results consider only deterministic arguments. -[Brown-Cohen, Irving, Piliouras (2023), Scalable AI safety via doubly-efficient debate.](https://arxiv.org/abs/2311.14125) +[Brown-Cohen, Irving, Piliouras (2023), Scalable AI safety via doubly-efficient debate](https://arxiv.org/abs/2311.14125) improves the complexity theoretic model of debate to be "doubly-efficient": both the provers and the verifier have limited computational power. It also treats stochastic arguments: the provers try to convince the judge of the result of a