Commit 7c746fe

1 parent 7396e38 commit 7c746fe

2 files changed: +37 -26 lines changed

website/_data/refactor_leaderboard.yml

Lines changed: 15 additions & 15 deletions
@@ -143,25 +143,25 @@
 seconds_per_case: 67.8
 total_cost: 20.4889
 
-
-- dirname: 2024-06-20-16-39-18--refac-claude-3.5-sonnet-diff
+- dirname: 2024-07-01-18-30-33--refac-claude-3.5-sonnet-diff-not-lazy
 test_cases: 89
 model: claude-3.5-sonnet (diff)
 edit_format: diff
-commit_hash: e5e07f9
-pass_rate_1: 55.1
-percent_cases_well_formed: 70.8
-error_outputs: 240
-num_malformed_responses: 54
-num_with_malformed_responses: 26
-user_asks: 10
+commit_hash: 7396e38-dirty
+pass_rate_1: 64.0
+percent_cases_well_formed: 76.4
+error_outputs: 176
+num_malformed_responses: 39
+num_with_malformed_responses: 21
+user_asks: 11
 lazy_comments: 2
-syntax_errors: 0
-indentation_errors: 3
+syntax_errors: 4
+indentation_errors: 0
 exhausted_context_windows: 0
 test_timeouts: 0
 command: aider --model openrouter/anthropic/claude-3.5-sonnet
-date: 2024-06-20
-versions: 0.38.1-dev
-seconds_per_case: 51.9
-total_cost: 0.0000
+date: 2024-07-01
+versions: 0.40.7-dev
+seconds_per_case: 42.8
+total_cost: 11.5242
+

website/_posts/2024-07-01-sonnet-not-lazy.md

Lines changed: 22 additions & 11 deletions
@@ -1,26 +1,34 @@
 ---
 title: Sonnet is the opposite of lazy
 excerpt: Claude 3.5 Sonnet represents a step change in AI coding.
-#highlight_image: /assets/linting.jpg
-draft: true
+highlight_image: /assets/sonnet-not-lazy.jpg
 nav_exclude: true
 ---
+
+[![sonnet is the opposite of lazy](/assets/sonnet-not-lazy.jpg)](https://aider.chat/assets/sonnet-not-lazy.jpg)
+
 {% if page.date %}
 <p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
 {% endif %}
 
-
 # Sonnet is the opposite of lazy
 
-[![sonnet is the opposite of lazy](/assets/sonnet-not-lazy.jpg)](https://aider.chat/assets/sonnet-not-lazy.jpg)
-
 Claude 3.5 Sonnet represents a step change
 in AI coding.
 It is so industrious, diligent and hard working that
 it has caused multiple problems for aider.
+
 It's been worth the effort to adapt aider to work well
 with Sonnet,
 because the result is surprisingly powerful.
+Sonnet's score on
+[aider's refactoring benchmark](https://aider.chat/docs/leaderboards/#code-refactoring-leaderboard)
+jumped from 55.1% up to 64.0%
+as a result of the changes discussed below.
+This moved Sonnet into second place, ahead of GPT-4o and
+behind only Opus.
+
+## Problems
 
 Sonnet's amazing work ethic caused a few problems:
 
@@ -31,7 +39,7 @@ on API responses, which truncates its coding in mid-stream.
 2. Similarly, Sonnet can specify large sequences of edits in one go,
 like changing a majority of lines while refactoring a large file.
 Again, this regularly triggered the 4k output limit
-and resulted in a failed edits.
+and resulted in failed edits.
 3. Sonnet is not shy about quoting large chunks of an
 existing file to perform a SEARCH & REPLACE edit across
 a long span of lines.
@@ -57,7 +65,7 @@ Problem (3) does cause some real downsides.
 Faced with a few small changes spread far apart in
 a source file,
 Sonnet would often prefer to do one giant SEARCH/REPLACE
-operation of the ~entire file.
+operation of almost the entire file.
 This wastes a tremendous amount of tokens,
 time and money -- and risks hitting the 4k output limit.
 It would be far faster and less expensive to instead
@@ -76,13 +84,16 @@ has specialized support for Claude 3.5 Sonnet:
 - Aider allows Sonnet to produce as much code as it wants,
 by automatically and seamlessly spreading the response
 out over a sequence of 4k token API responses.
-- Aider carefully prompts Sonnet to be concise and
-return only changing sections of code.
+- Aider carefully prompts Sonnet to be concise when proposing
+code edits.
 This reduces Sonnet's tendency to waste time, tokens and money
 returning large chunks of unchanging code.
-- Aider now uses `claude-3-5-sonnet-20240620` by default if `ANTHROPIC_API_KEY` is set in the environment.
+- Aider now uses Claude 3.5 Sonnet by default if the `ANTHROPIC_API_KEY` is set in the environment.
 
-You can use aider with Sonnet like this:
+See
+[aider's install instructions](https://aider.chat/docs/install.html)
+for more details, but
+you can get started quickly with aider and Sonnet like this:
 
 ```
 pip install aider-chat
