I repeated my experiment with the Miss Manners benchmark using Claude Opus (Claude Pro). The results were better than GPT4's, but not as good as Mistral (Large).

For details on the Miss Manners benchmark and the results with ChatGPT, refer to my previous post, or view a summary of other results.

Zero-shot, Claude Opus produced a valid solution. Note that it seemed to be sensitive to whitespace and line breaks in the prompt.

Prompt:

Guest 7 (f) is recently divorced from guest 4 (m). Could you take that into account?

Opus failed to detect that its existing solution already satisfied the new constraint. Instead, it created a new solution that violated the divorce constraint by seating the divorced couple next to each other.

Note that this behaviour was also seen with Gemini and GPT4, so perhaps there is something inherent in “attention is all you need” that makes adding this type of constraint difficult?

Prompt:

What are the problems with this arrangement?

Opus correctly determined that the arrangement did not account for the divorced couple, but then created an invalid arrangement that seated two female guests, F16 and F12, next to each other, breaking the alternating male/female rule.

With a couple of extra prompts to fix the remaining issues, Opus finally produced a valid arrangement.

Prompt:

What are the problems with this arrangement?

Correct response:

I don't see any problems with the latest seating arrangement. Let's review the requirements and how this arrangement satisfies them:
- Alternating male and female guests: The seating arrangement follows a perfect alternating pattern of male and female guests around the table.
- Guests sharing common hobbies: Each guest shares at least one hobby with both of their neighbors, as detailed in the shared hobbies list.
- Recently divorced guests (Guest 4 and Guest 7) not seated next to each other: Guest 4 (m) and Guest 7 (f) are seated several seats apart, with Guest 3 (m), Guest 16 (f), Guest 8 (m), and Guest 12 (f) between them, providing a comfortable distance.

The arrangement meets all the specified criteria, and there are no visible conflicts or issues. However, if there are any additional constraints or preferences that I haven't accounted for, please let me know, and I'll be happy to make further adjustments to ensure the best possible seating arrangement for your dinner party.
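The three rules Opus checked against can be written down as a small validator, which is essentially the review I was asking the model to perform. This is a minimal sketch of my own, not part of the benchmark: the `Guest` type, the `check_seating` function and the six-guest hobby data are hypothetical (the real benchmark uses 16 guests with their own hobby lists), and it assumes the seats form a circle, so the last guest also sits next to the first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guest:
    gid: int
    sex: str  # "m" or "f"
    hobbies: frozenset


def check_seating(seating, guests, forbidden_pairs):
    """Return a list of problems for a circular seating (a list of guest ids)."""
    problems = []
    # Every guest must be seated exactly once.
    if sorted(seating) != sorted(guests):
        problems.append("seating must contain every guest exactly once")
        return problems
    forbidden = {frozenset(pair) for pair in forbidden_pairs}
    n = len(seating)
    for i in range(n):
        # Compare each guest with the neighbour to their right; wrap around the table.
        a = guests[seating[i]]
        b = guests[seating[(i + 1) % n]]
        if a.sex == b.sex:
            problems.append(f"guests {a.gid} and {b.gid} are both {a.sex}")
        if not (a.hobbies & b.hobbies):
            problems.append(f"guests {a.gid} and {b.gid} share no hobby")
        if frozenset((a.gid, b.gid)) in forbidden:
            problems.append(f"guests {a.gid} and {b.gid} must not sit together")
    return problems


# Hypothetical six-guest example; guests 4 (m) and 7 (f) are the divorced couple.
GUESTS = {g.gid: g for g in [
    Guest(3, "m", frozenset({"music", "tennis"})),
    Guest(4, "m", frozenset({"chess", "golf"})),
    Guest(8, "m", frozenset({"art", "hiking"})),
    Guest(7, "f", frozenset({"tennis", "art"})),
    Guest(12, "f", frozenset({"chess", "music"})),
    Guest(16, "f", frozenset({"hiking", "golf"})),
]}
DIVORCED = [(4, 7)]

print(check_seating([4, 12, 3, 7, 8, 16], GUESTS, DIVORCED))  # → []
print(check_seating([4, 7, 3, 12, 8, 16], GUESTS, DIVORCED))
```

Checking the existing arrangement against the new divorce constraint only means re-running the adjacency test, which is the step Opus skipped before rearranging the table.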

Conclusions

This was a strong set of results, better than GPT4 but not as good as Mistral (Large):

  1. 👍Zero-shot Opus found a good solution to the basic 16-person seating problem.
  2. 👍The statement of the recent divorce between two guests was correctly inferred to mean they should not be seated next to each other
  3. 👎It failed to notice that its solution already accounted for the two recently divorced guests. In attempting to account for them, it created a solution that DID seat the divorced couple next to each other, and gave an incorrect explanation of what it had done
  4. 👍It was able to analyse solutions and correct them when invalid
  5. 👍When asked to analyse its solution it correctly determined that it was valid and listed some useful caveats