I repeated my experiment with the Miss Manners benchmark using Mistral.ai (Large). The results were impressive!

For details on the Miss Manners benchmark, and the results with ChatGPT, refer to my previous post.

Prompt:

Help me plan the seating arrangements for a dinner party. All guests will be sitting at the same circular table. I want to ensure that people of the same sex (male or female) are not seated next to each other, and that people seated next to each other share a common hobby. Here is a table of the guests, their genders and hobbies:
Name Gender Hobby 1 Hobby 2 Hobby 3
1 m 2 1 3
2 f 2 1 3
3 m 3 2
4 m 3 2 1
5 m 2 1 3
6 m 2 3 1
7 f 1 2 3
8 m 3 1
9 m 2 3 1
10 m 3 2 1
11 f 1 3 2
12 f 3 1 2
13 f 2 3
14 f 1 2
15 f 2 3 1
16 f 2 3

Note that when I did not supply the guest information in the initial prompt, Mistral hallucinated some guest data and then attempted to solve for it.

Mistral provided a valid arrangement, zero-shot. It also listed the hobbies that each seating pair had in common. Impressive!
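Whether an arrangement is valid can be checked mechanically rather than by eye. The sketch below is one way to do it, assuming the guest table from the prompt above. The name `check_seating` and the example order are illustrative only; the order shown is not Mistral's actual output.

```python
# Guest table transcribed from the prompt: guest number -> (gender, hobbies).
GUESTS = {
    1: ("m", {1, 2, 3}), 2: ("f", {1, 2, 3}), 3: ("m", {2, 3}),
    4: ("m", {1, 2, 3}), 5: ("m", {1, 2, 3}), 6: ("m", {1, 2, 3}),
    7: ("f", {1, 2, 3}), 8: ("m", {1, 3}), 9: ("m", {1, 2, 3}),
    10: ("m", {1, 2, 3}), 11: ("f", {1, 2, 3}), 12: ("f", {1, 2, 3}),
    13: ("f", {2, 3}), 14: ("f", {1, 2}), 15: ("f", {1, 2, 3}),
    16: ("f", {2, 3}),
}


def check_seating(order, guests=GUESTS):
    """Return a list of rule violations for a circular seating order."""
    if sorted(order) != sorted(guests):
        return ["every guest must be seated exactly once"]
    problems = []
    n = len(order)
    for i in range(n):
        # The modulo wraps the last seat around to the first one.
        a, b = order[i], order[(i + 1) % n]
        gender_a, hobbies_a = guests[a]
        gender_b, hobbies_b = guests[b]
        if gender_a == gender_b:
            problems.append(f"{a} and {b} have the same gender")
        if not hobbies_a & hobbies_b:
            problems.append(f"{a} and {b} share no hobby")
    return problems


# Hypothetical order for illustration only (NOT Mistral's arrangement).
example = [1, 2, 3, 7, 5, 11, 4, 12, 6, 13, 8, 14, 9, 15, 10, 16]
print(check_seating(example))  # prints []
```

An empty list means every neighbouring pair, including the last and first seats, alternates gender and shares at least one hobby.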

I then added the divorce constraint:

Guest Female(7) is recently divorced from guest Male(4). Could you take that into account?

Mistral offered a new (invalid) seating arrangement, failing to notice that the arrangement it had already provided did not seat F7 and M4 next to each other.

It placed F7 “first” and M4 “last”, failing to account for the fact that the table is circular, which makes them neighbours.
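The mistake is easy to reproduce in code: treating the seating as a plain list hides the adjacency between the last and first seats. A minimal sketch follows, where the helper names and the seating order are hypothetical (the order only mimics the "7 first, 4 last" shape and is not Mistral's output).

```python
def linear_neighbours(order, a, b):
    """Adjacent only if side by side in the list (ignores wrap-around)."""
    return abs(order.index(a) - order.index(b)) == 1


def circular_neighbours(order, a, b):
    """Adjacent at a round table: the last seat also touches the first."""
    n = len(order)
    return (order.index(a) - order.index(b)) % n in (1, n - 1)


# Hypothetical order with guest 7 in the first seat and guest 4 in the last.
order = [7, 1, 2, 3, 11, 5, 12, 6, 13, 8, 14, 9, 15, 10, 16, 4]

print(linear_neighbours(order, 7, 4))    # False: looks fine as a list
print(circular_neighbours(order, 7, 4))  # True: they sit side by side
```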

Prompt to correct this:

Because the table is circular, Male guest 4 and Female guest 7 are now seated next to each other.

Updated (valid) arrangement:

When prompted to analyse this (valid) solution:

What are the problems with this arrangement?

Mistral offered these useful caveats:

The current arrangement meets all the specified criteria: no two people of the same gender are seated next to each other, each person shares a common hobby with the person seated next to them, and Female guest 7 and Male guest 4 are not seated next to each other.

However, there are a few potential issues that might arise, depending on other factors not mentioned in the original criteria:

- If there are any other personal conflicts or relationships between guests that were not mentioned, this arrangement might not account for them.
- If certain guests have specific seating preferences or requirements (e.g., proximity to the entrance, restroom, or a specific guest), this arrangement might not accommodate those needs.
- If any guests are more outgoing or introverted, this arrangement might not take into account the balance of personalities at the table.
- If any guests have dietary restrictions or allergies, this arrangement does not address those concerns.

In summary, the current arrangement meets the specified criteria, but it might not account for other factors that could impact the overall experience of the guests at the dinner party.

Conclusions

Mistral clearly outperformed ChatGPT on this task.

  1. 👍 Zero-shot, Mistral found a good solution to the basic 16 person seating problem.
  2. 👍 The statement of the recent divorce between two guests was correctly inferred to mean they should not be seated next to each other.
  3. 👎 It failed to notice that its solution already accounted for the two recently divorced guests, and in attempting to account for that it failed to take into account the fact that the table is circular.
  4. 👍 When asked to analyse its solution it correctly determined that it was valid and listed some useful caveats.