Miss Manners with GPT 4

I repeated my experiment with the Miss Manners benchmark using OpenAI GPT4. The results were better than with ChatGPT, but not as good as with Mistral (Large).

For details on the Miss Manners benchmark and the results with ChatGPT, refer to my previous post. See also the results with Mistral.AI.

For this test I used the OpenAI Playground, set to GPT4 mode.

Zero-shot response from GPT4:

I'm making several assumptions about the hobby numbers listed here. I'll assume that each number correlates to a specific, unique hobby, and that multiple guests can share the same hobbies.
Seating Arrangement:

Adding the divorce constraint:

Guest Female(7) is recently divorced from guest Male(4). Could you take that into account?

GPT4 correctly deduced that F7 and M4 should not be seated together. However, it did not detect that this was already the case in the proposed solution, and it created a new (incorrect) solution with an incorrect explanation.

Of course, we can rearrange the seating positions to accommodate their situation. Here's the revised seating plan:

<see image below>

This way, the recently divorced couple, Guest 4 and Guest 7, will not be sitting next to each other, but they're still within the requirements of hobby sharing and different genders.

Prompt to correct:

This is incorrect, you have seated Guest 7 and 4 next to each other.

Updated (valid) seating arrangement:

Analysing the results:

What are the problems with this arrangement?

Confused/incorrect response:

Reviewing the table and seating arrangement again, I found an error:
Guest 4 and Guest 14 don't share a common hobby. Guest 4 hobbies are 3, 2, 1 and Guest 14 hobbies are only 1, 2. Guest 14 should not be seated next to Guest 4.
Another potential issue is that, given the diversity of your guests' hobbies, there may be guests who don't have the opportunity to engage as deeply in conversation about their particular hobbies when limited to discussion with immediate neighbors.
This arrangement doesn't take into account known interpersonal relationships among your guests beyond the recent divorce of Guests 4 and 7. Some guests may want to sit close to their friends, while certain guests might not get along and prefer to sit further apart.
Finally, since the hobby preferences are ranked 1-3 for some guests, this arrangement doesn't consider the ranking. For instance, a guest may be seated next to someone sharing their 3rd ranked hobby option rather than discussing a 1st or 2nd ranked hobby they might enjoy more.
A solution would be to gather more information on hobby preferences and interpersonal relationships among guests. This would enable an arrangement that could better facilitate engaging and comfortable conversations for everyone.
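
The first claim in that response is easy to check mechanically. Here is a minimal sketch that uses only the hobby lists GPT4 itself quoted for Guests 4 and 14. The variable names are mine, chosen for illustration.

```python
# Hobby lists exactly as quoted in GPT4's response above.
guest_4_hobbies = {3, 2, 1}
guest_14_hobbies = {1, 2}

# Two guests satisfy the hobby rule if they share at least one hobby.
shared = guest_4_hobbies & guest_14_hobbies
print(sorted(shared))  # [1, 2]
```

By GPT4's own numbers, Guests 4 and 14 share hobbies 1 and 2, so the "error" it reported is not an error.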

Conclusions

GPT4 outperformed ChatGPT on this task, but did not perform as well as Mistral (Large).

👍Zero-shot, GPT4 found a good solution to the basic 16 person seating problem.
👍The statement of the recent divorce between two guests was correctly inferred to mean they should not be seated next to each other.
👎It failed to notice that its original solution already kept the two recently divorced guests apart. In attempting to account for the divorce, it created a solution that DID seat the divorced couple next to each other, and gave an incorrect explanation of what it had done.
👎When asked to analyse its solution, it incorrectly determined that the solution was invalid and added some confusing text about the ranking of hobbies. It did list some useful caveats.
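
Both failures come down to checking who sits next to whom, which is cheap to verify outside the model. Below is a minimal sketch of such a checker, not the code used for this post. It assumes the rules are: neighbours must be of different sexes, must share at least one hobby, and certain pairs must be kept apart. The `Guest` class, the `find_problems` function, the `circular` flag and the four-guest data are all hypothetical and are not the benchmark data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guest:
    name: str
    sex: str            # "M" or "F"
    hobbies: frozenset  # hobby numbers


def find_problems(seating, guests, keep_apart=(), circular=False):
    """Return a list of rule violations for a seating order (empty if valid)."""
    problems = []
    # Each guest is paired with the guest in the next seat.
    pairs = list(zip(seating, seating[1:]))
    if circular and len(seating) > 2:
        # Assumption: at a round table the last seat also neighbours the first.
        pairs.append((seating[-1], seating[0]))
    banned = {frozenset(p) for p in keep_apart}
    for a, b in pairs:
        ga, gb = guests[a], guests[b]
        if ga.sex == gb.sex:
            problems.append(f"{a} and {b} are the same sex")
        if not (ga.hobbies & gb.hobbies):
            problems.append(f"{a} and {b} share no hobby")
        if frozenset((a, b)) in banned:
            problems.append(f"{a} and {b} must be kept apart")
    return problems


# Hypothetical four-guest example, NOT the benchmark data.
guests = {
    "M1": Guest("M1", "M", frozenset({1, 2})),
    "F2": Guest("F2", "F", frozenset({2, 3})),
    "M3": Guest("M3", "M", frozenset({3})),
    "F4": Guest("F4", "F", frozenset({1, 3})),
}
divorced = [("M1", "F4")]

ok = find_problems(["M1", "F2", "M3", "F4"], guests, divorced)
bad = find_problems(["F2", "M3", "F4", "M1"], guests, divorced)
print(ok)   # []
print(bad)  # ['F4 and M1 must be kept apart']
```

Running a check like this on each proposed arrangement would have shown that the first solution already kept the divorced couple apart and that the "revised" one did not.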
