I repeated my experiment with the Miss Manners benchmark using Claude Sonnet (Claude 3.5 Pro). The results were the best I’ve seen so far.
Continue reading “Miss Manners with Claude Sonnet”

I repeated my experiment with the Miss Manners benchmark using Claude Opus (Claude Pro). The results were better than GPT4, but inferior to Mistral (Large).
Continue reading “Miss Manners with Claude Opus”

I repeated my experiment with the Miss Manners benchmark using Google Gemini. The results were inferior to ChatGPT.
Continue reading “Miss Manners with Gemini”