OpenAI’s ChatGPT o3 model defeated Grok 4, the model from Elon Musk’s xAI, in a suspenseful AI chess tournament on Kaggle. The tournament unfolded over three days and was intended to test how general-purpose large language models deal with the highly complex game of chess. Unlike competitions between traditional chess engines such as Stockfish, this tournament focused on multi-purpose large language models (LLMs) that were not specifically designed for chess.
The tournament featured eight AIs from companies including OpenAI, xAI, Google, and Anthropic, as well as Chinese firms such as DeepSeek and Moonshot AI. The AIs followed standard chess rules, but none specialized in chess, so each had to work out how to apply its general knowledge and skills to a strategy-based game requiring planning and foresight.
In the early stages, Grok 4 dominated and looked unstoppable. It won by huge margins and seemed the only contender for the championship. Commentators did not feel Grok 4 was even being challenged until the semi-final, and even there it demonstrated the strength it had shown throughout the tournament. Only in the final, against ChatGPT o3, did the unexpected twists of chess catch up with Grok 4.
Grandmaster Hikaru Nakamura, who provided live commentary for the event, said Grok 4 played strong chess throughout the earlier rounds but struggled badly in the final. He added that OpenAI’s ChatGPT did not make the same kinds of mistakes and played consistently throughout its matches.
Musk Responds to the Loss
Elon Musk quickly played down the loss, explaining that Grok 4’s strong play earlier in the tournament was a “side effect” and that his company xAI had not put much time or effort into chess. Musk also noted that xAI is not in the business of chess-playing AI and has other goals for artificial intelligence.
Chess has long been viewed as a gauge of the advancement of AI. We have already seen prominent purpose-built AI systems for board games (such as DeepMind’s AlphaGo, built for the game of Go) that incorporated many facets of deep strategic thinking. But this was different. This was a tournament of general-purpose LLMs that pitted their capabilities to perform structured tasks and exercise complex thought processes against one another, without training for the game and without prompts that pointed them toward the structured play of the game itself.
The results of this tournament show that not all large language models are equally competent at strategic chess play. ChatGPT’s ability to adapt dynamically to the positions in front of it and to withstand the rigors of tournament play through to the conclusion illustrates that LLMs can deploy some of the faculties that complex adversarial activity requires. Grok 4’s failure to sustain its performance under pressure demonstrates the inconsistency that is still possible in LLMs.
This event adds to the ongoing rivalry between Elon Musk’s xAI and OpenAI, and it shows the potential of chess as a structured benchmark for evaluating LLMs. As AI continues to evolve, tournaments like this one are likely to provide useful insights into how models operate in high-pressure strategic environments.