By Disha Gupta
Scientists are racing to build AI systems that can think more like humans, and a team from Singapore-based company Sapient has introduced a new model that seems to be a step in that direction. Called the Hierarchical Reasoning Model (HRM), it focuses on reasoning in a way that mimics how different parts of the human brain process information over short and long time spans. Unlike popular large language models (LLMs) such as ChatGPT, which rely on billions of parameters and huge amounts of data, HRM is much smaller, using just 27 million parameters and 1,000 training samples, yet it has shown stronger performance in reasoning tasks.
When tested on the ARC-AGI benchmark, one of the toughest assessments for measuring how close AI is to human-like intelligence, HRM performed impressively. It scored 40.3% on the ARC-AGI-1 test, outperforming OpenAI’s o3-mini-high at 34.5%, Anthropic’s Claude 3.7 at 21.2%, and DeepSeek R1 at 15.8%. Even on the more difficult ARC-AGI-2 test, HRM scored 5%, still higher than the other models.
Most advanced LLMs use a method called chain-of-thought reasoning, where problems are broken down step by step, but this approach can be slow and data-heavy. HRM instead uses a two-part system: one module does slower, abstract planning, while the other handles quick, detailed calculations. It also applies an approach called iterative refinement, starting with a rough answer and then improving it in short bursts until a strong solution is reached. This helped HRM succeed at tasks that stump many AI models, like solving Sudoku puzzles and finding paths in mazes.
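To make the idea concrete, here is a minimal toy sketch of two-timescale iterative refinement: a fast inner loop makes small, detailed adjustments, while a slow outer loop revises the coarse plan (here, just a step size) whenever the fast loop stalls. All names and the numeric task are illustrative assumptions, not Sapient's actual HRM architecture.

```python
# Toy illustration of hierarchical, two-timescale iterative refinement.
# This is NOT Sapient's HRM code; it only mirrors the described structure:
# a slow "planning" module wrapping a fast "calculation" module.

def low_level_step(x, step, loss):
    """Fast, detailed update: try a small move in each direction
    and keep whichever candidate lowers the loss."""
    return min([x - step, x, x + step], key=loss)

def high_level_plan(step, improved):
    """Slow, abstract update: keep the current plan (step size) while
    progress is being made; refine it once the fast module stalls."""
    return step if improved else step / 2.0

def hierarchical_refine(loss, x0, step=8.0, outer_iters=10, inner_iters=5):
    """Start from a rough answer and improve it in short bursts."""
    x = x0
    for _ in range(outer_iters):          # slow planning loop
        before = loss(x)
        for _ in range(inner_iters):      # fast calculation loop
            x = low_level_step(x, step, loss)
        step = high_level_plan(step, loss(x) < before)
    return x

# Refine a rough initial guess toward the minimum of a simple loss.
answer = hierarchical_refine(lambda x: (x - 3.14) ** 2, x0=-20.0)
```

In this sketch the outer loop never touches individual candidates, just as the description above separates abstract planning from quick, detailed calculation; the inner loop, in turn, never changes the plan.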
The results are promising, but they come with caveats. The research has so far been published only on the open-access platform arXiv and has not yet been peer-reviewed. The ARC-AGI benchmark team, which tested the model after it was released, confirmed the strong scores but noted that the success may owe less to the hierarchical design than to an extra refinement step used during training.
