Khaberni - In a development that could change the game in artificial intelligence, DeepSeek, in collaboration with Peking University, has unveiled a new training method called Engram, which aims to reduce reliance on high-bandwidth memory (HBM), the main driver behind the sharp rise in DRAM chip prices, which have climbed roughly fivefold in just ten weeks.
A radical solution to memory bottlenecks
Traditional large language models rely on HBM not only for heavy computation but also for retrieving basic knowledge, creating a double bottleneck in both performance and cost.
With the huge surge in demand for AI hardware, this bottleneck has become one of the biggest challenges facing the industry, according to a report published by "TechRadar" and cited by "Al Arabiya Business".
The Engram technique, however, proposes a different path: separating knowledge storage from computation, which allows the model to access essential information without draining high-speed GPU memory.
How does Engram work?
The researchers explain that current models waste a significant portion of their sequential depth on simple operations that could instead be devoted to higher-level reasoning tasks.
Engram instead retrieves knowledge through hashed N-gram lookups, which provide deterministic access to stored information regardless of the model's current context.
The retrieved data is then modulated by a context-aware gating mechanism so that it aligns with the model's internal state, allowing longer contexts to be handled more efficiently and enabling system-level prefetching without any notable computational overhead.
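To make the idea concrete, here is a minimal, assumption-based sketch in PyTorch of how a hashed N-gram lookup combined with a context-aware gate could be wired into a transformer layer. The class name, table size, rolling hash, and gating shape are all inventions for illustration, not DeepSeek's published implementation.

```python
import torch
import torch.nn as nn


class EngramSketch(nn.Module):
    """Illustrative sketch of a hashed N-gram memory with a context-aware gate.

    All names, sizes, and the hash function are assumptions made for this
    example; this is not DeepSeek's published implementation.
    """

    def __init__(self, d_model=1024, table_size=2**20, ngram=3):
        super().__init__()
        self.ngram = ngram
        self.table_size = table_size
        # Large static table of knowledge vectors. Because lookups never
        # depend on activations, it could in principle live outside HBM.
        self.table = nn.Embedding(table_size, d_model)
        self.gate = nn.Linear(2 * d_model, d_model)

    def hash_ngrams(self, token_ids):
        # token_ids: (batch, seq). Build a rolling hash over the last
        # `ngram` token IDs at every position; a pure function of the input.
        h = torch.zeros_like(token_ids)
        for i in range(self.ngram):
            shifted = torch.roll(token_ids, shifts=i, dims=1)
            shifted[:, :i] = 0  # positions before the sequence start
            h = h * 1000003 + shifted  # simple polynomial hash
        return h % self.table_size

    def forward(self, token_ids, hidden):
        # hidden: (batch, seq, d_model), the model's current internal state.
        idx = self.hash_ngrams(token_ids)
        retrieved = self.table(idx)  # context-independent knowledge lookup
        gate = torch.sigmoid(self.gate(torch.cat([hidden, retrieved], dim=-1)))
        return hidden + gate * retrieved  # gated injection into the residual


# Tiny usage example with random data.
module = EngramSketch(d_model=64, table_size=4096, ngram=2)
tokens = torch.randint(0, 50_000, (2, 16))
hidden = torch.randn(2, 16, 64)
print(module(tokens, hidden).shape)  # torch.Size([2, 16, 64])
```

The point the sketch tries to capture is that the lookup indices depend only on the input tokens, never on the model's activations, which is what makes the table safe to keep outside HBM and to prefetch ahead of time.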
Promising results in tests
The technique was tested on a large model with 27 billion parameters and showed significant improvement on a number of industry-standard benchmarks, without any increase in compute (FLOPs) or model size.
The tests also showed that reallocating roughly 20–25% of the model's parameter budget to the Engram module delivers better performance than conventional Mixture-of-Experts (MoE) models, with gains that remain stable across model sizes.
Reducing pressure on HBM and cutting costs
Engram's most important feature is that it reduces the need for ultra-fast memory by serving unchanging information through fixed lookups, making memory usage far more efficient.
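As a rough illustration of that memory-placement idea, the following sketch keeps a static lookup table in ordinary CPU RAM and copies only the rows selected by the hashed N-gram indices to the GPU. The table size, function name, and shapes are assumptions made for the example, not part of any published code.

```python
import torch

# Hypothetical memory placement: the static table lives in ordinary CPU RAM
# rather than GPU HBM. Sizes and names are illustrative assumptions.
TABLE_ROWS, D_MODEL = 1 << 16, 1024
table = torch.randn(TABLE_ROWS, D_MODEL)  # static knowledge, host memory


def fetch_rows(indices_cpu, device):
    # Gather only the rows selected by the hashed N-gram indices on the CPU,
    # then copy that small slice to the GPU. Since the indices depend only on
    # the input tokens, the copy can be issued early and overlapped with
    # GPU compute (non_blocking transfers from pinned memory).
    rows = table[indices_cpu]
    return rows.pin_memory().to(device, non_blocking=True)


if torch.cuda.is_available():
    idx = torch.randint(0, TABLE_ROWS, (2, 16))  # in practice: the N-gram hash
    gpu_rows = fetch_rows(idx, "cuda")  # shape (2, 16, 1024), now on the GPU
```

Because the indices are known as soon as the input tokens are, only a thin slice of the table ever needs to travel to the GPU, which is what makes slower, cheaper memory tiers viable for this kind of knowledge store.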
This technology complements other low-cost solutions, such as inference accelerators from Phison, which allow expansion of total memory using SSDs instead of relying entirely on HBM.
Engram is also compatible with the emerging CXL (Compute Express Link) standard, which is itself designed to overcome GPU memory bottlenecks in large-scale workloads.
Geopolitical dimensions and global impact
This innovation may have a particular impact in China, where access to advanced HBM from leading manufacturers such as "Samsung", "SK Hynix", and "Micron" remains limited.
Reducing reliance on this type of memory could give Chinese AI companies greater room to compete.
Is the memory crisis coming to an end?
Preliminary results suggest that Engram opens the door to expanding model capabilities and reasoning depth without an explosion in memory requirements, which could ease pressure on supply chains and help stabilize future DRAM and DDR5 prices.
While the technology is still in its early stages, it represents an important step toward breaking the vicious cycle between artificial intelligence and hardware costs, and it could mark the beginning of the end of what is known today as the "global memory crisis".