
The Semiconductor Shift Toward Processor-In-Memory And Processing-Near-Memory

Image Generated Using Nano Banana


Reliance Of AI And Data Workloads On Computer Architecture

AI and modern data workloads have transformed how we think about computing systems. Traditional processors were designed for sequential tasks and moderate data movement. Today’s AI models work with enormous datasets and large numbers of parameters that must move constantly between memory and compute units. This movement introduces delays and consumes significant energy. As a result, memory bandwidth and the distance to the data have become major performance bottlenecks.

Graphics processors, tensor accelerators, and custom architectures try to address these issues by increasing parallelism. Yet, parallel computing alone cannot solve the challenge if data cannot reach the compute units fast enough. The cost of moving data inside a system is now often higher than the cost of the computation itself.
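
To make that imbalance concrete, here is a minimal back-of-envelope sketch in Python. The per-operation energy values are illustrative assumptions in the spirit of commonly cited order-of-magnitude estimates, not measurements of any particular device, and the kernel (a dense matrix-vector multiply with no operand reuse) is chosen only because it is typical of memory-bound AI inference.

```python
# Back-of-envelope energy comparison for a memory-bound kernel (dense
# matrix-vector multiply, as in transformer inference). The energy figures are
# illustrative assumptions, not measurements of any real chip.

PJ_PER_MAC = 1.0          # assumed energy of one 32-bit multiply-accumulate, in picojoules
PJ_PER_BYTE_DRAM = 20.0   # assumed energy to move one byte from off-chip DRAM, in picojoules

n = 4096                          # weight matrix is n x n, activations are length n
macs = n * n                      # one multiply-accumulate per weight
bytes_from_dram = 4 * n * n       # 32-bit weights streamed once, essentially no reuse

compute_uj = macs * PJ_PER_MAC / 1e6
movement_uj = bytes_from_dram * PJ_PER_BYTE_DRAM / 1e6

print(f"compute energy:       {compute_uj:8.1f} uJ")
print(f"data-movement energy: {movement_uj:8.1f} uJ")
print(f"movement / compute:   {movement_uj / compute_uj:.0f}x")
# Every weight is touched exactly once, so nothing amortizes the DRAM traffic;
# the movement term dominates, which is exactly the gap PIM and PNM target.
```

Under these assumptions, moving the operands costs roughly eighty times as much energy as the arithmetic performed on them.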

This places the spotlight on the relationship between compute location, memory hierarchy, and data flow. As models grow in size and applications demand faster responses, the gap between processor speed and memory access continues to widen.

The computing industry often refers to this as the memory wall. When AI tasks require streaming gigabytes of data for each pass through a model, each additional millimeter of distance within a chip or package matters. To break this pattern, new approaches look at placing compute engines closer to where data is stored.

This shift has sparked interest in Processor-In-Memory and Processing-Near-Memory solutions.

Instead of pulling data along long paths, the system reorganizes itself so that computation occurs either within the memory arrays or very close to them. This architectural change aims to reduce latency, cut energy use, and support the growing scale of AI workloads.


What Is Processor-In-Memory And Processing-Near-Memory

Processor-In-Memory (PIM) places simple compute units directly inside memory arrays. The idea is to perform certain operations, such as multiplication and accumulation, inside the storage cells or peripheral logic. By doing this, data does not need to travel to a separate processor. This can lead to significant improvements in throughput and reductions in energy consumption for specific AI tasks, especially those involving matrix operations.
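
As a purely conceptual illustration of in-array multiply-accumulate, the short sketch below models a resistive crossbar: weights are stored as cell conductances, the input vector is applied as word-line voltages, and each bit line sums the resulting currents into a dot product. This is a toy functional model under those assumptions, not a description of any specific PIM product, and it omits real-world details such as DACs, ADCs, noise, and quantization.

```python
import numpy as np

# Toy functional model of analog multiply-accumulate inside a memory array.
# Weights live in the array as cell conductances; inputs arrive as word-line
# voltages; each bit line accumulates the per-cell currents (I = G * V), so
# every column produces one dot product without shipping data to a processor.

rng = np.random.default_rng(0)

conductances = rng.uniform(0.0, 1.0, size=(8, 4))   # stored weights: rows x columns
voltages = rng.uniform(0.0, 1.0, size=8)            # input vector applied to word lines

def crossbar_mac(g: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Current on each bit line: sum over rows of g[row, col] * v[row]."""
    return v @ g                                     # the 'compute' happens in the array itself

bitline_currents = crossbar_mac(conductances, voltages)
print(bitline_currents)   # one accumulated value per column
```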

Processing-Near-Memory (PNM) keeps memory arrays unchanged but integrates compute units very close to them, usually on the same stack or interposer. These compute units are not inside the memory but sit at a minimal distance from it. This enables faster data access than traditional architectures without requiring significant changes to memory cell structures. PNM often offers a more flexible design path because memory vendors do not need to modify core-array technology.
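
The near-memory idea can be sketched the same way. The hypothetical model below assumes one small compute unit per memory bank: each unit reduces the slice of data it sits next to, and only compact partial results cross the interface to the host. The bank count and partitioning are assumptions made for illustration, not any vendor's actual programming model.

```python
import numpy as np

# Toy model of processing-near-memory: the large vectors never leave their banks.
# Each bank-local unit computes a partial dot product on its own slice, and only
# those few scalars travel across the interface for the host to reduce.

NUM_BANKS = 8                       # assumed number of banks, each with a compute unit
rng = np.random.default_rng(1)

x = rng.standard_normal(1 << 20)    # large operands resident in memory
y = rng.standard_normal(1 << 20)

# Assume the data is striped across banks, one slice per near-memory unit.
x_slices = np.array_split(x, NUM_BANKS)
y_slices = np.array_split(y, NUM_BANKS)

def near_memory_unit(x_slice: np.ndarray, y_slice: np.ndarray) -> float:
    """Work done beside the bank: a local partial dot product."""
    return float(x_slice @ y_slice)

partials = [near_memory_unit(xs, ys) for xs, ys in zip(x_slices, y_slices)]
result = sum(partials)              # the host only reduces NUM_BANKS scalars

print(f"dot product: {result:.3f}")
print(f"bytes crossing the interface: {NUM_BANKS * 8} vs {x.nbytes + y.nbytes} without PNM")
```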

Here is a simple comparison of the two approaches.

| Feature | Processor-In-Memory | Processing-Near-Memory |
| --- | --- | --- |
| Compute location | Inside memory arrays or peripheral logic | Adjacent to memory through same stack or substrate |
| Memory modification | Requires changes to memory cell or array design | Uses standard memory with added compute units nearby |
| Data movement | Very low due to in-array operation | Low because compute is positioned close to data |
| Flexibility | Limited to specific operations built into memory | Wider range of compute tasks possible |
| Technology maturity | Still emerging and specialized | More compatible with existing memory roadmaps |

Both approaches challenge the long-standing separation between computing and storage. Instead of treating memory as a passive container for data, they treat it as an active part of the computation pipeline. This helps systems scale with the rising demands of AI without relying entirely on larger, more power-hungry processors.


Research Efforts For Processor-In-Memory And Processing-Near-Memory

Research activity in this area has grown quickly as AI and data workloads demand new architectural ideas. Both Processor-In-Memory and Processing-Near-Memory have attracted intense attention from academic and industrial groups. PIM work often focuses on reducing data movement by performing arithmetic inside or at the edge of memory arrays. At the same time, PNM research explores programmable compute units placed near memory stacks to improve bandwidth and latency.

The selected examples below show how each direction is pushing the boundaries of energy efficiency, scalability, and workload suitability.

Image Credit: SparseP
| Category | Example Work | Key Focus | What It Demonstrates | Link |
| --- | --- | --- | --- | --- |
| Processor-In-Memory | SparseP: Efficient Sparse Matrix Vector Multiplication on Real PIM Systems (2022) | Implements SpMV on real PIM hardware | Shows strong gains for memory-bound workloads by computing inside memory arrays | Paper |
| Processor-In-Memory | Neural-PIM: Efficient PIM with Neural Approximation of Peripherals (2022) | Uses RRAM crossbars and approximation circuits | Shows how analog compute in memory can accelerate neural networks while cutting conversion overhead | Paper |
| Processing-Near-Memory | A Modern Primer on Processing In Memory (conceptual framework) | Defines PIM vs PNM in stacked memory systems | Clarifies architectural boundaries and highlights PNM integration paths in 3D memory | Paper |
| Processing-Near-Memory | Analysis of Real Processing In Memory Hardware (2021) | Evaluates DRAM with adjacent compute cores | Provides methods used widely in PNM evaluation for bandwidth and workload behavior | Paper |

The comparison above captures both experimental implementations and architectural frameworks.

Together, they show how PIM pushes compute directly into memory structures, while PNM enables more flexible acceleration by placing logic close to high-bandwidth memory.


Implications And When Each Approach Can Benefit

Processor-In-Memory is often most useful when the workload is highly repetitive and dominated by simple arithmetic on large matrices. Examples include neural network inference and certain scientific operations. Since operations occur in memory, energy savings can be substantial. However, PIM is less suitable for general-purpose tasks that require flexible instruction sets or complex branching.

Processing-Near-Memory is a more adaptable option for systems that need performance improvements but cannot redesign memory cells. It supports tasks such as training large AI models, running recommendation engines, and accelerating analytics pipelines. Because PNM units are programmable, they can handle a broader range of workloads while still providing shorter data paths than traditional processors.
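
One hedged way to decide between the two is arithmetic intensity, the number of operations performed per byte moved from memory. The sketch below compares a few representative kernels against an assumed machine balance point; kernels that fall well below it are memory-bound and are the natural candidates for PIM or PNM. The balance figure and kernel models are illustrative assumptions, not a specific system.

```python
# Rough arithmetic-intensity check (roofline-style reasoning).
# A kernel whose FLOPs-per-byte falls well below the machine's balance point is
# memory-bound, so moving compute toward the data (PIM/PNM) is most likely to
# help. The balance point and kernel models below are illustrative assumptions.

MACHINE_BALANCE = 10.0   # assumed peak-FLOPs / peak-DRAM-bandwidth, in FLOPs per byte

def intensity(flops: float, bytes_moved: float) -> float:
    return flops / bytes_moved

n = 4096
kernels = {
    # dense matrix-vector multiply: weights streamed once, almost no reuse
    "GEMV (inference)":     intensity(2 * n * n, 4 * n * n),
    # dense matrix-matrix multiply with good on-chip reuse: compute-bound
    "GEMM (training tile)": intensity(2 * n ** 3, 3 * 4 * n * n),
    # sparse matrix-vector multiply at ~1% density, values plus indices moved
    "SpMV (analytics)":     intensity(2 * 0.01 * n * n, (4 + 4) * 0.01 * n * n),
}

for name, ai in kernels.items():
    verdict = ("memory-bound -> PIM/PNM candidate" if ai < MACHINE_BALANCE
               else "compute-bound -> keep on the processor")
    print(f"{name:22s} {ai:8.2f} FLOPs/byte   {verdict}")
```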

Image Credit: Computing Landscape Review

In real systems, both approaches may coexist. PIM might handle dense linear algebra while PNM handles control logic, preprocessing, and other mixed operations. The choice depends on workload structure, system integration limits, and power budgets. As AI becomes embedded in more devices, from data centers to edge sensors, these hybrids create new ways to deliver faster responses at lower energy.


The Direction Forward

The movement toward Processor-In-Memory and Processing-Near-Memory signals a larger architectural shift across the semiconductor world. Instead of treating compute and memory as separate units connected by wide interfaces, the industry is exploring tightly coupled designs that reflect the actual behavior of modern AI workloads. This shift helps push past the limits of conventional architectures and opens new opportunities for performance scaling.

As more applications rely on real-time analytics, foundation models, and data-intensive tasks, the pressure on memory systems will continue to increase. Designs that bring compute closer to data are becoming essential to maintaining progress. Whether through in-memory operations or near-memory acceleration, these ideas point toward a future where data movement becomes a manageable cost rather than a fundamental barrier.

The direction is clear. To support the next generation of AI and computing systems, the computing industry is rethinking distance, energy, and data flow at the chip level. Processor-In-Memory and Processing-Near-Memory represent two critical steps in that journey, reshaping how systems are built and how performance is achieved.


Chetan Arvind Patil

Hi, I am Chetan Arvind Patil (chay-tun – how to pronounce), a semiconductor professional whose job is turning data into products for the semiconductor industry that powers billions of devices around the world. And while I like what I do, I also enjoy biking and working on a few ideas, apart from writing and talking about interesting developments in hardware, software, semiconductors, and technology.

COPYRIGHT 2026, CHETAN ARVIND PATIL

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. In other words, share generously but provide attribution.

DISCLAIMER

Opinions expressed here are my own and may not reflect those of others. Unless I am quoting someone, they are just my own views.
