Reliance Of AI And Data Workloads On Computer Architecture
AI and modern data workloads have transformed how we think about computing systems. Traditional processors were designed for sequential tasks and moderate data movement. Today’s AI models work with enormous datasets and large numbers of parameters that must move constantly between memory and compute units. This movement introduces delays and consumes significant energy. As a result, memory bandwidth and the distance to the data have become major performance bottlenecks.
Graphics processors, tensor accelerators, and custom architectures try to address these issues by increasing parallelism. Yet, parallel computing alone cannot solve the challenge if data cannot reach the compute units fast enough. The cost of moving data inside a system is now often higher than the cost of the computation itself.
This places the spotlight on the relationship between compute location, memory hierarchy, and data flow. As models grow in size and applications demand faster responses, the gap between processor speed and memory access continues to widen.
The computing industry often refers to this as the memory wall. When AI workloads must stream gigabytes of parameters and activations for every inference or training step, each additional millimeter of distance within a chip or package matters. To break this pattern, new approaches place compute engines closer to where data is stored.
This shift has sparked interest in Processor-In-Memory (PIM) and Processing-Near-Memory (PNM) solutions.
Instead of pulling data along long paths, the system reorganizes itself so that computation occurs either within the memory arrays or very close to them. This architectural change aims to reduce latency, cut energy use, and support the growing scale of AI workloads.
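To put the cost of data movement in perspective, here is a back-of-the-envelope sketch in Python. The energy constants are rough, order-of-magnitude assumptions in the spirit of commonly cited estimates, not measurements of any particular chip, and the workload (a single fp32 matrix-vector product with weights streamed from DRAM) is purely illustrative.

```python
# Back-of-the-envelope energy comparison. The constants below are assumed,
# order-of-magnitude values, not measurements of any specific device.

PJ_PER_FP32_MAC = 4.0      # assumed: a few picojoules per 32-bit multiply-accumulate
PJ_PER_DRAM_BYTE = 150.0   # assumed: on the order of 100+ pJ per byte fetched from DRAM

def energy_breakdown_uj(num_macs: int, dram_bytes: int) -> tuple[float, float]:
    """Return (compute_energy, data_movement_energy) in microjoules."""
    return num_macs * PJ_PER_FP32_MAC * 1e-6, dram_bytes * PJ_PER_DRAM_BYTE * 1e-6

# Example: a 1024x1024 fp32 matrix-vector product whose weights stream from DRAM.
macs = 1024 * 1024
weight_bytes = 1024 * 1024 * 4
compute_uj, movement_uj = energy_breakdown_uj(macs, weight_bytes)
print(f"compute: {compute_uj:.1f} uJ, data movement: {movement_uj:.1f} uJ "
      f"(~{movement_uj / compute_uj:.0f}x)")
```

Even with generous assumptions for on-chip arithmetic, moving the operands dwarfs the cost of computing on them, which is exactly the imbalance these architectures try to remove.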
What Are Processor-In-Memory And Processing-Near-Memory?
Processor-In-Memory places simple compute units directly inside memory arrays. The idea is to perform certain operations, such as multiplication and accumulation, inside the storage cells or peripheral logic. By doing this, data does not need to travel to a separate processor. This can lead to significant improvements in throughput and reductions in energy consumption for specific AI tasks, especially those involving matrix operations.
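As a rough illustration of the idea, and not a model of any vendor's hardware, the sketch below treats a memory array as an object that holds a weight matrix and performs the multiply-accumulate internally. The `PimArray` class is hypothetical; the point is only that the weights never cross the array boundary, while the input and output vectors do.

```python
import numpy as np

# Hypothetical sketch of an in-memory multiply-accumulate (not a real device model).
# The weight matrix is written into the "array" once and never read back out;
# only the input vector and the result vector cross the array boundary.

class PimArray:
    """Toy memory array with a built-in matrix-vector multiply-accumulate."""

    def __init__(self, weights: np.ndarray):
        # In real PIM hardware the weights would live in the memory cells themselves.
        self._weights = weights.astype(np.float32)

    def mac(self, inputs: np.ndarray) -> np.ndarray:
        # Real designs do this with analog crossbars or bit-serial peripheral logic;
        # NumPy simply stands in for that circuitry here.
        return self._weights @ inputs.astype(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((1024, 1024), dtype=np.float32)
    x = rng.standard_normal(1024, dtype=np.float32)

    array = PimArray(weights)
    y = array.mac(x)

    # Traffic across the array boundary: one input and one output vector,
    # instead of re-reading the full weight matrix for every operation.
    moved = x.nbytes + y.nbytes
    print(f"bytes moved per multiply: {moved:,} vs weights kept in the array: {weights.nbytes:,}")
```

In real PIM designs the accumulation happens in analog crossbars or in digital logic at the array periphery; the sketch only captures the traffic pattern.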
Processing-Near-Memory keeps memory arrays unchanged but integrates compute units very close to them, usually on the same stack or interposer. These compute units are not inside the memory but sit at a minimal distance from it. This enables faster data access than traditional architectures without requiring significant changes to memory cell structures. PNM often offers a more flexible design path because memory vendors do not need to modify core-array technology.
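A contrasting sketch for the near-memory style, again hypothetical rather than a real product API: the dataset stays partitioned across memory banks, a small programmable kernel runs beside each bank, and only compact partial results travel back for the final merge.

```python
import numpy as np

# Hypothetical sketch of processing-near-memory (not a vendor API).
# The dataset is partitioned across memory "banks"; a small kernel runs beside
# each bank and only compact partial results travel to the host for the merge.

N_BANKS = 8

def near_memory_kernel(local_slice: np.ndarray, threshold: float) -> tuple[int, float]:
    """Runs 'next to' one bank: count and sum the values above a threshold."""
    mask = local_slice > threshold
    return int(mask.sum()), float(local_slice[mask].sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal(8_000_000).astype(np.float32)  # ~32 MB resident in memory

    # Each bank's compute unit sees only its own slice of the data.
    banks = np.array_split(data, N_BANKS)
    partials = [near_memory_kernel(bank, threshold=1.0) for bank in banks]

    # The host merges N_BANKS tiny tuples instead of scanning 32 MB of raw values.
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    print(f"values above threshold: {count:,}, sum: {total:,.1f}")
```

The pattern mirrors what PNM hardware offers: each compute unit enjoys high local bandwidth to its own bank, and the host aggregates small summaries instead of streaming the raw data across the memory interface.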
Here is a simple comparison of the two approaches.
| Feature | Processor-In-Memory | Processing-Near-Memory |
|---|---|---|
| Compute location | Inside memory arrays or peripheral logic | Adjacent to memory on the same stack or substrate |
| Memory modification | Requires changes to memory cell or array design | Uses standard memory with added compute units nearby |
| Data movement | Very low due to in-array operation | Low because compute is positioned close to data |
| Flexibility | Limited to specific operations built into memory | Wider range of compute tasks possible |
| Technology maturity | Still emerging and specialized | More compatible with existing memory roadmaps |
Both approaches challenge the long-standing separation between computing and storage. Instead of treating memory as a passive container for data, they treat it as an active part of the computation pipeline. This helps systems scale with the rising demands of AI without relying entirely on larger, more power-hungry processors.
Research Efforts For Processor-In-Memory And Processing-Near-Memory
Research activity in this area has grown quickly as AI and data workloads demand new architectural ideas, and both Processor-In-Memory and Processing-Near-Memory have attracted intense attention from academic and industrial groups. PIM work often focuses on reducing data movement by performing arithmetic inside or at the edge of memory arrays, while PNM research explores programmable compute units placed near memory stacks to improve bandwidth and latency.
The selected examples below show how each direction pushes the boundaries of energy efficiency, scalability, and workload coverage.

| Category | Example Work | Key Focus | What It Demonstrates |
|---|---|---|---|
| Processor-In-Memory | SparseP: Efficient Sparse Matrix Vector Multiplication on Real PIM Systems (2022) | Implements SpMV on real PIM hardware | Shows strong gains for memory-bound workloads by computing inside memory arrays |
| Processor-In-Memory | Neural-PIM: Efficient PIM with Neural Approximation of Peripherals (2022) | Uses RRAM crossbars and approximation circuits | Shows how analog compute in memory can accelerate neural networks while cutting conversion overhead |
| Processing-Near-Memory | A Modern Primer on Processing In Memory (conceptual framework) | Defines PIM vs PNM in stacked memory systems | Clarifies architectural boundaries and highlights PNM integration paths in 3D memory |
| Processing-Near-Memory | Analysis of Real Processing In Memory Hardware (2021) | Evaluates DRAM with adjacent compute cores | Provides methods used widely in PNM evaluation for bandwidth and workload behavior |
The comparison above captures both experimental implementations and architectural frameworks.
Together, they show how PIM pushes compute directly into memory structures, while PNM enables more flexible acceleration by placing logic close to high-bandwidth memory.
Implications And When Each Approach Can Benefit
Processor-In-Memory is often most useful when the workload is highly repetitive and dominated by simple arithmetic on large matrices. Examples include neural network inference and certain scientific operations. Since operations occur in memory, energy savings can be substantial. However, PIM is less suitable for general-purpose tasks that require flexible instruction sets or complex branching.
Processing-Near-Memory is a more adaptable option for systems that need performance improvements but cannot redesign memory cells. It supports tasks such as training large AI models, running recommendation engines, and accelerating analytics pipelines. Because PNM units are programmable, they can handle a broader range of workloads while still providing shorter data paths than traditional processors.

In real systems, both approaches may coexist. PIM might handle dense linear algebra while PNM handles control logic, preprocessing, and other mixed operations. The choice depends on workload structure, system integration limits, and power budgets. As AI becomes embedded in more devices, from data centers to edge sensors, these hybrids create new ways to deliver faster responses at lower energy.
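One way to reason about that division of labor is a simple placement heuristic based on arithmetic intensity and control complexity. The sketch below is an illustrative toy with arbitrary thresholds, not a scheduler from any real system.

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative placement heuristic (toy thresholds, not a real scheduler).

class Placement(Enum):
    PIM = "in-memory array"
    PNM = "near-memory compute unit"
    HOST = "host processor"

@dataclass
class Kernel:
    flops: int           # arithmetic operations performed
    bytes_touched: int   # bytes read plus written
    branchy: bool        # needs irregular control flow or data-dependent access

def place(kernel: Kernel) -> Placement:
    intensity = kernel.flops / max(kernel.bytes_touched, 1)  # FLOPs per byte
    if kernel.branchy:
        # Irregular work fits programmable near-memory units or the host.
        return Placement.PNM if intensity < 1.0 else Placement.HOST
    # Regular, memory-bound arithmetic (e.g. dense matrix-vector work) suits PIM.
    return Placement.PIM if intensity < 1.0 else Placement.HOST

# Example: a 1024x1024 fp32 matrix-vector product with streamed weights.
mvm = Kernel(flops=2 * 1024 * 1024, bytes_touched=4 * 1024 * 1024, branchy=False)
print(place(mvm))  # Placement.PIM
```

Real systems would weigh many more factors, such as data layout, capacity per bank, and synchronization cost, but the basic intuition holds: regular, memory-bound kernels gain the most from moving compute toward the data.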
The Direction Forward
The movement toward Processor-In-Memory and Processing-Near-Memory signals a larger architectural shift across the semiconductor world. Instead of treating compute and memory as separate units connected by wide interfaces, the industry is exploring tightly coupled designs that reflect the actual behavior of modern AI workloads. This shift helps push past the limits of conventional architectures and opens new opportunities for performance scaling.
As more applications rely on real-time analytics, foundation models, and data-intensive tasks, the pressure on memory systems will continue to increase. Designs that bring compute closer to data are becoming essential to maintaining progress. Whether through in-memory operations or near-memory acceleration, these ideas point toward a future where data movement becomes a manageable cost rather than a fundamental barrier.
The direction is clear. To support the next generation of AI and data-intensive systems, the industry is rethinking distance, energy, and data flow at the chip level. Processor-In-Memory and Processing-Near-Memory represent two critical steps in that journey, reshaping how systems are built and how performance is achieved.






