Do not merge requests and responses in completers by akihikodaki · Pull Request #682 · ChampSim/ChampSim

akihikodaki · 2026-01-16T05:47:20Z

This is a follow-up of #618.

Traditionally, ChampSim had several mechanisms to merge memory transactions. This duplication of merging operations makes ChampSim complicated to instrument for Kanata, an instruction pipeline visualizer (please also see #594). Even without Kanata, it adds extra code complexity and hurts simulation accuracy because the cost of the duplicate merging operations is not accounted for.

A memory transaction in ChampSim passes through the following components:

1. Requester
2. Channel
3. Completer

#618 removed duplicate merging operations in the channel (2). This pull request removes merging operations in completers (3). In other words, this pull request ensures that the requester always sees one response for each request it makes, just as a real hardware protocol like AXI does.

With this change, merging will only happen at the requester. More concretely, O3_CPU fetches multiple instructions with one request. CACHE allocates an MSHR for multiple upstream requests with overlapping addresses and issues one corresponding request to the downstream.
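A minimal sketch of requester-side merging as described above, not ChampSim's actual code. The names `Request`, `RequesterSketch`, `kBlockBytes`, and the 64-byte block size are assumptions made for illustration. Upstream requests that fall in the same block share one MSHR entry, and only the first of them triggers a downstream request.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types for illustration; these are not ChampSim's classes.
struct Request {
  std::uint64_t address;  // byte address requested by the upstream
  int upstream_id;        // hypothetical identifier of the upstream requester
};

class RequesterSketch {
 public:
  static constexpr std::uint64_t kBlockBytes = 64;  // assumed block size

  // Records an upstream request. Returns true if a new downstream request
  // must be issued, false if the request was merged into an existing entry.
  bool on_upstream_request(const Request& req) {
    const std::uint64_t block = req.address / kBlockBytes;
    auto [it, inserted] = mshr_.try_emplace(block);
    it->second.push_back(req);
    if (inserted) ++downstream_issued_;
    return inserted;
  }

  // Handles the single downstream response for a block and returns the
  // upstream requests that were waiting on it.
  std::vector<Request> on_downstream_response(std::uint64_t address) {
    auto it = mshr_.find(address / kBlockBytes);
    assert(it != mshr_.end() && "response without a matching MSHR entry");
    std::vector<Request> waiting = std::move(it->second);
    mshr_.erase(it);
    return waiting;
  }

  int downstream_issued() const { return downstream_issued_; }

 private:
  // One entry per outstanding block, holding every merged upstream request.
  std::unordered_map<std::uint64_t, std::vector<Request>> mshr_;
  int downstream_issued_ = 0;
};
```

In this sketch, requests to `0x1000` and `0x1008` share one entry and produce a single downstream request, so the downstream sees exactly one request and returns exactly one response for that block.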

As I have done in #618, I evaluated the changes with the DPC-3 traces:
https://github.dev/akihikodaki/e/blob/fd8e72e83fa95aa41e308de2d7c4e0b148478778/stats.ipynb
The "timeline" column at the bottom left allows you to compare commits.

Notably, commit d048f26 dramatically reduced the IPC by about 4%. This is because the bandwidth consumed by responses that used to be merged is now properly modeled.

CACHE and DRAM_CHANNEL merge requests and return one response for each group of merged packets, which saves bandwidth. In my understanding, it is unrealistic to save bandwidth this way because a requester would need to search all requests in its queue and update them at once. BOOMv3, an open-source processor, does not seem to implement it either. Return a response for every merged packet in a group and make sure each of them consumes bandwidth.
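The bandwidth effect can be illustrated with a small sketch. It is not taken from ChampSim: `Response`, `drain`, and the per-cycle `bandwidth` limit are hypothetical names, and the model assumes a completer that may return at most `bandwidth` responses per cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical response record; not ChampSim's response type.
struct Response {
  std::uint64_t address;
  int requester_tag;  // hypothetical tag identifying the original request
};

// Moves every pending response to `returned`, at most `bandwidth` per cycle,
// and reports how many cycles that took.
std::size_t drain(std::deque<Response>& pending,
                  std::deque<Response>& returned,
                  std::size_t bandwidth) {
  assert(bandwidth > 0);
  std::size_t cycles = 0;
  while (!pending.empty()) {
    for (std::size_t i = 0; i < bandwidth && !pending.empty(); ++i) {
      returned.push_back(pending.front());
      pending.pop_front();
    }
    ++cycles;
  }
  return cycles;
}
```

Under this assumption, a group of three merged packets with a bandwidth of one response per cycle occupies the return path for three cycles, where a single merged response would have occupied it for one.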

O3_CPU, CACHE, and PageTableWalker finish multiple memory requests at once if their lower levels return one response with a matching address. In my understanding, finishing multiple memory requests at once is difficult for hardware because it requires searching all requests in the queues and updating them at once. BOOMv3, an open-source processor, does not seem to have such a behavior. Change these classes to finish only one memory request for each response.
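A hedged sketch of the "one request per response" rule follows. `Outstanding` and `finish_one` are hypothetical names, not ChampSim identifiers; the point is only that a response completes the first matching unfinished request and leaves any other request with the same address waiting for its own response.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>

// Hypothetical record of an in-flight request; not a ChampSim type.
struct Outstanding {
  std::uint64_t address;
  bool finished;
};

// Finishes at most one unfinished request whose address matches the response.
// Returns true if a request was finished, false if none matched.
bool finish_one(std::deque<Outstanding>& queue, std::uint64_t address) {
  auto it = std::find_if(queue.begin(), queue.end(),
                         [address](const Outstanding& o) {
                           return !o.finished && o.address == address;
                         });
  if (it == queue.end()) return false;
  it->finished = true;  // only this entry; later matches stay pending
  return true;
}
```

With two outstanding requests to the same address, two responses are needed before both are finished, which matches the one-to-one pairing described in this pull request.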

PageTableWalker::mshr_type::to_return always has zero or one element so change its type from std::vector<std::deque<response_type>*> to std::deque<response_type>* to save extra dynamic memory allocations.
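The shape of this type change can be sketched as follows. Only the two `to_return` types come from the commit message; the `response_type` body, the struct names `mshr_before` and `mshr_after`, and the `deliver` helper are placeholders assumed for illustration. A null pointer stands for the "zero elements" case.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Placeholder standing in for the real response type.
struct response_type {
  std::uint64_t address;
};

// Before: a vector that in practice held zero or one pointer, which can
// cost a heap allocation for the vector's storage.
struct mshr_before {
  std::vector<std::deque<response_type>*> to_return;
};

// After: a single pointer; nullptr means there is no queue to return to.
struct mshr_after {
  std::deque<response_type>* to_return = nullptr;
};

// Hypothetical helper: pushes the response to the return queue, if any.
void deliver(const mshr_after& entry, const response_type& response) {
  if (entry.to_return != nullptr) entry.to_return->push_back(response);
}
```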

akihikodaki added 3 commits January 14, 2026 05:49

Replace vector with pointer in PageTableWalker

20a8464


akihikodaki wants to merge 3 commits into ChampSim:develop from akihikodaki:channel
