Do not merge requests and responses in completers #682

Open
akihikodaki wants to merge 3 commits into ChampSim:develop from akihikodaki:channel
@akihikodaki
Contributor

This is a follow-up of #618.

Traditionally, ChampSim had several mechanisms to merge memory transactions. This duplication of merging operations makes it complicated to instrument for Kanata, an instruction pipeline visualizer (please also see #594). Even without Kanata, it adds extra code complexity and hurts simulation accuracy because the cost of the duplicate merging operations is not accounted for.

The path of a memory transaction in ChampSim can be understood as consisting of the following:

  1. Requester
  2. Channel
  3. Completer

#618 removed duplicate merging operations in the channel (2). This pull request removes merging operations in completers (3). In other words, this pull request ensures that the requester always sees one response for each request it makes, just as a real hardware protocol like AXI does.

With this change, merging will only happen at the requester. More concretely, O3_CPU fetches multiple instructions with one request. CACHE allocates an MSHR for multiple upstream requests with overlapping addresses and issues one corresponding request to the downstream.
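To illustrate the intended division of labor, here is a minimal C++17 sketch of requester-side merging. `ToyCache`, `Request`, `Response`, and the 64-byte block size are hypothetical names and assumptions made for this example; they are not ChampSim's actual classes. Requests to the same block share one MSHR entry and cause one downstream request, and a fill produces one upstream response per merged request.

```cpp
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only (not ChampSim's real types).
struct Request { std::uint64_t address; int requester_id; };
struct Response { std::uint64_t address; int requester_id; };

class ToyCache {
  // Assumed 64-byte blocks: clear the low 6 bits to get the block address.
  static constexpr std::uint64_t BLOCK_MASK = ~std::uint64_t{63};
  // One MSHR entry per block, remembering every upstream request merged into it.
  std::unordered_map<std::uint64_t, std::vector<Request>> mshr;

public:
  std::deque<Request> downstream; // requests issued to the lower level
  std::deque<Response> upstream;  // responses returned to requesters

  void handle_request(const Request& req) {
    const std::uint64_t block = req.address & BLOCK_MASK;
    auto [it, inserted] = mshr.try_emplace(block);
    it->second.push_back(req);
    // Only the first request to a block goes downstream; later ones merge here.
    if (inserted)
      downstream.push_back(Request{block, -1});
  }

  void handle_fill(std::uint64_t block) {
    auto it = mshr.find(block);
    if (it == mshr.end())
      return;
    // One response per merged upstream request, so each one consumes bandwidth.
    for (const Request& req : it->second)
      upstream.push_back(Response{req.address, req.requester_id});
    mshr.erase(it);
  }
};
```

In this sketch, two requests to addresses 0x1000 and 0x1008 result in a single downstream request, and the fill for block 0x1000 queues two upstream responses.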

As in #618, I evaluated the changes with the DPC-3 traces:
https://github.dev/akihikodaki/e/blob/fd8e72e83fa95aa41e308de2d7c4e0b148478778/stats.ipynb
The "timeline" column at the bottom left allows you to compare commits.

Notably, commit d048f26 dramatically reduced the IPC by about 4 %. This is because the bandwidth consumption for responses that used to be merged is now properly modeled.

CACHE and DRAM_CHANNEL merge requests and return a response for each
group of merged packets, saving their bandwidth.

In my understanding, it is unrealistic to save bandwidth this way
because a requester will need to search all requests in its queue and
update them at once. BOOMv3, an open-source processor, does not seem to
implement it either.

Return responses for all merged packets in a group and make sure all of
them consume the bandwidth.

O3_CPU, CACHE, and PageTableWalker finish multiple memory requests at
once if their lower levels return one response with a matching address.

In my understanding, finishing multiple memory requests at once is
difficult for hardware as it requires searching all requests in the
queues and updating them at once. BOOMv3, an open-source processor, does
not seem to have such behavior.

Change these classes to finish only one memory request for each
response.
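
The difference between the two behaviors can be sketched as follows. This is a simplified illustration, not the code changed by this commit: `Outstanding`, `finish_all_matching`, and `finish_one` are hypothetical names, and the queue is modeled as a plain vector ordered oldest first.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of an in-flight request in a requester's queue.
struct Outstanding {
  std::uint64_t address;
  bool done;
};

// Previous behavior (sketch): one response finishes every pending entry
// whose address matches. Returns the number of entries finished.
std::size_t finish_all_matching(std::vector<Outstanding>& queue, std::uint64_t address) {
  std::size_t finished = 0;
  for (Outstanding& entry : queue) {
    if (!entry.done && entry.address == address) {
      entry.done = true;
      ++finished;
    }
  }
  return finished;
}

// New behavior (sketch): one response finishes at most one pending entry,
// here the oldest one with a matching address. Returns 0 or 1.
std::size_t finish_one(std::vector<Outstanding>& queue, std::uint64_t address) {
  auto it = std::find_if(queue.begin(), queue.end(), [address](const Outstanding& entry) {
    return !entry.done && entry.address == address;
  });
  if (it == queue.end())
    return 0;
  it->done = true;
  return 1;
}
```

With `finish_one`, a requester holding two pending entries for the same address needs two responses, which matches the completer now returning one response per request.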

PageTableWalker::mshr_type::to_return always has zero or one element,
so change its type from std::vector<std::deque<response_type>*> to
std::deque<response_type>* to save extra dynamic memory allocations.
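
A rough sketch of the shape of this type change follows. The structs `mshr_before` and `mshr_after`, the simplified `response_type`, and the `respond` helper are hypothetical stand-ins for illustration and do not reproduce the actual PageTableWalker definitions.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Simplified stand-in for the real response type (assumption).
struct response_type {
  std::uint64_t address;
};

// Before (sketch): a vector of return queues, which allocates on the heap
// even though it only ever holds zero or one pointer.
struct mshr_before {
  std::vector<std::deque<response_type>*> to_return;
};

// After (sketch): a single nullable pointer expresses "zero or one"
// directly and needs no dynamic allocation.
struct mshr_after {
  std::deque<response_type>* to_return = nullptr;
};

// Hypothetical helper: deliver a response if a return queue is set.
// Returns true when the response was queued.
bool respond(const mshr_after& entry, const response_type& response) {
  if (entry.to_return == nullptr)
    return false;
  entry.to_return->push_back(response);
  return true;
}
```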