Implement a simple algorithm that transposes a non-symmetric matrix of size NxN.
- The algorithm takes the dimension of the matrix as input (use a power of 2).
- For example, `./transpose 10` transposes a 2^10 x 2^10 matrix.
- Measure the effective bandwidth of your implementation using the `-O0`, `-O1`, `-O2`, and `-O3` options. Analyze the cache behavior.
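
As a rough illustration of the bandwidth measurement, here is a minimal sketch. It assumes the common definition of effective bandwidth for a transpose (every element is read once and written once, so 2 * N * N * sizeof(element) bytes move per run) and averages the time over several runs. The names `effective_bandwidth_gbs` and `mean_seconds` are hypothetical and are not taken from this repository's code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Effective bandwidth in GB/s for one n x n transpose.
// Assumption: each element is read once and written once.
double effective_bandwidth_gbs(std::size_t n, std::size_t elem_size, double seconds) {
    assert(seconds > 0.0);
    const double bytes_moved = 2.0 * static_cast<double>(n) *
                               static_cast<double>(n) *
                               static_cast<double>(elem_size);
    return bytes_moved / seconds / 1e9;
}

// Runs `kernel` `repeats` times and returns the mean wall-clock
// time of a single run, in seconds.
template <typename F>
double mean_seconds(F&& kernel, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        kernel();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / repeats;
}
```

With an input of 10 and 4-byte floats, the matrix side is `1 << 10` and each transpose moves 8 MiB under this definition.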
To reduce recording errors and OS noise, multiple executions were performed so that the total exec. time was
In this assignment we study how the system behaves with different matrix transpose implementations. A simple implementation (sequential, with nested loops) and a blocked implementation are both visualized to better understand how cache misses depend on the way the data is accessed.
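
The two access patterns can be sketched as follows. This is an illustrative out-of-place version on a row-major `std::vector<float>`, not the code in `src/matrix.cc`; the function names and the `block` parameter are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simple transpose: reads `in` row by row, writes `out` column by column.
void transpose_simple(const std::vector<float>& in, std::vector<float>& out,
                      std::size_t n) {
    assert(in.size() == n * n && out.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            out[j * n + i] = in[i * n + j];
        }
    }
}

// Blocked transpose: processes block x block tiles so that the rows of
// `in` and `out` touched by one tile are more likely to stay in cache.
void transpose_blocked(const std::vector<float>& in, std::vector<float>& out,
                       std::size_t n, std::size_t block) {
    assert(in.size() == n * n && out.size() == n * n);
    assert(block > 0);
    for (std::size_t ii = 0; ii < n; ii += block) {
        for (std::size_t jj = 0; jj < n; jj += block) {
            // Clamp tile bounds so n need not be a multiple of block.
            const std::size_t i_end = std::min(ii + block, n);
            const std::size_t j_end = std::min(jj + block, n);
            for (std::size_t i = ii; i < i_end; ++i) {
                for (std::size_t j = jj; j < j_end; ++j) {
                    out[j * n + i] = in[i * n + j];
                }
            }
        }
    }
}
```

Both functions produce the same result; only the order in which memory is visited differs, which is what the cache measurements compare.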
TL;DR: the simple method works better when
More info in report.pdf.

Plots (image files in the `plot` directory):
- Effective Bandwidth
- Execution time
- L1 miss rate
- D1 miss rate
- LL miss rate
- D references

| Processor Specs | |
|---|---|
| Model | AMD Ryzen 5 5600X |
| Architecture | x86 |
| Clock Speeds | 3.7 GHz base, 4.6 GHz boost |
| Cache Levels | L1 384 KB, L2 3 MB, L3 32 MB |
| Cores, Threads | 6, 12 |

| Memory Specs | |
|---|---|
| Type | DDR4 |
| Size | 16 GB |
| Speed | 3200 MHz |
| Memory Channels | Dual Channel |
| DOCP/AMP/XMP | DOCP 3200 MHz |

├── launcher.sh # [SLOW] Automatic experiment launcher; do not use it in this repo, as there is no plotting function here
├── log.txt # Param for `launcher.sh`
├── main.pdf # Project report
├── Makefile
├── plot # Images used in the report
│ ├── _bandwidth.png
│ ├── D1 miss rate.png
│ ├── double_bandwidth.png
│ ├── double_time.png
│ ├── D references.png
│ ├── float_bandwidth.png
│ ├── float_time.png
│ ├── L1 miss rate.png
│ ├── LL miss rate.png
│ └── _time.png
├── README.md
├── src # My library code
│ ├── matrix.cc
│ ├── matrix.h
│ ├── utils.cc
│ └── utils.h
└── transpose.cc # Main file





