Assignment instructions

Implement a simple algorithm that transposes a non-symmetric matrix of size $N \times N$.

  • The algorithm takes the dimension of the matrix as input, given as the exponent of a power of 2.
  • For example, ./transpose 10 transposes a $2^{10} \times 2^{10}$ matrix.
  • Measure the effective bandwidth of your implementation using the -O0, -O1, -O2 and -O3 options. Analyze the cache behavior.
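The bullets above can be sketched as follows. This is only an illustration: the helper names `matrix_dim_from_exponent` and `effective_bandwidth_gbs` are hypothetical and not taken from this repo, and effective bandwidth is assumed to count one read and one write per element, i.e. $2 N^2 \cdot \text{sizeof}(T) / t$.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: turn the command-line exponent k into N = 2^k.
std::size_t matrix_dim_from_exponent(const char* arg) {
    const long k = std::strtol(arg, nullptr, 10);
    assert(k >= 0 && k < 32);  // keeps N * N representable in a 64-bit size_t
    return std::size_t{1} << k;
}

// Effective bandwidth in GB/s, assuming every element is read once and
// written once (2 * N * N * elem_size bytes moved in total).
double effective_bandwidth_gbs(std::size_t n, std::size_t elem_size, double seconds) {
    assert(seconds > 0.0);
    const double bytes = 2.0 * static_cast<double>(n) * static_cast<double>(n) *
                         static_cast<double>(elem_size);
    return bytes / seconds / 1e9;
}
```

For instance, `matrix_dim_from_exponent("10")` gives 1024, matching the `./transpose 10` example.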

Runs

To reduce measurement errors and OS noise, multiple executions were performed so that the total execution time was approximately 1.5 s.
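One possible way to implement such repeated runs is sketched below. It is an assumption about the approach, not the code used in this repo: `time_repeated` and `TimingResult` are hypothetical names, and the kernel is simply re-executed until the accumulated wall-clock time reaches the target.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

struct TimingResult {
    std::size_t runs;        // number of executions performed
    double seconds_per_run;  // mean wall-clock time of one execution
};

// Repeat `kernel` until at least `min_total_seconds` have elapsed,
// then report the mean time per execution.
template <typename Fn>
TimingResult time_repeated(Fn&& kernel, double min_total_seconds = 1.5) {
    assert(min_total_seconds > 0.0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    std::size_t runs = 0;
    double elapsed = 0.0;
    const auto start = clock::now();
    do {
        kernel();
        ++runs;
        elapsed = std::chrono::duration<double>(clock::now() - start).count();
    } while (elapsed < min_total_seconds);
    return {runs, elapsed / static_cast<double>(runs)};
}
```

The mean time per run can then be used as the denominator of the effective bandwidth. Small matrices get many repetitions and large ones only a few, so the total time stays roughly constant.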

Results

In this assignment we study how the system behaves with different matrix transpose implementations. Both a simple implementation (sequential, with nested loops) and a block implementation are measured and plotted to show how cache misses depend on the way the data is accessed.
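The two variants can be sketched as follows. This is a minimal illustration, not the code in `src/matrix.cc`: it assumes an out-of-place transpose of a row-major matrix stored in a `std::vector`, and the function names and the default block size of 32 are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simple version: two nested loops over a row-major N x N matrix.
template <typename T>
void transpose_simple(const std::vector<T>& in, std::vector<T>& out, std::size_t n) {
    assert(in.size() == n * n && out.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            out[j * n + i] = in[i * n + j];  // contiguous reads, strided writes
}

// Block version: process the matrix in block x block tiles so that the
// lines touched by the strided writes are reused before being evicted.
template <typename T>
void transpose_block(const std::vector<T>& in, std::vector<T>& out,
                     std::size_t n, std::size_t block = 32) {
    assert(in.size() == n * n && out.size() == n * n && block > 0);
    for (std::size_t ii = 0; ii < n; ii += block) {
        for (std::size_t jj = 0; jj < n; jj += block) {
            const std::size_t i_end = std::min(ii + block, n);  // clamp edge tiles
            const std::size_t j_end = std::min(jj + block, n);
            for (std::size_t i = ii; i < i_end; ++i)
                for (std::size_t j = jj; j < j_end; ++j)
                    out[j * n + i] = in[i * n + j];
        }
    }
}
```

Both functions produce the same result. They differ only in the order in which elements are visited, which is what changes the cache miss counts.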

TL;DR: the simple method works better when $N \le 10^6$, while the block method works better when $N > 10^6$. Prefetching may improve performance when the matrix is big (see the D1 miss rate: most of the misses are write misses).

More info in report.pdf.


Plots (images in the plot directory): Effective Bandwidth, Execution time, L1 miss rate, D1 miss rate, LL miss rate, D references.

System info

Processor Specs
Model: AMD Ryzen 5 5600X
Architecture: x86
Clock Speeds: 3.7 GHz base, 4.6 GHz boost
Cache Levels: L1 384 KB, L2 3 MB, L3 32 MB (totals; per core: 32 KB L1 data + 32 KB L1 instruction, 512 KB L2; L3 is shared)
Cores, Threads: 6, 12
Memory Specs
Type: DDR4
Size: 16 GB
Speed: 3200 MHz
Memory Channels: Dual Channel
DOCP/AMP/XMP: DOCP 3200 MHz

Dir structure

├── launcher.sh                 # [SLOW] Automatic experiment launcher; do not use it in this repo, since the plotting function is not included
├── log.txt                     # Parameters for `launcher.sh`
├── main.pdf                    # Project report
├── Makefile                    
├── plot                        # Images used in the report
│   ├── _bandwidth.png
│   ├── D1 miss rate.png
│   ├── double_bandwidth.png
│   ├── double_time.png
│   ├── D references.png
│   ├── float_bandwidth.png
│   ├── float_time.png
│   ├── L1 miss rate.png
│   ├── LL miss rate.png
│   └── _time.png
├── README.md
├── src                         # My library code
│   ├── matrix.cc
│   ├── matrix.h
│   ├── utils.cc
│   └── utils.h
└── transpose.cc                # Main file