A1

Assignment instructions

Implement a simple algorithm that transposes a non-symmetric matrix of size NxN.

The program takes the dimension of the matrix as input, given as a power-of-2 exponent.
For example, ./transpose 10 transposes a 2^10 x 2^10 matrix.
Measure the effective bandwidth of your implementation when compiling with the -O0, -O1, -O2, and -O3 options. Analyze the cache behavior.
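As a rough illustration of the task, the following is a minimal sketch of a nested-loop transpose. It assumes a row-major matrix stored in a `std::vector` and an out-of-place transpose; the names `transpose_naive` and `dim_from_exponent` are hypothetical and are not taken from `src/matrix.cc`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: map the command-line exponent to the matrix dimension,
// e.g. 10 -> 2^10 = 1024.
inline std::size_t dim_from_exponent(unsigned exp) {
    return std::size_t{1} << exp;
}

// Hypothetical helper: naive out-of-place transpose of a row-major n x n matrix.
// Reads walk `in` with unit stride; writes walk `out` with stride n.
template <typename T>
void transpose_naive(const std::vector<T>& in, std::vector<T>& out, std::size_t n) {
    assert(in.size() == n * n && out.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            out[j * n + i] = in[i * n + j];
}
```

With this sketch, `./transpose 10` would correspond to `n = dim_from_exponent(10)` followed by one call to `transpose_naive`.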

Runs

To reduce measurement error and OS noise, each configuration was executed multiple times so that the total execution time was $\sim 1.5$ s.
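One way to obtain such repeated measurements is sketched below. It is an assumption, not the repository's timing code: the workload is repeated until a time budget is reached, and the mean time per run is reported. The effective bandwidth is assumed here to count one read and one write per element, i.e. $2 N^2 \cdot \text{sizeof}(T) / t$; the names `time_repeated` and `effective_bandwidth_gbs` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

struct Timing {
    std::size_t runs;        // number of repetitions executed
    double seconds_per_run;  // mean wall-clock time of one repetition
};

// Repeat `work` until at least `min_total_seconds` have elapsed in total.
template <typename F>
Timing time_repeated(F&& work, double min_total_seconds = 1.5) {
    assert(min_total_seconds > 0.0);
    using clock = std::chrono::steady_clock;
    std::size_t runs = 0;
    double total = 0.0;
    const auto start = clock::now();
    do {
        work();
        ++runs;
        total = std::chrono::duration<double>(clock::now() - start).count();
    } while (total < min_total_seconds);
    return {runs, total / static_cast<double>(runs)};
}

// Assumed definition: every element is read once and written once.
inline double effective_bandwidth_gbs(std::size_t n, std::size_t elem_size,
                                      double seconds) {
    const double bytes = 2.0 * static_cast<double>(n) * static_cast<double>(n) *
                         static_cast<double>(elem_size);
    return bytes / seconds / 1e9;
}
```

Averaging over many runs smooths out scheduler noise for small matrices, where a single transpose takes far less than the timer can resolve reliably.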

Results

In this assignment we study how the system behaves with different matrix transpose implementations. Both a simple implementation (sequential, with nested loops) and a block implementation are measured and plotted to show how cache misses depend on the data access pattern.
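For comparison with the nested-loop version, a block implementation could look roughly like the sketch below. It assumes a row-major, out-of-place transpose; the function name `transpose_blocked` and the default tile size of 32 are illustrative assumptions, not values from the report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical blocked transpose: the matrix is processed in block x block
// tiles so that the strided writes of one tile stay within a small set of
// cache lines before moving on.
template <typename T>
void transpose_blocked(const std::vector<T>& in, std::vector<T>& out,
                       std::size_t n, std::size_t block = 32) {
    assert(block > 0);
    assert(in.size() == n * n && out.size() == n * n);
    for (std::size_t ii = 0; ii < n; ii += block) {
        for (std::size_t jj = 0; jj < n; jj += block) {
            // Clamp the tile at the matrix edge when n is not a multiple of block.
            const std::size_t i_end = std::min(ii + block, n);
            const std::size_t j_end = std::min(jj + block, n);
            for (std::size_t i = ii; i < i_end; ++i)
                for (std::size_t j = jj; j < j_end; ++j)
                    out[j * n + i] = in[i * n + j];
        }
    }
}
```

The tile size is a tuning parameter: it would normally be chosen so that one source tile and one destination tile fit in the L1 data cache together.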

TL;DR: the simple method works better when $N \le 10^6$, while the block method works better when $N > 10^6$. Prefetching may improve performance when the matrix is large (see the D1 miss rate: most of the misses are write misses).
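Since most misses are reported as write misses, one possible way to experiment with prefetching is to hint the destination address a few iterations ahead. The sketch below is only an assumption about how this could be tried: it relies on the GCC/Clang builtin `__builtin_prefetch`, and the names `prefetch_for_write` and `transpose_prefetch` as well as the distance `ahead = 8` are hypothetical. Whether it helps has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hint that `addr` will be written soon. Does nothing on compilers that lack
// the GCC/Clang builtin.
inline void prefetch_for_write(const void* addr) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(addr, 1 /* prepare for write */, 1 /* low temporal locality */);
#else
    (void)addr;
#endif
}

// Nested-loop transpose that prefetches the destination element that will be
// written `ahead` iterations later in the inner loop.
template <typename T>
void transpose_prefetch(const std::vector<T>& in, std::vector<T>& out,
                        std::size_t n, std::size_t ahead = 8) {
    assert(in.size() == n * n && out.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (j + ahead < n) {
                prefetch_for_write(&out[(j + ahead) * n + i]);
            }
            out[j * n + i] = in[i * n + j];
        }
    }
}
```

The prefetch is only a hint, so the result is identical to the plain nested-loop version; only the timing may change.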

More info in report.pdf.

Plots (images in plot/): Effective bandwidth, Execution time, L1 miss rate, D1 miss rate, LL miss rate, D references.

System info

Processor Specs
Model: AMD Ryzen 5 5600X
Architecture: x86
Clock Speeds: 3.7 GHz base, 4.6 GHz boost
Cache Levels: L1 384 KB, L2 3 MB, L3 32 MB
Cores, Threads: 6, 12

Memory Specs
Type: DDR4
Size: 16 GB
Speed: 3200 MHz
Memory Channels: Dual Channel
DOCP/AMP/XMP: DOCP 3200 MHz

Dir structure

├── launcher.sh                 # [SLOW] Automatic experiment launcher; do not use it in this repo, which has no plotting function
├── log.txt                     # Parameters for `launcher.sh`
├── report.pdf                  # Project report
├── Makefile                    
├── plot                        # Images used in the report
│   ├── _bandwidth.png
│   ├── D1 miss rate.png
│   ├── double_bandwidth.png
│   ├── double_time.png
│   ├── D references.png
│   ├── float_bandwidth.png
│   ├── float_time.png
│   ├── L1 miss rate.png
│   ├── LL miss rate.png
│   └── _time.png
├── README.md
├── src                         # My library code
│   ├── matrix.cc
│   ├── matrix.h
│   ├── utils.cc
│   └── utils.h
└── transpose.cc                # Main file
