New stable CODES version#239

Merged
helq merged 198 commits into master from develop
Jul 23, 2025
@helq helq commented Jul 23, 2025

This new version comes with:

helq and others added 30 commits December 6, 2022 18:02
After compiling, if ctest is enabled, one can simply run:

> cd build/
> ctest

And all tests will be run.

CTest runs the same tests as the autoconf version except for the
archived models (under `src/network-workloads/archived/`), which are not
compiled by CMake.
To test the changes, I've used the synthetic ping pong example, which
has been modified to allow for a more random pattern.

Printing the delay of each packet to the screen is a temporary change.

Previously, the latency (delay) of the packet was assumed to be the
latency of the last chunk to arrive at the destination terminal. This is
wrong. We must store the time at which the first chunk is sent and the
time at which the last chunk is received. This change paves the way to
implement a strategy to feed a predictor with latencies in the order in
which the packets were sent (not delivered).
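The corrected latency definition (last chunk received minus first chunk sent) can be illustrated with a minimal sketch. This is not the code from the commit: the struct `packet_timing` and every function name below are hypothetical, chosen only to show the bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-packet record; names are illustrative, not CODES'. */
struct packet_timing {
    double first_chunk_sent;    /* earliest send time seen so far */
    double last_chunk_received; /* latest arrival time seen so far */
    int chunks_expected;
    int chunks_received;
    bool started;               /* true once any chunk has been sent */
};

void packet_timing_init(struct packet_timing *p, int chunks_expected) {
    assert(chunks_expected > 0);
    p->first_chunk_sent = 0.0;
    p->last_chunk_received = 0.0;
    p->chunks_expected = chunks_expected;
    p->chunks_received = 0;
    p->started = false;
}

/* Record a chunk leaving the source; only the earliest time is kept. */
void on_chunk_sent(struct packet_timing *p, double now) {
    if (!p->started || now < p->first_chunk_sent) {
        p->first_chunk_sent = now;
        p->started = true;
    }
}

/* Record a chunk arriving; returns true once every chunk has arrived. */
bool on_chunk_received(struct packet_timing *p, double now) {
    if (p->chunks_received == 0 || now > p->last_chunk_received)
        p->last_chunk_received = now;
    p->chunks_received++;
    return p->chunks_received == p->chunks_expected;
}

/* Packet latency: last chunk received minus first chunk sent. */
double packet_latency(const struct packet_timing *p) {
    assert(p->started && p->chunks_received == p->chunks_expected);
    return p->last_chunk_received - p->first_chunk_sent;
}
```

Under these assumptions, a two-chunk packet whose chunks leave at times 10 and 12 and arrive at times 15 and 18 has a latency of 8, not the latency of the final chunk alone.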

`g_is_surrogate_on` turns the behaviour on or off. When it is off, the
simulation runs as usual. When it is on, packets are sent directly to
the destination terminal, skipping the network completely.
- This needs ROSS commit 178e3c0
- The director is a function that is called at GVT by ROSS
- Average latency predictor implemented
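As a rough illustration of the idea, an average latency predictor gated by an on/off switch might look like the sketch below. It is an assumption-laden toy, not the implementation in this PR: `avg_latency_predictor`, `delivery_delay`, and the `surrogate_on` parameter (standing in for a flag such as `g_is_surrogate_on`) are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical running-average predictor; not the CODES data structure. */
struct avg_latency_predictor {
    double sum;          /* sum of all observed latencies */
    unsigned long count; /* number of observations */
};

void predictor_init(struct avg_latency_predictor *p) {
    p->sum = 0.0;
    p->count = 0;
}

/* Feed one observed packet latency into the predictor. */
void predictor_feed(struct avg_latency_predictor *p, double latency) {
    assert(latency >= 0.0);
    p->sum += latency;
    p->count++;
}

/* Predict the mean latency seen so far, or `fallback` with no data. */
double predictor_predict(const struct avg_latency_predictor *p,
                         double fallback) {
    return p->count == 0 ? fallback : p->sum / (double)p->count;
}

/* Pick the delay for a packet: the simulated latency when the surrogate
 * is off, the predicted latency (bypassing the network) when it is on. */
double delivery_delay(bool surrogate_on,
                      const struct avg_latency_predictor *p,
                      double simulated_latency) {
    if (!surrogate_on)
        return simulated_latency;
    return predictor_predict(p, simulated_latency);
}
```

In this sketch the predictor would be fed while the surrogate is off and queried once it is switched on. When and how the real director makes that switch is not shown here.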

Now we can configure the surrogate via the config .conf file. All
workloads using dragonfly-dally have access to the surrogate now!

This requires the introduction of zombie events, which inform the
terminals of which packets to ignore while still simulating their
behaviour.

This is buggy. With some models this will work, with others it will
definitely not. The problem lies in model-net's complexity. Knowing
when to trigger the "next event" event depends on the state of
model-net and its future messages already in the queue (the most
important of which is the workload's new event).
Conflicts:
	src/networks/model-net/dragonfly-dally.C
helq added 29 commits June 16, 2025 16:21
…ke a different name

The idea of this change is to be able to have a configuration file like:

```
20 milc1 1 0
15 conceptual-jacobi3d-5 1 0
```

While the workload_json_files allow us to tell CODES where to look for
the json configuration files:

```
milc1 path-to/milc1.json
conceptual-jacobi3d-5 path-to/my-conceptual-jacobi3d.json
```
Replaced fscanf loop with fgets/sscanf to handle trailing newlines
consistently across systems (this bug was silently showing up in the
GHC200 system). Also added error reporting for malformed
lines.
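A minimal sketch of the fgets/sscanf pattern described above, applied to lines such as `20 milc1 1 0`. This is not the code from the commit: the struct, the function names, and the field names (the meaning of each column is an assumption) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 128
#define MAX_LINE 512

/* Hypothetical entry; what each column means is an assumption. */
struct workload_entry {
    int num_ranks;       /* first column (assumed: number of ranks) */
    char name[MAX_NAME]; /* second column: workload name */
    int field3;          /* third column, meaning not assumed here */
    int field4;          /* fourth column, meaning not assumed here */
};

/* Parse one line. Returns 1 on success, 0 for a blank line,
 * and -1 if the line is malformed. */
int parse_workload_line(const char *line, struct workload_entry *out) {
    char extra;
    int n;
    /* A whitespace-only line (e.g. a trailing newline) is not an error. */
    if (line[strspn(line, " \t\r\n")] == '\0')
        return 0;
    /* The trailing " %c" only matches if unexpected text follows. */
    n = sscanf(line, "%d %127s %d %d %c", &out->num_ranks, out->name,
               &out->field3, &out->field4, &extra);
    return n == 4 ? 1 : -1;
}

/* Read up to max_entries entries from f. Returns the number parsed,
 * or -1 after reporting the first malformed line on stderr. */
int read_workload_file(FILE *f, struct workload_entry *entries,
                       int max_entries) {
    char line[MAX_LINE];
    int count = 0, lineno = 0;
    assert(f != NULL && entries != NULL);
    while (count < max_entries && fgets(line, sizeof line, f) != NULL) {
        int rc;
        lineno++;
        rc = parse_workload_line(line, &entries[count]);
        if (rc < 0) {
            fprintf(stderr, "malformed workload line %d: %s", lineno, line);
            return -1;
        }
        count += rc; /* blank lines add 0 */
    }
    return count;
}
```

Reading whole lines first means a trailing newline or blank line is handled the same way on every platform, and a line with too few or too many fields is reported instead of being silently skipped.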

btw, this code was written by Claude and audited by me ;)
This merge brings three major changes:
- The hardening of the reverse handlers and thus the removal of all
  non-determinism
- The full implementation of an application director for mpi-replay, so
  that simulations can be accelerated
- The connection of the old network surrogate to the application
  surrogate
Autoconf is now far too outdated and keeping it in sync with the
changes made in the CMakefile
@helq helq merged commit 4a055c2 into master Jul 23, 2025