Skip to content

SimpLL: Equivalence slicer#182

Open
tmalecova wants to merge 3 commits intodiffkemp:masterfrom
tmalecova:code_slicing
Open

SimpLL: Equivalence slicer#182
tmalecova wants to merge 3 commits intodiffkemp:masterfrom
tmalecova:code_slicing

Conversation

@tmalecova
Copy link
Copy Markdown
Contributor

@tmalecova tmalecova commented Dec 9, 2020

This pull request contains an implementation of the static backward slicer. It is able to remove semantically equal instructions of compared functions that are not equal. The program slices of compared functions consist of different instructions and also recursively added operand of these differences.
PR also contains an implementation of new option --equivalence-slicer and 25 regression tests.

@tmalecova tmalecova marked this pull request as draft December 11, 2020 07:59
@tmalecova tmalecova force-pushed the code_slicing branch 6 times, most recently from 2d0bfb4 to 522d555 Compare April 24, 2021 19:49
@tmalecova tmalecova marked this pull request as ready for review April 29, 2021 16:29
@tmalecova tmalecova marked this pull request as draft April 29, 2021 16:32
@tmalecova tmalecova force-pushed the code_slicing branch 6 times, most recently from 67daa79 to a5188b4 Compare May 8, 2021 17:39
@tmalecova tmalecova marked this pull request as ready for review May 8, 2021 18:07
Copy link
Copy Markdown
Collaborator

@lenticularis39 lenticularis39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general idea of the algorithm is good, but there are a few things that could possibly be improved in the implementation.

The ControlFlowSlicer class contains a lot of duplicate code from VarDependencySlicer, some of which is not even used (like checkDependency). The code that is used by both VarDependencySlicer and EquivalenceSlicer should be separated into a single class and used by both (e.g. by modifying VarDependencySlicer to use ControlFlowSlicer), not just duplicated.

As of the slicing itself - did you consider implementing the synchronization part in DifferentialFunctionComparator (extending the comparison already implemented there), leaving only the slicing in EquivalenceSlicer? (Also see my comments in EquivalenceSlicer.cpp about the comparison.)

fComp->getSnMaps().first->erase(fComp->DifferingInstructions.first);
fComp->getSnMaps().second->erase(fComp->DifferingInstructions.second);

while (!QL.empty() && !QR.empty()) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to repeat the function comparison from the start here? Besides the walk order (DFS vs BFS) and comparing the basic blocks instruction by instruction (falling back to the more sophisticated cmpBasicBlocks only of a difference is found) I can't see much difference between this comparison and the one in DifferentialFunctionComparator. Resuming the comparison from where DifferentialFunctionComparator left (if it's possible) would improve performance by not having to compare the same code twice.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure here, but I think that we came to an agreement that we need to use a different approach to the comparison from what is in DifferentialFunctionComparator.

On the other hand, I agree that we should try to resume the comparison from where the DFC left, but I'm not sure if that's possible here. Will have to think about this more.

@viktormalik
Copy link
Copy Markdown
Collaborator

The ControlFlowSlicer class contains a lot of duplicate code from VarDependencySlicer, some of which is not even used (like checkDependency). The code that is used by both VarDependencySlicer and EquivalenceSlicer should be separated into a single class and used by both (e.g. by modifying VarDependencySlicer to use ControlFlowSlicer), not just duplicated.

I agree with this, move all the common code into ControlFlowSlicer (or whatever new name we use) and then make the two other slicers extend that class.

As of the slicing itself - did you consider implementing the synchronization part in DifferentialFunctionComparator (extending the comparison already implemented there), leaving only the slicing in EquivalenceSlicer? (Also see my comments in EquivalenceSlicer.cpp about the comparison.)

I'd rather have this separated. DifferentialFunctionComparator is already large enough, there's no need to extend it with new functionality.

Copy link
Copy Markdown
Collaborator

@viktormalik viktormalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started with a high-level review only.

Probably the main problem that I can see now is that the slicing algorithm is very long and complex, and therefore hard to understand. What would probably help here would be to split it into multiple, conveniently named functions.

Let's start with that (and with the current review from @lenticularis39) and then I'll do another round.

@tmalecova tmalecova force-pushed the code_slicing branch 2 times, most recently from 772c4fa to 6716e1b Compare September 26, 2021 21:26
Removes semantically equivalent parts of compared functions.
Checks each basic block, when the difference is found, it tries
to find subsequent synchronisation. Produces program slices by
keeping only instructions that are different. Recursively keeps
also their operands. There were necessary edits in DFC and MC.
Created new ControlFlowUtils that contains functions from
VarDependencySlicer for creating correct slices.
Added new option for enabling equivalence slicing.
Added 27 (RHEL7 and RHEL8) tests for checking correctness of
equivalence slicing. New directory sliced_functions/ contains
model slices of old and new version fo compared functions.
@tmalecova
Copy link
Copy Markdown
Contributor Author

I updated the PR according to your reviews. I am aware of failures in tests, and also of missing comments in the code. I will work on it, but during that, feel free to make another round of reviews.

Copy link
Copy Markdown
Collaborator

@viktormalik viktormalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay, but the PR is very large and difficult to review.

I listed a lot of stuff that could be improved, what I'd mainly like to see is even better structure of the equivalence slicing code - ideally separate the individual phases into functions.

I didn't review the tests, yet.

Also, what I don't like about this PR is the small number of commits. The first commit has ~3000 changes, it would be good if we were able to split it into multiple commits (as many as possible, while keeping the tests passing after each commit).

:param control_flow_only: Check only for control-flow differences.
:param verbosity: Verbosity level.
:param semdiff_tool: Tool to use for semantic diff
:param equivalence_slicer: Remove equivalent parts of functions.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:param equivalence_slicer: Remove equivalent parts of functions.
:param equivalence_slicer: Remove equivalent parts of the compared functions.


using namespace llvm;

class EquivalenceSlicer : public CFGSlicer {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to extend CFGSlicer here when the class contains CFL and CFR?

#include <llvm/Transforms/Utils/BasicBlockUtils.h>
#include <llvm/Transforms/Utils/Cloning.h>

void EquivalenceSlicer::slice(DifferentialFunctionComparator *fComp) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This algorithm is rather complicated, it would be nice to add a longer comment explaining its basic steps here.

Comment on lines +122 to +125
std::pair<DenseMap<const Value *, int> *, DenseMap<const Value *, int> *>
getSnMaps();
/// Returns actual size of synchronization maps
int getSizeOfMaps() const;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As was already mentioned in one of the previous reviews, these methods can be replaced by direct access to sn_map* since EquivalenceSlicer is declared as a friend class.

Comment on lines +48 to +49
friend class EquivalenceSlicer;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to the bottom of the class.

Comment on lines +214 to +227
IncludedInstrsTemp.insert(&*InstR);
IncludedBasicBlocksTemp.insert(BBR);
if (InstR->isTerminator()) {
// Keep finding synchronization
addSuccesors(BBR, &QR, &pushed_QRB);

if (!QR.empty()) {
// Get next right BB
BBR = getNextBB(&QR);
InstR = BBR->begin();
continue;
}
}
++InstR;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece of code is very similar to the one below. Can we perhaps make a function of it?

Comment on lines +329 to +335
if (DFC->maySkipInstruction(&Inst)) {
continue;
}
if (isDebugInfo(*&Inst)) {
continue;
}
CFG->addToIncluded(&Inst);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (DFC->maySkipInstruction(&Inst)) {
continue;
}
if (isDebugInfo(*&Inst)) {
continue;
}
CFG->addToIncluded(&Inst);
if (!DFC->maySkipInstruction(&Inst) && !isDebugInfo(*&Inst))
CFG->addToIncluded(&Inst);

Comment on lines +222 to +226
// Equivalence slicing is done only when it is given in options,
// it is enabled by parameter EqSlicerEnabled to prevent slicing of
// called functions during the slicing, the currently slicing
// function is the given one by opt, and of course the functions
// are not equal
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice comment but I'd suggest putting the conditions in a list:

Equivalence slicing is only done if:
- .... and
- .... and
- ....

}

/// Removes unneeded instructions and keep the control flow
void CFGSlicer::clearFunction(Function &Fun) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sliceFunction would be a better name, I believe


/// Determines which additional instructions we need to produce a valid CFG
/// Recursively adds all instruction operands to included
void CFGSlicer::addAdditionalInsts(Function &Fun) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a better name would be: addNecessaryInsts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants