Skip to content

Add definitions of macros to diffkemp-out.yaml#347

Draft
PLukas2018 wants to merge 2 commits intodiffkemp:masterfrom
PLukas2018:macro-definitions
Draft

Add definitions of macros to diffkemp-out.yaml#347
PLukas2018 wants to merge 2 commits intodiffkemp:masterfrom
PLukas2018:macro-definitions

Conversation

@PLukas2018
Copy link
Copy Markdown
Collaborator

This PR contains the last but one part of #314 (remains enhancements of viewer for visualisation of code/diff for macro-function/function-macro differences.
The PR adds extracted information about the location of macro definitions to the diffkemp-out.yaml file which is created when diffkemp compare is run. Thanks to the added macro definitions (and thanks to previously merged PRs) it will be now possible to see the code of the macros in the result viewer.

self.function_names = set()
self.macro_names = set()
# Note: type_names contains tuples (type name, parent name)
# parent name - name of function in which type is used
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relevant comment from #314:

Could be a macro, right?

Suggested change
# parent name - name of function in which type is used
# parent name - name of function/macro in which type is used

I think, that currently, we are reporting type differences only from functions. Even if the type was originally used in a macro, the call stacks report type differences from the function in which the macro with the differing type was used. So the parent name (meaning the name of the previous call from the call stack) for type/struct should always be a function.

Copy link
Copy Markdown
Collaborator

@FrNecas FrNecas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as far as I can tell, just a minor nitpick (obviously I may be missing some context of the previous PRs related to macros)

if called_res.first.callstack:
macro_defs, diff_macro_info = called_res.first.callstack \
.get_macro_defs(compared_fun)
self._add_definitions(macro_defs, "old")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I am not a big fan of the "old", "new", "kind" string literal hard-coding. It would be nice if we could use a string enum for this instead but I am not sure which python versions diffkemp aims to support (we are still supporting something old, aren't we?). If that is the case, feel free to ignore this.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I replaced the old/new with boolean parameter is_old. For the kind I used enum which inherits from str, Enum (this should be available from Python 3.4), I could use StrEnum but it is supported from Python 3.11 and as you said I am also not sure which Python versions we want to support.

Add a new class (`MacroDefinitions`) for extracting definitions
of macros so they can be saved into `diffkemp-out.yaml` file.

When processing call stacks definitions of macros (all except
the differing macro) are extracted from them. Also, the name of
the differing macro is extracted from the call stacks and the name
of the function/vertex (which should contain detailed information
about the macro). Using this information the definition of differing
macro is then extracted from the `SyntaxDiff` object saved in vertex
of ComparisonGraph.

`_create_def_info` method of the `YamlOutput` class was changed to
function so it can also be used from the new class (`MacroDefinitions`).

Unit tests for the `YamlOutput` class are updated with testing/checking
definitions of macros. Because the test fixture in `conftest.py` file
does not contain an absolute path to files for symbols it was necessary
to mock some `os.path` functions.

Also by adding the definitions to the `diffkemp-out.yaml` it is
now possible to see the code of the macros in the result viewer.
Replacing hard-coded strings for representing kind of symbols
with string enum.

Note: In case we will be supporting pyhon 3.11+ we could use
`StrEnum` instead of `str, Enum`.
@PLukas2018 PLukas2018 requested a review from FrNecas June 29, 2024 17:41
Copy link
Copy Markdown
Collaborator

@viktormalik viktormalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I'm quite confused by the new code and it seems over-complicated to me at this point. I'm not saying that it's not the best way, it's just that I'm failing to see why that's the case.

self.new_dir = snapshot_dir_new
self.result = result
# Information from call stacks for extracting defs for differing macros
# - list of tuples (vertex name, differing/last macro name).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two remarks:

  • Is this a list or a set?
  • What does the vertex name correspond to? Maybe it's worth specifying.

self.function_names = set()
self.macro_names = set()
# Note: type_names contains tuples (type name, parent name)
# parent name - name of function in which type is used
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could call it caller name or caller function name? We probably wouldn't even need the additional explanation comment, then.

if called_res.second.callstack else []
# updating symbol names
if called_res.first.callstack:
# Note: Macro names are not currently used.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we drop the field, then? Also an explanation of why the macros are handled differently from functions and types would be very helpful.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also an explanation of why the macros are handled differently from functions and types would be very helpful.

  • So for functions, the location of the definition (file and line where the definition is located) is saved directly in Vertex class/object (class representing cached results for a function) in ComparisonGraph.
  • For types the location of the definition is cached directly in TypeDiff class/object which is accessible from Vertex class/object (from the function which contains/uses the differing type).
  • In the case of the macros, the location of definitions is not cached directly but it is possible to extract them from macros callstack in SyntaxDiff class which is accessible from Vertex class (function which uses macros with differences) or to extract them from the final created callstack (which I currently do). For the differing macro is the location of the definition not possible to extract from callstack but it is cached in SyntaxDiff (PR Caching information about definitions of differing macros #333).

return self.value


class MacroDefinitions:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is macro collection done using a class while function and types are handled directly by methods of YamlOutput. Are macro somewhat special that the collection must keep some state?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is macro collection done using a class while function and types are handled directly by methods of YamlOutput. Are macro somewhat special that the collection must keep some state?

The macro collection is done in a separate class mostly because the extraction of the definition is a little bit more complex so I created a class for it so the logic is encapsulated, also it was done in reaction to this comment on the original PR #314 (comment)

result).output
# Mocking path.join for the same reason as relpath.
# Note: This mock can be problem if diffkemp uses >2 paths in join.
def mock_path_join(path1, *path2):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mocking functions from os.path sounds very dangerous. Are we not able to come up with a better solution?

@PLukas2018
Copy link
Copy Markdown
Collaborator Author

I'm quite confused by the new code and it seems over-complicated to me at this point. I'm not saying that it's not the best way, it's just that I'm failing to see why that's the case.

Yeah, it is probably over-complicated and probably also not very effective. There are probably better solutions, I did it like this to leave most of the logic the same and just at the end extract the necessary information (location of definitions) from places which contains them. Another ideas of how to do this which comes to my mind:

  1. Use cscope/ctags or something similar to extract the definition. I think, the problem with this would be that the symbols can have multiple definitions which are conditionally 'selected' during compilation. The correct definition could probably be deduced from the information saved in callstacks (using the place at which the symbol is called/used --> the definition must start somewhere above), but there would be still a problem in deducing the correct location of the definition for differing symbols. On the other hand, if we would want in the future to add to the viewer code navigation, we would still need to use something like ctags/cscope...
  2. Very similar solution to the current one -- add to SyntaxDiff class get_definitions method which would extract the definitions from the callstack and which would add the definitions for differing symbols. Then the YamlOutput class would need just to extract from the resulting callstacks names of differing macros and the name of function/vertex that uses the macros (similar to what is done for types). And the location of definitions would be extracted using the get_definitions method on SyntaxDiff class which would be found using the previously mentioned information. This solution would not be probably too much time effective too.
  3. a) Accumulate the definitions already in the SimpLL library and serialize them to output and then cache them in the ComparisonGraph.
    b) Caching the definitions in ComparisonGraph when deserializing results from SimpLL library.
  4. Save the definitions to Result class when the functions are compared in function_diff function. In YamlOutput just save the definitions from the Result class.
  5. ...

Not sure if all the ideas are implementable, just some options that came to my mind. I will need to more deeply think about this because I am not sure which approach to choose.

@viktormalik
Copy link
Copy Markdown
Collaborator

Yeah, it seems that there are many ways to do this. From the first glance, option 3 seems like the best but I think that we should have an in-person discussion on this to better understand what the problem is and what the best solution would be.

@PLukas2018 PLukas2018 marked this pull request as draft July 11, 2024 13:53
@PLukas2018 PLukas2018 mentioned this pull request Aug 5, 2024
@PLukas2018 PLukas2018 self-assigned this Sep 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants