Add definitions of macros to `diffkemp-out.yaml` by PLukas2018 · Pull Request #347 · diffkemp/diffkemp

PLukas2018 · 2024-06-21T10:37:08Z

This PR contains the last but one part of #314 (remains enhancements of viewer for visualisation of code/diff for macro-function/function-macro differences.
The PR adds extracted information about the location of macro definitions to the diffkemp-out.yaml file which is created when diffkemp compare is run. Thanks to the added macro definitions (and thanks to previously merged PRs) it will be now possible to see the code of the macros in the result viewer.

PLukas2018 · 2024-06-25T10:23:06Z

diffkemp/output.py

        self.function_names = set()
-        self.macro_names = set()
        # Note: type_names contains tuples (type name, parent name)
+        #   parent name - name of function in which type is used


Relevant comment from #314:

Could be a macro, right?

Suggested change

# parent name - name of function in which type is used

# parent name - name of function/macro in which type is used

I think, that currently, we are reporting type differences only from functions. Even if the type was originally used in a macro, the call stacks report type differences from the function in which the macro with the differing type was used. So the parent name (meaning the name of the previous call from the call stack) for type/struct should always be a function.

FrNecas

Looks good as far as I can tell, just a minor nitpick (obviously I may be missing some context of the previous PRs related to macros)

FrNecas · 2024-06-26T07:37:19Z

diffkemp/output.py

+        if called_res.first.callstack:
+            macro_defs, diff_macro_info = called_res.first.callstack \
+                .get_macro_defs(compared_fun)
+            self._add_definitions(macro_defs, "old")


Hm, I am not a big fan of the "old", "new", "kind" string literal hard-coding. It would be nice if we could use a string enum for this instead but I am not sure which python versions diffkemp aims to support (we are still supporting something old, aren't we?). If that is the case, feel free to ignore this.

Thanks for the review. I replaced the old/new with boolean parameter is_old. For the kind I used enum which inherits from str, Enum (this should be available from Python 3.4), I could use StrEnum but it is supported from Python 3.11 and as you said I am also not sure which Python versions we want to support.

Add a new class (`MacroDefinitions`) for extracting definitions of macros so they can be saved into `diffkemp-out.yaml` file. When processing call stacks definitions of macros (all except the differing macro) are extracted from them. Also, the name of the differing macro is extracted from the call stacks and the name of the function/vertex (which should contain detailed information about the macro). Using this information the definition of differing macro is then extracted from the `SyntaxDiff` object saved in vertex of ComparisonGraph. `_create_def_info` method of the `YamlOutput` class was changed to function so it can also be used from the new class (`MacroDefinitions`). Unit tests for the `YamlOutput` class are updated with testing/checking definitions of macros. Because the test fixture in `conftest.py` file does not contain an absolute path to files for symbols it was necessary to mock some `os.path` functions. Also by adding the definitions to the `diffkemp-out.yaml` it is now possible to see the code of the macros in the result viewer.

Replacing hard-coded strings for representing kind of symbols with string enum. Note: In case we will be supporting pyhon 3.11+ we could use `StrEnum` instead of `str, Enum`.

viktormalik

TBH, I'm quite confused by the new code and it seems over-complicated to me at this point. I'm not saying that it's not the best way, it's just that I'm failing to see why that's the case.

viktormalik · 2024-06-29T14:47:34Z

diffkemp/output.py

+        self.new_dir = snapshot_dir_new
+        self.result = result
+        # Information from call stacks for extracting defs for differing macros
+        # - list of tuples (vertex name, differing/last macro name).


Two remarks:

Is this a list or a set?

What does the vertex name correspond to? Maybe it's worth specifying.

viktormalik · 2024-06-29T21:36:54Z

diffkemp/output.py

        self.function_names = set()
-        self.macro_names = set()
        # Note: type_names contains tuples (type name, parent name)
+        #   parent name - name of function in which type is used


Maybe we could call it caller name or caller function name? We probably wouldn't even need the additional explanation comment, then.

viktormalik · 2024-06-29T21:42:39Z

diffkemp/output.py

                    if called_res.second.callstack else []
                # updating symbol names
                if called_res.first.callstack:
+                    # Note: Macro names are not currently used.


Should we drop the field, then? Also an explanation of why the macros are handled differently from functions and types would be very helpful.

Also an explanation of why the macros are handled differently from functions and types would be very helpful.

So for functions, the location of the definition (file and line where the definition is located) is saved directly in Vertex class/object (class representing cached results for a function) in ComparisonGraph.

For types the location of the definition is cached directly in TypeDiff class/object which is accessible from Vertex class/object (from the function which contains/uses the differing type).

In the case of the macros, the location of definitions is not cached directly but it is possible to extract them from macros callstack in SyntaxDiff class which is accessible from Vertex class (function which uses macros with differences) or to extract them from the final created callstack (which I currently do). For the differing macro is the location of the definition not possible to extract from callstack but it is cached in SyntaxDiff (PR Caching information about definitions of differing macros #333).

viktormalik · 2024-06-30T15:27:50Z

diffkemp/output.py

+        return self.value
+
+
+class MacroDefinitions:


Why is macro collection done using a class while function and types are handled directly by methods of YamlOutput. Are macro somewhat special that the collection must keep some state?

Why is macro collection done using a class while function and types are handled directly by methods of YamlOutput. Are macro somewhat special that the collection must keep some state?

The macro collection is done in a separate class mostly because the extraction of the definition is a little bit more complex so I created a class for it so the logic is encapsulated, also it was done in reaction to this comment on the original PR #314 (comment)

viktormalik · 2024-06-30T15:32:40Z

tests/unit_tests/result_test.py

-                      result).output
+    # Mocking path.join for the same reason as relpath.
+    # Note: This mock can be problem if diffkemp uses >2 paths in join.
+    def mock_path_join(path1, *path2):


Mocking functions from os.path sounds very dangerous. Are we not able to come up with a better solution?

PLukas2018 · 2024-07-10T09:42:11Z

I'm quite confused by the new code and it seems over-complicated to me at this point. I'm not saying that it's not the best way, it's just that I'm failing to see why that's the case.

Yeah, it is probably over-complicated and probably also not very effective. There are probably better solutions, I did it like this to leave most of the logic the same and just at the end extract the necessary information (location of definitions) from places which contains them. Another ideas of how to do this which comes to my mind:

Use cscope/ctags or something similar to extract the definition. I think, the problem with this would be that the symbols can have multiple definitions which are conditionally 'selected' during compilation. The correct definition could probably be deduced from the information saved in callstacks (using the place at which the symbol is called/used --> the definition must start somewhere above), but there would be still a problem in deducing the correct location of the definition for differing symbols. On the other hand, if we would want in the future to add to the viewer code navigation, we would still need to use something like ctags/cscope...
Very similar solution to the current one -- add to SyntaxDiff class get_definitions method which would extract the definitions from the callstack and which would add the definitions for differing symbols. Then the YamlOutput class would need just to extract from the resulting callstacks names of differing macros and the name of function/vertex that uses the macros (similar to what is done for types). And the location of definitions would be extracted using the get_definitions method on SyntaxDiff class which would be found using the previously mentioned information. This solution would not be probably too much time effective too.
a) Accumulate the definitions already in the SimpLL library and serialize them to output and then cache them in the ComparisonGraph.
b) Caching the definitions in ComparisonGraph when deserializing results from SimpLL library.
Save the definitions to Result class when the functions are compared in function_diff function. In YamlOutput just save the definitions from the Result class.
...

Not sure if all the ideas are implementable, just some options that came to my mind. I will need to more deeply think about this because I am not sure which approach to choose.

viktormalik · 2024-07-10T11:51:35Z

Yeah, it seems that there are many ways to do this. From the first glance, option 3 seems like the best but I think that we should have an in-person discussion on this to better understand what the problem is and what the best solution would be.

PLukas2018 commented Jun 25, 2024

View reviewed changes

FrNecas reviewed Jun 26, 2024

View reviewed changes

PLukas2018 added 2 commits June 26, 2024 10:59

Using Enum for kind of symbol in output.py

e45732f

Replacing hard-coded strings for representing kind of symbols with string enum. Note: In case we will be supporting pyhon 3.11+ we could use `StrEnum` instead of `str, Enum`.

PLukas2018 force-pushed the macro-definitions branch from d34abbb to e45732f Compare June 26, 2024 10:49

PLukas2018 requested a review from FrNecas June 29, 2024 17:41

viktormalik reviewed Jun 30, 2024

View reviewed changes

PLukas2018 marked this pull request as draft July 11, 2024 13:53

PLukas2018 mentioned this pull request Aug 5, 2024

Release 0.6.1 #368

Merged

PLukas2018 self-assigned this Sep 24, 2025

	# parent name - name of function in which type is used
	# parent name - name of function/macro in which type is used

Conversation

PLukas2018 commented Jun 21, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

FrNecas left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

viktormalik left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

PLukas2018 commented Jul 10, 2024

Uh oh!

viktormalik commented Jul 10, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants