Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix parse_cpp_datatype not stripping volatile tag #510

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Henri-J-Norden
Copy link

No description provided.

@sevaa
Copy link
Contributor

sevaa commented Sep 27, 2023

Makes sense. The test corpus was limited.

@Henri-J-Norden
Copy link
Author

Henri-J-Norden commented Sep 27, 2023

I forgot to also add it to cpp_symbols dict for the PR, but I've now noticed that actually places volatile after the variable type... I believe this is technically valid for simple types (I'm using it for C code), but a larger change is probably needed to properly support all cases. From skimming cv (const and volatile) type qualifiers, it seems volatile is also valid wherever const is valid - and there is some special handling of const in TypeDesc.str, which should probably be expanded to deal with either/both. Maybe I should just make an issue instead?

This places the volatile keyword after the variable name - special handling in TypeDesc.__str__ (like it already has for const) may be necessary for complex types.
@sevaa
Copy link
Contributor

sevaa commented Sep 28, 2023

I'll have to do some language layering here. Is "pointer to volatile" a thing?

EDIT: it is. int volatile *p is a valid declaration.

@eliben
Copy link
Owner

eliben commented Oct 16, 2023

@Henri-J-Norden do you have small reproducers that demonstrate this? The ideal would be a simple C file we could compile and observe this on (using pyelftools)

@Henri-J-Norden
Copy link
Author

@eliben Here's code with some (not necessarily reasonable, but certainly compileable) examples, and the elf binary compiled with gcc -O0 -g3 using GCC 11.4.0

const int c;
const int* c_p;
int* const p_c;

// const->array->const->int
const int c_2[2];
// array->pointer->const->int
const int* c_p_2[2];
// const->array->const->pointer->int
int* const p_c_2[2];
// const->array->const->pointer->const->int
const int* const c_p_c_2[2];

// volatile->int
volatile int v;
// const->volatile->int
const volatile int cv;
volatile const int vc;
// pointer->const->volatile->int
const volatile int *cv_p;
volatile const int *vc_p;
// volatile->const->pointer->int
int *const volatile p_cv;
int *volatile const p_vc;
// volatile->const->pointer->const->volatile->int
const volatile int *const volatile cv_p_cv;
const volatile int *volatile const cv_p_vc;
volatile const int *const volatile vc_p_cv;
volatile const int *volatile const vc_p_vc;

// volatile->const->pointer->volatile->const->pointer->const->volatile->int
volatile const int *volatile const *volatile const vc_p_vc_p_vc;

// volatile->const->array->const->volatile->int
volatile const int vc_2[2];
// volatile->const->array->volatile->const->pointer->const->volatile->int
volatile const int *const volatile vc_p_cv_2[2];

int main(void)
{
	return 0;
}

volatile.zip

I added a script for debugging the types in https://github.com/Henri-J-Norden/pyelftools/blob/7fd850b21efaf999654e6880d63f8aff1bfa1258/examples/dwarf_var_types.py
It seems const int c_2[2] also gets incorrectly parsed as const const int[2]. Here's the full output:

examples\dwarf_var_types.py --test ../../a.out 
Processing file: ../../a.out
  Found a compile unit at offset 0, length 721
    Top DIE with tag=DW_TAG_compile_unit
    name=/mnt/c/debug/volatile.c
      > name=b'c' type='const int'
        DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
          DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'c_p' type='const int *'
        DIE tag=DW_TAG_pointer_type type='const int *' modifiers=('const', 'pointer')
          DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
            DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'p_c' type='int *const'
        DIE tag=DW_TAG_const_type type='int *const' modifiers=('pointer', 'const')
          DIE tag=DW_TAG_pointer_type type='int *' modifiers=('pointer',)
            DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'c_2' type='const const int[2]'
        DIE tag=DW_TAG_const_type type='const const int[2]' modifiers=('const',)
          DIE tag=DW_TAG_array_type type='const int[2]' modifiers=()
            DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
              DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'c_p_2' type='const int *[2]'
        DIE tag=DW_TAG_array_type type='const int *[2]' modifiers=()
          DIE tag=DW_TAG_pointer_type type='const int *' modifiers=('const', 'pointer')
            DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
              DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'p_c_2' type='const int *const[2]'
        DIE tag=DW_TAG_const_type type='const int *const[2]' modifiers=('const',)
          DIE tag=DW_TAG_array_type type='int *const[2]' modifiers=()
            DIE tag=DW_TAG_const_type type='int *const' modifiers=('pointer', 'const')
              DIE tag=DW_TAG_pointer_type type='int *' modifiers=('pointer',)
                DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'c_p_c_2' type='const const int *const[2]'
        DIE tag=DW_TAG_const_type type='const const int *const[2]' modifiers=('const',)
          DIE tag=DW_TAG_array_type type='const int *const[2]' modifiers=()
            DIE tag=DW_TAG_const_type type='const int *const' modifiers=('const', 'pointer', 'const')
              DIE tag=DW_TAG_pointer_type type='const int *' modifiers=('const', 'pointer')
                DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
                  DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'v' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'cv' type='const volatile '
        DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
          DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
            DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc' type='const volatile '
        DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
          DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
            DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'cv_p' type='const volatile  *'
        DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
          DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
            DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
              DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc_p' type='const volatile  *'
        DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
          DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
            DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
              DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'p_cv' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='int *const' modifiers=('pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='int *' modifiers=('pointer',)
              DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'p_vc' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='int *const' modifiers=('pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='int *' modifiers=('pointer',)
              DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'cv_p_cv' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='const volatile  *const' modifiers=('const', 'pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
              DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                  DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'cv_p_vc' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='const volatile  *const' modifiers=('const', 'pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
              DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                  DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc_p_cv' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='const volatile  *const' modifiers=('const', 'pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
              DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                  DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc_p_vc' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='const volatile  *const' modifiers=('const', 'pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
              DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                  DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc_p_vc_p_vc' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='volatile  *const' modifiers=('pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='volatile  *' modifiers=('pointer',)
              DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                DIE tag=DW_TAG_const_type type='const volatile  *const' modifiers=('const', 'pointer', 'const')
                  DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
                    DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                      DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                        DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc_2' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='const const volatile [2]' modifiers=('const',)
            DIE tag=DW_TAG_array_type type='const volatile [2]' modifiers=()
              DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                  DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc_p_cv_2' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='const volatile [2]' modifiers=('const',)
            DIE tag=DW_TAG_array_type type='volatile [2]' modifiers=()
              DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                DIE tag=DW_TAG_const_type type='const volatile  *const' modifiers=('const', 'pointer', 'const')
                  DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
                    DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                      DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                        DIE tag=DW_TAG_base_type type='int' modifiers=()

And I guess it's also possible to have C++ references in the mix?

@sevaa
Copy link
Contributor

sevaa commented Oct 25, 2023

I always felt uneasy about datatype recovery. We needed it because it was the cost of the llvm-dwarfdump autotest, but I suspect that the general case of it is too hairy a problem, and our implementation, whatever it is like, will end up mimicking the quirks of llvm-dwarfdump (which may evolve as the tool evolves).

Point is, add your binary to the the dwarfdump autotest directory, run the autotest, and see where it lands.

@Henri-J-Norden
Copy link
Author

I actually only looked at using parse_cpp_datatype in order to figure out the base type and array dimensions - so personally I don't even need full C-style datatype recovery. Maybe it would make sense to separate these functions? In the end, I had to re-implement finding the base type anyway, since the name of the TypeDesc object always includes the modifiers (that I don't care about).

As for the test, it fails at the first, simplest variable:

Test file 'test/testfiles_for_dwarfdump/volatile.elf'
.......................FAIL
....for file test/testfiles_for_dwarfdump/volatile.elf
....for option "--debug-info"
....Output #1 is llvm-dwarfdump, Output #2 is pyelftools
@@ Mismatch on line #29:
>>                dw_at_type [dw_form_ref4]	(cu + 0x0051 => {0x00000051} "volatile int")<<
>>              dw_at_type [dw_form_ref4]	(cu + 0x0051 => {0x00000051} "volatile ")<<

@sevaa
Copy link
Contributor

sevaa commented Oct 28, 2023

If the subset of datatype recovery logic that you need is different from what the parse_cpp_datatype machinery provides, maybe you should keep it in your own logic. I, for one, have an alternative datatype recovery piece (which can be seen at https://github.com/sevaa/dwex/blob/master/dwex/locals.py), which started off as the same codebase as parse_cpp_datatype but then diverged, and I have no ambition to roll it into the library - specifically because datatype recovery is super hairy and hard to get right, and attempting to do so in a public codebase would imply a commitment to correctness that I'm not sure I can uphold.

@sevaa
Copy link
Contributor

sevaa commented Nov 8, 2023

What I think we should do for the viability of the API, we should further decouple datatype discovery from C++ declaration string composition. This way, the former can evolve towards correctness (hopefully) while the latter can evolve towards matching what llvm-dwarfdump provides, and users can supply their own type+name generation logic, if needed.

If only there was a corpus of C++ sources aimed at spanning the gamut of datatype options (more generally, language features) out there...

For the short term, I'll try and see what can/should be done about volatile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants