<< PreviousNext >>

Debugging memory errors

The best part of debugging is debugging you do not have to do. I have realized that to try to make things fail fast will help you catch problems in both new and old code fairly fast, often without having to start a debugger at all. This is nice, because then you usually have most of the relevant details in your head already so it will become a quick fix. Sadly, this is not always true, sometimes you stumble upon a special case you did not think about earlier. For me, this usually surfaces as some assertion triggering, but I have no idea of exactly caused it. It usually makes me end up thinking: "Yes, this is wrong, but why?". To help me in this progress I am currently abusing the comma operator in asserts like this:

assert(("Message", condition));

since my compiler will then print my message along with the condition if the assertion fails (and of course the source file and line). This is nice since I do not always have to open up the relevant source file to find out what failed, but sometimes I'm still left wondering what actually caused the problem. In these cases I end up opening the debugger and rewriting the assertion like this:

if (!condition) DebugBreak();

which in MSVC is equivalent to setting a conditional breakpoint on that line. Then I can run my debugger to see a nice stack trace. From that, it is in most cases fairly simple to see the root case of the problem. It would be nice to have a good stack trace directly output from the application itself, but I have not gotten to examining that yet.

So far, we have not touched upon the tricky errors: memory related errors. Microsoft's C runtime library comes with a fairly good debug heap implementation, which has support for leak checking and use-after-free checks to some extent. However, I have noticed that it behaves differently when I launch my application from the debugger and as stand-alone for some reason. When I am running my application in the debugger I get more memory check errors than when I am running in stand-alone mode. I have explicitly turned on heap checks and leak detections from start-up code in my application, but the difference is still there.

As I am nowadays mainly developing from emacs, and therefore do not run under the debugger environment as standard, some memory related problems does not surface directly, and as a result I do not get some memory-related errors when I am coding so they become hidden somewhere in the code and therefore really hard to debug. One of these points is the shutdown code of the compiler. The compiler has a root object, Engine, that represents the entire compiler along with all its state, and when destroying this object I have to be really careful to destroy objects in the right order for the same reasons that I have to take special measures when setting up the type system. For example, I need to make sure to destroy the instance that represents the Type type last, otherwise all other objects will have invalid pointers for their type information, which will be a mess to debug. So when I recently re-wrote the Scope class to better implement scope lookups, I introduced another automatic reference counted member there which I forgot to destroy in the destructor of Engine. For a long time, I did not notice the problem since the type member was not accessed during the critical time, until I turned on the lifetime tracing I had previously written to debug another issue.

What the lifetime tracking does is basically to output the address and type of every object created and destroyed in the system (that is reference counted). This has been really useful to track down other bugs earlier, but in this case it introduced a read of the type member, which was destroyed too early here. Once again, using the other outputs I could simply check what the type of the missing object was and quite quickly track down that I had forgotten to destroy the newly introduced member early enough. The other problem remained, which was that the heap consistency check failed when running under the debugger.

A good way to track down heap inconsistencies I have used earlier is to force a heap check by doing

_ASSERT(_CrtCheckMemory());

every time i alter a reference count. This makes the compiler run insanely slow, but helps pinning down the location of the problems quite well. When the heap check fails, I can simply take a look at the stack trace to see approximatley where the problem is and then insert more heap validation around the problematic code to further pin down the problem. This time it was not as easy, since the problem here was deep inside destructor code (many times triggered by my automatic refcount management class). I still managed to pin it down to around the destruction of a syntax rule, but then strange things started to happen. Suddenly a previously innocent call to atomicDecrement started to fail and the debugger would not give me a stack trace at that point. I knew it was in one reference count decrease, and I tried tracing the number of time the decrease was called but not to much luck.

This is when I realized that I needed to know where the actual mis-step was. The Microsoft C runtime will not tell you the exact location where you accessed memory you did not own, it will only detect if you have written somewhere you should not have done at the next heap check. This was to late for me in this case. At this time I started thinking about how helpful valgrind is on Linux systems, and I started searching for something similar for Windows. Eventually, I found something called DrMemory, which looked promising.

When I ran it on my failing test code, I got a bunch of the following errors, along

Error #1: UNADDRESSABLE ACCESS beyond heap bounds: reading 0x03041fd2-0x03041fe0 14 byte(s) within 0x03041fd0-0x03041fe0
# 0 nvd3d9wrap.dll!setDeviceHandle   +0x2530   (0x709ba0b0 <nvd3d9wrap.dll+0xa0b0>)
# 1 nvd3d9wrap.dll!CheckDXGIModule   +0x44f    (0x709b1ab0 <nvd3d9wrap.dll+0x1ab0>)
# 2 nvd3d9wrap.dll!GetNVDisplayW     +0x1b14   (0x709b3795 <nvd3d9wrap.dll+0x3795>)
# 3 _ms_p5_mp_test_fdiv
# 4 _fpmath      
# 5 _cinit                            [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0dat.c:289]
# 6 __tmainCRTStartup                 [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c:249]
# 7 wmainCRTStartup                   [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c:181]
# 8 KERNEL32.dll!BaseThreadInitThunk +0x11     (0x76b5338a <KERNEL32.dll+0x1338a>)
Note: @0:00:01.620 in thread 448
Note: refers to 0 byte(s) beyond last valid byte in prior malloc
Note: prev lower malloc:  0x03041fc0-0x03041fd2
Note: instruction: movdqu (%edx) -> %xmm1

Error #2: UNADDRESSABLE ACCESS beyond heap bounds: reading 0x03041fd2-0x03041fe0 14 byte(s) within 0x03041fd0-0x03041fe0
# 0 nvd3d9wrap.dll!setDeviceHandle   +0x2530   (0x709ba0b0 <nvd3d9wrap.dll+0xa0b0>)
# 1 nvd3d9wrap.dll!CheckDXGIModule   +0x460    (0x709b1ac1 <nvd3d9wrap.dll+0x1ac1>)
# 2 nvd3d9wrap.dll!GetNVDisplayW     +0x1b14   (0x709b3795 <nvd3d9wrap.dll+0x3795>)
# 3 _ms_p5_mp_test_fdiv
# 4 _fpmath      
# 5 _cinit                            [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0dat.c:289]
# 6 __tmainCRTStartup                 [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c:249]
# 7 wmainCRTStartup                   [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c:181]
# 8 KERNEL32.dll!BaseThreadInitThunk +0x11     (0x76b5338a <KERNEL32.dll+0x1338a>)
Note: @0:00:01.620 in thread 448
Note: refers to 0 byte(s) beyond last valid byte in prior malloc
Note: prev lower malloc:  0x03041fc0-0x03041fd2
Note: instruction: movdqu (%edx) -> %xmm1

Error #3: UNINITIALIZED READ: reading register ecx
# 0 storm::Code::update                        [c:\users\filip\projects\storm\storm\code.cpp:20]
# 1 storm::Function::initRefs                  [c:\users\filip\projects\storm\storm\function.cpp:78]
# 2 storm::Function::ref                       [c:\users\filip\projects\storm\storm\function.cpp:26]
# 3 storm::Function::pointer                   [c:\users\filip\projects\storm\storm\function.cpp:22]
# 4 storm::Package::createReader               [c:\users\filip\projects\storm\storm\package.cpp:278]
# 5 storm::Package::addReader                  [c:\users\filip\projects\storm\storm\package.cpp:283]
# 6 storm::Package::loadAlways                 [c:\users\filip\projects\storm\storm\package.cpp:218]
# 7 storm::Package::load                       [c:\users\filip\projects\storm\storm\package.cpp:197]
# 8 storm::Package::findName                   [c:\users\filip\projects\storm\storm\package.cpp:103]
# 9 storm::Package::find                       [c:\users\filip\projects\storm\storm\package.cpp:87]
#10 CodeTest::run                              [c:\users\filip\projects\storm\stormtest\codetest.cpp:33]
#11 Tests::run                                 [c:\users\filip\projects\storm\test\testmgr.h:30]
#12 wmain                                      [c:\users\filip\projects\storm\stormtest\stormtest.cpp:11]
Note: @0:00:05.550 in thread 2500
Note: instruction: cmp    %ecx 0x08(%ebp)

[a large number of these later]

Error #10: UNADDRESSABLE ACCESS of freed memory: reading 0x02ada478-0x02ada47c 4 byte(s)
# 0 KERNEL32.dll!GetConsoleMode                                        +0x45     (0x7511136d <KERNEL32.dll+0x1136d>)
# 1 atomicDecrement                                                     [c:\users\filip\projects\storm\utils\utils.cpp:36]
# 2 storm::Object::release                                              [c:\users\filip\projects\storm\storm\lib\object.h:69]
# 3 storm::Auto<>::~Auto<>                                              [c:\users\filip\projects\storm\storm\lib\auto.h:36]
# 4 storm::Scope::~Scope
# 5 storm::SyntaxOption::~SyntaxOption                                  [c:\users\filip\projects\storm\storm\syntaxoption.cpp:13]
# 6 storm::SyntaxOption::`scalar deleting destructor'
# 7 clear<>                                                             [c:\users\filip\projects\storm\utils\utils.h:83]
# 8 storm::SyntaxRule::~SyntaxRule                                      [c:\users\filip\projects\storm\storm\syntaxrule.cpp:11]
# 9 storm::SyntaxRule::`scalar deleting destructor'
#10 clearMap<>                                                          [c:\users\filip\projects\storm\utils\utils.h:91]
#11 storm::SyntaxRules::clear                                           [c:\users\filip\projects\storm\storm\syntaxrule.cpp:52]
#12 storm::SyntaxRules::~SyntaxRules                                    [c:\users\filip\projects\storm\storm\syntaxrule.cpp:48]
#13 storm::Package::~Package                                            [c:\users\filip\projects\storm\storm\package.cpp:34]
#14 storm::Package::`scalar deleting destructor'
#15 storm::Object::release                                              [c:\users\filip\projects\storm\storm\lib\object.h:70]
#16 storm::Auto<>::~Auto<>                                              [c:\users\filip\projects\storm\storm\lib\auto.h:36]
#17 std::pair<>::~pair<>
#18 std::_List_nod<>::_Node::~_Node
#19 std::_List_nod<>::_Node::`scalar deleting destructor'
Note: @0:00:31.810 in thread 2340
Note: next higher malloc: 0x02ada4b0-0x02ada4d0
Note: prev lower malloc:  0x02ada3f8-0x02ada424
Note: 0x02ada478-0x02ada47c overlaps memory 0x02ada470-0x02ada490 that was freed here:
Note: # 0 replace_free                                                  [d:\drmemory_package\common\alloc_replace.c:2503]
Note: # 1 storm::Object::operator delete                                [c:\users\filip\projects\storm\storm\lib\object.cpp:133]
Note: # 2 storm::Object::operator delete                                [c:\users\filip\projects\storm\storm\lib\object.cpp:138]
Note: # 3 storm::ScopeExtra::`scalar deleting destructor'
Note: # 4 storm::Object::release                                        [c:\users\filip\projects\storm\storm\lib\object.h:70]
Note: # 5 storm::Auto<>::~Auto<>                                        [c:\users\filip\projects\storm\storm\lib\auto.h:36]
Note: instruction: lock xadd   (%ecx) %eax -> (%ecx) %eax

The first two reported errors seem to be within some code I do not have control over, so I will find some way to ignore them later on. The large number of UNINITIALIZED READ: reading register ecx was not what I was looking for, since I know that somewhere I have written to free'd memory, otherwise the heap check would not be able to detect it. That left me with the last one, the UNADDRESSABLE ACCESS of freed memory. As we can see from the stack trace, it also points at the atomicDecrement that Visual Studio also showed earlier but failed to provide a stack trace for. From here we can see that some object is free'd through the Auto destructor from Scope. Earlier, I had found clues leading me to believe that the SyntaxOption somehow was involved but I did not realize that the Scope was to blame, sine there was no mention of the Scope in the destructor.

After this, it was fairly simple to see where things went downhill by enabling my traces of object creations and destructions. By simply printing the address of the object contained in Scope i was able to see that for some reason that object was destroyed just before this release call. To prevent this from happening again, I added a debug mode that keeps a map of all live objects, so that I can verify before any change in reference count that the object actually is alive! Whatever the problem is, I will hopefully notice it faster next time at least. In the end, it turned out that I something stupid like this:

void function(Auto<Object> object) {
    Object *borrowed = object.borrow();
    Auto<Object> z(borrowed);
}

The constructor for Auto assumes that it should take the ownership of one reference, ie it will effectively lower the total reference count of the object by one. However, in this case I had previously only borrowed the object from the previous Auto, which means that the reference count will not match up. The fix was to simply elliminate the call to borrow. After that, it worked again, but what about the other error that Dr Memory detected?

At first glance, since Dr Memory started talking about machine code, I started fearing that my compiler did something bad. Well, the easiest way to figure out this is to take a look at the asm-code to see what is wrong. This is what I got:

    void Code::update(code::RefSource &update) {
01576D70  push        ebp  
01576D71  mov         ebp,esp 
01576D73  sub         esp,0CCh 
01576D79  push        ebx  
01576D7A  push        esi  
01576D7B  push        edi  
01576D7C  push        ecx  
01576D7D  lea         edi,[ebp-0CCh] 
01576D83  mov         ecx,33h 
01576D88  mov         eax,0CCCCCCCCh 
01576D8D  rep stos    dword ptr es:[edi] 
01576D8F  pop         ecx  
01576D90  mov         dword ptr [ebp-8],ecx 
        if (toUpdate == &update)
01576D93  mov         eax,dword ptr [this] 
01576D96  mov         ecx,dword ptr [eax+0Ch] 
01576D99  cmp         ecx,dword ptr [update] 
01576D9C  jne         storm::Code::update+30h (1576DA0h) 
            return;
01576D9E  jmp         storm::Code::update+4Fh (1576DBFh) 
        toUpdate = &update;
01576DA0  mov         eax,dword ptr [this] 
01576DA3  mov         ecx,dword ptr [update] 
01576DA6  mov         dword ptr [eax+0Ch],ecx 
        newRef();
01576DA9  mov         eax,dword ptr [this] 
01576DAC  mov         edx,dword ptr [eax] 
01576DAE  mov         esi,esp 
01576DB0  mov         ecx,dword ptr [this] 
01576DB3  mov         eax,dword ptr [edx+10h] 
01576DB6  call        eax  
01576DB8  cmp         esi,esp 
01576DBA  call        @ILT+45930(__RTC_CheckEsp) (13E636Fh) 
    }
01576DBF  pop         edi  
01576DC0  pop         esi  
01576DC1  pop         ebx  
01576DC2  add         esp,0CCh 
01576DC8  cmp         ebp,esp 
01576DCA  call        @ILT+45930(__RTC_CheckEsp) (13E636Fh) 
01576DCF  mov         esp,ebp 
01576DD1  pop         ebp  
01576DD2  ret         4    

From Dr Memory, I also know which instruction was bothering me: cmp %ecx $08(%ebp). But we can not find this exact instruction... Knowing some calling conventions on the x86 tells you that $08(%ebp) or [ebp + 08h] is referring to a parameter of the function. This function has only got one (except this), so it has to be the line cmp ecx, dword ptr [update]. Ok, so what is ecx at that point? Tracking the code backwards we can see that it is loaded from mov ecx, dword ptr [eax+0Ch] and eax comes from mov eax, dword ptr [this]. So ecx is the member at offset 0x0C in this object. Taking a quick look in the constructor, I see that yes, I have forgotten to initialize the toUpdate variable. And after that the error goes away! This shows that Dr Memory is good at tracking uninitialized memory as well, which is nice to know! I just now realized that I could simply have looked at the line number reference of the error message and seen that something was wrong on line 20 (which is the offending line), but since I initially thought it was a compiler error due to not naming the member it named a register.

So in the end Dr Memory really helped me out, and it seems to be a good tool. It seems that Valgrind has even more tools, but at the moment it does not seem to run on windows, and at the moment I only need the memory checker anyway. It also helps you verify that you have properly matched your malloc, new and new [] with the correct one of free, delete, and delete []. I made a small hack that failed this check, but let's not talk about that now :)

As you can see, it has been a quite fun Christmas for me! At least I found a good tool to help my debugging in the future, as well as improving my consistency checks.

By the way, does anyone know why the debug heap acts differently when run from Visual Studio compared to when running stand-alone? Can I alter this somehow?

Comments

New comment

You can use GitHub flavored markdown here. Parsed by Parsedown, which does not support all of GitHub's features. For example, specifying the language of code listings is not supported.

Name: