Debugging memory errors
The best part of debugging is debugging you do not have to do. I have realized that to try to make things fail fast will help you catch problems in both new and old code fairly fast, often without having to start a debugger at all. This is nice, because then you usually have most of the relevant details in your head already so it will become a quick fix. Sadly, this is not always true, sometimes you stumble upon a special case you did not think about earlier. For me, this usually surfaces as some assertion triggering, but I have no idea of exactly caused it. It usually makes me end up thinking: "Yes, this is wrong, but why?". To help me in this progress I am currently abusing the comma operator in asserts like this:
assert(("Message", condition));
since my compiler will then print my message along with the condition if the assertion fails (and of course the source file and line). This is nice since I do not always have to open up the relevant source file to find out what failed, but sometimes I'm still left wondering what actually caused the problem. In these cases I end up opening the debugger and rewriting the assertion like this:
if (!condition) DebugBreak();
which in MSVC is equivalent to setting a conditional breakpoint on that line. Then I can run my debugger to see a nice stack trace. From that, it is in most cases fairly simple to see the root case of the problem. It would be nice to have a good stack trace directly output from the application itself, but I have not gotten to examining that yet.
So far, we have not touched upon the tricky errors: memory related errors. Microsoft's C runtime library comes with a fairly good debug heap implementation, which has support for leak checking and use-after-free checks to some extent. However, I have noticed that it behaves differently when I launch my application from the debugger and as stand-alone for some reason. When I am running my application in the debugger I get more memory check errors than when I am running in stand-alone mode. I have explicitly turned on heap checks and leak detections from start-up code in my application, but the difference is still there.
As I am nowadays mainly developing from emacs, and therefore do not run under the debugger
environment as standard, some memory related problems does not surface directly, and as
a result I do not get some memory-related errors when I am coding so they become hidden
somewhere in the code and therefore really hard to debug. One of these points is the shutdown
code of the compiler. The compiler has a root object, Engine
, that represents the entire
compiler along with all its state, and when destroying this object I have to be really careful
to destroy objects in the right order for the same reasons that I have to take special measures
when setting up the type system. For example, I need to make sure to destroy the instance
that represents the Type
type last, otherwise all other objects will have invalid pointers
for their type information, which will be a mess to debug. So when I recently re-wrote the
Scope
class to better implement scope lookups, I introduced another automatic reference
counted member there which I forgot to destroy in the destructor of Engine
. For a long
time, I did not notice the problem since the type member was not accessed during the
critical time, until I turned on the lifetime tracing I had previously written to debug
another issue.
What the lifetime tracking does is basically to output the address and type of every object created and destroyed in the system (that is reference counted). This has been really useful to track down other bugs earlier, but in this case it introduced a read of the type member, which was destroyed too early here. Once again, using the other outputs I could simply check what the type of the missing object was and quite quickly track down that I had forgotten to destroy the newly introduced member early enough. The other problem remained, which was that the heap consistency check failed when running under the debugger.
A good way to track down heap inconsistencies I have used earlier is to force a heap check by doing
_ASSERT(_CrtCheckMemory());
every time i alter a reference count. This makes the compiler run insanely slow, but
helps pinning down the location of the problems quite well. When the heap check fails,
I can simply take a look at the stack trace to see approximatley where the problem is
and then insert more heap validation around the problematic code to further pin down
the problem. This time it was not as easy, since the problem here was deep inside
destructor code (many times triggered by my automatic refcount management class). I
still managed to pin it down to around the destruction of a syntax rule, but then
strange things started to happen. Suddenly a previously innocent call to atomicDecrement
started to fail and the debugger would not give me a stack trace at that point. I knew
it was in one reference count decrease, and I tried tracing the number of time the decrease
was called but not to much luck.
This is when I realized that I needed to know where the actual mis-step was. The Microsoft
C runtime will not tell you the exact location where you accessed memory you did not own,
it will only detect if you have written somewhere you should not have done at the next heap
check. This was to late for me in this case. At this time I started thinking about how helpful
valgrind
is on Linux systems, and I started searching for something similar for Windows.
Eventually, I found something called DrMemory, which looked promising.
When I ran it on my failing test code, I got a bunch of the following errors, along
Error #1: UNADDRESSABLE ACCESS beyond heap bounds: reading 0x03041fd2-0x03041fe0 14 byte(s) within 0x03041fd0-0x03041fe0 # 0 nvd3d9wrap.dll!setDeviceHandle +0x2530 (0x709ba0b0 <nvd3d9wrap.dll+0xa0b0>) # 1 nvd3d9wrap.dll!CheckDXGIModule +0x44f (0x709b1ab0 <nvd3d9wrap.dll+0x1ab0>) # 2 nvd3d9wrap.dll!GetNVDisplayW +0x1b14 (0x709b3795 <nvd3d9wrap.dll+0x3795>) # 3 _ms_p5_mp_test_fdiv # 4 _fpmath # 5 _cinit [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0dat.c:289] # 6 __tmainCRTStartup [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c:249] # 7 wmainCRTStartup [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c:181] # 8 KERNEL32.dll!BaseThreadInitThunk +0x11 (0x76b5338a <KERNEL32.dll+0x1338a>) Note: @0:00:01.620 in thread 448 Note: refers to 0 byte(s) beyond last valid byte in prior malloc Note: prev lower malloc: 0x03041fc0-0x03041fd2 Note: instruction: movdqu (%edx) -> %xmm1 Error #2: UNADDRESSABLE ACCESS beyond heap bounds: reading 0x03041fd2-0x03041fe0 14 byte(s) within 0x03041fd0-0x03041fe0 # 0 nvd3d9wrap.dll!setDeviceHandle +0x2530 (0x709ba0b0 <nvd3d9wrap.dll+0xa0b0>) # 1 nvd3d9wrap.dll!CheckDXGIModule +0x460 (0x709b1ac1 <nvd3d9wrap.dll+0x1ac1>) # 2 nvd3d9wrap.dll!GetNVDisplayW +0x1b14 (0x709b3795 <nvd3d9wrap.dll+0x3795>) # 3 _ms_p5_mp_test_fdiv # 4 _fpmath # 5 _cinit [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0dat.c:289] # 6 __tmainCRTStartup [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c:249] # 7 wmainCRTStartup [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c:181] # 8 KERNEL32.dll!BaseThreadInitThunk +0x11 (0x76b5338a <KERNEL32.dll+0x1338a>) Note: @0:00:01.620 in thread 448 Note: refers to 0 byte(s) beyond last valid byte in prior malloc Note: prev lower malloc: 0x03041fc0-0x03041fd2 Note: instruction: movdqu (%edx) -> %xmm1 Error #3: UNINITIALIZED READ: reading register ecx # 0 storm::Code::update [c:\users\filip\projects\storm\storm\code.cpp:20] # 1 storm::Function::initRefs [c:\users\filip\projects\storm\storm\function.cpp:78] # 2 storm::Function::ref [c:\users\filip\projects\storm\storm\function.cpp:26] # 3 storm::Function::pointer [c:\users\filip\projects\storm\storm\function.cpp:22] # 4 storm::Package::createReader [c:\users\filip\projects\storm\storm\package.cpp:278] # 5 storm::Package::addReader [c:\users\filip\projects\storm\storm\package.cpp:283] # 6 storm::Package::loadAlways [c:\users\filip\projects\storm\storm\package.cpp:218] # 7 storm::Package::load [c:\users\filip\projects\storm\storm\package.cpp:197] # 8 storm::Package::findName [c:\users\filip\projects\storm\storm\package.cpp:103] # 9 storm::Package::find [c:\users\filip\projects\storm\storm\package.cpp:87] #10 CodeTest::run [c:\users\filip\projects\storm\stormtest\codetest.cpp:33] #11 Tests::run [c:\users\filip\projects\storm\test\testmgr.h:30] #12 wmain [c:\users\filip\projects\storm\stormtest\stormtest.cpp:11] Note: @0:00:05.550 in thread 2500 Note: instruction: cmp %ecx 0x08(%ebp) [a large number of these later] Error #10: UNADDRESSABLE ACCESS of freed memory: reading 0x02ada478-0x02ada47c 4 byte(s) # 0 KERNEL32.dll!GetConsoleMode +0x45 (0x7511136d <KERNEL32.dll+0x1136d>) # 1 atomicDecrement [c:\users\filip\projects\storm\utils\utils.cpp:36] # 2 storm::Object::release [c:\users\filip\projects\storm\storm\lib\object.h:69] # 3 storm::Auto<>::~Auto<> [c:\users\filip\projects\storm\storm\lib\auto.h:36] # 4 storm::Scope::~Scope # 5 storm::SyntaxOption::~SyntaxOption [c:\users\filip\projects\storm\storm\syntaxoption.cpp:13] # 6 storm::SyntaxOption::`scalar deleting destructor' # 7 clear<> [c:\users\filip\projects\storm\utils\utils.h:83] # 8 storm::SyntaxRule::~SyntaxRule [c:\users\filip\projects\storm\storm\syntaxrule.cpp:11] # 9 storm::SyntaxRule::`scalar deleting destructor' #10 clearMap<> [c:\users\filip\projects\storm\utils\utils.h:91] #11 storm::SyntaxRules::clear [c:\users\filip\projects\storm\storm\syntaxrule.cpp:52] #12 storm::SyntaxRules::~SyntaxRules [c:\users\filip\projects\storm\storm\syntaxrule.cpp:48] #13 storm::Package::~Package [c:\users\filip\projects\storm\storm\package.cpp:34] #14 storm::Package::`scalar deleting destructor' #15 storm::Object::release [c:\users\filip\projects\storm\storm\lib\object.h:70] #16 storm::Auto<>::~Auto<> [c:\users\filip\projects\storm\storm\lib\auto.h:36] #17 std::pair<>::~pair<> #18 std::_List_nod<>::_Node::~_Node #19 std::_List_nod<>::_Node::`scalar deleting destructor' Note: @0:00:31.810 in thread 2340 Note: next higher malloc: 0x02ada4b0-0x02ada4d0 Note: prev lower malloc: 0x02ada3f8-0x02ada424 Note: 0x02ada478-0x02ada47c overlaps memory 0x02ada470-0x02ada490 that was freed here: Note: # 0 replace_free [d:\drmemory_package\common\alloc_replace.c:2503] Note: # 1 storm::Object::operator delete [c:\users\filip\projects\storm\storm\lib\object.cpp:133] Note: # 2 storm::Object::operator delete [c:\users\filip\projects\storm\storm\lib\object.cpp:138] Note: # 3 storm::ScopeExtra::`scalar deleting destructor' Note: # 4 storm::Object::release [c:\users\filip\projects\storm\storm\lib\object.h:70] Note: # 5 storm::Auto<>::~Auto<> [c:\users\filip\projects\storm\storm\lib\auto.h:36] Note: instruction: lock xadd (%ecx) %eax -> (%ecx) %eax
The first two reported errors seem to be within some code I do not have control over, so I will find
some way to ignore them later on. The large number of UNINITIALIZED READ: reading register ecx
was
not what I was looking for, since I know that somewhere I have written to free'd memory, otherwise
the heap check would not be able to detect it. That left me with the last one, the UNADDRESSABLE ACCESS of freed memory
.
As we can see from the stack trace, it also points at the atomicDecrement
that Visual Studio also
showed earlier but failed to provide a stack trace for. From here we can see that some object is free'd
through the Auto
destructor from Scope
. Earlier, I had found clues leading me to believe
that the SyntaxOption
somehow was involved but I did not realize that the Scope
was to blame,
sine there was no mention of the Scope
in the destructor.
After this, it was fairly simple to see where things went downhill by enabling my traces of object
creations and destructions. By simply printing the address of the object contained in Scope
i was
able to see that for some reason that object was destroyed just before this release
call. To
prevent this from happening again, I added a debug mode that keeps a map of all live objects, so
that I can verify before any change in reference count that the object actually is alive! Whatever
the problem is, I will hopefully notice it faster next time at least. In the end, it turned out
that I something stupid like this:
void function(Auto<Object> object) { Object *borrowed = object.borrow(); Auto<Object> z(borrowed); }
The constructor for Auto
assumes that it should take the ownership of one reference, ie it will
effectively lower the total reference count of the object by one. However, in this case I had
previously only borrowed the object from the previous Auto
, which means that the reference
count will not match up. The fix was to simply elliminate the call to borrow
. After that,
it worked again, but what about the other error that Dr Memory detected?
At first glance, since Dr Memory started talking about machine code, I started fearing that my compiler did something bad. Well, the easiest way to figure out this is to take a look at the asm-code to see what is wrong. This is what I got:
void Code::update(code::RefSource &update) { 01576D70 push ebp 01576D71 mov ebp,esp 01576D73 sub esp,0CCh 01576D79 push ebx 01576D7A push esi 01576D7B push edi 01576D7C push ecx 01576D7D lea edi,[ebp-0CCh] 01576D83 mov ecx,33h 01576D88 mov eax,0CCCCCCCCh 01576D8D rep stos dword ptr es:[edi] 01576D8F pop ecx 01576D90 mov dword ptr [ebp-8],ecx if (toUpdate == &update) 01576D93 mov eax,dword ptr [this] 01576D96 mov ecx,dword ptr [eax+0Ch] 01576D99 cmp ecx,dword ptr [update] 01576D9C jne storm::Code::update+30h (1576DA0h) return; 01576D9E jmp storm::Code::update+4Fh (1576DBFh) toUpdate = &update; 01576DA0 mov eax,dword ptr [this] 01576DA3 mov ecx,dword ptr [update] 01576DA6 mov dword ptr [eax+0Ch],ecx newRef(); 01576DA9 mov eax,dword ptr [this] 01576DAC mov edx,dword ptr [eax] 01576DAE mov esi,esp 01576DB0 mov ecx,dword ptr [this] 01576DB3 mov eax,dword ptr [edx+10h] 01576DB6 call eax 01576DB8 cmp esi,esp 01576DBA call @ILT+45930(__RTC_CheckEsp) (13E636Fh) } 01576DBF pop edi 01576DC0 pop esi 01576DC1 pop ebx 01576DC2 add esp,0CCh 01576DC8 cmp ebp,esp 01576DCA call @ILT+45930(__RTC_CheckEsp) (13E636Fh) 01576DCF mov esp,ebp 01576DD1 pop ebp 01576DD2 ret 4
From Dr Memory, I also know which instruction was bothering me: cmp %ecx $08(%ebp)
. But we can not find
this exact instruction... Knowing some calling conventions on the x86 tells you that $08(%ebp)
or
[ebp + 08h]
is referring to a parameter of the function. This function has only got one (except this
),
so it has to be the line cmp ecx, dword ptr [update]
. Ok, so what is ecx
at that point? Tracking
the code backwards we can see that it is loaded from mov ecx, dword ptr [eax+0Ch]
and eax
comes from mov eax, dword ptr [this]
. So ecx
is the member at offset 0x0C
in this object.
Taking a quick look in the constructor, I see that yes, I have forgotten to initialize the
toUpdate
variable. And after that the error goes away! This shows that Dr Memory is good
at tracking uninitialized memory as well, which is nice to know! I just now realized that I could
simply have looked at the line number reference of the error message and seen that something was wrong
on line 20 (which is the offending line), but since I initially thought it was a compiler error
due to not naming the member it named a register.
So in the end Dr Memory really helped me out, and it seems to be a good tool. It seems that
Valgrind has even more tools, but at the moment it does not seem to run on windows, and
at the moment I only need the memory checker anyway. It also helps you verify that you
have properly matched your malloc
, new
and new []
with the correct one of free
,
delete
, and delete []
. I made a small hack that failed this check, but let's not talk
about that now :)
As you can see, it has been a quite fun Christmas for me! At least I found a good tool to help my debugging in the future, as well as improving my consistency checks.
By the way, does anyone know why the debug heap acts differently when run from Visual Studio compared to when running stand-alone? Can I alter this somehow?
Comments
New comment
You can use GitHub flavored markdown here. Parsed by Parsedown, which does not support all of GitHub's features. For example, specifying the language of code listings is not supported.