Matrix Games Forums


Coder Diary #32 -- A Look Inside the Sausage Factory

 
All Forums >> [New Releases from Matrix Games] >> Campaign Series: Middle East 1948-1985 >> Coder Diary #32 -- A Look Inside the Sausage Factory Page: [1]
Coder Diary #32 -- A Look Inside the Sausage Factory - 3/7/2016 11:03:12 PM   
berto


Posts: 20504
Joined: 3/13/2002
From: metro Chicago, Illinois, USA
Status: offline

Coder Diary #32 -- A Look Inside the Sausage Factory


Three days in the life:

quote:

ORIGINAL: berto

Heads up [to the Dev Team]:

Things have gone well. Everything checks out as "good enough for now" at least.

You can anticipate my releasing a ton of new stuff, both data and EXEs, for both ME & VN, today (Saturday).

Then the map & data guys can expect follow-up posts throughout the weekend and beyond explaining the new stuff, and what still needs to be done.

Get ready to rumble!

quote:

ORIGINAL: berto

Um, hit a snag. I did something that broke edmap startup, causing the map editor to crash. Debugging it now.

quote:

ORIGINAL: berto

All bets are off. Who knows when and how I can fix this edmap launch failure bug, and when and if I will release. I'm in deep debugging hell here.

quote:

ORIGINAL: berto

Evidence suggests a memory allocation error somewhere. It only manifests itself in Release-mode EXEs, not in Debug-mode EXEs. I have been building EXEs in Debug mode only for the past month or more. I need to revert to the last time I released known-good EXEs to the team, all the way back to January 29. Then diff the codebase between then and now to find the needle in the haystack.

Ugh, I particularly hate this sort of bug hunt.

quote:

ORIGINAL: berto

Like a good Boy Scout, I am prepared. I have all sorts of things I can try.

quote:

ORIGINAL: berto

I rebuilt a non-crashing edmap from the 20160129 codebase (the last release EXEs).

I rebuilt edmap from the 20160228 backup codebase. The rebuilt edmap crashes, in the same way.

It seems clear that this is a software fault that I introduced sometime between 20160129 and 20160228.

(I did various other checks, including: virus scan; validated the core Windows system files; verified that nothing has changed in my Visual C++ setup in recent months; etc. Everything checks. No, all signs point to a flaw in the codebase.)

Now my strategy will be to revert to backup codebases, doing a binary search, narrowing it down between dates, until I find the two successive codebases where the fault occurs. Then it will be a matter of diff'ing those two successive codebases. I still won't have my needle, but by then I should have a much, much smaller haystack. (Because how many diffs will there be from one day to the next? Not many.)
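The binary search described above can be sketched in a few lines of C++. This is a minimal illustration, not the author's actual tooling: it assumes the backups are kept in date order, that the oldest is known good and the newest known bad, and it uses a hypothetical isGood callback to stand in for "rebuild edmap from this backup and launch it." Each probe halves the remaining span, so a month of daily backups needs only about five rebuilds.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Minimal sketch of bisecting a series of daily codebase backups.
// Assumptions (hypothetical, not from the original post):
//   - `backups` holds at least two entries, sorted oldest to newest
//   - backups.front() is known good, backups.back() is known bad
//   - `isGood` stands in for "rebuild edmap from this backup and launch it"
// Returns the index of the first bad backup: the fault was introduced
// between backups[result - 1] and backups[result].
std::size_t firstBadBackup(const std::vector<std::string>& backups,
                           const std::function<bool(const std::string&)>& isGood) {
    std::size_t good = 0;                  // invariant: backups[good] is good
    std::size_t bad = backups.size() - 1;  // invariant: backups[bad] is bad
    while (bad - good > 1) {
        std::size_t mid = good + (bad - good) / 2;
        if (isGood(backups[mid])) {
            good = mid;  // fault is later than mid
        } else {
            bad = mid;   // fault is at or before mid
        }
    }
    return bad;
}
```

The invariant (good stays good, bad stays bad) is what lets the search stop as soon as the two are adjacent: at that point the haystack is a single day's worth of diffs.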

Thank God for daily backups!

quote:

ORIGINAL: berto

I narrowed it down to a three-day span, between 20160216 (good) and 20160219 (bad). I have interim code backups for both 20160217 & 20160218, but at the time of backup, the code was not successfully compiling on those days, so they are effectively unusable as points of comparison. (That is, I can't run EXEs built from those dates.)
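Backups that don't compile are a known wrinkle in this kind of search (git's bisect command has a skip subcommand for the same situation). A minimal sketch, assuming each backup can be classified as Good, Bad, or Unbuildable by a hypothetical test callback: when the midpoint can't be built, try the other untested backups in the span, and if none of them builds either, stop and report the span. With 20160217 and 20160218 unbuildable, this bottoms out at exactly the 20160216 to 20160219 span.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

enum class Outcome { Good, Bad, Unbuildable };

// Hypothetical sketch: index 0 is assumed good, index count-1 is assumed bad
// (count must be at least 2). Returns {lastGood, firstBad}. If
// lastGood + 1 < firstBad, every backup in between was unbuildable, so the
// search cannot narrow the span any further.
std::pair<std::size_t, std::size_t> bisectWithSkips(
    std::size_t count, const std::function<Outcome(std::size_t)>& test) {
    std::size_t good = 0;
    std::size_t bad = count - 1;
    while (bad - good > 1) {
        std::size_t mid = good + (bad - good) / 2;
        std::size_t probe = mid;
        Outcome result = test(probe);
        // Midpoint unbuildable: try the later untested backups first...
        for (std::size_t i = mid + 1; result == Outcome::Unbuildable && i < bad; ++i) {
            probe = i;
            result = test(probe);
        }
        // ...then the earlier ones, down to (but not including) `good`.
        for (std::size_t i = mid; result == Outcome::Unbuildable && i - 1 > good; --i) {
            probe = i - 1;
            result = test(probe);
        }
        if (result == Outcome::Unbuildable) {
            break;  // nothing in (good, bad) can be built
        }
        if (result == Outcome::Good) {
            good = probe;
        } else {
            bad = probe;
        }
    }
    return {good, bad};
}
```

If the in-between backups are later coaxed into compiling, rerunning the same search narrows the span to a single day.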

From the codebase diffs between 20160216 & 20160219, I spotted one suspicious thing, but after fixing it, no dice, I still get the crash.

The crazy thing is that the crash only shows in edmap, not the other EXEs; and only manifests in the Release EXE, not the Debug EXE.

I have scrutinized the diffs, and darned if I can see a problem anywhere.

I know exactly in the code where the Release EXE crashes. It's where the new Map object is constructed. I have looked at the Visual C++ library code for this function.

But here's the thing: That might be a red herring. It could very well be that something else entirely separate in the code is overwriting memory somewhere, causing this strange side effect -- the "not enough space for thread data" fault. Sometimes, in coding and debugging, it's like the bug is purposely giving random clues that point you in the wrong direction.
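One classic way to catch "something else is overwriting memory" is to surround a buffer with guard values (canaries) and check them at points of interest, so the corruption is detected near where it happens rather than wherever the program finally crashes. The sketch below is a hypothetical illustration, not code from the Campaign Series codebase. To keep the example well-defined C++, the guards sit in the same std::array as the payload, so an off-by-one write lands on a guard instead of on some unrelated object.

```cpp
#include <array>
#include <cstddef>

// Hypothetical GuardedArray: N payload ints with one guard word on each side.
template <std::size_t N>
class GuardedArray {
public:
    static constexpr int kGuard = 0x5A5A5A5A;

    GuardedArray() {
        storage_.front() = kGuard;
        storage_.back() = kGuard;
    }

    // Deliberately unchecked, like a plain C array index. Valid payload
    // indices are 0..N-1; an off-by-one index of -1 or N hits a guard slot.
    int& rawAt(std::ptrdiff_t i) {
        return storage_[static_cast<std::size_t>(i + 1)];
    }

    // True if neither guard word has been overwritten.
    bool intact() const {
        return storage_.front() == kGuard && storage_.back() == kGuard;
    }

private:
    std::array<int, N + 2> storage_{};  // [guard][payload 0..N-1][guard]
};
```

Sprinkling intact() checks before and after suspect calls brackets where the overwrite occurs. The Visual C++ debug heap does something similar with guard bytes around each allocation (checked by _CrtCheckMemory), which is one plausible reason the same bug can behave differently in Debug and Release builds.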

This is shaping into one of the most difficult bugs I've ever faced. Most difficult, because I can't use my usual tricks of single stepping through the code in Debug mode -- this only happens in the Release EXEs, remember, EXEs where debugging stuff has been stripped from the code. And I can't use my new code tracing mechanism, or the usual logging either, since the fault lies (maybe; see above) in the Visual C++ libraries, which of course I can't modify (so as to add my own debugging code).
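When single stepping isn't available, a common fallback is a lightweight trace that stays compiled into Release builds and records the last few places the program passed through, so that after a crash you at least know which of your own calls was in progress, even if the fault itself is inside a library. This is a minimal sketch under that assumption; the Breadcrumbs class and TRACE macro are hypothetical and are not the tracing mechanism mentioned in the post.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical breadcrumb trail: keeps the last Capacity trace points
// in a fixed-size ring buffer, oldest entries overwritten first.
template <std::size_t Capacity>
class Breadcrumbs {
    static_assert(Capacity > 0, "need at least one slot");

public:
    void record(const char* file, int line, const char* note) {
        entries_[next_ % Capacity] =
            std::string(file) + ":" + std::to_string(line) + " " + note;
        ++next_;
    }

    // Number of entries currently retained.
    std::size_t size() const { return next_ < Capacity ? next_ : Capacity; }

    // Entry i in oldest-to-newest order; i must be < size().
    const std::string& entry(std::size_t i) const {
        std::size_t start = next_ < Capacity ? 0 : next_ - Capacity;
        return entries_[(start + i) % Capacity];
    }

private:
    std::array<std::string, Capacity> entries_{};
    std::size_t next_ = 0;  // total number of records ever made
};

// Stamps the current source location automatically.
#define TRACE(trail, note) (trail).record(__FILE__, __LINE__, (note))
```

In use, you would call something like TRACE(trail, "before Map construction") around suspect calls and dump the trail to a log file when the program dies. The fixed-size buffer keeps the cost low enough to leave enabled in Release EXEs.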

Nothing to do except to keep trying this, trying that, experimenting, thinking outside the box. Doing Web searches also, though much of what I see is extremely technical, and much of it off-target or sometimes even rubbish.

Not the way I had planned to spend my weekend.

<sigh>

...

quote:

ORIGINAL: berto

I have found the bug. I know exactly where and how to toggle on/off the R6016: "not enough space for thread data" edmap crash. I still don't know yet how to fix this properly.

Still, it's progress, major progress even.

quote:

ORIGINAL: berto

quote:

ORIGINAL: berto

I narrowed it down to a three-day span, between 20160216 (good) and 20160219 (bad). I have interim code backups for both 20160217 & 20160218, but at the time of backup, the code was not successfully compiling on those days, so they are effectively unusable as points of comparison. (That is, I can't run EXEs built from those dates.)

After 24 hours of trying this, trying that, trying almost everything, I went back to the 20160217 & 20160218 codebase backups, in each case doing the minimum necessary to get them to compile. Miraculously, the compiled EXEs from both days were both crash free.

Having narrowed down the time frame where the bug first appeared -- between the end-of-day 20160218 backup and the end-of-day 20160219 backup -- I then, file by file, incrementally applied the 20160219 edits (making the haystack smaller and smaller, as it were) until ... I found it!
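The file-by-file step can be written as a simple cumulative loop. A minimal sketch, assuming a hypothetical crashesWith callback that overlays the listed files' 20160219 versions onto the 20160218 tree, rebuilds edmap, and reports whether it crashes on launch. Any file names used with it are placeholders.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch: apply one changed file at a time, cumulatively,
// and return the first file whose edits make the crash appear.
std::optional<std::string> firstCulpritFile(
    const std::vector<std::string>& changedFiles,
    const std::function<bool(const std::vector<std::string>&)>& crashesWith) {
    std::vector<std::string> applied;
    for (const auto& file : changedFiles) {
        applied.push_back(file);  // keep all earlier edits applied
        if (crashesWith(applied)) {
            return file;          // haystack reduced to this file's diff
        }
    }
    return std::nullopt;          // even the full set did not reproduce the crash
}
```

Applying the edits cumulatively, rather than testing each file in isolation, matters when a change only breaks things in combination with an earlier one (a header and its .cpp edited together, say): the loop then reports the file that completed the breaking combination.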

quote:

I know exactly in the code where the Release EXE crashes. It's where the new Map object is constructed. I have looked at the Visual C++ library code for this function.

Not!

This:

quote:

But here's the thing: That might be a red herring. It could very well be that something else entirely separate in the code is overwriting memory somewhere, causing this strange side effect -- the "not enough space for thread data" fault. Sometimes, in coding and debugging, it's like the bug is purposely giving random clues that point you in the wrong direction.



quote:

This is shaping into one of the most difficult bugs I've ever faced. Most difficult, because I can't use my usual tricks of single stepping through the code in Debug mode -- this only happens in the Release EXEs, remember, EXEs where debugging stuff has been stripped from the code. And I can't use my new code tracing mechanism, or the usual logging either, since the fault lies (maybe; see above) in the Visual C++ libraries, which of course I can't modify (so as to add my own debugging code).

Later in April or so, I really need to shift to my new development platform -- new PC, Visual Studio 2015, and supplemental power tools. It will then be so much faster and easier to fix these issues.

quote:

Nothing to do except to keep trying this, trying that, experimenting, thinking outside the box. Doing Web searches also, though much of what I see is extremely technical, and much of it off-target or sometimes even rubbish.

In all too typical fashion, Windows supplies a crash error message that is completely irrelevant and, worse than useless, misleading. As was so much of what I read on the Internet.

No, one wise old sage had it right:

quote:

I have 40 years of programming in every type of programming language. The error you are getting is caused by an access to a memory location outside the bounds that the operating system is allowing, causing an exception. So any type of error that accesses an invalid memory location can be causing this problem.

In the end, the bug is a weird array index error, where memory is read/written to outside one end or the other of the memory allocated to the array(s).
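For this class of bug, C++ offers bounds-checked element access that stays active in Release builds: std::array::at and std::vector::at throw std::out_of_range for a bad index, whereas operator[] performs no check and simply reads or writes whatever memory is there. The sketch below uses a hypothetical writeChecked helper to show the difference; the array size is arbitrary.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

constexpr std::size_t kCount = 4;  // arbitrary size for the illustration

// Hypothetical helper: returns false instead of corrupting memory
// when `index` falls outside the array.
bool writeChecked(std::array<int, kCount>& values, std::size_t index, int value) {
    try {
        values.at(index) = value;  // checked in every build configuration
        return true;
    } catch (const std::out_of_range&) {
        return false;  // real code would log the bad index and its source
    }
}
```

An index of -1 converted to std::size_t becomes a very large value, so at() catches overruns off either end of the array. Because at() costs a little speed, one option is to switch suspect loops to it temporarily while hunting.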

At this point, I have the bug cornered. But I don't quite know yet how to vanquish it without perhaps causing collateral damage.

quote:

Not the way I had planned to spend my weekend.

<sigh>

And beyond. A torturous last 54 hours it's been.



Until the next time ...

_____________________________

Campaign Series Legion https://cslegion.com/
Campaign Series Lead Coder https://www.matrixgames.com/forums/tt.asp?forumid=1515
Panzer Campaigns, Panzer Battles, Civil War Battles Lead Coder https://wargameds.com
Post #: 1
RE: Coder Diary #32 -- A Look Inside the Sausage Factory - 3/8/2016 12:47:30 AM   
Big Ivan


Posts: 1913
Joined: 6/9/2008
From: Mansfield, Ohio USA
Status: offline
Wow Berto,

I was totally lost in the dark jungle after your first quote!

Good luck my friend!!!

John

_____________________________

Blitz call sign Big Ivan.

(in reply to berto)
Post #: 2
RE: Coder Diary #32 -- A Look Inside the Sausage Factory - 3/8/2016 11:45:27 AM   
Jafele


Posts: 737
Joined: 4/20/2011
From: Seville (Spain)
Status: offline
It's a declaration of war to the king of the bugs.

Hope you win the battle!

_____________________________

Battles against women are the only ones won by running away.

NAPOLEÓN BONAPARTE


When the fool hears the truth, he bursts out laughing, because if he did not, the truth would not be the truth.

LAO TSE

(in reply to Big Ivan)
Post #: 3
RE: Coder Diary #32 -- A Look Inside the Sausage Factory - 3/8/2016 11:56:58 AM   
harry_vdk

 

Posts: 338
Joined: 6/10/2014
From: Drachten
Status: offline
Sounds like my day job.

(in reply to Jafele)
Post #: 4
RE: Coder Diary #32 -- A Look Inside the Sausage Factory - 3/8/2016 11:11:07 PM   
berto


Posts: 20504
Joined: 3/13/2002
From: metro Chicago, Illinois, USA
Status: offline

Another day in the life:

quote:

ORIGINAL: berto

quote:

ORIGINAL: berto

quote:

I know exactly in the code where the Release EXE crashes. It's where the new Map object is constructed. I have looked at the Visual C++ library code for this function.

Not!

This:

quote:

But here's the thing: That might be a red herring. It could very well be that something else entirely separate in the code is overwriting memory somewhere, causing this strange side effect -- the "not enough space for thread data" fault. Sometimes, in coding and debugging, it's like the bug is purposely giving random clues that point you in the wrong direction.



...

In the end, the bug is a weird array index error, where memory is read/written to outside one end or the other of the memory allocated to the array(s).

At this point, I have the bug cornered. But I don't quite know yet how to vanquish it without perhaps causing collateral damage.

Not!

The d@mn bug misdirected me again!

No, I have tracked this devil of a bug down to a specific place in the code -- some code for the map auto-generation facility that we no longer support.

I now know how to toggle off the R6016: "not enough space for thread data" crash, in a more pinpointed fashion, in a dormant section of the code.

That is to say, I think I can safely neutralize the bug without causing collateral damage in active sections of the code that truly matter.

I could now maybe say: issue resolved. Although I want to investigate yanking out this map auto-gen stuff for good -- in effect exiling the bug, driving it away, if not quite vanquishing it.

Almost there!




(in reply to harry_vdk)
Post #: 5
RE: Coder Diary #32 -- A Look Inside the Sausage Factory - 3/9/2016 6:10:28 PM   
berto


Posts: 20504
Joined: 3/13/2002
From: metro Chicago, Illinois, USA
Status: offline

One more day in the life, a happier day:

quote:

ORIGINAL: berto

quote:

ORIGINAL: Jason Petho

How does that impact the Battle Generator?

Those are randomly produced maps, no?

quote:

ORIGINAL: berto

No, separate program, I believe.

Indeed. Let me set your mind(s) at ease ...

First, a deeper discussion of the bug.

[... technical mumbo jumbo ...]

Inexplicably -- I still don't understand why -- if like elsewhere you use NumberTerrains only, edmap crashes on launch with the R6016: "not enough space for thread data" bug. But if you use NumberTerrains-1, no more crash. It doesn't make sense, but there it is.

So why not just leave it that way, with NumberTerrains-1 in generate.h & generate.cpp? Fine for every other terrain, but what about the last terrain, RedDirtHex? The terrain[] & hexside[] arrays are not initialized for RedDirtHex, indeed don't even include RedDirtHex. Will this have bad consequences, be the "collateral damage" I spoke about? Who knows? On the surface, it's bad. It could very well cause an edmap code malfunction later.
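The concern about RedDirtHex can be made concrete. In the hypothetical sketch below, only NumberTerrains and RedDirtHex come from the post; the other terrain names and the terrainWeight table are placeholders, and it assumes the terrain enum ends with a NumberTerrains count sentinel (a common C++ idiom, not confirmed by the post). A table sized NumberTerrains-1 then has no slot for the last terrain, so any unchecked lookup for RedDirtHex would read one element past the end.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical enum: real terrains first, count sentinel last.
enum Terrain { ClearHex, WoodsHex, RoughHex, RedDirtHex, NumberTerrains };

// With the NumberTerrains-1 Band-Aid, the table stops short of RedDirtHex.
const std::array<int, NumberTerrains - 1> terrainWeight = {1, 2, 3};  // placeholder values

// Guarded lookup: reports the missing slot instead of reading past the end.
std::optional<int> weightFor(Terrain t) {
    std::size_t i = static_cast<std::size_t>(t);
    if (i >= terrainWeight.size()) {
        return std::nullopt;  // RedDirtHex: terrainWeight[i] would be out of bounds
    }
    return terrainWeight[i];
}
```

As long as nothing outside generate.h & generate.cpp ever performs such a lookup, which is the point argued next, the missing slot stays harmless.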

Except that the stuff in generate.h & generate.cpp is not accessed by the rest of the CS code:

[... more technical mumbo jumbo ...]

Above, I have conclusively demonstrated that the buggy generate.h & generate.cpp are not in actual use anywhere. The bug there remains, and is still unexplained, but it effectively doesn't matter.

If it doesn't matter, if that code is inert, why not just -- as I proposed -- "yank it out"?

...

Remember the old intro screen to edmap, where you could auto-generate a random map, after first setting a bunch of sliders? Because that feature was too difficult to maintain (in our dynamic development, multi-game environment), we had decided to "yank it out" of JTCS 2.00. Although I didn't actually remove it from the code, I did the next best thing: I commented it out and removed all active references to it.

Should I remove this useless -- and in the case of generate.h & generate.cpp, buggy -- code? Yes, eventually; eventually I should Do It Right. But not now, not on the eve of the ME 1.02 release. No, better to leave well enough alone for now. For now, I will leave in place the NumberTerrains-1 Band-Aid. No need to worry about collateral damage from that, since the active edmap code will never access it. And for now, I will leave in the codebase the now useless generate.h & generate.cpp. We are looking for codebase stability at this point. Now is not the time for major surgery.

A lot of technical gobbledygook here. But the bottom line is: no more edmap launch crash bug, issue resolved.

...

I am going to take a break from coding for the rest of this afternoon (will instead do my next turn in the PBEM I am playing with Petri), then return to coding tomorrow. By tomorrow I hope to be back on track, picking up where I got derailed last Saturday by this infernal NumberTerrains/AutoGen edmap crash-on-launch bug.

What a nightmare this has been! I wake up to a new, brighter day!

And so ends the current episode of a new Coder Diary series, "A Look Inside the Sausage Factory." (There might be future episodes.) Not for the faint-of-heart or squeamish! Viewer discretion is advised.


(in reply to berto)
Post #: 6
RE: Coder Diary #32 -- A Look Inside the Sausage Factory - 3/9/2016 6:57:39 PM   
Jafele


Posts: 737
Joined: 4/20/2011
From: Seville (Spain)
Status: offline
Cheer up!


(in reply to berto)
Post #: 7
RE: Coder Diary #32 -- A Look Inside the Sausage Factory - 3/9/2016 7:58:07 PM   
carll11


Posts: 626
Joined: 11/26/2009
Status: offline
What's all this now?

I took a lance in the thigh on the Indus, marched all the way back to Greece on it...

I survived on horse-blood soup seasoned with gunpowder as we trudged back from Moscow.......

I rode a tank, held a Sturmbannführer's rank, when the blitzkrieg raged and the bodies stank..

Bugs? .....BUGS? As a spike team member deep north on the Ho Chi Minh Trail, we ate bugs to stay alive...

Bugs...for God's sake....Buck up, man!!!!!!

(in reply to berto)
Post #: 8