Page 2 of 2  [ 12 posts ]  Go to page Previous  1, 2
Posted: Thu Oct 19, '17, 7:09 pm
Bragatyr wrote: Yeah, Hukos, I don't know the first thing about programming or computer science, so I don't understand the exact mechanics behind a lot of this stuff, but since you're trained in it, I was curious whether you could shed any light on why only the second game seems to be vulnerable to this kind of manipulation, at least from the beginning. I would think that the Master System would have even less sophisticated programming, though I imagine the Genesis may have been tricky as a newer platform, but I find it strange that the first game and fourth game seem to exhibit very little of the underflow stuff, and that it doesn't seem to be nearly as big a deal in the third game.


All integers overflow and underflow; that's just how integers work in computing. The range of an integer depends on how many bits are used to store it.

Let's say you have an 8 bit unsigned integer. This has a range of 0 - 255. Once you reach the top of the range and increment by one, it simply goes to 0. This is integer overflow. The inverse is also true. Subtracting 1 from 0 will underflow to 255. That's not a glitch, that's literally how integers work as computers understand them.
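To make that concrete, here's a minimal C++ sketch of the wraparound. The function names are just mine for illustration; std::uint8_t is the standard 8 bit unsigned type.

```cpp
#include <cstdint>

// Add 1 to an 8 bit unsigned value. The cast keeps only the low 8 bits,
// so 255 + 1 comes out as 0.
constexpr std::uint8_t incrementByte(std::uint8_t value) {
    return static_cast<std::uint8_t>(value + 1);
}

// Subtract 1 from an 8 bit unsigned value. 0 - 1 comes out as 255.
constexpr std::uint8_t decrementByte(std::uint8_t value) {
    return static_cast<std::uint8_t>(value - 1);
}
```

incrementByte(255) gives 0 and decrementByte(0) gives 255, while anything in the middle of the range behaves like ordinary arithmetic.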

For a signed integer, the process is basically the same. (Signed integers are how computers represent negative numbers, using an encoding known as two's complement; without it the computer has no concept of numbers lower than 0.) Instead of 0 - 255, the range is now -128 to 127. So adding 1 to 127 gives -128.
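Here's a rough C++ sketch of the signed case. One caveat: C++ itself treats overflow of a signed int as undefined, so this sketch does the arithmetic on the raw 8 bits and then reinterprets them, which is closer to what an 8 bit register actually does. The function names are made up for the example.

```cpp
#include <cstdint>

// Read 8 raw bits as a two's complement number:
// bit patterns 0-127 mean 0 to 127, bit patterns 128-255 mean -128 to -1.
constexpr int asSignedByte(std::uint8_t bits) {
    return bits < 128 ? static_cast<int>(bits) : static_cast<int>(bits) - 256;
}

// Add 1 the way an 8 bit register would: add, keep only the low 8 bits,
// then read the result back as a signed value.
constexpr int incrementSignedByte(int value) {
    return asSignedByte(static_cast<std::uint8_t>(value + 1));
}
```

incrementSignedByte(127) comes out as -128, and asSignedByte(0xFF) is -1, which is why an all-ones byte can mean either 255 or -1 depending on how the program chooses to read it.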

I'm sure you've noticed by now that the number of values in the range I've described for an 8 bit integer (256) is 2 raised to the 8th power. Which makes sense, given how binary works (each more significant bit is worth the next power of 2: 1, 2, 4, 8 and so on).

So a 16 bit unsigned integer is 0 - 65,535, which is 2 raised to the 16th power (65,536) values, with the maximum being one less than that. And so on and so forth for 32 bit integers, 64 bit integers, etc.
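If you want to check those ranges yourself, this is a small C++ sketch that computes them from the bit count and compares them against what the standard library reports. The helper names are mine; std::numeric_limits is the standard way to ask for a type's limits.

```cpp
#include <cstdint>
#include <limits>

// Largest value of an unsigned integer with the given width (0 to 63 bits): 2^bits - 1.
constexpr std::uint64_t maxUnsigned(unsigned bits) {
    return (std::uint64_t{1} << bits) - 1;
}

// Largest and smallest values of a two's complement integer with the given width (1 to 64 bits).
constexpr std::int64_t maxSigned(unsigned bits) {
    return static_cast<std::int64_t>(maxUnsigned(bits - 1));
}
constexpr std::int64_t minSigned(unsigned bits) {
    return -maxSigned(bits) - 1;
}

// The computed limits agree with the built-in fixed-width types.
static_assert(maxUnsigned(8) == std::numeric_limits<std::uint8_t>::max(), "0 - 255");
static_assert(maxUnsigned(16) == std::numeric_limits<std::uint16_t>::max(), "0 - 65,535");
static_assert(maxSigned(8) == std::numeric_limits<std::int8_t>::max(), "up to 127");
static_assert(minSigned(8) == std::numeric_limits<std::int8_t>::min(), "down to -128");
```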

The thing here is that underflow/overflow can be avoided by specifically coding for it. Here's a bit of sample code in C++:

int damage = 1500; // stand-in for whatever the damage formula produced
if (damage > 999)
    damage = 999;

Simple, right? Basically what that does is define the integer damage, and if damage exceeds 999, it sets the value back to 999, so it can never go past the cap. Well, the thing is, C++ is a high level language (comparatively at least, don't yell at me, Haskell nerds) and console games of that era didn't really have the convenience of high level languages. The short version is that the hardware those games ran on was not powerful enough to run code written in high level languages without throwing away lots of precious memory and cycles, and you can't afford to do that without killing performance. So everything had to be done as close "to the metal" as possible. You remember the flight simulator games EA threw out there on the Genesis that ran at like 5 frames per second? That's what happens when you use a high level language to program a Genesis game.

To explain that bit better: higher level languages require the programmer to do less to achieve the same result, as the compiler handles a lot of the complicated stuff behind the scenes. But that kind of convenience has its own cost. Because programmers in those days couldn't afford that cost, since the hardware just wasn't there yet, they had to do things in assembly language, which is a lot more complicated to use.

As mentioned before, assembly language is like writing an English paper where you need to define the syntax of the entire English language yourself, and define every letter and word yourself, before you even get to use the logic underlying them. The reason is, well, when you're interacting directly with hardware, the computer is an idiot. It doesn't know how to do anything; you have to tell it to do EVERYTHING you want it to. So if you don't understand computer architecture, you're in trouble. Why are computers able to do that stuff by themselves nowadays, then? Years and years and YEARS of work have gone into compilers that do that stuff for us in 2017, which is kind of a really primitive form of AI in a sense (that's not strictly true, but the analogy is close enough to get the point home).

The real issue with Phantasy Star II is probably that underflow/overflow never came up under normal gameplay conditions, so they thought it wasn't really necessary to write a function that handled it.
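For what it's worth, a function that "handles it" doesn't need to be big. Here's a minimal C++ sketch of the idea, not anything from the actual game: the 999 cap is borrowed from my sample above, and the names and the choice of 16 bit values are assumptions for illustration.

```cpp
#include <cstdint>

// Hypothetical cap, borrowed from the 999 in the sample above.
constexpr std::uint16_t kMaxDamage = 999;

// Clamp a freshly computed damage value so it never exceeds the cap.
constexpr std::uint16_t clampDamage(std::uint32_t rawDamage) {
    if (rawDamage > kMaxDamage) {
        return kMaxDamage;
    }
    return static_cast<std::uint16_t>(rawDamage);
}

// Subtract an amount from an unsigned value, stopping at 0 instead of
// wrapping around to a huge number.
constexpr std::uint16_t saturatingSubtract(std::uint16_t value, std::uint16_t amount) {
    if (amount >= value) {
        return 0;
    }
    return static_cast<std::uint16_t>(value - amount);
}
```

Without the check, subtracting 10 from 5 in a 16 bit unsigned value would give 65,531; with it, saturatingSubtract(5, 10) gives 0. The price is a compare and a branch every time the function runs, which is exactly the kind of thing you skip if you believe the case can't happen.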

Using the Phantasy Star II disassembly as a basis, here is the damage formula for Phantasy Star II.

Note: the following is likely really hard to follow, and I'm showing it specifically to showcase that assembly language is kind of hard to read. I don't really expect you to be able to follow it.

bsr.w UpdateRNGSeed ; bsr = branch to subroutine. Jumps to a label/function
andi.l #$1F, d0 ; immediate AND operation, bitwise operation
addi.w #$54, d0 ; immediate add
add.w $1A(a3), d0 ; adds one value to another
lsl.l #8, d0 ; lsl = logical shift left - bit shifts that many places to the left.
move.w $1E(a1), d1 ; move moves one value to another location
mulu.w #$5, d1 ; mulu = multiply unsigned
addi.w #$64, d1
divu.w d1, d0 ;divu = divide unsigned
move.w d3, d1
andi.w #$7F, d1
lsl.w #4, d1
lea (InventoryData+$E).l, a4 ;lea = load effective address. Works out an address and puts it in an address register without reading the memory there
adda.w d1, a4 ;add address - adds to an address register instead of a memory location or data register
moveq #0, d1 ;moveq = move quick. Loads a small immediate value into a whole data register, and is smaller and faster than the regular move opcode
move.b (a4), d1
addq.w #2, d1 ;addq = add quick. Adds a small immediate value (1 to 8). Like moveq, it's faster than the basic operation.
mulu.w d1, d0
lsr.w #8, d0 ;logical shift right
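If it helps, here is my rough reading of that listing in C++. Treat it as a sketch under assumptions, not the game's documented formula: I'm assuming UpdateRNGSeed leaves its result in d0 and that the damage is the low word of d0 at the end, and the parameter names are guesses, since the listing only shows offsets like $1A(a3) and $1E(a1).

```cpp
#include <cstdint>

// Step-by-step reading of the register arithmetic in the listing.
constexpr std::uint16_t damageSketch(std::uint32_t rngValue,       // assumed result of UpdateRNGSeed
                                     std::uint16_t attackerWord,   // word at $1A(a3), meaning assumed
                                     std::uint16_t defenderWord,   // word at $1E(a1), meaning assumed
                                     std::uint8_t itemByte) {      // byte read through a4
    // andi.l / addi.w / add.w: random 0-31, plus $54, plus the attacker's word (16 bit wrap).
    std::uint16_t sum = static_cast<std::uint16_t>((rngValue & 0x1F) + 0x54 + attackerWord);
    // lsl.l #8: multiply by 256 using the full 32 bit register.
    std::uint32_t d0 = static_cast<std::uint32_t>(sum) << 8;
    // move.w / mulu.w / addi.w: the divisor is defender * 5 + $64; divu.w reads only its low word.
    std::uint16_t divisor = static_cast<std::uint16_t>(defenderWord * 5u + 0x64u);
    if (divisor == 0) {
        return 0; // the real CPU would raise a divide-by-zero exception here
    }
    // divu.w: the quotient lands in the low word of d0. If it does not fit in
    // 16 bits, the 68000 flags an overflow and leaves d0 unchanged.
    std::uint32_t quotient = d0 / divisor;
    std::uint16_t low = quotient > 0xFFFFu ? static_cast<std::uint16_t>(d0)
                                            : static_cast<std::uint16_t>(quotient);
    // moveq / move.b / addq.w / mulu.w: multiply by (item byte + 2), 16 x 16 -> 32 bits.
    std::uint32_t product = static_cast<std::uint32_t>(low) * (itemByte + 2u);
    // lsr.w #8: shifts only the low word, so the upper 16 bits of the product are dropped.
    return static_cast<std::uint16_t>((product & 0xFFFFu) >> 8);
}
```

With made-up inputs, damageSketch(0x1F, 100, 20, 10) works out to 12. The interesting part is the last line: push the item byte to 255 with the same other inputs and the product no longer fits in 16 bits, so the result drops to 20 instead of the 276 you would get with no truncation. That is only an illustration of where a word-sized operation can wrap in this sketch; I'm not claiming those inputs are reachable in the game.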

The Genesis's CPU (a Motorola 68000) has 8 data registers (d0-d7), which are used for storing generic data and performing arithmetic operations. It has 8 separate address registers (a0-a7), which are used for storing memory addresses to be operated on later. However, the final address register, a7, is also the stack pointer (a pointer that holds the memory location of the top of the stack), so the programmer should not use it as a general-purpose register.

The $ symbol represents a hexadecimal value. Hex digits go from 0-F, where A-F represent 10-15. Each digit position is worth 16 times the one to its right, so if you see $12, that means the number is 18 (1 x 16 + 2).
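C++ writes hex with a 0x prefix instead of $, so you can check the conversions with a small sketch like this. The helper functions are made up for the example; the constants are the ones that appear in the listing.

```cpp
// Value of a single hex digit character, or -1 if it isn't one.
constexpr int hexDigitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Value of a two-digit hex number: the left digit counts sixteens, the right digit counts ones.
constexpr int hexPairValue(char high, char low) {
    return hexDigitValue(high) * 16 + hexDigitValue(low);
}

// The hex constants used in the listing, checked against their decimal values.
static_assert(hexPairValue('1', '2') == 18, "$12 is 18");
static_assert(0x1F == 31, "$1F is 31");
static_assert(0x54 == 84, "$54 is 84");
static_assert(0x64 == 100, "$64 is 100");
static_assert(0x3E7 == 999, "$03E7 is 999");
```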

The # symbol represents an immediate value, which means that it's a "normal" number and not a memory address. Without the # symbol, the assembler assumes you're talking about a memory address. Don't get the two mixed up; bad things can happen.

.b, .w, .l represent byte, word, longword respectively. A word is twice the length of a byte, and a longword is twice the length of a word (8, 16 and 32 bits on the 68000).
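The size suffix matters because a byte or word operation on a 68000 data register only touches the low 8 or 16 bits and leaves the rest of the 32 bit register alone. A minimal C++ model of that, with function names I made up:

```cpp
#include <cstdint>

// move.b into a data register: only the low 8 bits change.
constexpr std::uint32_t moveByte(std::uint32_t reg, std::uint8_t value) {
    return (reg & 0xFFFFFF00u) | value;
}

// move.w into a data register: only the low 16 bits change.
constexpr std::uint32_t moveWord(std::uint32_t reg, std::uint16_t value) {
    return (reg & 0xFFFF0000u) | value;
}

// move.l into a data register: all 32 bits are replaced.
constexpr std::uint32_t moveLong(std::uint32_t /*reg*/, std::uint32_t value) {
    return value;
}
```

So moveByte(0x12345678, 0xFF) gives 0x123456FF, with the old upper bits still sitting there. That is why the listing clears d1 with moveq #0 right before the move.b (a4), d1: otherwise whatever was left in the upper bits of d1 would leak into the following word-sized math.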

The ; symbol means everything after the ; is a comment. Comments are ignored during assembling and only used for a programmer's benefit.

Now say you added this at the end of that routine:

cmpi.w #$03E7, d0 ;compares d0 against 999 (works out d0 - 999 and sets the condition flags)
bls.w doNotOverFlow ;bls = branch if lower or same (unsigned). If d0 is 999 or less, skip the clamp. A plain branch like this pushes nothing onto the stack; only bsr/jsr do that.
move.w #$03E7, d0 ;otherwise the damage exceeds 999, so make it 999.
doNotOverFlow:
rts ;pops the return address that the caller's bsr/jsr pushed onto the stack and jumps back there.

Now keep in mind, the example I posted is really crappy and horribly optimized as I'm not very good at assembly myself. But the point is, in PSII the situation likely just never came up where they needed to have such a check so they never thought that it was imperative to make one. Also it should be noted that modern video game programming in 2017 looks nothing like this.

Also, PSIII, even though it doesn't have this particular problem, is really nonsensical under the hood for a variety of other reasons. I still haven't been able to parse the game's damage formula, among other crazy things.

Last edited by Hukos on Thu Oct 19, '17, 7:09 pm, edited 2 times in total.

Posted: Fri Oct 20, '17, 11:45 am
HUKOS: thank you for this long and detailed explanation about integers! This is very interesting!

Honestly, I didn't understand all of it! Especially because it's in English, so it can sometimes be hard to follow, BUT even in French I don't think I would have understood it all! :rofl:
