Disclaimer

The RE-Invent blog posts are designed to break down complex information into simpler terms, specifically for beginners learning the basics of Reverse Engineering. These posts are strictly for educational and experimental use and are not intended to target any software. Please use this knowledge responsibly and in accordance with ethical and legal standards.

Introduction

This article assumes basic knowledge of the sections in an application binary (.text, .data), of offsets, and of assembly.

In the realm of game modification or vulnerability research, you may often encounter functions named pattern_scan or byte patterns like 48 8B 05 ? ? ? ? 48 8B 0C C8 48 8D 04 D1 48 85 C0 in the source code. These mysterious codes seem to return an offset or an address from a called function. More often than not, these addresses point to functions within an application. But why is this the case? Why don’t these applications use the offset directly? And why do these byte patterns contain these ? symbols? Let’s delve into these questions.

The Idea

To better understand this, let’s consider an example: Games need to keep track of various elements during gameplay, such as the number of players in an online game. The game maintains a structure or array that contains all the players in your match, along with basic information about their position, health, ammo, etc. If you want to determine the position of each player, you need to locate the player array stored by the engine. So, our first task is to figure out how the engine accesses these players.

Let’s simplify this with an example: The engine uses specific structures for the players:

// Basic player struct
struct Player
{
    int x;
    int y;
    float health;
    int ammo;
};

// Basic Player array
struct PlayerArray {
    Player* player1;
    Player* player2;
    Player* player3;
};

The engine also has a global pointer to the players and two functions that use it:


// A global pointer to the players
PlayerArray* players;

// A function that allocates bytes for the players
void createPlayerStruct() {
    //allocate 0x3000 bytes for the players
    players = (PlayerArray*)malloc(0x3000);
    // Do something...
}

// A function that gets the Players
void getPlayerOne() {
    Player* player1 = players->player1;
    // Do something...
}

As illustrated, the engine initializes the players using the createPlayerStruct function, and the getPlayerOne function retrieves the first player. Both of these functions access the global variable PlayerArray* players. This global variable is located in the .data section of the engine. Therefore, our objective is to identify where this pointer is within the .data section and then read the pointer to access the players.

Now, let’s examine an example assembly output of the createPlayerStruct function:

...
.text:00000000000121BB B9 00 30 00 00           mov     ecx, 3000h      ; Size
.text:00000000000121C0 FF 15 6A F2 00 00        call    cs:__imp_malloc
.text:00000000000121C6 48 89 05 A3 AF 00 00     mov     cs:?players@PlayerArray@, rax ; PlayerArray * players
.text:00000000000121CD 48 8D A5 C8 00 00 00     lea     rsp, [rbp+0C8h]
...

The malloc function is called and the result is stored in the variable cs:?players@PlayerArray@, which is our global players variable. If we trace this variable, it leads us to the .data section of the binary:

...
.data:000000000001D170      ; PlayerArray *players
.data:000000000001D170      ?players@PlayerArray@ dq 0     ; DATA XREF: createPlayerStruct(void)+26w
.data:000000000001D170                                     ; getPlayerOne(void)+1Br
...

There we have our offset: 0x1D170. This offset allows us to read and modify the player array. However, this offset changes with each binary update, which can be problematic if the game updates frequently.

This is where binary signatures come in handy. We can scan the entire binary for specific byte patterns until we find our createPlayerStruct function, which moves the malloc result into the global variable. The byte pattern for the mov instruction would be 48 89 05 A3 AF 00 00.

But this exact pattern won’t work in newer versions: its last four bytes encode the location of the variable, which changes with each binary update. To address this, we replace those bytes with wildcards (?) in our pattern: 48 89 05 ? ? ? ?. This pattern will work for new updates too!
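To make the wildcard idea concrete, here is a small sketch of a pattern matcher over a byte buffer. The names PatternByte, parsePattern and findMatches are made up for this illustration and are not taken from any particular tool; the parser assumes well-formed input (hex bytes or ? separated by spaces).

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One pattern element: a concrete byte, or a wildcard that matches anything.
struct PatternByte {
    std::uint8_t value;
    bool wildcard;
};

// Turn a string such as "48 89 05 ? ? ? ?" into pattern elements.
std::vector<PatternByte> parsePattern(const std::string& text) {
    std::vector<PatternByte> pattern;
    std::istringstream stream(text);
    std::string token;
    while (stream >> token) {
        if (token == "?" || token == "??") {
            pattern.push_back({0, true});
        } else {
            // std::stoul throws on tokens that are not valid hex
            pattern.push_back({static_cast<std::uint8_t>(std::stoul(token, nullptr, 16)), false});
        }
    }
    return pattern;
}

// Return every index in data at which the whole pattern matches.
std::vector<std::size_t> findMatches(const std::vector<std::uint8_t>& data,
                                     const std::vector<PatternByte>& pattern) {
    std::vector<std::size_t> matches;
    if (pattern.empty()) return matches;
    for (std::size_t i = 0; i + pattern.size() <= data.size(); ++i) {
        bool ok = true;
        for (std::size_t j = 0; j < pattern.size(); ++j) {
            if (!pattern[j].wildcard && pattern[j].value != data[i + j]) {
                ok = false;
                break;
            }
        }
        if (ok) matches.push_back(i);
    }
    return matches;
}
```

Returning all matches rather than only the first makes it easy to check how many places a pattern hits, which is useful when judging whether a pattern is specific enough.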

The only issue is that this pattern is short and consists mostly of wildcards, so it may match many other instructions or random bytes. To solve this, we create a longer pattern by adding bytes from the instructions before or after our target instruction. In practice, it’s better to add the bytes from the following instructions: 48 89 05 ? ? ? ? 48 8D A5 C8 00 00 00. We don’t need more wildcards here, as it’s rare for the lea instruction to change. We keep expanding the pattern until it yields a single match in the binary. The current pattern is sufficient for my application and has kept working after multiple updates.

Obtaining the offset from the match is the same straightforward calculation the CPU performs. Let’s delve deeper into it: The first three bytes of the mov instruction (48 89 05) specify the type of mov operation, in this case a 64-bit mov from the rax register to a memory location addressed relative to the instruction pointer (RIP). The remaining four bytes (A3 AF 00 00) are a 32-bit offset stored in little-endian byte order, so they read as 0x0000AFA3. This offset is added to the address of the next instruction (in our example, 0x121CD). So, it’s essentially 0x121CD + 0xAFA3 = 0x1D170.
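The arithmetic can be written down as two tiny helpers. This is only a sketch: readDisplacement and resolveRipRelative are illustrative names, and the offset is treated as signed because the target of such an instruction can also lie before it.

```cpp
#include <cstdint>
#include <cstring>

// Decode a signed 32-bit little-endian value from four raw bytes.
std::int32_t readDisplacement(const std::uint8_t* bytes) {
    std::uint32_t raw = static_cast<std::uint32_t>(bytes[0])
                      | static_cast<std::uint32_t>(bytes[1]) << 8
                      | static_cast<std::uint32_t>(bytes[2]) << 16
                      | static_cast<std::uint32_t>(bytes[3]) << 24;
    std::int32_t value;
    std::memcpy(&value, &raw, sizeof value); // reinterpret the bits as signed
    return value;
}

// instructionAddress: where the instruction starts (0x121C6 in the example)
// instructionLength:  total size of the instruction (7 for 48 89 05 xx xx xx xx)
// The target is: address of the next instruction + displacement.
std::uint64_t resolveRipRelative(std::uint64_t instructionAddress,
                                 std::uint64_t instructionLength,
                                 std::int32_t displacement) {
    return instructionAddress + instructionLength + static_cast<std::int64_t>(displacement);
}
```

With the bytes from the listing, readDisplacement on A3 AF 00 00 gives 0xAFA3, and resolveRipRelative(0x121C6, 7, 0xAFA3) gives 0x1D170, the address of players in the .data section.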

If we were to write a function for this, it would take a binary pattern as input, find the address where it matches, read bytes 4 to 7 of the match, and add that value to the address of the next instruction. In pseudocode, it would look like this:

uint64_t getOffset() {
    // Find the address of the target instruction. The extra trailing bytes make sure we get only one match.
    uint64_t address = getAddressByPattern("48 89 05 ? ? ? ? 48 8D A5 C8 00 00 00");
    // The address variable now contains the memory address where the pattern has been found.
    // Normally, there's nothing more we have to do if this is our target destination.
    // However, in our case we also want to get the offset of the global variable players.

    // Read the signed 32-bit offset from the 4th - 7th byte
    int32_t offset = *(int32_t*)(address + 3); // 48 89 05 ? ? ? ? <-- read out the wildcards

    // Calculate the real offset
    // + 7 because the mov instruction is 7 bytes long
    // => the next instruction is at address + 7
    return address + 7 + offset;
}
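The pseudocode leaves getAddressByPattern undefined. As one possible way to fill that gap, the sketch below runs the whole process over a byte buffer that stands in for the binary, so the result is an offset into that buffer rather than a live memory address. The names kPlayersPattern and findPlayersOffset are hypothetical, -1 is used here as the wildcard marker, and the code assumes a little-endian host such as x86-64.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// 48 89 05 ? ? ? ? 48 8D A5 C8 00 00 00, with -1 standing for a wildcard.
static const int kPlayersPattern[] = {
    0x48, 0x89, 0x05, -1, -1, -1, -1,
    0x48, 0x8D, 0xA5, 0xC8, 0x00, 0x00, 0x00
};

// Returns the offset of the players global inside image,
// or 0 if the pattern does not match exactly once.
std::uint64_t findPlayersOffset(const std::vector<std::uint8_t>& image) {
    const std::size_t length = sizeof(kPlayersPattern) / sizeof(kPlayersPattern[0]);
    std::size_t found = 0;
    std::size_t hits = 0;
    for (std::size_t i = 0; i + length <= image.size(); ++i) {
        bool ok = true;
        for (std::size_t j = 0; j < length; ++j) {
            if (kPlayersPattern[j] != -1 && image[i + j] != kPlayersPattern[j]) {
                ok = false;
                break;
            }
        }
        if (ok) {
            found = i;
            ++hits;
        }
    }
    // No match: the signature broke. Several matches: the pattern is too short.
    if (hits != 1) return 0;

    // Read the 32-bit offset that sits behind the three opcode bytes.
    std::int32_t displacement;
    std::memcpy(&displacement, image.data() + found + 3, sizeof displacement);

    // The mov is 7 bytes long, so the next instruction starts at found + 7.
    return static_cast<std::uint64_t>(found) + 7 + static_cast<std::int64_t>(displacement);
}
```

Refusing to return a result unless there is exactly one match is a deliberate choice in this sketch: a missing or ambiguous match is easier to deal with than silently reading from the wrong location.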

Conclusion

With this function, you can effortlessly obtain the offset with each game update. However, bear in mind that if the function you’ve created the signature for changes, the signature may break, necessitating its recreation.

This approach is significantly more efficient than manually searching for the offset and has the added advantage of supporting multiple versions simultaneously. This is why signatures are widely used in game modification applications - they offer a simple and reliable method for finding offsets.

Pattern scans are frequently used in real-world scenarios for hooking specific functions in games, not only for finding offsets. For instance, game capture programs like Medal or OBS need to hook into a game to capture its video and audio output, and pattern scans are one way such programs could locate the target functions they need.

In conclusion, while signature scanning requires some initial setup and occasional updates, its benefits in terms of efficiency, multi-version support, and reliability make it an invaluable tool in the realm of game modification, vulnerability research and much more. It’s a testament to the power of automation and smart design in making complex tasks more manageable.