Decompiling: Difference between revisions
Shibboleet (talk | contribs) (class mapping) |
Shibboleet (talk | contribs) |
Revision as of 12:19, 6 April 2024
This page is in progress and may contain incomplete information or editor's notes.
Introduction
Decompiling is the process of taking assembly code and turning it back into a higher level language such as C or C++. It is essentially the reverse of compiling. Matching decompilation goes one step further: the decompiled source, once compiled, must match the original assembly 1:1. While matching decompilation is harder than normal decompiling, it becomes easier once you understand the patterns of the compiler used. This page aims to show new people how this process works, and hopefully get them into decompilation! While you do not need to be an expert at C or C++ to decompile, it is recommended that you have some experience before attempting decompilation. It is also strongly recommended that you have some prior knowledge of PowerPC assembly, as that is the key to understanding how a function works. This document is a good way to learn or refresh knowledge of the PowerPC architecture. This document is also good for learning some of the patterns that CodeWarrior produces.
Getting Set Up
To begin decompiling Super Mario Galaxy, you first need to set up the environment. You will need the following tools:
- Git (Windows)
- Any IDE (Visual Studio Recommended)
- Python 3.9.7
- IDA Pro (recommended) or Ghidra (Not recommended)
- SMG1 Korean IDB (For IDA)
- A Super Mario Galaxy Korean region DOL.
After you have acquired all of these, setting up Petari is very simple.
- Open a new command prompt and run git clone https://github.com/shibbo/Petari to clone the repository into a directory called "Petari".
- Place the SMG1 Korean main.dol into this new "Petari" folder, and rename it to baserom.dol.
- Open a new command prompt in the "Petari" folder.
- Run the command python setup.py. This will verify your DOL and install the required libraries and the compilers used to build the code.
- Run the command python build.py. This will build the entire project. If you see any warnings, do not worry about them.
Environment
To use Petari properly, it is necessary to understand the structure of the environment. Each top-level folder has a specific purpose, described in the table below.
Folder Name | Description |
---|---|
archive | The folder that gets created when build.py -link is run. Contains an archive of the object files in each library. |
build | The folder that gets created when build.py is run. Contains the compiled object files. |
csv | Contains CSV files that store the status of functions being matched. |
data | Contains map files and the percentage badges for the GitHub repo. |
docs | The folder that gets created when progress.py is run. Contains all of the Markdown documentation for matching status. |
include | Contains all of the header files for Super Mario Galaxy specific code. |
libs | Contains this same folder structure for each of the other libraries used by the game. |
scripts | Various scripts used in IDA for generating headers. |
source | Contains all of the source files for Super Mario Galaxy specific code. |
Libraries
Super Mario Galaxy uses a lot of libraries for certain functionality such as heaps, layouts, OS specific code, and more. Each library described in the tables below is statically linked to the game, so the code the game uses from every library is inside of the main.dol.
Non-SMG Libraries
Name | Language | Description |
---|---|---|
JSystem | C++ | Contains classes for backend things, such as heaps and linked lists. |
MetroTRK | C | Target Resident Kernel, for debugging. |
MSL_C | C & C++ | Contains standard library functions and types. |
nw4r | C++ | Contains classes for sounds, layouts, and more. (SMG only uses the layouts and some math functions) |
Runtime | C & C++ | Contains functions that relate to CodeWarrior's runtime code generation (ctor / dtor lists, etc) |
RVL_SDK | C | Contains functions that relate to the Wii's "OS". |
RVLFaceLib | C | Contains functions that relate to Miis. |
SMG Libraries
All of Super Mario Galaxy's libraries are written in C++.
Name | Description |
---|---|
Animation | Library for animation playing. |
AreaObj | Library for invisible areas that can be accessed by players in the game. |
AudioLib | N/A |
Boss | Library for all of the bosses and mini-bosses in the game. |
Camera | Library for all camera types. |
Demo | Library for all cutscenes. |
Effect | Library for all effect rendering. |
Enemy | Library for all enemies. |
GameAudio | N/A |
Gravity | Library for all of the gravity types in the game. |
LiveActor | Library for LiveActor, which is an actor that can switch states. |
Map | Library for map classes that do not directly interact with the player. (e.g. switches) |
MapObj | Library for all of the map objects in the game. |
NameObj | Library for the most basic form of an object in the game. |
NPC | Library for all of the non-playable characters. |
NWC24 | Library for the mail system in the game. |
Player | Library for all of the player related functions. |
RhythmLib | N/A |
Ride | Library for all of the actors that can be controlled by the player. |
Scene | Library for all of the game scene related code. |
Screen | Library for all of the layouts in the game. |
Speaker | Library for the sound effect playing done on the Wiimote. |
System | Library for a lot of the game's backend systems. |
Util | Library for utility functions and classes. |
Basics
To properly decompile, it is vital to know how a lot of the assembly will translate into C / C++ code. Here are a couple of patterns that you will see when decompiling code.
Class Mapping
Base Class
The first step to decompiling a class is to map out the class itself. You need to be sure that you document every member, its type (as close as you can guess), its virtual functions, and more. The easiest way to achieve this is to look at the class's constructor. Below is the constructor for NameObj as an example.
There are a couple of takeaways from this screenshot:
- The constructor takes an argument, a const char * (contained in r4), which is stored in (r3 + 0x4).
- (r3 + 0x0) is where the vtable is usually stored when a class has virtual functions. There are rare exceptions.
- (r3 + 0x8) is stored with a sth, which means that it is a short datatype.
- (r3 + 0xA) is also stored with a sth, but with a -1 value, which suggests that this type is signed.
Keep in mind that a constructor does not have to initialize every single member variable in the class! So there could be other members in a class that aren't mentioned in the constructor at all. After you look at the constructor, look around at the member functions to see if they use any members that are not initialized in the constructor. You can verify that your class setup is correct once you find where the class is created using the new operator. Check if the size passed to the new call matches the size of the class that you have mapped. If your mapped class is smaller, you are missing members. If it is bigger, you have too many! Remember that the vtable is implicitly stored at (this + 0x0), so you do not have to explicitly define it. With all of these members documented, our class setup looks a little like this so far:
    class NameObj {
    public:
        NameObj(const char *pName);

        /* remember that the vtable will be placed here once we define our virtuals! */
        /* 0x4 */ const char* mName;
        /* 0x8 */ volatile u16 mFlags;
        /* 0xA */ s16 mExecutorIdx;
    };
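One way to sanity-check a mapped layout is to mirror it in a plain struct and let the compiler verify the offsets. The sketch below is hypothetical and not part of Petari: NameObjLayout is a made-up name, the target's 32-bit pointers are modelled as uint32_t so the checks also hold on a 64-bit host, and 0xC is only the size implied by the members mapped so far, not a confirmed allocation size.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the layout mapped so far. Pointers are modelled as
// uint32_t because the target is 32-bit; this is NOT the real class.
struct NameObjLayout {
    uint32_t vtable;        // 0x0: vtable pointer (implicit in the real class)
    uint32_t mName;         // 0x4: const char* on the target
    uint16_t mFlags;        // 0x8: stored with sth
    int16_t  mExecutorIdx;  // 0xA: stored with sth, initialized to -1
};

static_assert(offsetof(NameObjLayout, mName) == 0x4, "mName should sit at 0x4");
static_assert(offsetof(NameObjLayout, mFlags) == 0x8, "mFlags should sit at 0x8");
static_assert(offsetof(NameObjLayout, mExecutorIdx) == 0xA, "mExecutorIdx should sit at 0xA");

// Size implied by the members above. Compare this against the size that is
// passed to the new call in the disassembly.
constexpr std::size_t kMappedSize = sizeof(NameObjLayout);
static_assert(kMappedSize == 0xC, "mapped members add up to 0xC bytes");
```

If the size passed to new turned out to be larger than kMappedSize, that would point at members that the constructor never touches.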
After the members comes the vtable, or virtual table. It is an array of function pointers that can be overridden by classes that inherit the parent class. The vtable for NameObj looks like this:
To document the vtable, list every function in it whose symbol contains the name of the class you are currently decompiling. Since NameObj is a base class, every single function here is going to be defined. For a class that inherits from another, you only document the functions that it overrides. After documenting the vtable, our class looks something like this:
    class NameObj {
    public:
        NameObj(const char *pName);

        virtual ~NameObj();
        virtual void init(const JMapInfoIter &rIter);
        virtual void initAfterPlacement();
        virtual void movement();
        virtual void draw() const;
        virtual void calcAnim();
        virtual void calcViewAndEntry();

        /* remember that the vtable will be placed here once we define our virtuals! */
        /* 0x4 */ const char* mName;
        /* 0x8 */ volatile u16 mFlags;
        /* 0xA */ s16 mExecutorIdx;
    };
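To illustrate why a class that inherits from another only lists the functions it overrides, here is a small self-contained sketch. BaseObj and DerivedObj are made-up stand-ins, not classes from the game, and the functions return strings only so that the dispatch is observable.

```cpp
#include <string>

// Hypothetical base class: every virtual function gets a slot in its vtable.
class BaseObj {
public:
    virtual ~BaseObj() {}
    virtual std::string init() { return "BaseObj::init"; }
    virtual std::string movement() { return "BaseObj::movement"; }
};

// Hypothetical derived class: only movement() is overridden, so only that
// slot in DerivedObj's vtable refers to a DerivedObj function. The init()
// slot still refers to BaseObj::init.
class DerivedObj : public BaseObj {
public:
    virtual std::string movement() { return "DerivedObj::movement"; }
};

// Calls through a base reference are dispatched through the vtable of the
// object's actual type.
std::string callInit(BaseObj& obj) { return obj.init(); }
std::string callMovement(BaseObj& obj) { return obj.movement(); }
```

Calling callInit on a DerivedObj still reaches BaseObj::init, while callMovement reaches DerivedObj::movement. In a disassembler this shows up as a vtable where only some entries carry the derived class's name, and those are the ones to declare in the derived class.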
Once the vtable is complete, you want to document all of the member functions that are in the class. Since Super Mario Galaxy 1 has a symbol map, we can easily find the member functions that NameObj contains. Once you have figured out their return types and their arguments, you can finish mapping out a class! After finding all of NameObj's member functions, the class looks like this:
    class NameObj {
    public:
        NameObj(const char *pName);

        virtual ~NameObj();
        virtual void init(const JMapInfoIter &rIter);
        virtual void initAfterPlacement();
        virtual void movement();
        virtual void draw() const;
        virtual void calcAnim();
        virtual void calcViewAndEntry();

        void initWithoutIter();
        void setName(const char *pName);
        void executeMovement();
        void requestSuspend();
        void requestResume();
        void syncWithFlags();

        /* remember that the vtable will be placed here once we define our virtuals! */
        /* 0x4 */ const char* mName;
        /* 0x8 */ volatile u16 mFlags;
        /* 0xA */ s16 mExecutorIdx;
    };
Loops
Predefined bounds
Let's take a simple loop that stores nullptr in each of the 8 elements of a pointer array.
    class TestClass {
    public:
        TestClass();

        int* mPointers[8];
    };

    TestClass::TestClass() {
        for (int i = 0; i < 8; i++) {
            mPointers[i] = nullptr;
        }
    }
The output assembly would look something like:
        li      r0, 8           # there are 8 elements in this loop
        li      r5, 0           # the value to store in the element (nullptr)
        li      r4, 0           # the current element offset in the loop
        mtctr   r0              # move the number of iterations into the counter register (8)
    loop:
        stwx    r5, r3, r4      # store 0 (r5) into the array at r3 (this) + r4 (our current offset, which is i * 4)
        addi    r4, r4, 4       # increment our offset by sizeof(int*) since pointers are 32 bits
        bdnz+   loop            # decrement the counter and branch back to our loop if it is not zero
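To see how each instruction relates to the source, it can help to write the assembly back out literally before simplifying it into the for loop. The function below is a hypothetical illustration of that intermediate step (clearPointersLikeAsm is a made-up name). It uses sizeof(int*) instead of a hard-coded 4 so that it also behaves correctly on a 64-bit host.

```cpp
#include <cstddef>

// Literal, instruction-by-instruction rendering of the counted loop.
// This is only a reading aid; the matching source is the plain for loop.
void clearPointersLikeAsm(int** pointers) {
    unsigned ctr = 8;        // li r0, 8 / mtctr r0
    int* value = nullptr;    // li r5, 0
    std::size_t offset = 0;  // li r4, 0
    do {
        // stwx r5, r3, r4: store the value at base + byte offset
        *reinterpret_cast<int**>(reinterpret_cast<char*>(pointers) + offset) = value;
        // addi r4, r4, 4: advance by one pointer (4 bytes on the target)
        offset += sizeof(int*);
    } while (--ctr != 0);    // bdnz: decrement the counter, branch if not zero
}
```

Note that the counter counts down from 8 to 0 and never appears in the address calculation. The index i from the source only survives as the byte offset in r4.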
Variable Length Bounds
Let's take a simple loop that stores nullptr in each element of a variable-length array. We will have a class with two members, one that contains the pointer array itself, and another that stores the number of pointers.
    class TestClass {
    public:
        TestClass();

        int** mPointers;
        int mNumPointers;
    };

    TestClass::TestClass() {
        mNumPointers = 8;

        for (int i = 0; i < mNumPointers; i++) {
            mPointers[i] = nullptr;
        }
    }
Because the loop bound is read from a member variable instead of being a compile-time constant, the compiler does not use the counter register like it did with a fixed-size array. Instead, the compiler will use a cmpw (signed integer) or cmplw (unsigned) instruction to compare the current iteration to how many pointers are stored in the class.
        li      r0, 8           # there are 8 elements in this loop
        li      r7, 0           # our index (i) used in the loop, starts at 0
        stw     r0, 4(r3)       # r3 + 4 is the offset to our member variable "mNumPointers"
        mr      r6, r7          # simple copy of the 0 value so we can also use it as the value to store (nullptr)
        li      r4, 0           # load 0 into our offset
        b       loop            # branch into our loop
    loop:
        lwz     r5, 0(r3)       # load our pointer array from this + 0
        addi    r7, r7, 1       # increment our index by 1
        stwx    r6, r5, r4      # store our nullptr value (r6) into r5 + r4 (ptrArray + currentOffset)
        addi    r4, r4, 4       # increment our offset by sizeof(int*) since pointers are 32 bits
        lwz     r0, 4(r3)       # load the number of pointers to compare against (mNumPointers)
        cmpw    r7, r0          # compare the current index in our loop to our number of pointers
        blt+    loop            # branch back if the index is less than mNumPointers
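The choice between cmpw and cmplw mentioned above is a useful hint about the types of the loop index and the bound. Below is a minimal sketch of the difference in meaning. The helper names are hypothetical, and the sketch assumes a 32-bit int, as on the target.

```cpp
// Signed comparison: both operands are treated as signed integers.
// This is the behaviour of cmpw.
bool lessSigned(int index, int count) {
    return index < count;
}

// Unsigned comparison: the same bit patterns are treated as unsigned.
// This is the behaviour of cmplw.
bool lessUnsigned(unsigned index, unsigned count) {
    return index < count;
}
```

The bit pattern 0xFFFFFFFF is -1 as a signed value, so lessSigned(-1, 8) is true. Interpreted as unsigned it is the largest 32-bit value, so lessUnsigned(0xFFFFFFFFu, 8u) is false. If the loop you are decompiling uses cmplw, declaring the index and count as unsigned types is a reasonable first guess.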
Structure Access In Arrays (Pointer Array)
More complex forms of loops come into play when you are iterating through structures and storing / loading members from those structures. Let's take this class for example:
    struct TestStruct {
        int SomeMember;
        int AnotherMember;
    };

    class TestClass {
    public:
        TestClass();

        void storeVals();

        TestStruct** mStructures;
        int mNumStructures;
    };
For the sake of simplicity, let's assume that the TestClass constructor initializes the number of structures to 8, and constructs them accordingly. With that in mind, let's see how a struct store will work.
    void TestClass::storeVals() {
        for (int i = 0; i < mNumStructures; i++) {
            mStructures[i]->AnotherMember = 5;
        }
    }
The output assembly would look something like:
        li      r7, 0           # our "i" used in the loop, starts at 0
        li      r4, 0           # our current offset into the array
        li      r6, 5           # the value we are storing into the array
        b       loop
    loop:
        lwz     r5, 0(r3)       # load the array pointer
        addi    r7, r7, 1       # increment our current index (i) by 1
        lwzx    r5, r5, r4      # load the current structure, mStructures[i] where r4 is the current offset
        addi    r4, r4, 4       # increment our offset by sizeof(TestStruct*), which is 4
        stw     r6, 4(r5)       # store our value (5) into (mStructures[i] + 4), which is our AnotherMember
        lwz     r0, 4(r3)       # load the number of structures from TestClass
        cmpw    r7, r0          # compare the current index to our value in TestClass
        blt+    loop            # loop back if it is less than the value
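For anyone who wants to compile this example on their own machine, here is a complete version of the class. The constructor and destructor bodies are assumptions: the text only says that the constructor sets the count to 8 and constructs the structures, so the allocation details below are a guess, and the destructor exists only so the sketch does not leak memory.

```cpp
struct TestStruct {
    int SomeMember;
    int AnotherMember;
};

class TestClass {
public:
    TestClass();
    ~TestClass();

    void storeVals();

    TestStruct** mStructures;
    int mNumStructures;
};

// Assumed constructor: sets the count to 8 and allocates each structure.
TestClass::TestClass() {
    mNumStructures = 8;
    mStructures = new TestStruct*[mNumStructures];

    for (int i = 0; i < mNumStructures; i++) {
        mStructures[i] = new TestStruct();  // value-initialized, members are 0
    }
}

// Not part of the original example; added so the sketch cleans up after itself.
TestClass::~TestClass() {
    for (int i = 0; i < mNumStructures; i++) {
        delete mStructures[i];
    }
    delete[] mStructures;
}

// Same loop as in the text: two loads are needed per element, one for the
// array pointer and one for the structure pointer stored in the array.
void TestClass::storeVals() {
    for (int i = 0; i < mNumStructures; i++) {
        mStructures[i]->AnotherMember = 5;
    }
}
```

After calling storeVals(), every AnotherMember holds 5 and every SomeMember is left untouched.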
Structure Access In Arrays (Direct Array)
Let's take the previous example and modify it a little. Instead of making an array of pointers to the struct instances, let's store the array directly into our class instance.
    struct TestStruct {
        int SomeMember;
        int AnotherMember;
    };

    class TestClass {
    public:
        TestClass();

        void storeVals();

        TestStruct mStructures[8];
    };
Again, let us assume that the array has already been constructed and everything is initialized as it should be.
    void TestClass::storeVals() {
        for (int i = 0; i < 8; i++) {
            mStructures[i].AnotherMember = 5;
        }
    }
The output assembly would look something like:
        li      r0, 8           # load our number of iterations (8)
        li      r4, 0           # current offset into the array. initialized at 0
        li      r6, 5           # the value to store into the array
        mtctr   r0              # move the number of iterations into the counter register
    loop:
        add     r5, r3, r4      # compute the address of the current element. r5 = (this + 0) + r4
        addi    r4, r4, 8       # increment our current offset by sizeof(TestStruct), which is 8
        stw     r6, 4(r5)       # store our constant value (5) into the current struct + 4 (AnotherMember)
        bdnz+   loop            # decrement the counter and branch back to the loop if it is not zero
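The two constants in this loop, the stride of 8 and the displacement of 4, come straight from the layout of TestStruct. The sketch below checks them with the compiler and spells out the address calculation. anotherMemberOffset is a hypothetical helper, and the checks assume a 4-byte int, as on the target.

```cpp
#include <cstddef>

struct TestStruct {
    int SomeMember;
    int AnotherMember;
};

// The stride used by "addi r4, r4, 8".
static_assert(sizeof(TestStruct) == 8, "TestStruct should be 8 bytes");
// The displacement used by "stw r6, 4(r5)".
static_assert(offsetof(TestStruct, AnotherMember) == 4, "AnotherMember should sit at +4");

// Byte offset of mStructures[i].AnotherMember from the start of the array:
// the element offset kept in r4 plus the member displacement in the stw.
std::size_t anotherMemberOffset(std::size_t i) {
    return i * sizeof(TestStruct) + offsetof(TestStruct, AnotherMember);
}
```

Working backwards from the assembly, a stride of 8 together with a store at +4 tells you that the array elements are at least 8 bytes wide and that the member being written lives 4 bytes into each element.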