Importing classes from Windows DLLs
This is a work in progress...
What we want to achieve
While working on my final school project, I had to "steal" some functionality from binary-only Windows DLLs which were unfortunately written in (object-oriented) C++. There was no static import library available, so the only option was to load the DLL at runtime. Since there is no real standard for how methods and classes should be exported, I found myself in a rather hopeless situation. Plain C functions imported from DLLs are easy to use: you just find the correct function address and call it (you have to know the return type and the arguments it expects). In the case of C++, class methods are exported as plain functions with mangled names (so that you can have both Class1::DoSomething() and Class2::DoSomething()). Conveniently, you don't have to know the return type and arguments in advance, since their description is part of the mangled name. The main problem is how to call these functions on a specific instance (i.e. how to wrap the whole thing so that it looks like you're using classes and their methods).
How to achieve that
Word of caution
The described method is an ugly hack and it's likely to crash your computer, steal all your money and kick your dog. You've been warned...
This works only on Windows (possibly only with the Visual C++ compiler) on the x86 architecture. A similar ugly hack is possible on other platforms as well, but the details differ slightly.
Method name de-mangling
There is no standard for how C++ method names should be mangled, so the final function names are compiler-specific. The Microsoft Visual C++ compiler mangles names so that they get exported as ??0SLNEPasswd@@QAE@ABV0@@Z or ?GetNEData@NACommDB@@QAE?AW4NAResult@1@VCString@@AAVNANEData@@@Z. Quite a mess, isn't it? Fortunately for us, the mangled name contains all the information we need to use the function in an object-oriented way. There are lots of tools lying around on the net which can de-mangle the names into full method signatures. I've used IDA Pro (quite a handy disassembler). Other tools should do fine, too. Here's some sample output:
    ; private: class CLBinary __thiscall SLNEPasswd::GetMD5Encryption(class CLBinary const &)
    public ?GetMD5Encryption@SLNEPasswd@@AAE?AVCLBinary@@ABV2@@Z
    ?GetMD5Encryption@SLNEPasswd@@AAE?AVCLBinary@@ABV2@@Z proc near
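A real de-mangler is the right tool for recovering full signatures. Just to show how the class and method names sit in the decorated string, here is a minimal sketch of a hypothetical helper (split_decorated_name is my own name, not part of any library). It assumes the simplest shapes only, "?Method@Class@@..." and "??0Class@@..." / "??1Class@@...", and does not decode arguments, return types, namespaces, templates or operators.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: splits a simple MSVC-decorated member name into
// {class name, method name}. Returns two empty strings for anything that is
// not one of the plain shapes handled below.
std::pair<std::string, std::string> split_decorated_name(const std::string& decorated) {
    const std::pair<std::string, std::string> unknown;
    if (decorated.size() < 2 || decorated[0] != '?') return unknown;

    // The first "@@" closes the qualified name; type information follows it.
    const std::size_t scope_end = decorated.find("@@");
    if (scope_end == std::string::npos) return unknown;

    // "??0" marks a constructor, "??1" a destructor; the class name follows directly.
    if (decorated.compare(0, 3, "??0") == 0 || decorated.compare(0, 3, "??1") == 0) {
        const std::string cls = decorated.substr(3, scope_end - 3);
        if (cls.empty() || cls.find('@') != std::string::npos) return unknown;
        return {cls, decorated[2] == '0' ? cls : "~" + cls};
    }
    if (decorated[1] == '?') return unknown;  // operators and other special names

    // Ordinary method: the qualified part reads "Method@Class".
    const std::string qualified = decorated.substr(1, scope_end - 1);
    const std::size_t at = qualified.find('@');
    if (at == std::string::npos || qualified.find('@', at + 1) != std::string::npos) {
        return unknown;  // free function, or nested scopes this sketch does not handle
    }
    return {qualified.substr(at + 1), qualified.substr(0, at)};
}
```

Feeding it the name from the sample output yields the class SLNEPasswd and the method GetMD5Encryption; the constructor name ??0SLNEPasswd@@QAE@ABV0@@Z yields SLNEPasswd for both.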
So now you know what the function expects as arguments and what it's going to return. Let's move on to how to actually call the function...
Windows C++ calling convention
So far, we've only been studying the DLL. Now, let's get our hands dirty with all the ugliness I've mentioned before.
First, load the library using the LoadLibrary() WinAPI function. Then find the function address with a GetProcAddress() call (use the mangled name). You get a generic function pointer, so you need to cast it to a properly typed function pointer. Use what the disassembler told you to build the signature.
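A minimal sketch of the load-and-look-up step. The helper name import_function is my own, and so is the choice to print errors to stderr. The Windows calls are guarded so that the snippet still compiles on other systems, where it simply returns a null pointer.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Loads dll_path, looks up mangled_name and returns the address cast to Fn.
// Returns a null pointer on any failure (and always on non-Windows systems,
// where there is nothing to load). The module is deliberately never freed:
// the returned pointer is only valid while the DLL stays loaded.
template <typename Fn>
Fn import_function(const char* dll_path, const char* mangled_name) {
#ifdef _WIN32
    HMODULE module = LoadLibraryA(dll_path);
    if (module == NULL) {
        std::fprintf(stderr, "LoadLibrary(%s) failed, error %lu\n",
                     dll_path, GetLastError());
        return nullptr;
    }
    FARPROC address = GetProcAddress(module, mangled_name);
    if (address == NULL) {
        std::fprintf(stderr, "GetProcAddress(%s) failed, error %lu\n",
                     mangled_name, GetLastError());
        return nullptr;
    }
    // GetProcAddress knows nothing about the real signature; the cast is on us.
    return reinterpret_cast<Fn>(address);
#else
    (void)dll_path;
    (void)mangled_name;
    return nullptr;
#endif
}
```

A call would look like import_function<SomeMethodPtr>("passwd.dll", "??0SLNEPasswd@@QAE@ABV0@@Z"), where passwd.dll is a made-up file name and SomeMethodPtr is whatever pointer type you derived from the de-mangled signature.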
Well, now you have imported the function and can call it. Oh wait, can you? Unfortunately, you can't (unless you want to taste a sweet crash ;-)). The problem is that this is a member method of a class, so it expects a class instance to exist. Since we have neither headers nor a static import library, there is no way to recover how the class is laid out in memory. The good news is that we don't have to. As long as we stick to member methods (and don't try to access member variables directly), the class structure stays completely hidden and unimportant. The only thing we have to do is allocate a block of memory to hold the instance. Since we don't know what the class layout looks like, we have to guess the size. It doesn't really matter if the allocated chunk is bigger than needed; it just must not be smaller. One megabyte can be considered safe, but keep in mind that if you're going to create lots of "instances", memory will be eaten up quite fast (one kilobyte should be fine for most classes, but not all).
In the __thiscall calling convention, the this pointer is stored in the ECX register, so just before you call the function you have to inline a piece of assembly that loads it. The called function will then use the pointer from ECX as its this pointer and will treat the memory chunk as a class instance.
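A minimal sketch of the "guess the size" step. RawInstance and the 0xCD fill byte are my own additions, not something the DLL requires: filling the block with a known pattern lets you see afterwards how far the imported code wrote, which gives a rough lower bound for the real object size. Bytes the class never writes, or writes with the fill value itself, stay invisible to this check.

```cpp
#include <cstddef>
#include <vector>

// Fill byte used to tell untouched memory from memory the DLL has written to.
const unsigned char kFillByte = 0xCD;

// Hypothetical holder for one "instance" of a class with unknown layout.
class RawInstance {
public:
    explicit RawInstance(std::size_t guessed_size) : bytes_(guessed_size, kFillByte) {}

    // Pointer to hand over as 'this'.
    void* get() { return bytes_.data(); }
    std::size_t size() const { return bytes_.size(); }

    // Offset just past the last byte that no longer holds the fill pattern,
    // or 0 if nothing was changed.
    std::size_t touched_extent() const {
        for (std::size_t i = bytes_.size(); i > 0; --i) {
            if (bytes_[i - 1] != kFillByte) return i;
        }
        return 0;
    }

private:
    std::vector<unsigned char> bytes_;
};
```

If touched_extent() comes back equal to size() after a constructor call, the guess was probably too small and memory past the block may already be damaged, so start generous and shrink later.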
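A minimal sketch of getting the instance pointer into ECX. Loading ECX in one assembly statement and calling through a C++ function pointer afterwards is fragile, because the compiler is free to reuse ECX in between. So the sketch shows two variants: a swapped-in trick that needs no assembly (declare the pointer as __fastcall, which puts the first argument in ECX and the second in EDX, and pass a dummy for EDX), and a hand-written version that does the whole call inside one __asm block. The method shape "int Class::Method(int)", all names, and the stand-in function are hypothetical. Outside 32-bit Visual C++ the macro expands to nothing, so the wrapper can only be dry-run against the stand-in there.

```cpp
#if defined(_MSC_VER) && defined(_M_IX86)
#define IMPORTED_METHOD_CALL __fastcall  // first two arguments travel in ECX and EDX
#else
#define IMPORTED_METHOD_CALL             // other targets: ordinary call, dry runs only
#endif

// Pointer type for an imported method shaped like "int Class::Method(int)".
//   self       -> ECX, which the callee uses as 'this'
//   unused_edx -> EDX, ignored by a __thiscall callee
//   value      -> pushed on the stack and popped by the callee, as in __thiscall
typedef int (IMPORTED_METHOD_CALL *IntMethod)(void* self, void* unused_edx, int value);

int call_int_method(IntMethod method, void* instance, int value) {
    return method(instance, nullptr, value);
}

// Stand-in with the same shape, so the wrapper can be exercised without a DLL.
int IMPORTED_METHOD_CALL stand_in_add(void* self, void* /*unused_edx*/, int value) {
    return *static_cast<int*>(self) + value;
}

#if defined(_MSC_VER) && defined(_M_IX86)
// Hand-written variant for a method that takes no arguments and returns an int.
// 'method' is the raw address obtained from GetProcAddress().
static int call_no_arg_method_asm(void* method, void* instance) {
    int result;
    __asm {
        mov ecx, instance        ; the 'this' pointer goes into ECX
        call dword ptr [method]  ; indirect call through the imported address
        mov result, eax          ; integer results come back in EAX
    }
    return result;
}
#endif
```

With a real DLL you would cast the address from GetProcAddress() to IntMethod and pass your guessed-size memory block as instance.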
You can check with a debugger that the memory has been modified (only if the function actually modifies the object, of course).
So far, so good. You can now use all the methods returning simple data types (int, char *, class pointers, whatever...), but what about methods which return other classes by value? This is where we face another complication.
Returning a class by value in __thiscall works like this: the caller allocates memory for the to-be-returned instance and passes a pointer to it to the method as the very first (hidden) argument. This means we have to modify the function signatures a bit to follow this convention. Every method returning a class by value should be changed so that it returns void and takes exactly one extra argument of type void * in the first position.
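A minimal sketch of the rewritten signature, using GetMD5Encryption from the sample output as the model. Instead of inline assembly it assumes the __fastcall trick for delivering the instance pointer: the first argument goes to ECX, and a dummy second argument occupies EDX. The helper name, the typedef, and the idea of returning the raw bytes in a std::vector are all my own choices. Outside 32-bit Visual C++ the macro is empty and the code is only good for dry runs.

```cpp
#include <cstddef>
#include <vector>

#if defined(_MSC_VER) && defined(_M_IX86)
#define IMPORTED_METHOD_CALL __fastcall  // first two arguments travel in ECX and EDX
#else
#define IMPORTED_METHOD_CALL             // other targets: ordinary call, dry runs only
#endif

// Rewritten form of "CLBinary SLNEPasswd::GetMD5Encryption(const CLBinary&)":
//   self           -> ECX, the 'this' pointer
//   unused_edx     -> EDX, ignored by a __thiscall callee
//   result_storage -> hidden first stack argument, memory for the returned object
//   input          -> the visible argument (a reference travels as a pointer)
typedef void (IMPORTED_METHOD_CALL *ObjectReturningMethod)(
    void* self, void* unused_edx, void* result_storage, const void* input);

// Allocates zeroed storage of a guessed size, lets the method construct its
// return value in it, and hands the raw bytes back to the caller.
std::vector<unsigned char> call_object_returning_method(
        ObjectReturningMethod method, void* instance, const void* input,
        std::size_t guessed_result_size) {
    std::vector<unsigned char> result(guessed_result_size, 0);
    method(instance, nullptr, result.data(), input);
    return result;
}
```

The object living inside the returned bytes is never destroyed by this code, so whatever it owns leaks unless you also import its destructor and call it yourself.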
Now whenever you use such a function, be sure to allocate the memory for the returned class first (you'll have to guess the size again).
Warning: Notice that we haven't used the new operator, so no constructors are being called. Be sure to import the constructors just like other member methods and call them first to initialize the memory!
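To tie the pieces together, here is a minimal sketch of a wrapper that makes the raw memory block look like an ordinary C++ object and runs the imported constructor before anything else. Everything in it is hypothetical: the class name ImportedObject, a default constructor, and a single "int GetValue()" method. It assumes the __fastcall trick (first argument in ECX, dummy second argument in EDX) instead of inline assembly. Outside 32-bit Visual C++ the macro is empty and the wrapper can only be dry-run against stand-in functions.

```cpp
#include <cstddef>
#include <vector>

#if defined(_MSC_VER) && defined(_M_IX86)
#define IMPORTED_METHOD_CALL __fastcall  // first two arguments travel in ECX and EDX
#else
#define IMPORTED_METHOD_CALL             // other targets: ordinary call, dry runs only
#endif

// Hypothetical imported members: a default constructor and "int GetValue()".
typedef void (IMPORTED_METHOD_CALL *DefaultCtor)(void* self, void* unused_edx);
typedef int (IMPORTED_METHOD_CALL *IntGetter)(void* self, void* unused_edx);

class ImportedObject {
public:
    ImportedObject(DefaultCtor ctor, IntGetter getter, std::size_t guessed_size)
        : storage_(guessed_size, 0), getter_(getter) {
        // No 'new' was involved, so the imported constructor has to be
        // called by hand before any other method touches the memory.
        ctor(storage_.data(), nullptr);
    }

    // Copying the raw bytes would bypass the DLL's own copy constructor.
    ImportedObject(const ImportedObject&) = delete;
    ImportedObject& operator=(const ImportedObject&) = delete;

    int GetValue() { return getter_(storage_.data(), nullptr); }

private:
    std::vector<unsigned char> storage_;  // guessed-size block acting as the instance
    IntGetter getter_;
};
```

A fuller wrapper would import the destructor as well and call it from ~ImportedObject(); I've left that out to keep the sketch short.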
Bloody recursive imports
This is where the real fun starts. When you open the library with LoadLibrary(), the loader reads the DLL's import table (the list of other DLLs this library depends on) and tries to load them recursively. For each loaded library, it finds and executes its DllMain() function. If DllMain() returns FALSE for any single library, the entire load fails. If you want to extract something from a simple DLL, you're fine. But in my case, there were dozens of DLLs with horrible dependencies and non-trivial DllMain()s. Since I only needed a part of just two DLLs (and didn't know how to drop the unneeded dependencies), I had to disassemble all the DLLs and patch their machine code so that every DllMain() returned success.
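When a load fails, the error code from GetLastError() at least hints at which kind of problem you're facing. A minimal sketch of a hypothetical helper that translates the three codes most relevant here (the numeric values are the ones defined in winerror.h; the wording of the hints is mine):

```cpp
#include <string>

// Values as defined in the Windows SDK header winerror.h.
const unsigned long kErrorModNotFound = 126;     // ERROR_MOD_NOT_FOUND
const unsigned long kErrorProcNotFound = 127;    // ERROR_PROC_NOT_FOUND
const unsigned long kErrorDllInitFailed = 1114;  // ERROR_DLL_INIT_FAILED

// Hypothetical helper: turns a LoadLibrary() error code into a hint.
std::string explain_load_error(unsigned long code) {
    switch (code) {
    case kErrorModNotFound:
        return "the DLL itself or one of its dependencies could not be found";
    case kErrorProcNotFound:
        return "a dependency does not export a function listed in an import table";
    case kErrorDllInitFailed:
        return "a DllMain() somewhere in the dependency chain returned FALSE";
    default:
        return "some other failure; look the code up in winerror.h";
    }
}
```

On Windows you would call it as explain_load_error(GetLastError()) right after LoadLibrary() returns NULL. The code does not say which library in the chain is the culprit, so you still have to walk the dependencies.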
So keep in mind that if your DLL fails to load, the problem may be several libraries away...