Extending (hacking) C++
Since I am planning to write a highly extensible language, I feel that it is important to be able to easily extend classes from C++ in Storm. This will be useful when defining custom languages, for example when defining type lookup and similar things. It requires that the type system used for Storm is aware of the C++ classes and member functions, so that we can properly call and extend classes originally implemented in C++. The other way around is more difficult, since C++ is not going to be magically aware of the type system I just implemented in C++. This means that we have to base the Storm type system at least partially on the C++ types. However, C++ does not let us do this easily, and especially not at run-time (this is why we need Storm! It will allow such things!).

It is fairly easy to extend C++ objects with more data fields. You simply allocate more memory than needed and put your data after the object, much like this:
    struct Extended {
        Base base;
        int extra;
    };
This is actually exactly how single inheritance works on my C++ compiler. In this case, a pointer to an Extended will work perfectly as a substitute for a Base, since base is at offset 0 in the struct. The same thing can also be done dynamically, like this:
    void *memory = new byte[sizeof(Base) + sizeof(int)];
    Base *base = new (memory) Base();
Then we just need to keep track of which instances of base actually contain some extra space for the integer. This is not a huge problem either, since you can either rely on the type system of the implemented language, or mark types in some common base class (sketched below).
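As an illustration of the marker idea, something along these lines could work (the names hasExtra and extraData are made up for this sketch and are not taken from the actual implementation):

    #include <cassert>

    class Base {
    public:
        Base() : hasExtra(false) {}

        // Set by the code that allocates the object with extra space after it.
        bool hasExtra;
    };

    // Access the integer stored directly after the C++ object.
    int &extraData(Base *b) {
        assert(b->hasExtra);
        return *(int *)((char *)b + sizeof(Base));
    }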
The tricky part is that we cannot easily override virtual members of Base. This is vital to be able to extend or alter the behavior of the compiler when implementing another language. As I see it, we have three options:
1: Find a way to extend the compiler without inheritance, for example passing function pointers, and make sure to expose this kind of interface at many places in the compiler.
2: Implement a custom dispatch for virtual functions and use that on all current virtual function calls.
3: Find a way to exploit what we know about how the compiler does things.
As far as I see it, options 1 and 2 are not really good, since they will require extra code and potentially sacrifice performance, readability and type-checking. For example, option 2 would require me to identify and replace (almost) all virtual function calls with something like this:
    Function *fn = this->getFunction("foo", "Int", "Int");
    fn->call(a, b);
Where getFunction or something similar performs the function lookup. Notice that the identifier we actually have is the string representation, not the actual vtable offset. This requires the runtime to look up the function each time we want to call a function on an object, since it may have been overridden by Storm. Of course it is possible to optimize this, but that adds even more boilerplate and possibly sacrifices even more of the type-checking the C++ compiler gives us. This is why I looked into how the compiler implements virtual function calls.
Virtual function calls
To implement virtual function calls, the compiler uses something called a vtable. This table is unique to each type and can be thought of as a way to see which type an object actually is. The vtable itself usually contains at least some type identifier and a list of function pointers. These function pointers point to the functions to call in the case of a virtual function call. For example:
    class Base {
    public:
        virtual ~Base() {}
        virtual int value() { return 1; }
    };

    class Derived : public Base {
    public:
        virtual int value() { return 2; }
    };

    // In some function:
    Base *object;
    object->value(); // Which function to call?
The function call in the last line would be implemented as a virtual function call, since the compiler does not know at compile-time which type object will actually point to. This is the problem that vtables solve. When a class contains at least one virtual function, the compiler adds a hidden pointer to the vtable. This vtable will then contain a pointer to the correct version of value to use. It is implemented approximately like this:
    void *base_vtable[] = { &Base::value };

    class Base {
    public:
        void **vtable;
        Base() { vtable = base_vtable; }
        int value() { return 1; }
    };

    void *derived_vtable[] = { &Derived::value };

    class Derived : public Base {
    public:
        Derived() { vtable = derived_vtable; }
        int value() { return 2; }
    };

    // In some function:
    Base *object;
    (*object->vtable)(); // call the function (typecasts omitted for readability).
So the basic idea is that if we somehow get hold of the vtable for a type, we can make a copy of it and alter it any way we like. When we then want to create an instance of a class partly defined in Storm, we can simply create the C++ object as above, possibly allocating extra memory for it, and then replace the vtable of the object with our version.
One way of doing this would be to wait until an instance of the correct type is created, and read the vtable from that object. However, since this requires an instance of an actual object, it is not optimal. Can we do better?
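For reference, reading and replacing the vtable through the hidden pointer could look roughly like this (a sketch that assumes the MSVC object layout, where the vtable pointer is the first pointer-sized field of the object; readVTable and setVTable are made-up names for illustration):

    // Read the vtable pointer from an existing instance (assumes MSVC layout).
    void *readVTable(void *object) {
        return *(void **)object;
    }

    // Point an object to a (possibly modified) copy of a vtable.
    void setVTable(void *object, void *vtable) {
        *(void **)object = vtable;
    }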
If we compile a simple source file containing some virtual functions with cl /FAcs test.cpp, we get the following output (I have altered some symbol names for readability):
    BASE_CLASS_DESC DD FLAT:TYPE_DESC   ; VT::`RTTI Base Class Descriptor at (0,-1,0,64)'
        DD 00H
        DD 00H
        DD 0ffffffffH
        DD 00H
        DD 040H
        DD FLAT:HIERARCHY
    rdata$r ENDS
    ; COMDAT BASE_CLASS_ARR
    rdata$r SEGMENT
    BASE_CLASS_ARR DD FLAT:BASE_CLASS_DESC  ; VT::`RTTI Base Class Array'
    rdata$r ENDS
    ; COMDAT HIERARCHY
    rdata$r SEGMENT
    HIERARCHY DD 00H                    ; VT::`RTTI Class Hierarchy Descriptor'
        DD 00H
        DD 01H
        DD FLAT:BASE_CLASS_ARR
    rdata$r ENDS
    ; COMDAT TYPE_DESC
    _DATA SEGMENT
    TYPE_DESC DD FLAT:??_7type_info@@6B@    ; VT `RTTI Type Descriptor'
        DD 00H
        DB '.?AVVT@@', 00H
    _DATA ENDS
    ; COMDAT OBJECT_LOCATOR
    rdata$r SEGMENT
    OBJECT_LOCATOR DD 00H               ; VT::`RTTI Complete Object Locator'
        DD 00H
        DD 00H
        DD FLAT:TYPE_DESC
        DD FLAT:HIERARCHY
    rdata$r ENDS
    ; COMDAT VTABLE
    CONST SEGMENT
    VTABLE DD FLAT:OBJECT_LOCATOR       ; VT::`vftable'
        DD FLAT:VEC_DTOR
        DD FLAT:VALUE_FN
    ; Function compile flags: /Odtp
    CONST ENDS
This is the interesting part of the output file, where the compiler defines the virtual function table, along with its symbol name (see VTABLE here). How convenient! It turns out that the name of the vtable follows a fairly simple pattern, ??_7Class@namespace@namespace@@6B@, where Class and the namespaces form the reversed full name of the class itself. We can also see that the virtual function table starts with something called an object locator, then continues with a pointer to the destructor, followed by the single virtual function in the class. I do not know why, but it turns out the VTABLE symbol actually points to the second DD element, so the object locator is at VTABLE[-1], and the destructor at VTABLE[0].
Now, how do we access a generic symbol from C++? The simple answer is that we don't, since the name-mangling is designed so that it never accidentally collides with the vtable symbol. This is where we have to resort to .asm files! We can generate a file with functions accessible from C++ that return the symbol value to us. This is where I use the preprocessor I described earlier.
    .386
    .model flat, c

    .data
    ??_7Object@storm@@6B@ proto syscall
    cppVTable_storm__Object proto

    .code
    cppVTable_storm__Object proc
        mov eax, ??_7Object@storm@@6B@
        ret
    cppVTable_storm__Object endp
    end
The two lines after the .data directive declare the symbol we want and the function we export to C++. I think this declares the vtable pointer as a function with the syscall calling convention, which is not too nice, but since both are pointer-sized it works anyway! The small function in the .code section simply returns the value of this symbol in eax, as defined by the cdecl calling convention. Now we can simply declare the function in C++ and use it:
extern "C" void *cppVTable_storm__Object();
Now we just need to copy the vtable, and to do that we need to know its size. This is a little tricky, since the compiler never intended anyone to actually read and modify the vtable at runtime, so it does not store a size anywhere that I am aware of. Instead, we exploit our knowledge that the vtable is a table of pointers (we saw that in the asm listing earlier). Therefore we can write the following function:
    bool isNewVTable(void *addr) {
        nat *p = (nat *)addr;
        if (p[0] < 0xFF && p[1] < 0xFF)
            return true;
        return false;
    }

    nat vtableSize(void *vtable) {
        void **table = (void **)vtable;
        assert(readable(table));

        // Find the size by looking at each address.
        nat size = 1;
        while (readable(table + size)) {
            if (!readable(table[size]))
                return size;
            if (isNewVTable(table[size]))
                return size;
            size++;
        }

        return size - 1;
    }
Where readable is a small function that asks the operating system whether the memory at the given address is readable. The problem is that we may think the vtable is too long if the linker decides to pack the vtables together. This is why I have a small heuristic for detecting the start of a new vtable: the function isNewVTable checks if addr is probably pointing to the complete object locator we saw in the asm listing, since this is the first element of a new vtable. It relies on the fact that the first two integer values of the locator are usually much smaller than pointers. We may still fail, but at least we are then overestimating the size. When I tested this, it also seemed like at least some vtables end with a null pointer, which is convenient.
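With vtableSize in place, making our own copy could look roughly like this (a sketch under the assumptions above; copyVTable is a made-up name, not the actual code::VTable implementation):

    // Copy a vtable, including the object locator slot at index -1, into memory
    // we own. The returned pointer has the same layout as the compiler's vtable.
    void **copyVTable(void *vtable) {
        nat size = vtableSize(vtable);
        void **src = (void **)vtable;

        void **mem = new void *[size + 1];
        mem[0] = src[-1];              // the complete object locator
        for (nat i = 0; i < size; i++)
            mem[i + 1] = src[i];       // destructor and virtual functions

        return mem + 1;                // point past the locator, like the VTABLE symbol
    }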
Now that we know the start address and the size of the vtable, we can copy it to new memory and replace the vtable of some object with our new copy! But to do something useful we also need to be able to modify the vtable. How do we know which slot the compiler has assigned to each function, without having to reverse-engineer the logic the compiler uses (which involves parsing much of C++, which is hard)?
The answer lies in pointers-to-members. In C++, when you create a pointer to a virtual member function, the resulting function pointer will actually respect overridden functions. For example, consider the example classes above along with this code:
    // Create a pointer to the value function.
    int (Base::*ptr)() = &Base::value;

    Derived *d = new Derived();
    (d->*ptr)(); // Will call Derived::value()
This means that whatever ptr points to must know which index in the virtual function table to look at to find the specified function! What if we read the machine code and figure it out?
It turns out that for cdecl functions, the code pointed to by ptr is this:
    8B 44 24 04 : mov eax, DWORD PTR [esp + 4]
    8B 00       : mov eax, DWORD PTR [eax]
    FF 60 XX    : jmp DWORD PTR [eax + XX]
Where XX is the offset in bytes into the vtable where we should look! Simple: just make sure we are actually reading the right machine code, and then look at XX to find the offset. It feels good to write code that can issue the error "Unexpected machine code"...
However, we are not quite there yet. In debug mode, the Visual Studio compiler does a trick to simplify incremental compilation (yes, it supports re-compilation of code while debugging to some extent). Somewhere in the binary, there is a big table of where each function is actually located. It is used so that when a function is recompiled and therefore has to be moved, only the table has to be modified, and not everything referring to the function. To increase performance, the table actually consists of machine code, jmp instructions to be precise. This means that our pointer-to-member will only give us a jmp instruction that we need to follow before we get to the interesting machine code. This is quickly done: all we have to do is check whether the first byte of the function is E9 (which is jmp) and read the relative offset.
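Putting the two steps together, the lookup could be sketched like this (assuming the machine-code patterns shown above; findOffset is a hypothetical helper, not the actual implementation, and the typedefs mirror the ones used elsewhere in the code):

    #include <cassert>

    typedef unsigned char byte;
    typedef unsigned int nat;

    // Find the vtable offset (in bytes) that a pointer-to-member dispatches through.
    nat findOffset(const void *memberPtr) {
        const byte *code = (const byte *)memberPtr;

        // Debug builds: follow the relative jmp in the incremental linking table.
        if (code[0] == 0xE9) {
            int delta = *(const int *)(code + 1);
            code += 5 + delta;
        }

        // Expect: mov eax, [esp+4]; mov eax, [eax]; jmp [eax+XX]
        if (code[0] == 0x8B && code[1] == 0x44 && code[2] == 0x24 && code[3] == 0x04 &&
            code[4] == 0x8B && code[5] == 0x00 &&
            code[6] == 0xFF && code[7] == 0x60) {
            return code[8]; // XX: the byte offset into the vtable
        }

        assert(false && "Unexpected machine code");
        return 0;
    }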
Now that we know the offset in the vtable, we can replace the corresponding entry with a function of our own. It does not even have to be a member of the class, as long as it has the right number of parameters and the right calling convention. You can use it like this, for example:
    /**
     * Test class for vtables.
     */
    class VTest : public Object {
        STORM_CLASS;
    public:
        VTest();
        ~VTest();

        virtual int STORM_FN returnOne();
        virtual int STORM_FN returnTwo();
    };

    int STORM_FN replaceOne(VTest *me) { return 3; }
    int STORM_FN replaceTwo(VTest *me) { return 4; }

    BEGIN_TEST(VTableTest) {
        code::VTable table(VTest::cppVTable());
        VTest *test = new VTest();

        // Replace the vtable with our copy
        table.setTo(test);

        // Does it still work?
        CHECK_EQ(test->returnOne(), 1);
        CHECK_EQ(test->returnTwo(), 2);

        // Replace two functions with the ones above
        table.set(address(&VTest::returnOne), address(&replaceOne));
        table.set(address(&VTest::returnTwo), address(&replaceTwo));

        // Now they act differently!
        CHECK_EQ(test->returnOne(), 3);
        CHECK_EQ(test->returnTwo(), 4);

        delete test;
    } END_TEST
The macro STORM_FN expands to __cdecl to set the correct calling convention. It is also used to inform my pre-processor that these functions are to be exported to Storm. The STORM_CLASS macro declares the cppVTable function, and tells my pre-processor to generate the asm code that extracts the vtable pointer. It actually works, which is quite amazing!
Dynamic calls
Another feature that is needed when trying to execute the syntax (as explained in It's alive!) is dynamic function calls. By this I mean that the compiler needs to be able to execute an arbitrary function with an arbitrary number of parameters. The number of parameters is not known at compile-time, so we need some kind of mechanism to handle this. Java has built-in support for this through its reflection API:
    // Initialized from the parsed syntax tree:
    Class[] paramTypes = new Class[]{ String.class, Integer.class };
    Object[] paramValues = new Object[]{ "Hello", 4 };
    MyClass obj = ...;

    // What is done to call the function:
    Class<MyClass> c = MyClass.class;
    Method m = c.getMethod("foo", paramTypes);
    m.invoke(obj, paramValues);
However, this is not possible in C++ without some creative use of inline assembly. It is worth noting that function calls in C++ are more complex than in Java, since C++ allows arbitrary types to be passed by value, instead of only primitive types by value and everything else by reference. Therefore, in C++ we cannot just pass an array of void * in the general case, since then we would only be able to call a subset of the implemented functions. Instead I chose the following API:
    int fn(std::string a, int b) {
        return b;
    }

    std::string p1 = "Hello";
    int p2 = 4;

    FnCall call;
    call.param(p1).param(p2).call<int>(&fn);
The major trick here is that the param member function is templated, so that we can extract some type information when setting up the call. To be able to manipulate a value of a type, we need the following:
- The size of the type.
- A way of copying instances of the type to new memory.
- A way of destroying instances of the type.
All of this can be obtained by some clever use of templates and function pointers. The param function is implemented like this:
    struct Param {
        void (*copy)(const void *, void *);
        void (*destroy)(void *);
        const void *value;
        nat size;
    };

    template <class T>
    FnCall &param(const T &p) {
        Param par = {
            &FnCall::copy<T>,
            &FnCall::destroy<T>,
            &p,
            sizeof(T)
        };
        params.push_back(par);
        return *this;
    }
What the function does is save the needed type information into the Param struct, along with a reference to the actual value we shall pass to the function. Since I have chosen to keep a reference to the parameter passed to the param function, you cannot do things like this:
    FnCall call;
    call.param(std::string("foo")). ...;
since the temporary std::string will have been destroyed by the time we actually call the function. We could copy the object as well, but that would make the implementation slightly more complex and require an extra copy. I will probably implement this later, when an inter-thread call is needed.
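For reference, such a copying variant could look roughly like this (just a sketch of the idea; copyParam is a made-up name, and freeing the copied memory afterwards is left out):

    template <class T>
    FnCall &copyParam(const T &p) {
        // Copy-construct the parameter into memory we own, so that temporaries
        // stay valid until the call is made. Running the destructor and freeing
        // the memory afterwards is omitted here.
        void *mem = new byte[sizeof(T)];
        new (mem) T(p);

        Param par = { &FnCall::copy<T>, &FnCall::destroy<T>, mem, sizeof(T) };
        params.push_back(par);
        return *this;
    }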
The functions copy and destroy are declared like this:
    template <class T>
    static void copy(const void *from, void *to) {
        const T *f = (const T *)from;
        new (to) T(*f);
    }

    template <class T>
    static void destroy(void *obj) {
        T *o = (T *)obj;
        o->~T();
    }
They are simply wrappers around the copy-constructor and the destructor of a type. As far as I am aware, it is not possible to take the address of a constructor. Therefore we need this wrapper, which performs a placement new on some memory.
Now, there is only one tricky thing left: the return value. How is the return value passed (assuming the cdecl calling convention)? The answer is sadly quite complicated:
- If we return a number that fits in an int, or a pointer or reference, it is returned in eax.
- If we return a 64-bit number, it is returned in eax and edx.
- If we return a floating-point number, it is returned on the floating point stack.
- User-defined types (even those small enough to fit in a register) are returned by letting the calling function allocate space for the value on the stack and pass the address of this memory as the first parameter to the function (sketched below).
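To make the last case concrete, this is roughly what that transformation looks like (a sketch of the idea, not the exact code any compiler emits):

    #include <new>
    #include <string>

    // What we write:
    std::string makeGreeting(int n) {
        return std::string(n, '!');
    }

    // Roughly how the call "std::string s = makeGreeting(3);" is lowered:
    // the caller provides storage and passes its address as a hidden first parameter.
    void makeGreeting_lowered(std::string *result, int n) {
        new (result) std::string(n, '!'); // construct directly into the caller's storage
    }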
Sadly, we need to take care of all these cases, but the API to the outside can still be quite clean, since we can use templates and overloading to figure out what we should do. For primitive types, we can simply do something similar to this:
    template <class T>
    T call(void *fn);

    template <>
    int call(void *fn);

    template <>
    int64 call(void *fn);

    // and so on.
The trick is that we also need to know whether T is a pointer or a reference. This is not (as far as I know) doable using simple template overloads, i.e. the following is not valid and will generate a compilation error:
    template <class T>
    T call(void *fn);

    template <class Z>
    Z *call(void *fn);
Instead we can do this with a helper template struct. Partial specialization for structs and classes is more powerful than for functions in C++ (don't ask me why...). Therefore, we can do it like this:
    template <class T>
    struct TypeInfo {
        static inline bool reference() { return false; }
        static inline bool pointer() { return false; }
    };

    template <class T>
    struct TypeInfo<T *> {
        static inline bool reference() { return false; }
        static inline bool pointer() { return true; }
    };

    template <class T>
    struct TypeInfo<T &> {
        static inline bool reference() { return true; }
        static inline bool pointer() { return false; }
    };

    template <class T>
    T call(void *fn) {
        if (TypeInfo<T>::pointer() || TypeInfo<T>::reference()) {
            // pointer is returned in eax
        } else {
            // we need to allocate memory and add another parameter
        }
    }
We get a struct TypeInfo with two member functions. The functions tell us whether T is a pointer, a reference, or neither. This is a constant expression at compile time, so the compiler will most likely optimize our if-statement away. In C++11 we would use constexpr as well.
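For comparison, a C++11 version could look roughly like this (a sketch using the standard type traits instead of the hand-rolled TypeInfo; the actual call and return are elided, just as above):

    #include <type_traits>

    template <class T>
    T call(void *fn) {
        // With C++11 these are compile-time constants, so the branch disappears entirely.
        if (std::is_pointer<T>::value || std::is_reference<T>::value) {
            // pointer is returned in eax
        } else {
            // we need to allocate memory and add another parameter
        }
    }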
When we have come this far, we can simply implement the different cases according to this pattern:
    void FnCall::call(int *result, void *fn) {
        nat sz = paramsSize();
        void *stackTop;
        __asm {
            sub esp, sz;
            mov stackTop, esp;
        }
        copyParams(stackTop);
        __asm {
            call fn;
            mov ebx, result;
            mov [ebx], eax;
            add esp, sz;
        }
    }
What the code does is first allocate some memory on the stack, much like alloca does. We also remember the new top of the stack. Since the stack grows towards lower addresses on x86, this is the first address we can write parameters to. After that we copy the parameters to the stack, call the function, and store the return value in the integer that result points to.
Copying the parameters is not too difficult now that we have our type information. We simply run each parameter's copy function on the stack memory we allocated. We just need to make sure to align each object to 4 bytes. It can be done like this:
    void FnCall::copyParams(void *to) {
        byte *at = (byte *)to;
        for (nat i = 0; i < params.size(); i++) {
            Param &p = params[i];
            nat s = roundUp(p.size, sizeof(void *));
            (*p.copy)(p.value, at);
            at += s;
        }
    }
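roundUp is not shown above; a minimal version that matches its use here (rounding a size up to the nearest multiple of the pointer size) could look like this, with nat mirroring the typedef used elsewhere:

    typedef unsigned int nat;

    // Round x up to the nearest multiple of "to".
    nat roundUp(nat x, nat to) {
        return ((x + to - 1) / to) * to;
    }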
With all that in place, we can finally do things like this in C++:
    int foo(int a, int b) {
        return a + b;
    }

    int main() {
        int p1 = 1, p2 = 2;

        FnCall call;
        int r = call.param(p1).param(p2).call<int>(&foo);
        return r;
    }
However, we have not yet used the destroy function; see Exploring function calls.