
Extending (hacking) C++

Since I am planning to write a highly extensible language, I feel that it is important to be able to easily extend classes written in C++ from Storm. This will be useful when defining custom languages, for example when defining type lookup and similar things. It requires that the type system used in Storm is aware of the C++ classes and member functions, so that we can properly call and extend classes originally implemented in C++. The other way around is more difficult, since C++ is not going to be magically aware of the type system I have just implemented. This means that we have to base the Storm type system at least partially on the C++ types. However, C++ does not let us do this easily, and especially not at run-time (this is why we need Storm, it will allow such things!). It is fairly easy to extend C++ objects with more data fields: you simply allocate more memory than needed and put your data after the object, much like this:

struct Extended {
  Base base;
  int extra;
};

This is actually exactly how single inheritance works on my C++ compiler. In this case, a pointer to an Extended will work perfectly as a substitute for a pointer to Base, since base is at offset 0 in the struct. The same thing can also be done dynamically, like this:

void *memory = new byte[sizeof(Base) + sizeof(int)];
Base *base = new (memory) Base();

Then we just need to keep track of which instances of Base actually contain some extra space for the integer. This is not a huge problem either, since we can either rely on the type system of the implemented language, or mark extended types in some common base class, as sketched below.
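A minimal sketch of the marker approach (the class and helper names here are made up for illustration):

// Sketch: a flag in a common base class tells us whether the allocation has
// room for extra data, which is stored directly after the C++ object.
class Marked {
public:
  Marked() : extended(false) {}
  bool extended; // set by the code that allocated the extra space
};

// Hypothetical helper: locate the extra integer stored after the object.
int *extraData(Marked *object) {
  if (!object->extended)
    return 0;
  return (int *)((char *)object + sizeof(Marked));
}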

The tricky part is that we cannot easily override virtual members of Base. Being able to do so is vital if we want to extend or alter the behavior of the compiler when implementing another language. As I see it, we have three options:

  1. Find a way to extend the compiler without inheritance, for example by passing function pointers, and make sure to expose this kind of interface in many places in the compiler.
  2. Implement a custom dispatch mechanism for virtual functions and use that for all current virtual function calls.
  3. Find a way to exploit what we know about how the compiler does things.

As far as I can see, options 1 and 2 are not really good, since they require extra code and potentially sacrifice performance, readability and type-checking. For example, option 2 would require me to identify and replace (almost) all virtual function calls with something like this:

Function *fn = this->getFunction("foo", "Int", "Int");
fn->call(a, b);

Where getFunction, or something similar, performs the function lookup. Notice that the identifier we actually have is the string representation, not the actual vtable offset. This requires the runtime to look up the function each time we want to call a function on an object, since it may have been overridden by Storm. Of course it is possible to optimize this, but that adds even more boilerplate and possibly sacrifices even more of the type-checking done by the C++ compiler. This is why I looked into how the compiler implements virtual function calls.

Virtual function calls

To implement virtual function calls, the compiler uses something called a vtable. This table is unique to each type and can be thought of as a way to see which type an object actually is. The vtable usually contains at least some type identifier and a list of function pointers. These function pointers point to the functions to call in the case of a virtual function call. For example:

class Base {
public:
  virtual ~Base() {}
  virtual int value() { return 1; }
};

class Derived : public Base {
public:
  virtual int value() { return 2; }
};

// In some function:
Base *object;
object->value(); // Which function to call?

The function call on the last line is implemented as a virtual function call, since the compiler does not know at compile-time which type object will actually point to. This is the problem that vtables solve. When a class contains at least one virtual function, the compiler adds a hidden pointer to the vtable. This vtable will then contain a pointer to the correct version of value to use. It is implemented approximately like this:

// Pseudo code: member function pointers are not really convertible to void *.
void *base_vtable[] = { &Base::value };

class Base {
public:
  void **vtable;
  Base() { vtable = base_vtable; }
  int value() { return 1; }
};

void *derived_vtable[] = { &Derived::value };

class Derived : public Base {
public:
  Derived() { vtable = derived_vtable; }
  int value() { return 2; }
};

// In some function:
Base *object;
(*object->vtable)(); // call the function (typecasts omitted for readability).

So the basic idea is that if we somehow get hold of the vtable for a type, we can make a copy of it and alter it any way we like. When we then want to create an instance of a class partly defined in Storm, we can simply create the C++ object as above, possibly allocating extra memory for it, and then replace the vtable of the object with our own version.
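At the lowest level, reading or replacing the vtable pointer of an object is simple. A sketch, assuming the hidden vtable pointer is the first pointer-sized field of the object, as in the pseudo code above:

// Sketch: read and replace the hidden vtable pointer, assumed to be stored
// first in the object (as in the pseudo code above).
void *getVTable(void *object) {
  return *(void **)object;
}

void setVTable(void *object, void *vtable) {
  *(void **)object = vtable;
}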

One way of obtaining the vtable would be to wait for an object of the correct type to be created, and read the vtable pointer from that instance. However, since this requires an instance of an actual object, it is not optimal. Can we do better?

If we compile a simple source file containing some virtual functions with cl /FAcs test.cpp, we get the following output (I have altered some symbol names for readability):

BASE_CLASS_DESC DD FLAT:TYPE_DESC        ; VT::`RTTI Base Class Descriptor at (0,-1,0,64)'
    DD    00H
    DD    00H
    DD    0ffffffffH
    DD    00H
    DD    040H
    DD    FLAT:HIERARCHY
rdata$r    ENDS
;    COMDAT BASE_CLASS_ARR
rdata$r    SEGMENT
BASE_CLASS_ARR DD    FLAT:BASE_CLASS_DESC        ; VT::`RTTI Base Class Array'
rdata$r    ENDS
;    COMDAT HIERARCHY
rdata$r    SEGMENT
HIERARCHY DD    00H                    ; VT::`RTTI Class Hierarchy Descriptor'
    DD    00H
    DD    01H
    DD    FLAT:BASE_CLASS_ARR
rdata$r    ENDS
;    COMDAT TYPE_DESC
_DATA    SEGMENT
TYPE_DESC DD FLAT:??_7type_info@@6B@        ; VT `RTTI Type Descriptor'
    DD    00H
    DB    '.?AVVT@@', 00H
_DATA    ENDS
;    COMDAT OBJECT_LOCATOR
rdata$r    SEGMENT
OBJECT_LOCATOR DD    00H                    ; VT::`RTTI Complete Object Locator'
    DD    00H
    DD    00H
    DD    FLAT:TYPE_DESC
    DD    FLAT:HIERARCHY
rdata$r    ENDS
;    COMDAT VTABLE
CONST    SEGMENT
VTABLE  DD    FLAT:OBJECT_LOCATOR            ; VT::`vftable'
    DD    FLAT:VEC_DTOR
    DD    FLAT:VALUE_FN
; Function compile flags: /Odtp
CONST    ENDS

This is the interesting part of the output file, where the compiler defines the virtual function table, along with its symbol name (see VTABLE here). How convenient! It turns out that the name of the vtable follows a fairly simple pattern, ??_7Class@namespace@namespace@@6B@, where Class and the namespaces are the reversed full name of the class itself. We can also see that the virtual function table starts with something called an object locator, then contains a pointer to the destructor, followed by the single virtual function in the class. I do not know why, but it turns out that the VTABLE symbol actually points to the second DD element, so the object locator is at VTABLE[-1] and the destructor at VTABLE[0].
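In other words, given the vtable pointer we can index it like this (a small sketch):

// Sketch: VTABLE points at the first function slot, and the RTTI complete
// object locator lives one slot before it.
void *objectLocator(void **vtable) {
  return vtable[-1];
}

void *destructorSlot(void **vtable) {
  return vtable[0]; // VEC_DTOR in the listing above
}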

Now, how do we access a generic symbol like this from C++? The simple answer is that we do not, since the C++ name mangling is designed not to accidentally collide with the vtable symbol. This is where we have to resort to .asm files! We can generate a file with functions accessible from C++ that return the symbol's value to us. This is where I use the preprocessor that I described earlier.

.386
.model flat, c

.data

??_7Object@storm@@6B@ proto syscall
cppVTable_storm__Object proto

.code

cppVTable_storm__Object proc
    mov eax, ??_7Object@storm@@6B@
    ret
cppVTable_storm__Object endp

end

The two lines after the .data directive declare the symbol we want and the function we export to C++. I think this declares the vtable pointer as a function with the syscall calling convention, which is not too nice, but since both are pointer-sized it works anyway!

The small function in the .code section simply returns the value of this symbol in eax, as defined by the cdecl calling convention. Now we can simply declare the function in C++ and use it:

extern "C" void *cppVTable_storm__Object();

Now we just need to copy the vtable, and to do that we need to know its size. This is a little tricky, since the compiler never intended anyone to actually read and modify the vtable at runtime, so it does not include a size field (at least not one that I am aware of). Instead, we exploit our knowledge that the vtable is a table of pointers (we saw that in the asm listing earlier). Therefore we can write the following functions:

// Heuristic: does 'addr' look like an RTTI complete object locator, i.e. the
// start of a new vtable? Its first two members are small integers, not pointers.
bool isNewVTable(void *addr) {
  nat *p = (nat *)addr;
  if (p[0] < 0xFF && p[1] < 0xFF)
    return true;
  return false;
}

nat vtableSize(void *vtable) {
  void **table = (void **)vtable;
  assert(readable(table));

  // Find the size by looking at each address.
  nat size = 1;
  while (readable(table + size)) {
    if (!readable(table[size]))
      return size;

    if (isNewVTable(table[size]))
      return size;

    size++;
  }

  return size - 1;
}

Where readable is a small function that asks the operating system whether the memory at the given address is readable. The problem is that we may think the vtable is longer than it is if the linker decides to pack vtables next to each other. This is why I have a small heuristic for detecting the start of a new vtable: the function isNewVTable checks whether addr is probably pointing to a complete object locator like the one we saw in the asm listing, since a pointer to such a locator is the first element of a new vtable. It relies on the fact that the first two integer values there are usually much smaller than pointers. We may still fail, but at least we are overestimating the size. When I tested this, it seemed like at least some vtables end with a 0x0 pointer, which is convenient.
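The readable function itself is not shown in this post. A minimal sketch on Windows could ask VirtualQuery about the page containing the address (this is an assumption about the implementation, not the actual Storm code):

#include <windows.h>

// Sketch: ask the OS about the page containing 'addr' and check that it is
// committed and readable.
bool readable(const void *addr) {
  MEMORY_BASIC_INFORMATION info;
  if (VirtualQuery(addr, &info, sizeof(info)) == 0)
    return false;
  if (info.State != MEM_COMMIT)
    return false;
  DWORD ok = PAGE_READONLY | PAGE_READWRITE | PAGE_EXECUTE_READ
           | PAGE_EXECUTE_READWRITE | PAGE_WRITECOPY | PAGE_EXECUTE_WRITECOPY;
  return (info.Protect & ok) != 0 && (info.Protect & PAGE_GUARD) == 0;
}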

Now we know the start address and the size of the vtable. We can simply copy the vtable to new memory and then replace the vtable pointer of some object with our new copy, roughly as sketched below.
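A sketch of the copy step, reusing vtableSize from above (nat is an unsigned integer typedef used throughout this code):

#include <cstring>

// Sketch: copy a vtable, including the object locator at index -1, and return
// a pointer with the same layout as the original vtable pointer.
void **copyVTable(void *original) {
  void **src = (void **)original;
  nat count = vtableSize(original);
  void **mem = new void *[count + 1];   // one extra slot for the object locator
  memcpy(mem, src - 1, (count + 1) * sizeof(void *));
  return mem + 1;
}

// Usage, together with the earlier sketch:
// setVTable(object, copyVTable(getVTable(object)));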

To do something useful, we also need to be able to modify the copy. How do we know which slot the compiler has assigned to each function, without reverse-engineering the logic the compiler uses (which would involve parsing much of C++, which is hard)? The answer lies in pointers to members. In C++, when you create a pointer to a virtual member function, the resulting pointer will actually respect overridden functions. For example, consider the example classes above along with this code:

// create a pointer to the value function.
int (Base::*ptr)() = &Base::value;
Derived *d = new Derived();
(d->*ptr)(); // Will call Derived::value()

This means that whatever ptr points to must know which index in the virtual function table to look at to find the specified function! What if we read the machine code and figure it out?

It turns out that for cdecl functions, the code that ptr points to looks like this:

8B 44 24 04: mov eax, DWORD PTR[esp + 4]
8B 00      : mov eax, DWORD PTR[eax]
FF 60 XX   : jmp DWORD PTR[eax + XX]

Where XX is the offset in bytes into the vtable we should look at! Simple: make sure we are actually reading the right machine code, and then read XX to find the offset. It feels good to write code that can issue the error "Unexpected machine code"...

However, we are not quite there yet. In debug mode, the Visual Studio compiler does a trick to simplify incremental compilation (yes, it supports re-compiling code while debugging to some extent). Somewhere in the binary, there is a big table of where each function is actually located. This is used so that when a function is recompiled, and therefore has to be moved, only the table needs to be modified, not everything referring to the function. To improve performance, the table actually consists of machine code, jmp instructions to be precise. This means that our pointer-to-member will only give us a jmp instruction that we need to follow before we get to the interesting machine code. This is quickly done: all we have to do is check whether the first byte of the function is E9 (a relative jmp) and, if so, read the relative offset and follow it.
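A sketch of how this could look in code, assuming the exact instruction sequences shown above (the helper names are made up):

#include <cstring>

typedef unsigned char byte;

// Follow a possible E9 (relative jmp) thunk used in incremental builds.
const byte *followThunk(const byte *code) {
  if (code[0] == 0xE9) {
    int rel = *(const int *)(code + 1);
    return code + 5 + rel; // the target is relative to the end of the instruction
  }
  return code;
}

// Extract the vtable offset (in bytes) encoded in the machine code that a
// pointer-to-member points to. 'fn' is the raw code address of the member.
nat vtableOffset(const void *fn) {
  const byte *code = followThunk((const byte *)fn);
  // Expect: 8B 44 24 04, 8B 00, FF 60 XX (see the listing above).
  static const byte expected[] = { 0x8B, 0x44, 0x24, 0x04, 0x8B, 0x00, 0xFF, 0x60 };
  if (memcmp(code, expected, sizeof(expected)) != 0)
    return 0; // "Unexpected machine code"
  return code[8]; // XX: the byte offset into the vtable
}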

Now we know the offset into the vtable and can therefore replace that entry with a function of our own. The replacement does not even have to be a member of the class, as long as it takes the right number of parameters and uses the right calling convention. It can be used like this, for example:

/**
 * Test class for vtables.
 */
class VTest : public Object {
  STORM_CLASS;
public:
  VTest();
  ~VTest();

  virtual int STORM_FN returnOne();
  virtual int STORM_FN returnTwo();
};

int STORM_FN replaceOne(VTest *me) {
  return 3;
}

int STORM_FN replaceTwo(VTest *me) {
  return 4;
}

BEGIN_TEST(VTableTest) {
  code::VTable table(VTest::cppVTable());

  VTest *test = new VTest();
  // Replace the vtable with our copy
  table.setTo(test);

  // Does it still work?
  CHECK_EQ(test->returnOne(), 1);
  CHECK_EQ(test->returnTwo(), 2);

  // Replace two functions with the ones above
  table.set(address(&VTest::returnOne), address(&replaceOne));
  table.set(address(&VTest::returnTwo), address(&replaceTwo));

  // Now they act differently!
  CHECK_EQ(test->returnOne(), 3);
  CHECK_EQ(test->returnTwo(), 4);

  delete test;

} END_TEST

The macro STORM_FN expands to __cdecl to set the correct calling convention; it is also used to inform my pre-processor that the function is to be exported to Storm. The STORM_CLASS macro declares the cppVTable function and tells my pre-processor to generate the asm code that extracts the vtable pointer.
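As a rough sketch (the STORM_FN expansion is stated above; the STORM_CLASS expansion below is only a guess at its general shape, since the real definition cooperates with the pre-processor and the generated .asm stubs):

// Approximation only, not the actual Storm definitions.
#define STORM_FN __cdecl

#define STORM_CLASS \
  public: \
    static void *cppVTable()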

It actually works, which is quite amazing!

Dynamic calls

Another feature that is needed when trying to execute the syntax (as explained in It's alive!) is dynamic function calls. By this, I mean that the compiler needs to be able to call an arbitrary function with an arbitrary number of parameters. The number of parameters is not known at compile time, so we need some kind of mechanism to handle this. Java has built-in support for this through its reflection API:

// Initialized from the parsed syntax tree:
Class[] paramTypes = new Class[]{String.class, Integer.class};
Object[] paramValues = new Object[]{"Hello", 4};
MyClass obj = ...;

// What is done to call the function:
Class<MyClass> c = MyClass.class;
Method m = c.getMethod("foo", paramTypes);
m.invoke(obj, paramValues);

However, this is not possible in C++ without some creative use of inline assembly. It is worth noting that function calls in C++ are more complex than in Java, since C++ allows arbitrary types to be passed by value, while Java passes everything except primitive types by reference. Therefore, in C++ we cannot just pass an array of void * in the general case, since that would only let us call a subset of the implemented functions. Instead I chose the following API:

int fn(std::string a, int b) {
  return b;
}

std::string p1 = "Hello";
int p2 = 4;

FnCall call;
call.param(p1).param(p2).call<int>(&fn);

The major trick here is that the param member function is templated, so that we can extract some type information when doing the call. To be able to manipulate a value of a type, we need the following:

  • The size of the type.
  • A way of copying instances of the type to new memory.
  • A way of destroying instances of the type.

All this can be obtained by some clever use of templates and function pointers. The param function is implemented like this:

struct Param {
  void (*copy)(const void *, void *);
  void (*destroy)(void *);
  const void *value;
  nat size;
};

template <class T>
FnCall &param(const T &p) {
  Param par = { &FnCall::copy<T>, &FnCall::destroy<T>, &p, sizeof(T) };
  params.push_back(par);
  return *this;
}

What the function does is save the needed type information into the Param struct, along with a pointer to the actual value we shall pass to the function.

Since I have chosen to keep only a pointer to the parameter passed to the param function, you cannot do things like this:

FnCall call;
call.param(std::string("foo")). ...;

since the temporary std::string will have been destroyed by the time we actually call the function. We could copy the object as well, but that would make the implementation slightly more complex and require an extra copy. I will probably implement this later, when inter-thread calls are needed (a rough sketch of such a copying variant is shown below).
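A minimal sketch of the copying variant (hypothetical, not part of the current implementation; it uses <cstdlib> and <new>):

// Hypothetical: copy the parameter into memory owned by FnCall, so that
// temporaries are safe to pass. The owned copies must be destroyed and freed
// when the FnCall object dies (not shown here).
template <class T>
FnCall &paramCopy(const T &p) {
  void *mem = malloc(sizeof(T));
  new (mem) T(p); // copy-construct into our own storage
  Param par = { &FnCall::copy<T>, &FnCall::destroy<T>, mem, sizeof(T) };
  params.push_back(par);
  return *this;
}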

The functions copy and destroy are declared like this:

template <class T>
static void copy(const void *from, void *to) {
  const T* f = (const T*)from;
  new (to) T(*f);
}

template <class T>
static void destroy(void *obj) {
  T *o = (T *)obj;
  o->~T();
}

They are simply wrappers around the copy constructor and the destructor of a type. As far as I am aware, it is not possible to take the address of a constructor directly; therefore we need this wrapper, which performs a placement new into some memory.

Now there is only one tricky thing left: the return value. How is the return value passed (assuming the cdecl calling convention)? The answer is sadly quite complicated:

  • if we return a number the size of an int or smaller, or a pointer or reference, it is returned in eax.
  • if we return a 64-bit number, it is returned in eax and edx.
  • if we return a floating-point number, it is returned on the floating-point stack.
  • user-defined types (even those small enough to fit in a register) are returned by letting the calling function allocate space for the value on the stack and pass the address of this memory as the first parameter to the function.

Sadly, we need to take care of all these cases, but the API to the outside can still be quite clean, since we can use templates and overloading to figure out what we shall do. For primitive types, we can simply do something similar to this:

template <class T>
T call(void *fn);

template <>
int call(void *fn);

template <>
int64 call(void *fn);

// and so on.

The trick is that we also need to know whether T is a pointer or a reference. This is not (as far as I know) doable using simple template overloads; i.e., the following is not valid and will generate a compilation error:

template <class T>
T call(void *fn);

template <class Z>
Z *call(void *fn);

Instead, we can do this with a helper template struct. Partial specialization is more powerful for structs and classes than for functions in C++ (don't ask me why...). Therefore, we can do something like this:

template <class T>
struct TypeInfo {
  static inline bool reference() { return false; }
  static inline bool pointer() { return false; }
};

template <class T>
struct TypeInfo<T *> {
  static inline bool reference() { return false; }
  static inline bool pointer() { return true; }
};

template <class T>
struct TypeInfo<T &> {
  static inline bool reference() { return true; }
  static inline bool pointer() { return false; }
};

template <class T>
T call(void *fn) {
  if (TypeInfo<T>::pointer() || TypeInfo<T>::reference()) {
    // pointer is returned in eax
  } else {
    // we need to allocate memory and add another parameter
  }
}

We get a struct TypeInfo with two member functions. The functions tell us whether T is a pointer, a reference, or neither. The result is a constant expression at compile time, so the compiler will most likely optimize our if-statement away. In C++11 we would use constexpr as well.
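For reference, the C++11 version hinted at could look like this (a sketch; this project targets an older compiler):

// C++11 sketch of the same idea: constexpr makes the result usable in constant
// expressions (the specializations for T * and T & follow the same pattern).
template <class T>
struct TypeInfo {
  static constexpr bool reference() { return false; }
  static constexpr bool pointer() { return false; }
};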

When we have come this far, we can simply implement the different cases according to this pattern:

void FnCall::call(int *result, void *fn) {
  nat sz = paramsSize();
  void *stackTop;
  __asm {
    sub esp, sz;
    mov stackTop, esp;
  }

  copyParams(stackTop);

  __asm {
    call fn;
    mov ebx, result;
    mov [ebx], eax;
    add esp, sz;
  }
}

What the code does is first allocate some memory on the stack, much like alloca does. We also remember the new top of the stack. Since the stack grows towards lower addresses on x86, the new stack pointer is the first address we can write parameters to. After that, we copy the parameters onto the stack, call the function, and store the return value in the integer that result points to.

Copying the parameters is not too difficult now that we have our type information. We simply run the copy function for each parameter on the stack memory we allocated, making sure to align each object to 4 bytes. It can be done like this:

void FnCall::copyParams(void *to) {
  byte *at = (byte *)to;
  for (nat i = 0; i < params.size(); i++) {
    Param &p = params[i];
    nat s = roundUp(p.size, sizeof(void *));
    (*p.copy)(p.value, at);
    at += s;
  }
}
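The helper paramsSize is not shown in this post; it would simply sum the rounded-up sizes, matching the layout used by copyParams (a sketch):

// Sketch: total stack space needed for the parameters, each one rounded up to
// pointer size, just like in copyParams above.
nat FnCall::paramsSize() {
  nat total = 0;
  for (nat i = 0; i < params.size(); i++)
    total += roundUp(params[i].size, sizeof(void *));
  return total;
}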

With all that in place, we can finally do things like this in C++:

int foo(int a, int b) {
  return a + b;
}

int main() {
  int p1 = 1, p2 = 2;
  FnCall call;
  int r = call.param(p1).param(p2).call<int>(&foo);
  return r;
}

However, we have not yet used the destroy function; see Exploring function calls.
