<< PreviousNext >>

Type system

What I have been working on the last week is to improve the implementation of the type system in Storm. I have also been working on exposing some of the core types of the compiler to the type system, so that they can be manipulated and extended from within Storm itself.

Outline of the current implementation

The type system is based on what is used in for example Java. This means that I have support for single inheritance (no interface support yet) where every class directly or indirectly inherits from the Object class. Aside from objects and inheritance, some degree of template or generic programming will also be supported, but this is not implemented yet.

One thing that will distinguish the type system in Storm from the one in Java, is that I plan to have support for two set of types: classes and values. Classes will work like in Java: they will have pass by reference semantics and are always allocated on the heap. Values will on the other hand always be stack allocated and therefore they use call by value semantics instead. Values are usable when you have small types, where the memory and access overhead of heap allocation and inheritance become large in comparision to the data itself.

The motivation behind this separation into values and classes are based on an insight from Göran Rydqvist. In C++, the default behaviour when calling a function is call by value. This works well when the parameters are small enough, or when you need a copy anyway. What you usually end up having to do is to remember to pass large objects by reference explicitly every time. This is tedious to remember and write everytime, and the programmer has to remember which types are large and not. Since the decision between call by reference and call by value largely depends on the type itself, why not let the type decide by categorizing them into values and classes? Values are not yet implemented.

From an implementation perspective, the type system consists of four core classes in C++:

  • Engine, which is the root object that contains the entire world of the compiler.
  • Package, which is used to represent the tree of packages in which all language constructs are located.
  • Type, which is used to hold information abot every type in Storm. It contains all the members of the type (functions and variables), and keeps track of the in-memory representation of the object. It will represent both classes and value types (for example Int, Float).
  • Object, is the root object of all classes in the system.

The Object class initially looked something like this:

class Object {
    STORM_CLASS;
public:
    // Initializes the refcount to 1.
    Object(Type *type) : type(type), refs(1) {}

    virtual ~Object();

    // The actual type of this object.
    Type *const type;

    inline void addRef() {
        if (this)
            refs++;
    }

    inline void release() {
        if (this)
            if (--refs == 0)
                delete this;
    }

private:
    nat refs;
};

The interesting part to notice here is that we need to provide a Type object everytime we need to create an object in order to be able to store the actual type of the object. We can not only rely on the type information from C++, since Storm's type system is designed to be extendable at runtime. The problematic part is that in order to get the correct Type object, we will have to go ask the runtime for the correct instance. This involves a few hashtable lookups, which are far to slow for simple object creation. Since I intend to use this type system for many types in the compiler, the most significant drawback is that it will get really ugly to write everywhere!

So we need something more clever... Let's use the fact that I have got a preprocessor to generate information about types declared in C++. The preprocessor generates information about all types marked with STORM_CLASS. We can abuse this and hide a few static function declarations in the STORM_CLASS macro. The implementations of these functions are generated by the preprocessor as well and looks something like this:

Type *storm::Object::getType(Engine &engine) { return engine.builtIn(2); }
// Convenience
Type *storm::Object::getType(Object *obj) { return obj->type->engine.builtIn(2); }

Where the builtIn function returns the built-in class with id 2. The preprocessor takes care of creating class ids and providing the runtime sufficient information to be able to populate the table backing the builtIn function.

Great! Now we just need this to create an instance of Foo:

class Foo : public Object {
public:
    Foo(Type *t) : Object(t) {}
};

{
    Foo *f = new Foo(Foo::getType(engine));
}

Which looks fairly ok to do. I am not completely happy yet. We still need to specify a constructor for all classes, even though they would not need one otherwise. What I decided to do was to move the Type parameter to operator new, like this:

class Foo : public Object {};

{
    Foo *f = new (Foo::getType(engine)) Foo();
    // or with a macro:
    Foo *f = CREATE(Foo, engine);
}

Which is much cleaner from most points. How does the operator new look? How do we manage to pass the Type to the constructor of Object through operator new? The answer is that we can utilize the fact that C++ does not write anything to the memory of simple data types (like pointers) before the programmer does so explicitly. Therefore, we can write to the myType field of the allocated memory before we return it from operator new. When the Object constructor is executed later, the value will be set correctly.

void Object::operator new(size_t size, Type *type) {
    size_t s = type->size();

    // Sanity check! If 'type' is overridden runtime, s may be larger than size.
    // if s is smaller than size, then we have forgotten to declare a C++-implemented
    // subclass of 'type', which will lead to bad things!
    assert(size <= s);

    void *mem = malloc(s);
    size_t offset = OFFSET_OF(Object, type);
    OFFSET_IN(mem, offset, Type *) = type;

    return mem;
}

The OFFSET_OF macro computes the offset of the type variable inside Object. OFFSET_IN(base, offset, type), is just simplification of the pointer magic required to read or write type at base + offset bytes.

This is how the type system looks right now. Now to the next problem:

Exposing core types

When I worked on exposing core types to the type system itself, I came across something interesting when exposing the Type class. Since every object in the type system has to inherit from Object, and every Object instance needs a Type: how do we create the initial Type instance? When creating the Type object, we need to provide a Type object representing the type, but since we can not create any Types before, we are stuck, right?

With some magic, we can actually solve this! The important observations here is that the Type object describing the layout of Type (from here referred to as the Type type) will have its type pointer set to the object itself, and that Object does not do anything with type in the constructor. The second observation is important because it allows us to pass an incomplete object as type, as long as we make sure to initialize it properly later. Since type is the object we are currently initializing, we will initialize it very soon afterwards.

How can we implement this? Using the placement new operator makes it quite simple actually:

Type *createTypeType() {
    void *mem = malloc(sizeof(Type));
    size_t offset = OFFSET_OF(Object, type);
    OFFSET_IN(mem, offset, Type *) = (Type *)mem;
    return new (mem) Type();
}

This function is similar to what we did in operator new for Object above, but instead of asking for a separate type pointer, we assume it is to be set to the memory we are currently allocating. This is a little tricky, but actually works, and as long as we make sure to create the type type first, we can create the rest of the types as we desire.

The next problem occurred while exposing the Package class. The code that loaded the built in classes also put them into their respective packages at the same time. However, this does not work, as we need to create the Package type before we can create even the root package. If I had not implemented the table of built-in types earlier, this would not have been easily solveable since we would need to have packages to find the package class. In the current implementation it is actually enough to first populate the table of built-in types and then insert the classes to their respective package. An interesting artifact of this method is that the type inheritance is not fixed until after some objects have been created since the current implementation looks up the superclass by name, not by id. At the moment this does not cause any problems since the layout of built-in types is already decided by the C++ compiler.

So now we can really manipulate the type system of Storm from within Storm!

Comments

New comment

You can use GitHub flavored markdown here. Parsed by Parsedown, which does not support all of GitHub's features. For example, specifying the language of code listings is not supported.

Name: