<< PreviousNext >>

Type lookup

The Storm compiler uses packages to organize types, much like Java and Python. This means that the Storm compiler has a hierarchical package representation in which the actual types (eg classes) are located. Currently, types may only be placed inside a package, which means that you can not place a type within a type as you can do in C++ and Java:

class Foo {
  class Bar {
  };
};

This is something I find usable in C++, but I hope that it will be possible to make use of packages to solve this in another way. It is still not rule out completely, but it makes the compiler simpler for now.

The benefits of packages (and in C++, namespaces) is, aside from better organization of code, that you do not have to specify the entire name if you are in the same scope. For example, when you are writing code in the lang.test package, you do not have to write lang.test.Example to access the Example type within. Instead you can simply write Example, and the compiler will figure out what you mean since it knows you are working in lang.test. How do we implement this? Looks simple, right?

The reason I am discussing the type-lookup issue now is that I have recently realized that my initial approach had some flaws and was far too difficult to debug and have rewritten it.

Initial attempt

My initial attempt was based around the idea that you have something called a Scope that represents a position in the type tree. You will then ask the Scope for a name that you have found in the source code and it will look it up for you. A Name is a dot-separated string without whitespace, which the Scope uses to look up a Named object in the compiler.

Since we want to support overloading functions, two functions may have the same name, but a different type signature. Therefore functions can not be simple Named objects, the Named object is instead a collection of NameOverload objects that we can use.

For example, the type tree may look something like this:

Package (lang)
\-Package (test)
  \-Type (Example)
    |-NameOverload (bar)
    | |- Function (bar(Int))
    | \- Function (bar(String))
    \-NameOverload (toString)
      \- Function(toString())

The idea here was to implement the Scope interface on relevant classes in the hierarchy (in this case Package, and Type is enough). The Scope class would then contain a pointer to the previous scope to search, as well as an implementation of find, that tries to find the Name in the current scope first by calling findHere, and then follow the parent pointer, like this:

Named *Scope::find(Name name) {
  Named *found = findHere(name);
  if (parent)
    return parent->find(name);
  else
   return null;
}

If you do it like this, then you get some bad behaviour of the type lookup. For example, if you write foo.Bar in lang.test, it could mean either lang.test.foo.Bar, lang.foo.Bar or foo.Bar. However, the lang.foo.Bar option is not what you would expect. You expect either something relative to your current position or to the root position, not something in between.

This can of course be solved in this attempt by setting up the parent pointers correctly. However, this will quickly be hard to debug because of all the pointers. It does not get easier if you want to consider cases where you want to implement imports, ie you want to add packages to the possible candidates. For example, I want the compiler to automatically include types from core.

These two issues made me realize that my initial attempt was not good enough and needed to be rewritten.

Second attempt

As a second attempt, i made the Scope to a simple class containing a table of different paths to search in the desired order. This made me introduce a NameLookup interface, which provides a find function (which replaces findHere). The Scope would then simply ask all lookup objects in turn for the name to be found and choose the first one.

Then, instead of letting Package and Type inherit from Scope, they will simply contain a Scope instance correctly set up for them.

This made it much easier to model more complex package lookups, but I still felt it was too difficult to debug, since usually objects would take the parent's Scope and remove some lookups to later add some other lookups. I realized that I had not gained too much in ease of debugging. I also realized that much of the complexity lies in that I am trying to define parts of the lookup while I am building the package and type tree, not when I am actually trying to look up a type. This could potentially be something bad when someone tries to make a language which relies on a different set of scope rules. Then I have more or less hard-coded my rules into the structure of the compiler, making it hard to implement something different on top.

This is what led me to reconsider once more and implement what I am now happy with.

Final attempt

The idea behind the last attempt is to store as little as possible in the Scope. What is really needed to be able to perform a type lookup? It is actually sufficient to have a pointer into our "current position" in the type tree along with a lookup function, which finds the type from that position. Building upon some concepts of the previous idea, our current position is a pointer to a NameLookup. To restrict the current position to a NameLookup makes the assumption that our type lookup will not depend on something that can not lookup types for us, which is only functions at the moment. Since this assumption makes the implementation much easier, it is something I am willing to make. The lookup function is then simply a virtual function in the Scope class. This allows implementations to choose their lookup to suit them, if needed.

However, the lookup function needs some way of traversing the type tree. This is solved by providing a parent function in the NameLookup interface. Note that this parent is always the direct parent of the object, which is much easier to get right than someting that is usually the direct parent, but not always.

What is left now is to write the logic for looking up types, I have chosen the current way right now:

Named *Scope::find(Name name) {
  for (NameLookup *at = top; at; at = nextCandidate(at)) {
    if(Named *found = at->find(name))
      return found;
  }

  // Is 'name' present in 'core'?
  if (Package *core = corePkg(top))
    if (Named *found = core->find(name))
      return found;

  return null;
}

NameLookup *nextCandidate(NameLookup *prev) {
  if (Package *pkg = as<Package>(prev)) {
    // Package: jump directly to root
    Package *root = rootPkg(pkg);

    // If we're already at the root, we're done.
    if (pkg == root)
      return null;
    else
      return root;
  } else {
    // Default: look at the nearest parent.
    return prev->parent();
  }
}

Package *rootPkg(NameLookup *l) {
  while (p->parent())
    p = p->parent();
  return p;
}

Package *corePkg(NameLookup *l) {
  while (!as<Package>(l))
    l = l->parent();
  Package *root = rootPkg(as<Package>(l));
  return as<Package>(root->find(Name(L"core")));    
}

// as<X>(x) is dynamic_cast<X*>(x)

And this is what I think is the beauty of this solution. Almost the entire type lookup is defined in around 40 lines of code. Aside from also being easier to debug, it also has the added bonus of being extensible. In the end, I am quite happy with the result.

Comments

New comment

You can use GitHub flavored markdown here. Parsed by Parsedown, which does not support all of GitHub's features. For example, specifying the language of code listings is not supported.

Name: