Type lookup
The Storm compiler uses packages to organize types, much like Java and Python. This means that the Storm compiler has a hierarchical package representation in which the actual types (eg classes) are located. Currently, types may only be placed inside a package, which means that you can not place a type within a type as you can do in C++ and Java:
class Foo { class Bar { }; };
This is something I find usable in C++, but I hope that it will be possible to make use of packages to solve this in another way. It is still not rule out completely, but it makes the compiler simpler for now.
The benefits of packages (and in C++, namespaces) is, aside from better organization
of code, that you do not have to specify the entire name if you are in the same scope.
For example, when you are writing code in the lang.test
package, you do not have
to write lang.test.Example
to access the Example
type within. Instead you can
simply write Example
, and the compiler will figure out what you mean since it
knows you are working in lang.test
. How do we implement this? Looks simple, right?
The reason I am discussing the type-lookup issue now is that I have recently realized that my initial approach had some flaws and was far too difficult to debug and have rewritten it.
Initial attempt
My initial attempt was based around the idea that you have something called a Scope
that represents a position in the type tree. You will then ask the Scope
for
a name that you have found in the source code and it will look it up for you.
A Name
is a dot-separated string without whitespace, which the Scope
uses
to look up a Named
object in the compiler.
Since we want to support overloading functions, two functions may have the
same name, but a different type signature. Therefore functions can not be
simple Named
objects, the Named
object is instead a collection of
NameOverload
objects that we can use.
For example, the type tree may look something like this:
Package (lang) \-Package (test) \-Type (Example) |-NameOverload (bar) | |- Function (bar(Int)) | \- Function (bar(String)) \-NameOverload (toString) \- Function(toString())
The idea here was to implement the Scope
interface on relevant classes
in the hierarchy (in this case Package, and Type is enough). The Scope
class would then contain a pointer to the previous scope to search, as
well as an implementation of find
, that tries to find the Name
in
the current scope first by calling findHere
, and then follow the
parent
pointer, like this:
Named *Scope::find(Name name) { Named *found = findHere(name); if (parent) return parent->find(name); else return null; }
If you do it like this, then you get some bad behaviour of the type lookup.
For example, if you write foo.Bar
in lang.test
, it could mean either
lang.test.foo.Bar
, lang.foo.Bar
or foo.Bar
. However, the lang.foo.Bar
option is not what you would expect. You expect either something relative
to your current position or to the root position, not something in between.
This can of course be solved in this attempt by setting up the parent
pointers
correctly. However, this will quickly be hard to debug because of all the pointers.
It does not get easier if you want to consider cases where you want to implement
imports, ie you want to add packages to the possible candidates. For example, I
want the compiler to automatically include types from core
.
These two issues made me realize that my initial attempt was not good enough and needed to be rewritten.
Second attempt
As a second attempt, i made the Scope
to a simple class containing a table
of different paths to search in the desired order. This made me introduce
a NameLookup
interface, which provides a find
function (which replaces
findHere
). The Scope
would then simply ask all lookup objects in turn for
the name to be found and choose the first one.
Then, instead of letting Package
and Type
inherit from Scope
, they will
simply contain a Scope
instance correctly set up for them.
This made it much easier to model more complex package lookups, but I still felt
it was too difficult to debug, since usually objects would take the parent's Scope
and remove some lookups to later add some other lookups. I realized that I had
not gained too much in ease of debugging. I also realized that much of the complexity
lies in that I am trying to define parts of the lookup while I am building the package
and type tree, not when I am actually trying to look up a type. This could potentially
be something bad when someone tries to make a language which relies on a different set
of scope rules. Then I have more or less hard-coded my rules into the structure
of the compiler, making it hard to implement something different on top.
This is what led me to reconsider once more and implement what I am now happy with.
Final attempt
The idea behind the last attempt is to store as little as possible in the Scope
. What
is really needed to be able to perform a type lookup? It is actually sufficient to have
a pointer into our "current position" in the type tree along with a lookup function, which finds
the type from that position. Building upon some concepts of the previous idea,
our current position is a pointer to a NameLookup
. To restrict the current
position to a NameLookup
makes the assumption that our type lookup will not depend
on something that can not lookup types for us, which is only functions at the moment.
Since this assumption makes the implementation much easier, it is something I am willing
to make. The lookup function is then simply a virtual function in the Scope
class.
This allows implementations to choose their lookup to suit them, if needed.
However, the lookup function needs some way of traversing the type tree. This is solved
by providing a parent
function in the NameLookup
interface. Note that this parent
is always the direct parent of the object, which is much easier to get right than someting
that is usually the direct parent, but not always.
What is left now is to write the logic for looking up types, I have chosen the current way right now:
Named *Scope::find(Name name) { for (NameLookup *at = top; at; at = nextCandidate(at)) { if(Named *found = at->find(name)) return found; } // Is 'name' present in 'core'? if (Package *core = corePkg(top)) if (Named *found = core->find(name)) return found; return null; } NameLookup *nextCandidate(NameLookup *prev) { if (Package *pkg = as<Package>(prev)) { // Package: jump directly to root Package *root = rootPkg(pkg); // If we're already at the root, we're done. if (pkg == root) return null; else return root; } else { // Default: look at the nearest parent. return prev->parent(); } } Package *rootPkg(NameLookup *l) { while (p->parent()) p = p->parent(); return p; } Package *corePkg(NameLookup *l) { while (!as<Package>(l)) l = l->parent(); Package *root = rootPkg(as<Package>(l)); return as<Package>(root->find(Name(L"core"))); } // as<X>(x) is dynamic_cast<X*>(x)
And this is what I think is the beauty of this solution. Almost the entire type lookup is defined in around 40 lines of code. Aside from also being easier to debug, it also has the added bonus of being extensible. In the end, I am quite happy with the result.
Comments
New comment
You can use GitHub flavored markdown here. Parsed by Parsedown, which does not support all of GitHub's features. For example, specifying the language of code listings is not supported.