Bug Vanquisher

6 June 2007

Reflecting on Reflection

Filed under: C++ — Tanveer Badar @ 5:50 PM

Don’t tell me you have never done things like obj.GetType( ) or typeof( SomeType ). Everyone does, admit it. But have you ever thought about what goes on behind the scenes of all this raw power? If you had to implement such a system yourself, what design decisions would you make?

Now that you have admitted to using typeof, tell me: have you ever wondered what moron wrote this enumeration?

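namespace System.Reflection
{
    [Flags]
    public enum BindingFlags
    {
        CreateInstance = 0x200,
        DeclaredOnly = 0x2,
        Default = 0x0,
        ExactBinding = 0x10000,
        FlattenHierarchy = 0x40,
        GetField = 0x400,
        GetProperty = 0x1000,
        IgnoreCase = 0x1,
        IgnoreReturn = 0x1000000,
        Instance = 0x4,
        InvokeMethod = 0x100,
        NonPublic = 0x20,
        OptionalParamBinding = 0x40000,
        Public = 0x10,
        PutDispProperty = 0x4000,
        PutRefDispProperty = 0x8000,
        SetField = 0x800,
        SetProperty = 0x2000,
        Static = 0x8,
        SuppressChangeType = 0x20000
    }
}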

Why are all these access control flags mixed up with things like overload resolution and instance/static selection? Where does all the metadata about a type go? (Well, in the metadata dictionary! Where else would it go?) Does MethodInfo.Invoke perform overload resolution? How are arguments coerced if the types don’t match exactly? Why do we seem to have a separate class for almost every lexical scoping construct? And why can you do this, if you have the proper access:

class a 
{ 
    int member; 
} 

a obj = new a( );
object value = typeof( a ).GetField( "member", BindingFlags.NonPublic | BindingFlags.Instance ).GetValue( obj );

but not this:

class a 
{ 
    int member; 
} 

a obj = new a( ); 

Console.WriteLine( obj.member ); // error CS0122: 'a.member' is inaccessible due to its protection level

Enough questions; let’s discover the reasoning behind them in the CLR. First, why be a moron? That moron made a really good decision. Each of these options would otherwise have required a separate slot in the class, so clumping them together in a bit field saves considerable space. The metadata goes into the metadata tables of your assembly, and the reflection APIs read it from there, so the in-memory representation of a type stays small.
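To see why access control and instance/static selection can share one mask, note that both are simply filter criteria on the member list. The sketch below is a hypothetical C++ model, not the CLR's implementation: `BindingFlags` here is a four-bit subset, and `MemberInfo`, `Matches` and `GetMembers` are invented names. It mirrors the documented .NET rule that you must specify Instance or Static together with Public or NonPublic, or nothing is returned.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical subset of a BindingFlags-style mask: one bit per option,
// so every combination fits in a single 32-bit word.
enum BindingFlags : std::uint32_t {
    Instance  = 1u << 0,
    Static    = 1u << 1,
    Public    = 1u << 2,
    NonPublic = 1u << 3,
};

// Without this overload, NonPublic | Instance would yield a plain integer
// rather than a BindingFlags value.
constexpr BindingFlags operator|(BindingFlags a, BindingFlags b) {
    return static_cast<BindingFlags>(static_cast<std::uint32_t>(a) |
                                     static_cast<std::uint32_t>(b));
}

// Hypothetical member descriptor.
struct MemberInfo {
    std::string name;
    bool isStatic;
    bool isPublic;
};

// A member matches only when BOTH its storage kind and its accessibility
// are among the requested bits.
bool Matches(const MemberInfo& m, BindingFlags flags) {
    const bool kindOk   = (flags & (m.isStatic ? Static : Instance)) != 0;
    const bool accessOk = (flags & (m.isPublic ? Public : NonPublic)) != 0;
    return kindOk && accessOk;
}

std::vector<std::string> GetMembers(const std::vector<MemberInfo>& all,
                                    BindingFlags flags) {
    std::vector<std::string> out;
    for (const auto& m : all)
        if (Matches(m, flags)) out.push_back(m.name);
    return out;
}
```

Asking for `Public` alone returns nothing, because neither Instance nor Static is set.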

And did you know that in IL you refer to an entity by a metadata token, which is essentially its row number in one of the metadata tables? Want to call a method? Emit a call instruction whose operand is the token of that method’s row. Instantiating some class? Emit newobj with the token of the class’s constructor.
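As a small illustration, per ECMA-335 a metadata token is a 32-bit value whose top byte identifies the metadata table and whose low three bytes hold a 1-based row index. The helper names `MetadataToken`, `Decode` and `Encode` below are hypothetical; the table numbers are the ECMA-335 ones.

```cpp
#include <cassert>
#include <cstdint>

// A CLI metadata token: top byte = table number, low 24 bits = 1-based row.
struct MetadataToken {
    std::uint8_t  table;
    std::uint32_t row;
};

constexpr MetadataToken Decode(std::uint32_t token) {
    return MetadataToken{static_cast<std::uint8_t>(token >> 24),
                         token & 0x00FFFFFFu};
}

constexpr std::uint32_t Encode(std::uint8_t table, std::uint32_t row) {
    return (static_cast<std::uint32_t>(table) << 24) | (row & 0x00FFFFFFu);
}

// A few table numbers from ECMA-335, Partition II.
constexpr std::uint8_t kTypeDef   = 0x02;
constexpr std::uint8_t kField     = 0x04;
constexpr std::uint8_t kMethodDef = 0x06;
constexpr std::uint8_t kMemberRef = 0x0A;
```

So the operand of a call to the first method defined in a module is the token 0x06000001: MethodDef table, row 1.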

Type.InvokeMember is a big machine. (MethodInfo.Invoke already knows which method it is calling, so it only has to check and coerce the arguments.) InvokeMember has to search all the methods that match the name, case-sensitively or not depending on BindingFlags.IgnoreCase. Then it must find the best method in that set using the same overload resolution rules the compiler applies at compile time. That is quite a bit of work for one function, and overload resolution in turn involves type coercion and matching the arguments given against the parameters required.
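The first phase of that work, gathering candidates by name, can be sketched in C++ as follows. This is a hypothetical simplification, not the CLR's binder: `MethodDesc`, `NamesEqual` and `FindCandidates` are invented, the case-insensitive comparison is ASCII-only, and only exact arity is checked (no optional or params arguments). Choosing the best candidate is left to a later overload resolution step.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical method descriptor: a name plus its parameter count.
struct MethodDesc {
    std::string name;
    std::size_t arity;
};

// Compare names, optionally ignoring (ASCII) case.
bool NamesEqual(const std::string& a, const std::string& b, bool ignoreCase) {
    if (!ignoreCase) return a == b;
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

// Collect every method whose name matches and that takes exactly
// argCount parameters.
std::vector<const MethodDesc*> FindCandidates(const std::vector<MethodDesc>& methods,
                                              const std::string& name,
                                              std::size_t argCount,
                                              bool ignoreCase) {
    std::vector<const MethodDesc*> out;
    for (const auto& m : methods)
        if (NamesEqual(m.name, name, ignoreCase) && m.arity == argCount)
            out.push_back(&m);
    return out;
}
```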

All these classes are provided to match the language features, and the same kind of effort goes into three places: the compiler for the language; a runtime code generation system (System.Reflection.Emit), which has a similar class hierarchy and an even more complex object hierarchy; and the type discovery system, which must match the compiler’s implementation to support every lexical construct in the language.

Access to private members is allowed if you have the proper permissions. Consider it from the compiler’s point of view. In the second case, name lookup succeeds but the accessibility check fails. In the reflection case, however, “member” is just a string argument to some function as far as the compiler is concerned. The metadata is already available for anyone who cares to use it. Therefore, if you can get your hands on the metadata and have the appropriate permissions, you can access private, implementation-specific parts too.
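C++ has no such metadata, but the same effect can be sketched by letting a class publish its own field table. Inside a static member function the class may legally name `&Account::balance_`, so outside code can reach the private field by string without ever writing `obj.balance_`. Everything here (`Account`, `FieldInfo`, `GetIntField`) is hypothetical, and the `allowNonPublic` flag is only a stand-in for a real permission check.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

class Account {
public:
    // Hypothetical field metadata: a pointer to member plus its accessibility.
    struct FieldInfo {
        int Account::* ptr;
        bool isPublic;
    };

    explicit Account(int balance) : id(7), balance_(balance) {}

    // The class exports its own metadata; being a member, it may name
    // the private field.
    static const std::map<std::string, FieldInfo>& Fields() {
        static const std::map<std::string, FieldInfo> fields = {
            {"id", {&Account::id, true}},
            {"balance_", {&Account::balance_, false}},
        };
        return fields;
    }

    int id;

private:
    int balance_;
};

// Look a field up by name; non-public fields need the caller's permission.
// Unknown names throw std::out_of_range from map::at.
int GetIntField(const Account& obj, const std::string& name, bool allowNonPublic) {
    const Account::FieldInfo& f = Account::Fields().at(name);
    if (!f.isPublic && !allowNonPublic)
        throw std::runtime_error("non-public field: " + name);
    return obj.*(f.ptr);
}
```

The compiler only ever sees the string "balance_"; the accessibility decision moves from compile time to the lookup function.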

I ran into all these design problems because I am writing a reflection framework for C++. The language natively supports one joke and one workaround. The joke is the typeid( ) operator and the workaround is the dynamic_cast< >( ) operator.

Given the modern needs of runtime type discovery, plug-in architectures and design patterns like IoC, we need a strong type system and an equally strong runtime system for discovering types and dynamically invoking their members. I call typeid a joke because it returns a reference to a type_info object, and there is no requirement that this type_info contain valid (don’t even think about anything as lofty as useful) information about the object it was applied to. The name( ) function may return an empty string, and even a non-empty string need not correspond to the compile-time name of the type. Apart from comparing two type_info objects for equality, you can do nothing with one except call name( ) and before( ). And before( ) does not order types lexicographically; the details are hidden from mere mortals (read: programmers).
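The sketch below shows about the most a program can portably build on typeid: equality tests and a lookup table keyed on the type. It uses `std::type_index`, which arrived in C++11 (after this post) and simply wraps a `type_info` so it can key a map. `Base`, `Derived` and `Describe` are hypothetical names. The sketch never inspects `name( )`, because its contents are implementation-defined.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

struct Base { virtual ~Base() = default; };  // polymorphic, so typeid sees the dynamic type
struct Derived : Base {};

// The human-readable names have to be supplied by hand; type_info will not
// portably provide them.
std::string Describe(const Base& b) {
    static const std::map<std::type_index, std::string> names = {
        {typeid(Base), "Base"},
        {typeid(Derived), "Derived"},
    };
    auto it = names.find(typeid(b));
    return it != names.end() ? it->second : "unknown";
}
```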

dynamic_cast is a trial and error game. You have a pointer to some base class, and it is your burden to find out which exact derived class the object really is through repeated downcasts. If you have a reference, conditions are much worse, because every cast you try must succeed or you get a bad_cast exception. And when the base is a virtual base class, dynamic_cast is the only hope: static_cast from a virtual base down to a derived class is forbidden.
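The paragraph above can be made concrete with a small hypothetical hierarchy (`Shape`, `Circle`, `Square` and `Widget` are invented for illustration). A pointer cast signals failure with a null pointer, a reference cast throws `std::bad_cast`, and the only way from a virtual base down to a derived class is dynamic_cast.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

struct Shape  { virtual ~Shape() = default; };
struct Circle : Shape {};
struct Square : Shape {};
struct Widget : virtual Shape {};  // Shape is a virtual base here

// Trial and error with pointers: each failed downcast yields nullptr.
std::string Identify(Shape* s) {
    if (dynamic_cast<Circle*>(s)) return "Circle";
    if (dynamic_cast<Square*>(s)) return "Square";
    if (dynamic_cast<Widget*>(s)) return "Widget";
    return "unknown";
}

// With references there is no null to return: a failed cast throws.
bool IsCircle(Shape& s) {
    try {
        (void)dynamic_cast<Circle&>(s);
        return true;
    } catch (const std::bad_cast&) {
        return false;
    }
}

// static_cast<Widget*>(s) would not compile because Shape is a virtual
// base of Widget; dynamic_cast is the only way down.
Widget* ToWidget(Shape* s) { return dynamic_cast<Widget*>(s); }
```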

Boost goes a little further than that primitive state of affairs. It provides a typeof facility (BOOST_TYPEOF) that lets you infer a type at compile time. gcc also has a typeof extension that works much like Boost’s version, i.e., compile-time inference of a type from some expression, and nothing better than that. And don’t get me started on VC++. It took them five years to get partial specialization right, and dependent name lookup is still messed up.
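To show how limited compile-time inference is compared with runtime discovery, here is a sketch using `decltype`. This is the standard C++11 keyword that later took over the role of Boost’s and gcc’s typeof; it is not the Boost macro itself. `Square`, `Base` and `Derived` are hypothetical. The last assertion is the point: decltype reports the static type, even when the object is really something else.

```cpp
#include <cassert>
#include <type_traits>

int Square(int x) { return x * x; }

struct Base { virtual ~Base() = default; };
struct Derived : Base {};

Derived d;
Base& ref = d;

// decltype yields the static type of an expression, checked entirely at
// compile time; nothing is evaluated.
static_assert(std::is_same<decltype(Square(3)), int>::value, "type of a call expression");
static_assert(std::is_same<decltype(1 + 2.0), double>::value, "usual arithmetic conversions");
// ref refers to a Derived, but all compile-time inference can see is Base&.
static_assert(std::is_same<decltype(ref), Base&>::value, "static, not dynamic, type");
```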

For my reflection framework, I have chosen to implement access control and declaration specifiers as bit fields to save space. Consider the choice between adding a bool for each of public, private, protected, pointer, reference, constant, volatile, template and extern, or packing them all into one int. Separate bools take at least 9 bytes (36 bytes if each is a four-byte Boolean such as Win32’s BOOL) for just 9 bits of information, which fit easily in a single four-byte integer.
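A minimal sketch of that trade-off follows. The names `SpecifiersAsBools`, `Specifier` and `Specifiers` are hypothetical and not taken from the framework. The first struct spends at least one byte per flag. The second packs all nine specifiers into one 32-bit word and exposes set/clear/test operations.

```cpp
#include <cassert>
#include <cstdint>

// One bool per flag: at least nine bytes.
struct SpecifiersAsBools {
    bool isPublic, isPrivate, isProtected;
    bool isPointer, isReference, isConst, isVolatile, isTemplate, isExtern;
};

// The same nine facts as single bits.
enum Specifier : std::uint32_t {
    kPublic    = 1u << 0,
    kPrivate   = 1u << 1,
    kProtected = 1u << 2,
    kPointer   = 1u << 3,
    kReference = 1u << 4,
    kConst     = 1u << 5,
    kVolatile  = 1u << 6,
    kTemplate  = 1u << 7,
    kExtern    = 1u << 8,
};

class Specifiers {
public:
    void Set(Specifier s) { bits_ |= s; }
    void Clear(Specifier s) { bits_ &= ~static_cast<std::uint32_t>(s); }
    bool Has(Specifier s) const { return (bits_ & s) != 0; }

private:
    std::uint32_t bits_ = 0;  // all nine flags live here
};

static_assert(sizeof(SpecifiersAsBools) >= 9, "at least one byte per bool");
static_assert(sizeof(Specifiers) == sizeof(std::uint32_t), "nine flags, one word");
```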

Dynamically invoking functions on a type must include overload resolution, because C++ supports overloading. Arguments must be converted to the correct types, because the compiler does that at compile time. In short, every aspect of function call resolution that happens at compile time must also happen, where possible, when invoking dynamically. Argument dependent lookup, for example, is possible only for base classes: there is no way to influence namespace-level lookup or to introduce new identifiers with using declarations. Similarly, template argument deduction is hard enough to do at compile time. Templates are Turing complete, and only one front end (sold only to big names like Microsoft) and one open source compiler implement them correctly. I am not going to burst my brain over it.
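A heavily simplified sketch of runtime overload resolution with argument conversion follows. It is written in C++17 because it relies on `std::any`. All names (`Overload`, `ConversionCost`, `Convert`, `Invoke`) are hypothetical. Only one conversion (int to double) is modelled, and ambiguity between equally ranked overloads is ignored, so it is nowhere near C++’s real ranking rules. It only illustrates the shape of the problem: rank the viable overloads, coerce the arguments, then call.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical late-bound overload: parameter types plus a thunk that
// receives arguments already converted to those types.
struct Overload {
    std::vector<std::type_index> params;
    std::function<std::any(const std::vector<std::any>&)> call;
};

// 0 = exact match, 1 = int -> double conversion, -1 = not convertible.
int ConversionCost(std::type_index from, std::type_index to) {
    if (from == to) return 0;
    if (from == typeid(int) && to == typeid(double)) return 1;
    return -1;
}

// Coerce an argument; only the int -> double case exists in this sketch.
std::any Convert(const std::any& arg, std::type_index to) {
    if (std::type_index(arg.type()) == to) return arg;
    return static_cast<double>(std::any_cast<int>(arg));
}

// Pick the viable overload with the lowest total cost, convert, and call.
std::any Invoke(const std::vector<Overload>& overloads,
                const std::vector<std::any>& args) {
    const Overload* best = nullptr;
    int bestCost = std::numeric_limits<int>::max();
    for (const auto& o : overloads) {
        if (o.params.size() != args.size()) continue;
        int total = 0;
        bool viable = true;
        for (std::size_t i = 0; i < args.size() && viable; ++i) {
            int c = ConversionCost(args[i].type(), o.params[i]);
            if (c < 0) viable = false; else total += c;
        }
        if (viable && total < bestCost) { best = &o; bestCost = total; }
    }
    if (!best) throw std::invalid_argument("no viable overload");
    std::vector<std::any> converted;
    for (std::size_t i = 0; i < args.size(); ++i)
        converted.push_back(Convert(args[i], best->params[i]));
    return best->call(converted);
}
```

With overloads f(int) and f(double), an int argument selects f(int) at cost 0. With only f(double) available, the same int argument is converted before the call.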

For the type hierarchy of the reflection framework, I have ReflectionObject, Type, Function, MemberFunction, Field and Parameter classes. Type is abstract. In my model, each class defines a private implementation of Type and returns a static object of that private type when GetType is called. I also require each type to implement a static StaticGetType function, which returns that same object without first creating an instance.
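Here is a minimal sketch of that model, assuming the described design. `ReflectionObject`, `Type`, `GetType` and `StaticGetType` come from the text. The example class `Point`, its private `PointType`, and the `Name`/`Size` members of Type are hypothetical. Every instance and the static accessor hand back the same function-local static object.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Abstract description of a type (members beyond the class name are assumed).
class Type {
public:
    virtual ~Type() = default;
    virtual std::string Name() const = 0;
    virtual std::size_t Size() const = 0;
};

class ReflectionObject {
public:
    virtual ~ReflectionObject() = default;
    virtual const Type& GetType() const = 0;
};

class Point : public ReflectionObject {
public:
    int x = 0, y = 0;

    // No instance needed: returns the single static Type object.
    static const Type& StaticGetType() {
        static const PointType instance{};
        return instance;
    }

    // Every instance reports that very same object.
    const Type& GetType() const override { return StaticGetType(); }

private:
    // Private implementation of Type, visible only inside Point.
    class PointType : public Type {
    public:
        std::string Name() const override { return "Point"; }
        std::size_t Size() const override { return sizeof(Point); }
    };
};
```

Keeping `PointType` private means callers can only reach it through the abstract Type interface.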

Since the Type class contains complete information about a class or struct, it will be possible to access the private implementation of a class if appropriate access is requested.
