AST Representation
Starlasu operates on Abstract Syntax Trees (ASTs), also called Code Models. These are tree-like data structures that represent the information contained in a piece of formal language or “code”: for example, the statements in a procedural program, the data elements in a SQL query, or the steps in a business workflow.
All the Starlasu ASTs are based on a few primitive elements.
The structure is similar to the one used by other modeling solutions such as EMF, MPS, or LionWeb. In particular, we aim to (mostly) converge on the terminology used in LionWeb (see the LionWeb specifications).
This is a representation of the whole structure.
Note that this is the terminology used since Kolasu 1.6. In previous versions of Kolasu and other libraries the terminology could be slightly different.
We call this structure the Meta-Metamodel of Starlasu, or Starlasu’s M3 (see https://en.wikipedia.org/wiki/Meta-Object_Facility for a definition of M0, M1, M2, and M3).
In our case:
The M0 or the observed element is the code itself
The M1 or the model of the code, is an instance of an AST. For example, an actual instance of ClassDeclaration with name foo, no fields and no methods
The M2 or the metamodel is the set of AST classes defined for a certain language, for example the definition of the class ClassDeclaration, describing the fact that a ClassDeclaration has a name, can extend zero or one other ClassDeclarations and can implement zero to many InterfaceDeclarations
The M3 or the meta-metamodel is the structure that we use to define ASTs. See the picture above
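To make the levels concrete, here is a minimal Kotlin sketch. The Node class is a stand-in for the Kolasu base class, and ClassDeclaration, FieldDeclaration, and MethodDeclaration are illustrative names based on the description above, not definitions taken from an actual language module.

```kotlin
// Stand-in for Kolasu's Node base class (an assumption of this sketch).
abstract class Node

// M2: metamodel classes. They describe what any class declaration looks like.
data class FieldDeclaration(val name: String) : Node()
data class MethodDeclaration(val name: String) : Node()
data class ClassDeclaration(
    val name: String,
    val fields: List<FieldDeclaration> = emptyList(),
    val methods: List<MethodDeclaration> = emptyList()
) : Node()

// M0: the code being observed (a made-up snippet).
val sourceCode = "class foo {}"

// M1: an AST instance modeling that code: name foo, no fields, no methods.
val model = ClassDeclaration(name = "foo")
```

The M3 does not appear in the sketch as code: it is the vocabulary (Concept, Property, Containment, and so on) used to describe classes such as ClassDeclaration.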
Language
At the top level there is a Language. See for example, KolasuLanguage
A Language has a qualified name. It also has a simple name, which is the portion of the qualified name following the last dot (or the entire qualified name, if it contains no dots)
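The naming rule can be sketched in a few lines of Kotlin. LanguageSketch is a hypothetical stand-in used only to illustrate the rule; it is not the KolasuLanguage class.

```kotlin
// Hypothetical stand-in for a Language; only the naming rule is modeled.
class LanguageSketch(val qualifiedName: String) {
    // Portion after the last dot, or the whole name when there is no dot.
    val simpleName: String
        get() = qualifiedName.substringAfterLast('.')
}
```

For instance, a language with the made-up qualified name com.example.rpg would have the simple name rpg, while a language named rpg would keep rpg as its simple name.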
A Language groups various Types.
Types can be either Concepts, ConceptInterfaces, PrimitiveTypes, or EnumTypes
Concept, ConceptInterface, Annotation
A Concept can extend zero or one Concepts. It can implement zero to many ConceptInterfaces
Concepts and ConceptInterfaces have a name (because they are Types)
Concepts and ConceptInterfaces have features
ConceptInterfaces are similar to interfaces in programming languages
An Annotation can extend another Annotation. It can implement zero to many ConceptInterfaces
Features
All features have a name and could be either Properties or Links
Multiplicity is one of: optional (zero or one instances), singular (exactly one instance), or many (zero to many instances). There is no option for one or more instances
Properties have a type which is a DataType. Properties can have multiplicity singular or optional, but not many
A Link is either a Reference or a Containment
The type of a Link is a Classifier (either a Concept or a ConceptInterface)
A Reference can have multiplicity singular or optional, but not many
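The elements described so far can be sketched as plain Kotlin classes. This is an illustration of the rules above, not the actual Kolasu or LionWeb implementation: all class and field names are assumptions, Annotations are left out, and grouping PrimitiveType and EnumType under DataType is our reading of the text.

```kotlin
enum class Multiplicity { OPTIONAL, SINGULAR, MANY }

// A Language groups various Types.
class Language(val qualifiedName: String, val types: MutableList<Type> = mutableListOf())

sealed class Type { abstract val name: String }

// Assumption: DataType covers PrimitiveTypes and EnumTypes.
sealed class DataType : Type()
class PrimitiveType(override val name: String) : DataType()
class EnumType(override val name: String, val literals: List<String>) : DataType()

// Classifiers (Concepts and ConceptInterfaces) have features.
sealed class Classifier : Type() {
    val features: MutableList<Feature> = mutableListOf()
}
class ConceptInterface(override val name: String) : Classifier()
class Concept(
    override val name: String,
    val extended: Concept? = null,                          // zero or one
    val implemented: List<ConceptInterface> = emptyList()   // zero to many
) : Classifier()

sealed class Feature {
    abstract val name: String
    abstract val multiplicity: Multiplicity
}

class Property(
    override val name: String,
    val type: DataType,
    override val multiplicity: Multiplicity
) : Feature() {
    init {
        require(multiplicity != Multiplicity.MANY) { "Properties cannot have multiplicity many" }
    }
}

sealed class Link : Feature() { abstract val type: Classifier }

class Containment(
    override val name: String,
    override val type: Classifier,
    override val multiplicity: Multiplicity
) : Link()

class Reference(
    override val name: String,
    override val type: Classifier,
    override val multiplicity: Multiplicity
) : Link() {
    init {
        require(multiplicity != Multiplicity.MANY) { "References cannot have multiplicity many" }
    }
}
```

In this sketch the multiplicity restrictions are enforced at construction time, so an invalid Property or Reference cannot be created.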
Primitive Types
Anything that is not a Node is considered a Primitive Type. Typically these are Strings, Chars, and Ints. This does not include Enums, which are treated separately. Arbitrary other types can be used, but they may require additional configuration, for example for serialization.
For example, consider this class from the RPG Language Module:
```kotlin
// Class _not_ extending Node
sealed class EditCode { ... }

data class OutputSpecificationFieldDescription(
    val fieldIndicators: List<Indicator?> = emptyList(),
    val fieldName: FieldNameType,
    // EditCode is used in an AST and it is neither a Node nor a
    // ReferenceValue, so it is a PrimitiveType.
    // editCode is a Property, as the value is a PrimitiveType
    val editCode: EditCode? = null,
    val blankAfter: Boolean = false,
    val endPosition: EndPosition? = null,
    val dataFormat: DataFormat = DataFormat.Blank,
    val various: Expression? = null,
    val comments: String? = null
) : OutputSpecification(), WithInlineDeclaration
```
Enum Types
These are just types representing a finite set of possibilities.
Obtaining Concepts from classes
Concepts are automatically derived from inspecting classes.
In the case of Kolasu, we inspect all the properties (in the “Kotlin sense” of property) of a Class, ignoring the ones marked as @Internal.
For each property:
If the type is ReferenceValue (previously called ReferenceByName), then it indicates a Reference
If the type is a Node or a Node subclass, then it indicates a Containment
If the type is a MutableList of Node or a Node subclass, then it indicates a Containment
If the type is anything else, then we have a Property
We want to keep this structure as minimal as possible, because in this way it is much easier to support more features such as serialization, transformers, interoperability with LionWeb, etc.
For this reason we do not support sets or maps or other collections.
For example:
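A minimal sketch of how the rules above would apply to a class. Node and ReferenceValue are simplified stand-ins for the Kolasu types, and FieldDeclaration with its members is a made-up example; the comments state which kind of feature each Kotlin property would yield according to the rules listed earlier.

```kotlin
// Stand-ins for the Kolasu base types (assumptions, not the library's definitions).
abstract class Node
class ReferenceValue<N>(val name: String, var referred: N? = null)

// Illustrative node classes.
class TypeDeclaration(var name: String) : Node()
class Expression(var text: String) : Node()
class AnnotationUsage(var name: String) : Node()

class FieldDeclaration(
    // Neither a Node nor a ReferenceValue: Property
    var name: String,
    // ReferenceValue: Reference
    val type: ReferenceValue<TypeDeclaration>,
    // Node subclass: Containment (optional)
    var initialValue: Expression? = null,
    // MutableList of a Node subclass: Containment (many)
    val annotations: MutableList<AnnotationUsage> = mutableListOf(),
    // Neither a Node nor a ReferenceValue: Property
    var isStatic: Boolean = false
) : Node()
```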
Guidelines on defining ASTs using Kolasu
Now that we understand the Starlasu M3 and we know how it is derived from the Kotlin classes we create, we can see how we should write our Kotlin classes to get the M2 (or metamodel or conceptual language) that we want.
We typically specify attributes and single or optional containments with var.
We typically specify references and multiple containments with val. In that case we can change the content of the reference or the list, without the need to reassign them.
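The convention can be sketched as follows. Node and ReferenceValue are simplified stand-ins for the Kolasu types, and the node classes and the sample function are hypothetical.

```kotlin
// Stand-ins for the Kolasu base types (assumptions of this sketch).
abstract class Node
class ReferenceValue<N>(val name: String, var referred: N? = null)

class IntLiteral(var value: Int) : Node()          // attribute: var
class VarDeclaration(var name: String) : Node()    // attribute: var

class Assignment(
    val target: ReferenceValue<VarDeclaration>,    // reference: val
    var value: Node                                // single containment: var
) : Node()

class Block(
    val statements: MutableList<Node> = mutableListOf()  // multiple containment: val
) : Node()

// Hypothetical usage showing which members are reassigned and which are changed in place.
fun sample(): Block {
    val counter = VarDeclaration(name = "i")
    val assignment = Assignment(ReferenceValue("i"), IntLiteral(0))
    val block = Block()
    assignment.value = IntLiteral(1)       // var: the contained node is replaced
    assignment.target.referred = counter   // val: the reference is updated in place
    block.statements.add(counter)          // val: the list content changes, not the list
    block.statements.add(assignment)
    return block
}
```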
Annotations
The @Internal annotation can be used to indicate fields that should not appear in the AST
The @Derived annotation can be used to indicate features that are obtained by processing other features, and that can be recalculated from them.
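A sketch of how the two annotations might be applied. To keep the example self-contained, Internal and Derived are declared here as local stand-ins rather than imported from Kolasu, and FunctionDeclaration is a hypothetical node class.

```kotlin
// Local stand-ins for the annotations (assumptions of this sketch).
annotation class Internal
annotation class Derived

// Stand-in for Kolasu's Node base class.
abstract class Node

class Parameter(var name: String) : Node()

class FunctionDeclaration(
    var name: String,
    val parameters: MutableList<Parameter> = mutableListOf()
) : Node() {
    // Bookkeeping data that should not appear in the AST.
    @Internal
    var cachedSignature: String? = null

    // Recalculated from the parameters containment, so it carries no new information.
    @Derived
    val arity: Int
        get() = parameters.size
}
```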
Prefix
We typically want to use a prefix for all the classes we use to define our language M2. This is useful because in transpilers and other applications we may want to combine multiple languages. For example, in a transpiler from RPG to Java we may want to include both the RPG AST classes and the Java AST classes. Both languages could share constructs with the same name, and without a prefix it would be cumbersome to distinguish them.
Common Concepts
Starlasu includes a few Common ConceptInterfaces. They can be used as markers, so that different tooling can look for Concepts implementing them and treat them in a special way.
They are:
Statement: it represents a statement in the language (i.e., something that can be executed)
Expression: it represents an expression in the language (i.e., something that can be evaluated and produce a value)
EntityDeclaration: it should be used for the definition of classes, interfaces, and structures. These declarations introduce new types. This interface should not be used for parameter declarations or variable declarations.
PlaceholderElement: this indicates special Concepts used to represent variability when defining patterns. If you are not building support for patterns in your language, you can ignore them (see Placeholders)
How to use Named and PossiblyNamed
Named is to be used for things that always have a name. PossiblyNamed is to be used for things that may or may not have a name.
For example, in some languages one can define functions and anonymous functions. If we want to represent both with the same Concept, then we can make that Concept implement PossiblyNamed.
For a local variable, which is guaranteed to have a name (supposing our language does not support anonymous variables), we would use Named.
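The distinction can be sketched like this. The two interfaces are simplified stand-ins declared locally for the example; FunctionDeclaration, LocalVariableDeclaration, and describe are hypothetical names.

```kotlin
// Simplified stand-ins for the Starlasu interfaces (assumptions of this sketch).
interface PossiblyNamed { val name: String? }
interface Named : PossiblyNamed { override val name: String }

// Stand-in for Kolasu's Node base class.
abstract class Node

// Covers both named and anonymous functions with a single Concept.
class FunctionDeclaration(override val name: String? = null) : Node(), PossiblyNamed

// A local variable always has a name in this hypothetical language.
class LocalVariableDeclaration(override val name: String) : Node(), Named

// Hypothetical helper showing that callers must handle the missing-name case.
fun describe(element: PossiblyNamed): String = element.name ?: "<anonymous>"
```

With Named, the compiler guarantees a non-null name, so code working on local variables does not need a fallback.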
When to use ReferenceValue
A ReferenceValue (previously called ReferenceByName) indicates that a name refers to something else, where that something else is typically another node in the same AST, a node in another AST, or some external symbol (like a built-in function).
Examples of usages of ReferenceValue:
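One typical usage is an expression that refers to a variable by name. The sketch below uses a simplified stand-in for the reference class, with only a name and an optional referred node, and a deliberately naive resolve function; the class names and the resolution strategy are assumptions made for illustration.

```kotlin
// Simplified stand-ins for the Starlasu types (assumptions of this sketch).
interface PossiblyNamed { val name: String? }
abstract class Node

class ReferenceValue<N : PossiblyNamed>(val name: String, var referred: N? = null) {
    val resolved: Boolean
        get() = referred != null
}

class VariableDeclaration(override val name: String) : Node(), PossiblyNamed

// An expression such as `x`, which points to the declaration of x.
class ReferenceExpr(val variable: ReferenceValue<VariableDeclaration>) : Node()

// Hypothetical, naive symbol resolution: looks the name up in a flat scope.
fun resolve(
    ref: ReferenceValue<VariableDeclaration>,
    scope: List<VariableDeclaration>
): Boolean {
    ref.referred = scope.firstOrNull { it.name == ref.name }
    return ref.resolved
}
```

Right after parsing, only the name is known. The referred node is filled in later, during symbol resolution.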
When to use enums
Enums should be used when we have a set of alternatives. For example:
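A small sketch, using a made-up SQL-like construct: the sort direction of an ORDER BY clause is one of two alternatives, so it is modeled as an enum. SortDirection and OrderByClause are hypothetical names, and Node is a stand-in for the Kolasu base class.

```kotlin
// Stand-in for Kolasu's Node base class.
abstract class Node

// A closed set of alternatives: this becomes an EnumType.
enum class SortDirection { ASCENDING, DESCENDING }

class OrderByClause(
    var column: String,
    var direction: SortDirection = SortDirection.ASCENDING
) : Node()
```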
Modifiers
When representing modifiers we have two alternatives:
Represent the list of modifiers itself. This preserves the information on the order of the modifiers, but it is a lower-level representation, more suited to the parse tree than to the AST. We should avoid it.
Represent the modifiers through flags and enums in the declaration to which the modifiers are applied. This is closer to the mental model a developer would have of the code, and this is what we should do.
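The second alternative could look like the following sketch for a Java-like method declaration. The class, the enum, and the methodFromModifiers helper are hypothetical and only meant to show modifier keywords being folded into flags.

```kotlin
// Stand-in for Kolasu's Node base class.
abstract class Node

enum class Visibility { PUBLIC, PROTECTED, PACKAGE_PRIVATE, PRIVATE }

class MethodDeclaration(
    var name: String,
    var visibility: Visibility = Visibility.PACKAGE_PRIVATE,
    var isStatic: Boolean = false,
    var isFinal: Boolean = false
) : Node()

// Hypothetical helper: folds the modifier keywords found in the parse tree
// into flags and enums, discarding their order.
fun methodFromModifiers(name: String, modifiers: List<String>): MethodDeclaration {
    val visibility = when {
        "public" in modifiers -> Visibility.PUBLIC
        "protected" in modifiers -> Visibility.PROTECTED
        "private" in modifiers -> Visibility.PRIVATE
        else -> Visibility.PACKAGE_PRIVATE
    }
    return MethodDeclaration(
        name = name,
        visibility = visibility,
        isStatic = "static" in modifiers,
        isFinal = "final" in modifiers
    )
}
```

With this shape, `static public` and `public static` produce the same AST, which matches how a developer would think about the declaration.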
Statements
We typically define a top level rule to represent all statements of our language:
The names of all other statement classes should end with the Stmt suffix.
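A minimal sketch of this convention, assuming a sealed root class. Node is a stand-in for the Kolasu base class, the concrete statements are made up, and a real module would probably also apply a language prefix and implement the common Statement ConceptInterface.

```kotlin
// Stand-in for Kolasu's Node base class.
abstract class Node

// Top-level class for all the statements of a hypothetical language.
sealed class Statement : Node()

// Concrete statements follow the Stmt naming convention.
class BreakStmt : Statement()
class BlockStmt(val statements: MutableList<Statement> = mutableListOf()) : Statement()

// Because the root is sealed, a when over statements is checked for exhaustiveness.
fun describe(stmt: Statement): String = when (stmt) {
    is BreakStmt -> "break"
    is BlockStmt -> "block of ${stmt.statements.size} statement(s)"
}
```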
Expression Statements
In most languages, methods and other blocks contain lists of statements. They cannot directly contain expressions. However, expressions can be wrapped in ExpressionStatements:
For example, in Java we may have:
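As an illustration, consider a Java statement such as `log("done");` (a made-up snippet): the method call is an expression, and it can sit in a block only because it is wrapped in a statement. The sketch below uses stand-in classes; in particular, MethodCallExpr is simplified to hold the method name as a plain String.

```kotlin
// Stand-in for Kolasu's Node base class.
abstract class Node

sealed class Expression : Node()
sealed class Statement : Node()

// Simplified for this sketch: the method is identified by a plain name.
class MethodCallExpr(
    var methodName: String,
    val args: MutableList<Expression> = mutableListOf()
) : Expression()

class StringLiteralExpr(var value: String) : Expression()

// Wraps an expression so that it can appear where a statement is expected.
class ExpressionStmt(var expression: Expression) : Statement()

// Blocks contain statements only.
class BlockStmt(val statements: MutableList<Statement> = mutableListOf()) : Statement()

// AST for a block containing the single statement `log("done");`
fun sampleBlock(): BlockStmt {
    val call = MethodCallExpr("log", mutableListOf<Expression>(StringLiteralExpr("done")))
    return BlockStmt(mutableListOf<Statement>(ExpressionStmt(call)))
}
```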
Expressions
We typically define a top level rule to represent all expressions of our language:
The names of all other expression classes should end with the Expr suffix.
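A minimal sketch of this convention, again assuming a sealed root class. The concrete expressions and the render function are hypothetical; they only show how a small hierarchy under a common root can be processed.

```kotlin
// Stand-in for Kolasu's Node base class.
abstract class Node

// Top-level class for all the expressions of a hypothetical language.
sealed class Expression : Node()

// Concrete expressions follow the Expr naming convention.
class IntLiteralExpr(var value: Int) : Expression()
class SumExpr(var left: Expression, var right: Expression) : Expression()
class NegationExpr(var operand: Expression) : Expression()

// Hypothetical pretty-printer walking the expression tree.
fun render(e: Expression): String = when (e) {
    is IntLiteralExpr -> e.value.toString()
    is SumExpr -> "(${render(e.left)} + ${render(e.right)})"
    is NegationExpr -> "-${render(e.operand)}"
}
```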
Binary expressions
Arithmetic expressions
Comparison expressions
Logical expressions
Field accesses
Field accesses are typically represented like this:
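A possible shape, given as a sketch under assumptions: the names FieldAccessExpr, container, and field are ours, Node and ReferenceValue are simplified stand-ins for the Kolasu types, and render is a hypothetical helper.

```kotlin
// Stand-ins for the Kolasu base types (assumptions of this sketch).
abstract class Node
class ReferenceValue<N>(val name: String, var referred: N? = null)

class FieldDeclaration(val name: String) : Node()

sealed class Expression : Node()

// A plain name such as `a`.
class ReferenceExpr(val symbol: ReferenceValue<Node>) : Expression()

// A qualified access such as `a.b`: the container is an expression,
// the field is a reference to a declaration.
class FieldAccessExpr(
    var container: Expression,
    val field: ReferenceValue<FieldDeclaration>
) : Expression()

// Hypothetical helper that prints an access expression back as text.
fun render(e: Expression): String = when (e) {
    is ReferenceExpr -> e.symbol.name
    is FieldAccessExpr -> "${render(e.container)}.${e.field.name}"
}

// `a.b.c` becomes an access to c whose container is the access `a.b`.
val chain = FieldAccessExpr(
    FieldAccessExpr(ReferenceExpr(ReferenceValue("a")), ReferenceValue("b")),
    ReferenceValue("c")
)
```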
Note that in the case of a chain of field accesses we would have a recurring structure.
We should not aim to represent chains of field accesses as a single expression:
Note that we discuss only qualified field accesses, that is, field accesses in the form:
We do not consider unqualified field accesses. The reason is that unqualified field accesses are indistinguishable from references to other expressions during parsing. We can distinguish them only during symbol resolution. For this reason unqualified field accesses are represented using reference expressions.
Function and Method calls
Function and method calls are typically expressions, as they can produce a value:
These represent method calls where the receiver (the thing on which the method is invoked) is not specified.
If both qualified and unqualified method calls are possible, a single abstract parent class should be used. It should contain the method and args parameters specified here.
Qualified method calls
These represent method calls where the receiver (the thing on which the method is invoked) is specified.
This is how they can be represented:
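The sketch below shows one way to arrange the hierarchy, assuming a language that supports both kinds of calls. The method, args, and container names follow the text; the class names, the ThisExpr class, and the stand-ins for Node and ReferenceValue are assumptions.

```kotlin
// Stand-ins for the Kolasu base types (assumptions of this sketch).
abstract class Node
class ReferenceValue<N>(val name: String, var referred: N? = null)

class MethodDeclaration(val name: String) : Node()

sealed class Expression : Node()
class ThisExpr : Expression()

// Abstract parent holding what every method call has in common.
abstract class MethodCallExpr(
    val method: ReferenceValue<MethodDeclaration>,
    val args: MutableList<Expression>
) : Expression()

// A call such as `foo(...)`: no receiver is specified.
class UnqualifiedMethodCallExpr(
    method: ReferenceValue<MethodDeclaration>,
    args: MutableList<Expression> = mutableListOf()
) : MethodCallExpr(method, args)

// A call such as `this.foo(...)`: the receiver is mandatory, not optional.
class QualifiedMethodCallExpr(
    var container: Expression,
    method: ReferenceValue<MethodDeclaration>,
    args: MutableList<Expression> = mutableListOf()
) : MethodCallExpr(method, args)
```

Keeping two concrete classes means that code handling qualified calls never has to check whether the container is present.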
We discourage having a single AST class to represent both unqualified and qualified method calls, by making the container optional.
Unary expressions
Method definitions
Partitions
Partitions are special constructs needed for LionWeb interoperability to represent collections of ASTs.
They are not represented as Kolasu nodes (the annotation LionWebPartition has been removed)
Placeholders
Placeholders are used to create patterns:
Why do we need patterns? To match ASTs, or to produce ASTs that follow a certain structure. Consider this example:
This piece of code:
Matches, in an RPG AST, every for statement defining a variable called I, irrespective of what the starting and ending expressions are and irrespective of the statements it contains
Then produces a piece of Java code, filling in the translation of the corresponding elements captured in the RPG code
For example, if we parse:
The output Java code would be:
In order to represent “AST with holes” we use the placeholder parsing rules and the placeholder AST elements.
A demo of this feature is presented in this video: Let's talk about transpilers.