
1. Introduction

TweetyLang is an object-oriented systems programming language that combines the control, performance, and portability of low-level languages with the developer experience of high-level languages.

TweetyLang development is driven by its Core Concepts and Goals.

The Specification

This specification outlines the implementation of the language as of version 0 (v0). Unless otherwise stated, all statements expressed in the present tense pertain specifically to this version. In this context, the term 'up to date' should be interpreted as current as of the official v0 release.

This release milestone is determined by the TweetyLang committee, which is also responsible for authoring this version of the specification and for maintaining it.

Suggesting Ideas & Changes

TweetyLang is built for developers, and their feedback is highly valuable. Whether you are reading a printed or online version of this specification, you are encouraged to visit the TweetyLang Spec repository to open issues, suggest language improvements, propose changes, or discuss ideas.

TweetyLang is pronounced as 'TWEET-ee-lang' (/ˈtwiːti læŋ/).

1.1 Core Concepts and Goals

TweetyLang is built around several fundamental principles and objectives:

  1. Developer Experience

TweetyLang places strong emphasis on developer experience. The language, its implementation and its toolchain should be intuitive, predictable, and as frictionless as possible.

  2. Portability

TweetyLang is designed to run across platforms and environments. Its architecture should allow code to be deployed without relying on a runtime environment, external dependencies, or Just-In-Time compilation.

  3. Memory Safety

The language provides language-integrated options for writing memory-safe code, whilst still exposing 'legacy' memory models (i.e., raw pointers) for interoperating with non-memory-safe code.

  4. Language Interoperability

The language should provide the capability to integrate with assemblies produced by other languages, offering first-class support for native assemblies.

1.2 Glossary

This specification uses the following term definitions throughout its contents.

  • Assembly - Output of program compilation.
  • Entry Point - Function where program execution begins.
  • Application/Executable - Executable assembly with an entry point.
  • Source Code - Human-readable program written in TweetyLang.
  • Compiler - Application that translates source code into an assembly.

2. Grammar

This is the complete, up-to-date TweetyLang grammar in EBNF format. When updates to this grammar are merged, the changes are propagated to the ANTLR grammar, which is used directly by TweetyLang.Parser. As such, this document serves as the single source of truth for the TweetyLang grammar.

If you notice differences between this document and the ANTLR grammar, please open an issue or pull request to correct them. For issues in the EBNF grammar, open an issue in the specification repository; for issues in the ANTLR grammar, open an issue in the TweetyLang repository.

(* EBNF (Extended Backus–Naur Form) Grammar definitions for TweetyLang *)

(* GRAMMAR START *)
(* ---------------- *)

program = { top_level_declaration } ;

top_level_declaration = module_definition 
                      | import_statement ;

(* Modules *)
(* ---------------- *)
module_definition = "module", module_name, module_block ;
module_name = identifier , { "::", identifier } ;
module_block = "{" , { definition } , "}" ; (* Allowed items inside of a module *)

import_statement = "import", module_name, ";" ;

(* Identifiers *)
(* ---------------- *)
identifier = character , { character | digit | "_" } ;

(* Structs *)
(* ---------------- *)
struct_definition = { modifier } , "struct" , identifier , object_block ;

object_block = "{" , { function_definition | field_declaration } , "}" ; (* Allowed items inside of an object definition (struct, class, etc) *)

(* Functions *)
(* ---------------- *)
function_definition = { modifier } , ( type | "void" ) , identifier , "(" , [ parameters ] , ")" , ( statement_block | ";" ) ;

function_call = identifier , "(", [ arguments ] , ")" ;
arguments = expression , { "," , expression } ;

definition = struct_definition
           | function_definition ;

(* Fields *)
(* ---------------- *)
field_declaration = type , identifier , [ "=" , expression ] , ";" ;

(* Statements *)
(* ---------------- *)
statement_block = "{" , { statement | compound_statement } , "}" ;

statement = raw_statement , ";" ;
raw_statement = return_statement
          | assignment
          | declaration
          | expression_statement ;

(* A Compound Statement is a statement that includes other statements, i.e., an if statement *)
compound_statement = if_statement ;

assignment = identifier , "=" , expression ;
declaration = type , whitespace , identifier , "=" , expression ;
return_statement = "return" , [ expression ] ;
expression_statement = expression ; (* Allow function calls, object instantiation, etc *)

(* If Statement *)
(* ---------------- *)
if_statement = "if" , "(" , expression , ")" , statement_block , [ else_block ] ;
else_block = "else" , statement_block ;

(* Expressions *)
(* ---------------- *)
expression = term , { ("+" | "-") , term } ;
term = factor , { ("*" | "/") , factor } ;

factor = member_access | primary ;
primary = number
        | boolean_literal
        | char_literal
        | string_literal
        | object_instantiation
        | function_call
        | identifier
        | "(" , expression , ")" ;

member_access = primary , { ".", identifier, [ "(", [ arguments ], ")" ] } ;
boolean_literal = "true" | "false" ;
object_instantiation = "new" , identifier , "(", [ arguments ] , ")" ;

(* Types *)
(* ---------------- *)

(* Parameters are formatted as: type id, type id, type id etc *)
parameters = parameter , { "," , parameter } ;
parameter = type , identifier ;
type = raw_type , pointer_suffix ;
pointer_suffix = { "*" } ;

raw_type = "i32"
         | "bool"
         | "char"
         | identifier ;

(* Common *)
(* ---------------- *)
modifier = "export"
         | "extern" ;

string_literal = "\"", { character | escape_sequence }, "\"" ; (* IMPORTANT: In the actual implementation, string literals may contain any printable character *)
char_literal = "'" , ( character | escape_sequence ), "'" ; (* IMPORTANT: In the actual implementation, char literals may contain any printable character *)

escape_sequence = "\\" , ( "n" | "t" | "r" | "\"" | "'" | "\\" ) ;

whitespace = " " | "\t" | "\n" | "\r" ;

character = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L"
       | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X"
       | "Y" | "Z"
       | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l"
       | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x"
       | "y" | "z" ;


number = digit , { digit } ;

digit  = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
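The `expression` and `term` rules above encode operator precedence. `term` groups `*` and `/` before `expression` combines the resulting terms with `+` and `-`. The Python sketch below is a recursive-descent recogniser for a subset of those rules: numbers, identifiers, parentheses and the four arithmetic operators. It omits literals, member access, calls and `new`.

It makes two assumptions. First, whitespace between tokens is ignored. Second, the `{ ... }` repetitions are read left to right, so operators of equal precedence associate to the left. The names `parse_expression` and `Parser` and the tuple-based tree shape are illustrative only. They are not part of TweetyLang.Parser.

```python
import re
from typing import List, Tuple, Union

# A tree node is a leaf (int or identifier string) or (operator, left, right).
Node = Union[int, str, Tuple[str, "Node", "Node"]]

# Tokens: number = digit {digit}; identifier = character {character | digit | "_"};
# plus single-character operators and parentheses. Leading whitespace is skipped.
TOKEN_RE = re.compile(r"\s*(?:([0-9]+)|([A-Za-z][A-Za-z0-9_]*)|([+\-*/()]))")


def tokenize(source: str) -> List[str]:
    tokens: List[str] = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(match.lastindex))  # the one group that matched
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ""

    def take(self) -> str:
        token = self.peek()
        if token == "":
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    # expression = term , { ("+" | "-") , term } ;
    def expression(self) -> Node:
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # left-associative
        return node

    # term = factor , { ("*" | "/") , factor } ;  (factor reduced to primary here)
    def term(self) -> Node:
        node = self.primary()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.primary())
        return node

    # primary = number | identifier | "(" , expression , ")" ;  (subset)
    def primary(self) -> Node:
        token = self.take()
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token[0].isalpha():
            return token
        raise SyntaxError(f"unexpected token {token!r}")


def parse_expression(source: str) -> Node:
    parser = Parser(tokenize(source))
    tree = parser.expression()
    if parser.peek() != "":
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

With this sketch, `parse_expression("1 + 2 * 3")` returns `("+", 1, ("*", 2, 3))`, because `*` is grouped inside a `term` before `+` is applied. `parse_expression("8 - 3 - 2")` returns `("-", ("-", 8, 3), 2)`.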

3. Modules

Modules are top-level containers for code. Each module serves as a compilation unit, housing functions, types, structs, and other definitions. Every module has a name that is unique within the program and is used to reference it. A single source file may contain any number of modules.

Modules are used primarily for code organisation. Functionally, modules are TweetyLang's analogue to namespaces; however, a module must be explicitly imported before its contents can be used from another module.

3.1 Module Declaration

A module declaration consists of the module keyword, followed by the module's identifier and then its body. Module declarations do not accept modifiers.

module MyModule 
{
    // Module contents
}

A module declaration can only appear at the top level of a source file. Its identifier must be unique within the program and cannot collide with the identifier of any other module declaration.

Module identifiers may consist of several identifiers separated by ::, which can be used to organise related modules.
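The naming rules for modules can be checked mechanically. The Python sketch below follows the `module_name` and `identifier` productions from the grammar: an identifier starts with an ASCII letter, continues with letters, digits or underscores, and segments are separated by `::`. It assumes that uniqueness means an exact, case-sensitive comparison of full module names. The helpers `is_valid_module_name` and `find_duplicate_modules` are hypothetical and not part of the TweetyLang toolchain.

```python
import re
from typing import Iterable, Set

# identifier = character , { character | digit | "_" } ;  (character = ASCII letter)
IDENTIFIER = r"[A-Za-z][A-Za-z0-9_]*"
# module_name = identifier , { "::" , identifier } ;
MODULE_NAME_RE = re.compile(rf"{IDENTIFIER}(?:::{IDENTIFIER})*")


def is_valid_module_name(name: str) -> bool:
    """Return True if `name` matches the module_name production exactly."""
    return MODULE_NAME_RE.fullmatch(name) is not None


def find_duplicate_modules(names: Iterable[str]) -> Set[str]:
    """Return every module name declared more than once in the program.

    Assumes names are compared exactly (case-sensitively).
    """
    seen: Set[str] = set()
    duplicates: Set[str] = set()
    for name in names:
        if name in seen:
            duplicates.add(name)
        seen.add(name)
    return duplicates
```

Under these rules, a name such as `Graphics::Rendering` is accepted. `1Module`, `_Module` and `MyModule::` are rejected because each segment must begin with a letter and `::` must be followed by another identifier.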

Exported Definitions

By default, definitions within a module are private and can only be accessed inside the module where they are declared. To make a definition available to other modules when the module is imported, it must be marked with the export keyword:

module MyModule
{
    // This function can only be called from within this module
    i32 PrivateNumFunc()
    {
        return 16;
    }

    // This function can be called from another module
    export i32 NumFunc()
    {
        return PrivateNumFunc();
    }
}

3.2 Import Statements

Import statements allow a module's exported contents to be used within another module. An import statement consists of the import keyword, followed by the identifier of the module being imported and a semicolon.

import MyModule;
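Together, the export rule and import statements determine whether one module can use a definition from another. The Python sketch below models that check under two assumptions. A module can always use its own definitions. It can use another module's definition only if it imports that module and the definition is marked `export`. The `Module` class and `can_access` function are hypothetical illustrations of these rules, not part of the TweetyLang compiler.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Module:
    name: str
    # Maps each definition name to whether it is marked `export`.
    definitions: Dict[str, bool] = field(default_factory=dict)
    # Names of the modules this module imports.
    imports: Set[str] = field(default_factory=set)


def can_access(modules: Dict[str, Module], from_module: str,
               target_module: str, definition: str) -> bool:
    """Return True if code in `from_module` may use `definition` from `target_module`."""
    target = modules[target_module]
    if definition not in target.definitions:
        return False
    if from_module == target_module:
        # Private and exported definitions are both visible inside their own module.
        return True
    # Other modules need an import, and the definition must be exported.
    return target_module in modules[from_module].imports and target.definitions[definition]
```

Take `MyModule` from the export example, plus a module that contains `import MyModule;`. In this model, that importing module can use `NumFunc` but not `PrivateNumFunc`. A module without the import can use neither.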

4. Functions

A function is a block of reusable code that accepts input values (arguments) through its parameters and may produce an output value (its return value). Functions can also include modifier keywords that alter or extend their behaviour.

Invoking Functions

Functions can be static or instance-based. Static functions can be invoked directly without reference to an object:

Add(1, 3);

Instance-based functions require an instance of the owning type to be invoked:

Maths maths = new Maths();
maths.Add(1, 3);
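For readers coming from other languages, this distinction resembles the difference between module-level functions and methods in Python. The sketch below is only an analogy. The `add` function and `Maths` class are hypothetical stand-ins for the TweetyLang calls above, and Python's semantics (for example, dynamic typing and arbitrary-precision integers) differ from TweetyLang's.

```python
def add(a: int, b: int) -> int:
    """Analogue of a static (module-level) function: called without an object."""
    return a + b


class Maths:
    def add(self, a: int, b: int) -> int:
        """Analogue of an instance-based function: needs a Maths instance to call."""
        return a + b


print(add(1, 3))        # 4, called directly
maths = Maths()
print(maths.add(1, 3))  # 4, called through an instance
```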

4.1 Function Declaration

A function declaration consists of optional modifier keywords, a return type, a unique function name, a parameter list enclosed in parentheses, and a function body defining its behaviour.

i32 Add(i32 a, i32 b) 
{
    return a + b;
}

By default, functions defined at the module level are static, whereas functions defined within a type are instance-based. Type-enclosed functions can be explicitly declared static by applying the static modifier.

5. Structs

Structs are user-defined types that group related variables into a single entity, allowing data to be organised logically under one name. Structs can also define instance methods, encapsulating both data and behaviour within the same construct.

5.1 Struct Declaration

A struct declaration consists of optional modifier keywords, followed by the struct keyword, the struct's identifier and then its body.

export struct MyStruct 
{
    // Struct Body
}

A struct body can contain functions, fields and properties.

6. Fields

Fields are named variables declared inside a type body (such as a struct or class) that store data specific to instances of that type.

Field declarations consist of optional modifier keywords, the field's type, a unique name and, optionally, an initial value assignment.

public i32 MyNumber = 16;

7. Properties

Properties are encapsulated fields that run accessor functions when their value is read (the getter) or modified (the setter).

Property declarations consist of optional modifier keywords, the property's type, a unique name, getter and setter definitions, and, optionally, an initial value assignment.

Both of these examples are valid properties:

public i32 MyNumber { get; set; } = 16;

public i32 MyNumber
{
    // Reading myInstance.MyNumber returns MyNumber's current value + 10.
    get
    {
        return field + 10;
    }

    // Assigning myInstance.MyNumber = 10 prints some text and then updates MyNumber's current value.
    set
    {
        printf("Set MyNumber.");
        field = value;
    }
}
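The behaviour of the second example can be approximated with a Python `property`. Here `_field` plays the role of TweetyLang's `field` backing store, and the setter's parameter plays the role of `value`. This is an analogy for illustration only. The class name `Example` is hypothetical, and the backing store is assumed to start at 0 because the TweetyLang example gives no initial value.

```python
class Example:
    def __init__(self) -> None:
        # Backing store, analogous to `field` in the TweetyLang accessors.
        # Assumed to start at 0; the TweetyLang example gives no initial value.
        self._field = 0

    @property
    def my_number(self) -> int:
        # Getter: returns the stored value plus 10.
        return self._field + 10

    @my_number.setter
    def my_number(self, value: int) -> None:
        # Setter: prints a message, then stores the incoming value.
        print("Set MyNumber.")
        self._field = value


example = Example()
example.my_number = 10    # prints "Set MyNumber."
print(example.my_number)  # 20
```

Reads and writes use plain member syntax, while the accessor code runs behind the scenes. This matches the encapsulation the property section describes.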

8. Type Access Operators

Different operators are used to access members of a type depending on whether the member is static, instance-based, or accessed through a pointer. These are referred to as "Type Access Operators".

Static Access

The :: operator is used on a type to access its static members.

Type::StaticMethod();

Instance Access

The . operator is used on an instance of a type to access its instance members.

Type myInstance = new Type();
myInstance.InstanceMethod();

Pointer Access

The -> operator is used on a pointer to an instance of a type to access its instance members.

myInstancePointer->InstanceMethod();