Diving into source code: TLisp and XML implementations 101

This is a moderated forum that collects tutorials, guides, and references for creating Transcendence extensions and scripts.
EditorRUS
Militia Lieutenant
Posts: 148
Joined: Tue Oct 30, 2012 6:30 pm

It took me forever, but I finally managed to find everything I needed.

TLisp
TLisp is defined and implemented mostly in Alchemy/CodeChain.

Structure of TLisp
The most basic element of TLisp is ICCItem, defined in Include/CodeChain.h @ 92 (throughout this post, "@ N" means line N) and implemented in CodeChain/ICCItem.cpp.
This class represents all objects in TLisp.

API functions are implemented using structs called "primitives", defined in Include/CodeChain.h @ 41.

A primitive is defined as follows:
string Symbol;
function Function;
integer Macro; (a named macro constant is used)
string Help Text;
string Type String;
bitflags Flags;
Symbol defines which symbol is used in TLisp for this function. Setting it to "<" binds the function to <, so calling (< 2 1) calls this function.
Function is, well, the C function that gets called when this symbol is called in TLisp.
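
To make the primitive layout more concrete, here is a toy C++ sketch of a primitive table and a dispatcher. Everything in it is hypothetical: CEvalContextStub and ICCItemStub stand in for the real CEvalContext and ICCItem, the field names are mine, and the toy functions return a plain int instead of an ICCItem*. It only illustrates binding a symbol to a C function plus a macro value, not George's actual declarations in Include/CodeChain.h.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real CEvalContext and ICCItem classes.
struct CEvalContextStub {
    std::string extensionName;  // e.g. which extension made the call
};

struct ICCItemStub {
    std::vector<int> elements;  // toy argument list: integers only
    int GetElement(int index) const { return elements.at(index); }
};

// Three-argument shape: context, argument list, macro value.
// (The toy returns int; the real functions return an ICCItem*.)
using PrimitiveFn = int (*)(CEvalContextStub* pCtx, ICCItemStub* pArgs, int macro);

// Fields mirror the list above; the names are illustrative.
struct PrimitiveDef {
    const char* symbol;      // TLisp symbol, e.g. "<"
    PrimitiveFn function;    // C function to call
    int macro;               // passed through as the third argument
    const char* helpText;    // shown by (help fun)
    const char* typeString;  // argument types, e.g. "ii"
    unsigned flags;          // bit flags, 0 if unspecified
};

// Toy (< a b): 1 for true, 0 for false.
int fnLessThan(CEvalContextStub*, ICCItemStub* pArgs, int) {
    return pArgs->GetElement(0) < pArgs->GetElement(1) ? 1 : 0;
}

const PrimitiveDef g_Primitives[] = {
    {"<", fnLessThan, 0, "(< a b) -> True if a is less than b", "ii", 0},
};

// Find the primitive bound to a symbol and call it with its macro value.
int CallPrimitive(const std::string& symbol, CEvalContextStub* pCtx, ICCItemStub* pArgs) {
    for (const PrimitiveDef& def : g_Primitives) {
        if (symbol == def.symbol) return def.function(pCtx, pArgs, def.macro);
    }
    return -1;  // no primitive bound to this symbol
}
```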

Each C function takes three arguments of the following types: a pointer to CEvalContext, a pointer to ICCItem (always a list), and an integer.
CEvalContext* represents the environment in which the function was called, for example to know which extension called it.
ICCItem* holds all the arguments; pArgs->GetElement(index) gets an argument by position.
The last integer is the Macro value from the primitive definition.
Many C functions implemented by George actually implement multiple TLisp functions at once, and the Macro argument exists for exactly that. For example, @ and set@ share the same body, fnItem, which switches on Macro.
The primitive for set@ specifies FN_SET_ITEM and the one for @ specifies FN_ITEM, but both call fnItem.
Why do that instead of implementing two separate functions? Beats me; probably to reuse code rather than copy-pasting it all over the place.
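
A toy version of the "one body, several TLisp functions" trick. FN_ITEM and FN_SET_ITEM here are made-up integer values and ItemArgs is a hypothetical stand-in for the argument list; the real fnItem works on ICCItem lists and handles far more cases. The point is only the switch on the macro value.

```cpp
#include <cassert>
#include <vector>

// Hypothetical macro values; the real ones live in CodeChain/Functions.h.
constexpr int FN_ITEM = 1;
constexpr int FN_SET_ITEM = 2;

// Toy stand-in for the argument list of (@ lst i) and (set@ lst i value).
struct ItemArgs {
    std::vector<int>* list;
    int index;
    int newValue;  // only used by set@
};

// One C body serving two TLisp functions, selected by the macro value.
int fnItem(ItemArgs* pArgs, int macro) {
    std::vector<int>& lst = *pArgs->list;
    switch (macro) {
        case FN_ITEM:      // (@ lst index): read an element
            return lst.at(pArgs->index);
        case FN_SET_ITEM:  // (set@ lst index value): write an element
            lst.at(pArgs->index) = pArgs->newValue;
            return pArgs->newValue;
        default:
            return -1;     // unknown macro value
    }
}
```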

Macros are defined in CodeChain/Functions.h. They are simply integers.
Help Text is used by (help fun), obviously.
Type String is a fun one. It defines the argument types each primitive expects. CodeChain/Functions.cpp @ 1 to 20 describes the possible values. "si*", for example, means that the first argument is expected to be a string and the second an integer, and * means that the last type character applies to all following arguments. Therefore, in (fun x1 x2 x3 ... xn), x1 must be a string and x2, x3, ..., xn must be integers.
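
A sketch of how a type string such as "si*" could be checked against the actual arguments. It models only the s, i and * characters mentioned above; the real list in CodeChain/Functions.cpp has more codes, and how the engine treats optional arguments is not modelled. ArgKind and MatchesTypeString are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy argument kinds: only the 's' and 'i' codes from the text are modelled.
enum class ArgKind { String, Integer };

// True if the argument kinds match the type string.
// 's' = string, 'i' = integer, '*' = repeat the previous code for all
// remaining arguments.
bool MatchesTypeString(const std::string& types, const std::vector<ArgKind>& args) {
    std::size_t t = 0;  // position in the type string
    char code = 0;      // type code expected for the current argument
    for (ArgKind arg : args) {
        if (t < types.size() && types[t] == '*') {
            if (t == 0) return false;  // malformed: nothing to repeat
            code = types[t - 1];       // stay on '*' for every later argument
        } else if (t < types.size()) {
            code = types[t++];
        } else {
            return false;              // more arguments than the type string allows
        }
        bool ok = (code == 's' && arg == ArgKind::String) ||
                  (code == 'i' && arg == ArgKind::Integer);
        if (!ok) return false;
    }
    // Fewer arguments than required: unconsumed codes remain (a trailing '*' is fine).
    return t == types.size() || types[t] == '*';
}
```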

Finally, Flags define specific properties of primitives. All flags can be found in Include/CodeChain.h @ 35 to 39. The value is 0 if no flags are specified.

Files of interest
CodeChain/DefPrimitives.h defines primitives for the most basic functions, the ones that work on basic values or provide core language features. Math operations, logical operators and things like that are found here; functions that create new ships and whatnot are not.

CodeChain/Functions.cpp implements all the C functions referenced by the primitives in CodeChain/DefPrimitives.h.
These functions are the (second) things that are called when you call something in TLisp.

The CodeChain/CC*.cpp files implement the type-specific functions called from CodeChain/Functions.cpp, as well as general-purpose functions.
For example, sorting of lists is implemented in CodeChain/CCLinkedList.cpp @ 892. This function calls QuickSort from the same file.

CodeChain/pageLibrary.cpp? What is this file? It has been around for a long time, and it seems to implement functionality for accessing files on the hard drive; there are also traces suggesting that Web access was considered. Two primitives are defined in there too.



Important information about (sort lst)
QuickSort is not guaranteed to be a stable sort, meaning that it does NOT necessarily preserve the relative order of elements that compare equal. So if you sort, say, (("A" 3) ("B" 3) ("C" 2)) by the second element, the result is not guaranteed to be (("C" 2) ("A" 3) ("B" 3)); it might just as well be (("C" 2) ("B" 3) ("A" 3)).
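
To see instability in action, here is a plain Lomuto-partition quicksort in C++. This is a textbook algorithm, not the QuickSort from CodeChain/CCLinkedList.cpp, whose partition scheme may differ. With this particular scheme the example above happens to come out in order, so the input is shuffled slightly: sorting (("A" 3) ("C" 2) ("B" 3)) by the second element puts "B" before "A", while std::stable_sort keeps "A" first.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Entry {
    std::string name;
    int key;  // the "second element" we sort by
};

// Textbook Lomuto partition (NOT the engine's code): last element is the pivot.
int Partition(std::vector<Entry>& v, int lo, int hi) {
    int pivot = v[hi].key;
    int i = lo - 1;
    for (int j = lo; j < hi; ++j) {
        if (v[j].key < pivot) {
            ++i;
            std::swap(v[i], v[j]);
        }
    }
    std::swap(v[i + 1], v[hi]);  // these long-range swaps are what break stability
    return i + 1;
}

void QuickSortByKey(std::vector<Entry>& v, int lo, int hi) {
    if (lo >= hi) return;
    int p = Partition(v, lo, hi);
    QuickSortByKey(v, lo, p - 1);
    QuickSortByKey(v, p + 1, hi);
}

// Concatenate names so the resulting order is easy to compare.
std::string Names(const std::vector<Entry>& v) {
    std::string out;
    for (const Entry& e : v) out += e.name;
    return out;
}
```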
QuickSort is also not randomized here, but unless you do something stupid, that doesn't really matter. However, if your list is big enough and contains a lot of repeated values, DO NOT use (sort): performance will be horrible. Consider implementing some other sorting algorithm instead.
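
Why repetitions hurt: with a naive partition, every element equal to the pivot lands on one side, so a list of identical values degrades to quadratic time. The sketch below counts comparisons for a textbook Lomuto quicksort and, for contrast, a three-way (Dutch national flag) partition, which groups all pivot-equal elements in one pass. Neither is the engine's code; I'm assuming its partition behaves like the naive one on duplicates, which is what the warning above suggests. On 1000 identical values the naive version makes n(n-1)/2 = 499500 comparisons, versus 2(n-1) = 1998 for the three-way version.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Naive Lomuto quicksort that counts key comparisons in cmp.
void NaiveQuickSort(std::vector<int>& v, int lo, int hi, long long& cmp) {
    if (lo >= hi) return;
    int pivot = v[hi];
    int i = lo - 1;
    for (int j = lo; j < hi; ++j) {
        ++cmp;
        if (v[j] < pivot) std::swap(v[++i], v[j]);
    }
    std::swap(v[i + 1], v[hi]);
    NaiveQuickSort(v, lo, i, cmp);      // elements < pivot
    NaiveQuickSort(v, i + 2, hi, cmp);  // elements >= pivot (all duplicates end up here)
}

// Three-way partition: [< pivot | == pivot | > pivot]; the middle is never revisited.
void ThreeWayQuickSort(std::vector<int>& v, int lo, int hi, long long& cmp) {
    if (lo >= hi) return;
    int pivot = v[lo];
    int lt = lo, i = lo + 1, gt = hi;
    while (i <= gt) {
        ++cmp;
        if (v[i] < pivot) {
            std::swap(v[lt++], v[i++]);
        } else {
            ++cmp;
            if (v[i] > pivot) std::swap(v[i], v[gt--]);
            else ++i;
        }
    }
    ThreeWayQuickSort(v, lo, lt - 1, cmp);
    ThreeWayQuickSort(v, gt + 1, hi, cmp);
}
```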

Important information about lnk functions
They work in place if the first argument is a variable holding a list, but they return a new object if the argument is an expression! You don't need to write (setq lst (lnkAppend lst 1)), because lnkAppend appends 1 to lst in place. But you must write (setq lst (lnkAppend (list 1 2 3) 4)), because (list 1 2 3) is not a variable but an expression that returns a list.
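
A toy C++ model of the lnk behaviour described above. Lists are shared objects, LnkAppend modifies whatever list it is handed and returns it, and g_Vars plays the role of TLisp variables. All names are hypothetical and this is not how CodeChain stores lists; it only reproduces the observable difference: a variable sees the change, while a list built by an expression is reachable only through the return value.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <memory>
#include <string>
#include <vector>

using List = std::vector<int>;
using ListPtr = std::shared_ptr<List>;

// Toy environment: variable names bound to shared list objects.
std::map<std::string, ListPtr> g_Vars;

// Toy lnkAppend: mutates the list object it receives and returns it.
ListPtr LnkAppend(ListPtr list, int value) {
    list->push_back(value);
    return list;
}

// Toy (list ...): every call builds a brand-new list object.
ListPtr MakeList(std::initializer_list<int> items) {
    return std::make_shared<List>(items);
}
```

Usage in the spirit of the examples above: LnkAppend(g_Vars["lst"], 4) updates lst even if the result is ignored, whereas LnkAppend(MakeList({1, 2, 3}), 4) is lost unless you store the result, just like (setq lst (lnkAppend (list 1 2 3) 4)).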

Non-primitive functions and functions with side effects
A side effect is anything a function does besides returning a value, such as changing a global variable.
Those are implemented in a completely different repository. WHYY?!
This repository, Mammoth/TSE, contains other API functions, but they are mixed in with everything else.
All of these primitives are defined in TSE/CCExtensions.cpp, and all of their C functions are implemented there as well. This one enormous file does the job of all three files of interest in CodeChain.
The TSE/C*.cpp files implement the actual objects, including the methods used by the C functions in TSE/CCExtensions.cpp.
Dockscreen and gam functions, meanwhile, are defined and implemented in yet another repository: Transcendence/CodeChainExtensions.cpp. Again. Why does George keep doing that?



XML
Unlike TLisp, this is not going to be an easy walk. XML is much less structured.
Let's start.

TLisp is implemented in a more or less declarative style, whereas XML handling is kinda all over the place. Quite literally, actually.
Fortunately we don't need to look into code for the most part because George decided to use macros to define things.

Definitions and implementations are in TSE/C*.cpp. Nearly all of these files start with preprocessor macros, many of which name XML tags, attributes and whatnot. George uses a simple naming scheme to tell the different kinds of literals apart.

Macro mask: *_TAG
Used to: define container tags.
Explanation: That is, tags that are allowed to appear inside the current tag.
Functions that are used together with this macro: GetContentElementByTag
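
A minimal sketch of the *_TAG pattern: macros name the child tags, and a lookup returns the first child with that tag. ToyXMLElement and the three macro names are hypothetical; the engine's real XML element class and its GetContentElementByTag may differ in signature and in details such as case sensitivity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical macros following the *_TAG naming scheme.
#define ITEM_TYPE_TAG "ItemType"
#define IMAGE_TAG     "Image"
#define ARMOR_TAG     "Armor"

// Hypothetical stand-in for an XML element: a tag name plus child elements.
struct ToyXMLElement {
    std::string tag;
    std::vector<ToyXMLElement> children;

    // Return the first child with the given tag, or nullptr if there is none.
    const ToyXMLElement* GetContentElementByTag(const std::string& childTag) const {
        for (const ToyXMLElement& child : children)
            if (child.tag == childTag) return &child;
        return nullptr;
    }
};
```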

Macro mask: *_ATTRIB
Used to: define attributes, duh
Explanation: Tags are basically containers of attributes and other tags. Attributes are constant and cannot be changed once set.
Notes: Actually, there is a function SetArgument, but it only works for strings and is used in only a couple of places.
You cannot access this function from TLisp.
Functions that are used together with this macro: FindAttributeInteger, GetAttribute, GetAttribute{TYPE}, GetAttribute{NUMERIC TYPE}Bounded, GetAttributeCount..., basically a bunch of typed functions that convert the attribute's value to the required type, plus some miscellaneous functions.
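
A sketch of the *_ATTRIB side: attributes are stored as strings and read through typed getters, and there are no setters, matching the "constant once set" rule. The class and macro names are hypothetical, and the clamping in the Bounded getter is my assumption; check the sources for what the engine's Get...Bounded functions actually do with out-of-range values.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>

#define LEVEL_ATTRIB "level"  // hypothetical attribute macro
#define NAME_ATTRIB  "name"   // hypothetical attribute macro

// Hypothetical stand-in: attributes are read-only strings once loaded.
class ToyXMLElement {
public:
    explicit ToyXMLElement(std::map<std::string, std::string> attribs)
        : m_Attribs(std::move(attribs)) {}

    // Raw string value, or the empty string if the attribute is missing.
    std::string GetAttribute(const std::string& name) const {
        auto it = m_Attribs.find(name);
        return it == m_Attribs.end() ? std::string() : it->second;
    }

    // Typed getter: parse the string, falling back to a default.
    int GetAttributeInteger(const std::string& name, int defaultValue = 0) const {
        std::string s = GetAttribute(name);
        if (s.empty()) return defaultValue;
        try { return std::stoi(s); } catch (...) { return defaultValue; }
    }

    // Bounded variant (assumed behaviour): clamp into [minValue, maxValue].
    int GetAttributeIntegerBounded(const std::string& name, int minValue, int maxValue,
                                   int defaultValue) const {
        return std::clamp(GetAttributeInteger(name, defaultValue), minValue, maxValue);
    }

private:
    const std::map<std::string, std::string> m_Attribs;  // no setters: read-only
};
```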

Macro mask: *_EVENT, sometimes EVENT_*
Used to: define events, duh [2]
Explanation: Now those are pretty interesting. When an event fires, a context is set up for it, and each event is handled differently. From the TLisp side, this context is basically the set of variables that become defined inside event tags.
Notes: Unfortunately, there is no easy way to map out all events and when they fire, so go and read the sources instead.
Functions that are used together with this macro: FindEventHandler, though not always. Most events are fired by specialized functions, usually named Fire[EventName]. These functions are usually declared in Include/TSE*.h and implemented in the respective TSE/C*.cpp files.
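
A sketch of the event pattern: a generic FindEventHandler lookup, plus a specialized Fire function that builds the event-specific context (the variables a handler sees) before running the handler. ToyObject, the macro names and the context variable names are illustrative only, and real handlers are TLisp code, not C++ lambdas.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

#define ON_CREATE_EVENT  "OnCreate"   // hypothetical event macro
#define ON_DESTROY_EVENT "OnDestroy"  // hypothetical event macro

// Variables visible to a handler (names here are illustrative).
using EventContext = std::map<std::string, std::string>;
using EventHandler = std::function<std::string(const EventContext&)>;

class ToyObject {
public:
    std::string name;
    std::map<std::string, EventHandler> handlers;  // would come from XML event tags

    // Generic lookup: nullptr if this object defines no such event.
    const EventHandler* FindEventHandler(const std::string& event) const {
        auto it = handlers.find(event);
        return it == handlers.end() ? nullptr : &it->second;
    }

    // Specialized Fire function: sets up this event's context, then runs the handler.
    std::string FireOnDestroy(const std::string& destroyer) const {
        const EventHandler* handler = FindEventHandler(ON_DESTROY_EVENT);
        if (!handler) return "";
        EventContext ctx{{"gSource", name}, {"aDestroyer", destroyer}};
        return (*handler)(ctx);
    }
};
```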

Macro mask: FIELD_*, also some files have *_FIELD
Used to: Internal use
Explanation: Fields are more or less private to whatever class they are defined in. You don't set them and you don't have direct access to them from TLisp or XML. They are basically used internally.
Functions that are used together with this macro: N/A

Macro mask: PROPERTY_*
Used to: Define properties, duh [3]
Explanation: Properties mean what you would expect: public fields that can be changed, but only by calling a function to set or get their value. They are also defined per object. They can be accessed from TLisp, and their initial values are defined in XML. There is a huge list of properties available through (objSetProperty obj property value) and (objGetProperty obj property). These calls take care of formatting the input/output and anything else that has to happen before the property is set or read.
Functions that are used together with this macro: OnGetProperty, GetProperty{TYPE}, SetProperty{TYPE} and the mighty fnObjSet.
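
A sketch of the property pattern: a public getter that forwards to a virtual per-class OnGetProperty hook, and a setter that validates input before storing it. ToyShip, the property names and the clamping rule are hypothetical; they only illustrate why going through a function, rather than touching a field directly, lets the object normalize values.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>

#define PROPERTY_HP     "hp"     // hypothetical property macro
#define PROPERTY_MAX_HP "maxHP"  // hypothetical property macro

class ToyShip {
public:
    virtual ~ToyShip() = default;

    // Public entry point, loosely in the spirit of (objGetProperty obj property).
    std::optional<int> GetProperty(const std::string& property) const {
        return OnGetProperty(property);
    }

    // Setter: validates and normalizes the input before storing it.
    bool SetProperty(const std::string& property, int value) {
        if (property == PROPERTY_HP) {
            m_HP = std::clamp(value, 0, m_MaxHP);  // keep hit points in range
            return true;
        }
        return false;  // unknown or read-only property
    }

protected:
    // Per-class hook; a subclass could override this to add its own properties.
    virtual std::optional<int> OnGetProperty(const std::string& property) const {
        if (property == PROPERTY_HP) return m_HP;
        if (property == PROPERTY_MAX_HP) return m_MaxHP;
        return std::nullopt;
    }

private:
    int m_HP = 100;
    int m_MaxHP = 100;
};
```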

Macro mask: SPECIAL_*
Used to: Define special attributes, duh [4]
Explanation: These guys are special. They have some very specific implementation assigned to them.
Functions that are used together with this macro: OnHasSpecialAttribute and HasSpecialAttribute
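
The HasSpecialAttribute / OnHasSpecialAttribute pair reads like a public entry point that defers to a per-class virtual hook. Here is a guess at that shape, in which a "level:N" special attribute is computed rather than stored. The classes, the SPECIAL_LEVEL prefix and its meaning are all hypothetical; read the actual sources before relying on any of it.

```cpp
#include <cassert>
#include <string>

#define SPECIAL_LEVEL "level:"  // hypothetical special-attribute prefix

class ToyDesignType {
public:
    virtual ~ToyDesignType() = default;

    // Public entry point: shared checks first, then defer to the subclass hook.
    bool HasSpecialAttribute(const std::string& attrib) const {
        if (attrib.empty()) return false;
        return OnHasSpecialAttribute(attrib);
    }

protected:
    // Default: a type knows no special attributes.
    virtual bool OnHasSpecialAttribute(const std::string&) const { return false; }
};

class ToyItemType : public ToyDesignType {
public:
    explicit ToyItemType(int level) : m_Level(level) {}

protected:
    // "level:N" is true when this item's level equals N (computed, not stored).
    bool OnHasSpecialAttribute(const std::string& attrib) const override {
        const std::string prefix = SPECIAL_LEVEL;
        if (attrib.size() > prefix.size() && attrib.compare(0, prefix.size(), prefix) == 0)
            return std::stoi(attrib.substr(prefix.size())) == m_Level;
        return false;
    }

private:
    int m_Level;
};
```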

Macro mask: ACTION_SPECIAL_*
Used to: ?
Explanation: They are not used anywhere. Perhaps George wanted to refactor code related to dockscreens?
Functions that are used together with this macro: N/A



That's pretty much it. Hopefully you now know where to look for the things you want to know.
Last edited by EditorRUS on Sat Apr 15, 2017 5:54 am, edited 2 times in total.
digdug
Fleet Admiral
Posts: 2620
Joined: Mon Oct 29, 2007 9:23 pm
Location: Decoding hieroglyphics on Tan-Ru-Dorem

Stuck! :D
EditorRUS

The code is actually pretty well commented and easy to understand once you get the hang of it, but it is absolutely massive.

But this guide should give a head start to those who just want the general idea of how to find information in the source code. Once you dive deep into it, you probably won't need this guide anymore, but it should provide just enough information to get started without much effort.