Featured image of post Dynamic Dispatch: Memory Organization

Dynamic Dispatch: Memory Organization

How exection of a possibly overridden function works

Introduction

Suppose a class and its two children have the following structures.

$$ \begin{align*} &\texttt{class Pet:}\newline &\texttt{\qquad string name}\newline &\texttt{\qquad int age} &\newline &\newline &\texttt{class Cat extends Pet:}\newline &\texttt{\qquad string catBreed}\newline &\text{}\newline &\texttt{class Dog extends Pet:}\newline &\texttt{\qquad string dogBreed} \end{align*} $$

So maybe a person wants to keep track of their pets. Since both $\texttt{Cat}$ and $\texttt{Dog}$ are subtypes of $\texttt{Pet}$, we can store them in a $\texttt{Pet}$ array.

$$\texttt{Pet[] pets}$$

Note that elements in $\texttt{pets}$ won’t know if they are a $\texttt{Cat}$ or $\texttt{Dog}$ so we can only access fields that belong to $\texttt{Pet}$ ($\texttt{name}$ and $\texttt{age}$).

Single Inheritance

If a $\texttt{Pet}$ doesn’t know if it is a $\texttt{Cat}$ or a $\texttt{Dog}$, we need to make sure that the procedure for accessing $\texttt{name}$ and $\texttt{age}$ is the same in both of its children.

Fields

What if we put the parent ($\texttt{Pet}$) fields at the top of the memory structure.

pneaatmgee*cantcaaBamgrteee*ed*dongdaaBomgrgeee*ed*

Strings are just pointers (*) since there is no bound on string length.

Now, no matter what a $\texttt{Pet pet}$ actually is, $\texttt{pet+0x0}$ is going to be $\texttt{name}$, and $\texttt{pet+0x8}$ is going to be $\texttt{age}$.

This is in our theoretical memory model, though the effect would be similar in your favorite programming language.

Non-virtual Functions

Think of non-virtual functions as just normal class methods. We will discuss virtual functions afterwards.

Now, let’s add a function to our class hierarchy.

$$ \begin{align*} &\texttt{class Pet:}\newline &\texttt{\qquad string name}\newline &\texttt{\qquad int age}\newline &\text{}\newline &\texttt{\qquad void noise():}\newline &\texttt{\qquad\qquad print(“pet noises”)} \end{align*} $$

Could we do the same thing?

n.npnodoeaaiaitmgstseeeae**canntcaaoBamgirteese*ee*d*donngdaaoBomgirgeese*ee*d*

$\texttt{.data}$ just stores the function definition.

While this is correct, we can optimize based on the observation that method definitions are shared between all objects of the same class.

First, note that a method call $\texttt{pet.noise()}$ is syntactic sugar for $\texttt{Pet.noise(pet)}$ (object is passed as parameter). Since $\texttt{noise}$ is not specific to the object $\texttt{pet}$, it can be resolved at compile-time.

.npndoeaaaitmgtseeae*cantcaaBamgrteee*ed*dongdaaBomgrgeee*ed*

Everytime $\texttt{Pet.noise(pet)}$ is called, just replace it (at compile-time) with the known address of $\texttt{Pet.noise}$ (in our memory model: $\texttt{.data+0x0}$).


For example:

$$ \begin{align*} &\texttt{Cat cat}\newline &\texttt{Dog dog}\newline &\texttt{cat.noise()}\newline &\texttt{dog.noise()}\newline \newline &\implies\newline \newline &\texttt{Cat cat}\newline &\texttt{Dog dog}\newline &\texttt{(.data+0x0)(cat)}\newline &\texttt{(.data+0x0)(dog)} \end{align*} $$

We save space on every object holding a pointer to $\texttt{noise}$ ($\texttt{+0x8}$ size per object).

Virtual Functions

Introduction

The heart of inheritance is the concept of a virtual function. The idea is that an inherited method can take different definitions depending on the class hierarchy lineage.

Let’s take our code from above and modify it so that $\texttt{noise}$ is a virtual function.

Other class data omitted.

$$ \begin{align*} &\texttt{class Pet:}\newline &\texttt{\qquad void noise():}\newline &\texttt{\qquad\qquad print(“pet noises”)}\newline &\text{}\newline &\texttt{class Cat extends Pet:}\newline &\texttt{\qquad void noise():}\newline &\texttt{\qquad\qquad print(“meow”)}\newline &\text{}\newline &\texttt{class Dog extends Pet:}\newline &\texttt{\qquad void noise():}\newline &\texttt{\qquad\qquad print(“bark”)} \end{align*} $$

The children can override the definition of a virtual method (in this case, $\texttt{noise}$).

Note that if the child does not provide a definition override, it keeps the parent’s definition (recursively).

$$ \begin{align*} &\texttt{class Fish extends Pet:}\newline &\texttt{\qquad string color}\newline &\text{}\newline &\texttt{Fish fish}\newline &\texttt{fish.noise()\qquad\thickspace\thickspace// “pet noises”}\newline &\text{}\newline &\texttt{class Betta extends Fish:}\newline &\texttt{\qquad string bettaSpecies}\newline &\text{}\newline &\texttt{Betta betta}\newline &\texttt{betta.noise()\qquad// “pet noises”} \end{align*} $$

The issue now is that we cannot know at compile-time which virtual function (in this case $\texttt{noise}$) to call.

$$ \begin{align*} &\texttt{Pet[] pets = [Cat(), Dog()]}\newline &\texttt{pets[0].noise()\qquad// Cat.noise() -> “meow”}\newline &\texttt{pets[1].noise()\qquad// Dog.noise() -> “bark”} \end{align*} $$

One (naive) solution is to again provide the function pointer as part of the object’s structure.

PPCDeeaot.ttgpn.d...eaanannntmgotoooeeiaiii*sssseeee*Cactacn.taaanBtmgoreeie*seed**Dodgodn.goaanBgmgoreeie*seed**

But again, objects of the same type share the same set of virtual functions (every $\texttt{Cat}$ is going to have the same $\texttt{noise=cat+0x10}$ value).

Vtable

Enter the virtual method table. Every object in a class hierarchy (with virtual methods) now also contains a pointer (virtual pointer) to a table (virtual table) containing all of the virtual methods.

PPPPCDeeeeaottt.ttgp.n..d...evaavnannnttmgtotoooaeeaiaiiib*bsssslleeeeee**CCCaaacttta...ntvncvaaBtoatmgraitaeeebsb*elelde*e**DDDooodgggo...ngvndvaaBtootmgraigaeeebsb*elelde*e**

In this case, there is more memory used than in our naive approach. But for every additional virtual function, under the naive approach every object increases in size linearly.

With vtables, only the appropriate vtable sizes increase (and it is expected that there are significantly less vtables than object instances).

$$ \begin{align*} &\texttt{Pet[] pets = [Cat(), Dog()]}\newline &\texttt{pets[0].noise()}\newline &\texttt{pets[1].noise()}\newline &\texttt{/* }\newline &\texttt{ * vptr = pets[\textit{x}]+0x0}\newline &\texttt{ * vtable = \&vptr}\newline &\texttt{ * noise = vtable+0x0}\newline &\texttt{ */} \end{align*} $$

This also works with inheritance lineages. Suppose that $\texttt{Pet}$ is a child class of $\texttt{Thing}$.

Other class data omitted.

$$ \begin{align*} &\texttt{class Thing:}\newline &\texttt{\qquad string class():}\newline &\texttt{\qquad \qquad return “Thing”}\newline &\text{}\newline &\texttt{class Pet extends Thing:}\newline &\texttt{\qquad string class():}\newline &\texttt{\qquad \qquad return “Pet”}\newline &\text{}\newline &\texttt{class Dog extends Pet:}\newline &\texttt{\qquad string class():}\newline &\texttt{\qquad \qquad return “Dog”}\newline &\text{}\newline &\texttt{class Cat extends Pet:}\newline &\texttt{\qquad string class():}\newline &\texttt{\qquad \qquad return “Cat”} \end{align*} $$

TPTTPPPThehheeehPCDPCDitiittt.ieaoeaonp.nnn...dnttgttgtgevaaggvcnag......h.ttmg..tlot.cccnnnivaeevcaaiaclllooontb*tlbsslaaaiiigalaalseassssssbebse**sssseeel*lssee**CCCCaaaactttta....ntvcncvaaBtloatmgraaitaeeebssb*elselde**e**DDDDoooodggggo....ngvcndvaaBtlootmgraaigaeeebssb*elselde**e**

The only constraint is that parent virtual functions should be represented in memory before child virtual functions. Doing so keeps an ordering across a lineage. In this case, the $\texttt{class}$ virtual function will always be at $\texttt{vtable+0x0}$ for any class that has that function.

If a class doesn’t have that virtual function, the type checker would catch the error at compile time (i.e. $\texttt{Thing.noise()}$).

Multiple Inheritance

Introduction

New example.

$$ \begin{align*} &\texttt{class Printable:}\newline &\texttt{\qquad string output}\newline &\text{}\newline &\texttt{\qquad void print():}\newline &\texttt{\qquad \qquad print(output)}\newline &\text{}\newline &\texttt{class Stringable:}\newline &\texttt{\qquad string value}\newline &\text{}\newline &\texttt{\qquad string toString():}\newline &\texttt{\qquad \qquad return value}\newline &\text{}\newline &\texttt{class Person extends Printable, Stringable:}\newline &\texttt{\qquad string name}\newline &\text{}\newline &\texttt{\qquad void print():}\newline &\texttt{\qquad \qquad print(output)}\newline &\texttt{\qquad \qquad print(value)}\newline &\texttt{\qquad \qquad print(name)}\newline &\text{}\newline &\texttt{\qquad string toString():}\newline &\texttt{\qquad \qquad return output + value + name} \end{align*} $$

Ok, so now we can have two parents. How can we possibly represent this in memory?

The Problem

What ordering can we keep?

PPrrPiirnnipttnraoatibubanltlbtepela.u.ebvtv.lt*tpeaarbbillneet*SSStttrrriiinsnngtggaravabibablnlllegeue.a.e.tbv*volttSeaatbbrllieen*gPPePerePrsrespoovsroenuanosnr.tlano.svpum.ntotueev.onat**tpSb*artlbirelni*etng

If we put $\texttt{print}$ first in the $\texttt{Person}$ vtable, $\texttt{toString}$ calls and $\texttt{value}$ accesses fail.

$$ \begin{align*} &\texttt{Stringable[] stringables = [Stringable(), Person()]}\newline &\texttt{Stringable[0].toString()}\newline &\texttt{Stringable[1].toString()}\newline &\texttt{/* }\newline &\texttt{ * vptr = stringables[\textit{x}]+0x0}\newline &\texttt{ * vtable = \&vptr}\newline &\texttt{ * toString = vtable+???}\newline &\texttt{ * // 0x0 in Stringable.vtable; 0x8 in Person.vtable}\newline &\texttt{ */}\newline &\newline &\texttt{Stringable[0].value}\newline &\texttt{Stringable[1].value}\newline &\texttt{/* }\newline &\texttt{ * value = stringables[\textit{x}]+???}\newline &\texttt{ * // 0x0 in Stringable layout; 0x8 in Person layout}\newline &\texttt{ */} \end{align*} $$

The Solution

Part 1: Contiguous Layout

psPPeteePrrPrrrsirPssioniroonngninnptatn..raobatPSibulbartnltelbirtepelnia.u.etnbvtv.aglt*tpbaeaarlbbbiellln.eeetv.*PtvPeaterbarslbsoelonen..SSSPtSttrortrriuivnriintnaainsntpglmngtgauauegaravbtbe*abPibal*l*blenlleelePrgeu..e.esa.evv.trobv*ttvosnltaatSo.eabbatntbllbr.oleelipSe**enrt*girnitng

The key idea is this: whenever we pass a $\texttt{Person}$ where $\texttt{Stringable}$ is expected, it is replaced at compile-time with the $\texttt{stringable}$ address $\texttt{person+0x10}$.

$$\begin{align*} \texttt{\&person}&\neq\texttt{\&cast<Stringable>(person)}\newline \texttt{\&person+0x10}&=\texttt{\&cast<Stringable>(person)} \end{align*} $$

sPterrisnogna.bSlteringable.Pvetrasbolne.Strivnnaaglmauebe*Pl*eer.svotna.btloeS*tring

$$ \begin{align*} &\texttt{Person person = Person()}\newline &\texttt{Stringable[] stringables = [Stringable(), person] // person+0x10}\newline &\texttt{Stringable[0].toString()}\newline &\texttt{Stringable[1].toString()}\newline &\texttt{/* }\newline &\texttt{ * vptr = stringables[\textit{x}]+0x0}\newline &\texttt{ * vtable = \&vptr}\newline &\texttt{ * toString = vtable+0x0}\newline &\texttt{ */}\newline \newline &\texttt{Stringable[0].value}\newline &\texttt{Stringable[1].value}\newline &\texttt{/* }\newline &\texttt{ * value = stringables[\textit{x}]+0x8}\newline &\texttt{ */} \end{align*} $$

Relative memory location is consistent!

One problem: how does $\texttt{Person.toString}$ access $\texttt{output}$ (a field derived from $\texttt{Printable}$)?

$ \texttt{string Person.toString():}\newline \texttt{\qquad return output + value + name}\newline \texttt{\qquad // output is at person+0x8}\newline \texttt{\qquad // output is lost when using person+0x10}\newline $

Part 2: Top Offset

psPPeteerrPrrsirPssoniroongninnatn..batPSlbartel0birexlni.0etnv.agtpbaarlbbielln.eetv.PtvPeaterbarslbsoelonen..SSPtStrortriuivnrintnaaintpglmngauauegabtbe*abPl*l*b0leeelxePr..e0.esvv.trottvos-naatS0o0.bbatxnxtllbr0.1oeelip0S**enrtgirnitng0x10

We’ve added two new entries to the $\texttt{Person.vtable}$: the top offsets.

Recall that $\texttt{person.toString()}$ is syntactic sugar for $\texttt{Person.toString(person)}$. So now, when we call functions in the vtable, instead of passing the object address, we pass the object address plus the top offset (which gives a pointer to the original object). This is called this pointer adjustment.

$$\begin{align*} &\texttt{Person person = Person()}\newline &\texttt{Stringable[] stringables = [Stringable(), person] // person+0x10}\newline &\texttt{Stringable[0].toString()}\newline &\texttt{Stringable[1].toString()}\newline &\texttt{/* }\newline &\texttt{ * vptr = stringables[\textit{x}]+0x0}\newline &\texttt{ * vtable = \&vptr}\newline &\texttt{ * toString = vtable+0x8}\newline &\texttt{ * topOffset = *(vtable+0x0)}\newline &\texttt{ * this = stringables[\textit{x}]+topOffset}\newline &\texttt{ * toString(this)}\newline &\texttt{ */} \end{align*}$$

Now, $\texttt{output}$ is at $\texttt{this+0x8}$.

Observe that this works with $\texttt{Printable}$.

$$\texttt{\&person}=\texttt{\&cast<Printable>(person)}$$

The $\texttt{Person.Printable}$ top offset is $\texttt{0x0}$, so the this pointer adjustment has no effect.

In general, if $\texttt{Child extends A, B, C, D}$:

CABCDCCCCChhhhhhiiiiiilllllldddddd.....vABCDt....avvvvbttttlaaaaebbbbllllCCCCeeeeChhhhhiCiCiCiCiClhlhlhlhlhdididididi.l.l.l.l.lAdBdCdDdvd…..…..…..…..…t.vAvBvCvDadt.t.t.t.CCCCbaadadadadChhhhltbabababahiiiiealtltltltillll*eaeaeaealdddd****d.....ABCD0f-.-.-.-.xu…af…bf…cf…df…0nuuuucnnnnatccccittttoiiiibnoooosnnnnsssscd
My heart is in the work ― Andrew Carnegie
Built with Hugo
Theme Stack designed by Jimmy