Peter Alexander

Posted: • 7 min read

I’ve never gone very deep into graphics or rendering, but I’ve long been fascinated by the use of signed distance fields (SDFs) for defining geometry, and implicit surfaces more generally. Like many people, I found out about them through Inigo Quilez’s work, and Shadertoy in particular. A long time ago I spent some time playing with them, and I’d like to create more if I find the time and inspiration.

What I like most about SDFs for defining geometry is that simple geometry can be expressed simply. A sphere is defined with just a couple of mathematical operations:

float sphere(vec3 p, float r)
{
  return length(p) - r;
}

Compare this to regular triangle meshes, where you can’t define an exact sphere at all, and approximating one requires a huge number of vertices. A sphere mesh is simple enough to generate programmatically, but in general you are essentially forced into authoring tools to deal with the complexity of defining geometry, no matter how simple that geometry is. Many times when I’ve been prototyping game ideas, I’ve been frustrated at having to open up Blender just to create the most trivial piece of geometry.

What I’d really like is a textual format for defining geometry, similar to how you do it with SDFs. OpenSCAD exists for CSG, but as far as I can tell it doesn’t support anything like smooth unions or domain transformations, which are needed for more organic-looking geometry. Pure shader code is OK for this, but with shapes represented as functions, transformations really need a language with first-class functions: an SDF is a function ℝ³→ℝ, so a union transformation is a function (ℝ³→ℝ)→(ℝ³→ℝ)→(ℝ³→ℝ).
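
To make that concrete, here is a minimal C++ sketch (all names here are my own) of shapes as first-class functions: a shape is a function from a point to a signed distance, and operations like union and translate are functions from shapes to shapes:

#include <algorithm>
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };

// An SDF is a function from R^3 to R.
using SDF = std::function<float(Vec3)>;

// Primitive: sphere of radius r at the origin.
SDF sphere(float r) {
  return [r](Vec3 p) { return std::sqrt(p.x*p.x + p.y*p.y + p.z*p.z) - r; };
}

// Higher-order: union takes two shapes and returns a new shape.
SDF union_of(SDF a, SDF b) {
  return [a, b](Vec3 p) { return std::min(a(p), b(p)); };
}

// Higher-order: translate transforms the domain of a shape.
SDF translate(SDF s, Vec3 t) {
  return [s, t](Vec3 p) { return s(Vec3{p.x - t.x, p.y - t.y, p.z - t.z}); };
}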

Finally, with LLMs in the picture, as cool as Blender MCP is, it would be nice if the LLM could just spit out a textual scene format. I tend to agree with Carmack here.

To that end, I started playing around with an idea for an SDF-based file format. As far as I can find, there isn’t really anything like this (which was quite surprising!). I’d be interested to know if something like this already exists.

The basic idea is that you can define shapes textually via simple primitives, transformations, and operations:

# Define sphere at origin, and rotated cube.
shape s = sphere(radius: 2);
shape b = box(width: 3, height: 3, depth: 3) |> rotate(x: 100, y: 200, z: 100);

# Translate and then combine via smooth union.
# As the last expression in the file, this is the output shape.
smooth_union(
  s |> translate(x: -1, y: 0, z: 0),
  b |> translate(x: 1, y: 0, z: 0),
  k: 1.0
);

This file defines the shape. I haven’t put a lot of thought into the syntax yet, but the shape type here is actually an SDF: we have primitives such as sphere and box as leaf nodes, and higher-order SDFs such as translate and smooth_union as transformations.
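
I haven’t shown how smooth_union is computed, but the usual choice is the polynomial smooth minimum popularized by Inigo Quilez, where k controls the size of the blending region. A minimal sketch, assuming that formulation:

#include <algorithm>

// Polynomial smooth minimum: blends two distances over a region of size k.
// Small k approaches a hard min (i.e. a plain union).
float smooth_union(float d1, float d2, float k) {
  float h = std::clamp(0.5f + 0.5f * (d2 - d1) / k, 0.0f, 1.0f);
  return d2 + (d1 - d2) * h - k * h * (1.0f - h);  // mix(d2, d1, h) - k*h*(1-h)
}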

With geometry defined this way, you can then render via a simple tool that understands the format. The tool does basic parsing and semantic analysis, producing a runtime data model that supports querying the field at any coordinate. It also supports rendering using basic raymarching (its only function right now).
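
Raymarching an SDF is simple sphere tracing: step along each ray by the distance the field reports at the current point, which by definition can never overshoot the surface. A minimal, self-contained sketch of the idea:

#include <functional>

struct Vec3 { float x, y, z; };
using SDF = std::function<float(Vec3)>;

// Sphere tracing: march along the ray, stepping by the reported distance.
bool raymarch(const SDF& f, Vec3 origin, Vec3 dir, Vec3& hit) {
  float t = 0.0f;
  for (int i = 0; i < 128; ++i) {  // max step count
    Vec3 p{origin.x + t*dir.x, origin.y + t*dir.y, origin.z + t*dir.z};
    float d = f(p);
    if (d < 1e-3f) { hit = p; return true; }  // close enough: surface hit
    t += d;                                   // safe step: d is a lower bound
    if (t > 100.0f) break;                    // ray escaped the scene
  }
  return false;
}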

sdftool demo.sdf -o demo.png

This produces:

Example output of SDF tool defined earlier

Since we have an analytical shape, we can also extract normals via the tetrahedron technique.
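
The tetrahedron technique estimates the gradient from just four field samples, taken at the vertices of a tetrahedron around the point, rather than the six samples of central differences. A sketch, following Inigo Quilez’s formulation:

#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };
using SDF = std::function<float(Vec3)>;

// Estimate the surface normal at p from four field samples taken at
// the vertices of a tetrahedron around p.
Vec3 normal(const SDF& f, Vec3 p, float h = 1e-3f) {
  const Vec3 k[4] = {{1,-1,-1}, {-1,-1,1}, {-1,1,-1}, {1,1,1}};
  Vec3 n{0, 0, 0};
  for (const Vec3& e : k) {
    float d = f(Vec3{p.x + h*e.x, p.y + h*e.y, p.z + h*e.z});
    n.x += e.x*d; n.y += e.y*d; n.z += e.z*d;
  }
  float len = std::sqrt(n.x*n.x + n.y*n.y + n.z*n.z);
  return Vec3{n.x/len, n.y/len, n.z/len};
}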

sdftool demo.sdf -o normals.png --render-mode normal

Example of rendering normals

If we need a mesh, we can generate one (currently using marching cubes with basic QEM simplification):

sdftool demo.sdf -o mesh.obj --mesh

Example of generated mesh

With this language in place, I can quite easily generate non-trivial meshes purely with text, such as a chess pawn with smooth curves:

shape base = cylinder(radius: 1.4, height: 0.35) |> color(0.4, 0.35, 0.33);
shape base_rim = cylinder(radius: 1.2, height: 0.15) |> translate(y: 0.45);
shape stem = cylinder(radius: 0.36, height: 2.4) |> translate(y: 1.5);
shape mid_stem = cylinder(radius: 0.7, height: 1.4) |> translate(y: 0.5);
shape collar = cylinder(radius: 0.78, height: 0.01) |> translate(y: 2.7) |> expand(0.04);
shape head = sphere(radius: 0.7) |> translate(y: 3.5);

shape base_assembly = smooth_union(base, base_rim, k: 0.1);
shape mid = smooth_union(base_assembly, mid_stem, k: 0.4);
shape lower_pawn = smooth_union(mid, stem, k: 0.5);
shape pawn_collar = smooth_union(lower_pawn, collar, k: 0.1);
shape pawn = smooth_union(pawn_collar, head, k: 0.30);

translate(pawn, y: -2.5);

Chess pawn normals / Chess pawn mesh

Another benefit of this format is size: the raw text for this pawn is just 757 bytes, without any attempt at compression; it could easily be under 100 bytes with minimal effort. The mesh, by comparison, is a few hundred kilobytes.

Of course, there are downsides to SDFs: rendering them directly is still very expensive, and complex geometry may be extremely difficult to model this way.

There are some future directions I’d like to take this:

  • A runtime library could load the file and support an API for evaluating the field at points, or raymarching.
  • The library could also allow creating the data format in memory at runtime.
  • Build more scaffolding to allow LLMs to create geometry from a prompt.
  • Support approaches to texturing.
  • And of course, improved algorithms for mesh generation and rendering.

I’ll continue tinkering on this, and if it proves useful I’ll maybe open source it.


Posted: • 4 min read

I’ve been getting back into a bit of D programming recently and catching up on what’s been happening in the community. Some things have changed (Andrei isn’t around any more), but most things have stayed the same. Walter is still around and the language has a small and loyal following. DConf is still happening every year and people are still shipping things.

It is significantly smaller than both Rust and Go though. I think this is a real shame, since D fits a niche of languages that give you systems-level access with quite powerful features for expressiveness and a generally pleasant (and familiar) syntax. By comparison, I find Rust a bit too opinionated about how it should be written (yes, the borrow checker), and while Go is pleasant, it’s not very powerful when it comes to building scalable abstractions. D fits the gap quite nicely (while having problems of its own).

A perennial problem D seems to have is that it can’t quite decide what it wants to be, and as a result it tries to please everyone. As an example, D has always had a GC, but you get a subset of devs who really want to avoid the GC (for legitimate reasons). As a result, a long time ago, D added the @nogc attribute for functions, which statically ensures that a function cannot allocate GC memory. The idea is that you can wrap your latency-sensitive code (e.g. a game’s update loop) in @nogc and then rest assured that you will avoid GC pauses.

// Simple @nogc function - this compiles fine
@nogc int add(int a, int b) {
    return a + b;
}

// This won't compile - array literals allocate
@nogc int[] getNumbers() {
    return [1, 2, 3];
}

On the surface, @nogc is fine, but now common libraries have a problem: do they use the GC, avoid the GC, or offer both options? Whatever they choose, they are not going to win: either they take on the endless complexity of supporting both, or they make one camp unhappy.

I was not a fan when @nogc was originally proposed and I still think it was a mistake. I should mention that as a game developer, I do often want to avoid the GC!

I’d much prefer that D did not implement @nogc and instead:

  • Provide runtime tooling that helps you find GC allocations.
  • Try to avoid GC as much as possible in the standard library.
  • Continue to provide language features / utilities to avoid GC when needed.
  • Continue to invest in improving the GC pause times.

If you are a fan of @nogc, I’d further offer some other observations:

  1. If you care about minimising pauses in your application, the GC is only one source of them. You also have to avoid non-wait-free locks and concurrent data structures, blocking syscalls, and pauses from reference-counted deallocation cascades.
  2. Modern GC implementations can get pauses below 1 ms (see Go as an existence proof). This tech continues to improve, so it is a good bet.
  3. You can achieve 99% of the desired outcome via runtime tooling (e.g. logging on each GC allocation). I’ve done a lot of work in Unity3D to avoid GC allocations in C#. Yes, it takes some effort to track down those allocations and fix them, but the work is not that different from fixing compilation errors from @nogc, and you have the option of simply ignoring the collections that happen on rare paths you don’t care about.

All this being said, @nogc has shipped and I would never advocate for breaking existing code. However, there are active discussions going on about a “Phobos 3” and I would suggest that the approach is to go all-in and assume GC. Standard containers should just use GC unapologetically (but responsibly). There’s nothing stopping someone from creating an allocator-based container library and putting it on dub, but it doesn’t need to be in the standard library.

I continue to believe that D has its place in the set of useful and unique programming languages, and it really ought to have more usage. Complicating the language, compiler, and libraries to try to make everyone happy goes against the pragmatism embodied by D and stretches already thin contributor resources. Stay simple, choose a 90% solution over a 100% one, and focus on the quality of what’s already there.


Posted: • 3 min read

Often when building APIs, we’d like the caller to provide some information in a format or protocol defined by the library. For example, we might want access to a contiguous buffer specified by a (void*, size_t) pair.

void cool_feature(void* buffer, size_t len);

An API like this is fine if the buffer will be read synchronously, but sometimes we need asynchronous access. This introduces a lifetime problem: how does the caller know when it can free the buffer?

A common way to solve this is to provide a callback that will be invoked once the processing has completed:

void cool_feature(void* buffer, size_t len, void (*done)(void*));

Here, done will be called with buffer once cool_feature has asynchronously finished its work.

This works most of the time, but has a couple of problems:

  1. You need to remember to call done in all cases.
  2. It only works if the buffer pointer is all that done needs. What if my buffer is part of a larger object?
struct Object {
  // ...
  std::string buffer;
  // ...
};

void do_stuff(std::unique_ptr<Object> obj) {
  cool_feature(obj->buffer.data(), obj->buffer.size(), ???);
}

There’s no obvious choice of function to pass in. We want to free obj once buffer processing is done, but we’ll be given a pointer to the string data, not obj.

A possible solution is to pass an std::function<void()> as the callback, so that obj can be captured and destroyed as necessary. This works up to a point, but std::function can’t hold move-only types, and there’s no guarantee that a capturing std::function won’t allocate memory.
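
The move-only problem is easy to hit: std::function requires its callable to be copy-constructible, so a lambda that captures our unique_ptr by move won’t even compile:

#include <functional>
#include <memory>

int main() {
  auto obj = std::make_unique<int>(42);
  // error: std::function requires a copyable callable, but this
  // lambda captures a move-only unique_ptr.
  std::function<void()> f = [p = std::move(obj)] { /* use *p */ };
}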

We could, of course, define another interface (maybe call it ContiguousBufferProvider), but this also requires an extra memory allocation.

A little trick we can do to handle most of the common cases is to pass in a type-erased context to the function.

using Context = std::unique_ptr<void, void(*)(void*)>;

void cool_feature(void* buffer, size_t len, Context context);

The context is essentially an extra piece of baggage that must be carried around until the buffer has been processed. Once processed, we just drop the context and it cleans up after itself.

We can also benefit from an extra function that converts any object into a Context through type erasure:

template <typename T>
void deleter(void* p) {
  delete static_cast<T*>(p);
}

template <typename T>
Context erase_type(std::unique_ptr<T> p) {
  return Context(static_cast<void*>(p.release()), &deleter<T>);
}

With this, the Object example is easy to solve:

void do_stuff(std::unique_ptr<Object> obj) {
  // Read the pointer and size before moving obj: the evaluation
  // order of function arguments is unspecified.
  void* data = obj->buffer.data();
  size_t size = obj->buffer.size();
  cool_feature(data, size, erase_type(std::move(obj)));
}

This guarantees no extra memory allocations, and cool_feature just needs to drop the context once finished.
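
For completeness, here is a sketch of what the callee side might look like (the pending-work queue here is hypothetical): cool_feature just stores the context alongside its pending work, and erasing the entry frees the buffer’s owner.

#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Context = std::unique_ptr<void, void(*)(void*)>;

// Hypothetical record of in-flight work: the context rides along with
// the buffer until asynchronous processing completes.
struct PendingWork {
  void* buffer;
  std::size_t len;
  Context context;  // destroying this cleans up the caller's object
};

static std::vector<PendingWork> pending;  // assumed drained elsewhere

void cool_feature(void* buffer, std::size_t len, Context context) {
  pending.push_back(PendingWork{buffer, len, std::move(context)});
  // ... when processing finishes, erase the entry; the Context's
  // deleter then runs automatically, freeing the buffer's owner.
}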

There are probably more general ways to solve this, but I like this approach as it is clean, simple, and solves all the problems I’ve encountered so far.


Posted: • 5 min read

Last night I rewatched Joe Armstrong’s excellent keynote from this year’s Strange Loop conference. He begins the talk with a fairly standard but humorous lament about the sorry state of software, then goes on to relate this to some physical quantities and limits. He finishes by suggesting some possible directions we can look to for improving things.

One of his suggestions is to build something called “The Condenser”, which is a program that takes all programs in the world, and condenses them down in such a way that all redundancy is removed. He breaks this task down into two parts:

Diagram showing The Condenser concept with two parts: Part 1 removes all duplicate files, Part 2 merges similar files using least compression difference

He also explains the obvious and easy way to do Part 1.

Slide showing Part 1 implementation: Calculate SHA1 of all files and remove files with duplicate SHA1 signatures

At this point I remembered that I hadn’t written a lot of D recently, and although Joe is talking about condensing all files in the world, I was kind of curious how much duplication there was just on my laptop. How many precious bytes am I wasting?

It turns out that The Condenser Part 1 is very easy to write in D, and even quite elegant.

import std.algorithm;
import std.digest.sha;
import std.file;
import std.parallelism;
import std.stdio;
import std.typecons;

auto process(DirEntry e) {
  ubyte[4 << 10] buf = void;  // 4 KiB buffer, uninitialized
  auto sha1 = File(e.name).byChunk(buf[]).digest!SHA1;
  return tuple(e.name, sha1, e.size);
}

void main(string[] args) {
  string[ubyte[20]] hash2file;  // hash table of SHA1 -> filename
  auto files = dirEntries(args[1], SpanMode.depth, false)
               .filter!(e => e.isFile);
  foreach (t; taskPool.map!(process)(files)) {
    if (string* original = t[1] in hash2file) {
      writefln("%s %s duplicates %s", t[2], t[0], *original);
    } else {
      hash2file[t[1]] = t[0];
    }
  }
}

I’ll explain the code a little, starting from main. The files variable uses dirEntries and filter to produce a list of files (the filter is necessary since dirEntries also returns directories). dirEntries is a lazy range, so the directories are only iterated as necessary. Similarly, filter creates another lazy range, which removes non-file entries.

The foreach loop corresponds to Joe’s do loop. While I could have just iterated over files directly, I instead iterate over a taskPool.map, a semi-lazy map from David Simcha’s std.parallelism that does the processing in parallel on multiple threads. It is semi-lazy because it eagerly takes a chunk of elements at a time from the front of the range, processes them in parallel, then returns them back to the caller to consume. The task pool size defaults to the number of CPU cores on your machine, but this can be configured.

The process function is where the hashing is done. It takes a DirEntry, allocates an uninitialized (= void) 4 KiB buffer on the stack, and uses that buffer to read the file 4 KiB at a time while building up the SHA1. Again, byChunk returns a lazy range, and digest!SHA1 consumes it. Lazy ranges are very common in idiomatic D, especially in these kinds of stream-processing applications. It’s worth getting familiar with the common ranges in std.algorithm and std.range.

I ran the program over my laptop’s home directory. Since the duplicate file sizes are all in the first column, I can just use awk to sum them up and find the total waste:

$ sudo ./the-condenser . | awk '{s+=$1} END {print s}'
2347353793

That’s 2.18 GiB of duplicate files, which is around 10% of my home dir.

In some cases the duplicates are just files that I have absentmindedly downloaded twice. In other cases it’s duplicate music that I have both in iTunes and Google Play. For some reason, the game Left 4 Dead 2 seems to have a lot of duplicate audio files.

So, in theory, I could save 10% by deduplicating all these files, but it’s not a massive amount of waste. It doesn’t seem worth the effort trying to fix this. Perhaps it would be nice for the OS or file system to solve this automatically, but the implementation would be tricky, and still maybe not worth it.

Part 2 of the Condenser is to merge all similar files using least compression difference. This is decidedly more difficult, since it relies on near-perfect compression, which is apparently an AI-hard problem. Unfortunately, solving AI is out of scope for this post, so I’ll leave that for another time :-)


Posted: • 2 min read

If you do any sort of graphics, physics, or 3D programming in general then at some point you are going to be working with cubes, and at some point you are going to need to assign a number to each vertex in those cubes. Usually this is because you need to store them in an array, or maybe you are labelling the children of an octree, but often you just need some consistent way of identifying vertices.

For whatever reason, a surprisingly common way of numbering vertices is like this, with the bottom vertices numbered in counter-clockwise order followed by the top vertices in counter-clockwise order (warning, incoming ASCII art):

        7-------6
       /|      /|
      4-+-----5 | 
      | |     | |   y
      | 3-----+-2   | z
      |/      |/    |/
      0-------1     +--x

Depending on what you are doing, there are generally only a few applications of the numbering system:

  • Converting a number to a coordinate.
  • Converting a coordinate to a number.
  • Enumerating adjacent vertex numbers.

Unfortunately, this common numbering scheme makes none of these tasks easy, and you have no choice but to produce tables of coordinates and adjacency lists for each vertex number. This is tedious, error-prone, and completely unnecessary.

Here is a better scheme:

        3-------7
       /|      /|
      2-+-----6 | 
      | |     | |   y
      | 1-----+-5   | z
      |/      |/    |/
      0-------4     +--x

This numbering uses the Lexicographic Ordering of the vertices, with 0 being the lexicographically first vertex, and 7 being the last.

Number (decimal)   Number (binary)   Coordinate
0                  000               0, 0, 0
1                  001               0, 0, 1
2                  010               0, 1, 0
3                  011               0, 1, 1
4                  100               1, 0, 0
5                  101               1, 0, 1
6                  110               1, 1, 0
7                  111               1, 1, 1

I’ve included the binary representation of the vertex number in the table above because it illuminates the key property of this scheme: the coordinate of a vertex is equal to the binary representation of its vertex number.

\[ \begin{aligned} coordinate(n) &= ((n \gg 2) \mathrel{\&} 1, (n \gg 1) \mathrel{\&} 1, (n \gg 0) \mathrel{\&} 1) \\ number(x, y, z) &= (x \ll 2) \mathrel{|} (y \ll 1) \mathrel{|} (z \ll 0) \end{aligned} \]

(Of course, the zero shifts are unnecessary, but I’ve added them to highlight the symmetry.)

This numbering scheme also makes adjacent vertex enumeration easy. An adjacent vertex is one that differs along exactly one axis, so we just need to flip the corresponding bit using XOR to get each of the three adjacent vertices:

\[ \begin{aligned} adj_x(n) &= n \oplus 100_2 \\ adj_y(n) &= n \oplus 010_2 \\ adj_z(n) &= n \oplus 001_2 \end{aligned} \]
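
In code, all three operations reduce to a line or two of bit twiddling. A small C++ sketch:

#include <cstdio>

// Vertex number -> coordinate: each coordinate is one bit of n.
void coordinate(int n, int& x, int& y, int& z) {
  x = (n >> 2) & 1;
  y = (n >> 1) & 1;
  z = (n >> 0) & 1;
}

// Coordinate -> vertex number.
int number(int x, int y, int z) {
  return (x << 2) | (y << 1) | (z << 0);
}

int main() {
  // Adjacent vertices: flip one bit per axis with XOR.
  for (int n = 0; n < 8; ++n)
    std::printf("%d: adjacent = %d, %d, %d\n", n, n ^ 4, n ^ 2, n ^ 1);
}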

It should be clear that this numbering scheme trivially extends to higher-dimensional hypercubes by simply using more bits in the representation and extending the formulae in the obvious way.

So there you have it. Whenever you are looking to number things and aren’t sure what order to use, start with lexicographic. It usually has nice encoding and enumeration properties, and if not then it is probably no worse than any other ordering.