Ligatures in programming fonts:
hell no

Lig­a­tures in pro­gram­ming fonts—a mis­guided trend I was hop­ing would col­lapse un­der its own il­logic. But it per­sists. Let me save you some time—

Lig­a­tures in pro­gram­ming fonts are a ter­ri­ble idea.

And not be­cause I’m a purist or a grump. (Some days, but not to­day.) Pro­gram­ming code has spe­cial se­man­tic con­sid­er­a­tions. Lig­a­tures in pro­gram­ming fonts are likely to ei­ther mis­rep­re­sent the mean­ing of the code, or cause mis­cues among read­ers. So in the end, even if they’re cute, the risk of er­ror isn’t worth it.

First, what are lig­a­tures? Lig­a­tures are spe­cial char­ac­ters in a font that com­bine two (or more) trou­ble­some char­ac­ters into one. For in­stance, in ser­ifed text faces, the low­er­case f of­ten col­lides with the low­er­case i and l. To fix this, the fi and fl are of­ten com­bined into a sin­gle shape (what pros would call a glyph).

fi fj fl ffi gg gyok
fi fj fl ffi gg gywrong
fi fj fl ffi gg gyright

In this type de­signer’s opin­ion, a good lig­a­ture doesn’t draw at­ten­tion to it­self: it sim­ply re­solves what­ever col­li­sion would’ve hap­pened. Ide­ally, you don’t even no­tice it’s there. Con­versely, this is why I loathe the Th lig­a­ture that is the de­fault in many Adobe fonts: it re­solves noth­ing, and al­ways draws at­ten­tion to itself.

Lig­a­tures in pro­gram­ming fonts fol­low a sim­i­lar idea. But in­stead of fix­ing the odd trou­ble­some com­bi­na­tion, well-in­ten­tioned am­a­teur lig­a­tur­ists are adding dozens of new & strange lig­a­tures. For in­stance, these come from Fira Code, a heav­ily lig­a­tured spin­off of the open-source Fira Mono.

So what’s the prob­lem with pro­gram­ming ligatures?

  1. They con­tra­dict Uni­code. Uni­code is a stan­dard­ized sys­tem—used by all mod­ern fonts—that iden­ti­fies each char­ac­ter uniquely. This way, soft­ware pro­grams don’t have to worry that things like the Greek let­ter Δ (= up­per­case Delta) might be stashed in some spe­cial place in the font. In­stead, Uni­code des­ig­nates a unique name and num­ber for each char­ac­ter, called a code point. If you have a Δ in your font, you as­so­ciate it with its des­ig­nated Uni­code code point, which is 0x0394. In ad­di­tion to al­pha­betic char­ac­ters, Uni­code as­signs code points to thou­sands of sym­bols (in­clud­ing emoji).

    The prob­lem? Many of the pro­gram­ming lig­a­tures shown above are eas­ily con­fused with ex­ist­ing Uni­code sym­bols. Sup­pose you’re look­ing at a code frag­ment that uses Uni­code char­ac­ters and see the sym­bol . Are you look­ing at a != lig­a­ture that’s shaped like ? Or the ac­tual Uni­code char­ac­ter 0x2260, which also looks like ? The lig­a­ture in­tro­duces an am­bi­gu­ity that wasn’t there before.

  2. They’re guar­an­teed to be wrong some­times. There are a lot of ways for a given se­quence of char­ac­ters, like !=, to end up in a source file. De­pend­ing on con­text, it doesn’t al­ways mean the same thing.

    The prob­lem is that lig­a­ture sub­sti­tu­tion is “dumb” in the sense that it only con­sid­ers whether cer­tain char­ac­ters ap­pear in a cer­tain or­der. It’s not aware of the se­man­tic con­text. There­fore, any global lig­a­ture sub­sti­tu­tion is guar­an­teed to be se­man­ti­cally wrong part of the time.

When we’re us­ing a ser­ifed text font in or­di­nary body text, we don’t have the same con­sid­er­a­tions. An fi lig­a­ture al­ways means f fol­lowed by i. In that case, lig­a­ture sub­sti­tu­tion that ig­nores con­text doesn’t change the meaning.

Still, some ty­po­graphic trans­for­ma­tions in body text can be se­man­ti­cally wrong. For in­stance, foot and inch marks are of­ten typed with the same char­ac­ters as quo­ta­tion marks. (See straight and curly quotes.) But whereas quo­ta­tion marks want to be curly, foot and inch marks want to be straight (or slanted slightly to the up­per right). So if we ap­ply au­to­matic smart (aka curly) quotes, we have to be care­ful not to cap­ture foot and inch marks in the transformation.

Does that mean pro­gram­mers can never have nice things? It’s to­tally fine to re­design in­di­vid­ual char­ac­ters to dis­tin­guish them from oth­ers. For in­stance, in Trip­li­cate, I in­clude a spe­cial “Code” vari­ant that in­cludes re­designed ver­sions of cer­tain char­ac­ters that are eas­ily confused.

`$te_fl{1234*567~890} Reg­u­lar
`$te_fl{1234*567~890} Code

But in this case, the point is dis­am­bigua­tion: we don’t want the low­er­case l to look like the digit 1, nor the zero to look like a cap O. Whereas lig­a­tures are go­ing the op­po­site di­rec­tion: mak­ing dis­tinct char­ac­ters ap­pear to be others.

Bot­tom line: this isn’t a mat­ter of taste. In pro­gram­ming code, every char­ac­ter in the file has a spe­cial se­man­tic role to play. There­fore, any kind of “pret­ti­fy­ing” that makes one char­ac­ter look like an­other—in­clud­ing lig­a­tures—leads to a swamp of de­spair. If you don’t be­lieve me, try it for 10 or 15 years.

—Matthew But­t­er­ick
29 March 2019

by the way
  • Yes, I do a lot of programming.

  • “What do you mean, it’s not a mat­ter of taste? I like us­ing lig­a­tures when I code.” Great! In so many ways, I don’t care what you do in pri­vate. Al­though I pre­dict you will even­tu­ally burn your­self on this hot mess, my main con­cern is ty­pog­ra­phy that faces other hu­man be­ings. So if you’re prepar­ing your code for oth­ers to read—whether on screen or on pa­per—skip the lig­a­tures. Not least be­cause you won’t even know when they go wrong. See trade­mark and copy­right sym­bols for a re­lated cau­tion­ary tale.

  • One in­spi­ra­tion for this piece was the La­TeX crowd, who would rou­tinely write me to in­sist their ty­pog­ra­phy was in­fal­li­ble. And yet. I kept see­ing La­TeX-pre­pared books that in­cor­rectly sub­sti­tuted curly quotes for back­ticks. For in­stance, the ex­am­ple be­low is from Kent Dy­b­vig, The Scheme Pro­gram­ming Lan­guage, 4th ed. In this chunk of Scheme code, the open­ing-quote marks are sup­posed to be back­ticks; the clos­ing-quote mark is sup­posed to be a sin­gle straight quote:

    “But code sam­ples like these aren’t really am­bigu­ous, be­cause every­one knows that you don’t type the curly quotes.” A sloppy ar­gu­ment, though it may be true for lan­guages that only ac­cept ASCII in­put. But many of to­day’s pro­gram­ming lan­guages (e.g., Racket) ac­cept UTF-8 in­put. In that case, curly quotes can le­git­i­mately be part of the in­put stream. So am­bi­gu­ity is a real pos­si­bil­ity. Same prob­lem with ligatures.

  • The other in­spi­ra­tion for this piece were the peo­ple who re­peat­edly asked me when Trip­li­cate would get lig­a­tures, Pow­er­line char­ac­ters, and so on. An­swer, as nicely as pos­si­ble: never.

undock move Heliotrope Equity Valkyrie Century Supra Concourse Triplicate buy font close