Roland's homepage

My random knot in the Web

Extracting glyphs from an OpenType file

To create a compact representation of a monogram, I wanted to get the raw curve data from the Latin Modern Roman font rather than the rendered glyphs.

This article documents how I did that.

Converting the font to readable form

This was done in several steps.

First analyze the font by dumping the internal table names:

otfinfo -t lmroman10-regular.otf|grep CFF
61140 CFF

The presence of a CFF table indicates that the font contains PostScript-flavored outlines, which can be extracted as a PostScript Type 1 font.

Note

The procedure outlined in this article only works for PostScript fonts! OpenType fonts can also contain TrueType outlines, to which it does not apply.
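The same check can be made without otfinfo by reading the table directory at the start of the font file. The following is only a minimal sketch: the function names sfnt_tables and has_cff_outlines are made up for this illustration, and it assumes a single-font file (not a font collection) laid out as the OpenType specification describes.

```python
import struct


def sfnt_tables(data):
    """Return (version, {tag: (offset, length)}) from an sfnt table directory."""
    # Header: 4-byte version tag, numTables, then three uint16 search helpers.
    version, num_tables = struct.unpack(">4sH", data[:6])
    tables = {}
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        start = 12 + 16 * i
        tag, _checksum, offset, length = struct.unpack(
            ">4sLLL", data[start:start + 16]
        )
        tables[tag.decode("latin-1")] = (offset, length)
    return version, tables


def has_cff_outlines(data):
    """True if the font carries a CFF table (note the trailing space in the tag)."""
    version, tables = sfnt_tables(data)
    return version == b"OTTO" or "CFF " in tables


# Usage (assumes the font file is in the current directory):
# with open("lmroman10-regular.otf", "rb") as f:
#     print(has_cff_outlines(f.read()))
```

If this returns False, the font most likely has TrueType outlines and the rest of this article does not apply.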

Extract the internal PostScript Type 1 font:

otftotfm lmroman10-regular.otf

This yields the PostScript font binary LMRoman10-Regular.pfb. The generated tfm and enc files can be deleted.

For the final step we download detype1.c and compile it. Then we run it on the font binary to produce the font as readable PostScript code:

cc -o detype1 detype1.c
./detype1 LMRoman10-Regular.pfb > LMRoman10-Regular.ps

Extracting the glyphs and converting to regular PostScript

I’m interested in the letters R, F and S. We can look them up in the PostScript code for the font. First, R:

/R ## -| { 0 736 hsbw -22 22 hstem 652 31 hstem 35 103 vstem 138 86 vstem
224 103 vstem 732 88 rmoveto 6 0 11 -13 vhcurveto -11 0 0 -9 -1 -7 rrcurveto
-6 -71 -35 -18 -25 0 rrcurveto -49 0 -8 51 -14 93 rrcurveto -13 80 rlineto
-18 64 -49 33 -55 19 rrcurveto 97 24 78 61 0 78 rrcurveto 96 -114 84 -147 vhcurveto
-314 hlineto -31 vlineto 24 hlineto 77 2 -11 -36 hvcurveto -527 vlineto
93 4 callsubr 581 callsubr 36 3 71 0 39 0 rrcurveto 39 0 71 0 36 -3 rrcurveto
31 vlineto -24 hlineto -77 -2 11 36 hvcurveto 253 vlineto 115 hlineto 16 0 42 0 35 -34 rrcurveto
38 -36 0 -31 0 -67 rrcurveto 0 -65 0 -40 41 -38 rrcurveto 94 4 callsubr
643 callsubr closepath endchar } |-

Before the glyphs there is an array of subroutines called Subrs in the font file. These are called from the glyph definitions by callsubr. Subroutine 4:

dup 4 ## -| { 1 3 callother pop callsubr return } |

We can ignore callother. So N 4 callsubr for all intents and purposes can be interpreted as N callsubr.

Subroutines 93 and 94:

   dup 93 ## -| { 0 31 hstem 331 22 hstem 35 103 vstem 138 86 vstem 224 103 vstem
return } |
   dup 94 ## -| { -22 22 hstem 331 22 hstem 652 31 hstem 35 103 vstem 138 86 vstem
224 103 vstem return } |

We can ignore these settings of the stems as well. They are important for hinting, not for the contours. All the N 4 callsubr calls that I’ve found concern hinting. So these can generally be ignored.

Subroutine 581, together with subroutines 574 and 603 that it calls in turn:

dup 581 ## -| { 574 callsubr -24 hlineto -31 vlineto return } |
dup 574 ## -| { 603 callsubr return } |
dup 603 ## -| { -36 -2 -11 -77 vhcurveto return } |

Subroutine 643:

dup 643 ## -| { 41 -36 55 -6 30 0 rrcurveto 78 17 82 28 hvcurveto closepath
-225 415 rmoveto -69 -24 -81 -148 vhcurveto -111 hlineto 259 vlineto
0 23 0 12 22 3 rrcurveto 10 2 29 0 20 0 rrcurveto
90 112 -4 -145 hvcurveto return } |

Replacing all the subroutines and ignoring the stem info, we get for R:

0 736 hsbw 732 88 rmoveto 6 0 11 -13 vhcurveto -11 0 0 -9 -1 -7 rrcurveto
-6 -71 -35 -18 -25 0 rrcurveto -49 0 -8 51 -14 93 rrcurveto
-13 80 rlineto -18 64 -49 33 -55 19 rrcurveto 97 24 78 61 0 78 rrcurveto
96 -114 84 -147 vhcurveto -314 hlineto -31 vlineto 24 hlineto
77 2 -11 -36 hvcurveto -527 vlineto -36 -2 -11 -77 vhcurveto -24 hlineto
-31 vlineto 36 3 71 0 39 0 rrcurveto 39 0 71 0 36 -3 rrcurveto 31 vlineto
-24 hlineto -77 -2 11 36 hvcurveto 253 vlineto 115 hlineto
16 0 42 0 35 -34 rrcurveto 38 -36 0 -31 0 -67 rrcurveto
0 -65 0 -40 41 -38 rrcurveto 41 -36 55 -6 30 0 rrcurveto
78 17 82 28 hvcurveto closepath
-225 415 rmoveto -69 -24 -81 -148 vhcurveto -111 hlineto 259 vlineto
0 23 0 12 22 3 rrcurveto 10 2 29 0 20 0 rrcurveto 90 112 -4 -145 hvcurveto
closepath fill
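The substitution shown above can also be automated. What follows is a rough sketch, not the method used for this article: the function expand and the subrs dictionary are hypothetical, the subroutine bodies have to be copied out of the detype1 output by hand, and it assumes that no operands are left on the stack across a call and that subroutine 4 is the hint-replacement wrapper described earlier.

```python
HINT_OPS = {"hstem", "vstem"}   # stem hints; irrelevant for the contours
HINT_REPLACEMENT_SUBR = 4       # "N 4 callsubr" is treated as "N callsubr"


def is_number(token):
    """True for integer operands such as 36 or -77."""
    return token.lstrip("-").isdigit()


def expand(tokens, subrs):
    """Inline all callsubr calls and drop the stem hints.

    tokens: list of charstring tokens (operands and operators).
    subrs:  dict mapping subroutine number to its list of tokens.
    """
    out, args = [], []
    for tok in tokens:
        if is_number(tok):
            args.append(tok)
        elif tok == "callsubr":
            index = int(args.pop())
            if index == HINT_REPLACEMENT_SUBR:
                # The real target is the operand pushed before the 4.
                index = int(args.pop())
            out.extend(expand(subrs[index], subrs))
        elif tok == "return":
            break
        elif tok in HINT_OPS:
            args.clear()        # discard the operands of the hint
        else:
            out.extend(args + [tok])
            args = []
    return out


subrs = {
    93: "0 31 hstem 331 22 hstem 35 103 vstem return".split(),
    581: "574 callsubr -24 hlineto -31 vlineto return".split(),
    574: "603 callsubr return".split(),
    603: "-36 -2 -11 -77 vhcurveto return".split(),
}
fragment = "-527 vlineto 93 4 callsubr 581 callsubr 36 3 71 0 39 0 rrcurveto"
print(" ".join(expand(fragment.split(), subrs)))
```

For this fragment of R, the result agrees with the hand-expanded listing above: the call to subroutine 93 disappears and the call to 581 becomes the vhcurveto, hlineto and vlineto commands.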

Similar expansions can be done for F:

0 653 hsbw 610 455 rmoveto -28 225 rlineto -549 hlineto -31 vlineto
24 hlineto 77 2 -11 -36 hvcurveto -524 vlineto -36 -2 -11 -77 vhcurveto
-24 hlineto -31 vlineto 35 3 78 0 39 0 rrcurveto 41 0 91 0 36 -3 rrcurveto
31 vlineto -33 hlineto -95 0 13 35 hvcurveto 246 vlineto 86 hlineto
96 10 -32 -85 hvcurveto 25 hlineto 265 vlineto -25 hlineto
-84 -10 -33 -96 vhcurveto -86 hlineto 253 vlineto 33 2 7 47 vhcurveto
120 hlineto 150 0 25 -56 16 -138 rrcurveto
closepath

And finally S:

0 556 hsbw 499 186 rmoveto 0 100 -66 82 -84 20 rrcurveto -128 31 rlineto
-62 15 -39 54 0 58 rrcurveto 70 54 61 78 vhcurveto 167 0 22 -164 6 -45 rrcurveto
1 -6 0 -6 11 0 rrcurveto 13 0 5 19 hvcurveto 201 vlineto 17 0 7 -11 vhcurveto
-7 0 -1 -1 -7 -12 rrcurveto -35 -57 rlineto -30 29 -41 41 -89 0 rrcurveto
-111 -84 -88 -106 hvcurveto 0 -83 53 -73 78 -27 rrcurveto
11 -4 51 -12 70 -17 rrcurveto 27 -7 30 -7 28 -37 rrcurveto
21 -26 10 -33 0 -33 rrcurveto -71 -50 -72 -84 vhcurveto
-29 0 -76 5 -53 49 rrcurveto -58 54 -3 64 -1 36 rrcurveto
-1 10 -8 0 -3 0 rrcurveto -13 0 -7 -18 hvcurveto
-200 vlineto -17 0 -7 11 vhcurveto 7 0 1 2 7 11 rrcurveto
0 0 3 4 33 53 rrcurveto 31 -34 64 -36 89 0 rrcurveto 117 80 98 110 hvcurveto
closepath

To see what these commands mean, we need to look in the Type 1 font format documentation (PDF).

  • sbx wx hsbw sets the sidebearing point (first parameter) and the character width (second parameter). Sets the current point to (sbx, 0).
  • dx dy rmoveto relative moveto, like in PostScript.
  • dx hmoveto equivalent to dx 0 rmoveto.
  • dy vmoveto equivalent to 0 dy rmoveto.
  • dx dy rlineto relative lineto, like in PostScript.
  • dx hlineto equivalent to dx 0 rlineto.
  • dy vlineto equivalent to 0 dy rlineto.
  • dy1 dx2 dy2 dx3 vhcurveto equivalent to 0 dy1 dx2 dy2 dx3 0 rrcurveto.
  • dx1 dx2 dy2 dy3 hvcurveto equivalent to dx1 0 dx2 dy2 0 dy3 rrcurveto.
  • dx1 dy1 dx2 dy2 dx3 dy3 rrcurveto relative rcurveto. Equivalent to dx1 dy1 (dx1+dx2) (dy1+dy2) (dx1+dx2+dx3) (dy1+dy2+dy3) rcurveto in PostScript.

I wrote a Python program to translate these calls to standard absolute PostScript drawing commands. The commands to translate are in the txt variable, one command per line. Lines starting with % are treated as comments.

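# Skip blank lines and comment lines; blank lines would otherwise raise an IndexError below.
cmds = [ln.split() for ln in txt.splitlines() if ln.strip() and not ln.startswith("%")]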
icmds = [[int(j) for j in cmd[:-1]] + [cmd[-1]] for cmd in cmds]

cpx, cpy = 0, 0
print("%!PS-Adobe-3.0")
print("0 23 translate")
for n in range(len(cmds)):
   var = icmds[n][-1]
   if var in ("hmoveto", "hlineto"):
      cpx += icmds[n][0]
      cpy += 0
      icmds[n] = [cpx, cpy, var[1:]]
   elif var in ("vmoveto", "vlineto"):
      cpx += 0
      cpy += icmds[n][0]
      icmds[n] = [cpx, cpy, var[1:]]
   elif var in ("rmoveto", "rlineto"):
      cpx += icmds[n][0]
      cpy += icmds[n][1]
      icmds[n] = [cpx, cpy, var[1:]]
   elif var == "vhcurveto":
      dy1, dx2, dy2, dx3 = icmds[n][:-1]
      dx1, dy3 = 0, 0
      icmds[n] = [
            cpx+dx1, cpy+dy1,
            cpx+(dx1+dx2), cpy+(dy1+dy2),
            cpx+(dx1+dx2+dx3), cpy+(dy1+dy2+dy3),
            "curveto"
      ]
      cpx, cpy = icmds[n][4], icmds[n][5]
   elif var == "hvcurveto":
      dx1, dx2, dy2, dy3 = icmds[n][:-1]
      dy1, dx3 = 0, 0
      icmds[n] = [
            cpx+dx1, cpy+dy1,
            cpx+(dx1+dx2), cpy+(dy1+dy2),
            cpx+(dx1+dx2+dx3), cpy+(dy1+dy2+dy3),
            "curveto"
      ]
      cpx, cpy = icmds[n][4], icmds[n][5]
   elif var == "rrcurveto":
      dx1, dy1, dx2, dy2, dx3, dy3 = icmds[n][:-1]
      icmds[n] = [
            cpx+dx1, cpy+dy1,
            cpx+(dx1+dx2), cpy+(dy1+dy2),
            cpx+(dx1+dx2+dx3), cpy+(dy1+dy2+dy3),
            "curveto"
      ]
      cpx, cpy = icmds[n][4], icmds[n][5]
   elif var == "hsbw":
      sbx = icmds[n][0]
      icmds[n] = [sbx, 0, "moveto"]
      cpx, cpy = sbx, 0
   elif var == "closepath":
      if n == len(cmds)-1:
         icmds[n] = ["closepath", "fill", "showpage"]
      else:
         icmds[n] = ["closepath"]
   icmds[n] = [str(j) for j in icmds[n]]
   print(" ".join(icmds[n]))
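The program expects one command per line in txt, whereas the expanded listings earlier in this article have several commands per line. A small helper can do the reflowing. This is a sketch only: the name one_command_per_line is made up here, and it assumes the input contains nothing but integer operands and operator names (no % comments).

```python
def one_command_per_line(text):
    """Reflow a charstring so that every operator ends its own line."""
    lines, current = [], []
    for tok in text.split():
        current.append(tok)
        # Anything that is not an integer operand is taken to be an operator.
        if not tok.lstrip("-").isdigit():
            lines.append(" ".join(current))
            current = []
    if current:
        # Trailing operands without an operator are kept as-is.
        lines.append(" ".join(current))
    return "\n".join(lines)


txt = one_command_per_line(
    "0 653 hsbw 610 455 rmoveto -28 225 rlineto -549 hlineto\nclosepath"
)
print(txt)
```

The returned string can be assigned to txt before running the translation program.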

The generated PostScript only needs to have the BoundingBox comment added. Both the R and the S descend slightly below the baseline, so I had to insert 0 23 translate as the first command in both of them to make the whole character visible.
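One possible way to obtain the numbers for the BoundingBox comment is to take the extremes of all points in the generated commands. This is an assumption about how it could be done, not a description of how the files for this article were made. The names bounding_box and bounding_box_comment are hypothetical, and the offset parameter stands in for the translate command. Since a Bézier curve lies within the convex hull of its control points, the box found this way can be slightly larger than the glyph but never smaller.

```python
import math


def bounding_box(ps_lines, offset=(0, 0)):
    """Return (llx, lly, urx, ury) covering all points in the given lines.

    ps_lines: absolute moveto/lineto/curveto commands, one per line.
    offset:   the (x, y) of a preceding "x y translate", if any.
    """
    xs, ys = [], []
    for line in ps_lines:
        parts = line.split()
        if not parts or parts[-1] not in ("moveto", "lineto", "curveto"):
            continue  # closepath, fill, translate and comments carry no points
        coords = [float(p) for p in parts[:-1]]
        xs.extend(coords[0::2])
        ys.extend(coords[1::2])
    dx, dy = offset
    # Round outward so that the box never clips the outline.
    return (math.floor(min(xs) + dx), math.floor(min(ys) + dy),
            math.ceil(max(xs) + dx), math.ceil(max(ys) + dy))


def bounding_box_comment(ps_lines, offset=(0, 0)):
    """Format the box as a DSC comment line."""
    return "%%BoundingBox: {} {} {} {}".format(*bounding_box(ps_lines, offset))


sample = [
    "0 0 moveto",
    "610 455 moveto",
    "582 680 lineto",
    "33 680 lineto",
    "closepath fill showpage",
]
print(bounding_box_comment(sample))
```

The comment line belongs in the header of the file, directly after the %!PS-Adobe-3.0 line.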

The resulting output, when written to a file, can be viewed with a PostScript viewer such as gv.

For inclusion in this webpage they were rendered as PNG files as follows:

gs -q -dDEVICEWIDTHPOINTS=750 -dDEVICEHEIGHTPOINTS=800 \
   -sDEVICE=pngalpha -o R.png R.eps

The glyphs are defined on a grid of 1000x1000. But since these characters are not that large, I reduced the device width and height.

This is what the rendered characters look like:

[Images: Latin Modern Roman R, Latin Modern Roman F, Latin Modern Roman S]

For comments, please send me an e-mail.

