[pve-devel] [RFC PATCH 3/4] add genfont2 and unifont build-dependency

Tue May 23 14:50:40 CEST 2017

On Mon, May 22, 2017 at 06:14:57PM +0200, Dominik Csapak wrote:
> this patch introduces a new tool to generate font data
> it uses a new build-dependency, namely unifont, which is an opensource
> font with printable charachters for the unicode range from
> 0x0000 to 0xFFFF
> 
> we generate a file which is easily mmap'able and has the same bitformat
> we used before (minus the codepoint -> address translation)
> 
> note that some characters are using 2 columns as width, so to have a
> static lookup, we need to make each character 32 byte long
> (1 byte per line, 16 lines per terminal column)
> 
> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
> ---
>  Makefile       |   9 +++-
>  debian/control |   1 +
>  genfont2.c     | 131 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 140 insertions(+), 1 deletion(-)
>  create mode 100644 genfont2.c
> 
> diff --git a/Makefile b/Makefile
> index 792b5bd..e981dbb 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -24,6 +24,13 @@ glyphs.h: genfont
>  genfont: genfont.c
>  	gcc -g -O2 -o $@ genfont.c -Wall -D_GNU_SOURCE -lz
>  
> +font.data: genfont2
> +	./genfont2 > font.data.tmp
> +	mv font.data.tmp font.data
> +
> +genfont2: genfont2.c
> +	gcc -g -O2 -o $@ genfont2.c -Wall -D_GNU_SOURCE -lz
> +
>  .PHONY: vnc
>  ${VNCLIB} vnc: ${VNCSRC}
>  	rm -rf ${VNCDIR}
> @@ -70,7 +77,7 @@ upload: ${DEB}
>  
>  .PHONY: clean
>  clean:
> -	rm -rf vncterm vncterm.1 vncterm_*.deb genfont *~ ${VNCDIR} vncterm-*.tar.gz glyph.h.tmp build *.changes
> +	rm -rf vncterm vncterm.1 vncterm_*.deb genfont genfont2 *~ ${VNCDIR} vncterm-*.tar.gz glyph.h.tmp build *.changes font.data.tmp font.data
>  
>  .PHONY: distclean
>  distclean: clean
> diff --git a/debian/control b/debian/control
> index d5e9830..75627f1 100644
> --- a/debian/control
> +++ b/debian/control
> @@ -7,6 +7,7 @@ Build-Depends:
>      libpng-dev,
>      libglib2.0-dev,
>      libgnutls28-dev,
> +    unifont
>  Standards-Version: 3.9.3
>  
>  Package: vncterm
> diff --git a/genfont2.c b/genfont2.c
> new file mode 100644
> index 0000000..234aff2
> --- /dev/null
> +++ b/genfont2.c
> @@ -0,0 +1,131 @@
> +/*
> +
> +     Copyright (C) 2007 Proxmox Server Solutions GmbH
> +
> +     Copyright: vzdump is under GNU GPL, the GNU General Public License.
                   ^~~~~~
*ahem*, I smell copypasta ;-)

> +
> +     This program is free software; you can redistribute it and/or modify
> +     it under the terms of the GNU General Public License as published by
> +     the Free Software Foundation; version 2 dated June, 1991.
> +
> +     This program is distributed in the hope that it will be useful,
> +     but WITHOUT ANY WARRANTY; without even the implied warranty of
> +     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +     GNU General Public License for more details.
> +
> +     You should have received a copy of the GNU General Public License
> +     along with this program; if not, write to the Free Software
> +     Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
> +     02111-1307, USA.
> +
> +     Author: Dominik Csapak <d.csapak at proxmox.com>
> +
> +*/
> +
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +
> +#define FONTFILE "/usr/share/unifont/unifont.hex"

I'd find it nicer if this was argv[1]. While still specific to our
purpose, it's a nicer interface since all the external paths to files
from other packages are then confined to the Makefile.

> +#define NUMCODEPOINTS 0xFFFF
> +#define MAXLINESIZE 128
> +#define GLYPHLINES 16
> +#define CODEPOINTLENGTH 4

A bit confusing as it's the length of the code point index. 

> +//#define DEBUG
> +
> +void
> +printglyph(char glyph[])

Talking about glyphs and using the type name 'char' is always confusing,
especially since in this case it's referring to the binary data. I find
uint8_t would be nicer (below in main() as well).

> +{
> +    int i;
> +    for(i = 0; i < GLYPHLINES*2;i++) {
> +	printf("%c", glyph[i]);
> +    }

or just fwrite(glyph, GLYPHLINES, 2, stdout);

> +}
> +
> +/* parses strings like 00F0 to the integer */

-the

> +unsigned int
> +parsehex(char val[], short length)

I'm against using anything but `size_t` for lengths and sizes. The type
itself in this case can't represent the limit of how much of data from
val[] you can put into the returned type (uint) anyway (and a `char`
would be closer to that than a short anyway).
(I'd also prefer `cahr *val`)

> +{
> +    unsigned int value = 0;
> +    short i;

(size_t)

> +
> +    for (i = 0; i < length; i++) {
> +	value *= 16;
> +	if (val[i] >= '0' && val[i] <= '9') {
> +	    value += (val[i] - '0');
> +	} else if (val[i] >= 'A' && val[i] <= 'F') {
> +	    value += (val[i] - 'A' + 10);
> +	} else if (val[i] >= 'a' && val[i] <= 'f') {
> +	    value += (val[i] - 'a' + 10);
> +	} else {
> +	    fprintf(stderr, "Malformed Character in Hex Conversion: '%c' (0x%02x)", val[i], val[i]);
> +	    exit(4);
> +	}
> +    }
> +
> +    return value;
> +}
> +
> +int
> +main (int argc, char** argv)
> +{
> +    int i, j, curcodepoint = 0;
> +    FILE* fd;
> +    char *line;

While I do get the whole "fd is an opaque thing" vs "line is a pointer
to a thing" distinction with respect to where the `*` is in the
declaration, I still don't like it.

And please declare loop variables with their respective loops.
(The `+1+j*2` parts inside a loop inside a bigger loop using `i` is a
bit tedious to read. The `i` loop should probably call it `codepoint`
anyway since its body is bigger.)

> +    size_t linesize = sizeof(char)*MAXLINESIZE;
> +    char emptyglyph[GLYPHLINES*2] = { 0 };
> +    char glyph[GLYPHLINES*2] = "";
> +
> +    fd = fopen(FONTFILE, "r");
> +    if (fd == NULL) {
> +	perror("Error opening FONTFILE");

The macro doesn't expand here ;-)
(But this will have to become an fprintf() anyway to replace the
FONTFILE macro with argv[1]).

> +	exit(1);
> +    }
> +
> +    line = (char*)malloc(sizeof(char)*MAXLINESIZE);

Either use linesize as parameter here (to avoid needing to keep 2
calculations in sync), or just initialize them to NULL and 0, since
getline() will take care of this anyway.

> +
> +    if (getline(&line, &linesize, fd) == -1) {
> +	perror("Error reading Line");
> +	exit(2);
> +    }
> +    curcodepoint = parsehex(line, CODEPOINTLENGTH);
> +    for (i = 0; i < NUMCODEPOINTS; i++) {
> +	if (i < curcodepoint) {
> +#ifdef DEBUG
> +	    fprintf(stderr, "Printing emptyglyph for codepoint %d\n", i);
> +#endif
> +	    printglyph(emptyglyph);
> +	    continue;
> +	}
> +
> +#ifdef DEBUG
> +	    fprintf(stderr, "Parsing codepoint %d\n", i);
> +#endif
> +
> +	/* if we have a wide char */
> +	if (*(line+CODEPOINTLENGTH+33) != '\n') {
> +	    for (j = 0; j < GLYPHLINES*2; j++) {
> +		glyph[j] = parsehex(line+CODEPOINTLENGTH+1+j*2, 2);
> +	    }
> +	} else {
> +	    for (j = 0; j < GLYPHLINES; j++) {
> +		glyph[j] = parsehex(line+CODEPOINTLENGTH+1+j*2, 2);
> +		glyph[j+GLYPHLINES] = 0;
> +	    }
> +	}

Reading and really seeing the point of pointers moved like
`*(line+CODEPOINTLENGTH+33)` and `line+CODEPOINTLENGTH+1+j*2` is rather
cumbersome. When parsing it's definitely nicer to actually move the
current location further when done with a piece of data.
And seeing how the right half of a wide character is really the second
"block" of data in glyph[], this could also be written as:

    char *tmp = line + CODEPOINTLENGTH + 1; // or previously initialized parsing the codepoint index
    size_t j = 0;
    while (*tmp && j < sizeof(glyph))
        glyph[j++] = parsehex(tmp+j*2, 2);
    while (j < sizeof(glyph))
        glyph[j++] = 0;

(Also note the sizeof(glyph) vs GLYPHLINES*2, similar to the malloc()
calculation in the previous comment)

> +
> +#ifdef DEBUG
> +	    fprintf(stderr, "Printing glyph for codepoint %d\n", i);
> +#endif
> +	printglyph(glyph);
> +
> +	if (getline(&line, &linesize, fd) == -1) {

(Could/should check errno?)

It might be easier to read if you use a more concise loop form. Rather
than initially getting the first line and parsing half of it to get the
first codepoint index before starting the actual loop, then repeating
those steps at the end of the loop before jumping back up, maybe use:
    nextchar = 0;
    while (getline != -1) {
        codepoint = parse();
        while (nextchar++ < codepoint)
            print(empty);
        parse and print glyph
        nextchar = codepoint+1;
    }
    check errno
    while (++nextchar < 0x10000)
        print(empty)

In this case the repeated step is the printing of empty glyphs, but that
makes somewhat sense in that it literally reads as "fill the rest with
empty glyphs".

> +	    /* either read error or EOF */
> +	    curcodepoint = NUMCODEPOINTS+1;
> +	} else {
> +	    curcodepoint = parsehex(line, CODEPOINTLENGTH);
> +	}
> +    }
> +

While not strictly necessary in a oneshot tool, I would prefer to see a
`free(line)` here. (I'm also not too happy about the exit(4) in the hex
parsing due to this, but meh...)

> +    exit(0);
> +}
> -- 
> 2.11.0