libb64: Base64 Encoding/Decoding Routines

Overview:

libb64 is a library of ANSI C routines for fast encoding and decoding of data to and from base64 format. C++ wrappers are included, as well as the source code for standalone encoding and decoding executables.

Base64 uses a subset of displayable ASCII characters, and is therefore a useful encoding for storing binary data in a text file, such as XML, or sending binary data over text-only email.
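Concretely, base64 turns every 3 input bytes (24 bits) into 4 output characters. Each character encodes 6 bits as one of 64 printable symbols, and a final group of 1 or 2 bytes is padded with '='. The sketch below assumes the standard alphabet from RFC 4648 and uses a hypothetical function name, b64_encode_sketch. It illustrates the mapping only and is not libb64's API or exact output format.

```c
#include <stddef.h>

/* Standard base64 alphabet (RFC 4648): 64 printable ASCII characters. */
static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper (not libb64's API): encodes len bytes from in into out,
   which must have room for 4 * ((len + 2) / 3) + 1 chars. Returns the number
   of characters written, excluding the terminating NUL. */
size_t b64_encode_sketch(const unsigned char *in, size_t len, char *out)
{
    size_t i, o = 0;
    for (i = 0; i + 2 < len; i += 3) {
        /* Pack three bytes into one 24-bit group... */
        unsigned long group = ((unsigned long)in[i] << 16) |
                              ((unsigned long)in[i + 1] << 8) | in[i + 2];
        /* ...and emit it as four 6-bit indices into the alphabet. */
        out[o++] = B64_ALPHABET[(group >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(group >> 12) & 0x3F];
        out[o++] = B64_ALPHABET[(group >> 6) & 0x3F];
        out[o++] = B64_ALPHABET[group & 0x3F];
    }
    if (i < len) {
        /* One or two bytes remain: encode what is there and pad with '='. */
        unsigned long group = (unsigned long)in[i] << 16;
        if (i + 1 < len)
            group |= (unsigned long)in[i + 1] << 8;
        out[o++] = B64_ALPHABET[(group >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(group >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? B64_ALPHABET[(group >> 6) & 0x3F] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}
```

For example, the bytes "Man" encode to "TWFu", and "M" alone encodes to "TQ==".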

Sections: References | Why? | License | Download | Building and Installing | Command-line Use | Programming | Implementation Details | Author(s) | Acknowledgement

References:

Why another implementation?

I did this because I needed a fast C++ implementation of base64 encoding and decoding, without any licensing problems. Some implementations are released under the GNU GPL or a BSD-style license, neither of which is what I require. Some decent ones are available in Java, but as mentioned, I need C++ code.

Most of the available code is also slow, and complicated to understand, use and maintain. Base64 encoding and decoding are ideally suited to implementation as coroutines, which make the code more compact, easier to read, and easier to use.

Also, the opportunity to use a coroutine implementation in production C++ code is rare; I couldn't pass it up. For more information on this technique, see "Coroutines in C" by Simon Tatham, available on-line at http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html, and read the Implementation Details section at the bottom of this page.

So then, under which license do I release this code? On to the next section...

Download:

Download via the project page: https://sourceforge.net/projects/libb64/files
The latest release is libb64-1.2.src.zip.

Access via CVS is also possible. Info here: http://sourceforge.net/cvs/?group_id=152942

License:

This work is released into the Public Domain.

It basically boils down to this: I put this work in the public domain, and you can take it and do whatever you want with it. An example of this "license" is the Creative Commons Public Domain License, a copy of which can be found in the LICENSE file in the distribution, and also on-line at http://creativecommons.org/licenses/publicdomain/

Building and Installing

The code can be built using standard make and gcc tools in Linux (and others such as MinGW and Cygwin), or using a Visual Studio solution.

Currently, no install script is available, but since the library is static and the executable is standalone, this is not a big problem: simply copy the executable somewhere in your path. For details on using the base64 executable, see the Command-line Use section. The static library may be used along with the header files in the include directory to embed the functionality in your own application (see the Programming section for more details).

Building using GNU tools

Unpack the source archive somewhere, change into the resulting libb64-xxx directory and run make. This produces a static library in the src directory and an executable file in the base64 directory.

Building using Microsoft Visual Studio C++ Express 2010

Not that I like doing it, but it is possible to build the code with MS VC++ Express 2010, using the base64.sln solution file in the base64/VisualStudioProject sub-directory. This builds only the base64 executable, placing it in the relevant build directory.

Command-line Use:

Running the base64 program with no parameters shows how to use it:

$ ./base64
base64: Encodes and Decodes files using base64
Usage: base64 [-e|-d] [input] [output]
   Where [-e] will encode the input file into the output file,
         [-d] will decode the input file into the output file, and
         [input] and [output] are the input and output files, respectively.
As an example, to encode a file, do the following:
$ ./base64 -e file_a file_b
file_b will now contain the base64-encoded version of file_a. Similarly, running the following:
$ ./base64 -d file_b file_c
will result in file_c, which will be identical to the original file_a.
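Decoding is the inverse mapping, which is why file_c comes back byte-for-byte identical to file_a. As an illustration only (b64_decode_sketch is a hypothetical helper, not the code used by libb64 or the base64 executable), a decoder can collect 6 bits per input character and emit a byte whenever 8 or more bits have accumulated. This sketch simply skips characters outside the alphabet, such as '=' padding and line breaks.

```c
#include <string.h>

static const char B64_DIGITS[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper (not libb64's API): decodes the NUL-terminated base64
   text in `in` into `out`, ignoring any character outside the alphabet.
   Returns the number of bytes written. */
size_t b64_decode_sketch(const char *in, unsigned char *out)
{
    unsigned long group = 0; /* pending bits, most significant first */
    int bits = 0;            /* number of pending bits in group */
    size_t o = 0;
    for (; *in != '\0'; ++in) {
        const char *p = strchr(B64_DIGITS, *in);
        if (p == NULL)
            continue; /* padding, whitespace or other non-alphabet input */
        group = (group << 6) | (unsigned long)(p - B64_DIGITS);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[o++] = (unsigned char)((group >> bits) & 0xFF);
            group &= (1UL << bits) - 1; /* keep only the unused low bits */
        }
    }
    return o;
}
```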

Programming:

Some C++ wrappers are provided as well, so you don't have to get your hands dirty. Encoding from standard input to standard output is as simple as

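	#include <iostream>
	#include <b64/encode.h>   // libb64's C++ encoder wrapper

	int main()
	{
		base64::encoder E;             // wrapper object that holds the encoder state
		E.encode(std::cin, std::cout); // encode everything read from stdin onto stdout
		return 0;
	}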

Implementation Details:

It is DAMN fast, if I may say so myself. The C code uses a little trick that has been used to implement coroutines in C; in fact, this implementation can be considered an example of one.

(To see how the libb64 codebase compares with some other available base64 implementations, see the BENCHMARKS file in the source distribution.)

The trick relies on the fact that the case labels of a switch statement may legally appear inside nested blocks, such as the body of a loop, so the switch can jump straight into the middle of such a block. A very thorough and enlightening essay on coroutines in C using this method can be found in the above-mentioned "Coroutines in C" by Simon Tatham: http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
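The best-known small illustration of this rule is Duff's device, which Tatham's article also discusses: a switch jumps into the middle of an unrolled do-while loop so that the first pass handles the remainder. The sketch below is a byte-copying variant (Duff's original wrote every byte to the same memory-mapped output register), and the name duff_copy is hypothetical.

```c
#include <stddef.h>

/* Copies count bytes from `from` to `to`, eight per loop iteration.
   The switch enters the loop body at the case matching count % 8,
   so the first pass copies the remainder and later passes copy 8 each. */
void duff_copy(char *to, const char *from, size_t count)
{
    size_t n;
    if (count == 0)
        return; /* the device assumes at least one byte to copy */
    n = (count + 7) / 8; /* number of passes through the loop */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

The same property is what lets a coroutine resume inside its own loops.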

For example, an RLE decompressing routine, adapted from the article:

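1	static int STATE = 0;	/* resume point, kept between calls */
2	static int len, c;
3	switch (STATE)
4	{
5		while (1)
6		{
7			c = getchar();
8			if (c == EOF) return EOF;
9			if (c == 0xFF) {
10				len = getchar();
11				c = getchar();
12				while (len--)
13				{
14					STATE = 0;
15					return c;
16	case 0:;	/* resume after a repeated byte (first call: len is 0, so the loop exits at once) */
17				}
18			} else {
19				STATE = 1;
20				return c;
21	case 1:;	/* resume after a literal byte */
22			}
23		}
24	}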

As can be seen from this example, a coroutine depends on a state variable, which it sets directly before exiting (lines 14 and 19). The next time the routine is entered, the switch transfers control to the point directly after the previous exit (lines 16 and 21).

(As an aside, in the article mentioned above, the combination of the top-level switch, the setting of the state, the return of a value and the labeling of the exit point is wrapped in #define macros, making the structure of the routine even clearer. Read the article; it's worth it.)
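A minimal sketch of that macro style follows. The macro names crBegin, crReturn and crFinish follow the article, while next_number is a hypothetical example rather than anything in libb64. Because crReturn uses __LINE__ as the state value, each crReturn must sit on its own source line.

```c
/* Coroutine macros in the style of Tatham's article. crReturn stores its
   own line number as the state, returns, and leaves a matching case label
   behind so the next call resumes right after the return. */
#define crBegin     static int state = 0; switch (state) { case 0:
#define crReturn(x) do { state = __LINE__; return x; case __LINE__:; } while (0)
#define crFinish    }

/* Hypothetical example: successive calls return 0, 1, ..., 9, then -1. */
int next_number(void)
{
    static int i; /* must be static: it has to survive between calls */
    crBegin;
    for (i = 0; i < 10; i++)
        crReturn(i);
    crFinish;
    return -1;
}
```

The loop position lives entirely in the two static variables, which is exactly the weakness discussed next.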

The obvious problem with any such routine is the static keyword. Static variables in a function spell doom for multi-threaded applications. Also, in situations where this coroutine is used by more than one other coroutine, the consistency may be... disturbed.

What is needed is a structure for storing these variables, which is passed to the routine separately. This obviously breaks the modularity of the function, since now the caller has to worry about and care for the internal state of the routine (the callee). The obvious solution would be to wrap the state along with the function into an object in C++.

This allows for a fast implementation that works in multi-threaded code, and that is safe to use and easy to understand and maintain.
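The RLE decompressor from the example above can be restructured along these lines. In this sketch, every former static variable lives in a caller-owned struct, so two decompressors can be interleaved without disturbing each other. The names rle_state, rle_init and rle_next are hypothetical, and input is read from a byte buffer instead of getchar so that several decompressors can run at once.

```c
#include <stddef.h>

/* Everything the static version kept in statics now lives here. */
typedef struct {
    int step;                /* 0 = start, 1 = inside a run, 2 = after a literal */
    int len, c;              /* remaining run length and current byte */
    const unsigned char *in; /* input buffer (replaces getchar) */
    size_t pos, size;
} rle_state;

void rle_init(rle_state *s, const unsigned char *in, size_t size)
{
    s->step = 0;
    s->len = 0;
    s->c = 0;
    s->in = in;
    s->pos = 0;
    s->size = size;
}

/* Next input byte, or -1 at the end of the buffer. */
static int rle_getbyte(rle_state *s)
{
    return s->pos < s->size ? s->in[s->pos++] : -1;
}

/* Returns the next decompressed byte, or -1 when the input is exhausted.
   A 0xFF byte introduces a run: a count byte, then the byte to repeat. */
int rle_next(rle_state *s)
{
    switch (s->step) {
    case 0:
        while (1) {
            s->c = rle_getbyte(s);
            if (s->c == -1)
                return -1;
            if (s->c == 0xFF) {
                s->len = rle_getbyte(s);
                s->c = rle_getbyte(s);
                while (s->len-- > 0) {
                    s->step = 1;
                    return s->c;
    case 1:;                /* resume inside the run */
                }
            } else {
                s->step = 2;
                return s->c;
    case 2:;                /* resume after a literal byte */
            }
        }
    }
    return -1; /* not reached */
}
```

In C++, the struct and rle_next would naturally become a single class.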

The base64 encoding and decoding functionality in this package is implemented in exactly this way, providing both a high-speed, high-maintenance C interface and a wrapped C++ interface that is low-maintenance and of comparable performance.
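The following sketch shows roughly how this can look for the encoding side. It is not libb64's actual code or API (see the library's headers for that), and the names enc_state, enc_init, enc_block and enc_end are hypothetical. The encoder's position within a 3-byte group is kept in a caller-owned struct, and a switch on that position jumps back into the middle of the encoding loop, so input may be split across calls at any byte boundary.

```c
#include <stddef.h>

static const char ENC_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Which byte of the current 3-byte group comes next. */
typedef enum { STEP_A, STEP_B, STEP_C } enc_step;

/* All encoder state is caller-owned: no statics. */
typedef struct {
    enc_step step;
    unsigned char carry; /* bits left over from the previous input byte */
} enc_state;

void enc_init(enc_state *s)
{
    s->step = STEP_A;
    s->carry = 0;
}

/* Encodes len bytes, resuming wherever the previous call stopped.
   Returns the number of characters written to out. */
size_t enc_block(enc_state *s, const unsigned char *in, size_t len, char *out)
{
    const unsigned char *end = in + len;
    char *o = out;
    unsigned char b;
    switch (s->step) {
        while (1) {
    case STEP_A:
            if (in == end) { s->step = STEP_A; return (size_t)(o - out); }
            b = *in++;
            *o++ = ENC_ALPHABET[b >> 2];
            s->carry = (unsigned char)((b & 0x03) << 4);
    case STEP_B:
            if (in == end) { s->step = STEP_B; return (size_t)(o - out); }
            b = *in++;
            *o++ = ENC_ALPHABET[s->carry | (b >> 4)];
            s->carry = (unsigned char)((b & 0x0F) << 2);
    case STEP_C:
            if (in == end) { s->step = STEP_C; return (size_t)(o - out); }
            b = *in++;
            *o++ = ENC_ALPHABET[s->carry | (b >> 6)];
            *o++ = ENC_ALPHABET[b & 0x3F];
        }
    }
    return (size_t)(o - out); /* not reached */
}

/* Flushes a partial group with '=' padding and resets the state.
   Returns the number of characters written to out. */
size_t enc_end(enc_state *s, char *out)
{
    char *o = out;
    switch (s->step) {
    case STEP_B: /* one byte of the group was consumed */
        *o++ = ENC_ALPHABET[s->carry];
        *o++ = '=';
        *o++ = '=';
        break;
    case STEP_C: /* two bytes of the group were consumed */
        *o++ = ENC_ALPHABET[s->carry];
        *o++ = '=';
        break;
    case STEP_A: /* group complete: nothing to flush */
        break;
    }
    enc_init(s);
    return (size_t)(o - out);
}
```

A C++ wrapper can then hold an enc_state as a member and forward to these three functions, so callers never touch the state directly.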

Author(s)

Chris Venter : chris.venter[anti-spam]gmail.com : http://man9.wordpress.com

Acknowledgement

Many thanks to SourceForge for hosting this project.