libb64: Base64 Encoding/Decoding Routines

Overview:

libb64 is a library of ANSI C routines for fast encoding and decoding of data to and from the base64 format. C++ wrappers are included, as well as the source code for standalone encoding and decoding executables.

Base64 uses a subset of displayable ASCII characters, and is therefore a useful encoding for storing binary data in a text file, such as XML, or sending binary data over text-only email.
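To make the encoding concrete, here is a minimal one-shot encoder sketch in C. It is not libb64's code, and b64_encode_simple is a hypothetical name. Each group of three input bytes (24 bits) is split into four 6-bit values that index the 64-character alphabet of RFC 4648 (A to Z, a to z, 0 to 9, + and /), and a final group of one or two bytes is padded with '='.

```c
#include <assert.h>
#include <stddef.h>

/* Standard base64 alphabet (RFC 4648): 64 printable ASCII characters. */
static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper, not part of libb64. Encodes len bytes from in into
   out, which must have room for 4 * ((len + 2) / 3) + 1 characters.
   Returns the number of characters written, not counting the final NUL. */
size_t b64_encode_simple(const unsigned char *in, size_t len, char *out)
{
    size_t i = 0, o = 0;

    assert(out != NULL && (in != NULL || len == 0));
    /* Each full 3-byte group (24 bits) becomes four 6-bit alphabet indices. */
    for (; len - i >= 3; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16)
                        | ((unsigned long)in[i + 1] << 8)
                        | (unsigned long)in[i + 2];
        out[o++] = B64_ALPHABET[(v >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 12) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 6) & 0x3F];
        out[o++] = B64_ALPHABET[v & 0x3F];
    }
    /* A trailing 1- or 2-byte group is padded with '=' to four characters. */
    if (i < len) {
        unsigned long v = (unsigned long)in[i] << 16;
        int two = (len - i == 2);
        if (two)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = B64_ALPHABET[(v >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 12) & 0x3F];
        out[o++] = two ? B64_ALPHABET[(v >> 6) & 0x3F] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}
```

With this sketch, the bytes "Man" encode to TWFu, while "Ma" and "M" encode to TWE= and TQ==. The output is always four characters for every started group of three input bytes, so it is about a third larger than the input.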

Sections: References | Why? | Download | License | Building and Installing | Command-line Use | Programming | Implementation Details | Author(s) | Acknowledgement

References:

Why another implementation?

I wrote this because I needed a fast C++ implementation of base64 encoding and decoding without any licensing problems. Some implementations are released under the GNU GPL or a BSD variant, neither of which suits my needs. Some decent ones are available in Java, but as mentioned, I need C++ code.

Most of the available code is also slow, and complicated to understand, use and maintain. Base64 encoding and decoding are ideally suited to implementation as coroutines: input arrives in blocks whose sizes need not line up with the encoding's groups, so the routine must remember where it stopped within a group and pick up from there when the next block arrives. Coroutines make this kind of code more compact, easier to read and easier to use.

Also, the chance to actually use a coroutine implementation in production C++ code is rare, and I couldn't pass it up. For more information on this technique, see "Coroutines in C" by Simon Tatham, available online at http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html, and read the Implementation Details section at the bottom of this page.

So then, under which license do I release this code? See the License section below.

Download:

Download via the project page: https://sourceforge.net/project/showfiles.php?group_id=152942
The latest release is libb64-1.1.src.tar.gz.

Access via CVS is also possible; information is available here: http://sourceforge.net/cvs/?group_id=152942

License:

This work is released into the Public Domain.

It basically boils down to this: I put this work in the public domain, and you can take it and do whatever you want with it. An example of this "license" is the Creative Commons Public Domain License, a copy of which can be found in the LICENSE file in the distribution, and also on-line at http://creativecommons.org/licenses/publicdomain/

Building and Installing:

Unpack the tarball somewhere, change into the resulting libb64-xxx directory and run make. The result will be four executable files in the src directory (see the next section, Command-line Use, for details), along with a static library.

The static library may be used along with the header files in the include directory to embed the functionality in your own application (see the Programming section for more details).

Currently, no install script is available, but since the library is static and the executables are standalone, this is not a big problem. Simply copy the executables to a directory in your PATH.

Command-line Use:

Two pairs of executables are available: b64enc/b64dec, which are written in C and use the C routines directly, and encoder/decoder, which are written in C++ and use the wrapper objects. Having both pairs allows for speed comparisons between the C and C++ code bases.

Both pairs function in the same way: the encoding half accepts data on standard input and spits out the base64-encoded data on standard output. The decoding half does the reverse: it accepts base64-encoded data on standard input and spits out the plain data on standard output. This allows for direct use, as well as easy integration into a piped command.
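The decoding half of such a filter can be sketched in a few lines of C. This is an illustrative stream decoder, not the code of b64dec or decoder: b64_decode_stream and b64_value are hypothetical names, characters outside the base64 alphabet (such as line breaks) are assumed to be skipped, and the first '=' is taken as the end of the data. It accumulates 6 bits per input character and emits a byte whenever at least 8 bits are available.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: maps a base64 character to its 6-bit value,
   or returns -1 for characters outside the alphabet. */
static int b64_value(int ch)
{
    if (ch >= 'A' && ch <= 'Z') return ch - 'A';
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 26;
    if (ch >= '0' && ch <= '9') return ch - '0' + 52;
    if (ch == '+') return 62;
    if (ch == '/') return 63;
    return -1;
}

/* Hypothetical stream decoder: reads base64 text from in, writes the decoded
   bytes to out and returns how many bytes were written. */
long b64_decode_stream(FILE *in, FILE *out)
{
    unsigned long acc = 0; /* pending bits, right-aligned */
    int bits = 0;          /* how many bits of acc are pending */
    long written = 0;
    int ch;

    assert(in != NULL && out != NULL);
    while ((ch = fgetc(in)) != EOF && ch != '=') {
        int v = b64_value(ch);
        if (v < 0)
            continue;                 /* skip line breaks and other noise */
        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {              /* a whole byte is available */
            bits -= 8;
            fputc((int)((acc >> bits) & 0xFFu), out);
            acc &= (1UL << bits) - 1; /* keep only the bits not yet written */
            written++;
        }
    }
    return written;
}
```

Calling b64_decode_stream(stdin, stdout) gives a pipe-friendly decoder; on platforms that distinguish text and binary streams, standard output may also need to be switched to binary mode so the decoded bytes are written unchanged.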

For example, to encode a file named file, run

$ cat file | ./encode > file.txt

and to decode the text back into the original data, run

$ cat file.txt | ./decode > file2

file and file2 will be byte-for-byte identical, while file.txt will be a plain ASCII text file.

Programming:

Some C++ wrappers are provided as well, so you don't have to get your hands dirty. Encoding from standard input to standard output is as simple as

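	#include <b64/encode.h>  // libb64's C++ encoder wrapper, from the include directory
	#include <iostream>      // std::cin, std::cout

	int main()
	{
		base64::encoder E;             // wrapper object that holds the encoder state
		E.encode(std::cin, std::cout); // read plain data, write base64 text
		return 0;
	}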

Both standalone executables and a static library are provided in the package.

Implementation Details:

It is DAMN fast, if I may say so myself. The C code uses a trick that has been used to implement coroutines in C; this implementation is itself an example of such a coroutine.

The trick relies on the fact that the case labels of a switch statement may legally appear inside nested blocks, such as the body of a loop, so the switch can jump directly into the middle of that loop. A very thorough and enlightening essay on coroutines in C using this method can be found in the above-mentioned "Coroutines in C" by Simon Tatham: http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

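For example, here is a run-length (RLE) decompression routine adapted from the article. In its input, the byte 0xFF is an escape: it is followed by a repeat count and the byte to repeat, while any other byte stands for itself. Each call returns the next decompressed byte, or EOF at the end of the input.

	#include <stdio.h>

	int decompressor(void)
	{
		static int STATE = 0;  /* where to resume on the next call */
		static int len, c;     /* static, so they survive between calls */
		switch (STATE)
		{
		case 0:                /* first call: start at the top */
			while (1)
			{
				c = getchar();
				if (c == EOF) { STATE = 0; return EOF; }
				if (c == 0xFF) {
					len = getchar();   /* repeat count */
					c = getchar();     /* byte to repeat */
					while (len-- > 0)
					{
						STATE = 1;     /* resume inside the repeat loop */
						return c;
		case 1:;                       /* resume point after a repeated byte */
					}
				} else {
					STATE = 2;         /* resume after the literal byte */
					return c;
		case 2:;                       /* resume point after a literal byte */
				}
			}
		}
		return EOF;  /* not reached */
	}

As can be seen from this example, a coroutine depends on a state variable, which it sets directly before each exit (the STATE = 1 and STATE = 2 assignments). The next time the routine is entered, the switch transfers control to the point directly after the previous exit (the case 1: and case 2: labels).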

(As an aside, in the article mentioned above the combination of the top-level switch, the setting of the state, the return of a value and the labelling of the exit point is wrapped in #define macros, which makes the structure of the routine even clearer. Read the article; it's worth it.)

The obvious problem with any such routine is the static keyword. Static variables in a function spell doom for multi-threaded applications, because every thread shares the same copy. Also, when this coroutine is used by more than one other coroutine, all of the callers share a single state and disturb each other's progress.

What is needed is a structure for storing these variables, which the caller passes to the routine on every call. This obviously breaks the modularity of the function, since the caller now has to worry about and care for the internal state of the routine (the callee). The obvious solution is to wrap the state along with the function into an object in C++.

This allows for a fast implementation that can be used safely from multiple threads (each with its own object), and that is easy to understand and maintain.

The base64 encoding and decoding functionality in this package is implemented in exactly this way, providing both a high-speed, high-maintenance C interface and a wrapped C++ interface, which is low-maintenance and of comparable performance.
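The same pattern applied to base64 encoding might look like the following sketch. It illustrates the technique and is not libb64's source: the type and function names are hypothetical, and the library's real declarations are in the header files in the include directory. The step field records which byte of the current three-byte group comes next, so input can be fed in arbitrary pieces and a group split across calls is completed on the next call; a final call flushes any partial group with '=' padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not libb64's actual declarations. */
typedef enum { STEP_A, STEP_B, STEP_C } b64_step;

typedef struct {
    b64_step step;       /* which byte of the current 3-byte group is next */
    unsigned char carry; /* leftover bits from the previous byte, pre-shifted */
} b64_encoder_state;

static const char B64_CHARS[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

void b64_encoder_init(b64_encoder_state *s)
{
    s->step = STEP_A;
    s->carry = 0;
}

/* Encodes len bytes into out (room for at least 2 * len chars) and returns
   the number of chars written. The switch resumes inside the loop at the
   step where the previous call ran out of input. */
size_t b64_encode_block(const unsigned char *in, size_t len, char *out,
                        b64_encoder_state *s)
{
    const unsigned char *end = in + len;
    char *o = out;
    unsigned char b;

    assert(s != NULL && out != NULL);
    switch (s->step)
    {
    case STEP_A:
        while (1)
        {
            if (in == end) { s->step = STEP_A; return (size_t)(o - out); }
            b = *in++;
            *o++ = B64_CHARS[b >> 2];
            s->carry = (unsigned char)((b & 0x03) << 4);
    case STEP_B:
            if (in == end) { s->step = STEP_B; return (size_t)(o - out); }
            b = *in++;
            *o++ = B64_CHARS[s->carry | (b >> 4)];
            s->carry = (unsigned char)((b & 0x0F) << 2);
    case STEP_C:
            if (in == end) { s->step = STEP_C; return (size_t)(o - out); }
            b = *in++;
            *o++ = B64_CHARS[s->carry | (b >> 6)];
            *o++ = B64_CHARS[b & 0x3F];
        }
    }
    return (size_t)(o - out); /* only reached if s->step was invalid */
}

/* Flushes a partial group with '=' padding and resets the state.
   Writes at most 3 chars and returns how many were written. */
size_t b64_encode_blockend(char *out, b64_encoder_state *s)
{
    char *o = out;

    assert(s != NULL && out != NULL);
    switch (s->step)
    {
    case STEP_B:                  /* 1 byte pending: one more char, then "==" */
        *o++ = B64_CHARS[s->carry];
        *o++ = '=';
        *o++ = '=';
        break;
    case STEP_C:                  /* 2 bytes pending: one more char, then "=" */
        *o++ = B64_CHARS[s->carry];
        *o++ = '=';
        break;
    case STEP_A:                  /* no partial group: nothing to flush */
        break;
    }
    b64_encoder_init(s);
    return (size_t)(o - out);
}
```

For example, feeding "he" and then "llo" through b64_encode_block and finishing with b64_encode_blockend produces aGVsbG8=, the same result as encoding "hello" in one piece. Because all state lives in b64_encoder_state, each thread or caller can own an independent encoder.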

Author(s):

Chris Venter : chris.venter[anti-spam]gmail.com : http://rocketpod.blogspot.com

Acknowledgement:

Many thanks to SourceForge for hosting this project.