Writing your own Convert.ToBase64String in C#

Have you ever wondered what Base64 is? How it works? Why you need it? Have you ever wanted to write your own Base64 encoder? Well, my friend, you are in luck because that’s what we’re talking about today. To get started…

What is Base64?

Base64 is a common way to convert binary data into text. It is typically used to store and transfer data over media that were designed to handle only text, for example embedding an image in an XML document.
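As a quick illustration, the built-in System.Convert methods can turn a few non-printable bytes into plain text and back again. This is a minimal sketch; the class name and the sample bytes are arbitrary choices for the demo.

```csharp
using System;

public static class BinaryToTextDemo
{
    // A few bytes that are not printable text: NUL, 0xFF and a line feed
    public static readonly byte[] RawBytes = { 0x00, 0xFF, 0x0A };

    // The built-in encoder turns them into ordinary ASCII characters
    public static string Encode() => Convert.ToBase64String(RawBytes);

    // The built-in decoder restores the exact original bytes
    public static byte[] RoundTrip() => Convert.FromBase64String(Encode());
}
```

Three arbitrary bytes become four safe characters, and nothing is lost on the way back.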

It works by converting the data into a base-64 representation and displaying it using a common character set. The most common character set used is A-Z, a-z, 0-9, + and /, although different implementations can use different character sets. The goal is to use a common set of characters that can be represented in most encoding schemes. Here’s the index table of the most common set:

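| Index | Char | Index | Char | Index | Char | Index | Char |
|---|---|---|---|---|---|---|---|
| 0 | A | 16 | Q | 32 | g | 48 | w |
| 1 | B | 17 | R | 33 | h | 49 | x |
| 2 | C | 18 | S | 34 | i | 50 | y |
| 3 | D | 19 | T | 35 | j | 51 | z |
| 4 | E | 20 | U | 36 | k | 52 | 0 |
| 5 | F | 21 | V | 37 | l | 53 | 1 |
| 6 | G | 22 | W | 38 | m | 54 | 2 |
| 7 | H | 23 | X | 39 | n | 55 | 3 |
| 8 | I | 24 | Y | 40 | o | 56 | 4 |
| 9 | J | 25 | Z | 41 | p | 57 | 5 |
| 10 | K | 26 | a | 42 | q | 58 | 6 |
| 11 | L | 27 | b | 43 | r | 59 | 7 |
| 12 | M | 28 | c | 44 | s | 60 | 8 |
| 13 | N | 29 | d | 45 | t | 61 | 9 |
| 14 | O | 30 | e | 46 | u | 62 | + |
| 15 | P | 31 | f | 47 | v | 63 | / |
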
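The table follows a simple pattern of four contiguous ranges plus two symbols. The small sketch below (class and method names are made up for illustration) keeps the table as a string, where position i holds the character for value i, and also computes the same mapping from the ranges.

```csharp
using System;

public static class Base64Alphabet
{
    // Standard alphabet: the character at index i encodes the value i
    public const string Letters =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Equivalent lookup computed from the table's ranges instead of a string
    public static char IndexToChar(int index)
    {
        if (index < 0 || index > 63)
            throw new ArgumentOutOfRangeException(nameof(index));
        if (index < 26) return (char)('A' + index);        // 0-25  -> A-Z
        if (index < 52) return (char)('a' + index - 26);   // 26-51 -> a-z
        if (index < 62) return (char)('0' + index - 52);   // 52-61 -> 0-9
        return index == 62 ? '+' : '/';                    // 62 and 63
    }
}
```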
## How does it work?

It works by splitting the data into 24-bit groups (three bytes), treating each group as four 6-bit chunks (sextets), converting each sextet to its decimal value (0 to 63) and looking up the corresponding character in the index table. Every 24-bit group therefore becomes 4 encoded characters.
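The same grouping can also be done with integer bit shifts instead of strings of 0s and 1s. This is a swapped-in technique for illustration, not how the encoder later in this post works; it assumes a complete three-byte group and uses hypothetical names.

```csharp
using System;

public static class GroupEncoder
{
    const string Letters =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes one full 3-byte (24-bit) group as 4 Base64 characters
    public static string EncodeGroup(byte b0, byte b1, byte b2)
    {
        // Pack the three bytes into the low 24 bits of an int
        int group = (b0 << 16) | (b1 << 8) | b2;

        // Peel off four 6-bit values, most significant first; 0x3F keeps 6 bits
        char[] output =
        {
            Letters[(group >> 18) & 0x3F],
            Letters[(group >> 12) & 0x3F],
            Letters[(group >> 6) & 0x3F],
            Letters[group & 0x3F],
        };
        return new string(output);
    }
}
```

Masking with 0x3F (binary 111111) after each shift isolates exactly one sextet.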

For instance, to encode the first 3 characters of my name, we first convert the letters into bytes and the bytes into bits. For this example, assume the characters are encoded in ASCII. The byte values for Dav are:

D: 68
a: 97
v: 118

Written as 8-bit binary, those numbers are 01000100, 01100001 and 01110110. Concatenate them into a single 24-bit string and you get 010001000110000101110110.
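In C#, the byte-to-bits step for this example might look like the sketch below. It assumes ASCII input via Encoding.ASCII, and the helper name is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BitStringDemo
{
    // Converts text to ASCII bytes, then each byte to an 8-character binary string
    public static string ToBitString(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        // Convert.ToString(byte, 2) drops leading zeros, so pad back to 8 digits
        return string.Concat(bytes.Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));
    }
}
```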

Next, grab 4 sextets of bits, convert those to decimal and look up the corresponding character in the index table. 010001 is 17, 000110 is 6, 000101 is 5, and 110110 is 54. Looking those up in the index table gives the string RGF2. We just converted to Base64! Hooray!
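Continuing the sketch, the four sextets can be parsed with Convert.ToInt32(string, 2) and mapped through the index table. The names here are illustrative only.

```csharp
using System;
using System.Linq;

public static class SextetDemo
{
    const string Letters =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Splits a 24-character bit string into four 6-bit values
    public static int[] ToSextets(string bits24)
    {
        if (bits24.Length != 24)
            throw new ArgumentException("Expected exactly 24 bits.");
        return Enumerable.Range(0, 4)
                         .Select(i => Convert.ToInt32(bits24.Substring(i * 6, 6), 2))
                         .ToArray();
    }

    // Maps each value to its character in the index table
    public static string Lookup(int[] sextets) =>
        new string(sextets.Select(v => Letters[v]).ToArray());
}
```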

Padding

But wait… we have a problem. What happens when the number of bytes in our data isn’t divisible by three, so the last group doesn’t have a full 24 bits?

This is where padding comes in. When the last group is missing 1 or 2 octets, we pad the end of the Base64 string with one = for each missing octet. To extend our previous example, let’s encode my entire first name (Dave, if you already forgot…). We know that Dav is encoded as RGF2, so we just need to encode the last letter, e.

e as a byte is 101, which is 01100101 in binary. If we attempt to get our sextet groupings out of that, we get 011001 and 01. Huh. That last sextet is missing a few bits.

What we need to do is pad the last sextet with zeros and note that we have 2 octets missing. That leaves us with 011001 and 010000, which are 25 and 16, or Z and Q. Our final string, padded with = for the two missing octets, is RGF2ZQ==.
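Here is a sketch of encoding a leftover group of one or two bytes: the missing bytes are treated as zeros (which zero-fills the last sextet), and one = is appended per missing byte. It uses bit shifts and hypothetical names rather than the string-based approach used later in the post.

```csharp
using System;

public static class PaddingDemo
{
    const string Letters =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes a final group of 1 or 2 bytes, zero-filling bits and adding '='
    public static string EncodeTail(byte[] tail)
    {
        if (tail.Length < 1 || tail.Length > 2)
            throw new ArgumentException("Expected 1 or 2 leftover bytes.");

        // Missing bytes stay zero, which zero-fills the last partial sextet
        int group = tail[0] << 16;
        if (tail.Length == 2) group |= tail[1] << 8;

        // 1 byte -> 2 characters, 2 bytes -> 3 characters
        int charCount = tail.Length + 1;
        var result = new char[4];
        for (int i = 0; i < 4; i++)
        {
            result[i] = i < charCount
                ? Letters[(group >> (18 - 6 * i)) & 0x3F]
                : '=';   // one '=' per missing byte
        }
        return new string(result);
    }
}
```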

Writing your own encoder

First, a disclaimer. What we’re writing here is for educational purposes. It’s slow, unoptimized and pretty useless considering .NET comes with a respectable Base64 converter. This is a learning exercise.

The existing Convert.ToBase64String method in the System namespace takes a byte[] as a parameter and returns a string. Here’s the full method signature:

public static string ToBase64String(
    byte[] inArray
)
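Since our version should match the built-in one exactly, it helps to see what the framework returns for the earlier example. The wrapper class name is just for the demo.

```csharp
using System;
using System.Text;

public static class BuiltInReference
{
    // The framework's encoder, used as the expected output for our own version
    public static string EncodeName() =>
        Convert.ToBase64String(Encoding.ASCII.GetBytes("Dave"));
}
```

This returns the same RGF2ZQ== we worked out by hand.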

We’re going to write our own implementation of this method:

public static class MyBase64Converter
{
    public static string ToBase64String(byte[] inArray)
    {
        //Converter code goes here
        throw new System.NotImplementedException();
    }
}

The good part about the method taking a byte[] parameter is that part of the work is already done for you: getting the byte representation of your data. From there, we need to convert each byte into its 8-bit binary representation. We could use one of the Convert.ToString() overloads in .NET, or we could use the IntToBinaryString method we wrote ourselves! We call PadLeft after IntToBinaryString to ensure each binary string is a full 8 bits long; small values such as 68 (1000100) would otherwise come out short.

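public static class MyBase64Converter
{
    public static string ToBase64String(byte[] inArray)
    {
        var bits = string.Empty;
        for (var i = 0; i < inArray.Length; i++)
        {
            // IntToBinaryString is our own helper; Convert.ToString(inArray[i], 2) also works.
            // PadLeft restores leading zeros so every byte contributes exactly 8 bits.
            bits += IntToBinaryString(inArray[i]).PadLeft(8, '0');
        }

        // Chunking and character lookup continue from here
    }
}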
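The IntToBinaryString helper isn't shown in this post. If you don't have it handy, a hypothetical stand-in based on repeated division by 2 could look like this. Like Convert.ToString(value, 2), it returns no leading zeros, which is why the PadLeft call is still needed.

```csharp
using System;

public static class BinaryHelpers
{
    // Hypothetical stand-in for IntToBinaryString: builds the binary digits of a
    // non-negative integer by repeated division by 2 (no leading zeros)
    public static string IntToBinaryString(int value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (value == 0) return "0";

        var digits = string.Empty;
        while (value > 0)
        {
            digits = (value % 2) + digits;   // prepend the lowest remaining bit
            value /= 2;
        }
        return digits;
    }
}
```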

Now that we have our data represented as binary, we need to process it 24 bits at a time. We’ll use the LINQ methods Skip and Take (from System.Linq) to accomplish this.

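string base64 = string.Empty;

// Requires: using System.Linq; (for Skip, Take and ToList)
const byte threeOctets = 8 * 3; // 24 bits = three bytes
var octetsTaken = 0;            // despite the name, this counts bits consumed
while (octetsTaken < bits.Length)
{
    // Up to 24 '0'/'1' characters; fewer for a final partial group
    var currentOctects = bits.Skip(octetsTaken).Take(threeOctets).ToList();

    // More code here

    octetsTaken += threeOctets;
}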

Note that we loop while octetsTaken is less than the length of the bit string. This lets the loop reach the end of the string even when the final chunk has fewer than 24 bits.
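To see why that loop condition handles a short final chunk, this sketch mirrors the Skip/Take loop and records how many bits each pass takes. Take simply stops at the end of the sequence, so no extra bounds checking is needed. The class name is illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Mirrors the Skip/Take loop: returns the length of each chunk taken
    public static List<int> ChunkLengths(string bits)
    {
        const int threeOctets = 24;
        var lengths = new List<int>();
        for (var taken = 0; taken < bits.Length; taken += threeOctets)
        {
            // Take stops early at the end of the string, so the last chunk may be short
            lengths.Add(bits.Skip(taken).Take(threeOctets).Count());
        }
        return lengths;
    }
}
```

For the 32 bits of Dave, it reports a full 24-bit chunk followed by an 8-bit one.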

Next we go sextet by sextet, converting each sextet’s bits to an integer and looking up that index in the table. We're using another LINQ method, Aggregate, which here is just a fancy way of joining the bits back into a string.

const byte sixBits = 6;
int hextetsTaken = 0;
while(hextetsTaken < currentOctects.Count())
{
    var chunk = currentOctects.Skip(hextetsTaken).Take(sixBits);
    hextetsTaken += sixBits;

    var bitString = chunk.Aggregate(string.Empty, (current, currentBit) => current + currentBit);

    if (bitString.Length < 6)
    {
        //This happens when we need to pad
        bitString = bitString.PadRight(6, '0');
    }
    var singleInt = Convert.ToInt32(bitString, 2);

    base64 += Base64Letters[singleInt];
}
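Pulled out on its own, the inner loop's treatment of a short chunk looks roughly like this. It is a sketch with hypothetical names, and it uses new string(...ToArray()) instead of Aggregate to rebuild each sextet; both produce the same string. An 8-bit chunk yields 2 sextets and a 16-bit chunk yields 3, which is why a partial group produces 2 or 3 characters before any = padding.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SextetSplitDemo
{
    // Splits a chunk of up to 24 bits into 6-bit strings, right-padding the last one
    public static List<string> SplitIntoSextets(string chunk)
    {
        const int sixBits = 6;
        var sextets = new List<string>();
        for (var taken = 0; taken < chunk.Length; taken += sixBits)
        {
            var bitString = new string(chunk.Skip(taken).Take(sixBits).ToArray());
            sextets.Add(bitString.PadRight(sixBits, '0'));   // only changes a short final piece
        }
        return sextets;
    }
}
```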

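Great! Finally, we'll check whether we need to pad the end with =. The remainder of the full bit string's length divided by 3 tells us how many padding characters are required. Each byte contributes 8 bits, and 8 leaves a remainder of 2 when divided by 3, so the remainder is 0 when the bytes form whole groups, 2 when the last group has one byte, and 1 when it has two bytes. That is exactly the number of missing octets.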

// Pad with = for however many octets are missing from the last group
for (var i = 0; i < (bits.Length % 3); i++)
{
    base64 += "=";
}
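Because the padding rule relies on a bit of modular arithmetic, it is worth checking it against the more direct formula, "bytes missing from the last three-byte group". This sketch (hypothetical names) computes both.

```csharp
public static class PaddingRule
{
    // Padding as computed above: from the total number of bits
    public static int FromBitCount(int byteCount) => (byteCount * 8) % 3;

    // Padding computed directly: bytes missing from the last 3-byte group
    public static int FromByteCount(int byteCount) => (3 - byteCount % 3) % 3;
}
```

Checking byte counts from 0 upward shows the two agree: since 8 % 3 == 2, (8 * n) % 3 cycles through 0, 2, 1, matching 0, 2 and 1 missing bytes.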

Below is the full code, including the index table for the base64 characters.

using System;
using System.Linq;

public static class MyBase64Converter
{
    // Same signature as System.Convert.ToBase64String(byte[])
    public static string ToBase64String(byte[] inArray)
    {
        // One long string of '0'/'1' characters, exactly 8 per byte
        var bits = string.Empty;
        foreach (var b in inArray)
        {
            bits += Convert.ToString(b, 2).PadLeft(8, '0');
        }

        string base64 = string.Empty;

        const byte threeOctets = 24; // bits per three-byte group
        var octetsTaken = 0;          // counts bits consumed so far
        while (octetsTaken < bits.Length)
        {
            var currentOctects = bits.Skip(octetsTaken).Take(threeOctets).ToList();

            const byte sixBits = 6;
            int hextetsTaken = 0;
            while (hextetsTaken < currentOctects.Count)
            {
                var chunk = currentOctects.Skip(hextetsTaken).Take(sixBits);
                hextetsTaken += sixBits;

                var bitString = chunk.Aggregate(string.Empty, (current, currentBit) => current + currentBit);

                if (bitString.Length < 6)
                {
                    // Last sextet of a partial group: fill with zero bits
                    bitString = bitString.PadRight(6, '0');
                }
                var singleInt = Convert.ToInt32(bitString, 2);

                base64 += Base64Letters[singleInt];
            }

            octetsTaken += threeOctets;
        }

        // Pad with = for however many octets are missing from the last group
        // (8 % 3 == 2, so bits.Length % 3 equals that count)
        for (var i = 0; i < (bits.Length % 3); i++)
        {
            base64 += "=";
        }

        return base64;
    }

    // Index table: position i holds the character for sextet value i
    private static readonly char[] Base64Letters =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".ToCharArray();
}
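One way to gain confidence in a home-made encoder is to compare it with Convert.ToBase64String on random inputs of many lengths, which exercises all three padding cases. Below is a minimal sketch of such a check, with hypothetical names; it accepts any method with the same byte[]-to-string shape as Convert.ToBase64String.

```csharp
using System;

public static class EncoderCheck
{
    // Compares a candidate encoder with Convert.ToBase64String on random inputs
    // of every length from 0 to maxLength; returns the first mismatching length, or -1
    public static int FirstMismatch(Func<byte[], string> candidate, int maxLength = 64, int seed = 1)
    {
        var random = new Random(seed);   // fixed seed keeps runs reproducible
        for (var length = 0; length <= maxLength; length++)
        {
            var data = new byte[length];
            random.NextBytes(data);
            if (candidate(data) != Convert.ToBase64String(data))
                return length;
        }
        return -1;
    }
}
```

Pass your own encoder method as the candidate. Passing Convert.ToBase64String itself should return -1, which is a handy sanity check of the harness.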