Writing your own Convert.ToBase64String in C#
Have you ever wondered what Base64 is? How it works? Why you need it? Have you ever wanted to write your own Base64 encoder? Well, my friend, you are in luck because that’s what we’re talking about today. To get started…
What is Base64?
Base64 is a common way to convert binary data into text. It is often used to store and transfer data over media that were designed to handle only text, such as embedding an image in an XML document.
It works by converting the data into a base-64 representation and writing each digit using a common character set. The most common character set is A-Z, a-z, 0-9, + and /, although different implementations can use different characters (the URL-safe variant, for example, replaces + and / with - and _). The goal is to use characters that can be represented in almost any text encoding. Here's the index table for the most common set:
Index | Character |
---|---|
0 | A |
1 | B |
2 | C |
3 | D |
4 | E |
5 | F |
6 | G |
7 | H |
8 | I |
9 | J |
10 | K |
11 | L |
12 | M |
13 | N |
14 | O |
15 | P |
16 | Q |
17 | R |
18 | S |
19 | T |
20 | U |
21 | V |
22 | W |
23 | X |
24 | Y |
25 | Z |
26 | a |
27 | b |
28 | c |
29 | d |
30 | e |
31 | f |
32 | g |
33 | h |
34 | i |
35 | j |
36 | k |
37 | l |
38 | m |
39 | n |
40 | o |
41 | p |
42 | q |
43 | r |
44 | s |
45 | t |
46 | u |
47 | v |
48 | w |
49 | x |
50 | y |
51 | z |
52 | 0 |
53 | 1 |
54 | 2 |
55 | 3 |
56 | 4 |
57 | 5 |
58 | 6 |
59 | 7 |
60 | 8 |
61 | 9 |
62 | + |
63 | / |
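The table follows a regular pattern: three runs of consecutive characters followed by two symbols. As an illustration only (the encoder later in this post stores the table as an array instead), here is a minimal sketch, written as C# top-level statements, that computes the character for an index. IndexToChar is a hypothetical name.

```csharp
using System;

// Hypothetical helper: computes the standard Base64 character for a 6-bit index
// instead of looking it up in a stored table.
char IndexToChar(int index)
{
    if (index < 0 || index > 63)
        throw new ArgumentOutOfRangeException(nameof(index), "A sextet is always 0-63.");
    if (index < 26) return (char)('A' + index);       // 0-25  -> A-Z
    if (index < 52) return (char)('a' + index - 26);  // 26-51 -> a-z
    if (index < 62) return (char)('0' + index - 52);  // 52-61 -> 0-9
    return index == 62 ? '+' : '/';                   // 62 -> +, 63 -> /
}

// Rebuild the whole alphabet in index order
var alphabet = new char[64];
for (var i = 0; i < 64; i++)
{
    alphabet[i] = IndexToChar(i);
}
Console.WriteLine(new string(alphabet));
```

Storing the table, as the encoder below does, is simpler to read; computing it just shows that the mapping has no hidden logic.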
Encoding works by splitting the data's bits into groups of 24 (three bytes), treating each group as four 6-bit chunks (sextets), converting each sextet to a decimal number between 0 and 63, and looking up the corresponding character in the table. Every 24-bit group is therefore represented by 4 encoded characters.
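The same grouping can be written with bit shifts instead of strings of 0s and 1s. This is a minimal sketch of that idea, not the string-based approach used later in this post; the helper name ToSextets is hypothetical, and it assumes exactly three input bytes (here 68, 97 and 118, the ASCII values of Dav).

```csharp
using System;

// Hypothetical helper: packs three bytes into one 24-bit integer and
// splits it into four 6-bit values (sextets), most significant first.
int[] ToSextets(byte b0, byte b1, byte b2)
{
    int group = (b0 << 16) | (b1 << 8) | b2;  // 24 bits: b0 b1 b2
    return new[]
    {
        (group >> 18) & 0x3F, // bits 23-18
        (group >> 12) & 0x3F, // bits 17-12
        (group >> 6) & 0x3F,  // bits 11-6
        group & 0x3F          // bits 5-0
    };
}

// 'D', 'a', 'v' in ASCII
var sextets = ToSextets(68, 97, 118);
Console.WriteLine(string.Join(", ", sextets)); // prints 17, 6, 5, 54
```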
For instance, to start encoding the first 3 characters of my name, we first have to convert the letters into bytes, and the bytes into bits. In this instance, we'll say the characters are encoded in ASCII. The byte values for Dav are:
D: 68
a: 97
v: 118
Those numbers, written in 8-bit binary, are 01000100, 01100001 and 01110110 respectively. Group them together to form a 24-bit string and you get 010001000110000101110110.
Next, take the 4 sextets, convert each one to decimal and look up the corresponding character in the index table: 010001 is 17, 000110 is 6, 000101 is 5, and 110110 is 54. Looking those up in the index table gives the string RGF2. We just converted to Base64! Hooray!
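The worked example can be traced in code. This sketch follows the same string-based steps, assuming ASCII input (Encoding.ASCII from System.Text) and using Convert.ToString(value, 2) and Convert.ToInt32(text, 2) to move between numbers and binary strings. The Alphabet constant is simply the index table above written as one string.

```csharp
using System;
using System.Text;

const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

byte[] bytes = Encoding.ASCII.GetBytes("Dav"); // 68, 97, 118

// Each byte as 8 binary digits, joined into one 24-bit string
var bitString = string.Empty;
foreach (var b in bytes)
{
    bitString += Convert.ToString(b, 2).PadLeft(8, '0');
}
Console.WriteLine(bitString); // 010001000110000101110110

// Read the 24 bits back as four 6-bit numbers and look each one up
var encoded = string.Empty;
for (var i = 0; i < bitString.Length; i += 6)
{
    var index = Convert.ToInt32(bitString.Substring(i, 6), 2);
    Console.WriteLine($"{bitString.Substring(i, 6)} = {index} -> {Alphabet[index]}");
    encoded += Alphabet[index];
}
Console.WriteLine(encoded); // RGF2
```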
Padding
But wait… we have a problem. What happens when the number of bytes we want to encode isn't divisible by three, so our last group doesn't have a full 24 bits?
This is where padding comes in. When the last group is missing 1 or 2 octets (bytes), we pad the end of the Base64 string with =, one for each missing octet. To extend our previous example, let's encode my entire first name (Dave, if you already forgot…). We know that Dav is encoded as RGF2, so we just need to encode the last letter, e.
As a byte, e is 101, which is 01100101 in binary. If we try to split that into sextets, we get 011001 and 01. Huh. That last sextet is missing a few bits.
What we need to do is fill out the last sextet with 0s and note that we have 2 octets missing. That leaves us with 011001 and 010000, which are 25 and 16, which map to Z and Q. Our final string, padded with one = for each of the two missing octets, is RGF2ZQ==.
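The padding rule for a final partial group can be sketched the same way, assuming the group holds one or two leftover bytes. EncodeTail is a hypothetical helper: it zero-fills the missing bits, emits only the sextets that contain real data, and appends one = per missing octet. The second call uses ve as an extra example of a two-byte tail.

```csharp
using System;

const string Base64Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encodes a final group of 1 or 2 bytes with padding.
string EncodeTail(byte[] tail)
{
    if (tail.Length < 1 || tail.Length > 2)
        throw new ArgumentException("The tail must hold 1 or 2 bytes.", nameof(tail));

    // Place the bytes at the top of a 24-bit group; missing bytes stay zero
    int group = tail[0] << 16;
    if (tail.Length == 2) group |= tail[1] << 8;

    // 1 byte = 8 bits needs 2 sextets; 2 bytes = 16 bits need 3 sextets
    var sextetCount = tail.Length + 1;
    var result = string.Empty;
    for (var i = 0; i < sextetCount; i++)
    {
        result += Base64Alphabet[(group >> (18 - 6 * i)) & 0x3F];
    }

    // One '=' for each octet missing from the 24-bit group
    return result + new string('=', 3 - tail.Length);
}

Console.WriteLine(EncodeTail(new byte[] { 101 }));      // ZQ==  ('e')
Console.WriteLine(EncodeTail(new byte[] { 118, 101 })); // dmU=  ('v', 'e')
```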
Writing your own encoder
First, a disclaimer. What we're writing here is for educational purposes. It's slow, unoptimized and pretty useless considering .NET comes with a respectable Base64 converter. This is a learning exercise.
The existing Convert.ToBase64String method in the System namespace has several overloads; the simplest takes a byte[] as a parameter and returns a string. Here's its method signature:
```csharp
public static string ToBase64String(byte[] inArray);
```
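For reference, a typical call to the built-in method on text looks like the following sketch: the string is turned into bytes with an Encoding first (ASCII here, matching the examples above), and Convert.FromBase64String reverses the process.

```csharp
using System;
using System.Text;

// Text -> bytes -> Base64
byte[] nameBytes = Encoding.ASCII.GetBytes("Dave");
string builtIn = Convert.ToBase64String(nameBytes);
Console.WriteLine(builtIn); // RGF2ZQ==

// Base64 -> bytes -> text, to confirm the round trip
string decoded = Encoding.ASCII.GetString(Convert.FromBase64String(builtIn));
Console.WriteLine(decoded); // Dave
```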
We’re going to write our own implementation of this method:
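```csharp
using System;
namespace MyBase64Converter
{
    // C# methods must live in a type; the class name here is our own choice
    public static class Base64Converter
    {
        public static string ToBase64String(byte[] inArray)
        {
            // Converter code goes here
            throw new NotImplementedException();
        }
    }
}
```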
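The good part about the method taking a byte[] parameter is that part of the work is already done for you: getting the byte representation of your data. From there, we need to convert each byte into its 8-bit binary representation. We could use one of the Convert.ToString() overloads in .NET, or we could use the IntToBinaryString method we wrote ourselves! The snippet below sticks with Convert.ToString(value, 2) so that it compiles on its own. Whichever converter we use, we call PadLeft afterwards to make sure each binary string is a full 8 bits, because leading zeros are dropped.
```csharp
using System;

namespace MyBase64Converter
{
    public static class Base64Converter
    {
        public static string ToBase64String(byte[] inArray)
        {
            var bits = string.Empty;
            for (var i = 0; i < inArray.Length; i++)
            {
                // Binary digits of the byte, left-padded with '0' to 8 characters
                bits += Convert.ToString(inArray[i], 2).PadLeft(8, '0');
            }

            // Grouping and lookup are added in the next steps
            throw new NotImplementedException();
        }
    }
}
```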
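Padding matters because Convert.ToString(value, 2) drops leading zeros; without it, D would contribute only 7 bits and every later sextet would shift. A quick check, written as top-level statements:

```csharp
using System;

byte d = 68;   // 'D' in ASCII
byte zero = 0;

string raw = Convert.ToString(d, 2);                            // "1000100" (7 digits)
string padded = raw.PadLeft(8, '0');                            // "01000100"
string zeroPadded = Convert.ToString(zero, 2).PadLeft(8, '0');  // "00000000"

Console.WriteLine($"{raw} -> {padded}");
Console.WriteLine(zeroPadded);
```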
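Now that we have our data represented as binary, we need to grab 24 bits at a time. We'll use the LINQ Skip and Take methods to accomplish this. They work directly on the bit string because a string is an IEnumerable<char>, so the file needs using System.Linq;.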
```csharp
string base64 = string.Empty;
const byte threeOctets = 8 * 3; // 24 bits per group
var octetsTaken = 0;
while (octetsTaken < bits.Length)
{
    // Up to 24 characters of the bit string; fewer for a short final group
    var currentOctects = bits.Skip(octetsTaken).Take(threeOctets).ToList();
    // More code here
    octetsTaken += threeOctets;
}
```
Note that we loop while octetsTaken is less than bits.Length. This lets us process the end of the string whether or not the last chunk is a full 24 bits. (Despite its name, octetsTaken counts bits, not octets.)
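To see how the short final chunk falls out of Skip and Take, here's a small sketch that splits the 32-bit string for Dave into 24-bit pieces. SplitIntoChunks is a hypothetical helper that reuses the encoder's Skip/Take pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits a string into pieces of at most `size` characters
// using the same Skip/Take pattern as the encoder loop.
List<string> SplitIntoChunks(string text, int size)
{
    var chunks = new List<string>();
    for (var taken = 0; taken < text.Length; taken += size)
    {
        // Take stops early at the end of the string, so the last piece may be short
        chunks.Add(new string(text.Skip(taken).Take(size).ToArray()));
    }
    return chunks;
}

// 'D', 'a', 'v', 'e' as one 32-bit string
var dave = "01000100011000010111011001100101";
var pieces = SplitIntoChunks(dave, 24);
foreach (var chunk in pieces)
{
    Console.WriteLine($"{chunk} ({chunk.Length} bits)");
}
```

The second piece holds only 8 bits, which is exactly the case the padding logic has to handle.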
Next, we go sextet by sextet: we convert each group of six bits to an integer and use it as an index into the table. We're making use of another LINQ method, Aggregate, which here is just a fancy way of joining the bits back into a string.
```csharp
const byte sixBits = 6;
int hextetsTaken = 0;
while (hextetsTaken < currentOctects.Count())
{
    var chunk = currentOctects.Skip(hextetsTaken).Take(sixBits);
    hextetsTaken += sixBits;
    var bitString = chunk.Aggregate(string.Empty, (current, currentBit) => current + currentBit);
    if (bitString.Length < 6)
    {
        // This happens when we need to pad
        bitString = bitString.PadRight(6, '0');
    }
    var singleInt = Convert.ToInt32(bitString, 2);
    base64 += Base64Letters[singleInt];
}
```
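As an aside, Aggregate is only one way to turn a sequence of characters back into a string. This sketch compares it with string.Concat and the string(char[]) constructor on the first sextet of Dave; for six-character chunks the choice is mostly a matter of taste.

```csharp
using System;
using System.Linq;

var bits = "01000100011000010111011001100101";

// The first sextet as an IEnumerable<char>, as in the encoder loop
var sextetChars = bits.Skip(0).Take(6);

var viaAggregate = sextetChars.Aggregate(string.Empty, (current, currentBit) => current + currentBit);
var viaConcat = string.Concat(sextetChars);
var viaConstructor = new string(sextetChars.ToArray());

Console.WriteLine($"{viaAggregate} {viaConcat} {viaConstructor}"); // 010001 010001 010001
Console.WriteLine(Convert.ToInt32(viaAggregate, 2));                 // 17
```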
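Great! Finally, we check whether we need to pad the end with =. The remainder of the bit string's length divided by 3 tells us how many padding characters are required. That works because bits.Length is always 8 times the number of bytes, and 8 % 3 = 2: each complete group of three bytes contributes 24 % 3 = 0, so only the leftover bytes matter. One leftover byte gives 8 % 3 = 2 padding characters, and two leftover bytes give 16 % 3 = 1.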
```csharp
// Pad with = for however many octets the last group is missing
for (var i = 0; i < (bits.Length % 3); i++)
{
    base64 += "=";
}
```
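The equivalence can also be checked mechanically. This sketch compares the bits.Length % 3 rule with a direct count of the bytes missing from the last group, (3 - n % 3) % 3, and with the number of = characters Convert.ToBase64String produces. The upper bound of 1,000 input bytes is arbitrary.

```csharp
using System;

// For n input bytes, compare the padding rule with the direct count of
// bytes missing from the final 3-byte group and with the built-in output.
var allMatch = true;
for (var n = 0; n <= 1000; n++)
{
    var fromBitLength = (8 * n) % 3;     // the rule used in the encoder
    var missingBytes = (3 - n % 3) % 3;  // bytes missing from the last group
    var encoded = Convert.ToBase64String(new byte[n]);
    var builtInPadding = encoded.Length - encoded.TrimEnd('=').Length;

    if (fromBitLength != missingBytes || missingBytes != builtInPadding)
    {
        Console.WriteLine($"Mismatch at n = {n}");
        allMatch = false;
    }
}
Console.WriteLine(allMatch ? "All lengths agree" : "Found a mismatch");
```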
Below is the full code, including the index table for the Base64 characters.
```csharp
using System;
using System.Linq;

namespace MyBase64Converter
{
    public static class Base64Converter
    {
        public static string ToBase64String(byte[] inArray)
        {
            // Write every byte as 8 binary digits
            var bits = string.Empty;
            foreach (var b in inArray)
            {
                bits += Convert.ToString(b, 2).PadLeft(8, '0');
            }

            // Walk the bit string 24 bits (three octets) at a time
            string base64 = string.Empty;
            const byte threeOctets = 24;
            var octetsTaken = 0; // counts bits, despite the name
            while (octetsTaken < bits.Length)
            {
                var currentOctects = bits.Skip(octetsTaken).Take(threeOctets).ToList();

                // Split the group into sextets and look each one up in the table
                const byte sixBits = 6;
                int hextetsTaken = 0;
                while (hextetsTaken < currentOctects.Count)
                {
                    var chunk = currentOctects.Skip(hextetsTaken).Take(sixBits);
                    hextetsTaken += sixBits;
                    var bitString = chunk.Aggregate(string.Empty, (current, currentBit) => current + currentBit);
                    if (bitString.Length < 6)
                    {
                        // A short final sextet is filled out with zeros
                        bitString = bitString.PadRight(6, '0');
                    }
                    var singleInt = Convert.ToInt32(bitString, 2);
                    base64 += Base64Letters[singleInt];
                }
                octetsTaken += threeOctets;
            }

            // Pad with = for however many octets the last group is missing
            for (var i = 0; i < (bits.Length % 3); i++)
            {
                base64 += "=";
            }
            return base64;
        }

        // Index table: position i holds the character for sextet value i
        private static readonly char[] Base64Letters =
        {
            'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
            'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
            'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
            'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
            'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
            'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
            'w', 'x', 'y', 'z', '0', '1', '2', '3',
            '4', '5', '6', '7', '8', '9', '+', '/'
        };
    }
}
```
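Since the disclaimer above mentions speed, here's a sketch of the same algorithm working on integers with bit shifts instead of strings of '0' and '1' characters, with a StringBuilder for the output. It is an illustrative alternative rather than the encoder built in this post; FastToBase64String is a hypothetical name, and the random comparison against Convert.ToBase64String uses an arbitrary seed.

```csharp
using System;
using System.Text;

const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical alternative encoder: same grouping and padding rules,
// but bits are handled as integers instead of strings.
string FastToBase64String(byte[] data)
{
    // Every started 3-byte group produces 4 output characters
    var output = new StringBuilder((data.Length + 2) / 3 * 4);

    for (var i = 0; i < data.Length; i += 3)
    {
        var remaining = Math.Min(3, data.Length - i);

        // Pack up to three bytes into a 24-bit group; missing bytes are zero
        int group = data[i] << 16;
        if (remaining > 1) group |= data[i + 1] << 8;
        if (remaining > 2) group |= data[i + 2];

        // Emit one character per sextet that contains real data (remaining + 1)
        for (var s = 0; s <= remaining; s++)
        {
            output.Append(Alphabet[(group >> (18 - 6 * s)) & 0x3F]);
        }

        // Pad with '=' for each missing byte (zero times for a full group)
        output.Append('=', 3 - remaining);
    }

    return output.ToString();
}

// Compare against the built-in encoder on random inputs of many lengths
var random = new Random(42);
var mismatches = 0;
for (var length = 0; length < 50; length++)
{
    var sample = new byte[length];
    random.NextBytes(sample);
    if (FastToBase64String(sample) != Convert.ToBase64String(sample))
    {
        mismatches++;
    }
}
Console.WriteLine(FastToBase64String(Encoding.ASCII.GetBytes("Dave"))); // RGF2ZQ==
Console.WriteLine($"Mismatches against Convert.ToBase64String: {mismatches}");
```

Each 3-byte group becomes a single integer, which avoids building a string character by character for every bit the way the string-based version does.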