Get some packages with Microsoft OneGet

Are you a Windows user? Do you see people using apt-get and Homebrew and get filled with rage? Are you not a fan of chocolate? Well, then, you are in luck! Introducing OneGet.

What is OneGet?

OneGet is Microsoft’s new package manager, which allows you to discover and install new software on Windows machines. It is similar to apt-get on Linux, Homebrew on OS X, and even the PowerShell-based Chocolatey package manager. When I say similar to Chocolatey, however, I don’t mean that it replaces Chocolatey. In fact, it embraces it. OneGet is essentially an interface to many different package repositories, each hosting any number of pieces of software. Chocolatey is one of those repositories and is in fact the only one currently available. As more repositories become available, you can add each of them as a source and query all of them at the same time. Awesome.

How do I get it?

To install OneGet, install the Windows Management Framework V5 Preview. This will, among a few other things, install PowerShell 5 along with the OneGet PowerShell module. Once installed, OneGet will be available the next time you open PowerShell. Please note that this is Windows 8/Windows Server 2012 only, and that it’s a CTP and is subject to change!

How do I use it?

There are 7 cmdlets available, allowing you to manage repositories and packages. To view a list of the available cmdlets use the Get-Command command:

> Get-Command -Module OneGet

  CommandType     Name                                               Source
  -----------     ----                                               ------
  Cmdlet          Add-PackageSource                                  OneGet
  Cmdlet          Find-Package                                       OneGet
  Cmdlet          Get-Package                                        OneGet
  Cmdlet          Get-PackageSource                                  OneGet
  Cmdlet          Install-Package                                    OneGet
  Cmdlet          Remove-PackageSource                               OneGet
  Cmdlet          Uninstall-Package                                  OneGet

There currently is no documentation for these, so I’ll detail what they do below.

Get-PackageSource

This cmdlet lists the available repositories you have added to OneGet. As I stated above, Chocolatey is the only one so far.

> Get-PackageSource

Name                          Location                      Provider                                          IsTrusted
----                          --------                      --------                                          ---------
chocolatey                    http://chocolatey.org/api/v2/ Chocolatey                                            False

Add-PackageSource and Remove-PackageSource

These will add, and obviously remove, package repositories. You’ll (hopefully) use these soon when more repositories become available. The Add-PackageSource cmdlet takes Name, Provider and Location parameters at a minimum.

> Add-PackageSource chocolatey -Provider Chocolatey -Location http://chocolatey.org/api/v2/
> Remove-PackageSource chocolatey

Get-Package

You can view a list of all packages currently installed on your system by using the Get-Package command:

> Get-Package

Name                             Version          Status           Source         Summary
----                             -------          ------           ------         -------
7zip                             9.22.01.20130618 Installed        Local File     7-Zip is a file archiver with a hi...
7zip.install                     9.22.01.20130618 Installed        Local File     7-Zip is a file archiver with a hi...

Find-Package

To view a list of packages available from all of your repositories, use the Find-Package command. The first time you run it, it will want to install and set up NuGet:

> Find-Package

  RequiresInformation
  The NuGet Package Manager is required to continue. Can we please go get
  [Y] Yes  [N] No  [S] Suspend  [?] Help (default is "Y"):

From there, it will give you a list of all available packages:

Name                             Version          Status           Source         Summary
----                             -------          ------           ------         -------
1password                        1.0.9.340        Available        chocolatey     1Password - Have you ever forgotte...
7zip                             9.22.01.20130618 Available        chocolatey     7-Zip is a file archiver with a hi...
7zip.commandline                 9.20.0.20130618  Available        chocolatey     7-Zip is a file archiver with a hi...
7zip.install                     9.22.01.20130618 Available        chocolatey     7-Zip is a file archiver with a hi...
ack                              2.04             Available        chocolatey     ack is a tool like grep, designed ...
acr                              2.6.0            Available        chocolatey
ActivePerl                       5.14.2.2         Available        chocolatey     ActivePerl is the leading commerci...

...

zabbix-agent                     2.2.1            Available        chocolatey     zabbix
zadig                            2.1.1            Available        chocolatey     USB driver installation made easy
zetaresourceeditor               2.2.0.11         Available        chocolatey     zetaresourceeditor
zoomit                           4.50             Available        chocolatey     ZoomIt is a screen zoom and annota...
zotero-standalone                4.0.19           Available        chocolatey     Zotero [zoh-TAIR-oh] is a free, ea...

You can also provide a filter to search for a specific package:

> Find-Package 7zip

Name                             Version          Status           Source         Summary
----                             -------          ------           ------         -------
7zip                             9.22.01.20130618 Available        chocolatey     7-Zip is a file archiver with a hi...

Install-Package and Uninstall-Package

To install a package, use Install-Package. You’ll have to run PowerShell as Administrator to install packages, and set your execution policy to RemoteSigned (Set-ExecutionPolicy RemoteSigned).

> Install-Package 7zip

Installing Package '7zip' from untrusted source
WARNING: This package source is not marked as safe. Are you sure you want to install software from 'chocolatey'
[Y] Yes  [N] No  [S] Suspend  [?] Help (default is "Y"): Y

Name                             Version          Status           Source         Summary
----                             -------          ------           ------         -------
7zip.install                     9.22.01.20130618 Installed        chocolatey     7-Zip is a file archiver with a hi...
7zip                             9.22.01.20130618 Installed        chocolatey     7-Zip is a file archiver with a hi...

It will first warn you that the package source (Chocolatey) is not marked as safe. It is (we know it is), so hit yes anyway, unless you’re scared, which you shouldn’t be. By default, packages will be downloaded and installed to C:\Chocolatey\lib when using the Chocolatey repository.

If you hate what you installed, want it gone and killed with fire, use Uninstall-Package:

> Uninstall-Package 7zip

Name                             Version          Status           Source         Summary
----                             -------          ------           ------         -------
7zip.install                     9.22.01.20130618 Not Installed
7zip                             9.22.01.20130618 Not Installed

Why is this cool?

Because it drastically reduces the time it takes to find, download and install software. I have to run at most 2 PowerShell commands and I’ll have whatever software I want installed. The packages are named so appropriately that many times you can guess the name and cut your command count down to 1! That’s 50% fewer commands! Whoa!

This also means that Microsoft, again, is serious about supporting the developer community. First it was the .NET Foundation and Roslyn*, and now they’re embracing something that Linux and OS X users have had for years. For the first time in a while I’m really excited that I use Windows.

Now if you’ll excuse me, I’m going to uninstall 7zip just so I can OneGet it.


* Unless you count Steve Ballmer’s promise.

Users see the UI, not the code

UI design is hard. Like, it’s way hard. And it’s also a very important piece of the software puzzle. In fact, some might say it’s the most important piece because to users, it is the software:

A good user interface is one of the most important aspects of an enterprise product. While the underlying architecture is extremely important to deliver the functionality, to the end-user, the interface is the product. They don’t know, (and don’t care, usually,) of what goes on behind the scenes, other than that they expect things to work. Every way they interact with the product is through the interface.

When a user opens an app, they see the interface. They don’t see the code behind it, the layers, the interfaces, the helper libraries; they see the UI. That is the software. If you perform massive technical improvements but leave the UI the same, no one will notice. This is why the interface is so critically important, but also why it’s one of the hardest things to do in software. Designing an interface that both looks good and is intuitive to all users takes effort and skill, and is something that Microsoft, Google and even Apple have yet to fully master.

Look, I am in no way a master at UI design. I kind of suck at it. But I can get by if I have to, and one thing that helps me as I’m working is to ask myself this question:

If I were a user, how would I expect this to work?

You are a user of many more pieces of software than you will ever write yourself. You, like everyone, will have expectations of how something should function. So tap into those experiences. Put yourself in the shoes of a user and design the feature as you think it should work. Think about the different reasons a user would use this feature and the goals they might want to achieve while using it. Try to come up with something that minimizes the pain of accomplishing those goals. Chances are you’ll come up with something better than these.

Migrating from SVN to Git; How we did it

My team migrated from SVN to Git about 3 months ago. After a few tweaks, a few bugs and a little elbow grease we’ve been stable ever since. And you know what? It was one of the best moves we’ve ever made. Developers are more efficient and we have finally documented and streamlined our release workflow.

I was in charge of handling the migration. That included setting up an internal Git host, migrating the SVN repositories over, documenting the new Git development process and training the developers. One important requirement was that we couldn’t stop active development during the migration – developers always had to have a place to commit code.

Technical Side – What tools were used and how was it done?

Stash

Due to legal reasons, we weren’t able to make use of popular Git hosts such as GitHub or Bitbucket, so we needed to find an internal hosting solution. We looked at many open source hosts, along with GitHub Enterprise, and finally determined that Atlassian Stash was the best option for us. It offered most of the features we desired – internally hosted, pull requests, HTTP/HTTPS/SSH access, and the ability to connect with Active Directory – and was corporate backed and reasonably priced.

Other than the setup being archaic, Stash was up and running within 20 minutes. Configuration was relatively trivial, and mostly consisted of configuring permissions and user setup. We hooked up Stash with Active Directory so all employees can log in using their domain accounts. This reduces the number of username/password pairs everyone has to remember which, imo, is a really good thing.

Initial SVN Migration

The initial SVN migration went relatively smoothly, with few to no hiccups. We followed the steps I laid out in my previous post, Migrating from SVN to Git, with one small caveat – after we performed the first fetch, we left SVN as the primary repository, and all code was still committed there. We set the permissions on the Git server to read-only so that developers could clone the repositories and get familiar with Git, and so that we could confirm there were no connection or permission issues with Stash. Every day we performed a fetch from the SVN repository and pushed the changes up to Git to keep things up to date. We left this process in place for about a week; once we confirmed there were no issues, and all devs had some sort of Git client they liked, we switched over. (As a side note, if we had many more repositories, and/or were going to leave this process in place for longer than a week, I would have set up a job to run daily (perhaps hourly?) to perform the fetch. If you’re in this boat, I recommend you do that using PowerShell or similar, unless you like performing the same monotonous task every morning, in which case go for it.)

When we were ready to shut off SVN, we had all developers commit any pending changes to SVN, then we switched the repository to read-only. We performed one final fetch/push from SVN, and opened up the Git server to the world. (Okay, opened to our office, but whatever.)

Tools

So we set up Stash on the server, but what clients did we use? We are a Microsoft shop, and as such have a mix of SourceTree, Posh-Git and bash. We didn’t really set any limitations on what client to use, as long as it works for the dev. (If you ask me, though, Posh-Git is the way to go. By a mile.)

Human Side – Git workflow, developer training and hiccups

Git Workflow

This is where the fun starts. Developers inevitably had questions, most of which I could answer but some of which we had to work out together. Most of these questions revolved around workflow – when do I branch, why do I branch, do I need to branch? Our SVN workflow was, well, not exactly much of a workflow. We had a develop branch, and most work went into that, and sometimes we would branch for features, but then we would have merge problems because SVN sucks at that, and then we’d release whenever from wherever, and… yeah. Not much of a workflow.

So, I took this opportunity to standardize our process, which is basically git-flow. We have a develop branch; all features get branched from there and merged back when they’re ready. When we decide to release, we branch into a release branch, perform fixes, and merge the production-ready code into master. Hotfixes are branched off of master and merged back into master and develop. I laid this workflow out in a formal document that was available to everyone – developer or otherwise.

The fun part about documents, at least that I have found, is that nobody reads them. Ever. I still got a lot of questions about where to branch feature branches from, when to create a release branch, and where to release from. My answer, most of the time, was “read the documentation” (without being rude) to which I got a “what documentation?” response.

Training

So the next logical step was group training. I set aside 30 minutes to get all developers together and explain things – both about Git and the new workflow. We went over the differences and similarities between SVN and Git – what distributed means in practice, pushing, pulling, committing, stashing, adding items to the index, etc. And then we covered the new workflow (with pictures!) and how Stash helps formalize the process with Pull Requests and such.

The training was a huge success even with it only being a 30 minute session. Everyone was able to ask questions and get on the same page. I highly advise giving a formal presentation if you can with as many visual aids as possible. It’s much easier to understand a live, visual presentation over emails and a Word document.

Issues

We luckily didn’t run into any technical issues. The only slight issue we ran into was getting developers to follow the new workflow. Again, training pretty much mitigated this issue and everything was smoothed out in a matter of days. We have yet to have any technical issues.

Wrap Up

If you’re on the fence about making the switch to Git, I highly recommend it. There are many benefits with little to no drawbacks. We’ve only been using it for 3 months and I can already see an increase in productivity and quality of output. Formal Pull Requests have strengthened our peer reviews and having a strict release process has increased our quality. It has been one of the best decisions we’ve made as a team in a long time.

Migrating from SVN to Git

So you’ve done it – you’ve finally made the decision to switch to Git. SVN does some things very well, and has been a great source control system since its creation in 2000. But the features that Git brings – distribution, performance, easy branches, easy merges, stash – are hard to pass up. After you make the switch, you’ll probably wonder how you ever worked without it.

So how do you get all of your data, branches, tags and history into Git? Git includes an incredibly useful tool, git-svn, which is a bidirectional bridge between Git and SVN. It allows you to pull commits from SVN and, if you so desire, push commits back to it. I recommend avoiding pushing back to SVN because, well, why would you? We’re here to switch, not combine! We’ll just use it to pull down all commit history, branches and tags from SVN.

The general workflow for this process is:

  • Initialize a Git repository with the SVN repository as a remote
  • Configure the user mapping between SVN and Git
  • Fetch from the SVN repo
  • Convert the SVN tags and branches into Git tags and branches
  • Push the repository to a bare repo on the Git host

To start, initialize a git repository with the svn repository as a remote:

> git svn init http://url.to.svn/ --stdlayout --prefix=svn/

The --stdlayout flag tells git-svn to expect the standard trunk, branches and tags directories, and --prefix=svn/ (note the trailing slash) places all SVN branches and tags under svn/, which will make them easier to distinguish later on. If you have a non-standard SVN layout (i.e. the directories are not named trunk, branches and tags), you can specify each of them with -T for trunk, -b for branches and -t for tags:

> git svn init http://url.to.svn/ --prefix=svn/ -T Trunk -b Branches -t Tags

Next, create an authors.txt file that maps the SVN usernames to the desired usernames in Git. The format is My Svn Username = My Name <myemail>. For instance:

davidzych = David Zych <dave@example.com>
johncandy = John Candy <john@example.com>
michaelscott = Michael Scott <michael@dundermifflin.com>
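
If the list of SVN users is long, the file can be generated rather than typed by hand. Here is a minimal Python sketch, assuming you already have the usernames, names and emails in some mapping; the format_authors helper and the sample data are hypothetical and not part of git-svn:

```python
def format_authors(authors):
    """Render {svn_username: (full_name, email)} as authors-file text."""
    lines = []
    for username, (name, email) in sorted(authors.items()):
        # One mapping per line: svn-username = Full Name <email>
        lines.append(f"{username} = {name} <{email}>")
    return "\n".join(lines) + "\n"


# Hypothetical sample data; replace with your own users.
authors = {
    "davidzych": ("David Zych", "dave@example.com"),
    "johncandy": ("John Candy", "john@example.com"),
}
authors_text = format_authors(authors)
print(authors_text, end="")
```

Writing authors_text to authors.txt gives a file in the format shown above.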

Once you have the authors file, configure git-svn to use the file when performing the fetch:

> git config svn.authorsfile ../authors.txt

You could also configure this in the global config if you have many repos to migrate and only want to specify it once.

It is at this point that I recommend switching the permissions on SVN to read-only. This way, no one can commit to the repository while you’re performing the migration and no commits will be lost. Once you’ve done that, fetch from SVN:

> git svn fetch

After what is probably going to be a long time, you’ll have a Git repository with a lot of commits with funny looking commit messages and some remote branches that apparently are your SVN branches. You might be feeling pretty good right now, but you have more work to do! If you do a git branch -a, you’ll see all of your branches from SVN listed as remote branches:

> git branch -a
 * master
   remotes/svn/trunk
   remotes/svn/feature-123
   remotes/svn/feature-456

We need to take those remote branches and turn them into local Git branches. To do this you can run git branch branch-name remotes/svn/branch-name. If you only have a few, you can run that command manually for each branch and be done with it. If you have a lot, well, you can use PowerShell or something similar to loop through the branches and automate it, or be like me and copy the branches into Excel, create a formula that generates the create-branch statements, save those as a batch file and run it. I’m not an Excel fan but, hey, it works. However you want to do it, get it done.
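
If Excel isn’t your thing, the same statements can be generated with a few lines of script. This is only a sketch in Python, assuming the remotes/svn/ prefix shown in the listing above; branch_commands is a hypothetical helper, and it skips trunk (on the assumption that master already tracks it) as well as anything under tags/:

```python
def branch_commands(branch_listing, prefix="remotes/svn/"):
    """Build one `git branch` command per SVN remote branch."""
    commands = []
    for line in branch_listing.splitlines():
        # Drop indentation and the "*" marking the current branch.
        ref = line.strip().lstrip("* ").strip()
        if not ref.startswith(prefix):
            continue  # local branches such as master
        name = ref[len(prefix):]
        if name == "trunk" or name.startswith("tags/"):
            continue  # assumed: trunk is master, tags are converted separately
        commands.append(f"git branch {name} {ref}")
    return commands


# Paste the output of `git branch -a` here.
listing = """ * master
   remotes/svn/trunk
   remotes/svn/feature-123
   remotes/svn/feature-456"""
commands = branch_commands(listing)
print("\n".join(commands))
```

Save the printed lines as a batch file, or run them one at a time, whichever you prefer.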

After branches, you can do the same thing with tags:

> git tag -a -m "Migrating SVN tag" tag-name refs/remotes/svn/tags/tag-name

Now you have all your branches and tags as local branches and tags in Git.

Next, add your bare remote Git repository as a remote (you did create one of those, right?).

> git remote add newrepo https://url.to.git/repo.git

And push everything up! Remember to specify --all to push all local Git branches, and perform a second push with --tags to push all tags.

> git push --all newrepo
> git push --tags newrepo

You now have all of your commits, branches and tags from SVN migrated to Git. Instead of attempting to clean out your Git-SVN hybrid repository, it’s probably easiest to perform a clean checkout of the new repository before you start working again:

> git clone https://url.to.git/repo.git

And, with that, you’re done! Enjoy your new Git repository!

Microsoft releases a preview of the .NET Compiler Platform, codenamed Roslyn

Microsoft released a public preview of the .NET Compiler Platform, codenamed Roslyn, on April 3rd, 2014. The code is available at http://roslyn.codeplex.com/ for you to bask in all of its glory.

You can clone the .NET Compiler Platform Git repository using this command:

git clone https://git01.codeplex.com/roslyn

Or install the NuGet package:

Install-Package Microsoft.CodeAnalysis -Pre

What is the .NET Compiler Platform?

The .NET Compiler Platform is Microsoft’s effort to open source the C# and Visual Basic compilers. The code is released under the Apache License 2.0. From Codeplex:

Traditionally, compilers are black boxes — source code goes in one end, magic happens in the middle, and object files or assemblies come out the other end. As compilers perform their magic, they build up deep understanding of the code they are processing, but that knowledge is unavailable to anyone but the compiler implementation wizards. The information is promptly forgotten after the translated output is produced.

This is the core mission of the .NET Compiler Platform (“Roslyn”): opening up the black boxes and allowing tools and end users to share in the wealth of information compilers have about our code. Instead of being opaque source-code-in and object-code-out translators, through the .NET Compiler Platform (“Roslyn”), compilers become platforms—APIs that you can use for code related tasks in your tools and applications.

Microsoft took the original C# and Visual Basic compilers, which were written mostly in C++, and completely rewrote them in managed code. This means they were able to create a set of APIs that allow you to consume the code compilation and analysis results. There are currently 2 main APIs: the Compiler APIs and the Workspace APIs. It is worth noting that neither of these APIs has a dependency on Visual Studio, which means you can provide much of the same Visual Studio functionality in any application you want.

Compiler APIs

The Compiler API layer allows you to view information about the compilation process. This includes syntax and semantic information, errors and warnings, as well as access to files and information after compilation is complete. It provides Syntax Trees, which represent the structure of, and references within, your code; Syntax Tokens, which are the keywords, variables, etc. in your code; and Syntax Trivia, which is essentially the items that the compiler ignores, such as whitespace and comments.

Workspace APIs

The Workspace APIs provide you information about the current project and solution, allowing quick and easy access to a vast array of information about the code. This assists in providing code analysis, refactoring, and Intellisense to the user. The Workspace API has a CurrentSolution property that gets updated whenever a change to the host environment occurs. This can be anything from typing a letter in a source file to saving a project.

Why is this cool?

Well, first off, it’s open source! The .NET Compiler Platform is part of Microsoft’s newly created .NET Foundation, which is a foundation created to help spur on development of open source technologies making use of .NET. Open source means that the community at large can review the code, provide bug reports and fixes, and can maintain the code even if Microsoft falls off the face of the earth. The fact that Microsoft open sourced these compilers means they are serious about their recent push to cultivate the open source .NET community.

This is also awesome because it means that creating code analysis tools is much, much easier. Like, an order of magnitude easier. Like, I might even be able to do it. Until now, developers of tools like JetBrains’ ReSharper, Telerik’s JustCode, and even Visual Studio itself have had to write their own code that essentially duplicates the existing compiler code. Roslyn allows them to tie into existing operations and make use of the analysis and syntax trees the compiler already has.

If you don’t want to create a full blown productivity extension, you can take this and write small extensions that provide new warnings and errors to the compiler. Or create a new refactoring extension that finds duplicate code through an entire solution. Or a tool that finds all comments in your solution and outputs a documentation file. Or create an analysis tool that provides your method’s Kevin Bacon Number!

Now what?

Remember, this is just a preview release. Microsoft hasn’t provided a final release date, but I don’t expect it to be anytime soon. For now, go play with it! Look through the code, download the source, have fun! If nothing else, it’s a great look into the C# and Visual Basic compilers.

Now, if you’ll excuse me, I’m going to go add Intellisense to Notepad.

Coloring your Posh-Git output

As a followup to my previous post, Coloring your Git output, if you use Posh-Git you can also edit the colors of the Git output by modifying the Posh-Git settings.

What is Posh-Git? It’s a fantastic set of PowerShell scripts for Git. It provides tab completion plus information right in the prompt stating the currently checked out branch along with the working copy and index statuses.

The Posh-Git color settings can be changed using the $global:GitPromptSettings object. Here are the available properties you can set:

  • IndexForegroundColor
  • BranchForegroundColor
  • BranchAheadBackgroundColor
  • AfterBackgroundColor
  • BranchBehindForegroundColor
  • UntrackedBackgroundColor
  • AfterText
  • BeforeForegroundColor
  • WorkingForegroundColor
  • RepositoriesInWhichToDisableFileStatus
  • EnableWindowTitle
  • ShowStatusWhenZero
  • BeforeIndexForegroundColor
  • BeforeIndexBackgroundColor
  • BranchBackgroundColor
  • DescribeStyle
  • BeforeBackgroundColor
  • WorkingBackgroundColor
  • DelimText
  • UntrackedForegroundColor
  • DefaultForegroundColor
  • AfterForegroundColor
  • DelimBackgroundColor
  • Debug
  • BeforeIndexText
  • BranchAheadForegroundColor
  • DelimForegroundColor
  • UntrackedText
  • EnableFileStatus
  • IndexBackgroundColor
  • AutoRefreshIndex
  • BeforeText
  • BranchBehindAndAheadBackgroundColor
  • BranchBehindBackgroundColor
  • BranchBehindAndAheadForegroundColor
  • EnablePromptStatus

You have a few more color options than the 9 that Git allows as well:

  • Black
  • Blue
  • Cyan
  • DarkBlue
  • DarkCyan
  • DarkGray
  • DarkGreen
  • DarkMagenta
  • DarkRed
  • DarkYellow
  • Gray
  • Green
  • Magenta
  • Red
  • White
  • Yellow

You can change these by editing Posh-Git’s GitPrompt.ps1 file, although it’s not recommended: if (and when) you update Posh-Git, those settings will be overwritten. The better way is to edit your profile to set the colors on startup. Typing $profile at the PowerShell prompt will display the location of your profile file; open it to edit your PowerShell profile. You’ll see a line in there that initializes Posh-Git:

. 'C:\tools\poshgit\dahlbyk-posh-git-c481e5b\profile.example.ps1'

You should place any customizations after that line:

$global:GitPromptSettings.WorkingForegroundColor    = [ConsoleColor]::Yellow 
$global:GitPromptSettings.UntrackedForegroundColor  = [ConsoleColor]::Yellow

Now git nuts!

Coloring your Git output

Do you sometimes have a hard time viewing the output of a Git command? Updating the colors might help! In Git, you can edit the config to change the color of the output. You can set colors per repository or globally. We’ll focus on the global config here. The global config can be edited either by using the git config --global command, or by editing your global .gitconfig file.

Starting with Git 1.8.4, color.ui defaults to auto, which colors the output with the default colors whenever it is written to a terminal. On older versions you can set color.ui auto yourself. You’re also able to set the colors manually if you’re so inclined; the status, diff, and branch commands can all be customized.

There are 9 colors available:

Color
normal
black
red
green
blue
yellow
cyan
magenta
white

If you choose to use the git config --global command, you edit the color.{command}.{property} property. For instance, to change the color of the untracked files listed in the status command to yellow:

git config --global color.status.untracked yellow

If you choose to edit the global file manually, the .gitconfig file can be found at these locations:

OS                   Path
--                   ----
Windows (Vista up)   C:/Users/{username}/.gitconfig
Mac                  $HOME/.gitconfig
Linux                ~/.gitconfig

When editing the file, add a new [color] section for the command you want to edit followed by a list of the properties and colors.

[color "diff"]
    meta = yellow
    frag = magenta
    old = red
    new = green

[color "status"]
    added = yellow
    changed = green
    untracked = red

[color "branch"]
    current = green
    local = white
    remote = red
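
If you’d rather not edit the file by hand, the same settings map one-to-one onto git config --global commands using the color.{command}.{property} naming described earlier. A small Python sketch that generates those commands; config_commands is a hypothetical helper and the dictionary simply mirrors two of the sections above:

```python
def config_commands(colors):
    """Turn {command: {property: color}} into `git config --global` commands."""
    commands = []
    for command, properties in colors.items():
        for prop, color in properties.items():
            # Each file entry corresponds to one color.{command}.{property} key.
            commands.append(f"git config --global color.{command}.{prop} {color}")
    return commands


colors = {
    "status": {"added": "yellow", "changed": "green", "untracked": "red"},
    "branch": {"current": "green", "local": "white", "remote": "red"},
}
commands = config_commands(colors)
print("\n".join(commands))
```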

Writing your own Convert.ToBase64String in C#

Have you ever wondered what Base64 is? How it works? Why you need it? Have you ever wanted to write your own Base64 encoder? Well, my friend, you are in luck because that’s what we’re talking about today. To get started…

What is Base64?

Base64 is a common way to convert binary data into a text form. This is commonly used to store and transfer data over media that was designed to store and transfer only text, such as including an image in an XML document.

It works by converting the data into a base-64 representation and displaying it using a common character set. The most common character set used is A-Z, a-z, 0-9, + and /, although different implementations can use different character sets. The goal is to use a common set of characters that can be represented in most encoding schemes. Here’s the index table of the most common set:

Index Character
0 A
1 B
2 C
3 D
4 E
5 F
6 G
7 H
8 I
9 J
10 K
11 L
12 M
13 N
14 O
15 P
16 Q
17 R
18 S
19 T
20 U
21 V
22 W
23 X
24 Y
25 Z
26 a
27 b
28 c
29 d
30 e
31 f
32 g
33 h
34 i
35 j
36 k
37 l
38 m
39 n
40 o
41 p
42 q
43 r
44 s
45 t
46 u
47 v
48 w
49 x
50 y
51 z
52 0
53 1
54 2
55 3
56 4
57 5
58 6
59 7
60 8
61 9
62 +
63 /
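
The table doesn’t need to be typed out by hand. As a quick illustration (in Python rather than C#, purely for brevity), the whole index table is just four ranges of characters glued together; BASE64_ALPHABET is a name made up for this sketch:

```python
import string

# Index 0-25: A-Z, 26-51: a-z, 52-61: 0-9, 62: "+", 63: "/"
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

# Looking up a character is plain string indexing.
for index in (0, 17, 54, 63):
    print(index, BASE64_ALPHABET[index])
```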

How does it work?

It works by grouping the bits of the data into chunks of 24 bits, treating each chunk as 4 groups of 6 bits (sextets), converting each sextet into a decimal number and looking up the corresponding character for that number. A single 24 bit string is represented by 4 encoded characters.

For instance, to start encoding the first 3 characters of my name we first have to convert the letters into bytes, and the bytes into bits. In this instance, we’ll say the characters are encoded in ASCII. The byte representations for Dav are:

D: 68
a: 97
v: 118

Those numbers, written in 8 bit binary, are 01000100, 01100001, and 01110110 respectively. Group those together to form a 24 bit string and you get 010001000110000101110110.
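
If you want to check those numbers yourself, here is a short Python sketch of this step (Python is used only for illustration, and to_bit_string is a hypothetical helper name):

```python
def to_bit_string(data):
    """Concatenate the 8-bit binary form of every byte."""
    return "".join(format(byte, "08b") for byte in data)


data = "Dav".encode("ascii")   # the bytes 68, 97, 118
bits = to_bit_string(data)
print(list(data))
print(bits)
```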

Next, grab 4 sextets of bits, convert those to decimal and look up the corresponding character in the index table. 010001 is 17, 000110 is 6, 000101 is 5, and 110110 is 54. Looking those up in the index table gives the string RGF2. We just converted to Base64! Hooray!
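
If you would rather let the computer do the arithmetic, here is a minimal sketch of the same walkthrough. The class and method names (WorkedExample, ToBitString, ToSextets) are made up for illustration, and it assumes ASCII text whose bit length is a multiple of 6.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WorkedExample
{
    // Returns the 8-bit binary form of each ASCII byte, joined together.
    public static string ToBitString(string asciiText)
    {
        var bits = new StringBuilder();
        foreach (byte b in Encoding.ASCII.GetBytes(asciiText))
        {
            bits.Append(Convert.ToString(b, 2).PadLeft(8, '0'));
        }
        return bits.ToString();
    }

    // Splits a bit string into 6-bit groups and converts each to decimal.
    // Assumes the length is a multiple of 6 (true for whole 24-bit chunks).
    public static List<int> ToSextets(string bits)
    {
        var sextets = new List<int>();
        for (var i = 0; i < bits.Length; i += 6)
        {
            sextets.Add(Convert.ToInt32(bits.Substring(i, 6), 2));
        }
        return sextets;
    }
}
```

Calling WorkedExample.ToBitString("Dav") gives the 24-bit string from above, and passing that string to WorkedExample.ToSextets gives 17, 6, 5 and 54.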

Padding

But wait… we have a problem. What happens when the number of bytes we want to encode isn’t divisible by three, so our last grouping doesn’t have a full 24 bits?

This is where padding comes in. When the last 24-bit group is missing 1 or 2 octets, we pad the end of the Base64 string with one = for each missing octet. To extend our previous example, let’s encode my entire first name (Dave if you already forgot…). We know that Dav is encoded as RGF2 so we just need to encode the last letter, e.

e as a byte is 101, which is 01100101 in binary. If we attempt to get our sextet groupings out of that, we get 011001 and 01. Huh. That last sextet is missing a few bits.

What we need to do is pad the last sextet with 0s and note that we have 2 octets missing. That leaves us with 011001 and 010000, which are 25 and 16, which are Z and Q. Our final string, padded with = for the two missing octets, is RGF2ZQ==.
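
A quick way to double-check hand-worked results like these is to ask the framework's own encoder. This is only a sanity-check sketch; the FrameworkCheck class name is hypothetical, and it assumes the input is ASCII text.

```csharp
using System;
using System.Text;

public static class FrameworkCheck
{
    // Encodes ASCII text with the built-in .NET Base64 encoder.
    public static string Encode(string asciiText)
    {
        return Convert.ToBase64String(Encoding.ASCII.GetBytes(asciiText));
    }
}
```

FrameworkCheck.Encode("Dav") returns RGF2 and FrameworkCheck.Encode("Dave") returns RGF2ZQ==, matching the values we worked out by hand. A two-byte input such as "Da" needs only one padding character and comes back as RGE=.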

Writing your own encoder

First, a disclaimer. What we’re writing here is for educational purposes. It’s slow, unoptimized and pretty useless considering .NET comes with a respectable Base64 converter. This is a learning exercise.

The existing Convert.ToBase64String method in the System namespace takes a byte[] as a parameter and returns a string. Here’s the full method signature:

public static string ToBase64String(
    byte[] inArray
)

We’re going to write our own implementation of this method:

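namespace MyBase64Converter
{
    // Methods must live inside a type, so we wrap ours in a static class
    public static class Base64Converter
    {
        public static string ToBase64String(byte[] inArray)
        {
            //Converter code goes here
            throw new System.NotImplementedException();
        }
    }
}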

The good part about the method taking a byte[] parameter is that part of the work is already done for you: getting the byte representation of your data. From there, we need to convert each byte into its 8-bit binary representation. We could use one of the Convert.ToString() overloads in .NET, or we could use the one we wrote ourselves! We’re using the PadLeft method after our call to IntToBinaryString to ensure the binary string is a full 8 bits.

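namespace MyBase64Converter
{
    public static class Base64Converter
    {
        public static string ToBase64String(byte[] inArray)
        {
            var bits = string.Empty;
            for(var i = 0; i < inArray.Length; i++)
            {
                // PadLeft takes a char, so the padding is '0' rather than "0"
                bits += IntToBinaryString(inArray[i]).PadLeft(8, '0');
            }

            // The chunking and lookup steps come next
        }
    }
}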

Now that we have our data represented as binary, we need to grab 24-bit chunks at a time. We’ll make use of the Skip and Take methods in LINQ to accomplish this.

string base64 = string.Empty;

const byte threeOctets = 8 * 3;
var octetsTaken = 0;
while(octetsTaken < bits.Length)
{
    var currentOctects = bits.Skip(octetsTaken).Take(threeOctets).ToList();

    // More code here

    octetsTaken += threeOctets;
}

Note that we loop while octetsTaken is less than the length of the bit string. This lets us loop through to the end of the string, whether or not the final chunk is a full 24 bits. (Despite its name, octetsTaken counts bits, not octets.)

Next we go sextet by sextet, convert the binary to an int and look it up in the table. We're making use of another LINQ method, Aggregate, which is basically a fancy way of joining the bits into a string again.

const byte sixBits = 6;
int hextetsTaken = 0;
while(hextetsTaken < currentOctects.Count())
{
    var chunk = currentOctects.Skip(hextetsTaken).Take(sixBits);
    hextetsTaken += sixBits;

    var bitString = chunk.Aggregate(string.Empty, (current, currentBit) => current + currentBit);

    if (bitString.Length < 6)
    {
        //This happens when we need to pad
        bitString = bitString.PadRight(6, '0');
    }
    var singleInt = Convert.ToInt32(bitString, 2);

    base64 += Base64Letters[singleInt];
}

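Great! Finally, we'll check if we need to pad the end with =. The remainder of the full bit string's length divided by 3 tells us how many padding characters are required. This works because every full group of three bytes adds 24 bits, which leaves no remainder, while one leftover byte adds 8 bits (8 % 3 = 2) and two leftover bytes add 16 bits (16 % 3 = 1). Those remainders are exactly the padding counts we need.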

// Pad with = for however many octets we have left
for (var i = 0; i < (bits.Length % 3); i++)
{
    base64 += "=";
}
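
If the bit-length trick feels a little too magical, here is a small sketch that puts it next to the more direct rule of counting the bytes missing from the last group of three. Both method names are hypothetical and exist only for this comparison.

```csharp
public static class PaddingMath
{
    // The rule used above, expressed in bytes: the bit string is 8 * byteCount long.
    public static int FromBitLength(int byteCount)
    {
        return (8 * byteCount) % 3;
    }

    // The direct rule: how many bytes are missing from the final group of three.
    public static int FromByteCount(int byteCount)
    {
        return (3 - byteCount % 3) % 3;
    }
}
```

The two methods agree for every non-negative byte count, so either can be used. The second one simply states the intent more plainly.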

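Below is the full code, including the index table for the Base64 characters. Note that this version takes a string rather than a byte[] and converts each character with Convert.ToString, so it assumes every character fits in 8 bits (ASCII, for example).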

using System;
using System.Linq;

public static class Base64Converter
{
private static string Base64Encode(string s)
{
    var bits = string.Empty;
    foreach (var character in s)
    {
        bits += Convert.ToString(character, 2).PadLeft(8, '0');
    }

    string base64 = string.Empty;

    const byte threeOctets = 24;
    var octetsTaken = 0;
    while(octetsTaken < bits.Length)
    {
        var currentOctects = bits.Skip(octetsTaken).Take(threeOctets).ToList();

        const byte sixBits = 6;
        int hextetsTaken = 0;
        while(hextetsTaken < currentOctects.Count())
        {
            var chunk = currentOctects.Skip(hextetsTaken).Take(sixBits);
            hextetsTaken += sixBits;

            var bitString = chunk.Aggregate(string.Empty, (current, currentBit) => current + currentBit);

            if (bitString.Length < 6)
            {
                bitString = bitString.PadRight(6, '0');
            }
            var singleInt = Convert.ToInt32(bitString, 2);

            base64 += Base64Letters[singleInt];
        }

        octetsTaken += threeOctets;
    }

    // Pad with = for however many octets we have left
    for (var i = 0; i < (bits.Length % 3); i++)
    {
        base64 += "=";
    }

    return base64;
}

private static readonly char[] Base64Letters = new[]
                                        {
                                              'A'
                                            , 'B'
                                            , 'C'
                                            , 'D'
                                            , 'E'
                                            , 'F'
                                            , 'G'
                                            , 'H'
                                            , 'I'
                                            , 'J'
                                            , 'K'
                                            , 'L'
                                            , 'M'
                                            , 'N'
                                            , 'O'
                                            , 'P'
                                            , 'Q'
                                            , 'R'
                                            , 'S'
                                            , 'T'
                                            , 'U'
                                            , 'V'
                                            , 'W'
                                            , 'X'
                                            , 'Y'
                                            , 'Z'
                                            , 'a'
                                            , 'b'
                                            , 'c'
                                            , 'd'
                                            , 'e'
                                            , 'f'
                                            , 'g'
                                            , 'h'
                                            , 'i'
                                            , 'j'
                                            , 'k'
                                            , 'l'
                                            , 'm'
                                            , 'n'
                                            , 'o'
                                            , 'p'
                                            , 'q'
                                            , 'r'
                                            , 's'
                                            , 't'
                                            , 'u'
                                            , 'v'
                                            , 'w'
                                            , 'x'
                                            , 'y'
                                            , 'z'
                                            , '0'
                                            , '1'
                                            , '2'
                                            , '3'
                                            , '4'
                                            , '5'
                                            , '6'
                                            , '7'
                                            , '8'
                                            , '9'
                                            , '+'
                                            , '/'
                                        };
}
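
For comparison, here is a sketch of how the same idea might look when it takes a byte[] (matching the Convert.ToBase64String signature) and uses shifts and masks instead of building bit strings. This is one possible approach rather than how .NET does it internally, and the BitwiseBase64 class name is made up for the example.

```csharp
using System;
using System.Text;

public static class BitwiseBase64
{
    // Standard Base64 alphabet, in index order (0-63).
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    public static string ToBase64String(byte[] inArray)
    {
        if (inArray == null) throw new ArgumentNullException(nameof(inArray));

        var output = new StringBuilder((inArray.Length + 2) / 3 * 4);

        for (var i = 0; i < inArray.Length; i += 3)
        {
            // How many real bytes are in this group (1, 2 or 3).
            int available = Math.Min(3, inArray.Length - i);

            // Pack the group into 24 bits; missing bytes stay zero.
            int group = inArray[i] << 16;
            if (available > 1) group |= inArray[i + 1] << 8;
            if (available > 2) group |= inArray[i + 2];

            // n bytes yield n + 1 meaningful sextets; the rest become '='.
            for (var s = 0; s < 4; s++)
            {
                if (s <= available)
                    output.Append(Alphabet[(group >> (18 - 6 * s)) & 0x3F]);
                else
                    output.Append('=');
            }
        }

        return output.ToString();
    }
}
```

Passing the ASCII bytes of Dave (68, 97, 118, 101) returns RGF2ZQ==, the same string we built by hand earlier.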

Converting a binary string to an int in C#

Back in my previous post, Converting an int to a binary string, we looked at how to write out the bits of an int without using the existing Convert.ToString method in the .NET Framework. Now, let’s look at the reverse – how to convert that binary string back into an int. The .NET Framework already has a built in method to do this (obviously), which is the Convert.ToInt32(string, int) method. This method takes the binary string and the base to convert from as parameters.

The easiest way that I have found to convert a binary number to decimal is to look at the bits of the binary number and raise 2 to the power of the index of the “on” bits (counting from the right, starting at 0) and add those together. I define an “on” bit as a bit that is 1 as opposed to 0.

For example, the binary number 100 can be looked at as

2^2 + 0 + 0 = 4

Similarly, 101 can be looked at as

2^2 + 0 + 2^0 = 5

Knowing this, we can then loop through the characters of the binary string, check if the bit is “on” and, if so, add 2^index to the resulting int. In the code example below, I first reverse the string so that the index of our loop (power) matches up with the power to which we want to raise 2.

public static int BitStringToInt(string bits)
{
    var reversedBits = bits.Reverse().ToArray();
    var num = 0;
    for (var power = 0; power < reversedBits.Count(); power++)
    {
        var currentBit = reversedBits[power];
        if (currentBit == '1')
        {
            var currentNum = (int) Math.Pow(2, power);
            num += currentNum;
        }
    }

    return num;
}
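
As a variation on the method above, you can also read the string from left to right and double the running total before adding each bit, which avoids both the reversal and Math.Pow. This is a sketch of that alternative rather than the approach described in this post. The BinaryParsing class name is hypothetical, and it assumes the input fits in 32 bits.

```csharp
using System;

public static class BinaryParsing
{
    // Left-to-right variant: shifting the running total left by one doubles it,
    // which moves every earlier bit up by one power of two.
    public static int BitStringToInt(string bits)
    {
        var num = 0;
        foreach (var bit in bits)
        {
            if (bit != '0' && bit != '1')
                throw new FormatException("Expected only '0' and '1' characters.");

            // bit - '0' is 0 or 1; OR it into the newly freed lowest position.
            num = (num << 1) | (bit - '0');
        }

        return num;
    }
}
```

BinaryParsing.BitStringToInt("100") returns 4 and BinaryParsing.BitStringToInt("101") returns 5, the same results as the worked examples above.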

Converting an int to a binary string in C#

The .NET Framework has a built in overload of Convert.ToString which takes 2 parameters: the int you want to convert and an int of the base you want to convert to. Utilizing this with base 2, you can print out the string representation of a number in binary, like so:

var binary = Convert.ToString(5, 2); //Gives you "101"    

Now this is all fine and dandy, but you didn’t learn anything. (Or maybe you did. I don’t know. But you can learn some more so keep reading). For fun, let’s pretend that .NET didn’t have this method built in. How would you convert your number to its binary representation?

We can use a combination of bit shifting and bitwise ANDs to achieve this. If you bitwise AND a number with 1, that will give the value 1 or 0 depending on the value of the rightmost (least significant) bit:

  1101
& 0001 (The number 1 in binary)
------
  0001

As we bit shift to the right, 0s are brought in from the left and the rightmost bit is dropped off and lost. If we bit shift the number to the right by one and then AND it with 1 again, we’ll get the value of the second bit.

  0110
& 0001
------
  0000

If we loop and continue to bit shift until the number is 0 we can build the entire binary string.

Example: Say we have the number 9, which in binary is 1001. Here’s the breakdown:

Number  In Binary  AND Result                            String
------  ---------  ----------                            ------
9       1001       1                                     1
4       0100       0                                     01
2       0010       0                                     001
1       0001       1                                     1001
0       0000       N/A (number is 0 so we’re done!)      1001
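
To see those intermediate strings for any number, here is a small sketch that records the result after every step. The ShiftTrace class and its Trace method are hypothetical helpers, and it assumes a positive input.

```csharp
using System.Collections.Generic;

public static class ShiftTrace
{
    // Returns the partial result string after each AND-and-shift step.
    public static List<string> Trace(int number)
    {
        var steps = new List<string>();
        var binary = string.Empty;

        while (number > 0)
        {
            binary = (number & 1) + binary;   // prepend the lowest bit
            steps.Add(binary);
            number >>= 1;                     // move the next bit into place
        }

        return steps;
    }
}
```

ShiftTrace.Trace(9) returns the strings 1, 01, 001 and 1001, which is the String column of the breakdown above.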

Now, in C#, to perform a right bit shift we use the >> operator, and to perform a bitwise AND we use the & operator. Here’s the code:

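public static string IntToBinaryString(int number)
{
    const int mask = 1;
    var binary = string.Empty;

    // Note: 0 (and any negative number) skips the loop and returns an empty string
    while(number > 0)
    {
        // Bitwise AND the number with the mask and prepend the bit to the result string
        binary = (number & mask) + binary;
        // Shift right so the next bit moves into the lowest position
        number = number >> 1;
    }

    return binary;
}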

If you would like to print the string with a specific bit length, you can use the PadLeft method in the .NET Framework. It prepends a character of your choosing to your string until the string reaches the total length you specify:

binary = "1001";
binary = binary.PadLeft(8, '0');
// binary is now "00001001";