Compression on the web is surprisingly underused

Eric Lawrence posted an article the other day on web compression, covering a bunch of different algorithms, what should get compressed, and how to get the best performance on your site by mixing minification and compression. It’s a great read with lots of good, useful information.

Isn’t compression used on most sites already?

This got me curious about the state of compression on the web. How many sites use some form of compression? My initial assumption was > 90%, but even that sounded low. I looked up some stats on W3Techs, which provides “information about the usage of various types of technologies on the web.” Based on their studies, they found that:

Compression is used by 58.1% of all the websites.

Fifty-eight point one percent.

That’s it? I was astounded when I first read that number and I’m still pretty surprised now. There are estimated to be over 1 billion websites in existence right now, which means more than 400 million of them are sending uncompressed data. That’s a lot of useless bytes being sent over the wire.

It makes no sense to not enable compression. It’s incredibly easy to set up in both Apache and IIS (on IIS you seriously just have to click a few buttons), and enabling it can only benefit users: all major browsers have supported it for the past ~10 years, and every browser that supports it advertises that support by sending the Accept-Encoding: gzip request header. If a client doesn’t send the header, the server won’t compress the response. Everyone gets their content either way, some will just get it faster than others.
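
You don’t write any of this yourself – IIS and Apache handle it once compression is turned on – but as a rough sketch of what that negotiation boils down to, here’s a minimal, illustrative example in classic ASP.NET terms (not production code):

using System;
using System.IO.Compression;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        // Only compress when the client explicitly advertises gzip support
        string acceptEncoding = Request.Headers["Accept-Encoding"] ?? "";
        if (acceptEncoding.Contains("gzip"))
        {
            Response.Filter = new GZipStream(Response.Filter, CompressionMode.Compress);
            Response.AppendHeader("Content-Encoding", "gzip");
        }
        // No Accept-Encoding: gzip header? The response goes out uncompressed as usual.
    }
}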

The only downside is a slight increase in CPU usage, but that is a minimal increase for a massive decrease in response size. To show how much the response size is decreased, we can use the Composer in Fiddler to run two requests to http://davidzych.com, one with the Accept-Encoding: gzip header and one without:

Without Compression    With Compression (gzip)    Savings
79,203 bytes           21,499 bytes               72.8%

Multiply that over thousands of users a month and that’s a significant bandwidth savings.
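
To put a rough number on it: that’s 79,203 - 21,499 = 57,704 bytes saved per page load, so at a hypothetical 100,000 page loads a month, this one page alone saves somewhere around 5.8 GB of bandwidth.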

Compress yourself before you wreck yourself

Do yourself a favor and make sure that compression is enabled on your website. It’s the ultimate low-cost, high reward feature.

Your server’s internet pipes will thank you.

Google’s Material Design Spec is a great idea

Google just released their first major update to their Material Design Spec. Originally released back in June, the spec is a document that outlines the best approach to application design based on their material design philosophy. Its goal is to:

Create a visual language that synthesizes classic principles of good design with the innovation and possibility of technology and science.

It’s a great introduction to application design and is relatively easy to follow. It has color palettes, layout ideas, and animation guidelines, among many other things.

What in the crap is “Material Design”?

It is a design philosophy for virtual applications that attempts to replicate the physical world. Items in the physical world have physical properties – they have mass, they rotate, they lie on top of each other, they move. They accelerate and decelerate, with larger objects taking longer than smaller objects to get up to speed. When an object is touched, it provides tactile feedback and moves in predictable ways.

The goal of material design is to replicate these effects to provide a consistent, immersive experience across applications. When the user taps on an item in the virtual world, it should provide feedback in some way – it should ripple, or highlight, or float. Pages shouldn’t just appear – they should slide in, with natural looking acceleration and deceleration. It’s appealing to the eyes and provides a great deal of polish to an application.

The following video shows a few examples of what Google is trying to achieve:

Everything is fluid, and moves, and is colorful and fun and appealing. But it’s not over the top – animations don’t take 30 seconds and cause users to get frustrated because they have to wait to perform their action. The animations are immediate, and provide just enough visual appeal while not getting in the way.

Okay, really, is this important?

I think so. To my knowledge this is the first document of its kind – an easy-to-use, “here’s what looks good and why” guide for application design. It’s Layouts and Colors for Dummies*. It will, hopefully, create consistency not only across environments, but across devices, OSes and applications. If applications function in similar ways, it means the learning curve for a new application turns from this:

This means learning is hard

to this:

Learning is easy!

This will allow developers to spend less time focusing on specific environments and more time creating a single, awesome design that works everywhere. Less time duplicating means more time awesome-ing.

Feedback

Google has also promised that they’ll take community contributions:

… we set out to create a living document that would grow with feedback from the community.

The fact that Google is taking feedback from the community is, probably, the best part. This isn’t Google trying to tell the world how to do things – they aren’t throwing slop (this spec) to the pigs (the development community) and expecting us to eat it up and do whatever they tell us. This is a document by the community, for the community, and for the greater good of computing.

Just like CommonMark, this is an attempt to grow the field of computing, and I’m definitely on board.


* If this isn’t a real book it should be.

Install Windows 10 from a USB Flash Drive

I’m writing this because I can, for some reason, never remember how to use Diskpart. And who uses DVDs anymore? Download the Windows 10 preview ISO from here: http://windows.microsoft.com/en-us/windows/preview

Steps

1. Insert a USB drive at least 4 GB in size

2. Open a command prompt as administrator

Hit the Windows key, type cmd and hit Ctrl+Shift+Enter. This will force it to open as admin.

3. Run diskpart

This will open and run the Diskpart command line utility, which allows you to manage disks, partitions and volumes.

C:\Windows\system32> diskpart

4. Run list disk

This will list all disks on your system. You’ll see something similar to this:

DISKPART> list disk

  Disk ###  Status         Size     Free     Dyn  Gpt
  --------  -------------  -------  -------  ---  ---
  Disk 0    Online          238 GB      0 B
  Disk 1    Online          465 GB      0 B
  Disk 2    Online           29 GB      0 B

5. Select your flash drive by running select disk #

Find the item that corresponds with your flash drive and select the disk. In the example above, my flash drive is disk 2 so I’ll run:

DISKPART> select disk 2

Disk 2 is now the selected disk.

6. Run clean

WARNING: This deletes all data on your drive

The clean command marks all data on the drive as deleted and therefore removes all partitions and volumes. Make sure you want to do this! If you are sure, run:

DISKPART> clean

7. Create a partition

DISKPART> create partition primary

8. Select the new partition

Since we know there is only one partition, we can select it without checking the partition number:

DISKPART> select partition 1

If you’re really curious, run list partition to double-check.

9. Format the partition

To format it, we’ll use the NTFS file system and run a quick format:

DISKPART> format fs=ntfs quick

10. Set the current partition as Active

Run:

DISKPART> active

11. Exit diskpart

Run exit. This will exit diskpart, but leave the command window open.

12. Mount your ISO

Use Virtual CloneDrive or similar.

13. Navigate to the mounted image and install a bootsector

My ISO is mounted as G:\, so I’ll navigate to G:\boot and run:

C:\Windows\system32> G:
G:\> cd boot
G:\boot> bootsect.exe /nt60 E:

Where E:\ in this case is my flash drive’s letter.

14. Copy the entire contents of the ISO to your flash drive

You can either do this from Windows using Ctrl+C and Ctrl+V, or from the command line using xcopy.

G:\> xcopy g:\*.* e:\ /E /H /F 

/E copies all subfolders (including empty ones), /H copies hidden and system files, and /F displays full source and destination file names as it’s copying.

Once that’s done, go and install Windows!

Tools Amplify Talent

My sister in law’s dad picked up golf a few years ago. We’ll call him George, because that’s a pretty generic name and he was curious about golf. Anyway. On the morning of my then-girlfriend’s-brother’s-wedding-to-his-then-girlfriend (complicated, I know) we played a round of golf. George had never played before, wasn’t interested in learning, but didn’t want to miss out on the heavily desired guy time so he rode in the cart with us while we played. Apparently something clicked, because he, at that exact point in time, decided golf was for him. Fast forward to today, and he plays a few times a week.

Now, as I said, he’s only been playing a few years, so he’s not great. Last I heard, he shoots in the low 90s/high 80s. Not bad, but not good either. (For you non-golfers, 72 is par for most courses, and the winners of PGA tournaments score in the 60s.) It takes years upon years to master golf*, so his score is not a surprise.

What is a surprise, though, is that George thinks buying new clubs will make him a better golfer. Rumor has it he has already gone through 5 sets of clubs, not to mention a countless number of new drivers, putters and various other clubs. All in a few years of golf. He’s probably had a few new golf bags, too, because hell, the bag color definitely affects your swing.

He thinks that the clubs make him good. But that’s astonishingly backwards.

Tools amplify talent; talent doesn’t appear through the tools.

Buying a new Callaway driver won’t make you magically hit the ball 300 yards if you can’t swing straight to begin with.

This applies to so many other disciplines as well. Take woodworking, for example. If you have a natural eye for desk design, you can get by with low quality tools. If your jigsaw has a low RPM and a bent blade, you can still cut wood and sand it and perfect it and craft a beautiful desk. It might take longer, and might be more difficult, but you still have the ability and the eye for desk design. The most expensive jigsaw you can buy won’t magically flip the switch in your brain that allows your hands to work with wood.

Most people can’t craft a table as beautiful as this.

Tools enhance your ability. They allow you to apply the skills you have gained from years of experience. New, expensive tools are not a substitute for experience.

Quite often, prospective programmers ask, “What programming language should I learn?” To me, that’s a fruitless question. It doesn’t matter what language you learn; what’s important is learning how to program. You need to learn how to manipulate a computer and how to think in a logical, linear manner.

Once you know how to program, you’ll understand how to choose the right tool for the job. Find a language that augments what you’re trying to do. You wouldn’t choose Objective-C for web programming, just like you wouldn’t choose C# for embedded systems. You don’t use a belt sander for a smooth finishing sand. Choose the right tools, and they’ll help you create something awesome.

Unless you’re hoping to hit a hole in one; in that case you’re gonna have to rely on luck.


* I’m actually convinced that nobody masters golf. It’s an incredibly difficult game.

If you’re using enum.ToString() that often, you’re doing it wrong

Daniel Wertheim measured the performance of enum.ToString and found it to be 400x slower than using a comparable class with consts. That’s a massive difference, and something that, in theory, might make you think twice about using an enum.

But in practice… who cares?

You shouldn’t be using enum.ToString that often anyway

There aren’t many scenarios in which you should be ToStringing them in rapid succession. And if you are, you’re probably doing something wrong. Enums are used to hold state and are commonly compared, and enum comparisons are incredibly fast. Much, much faster than comparing strings.

The only real time you’ll need the string representation of an enum is when you’re populating a drop down list or something similar, and for that you ToString each item once and you’re done with it.
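
For example, building those names once is trivial – a quick sketch, using the Test enum from the snippet below and assuming System.Linq is available:

// Generate the display names once – each value is ToString'd a single time
var names = Enum.GetValues(typeof(Test))
                .Cast<Test>()
                .Select(v => v.ToString())
                .ToList();
// Bind 'names' to your drop down (or whatever control you're filling) and you're done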

Just for fun, I ran a totally unscientific, unoptimized test* to see how fast a single enum.ToString() ran:

using System;
using System.Diagnostics;

static void Main(string[] args)
{
    var sw = new Stopwatch();
    sw.Start();
    var s = Test.One.ToString();   // the single call being timed
    sw.Stop();
    Console.WriteLine(sw.Elapsed);
    Console.Read();
}

public enum Test
{
    One,
    Two,
    Three
}

The result was 00:00:00.0000664. This was for a single ToString with no burn in. That’s ridiculously fast, and subsequent calls will be even faster since this first one also paid the JIT cost.

So, yes, Daniel is right that ToStringing an enum is slow, but let’s look at the big picture here. For the amount you should be calling ToString on an enum (very little), it’s fast enough by a large margin. Unless you run into a very rare situation, you have far bigger performance issues to worry about.
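
And if you somehow do land in that rare situation – the same enums being ToString’d over and over in a hot path – a one-time cache sidesteps the cost entirely. A sketch, again using the Test enum from above (requires System.Linq and System.Collections.Generic):

// Built once, up front
static readonly Dictionary<Test, string> TestNames =
    Enum.GetValues(typeof(Test))
        .Cast<Test>()
        .ToDictionary(v => v, v => v.ToString());

// In the hot path: a dictionary lookup instead of enum.ToString()
string name = TestNames[Test.One];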


* Like, really, this probably breaks every rule there is.

Recovering changes with git reflog

I ran into a situation recently where I accidentally merged an unfinished feature branch directly into master. I had been working on the feature and got an urgent hotfix request. Without thinking, I branched from the feature branch to perform the hotfix changes, then merged that directly into master once I was finished.

Whoops.

Luckily enough, I noticed the vast number of changes in master and realized what I had done before tagging and releasing.

My first thought was to revert the merge commit, but since it was a fast-forward merge there was no merge commit to revert, so it wasn’t that simple. The feature branch had about a month’s worth of work in it and it would have been a pain to wade through all of those commits.

What is a developer to do?

Reflog to the rescue!

The reflog is basically a list of every action that moves HEAD or a branch tip in your repository. Specifically, the man pages say:

Reflog is a mechanism to record when the tip of branches are updated.

So anytime you commit, checkout or merge, an entry is added to the reflog. This is important to remember, because it means that basically nothing is ever lost.

Here’s some sample output from the reflog command:

D:/Projects/reflog [develop]> git reflog
38ca8c4 HEAD@{0}: checkout: moving from feature/foo to develop
512e62c HEAD@{1}: commit: Now with 50% more foos!:
38ca8c4 HEAD@{2}: checkout: moving from develop to feature/foo

Some things:

  • The results here are listed in descending order – newest action is first
  • The first alphanumeric string is the commit hash of the result of the action – if the action is a commit it’s the new commit hash, if the action is a checkout it’s the commit hash of the new branch head, etc
  • The next column is the history of HEAD. So the first line (HEAD@{0}) is where HEAD is now, the second line is where head was before that, the third line is where head was before that, etc
  • The final column is the action along with any additional information – if the action is a commit, it’s the commit message, if the action is a checkout, it includes information about the to and from branches

Using that output, you can easily trace my footsteps (remember, descending order so we’re starting at the bottom):

  1. First, I checked out a feature branch
  2. I then committed 50% more foos
  3. Finally, I checked out the develop branch

So how do I get my data back?

It’s relatively easy – in most cases you can check out the commit you want to get back to, and branch from there.

Let’s pretend that, while in the develop branch, I somehow deleted my unmerged feature/foo branch. I can run git reflog to see the history of HEAD, and see that the last time I was on the feature/foo branch was on commit 512e62c. I can run git checkout 512e62c, then git branch branch-name:

D:/Projects/reflog [master]> git checkout 512e62c
Note: checking out '512e62c'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 512e62c... First commit
D:/Projects/reflog [(512e62c...)]> git branch feature/foo
D:/Projects/reflog [(512e62c...)]> git checkout feature/foo
Switched to branch 'feature/foo'
D:/Projects/reflog [feature/foo]>

Notice how it said that I’m in a detached HEAD state. What that means is that HEAD is not pointing to the tip of a branch – we’re down the line of commits somewhere. However, at this point the files are checked out in my working copy and I am able to recover them. I can run git branch to create a branch from this commit, and continue working where I left off like nothing happened.

CommonMark only wants to help

I’m sure many of you have heard of Markdown, which is a plain text format for writing structured documents. Developed in 2004, it is in wide use throughout the internet and has simple syntax:

Heading
=======

Sub-heading
-----------
  
### An h3 heading
 
Paragraphs are separated
by a blank line.
 
Leave 2 spaces at the end of a line to do a  
line break
 
Text attributes *italic*,
**bold**, `monospace`.
 
A [link](http://example.com).
<<<   No space between ] and (  >>>

Shopping list:
 
* apples
* oranges
* pears
 
Numbered list:
 
1. apples
2. oranges
3. pears
 
The rain---not the reign---in
 Spain.

Markdown was created by John Gruber, and the “spec” (if you can call it that) is just the initial implementation, written in Perl. Many other implementations have appeared for various languages, and they all use that initial implementation as the spec, even though it is buggy and ambiguous.

CommonMark

CommonMark is an effort by a group of people to create an unambiguous Markdown spec.

We propose a standard, unambiguous syntax specification for Markdown, along with a suite of comprehensive tests to validate Markdown implementations against this specification. We believe this is necessary, even essential, for the future of Markdown.

Due to so many differing implementations and the wide usage throughout the internet, it’s impossible to know whether or not the Markdown you write on Reddit will work in a readme.md on Github. The goal of CommonMark is to make sure that it will.

I think it’s a great cause, and as I said in the coding horror discussion, nothing but good things can come out of this. Another member, however, reminded me of why I was commenting on that discussion in the first place:

Well, with the exception of this little spat, of course.

Oh yes, this little spat. The spat between the CommonMark team and John Gruber. Apparently John is not on board with the standardization of Markdown and has ignored requests to be on the team. CommonMark was started 2 years ago and its founders originally requested that John join the project. They heard nothing from John for 2 years, until they announced the project with its original name of Standard Markdown. Apparently John thought the name was infuriating and insisted that it be changed. It is now known as CommonMark.

John appears to be 100% against this project and the standardization of Markdown.

Why?

The intent behind CommonMark is to make Markdown more consistent and make the internet as a whole a better place. This is being done because Markdown is highly regarded throughout the industry. It’s being done because people love Markdown and want to see it live long after many projects die.

Markdown has been neglected ever since shortly after its initial release. The last version, 1.0.1, was released on December 17, 2004. 10 freaking years ago. It’s fine to no longer have any interest in maintaining a project, but refusing to let other people continue its development is beyond me.

I would love to hear from John on the reasoning behind his lack of interest in CommonMark. He may very well have good reasons and can set the record straight. But for now, I just don’t get it.

Coffee shops and programming… now with science!

Remember that scene in Family Guy, where there are two guys in a coffee shop and one asks if the other will watch him work? You don’t? Okay, fine:

Guy #2: Hey, getting some writing done there buddy?

Guy #1: Yeah, setting up in public so everybody can watch me type my big screenplay.

Guy #2: Me too. All real writers need to be seen writing otherwise what’s the point, right?

Guy #1: You should totally write that down!

Guy #2: Okay, will you watch me?

Funny? Yes. We’ve all seen these people in our local Starbucks – sitting with their laptops, diligently working away for the world to see. “Go home,” you have probably muttered under your breath. Nobody wants to watch you ostentatiously typing in public (I certainly have never said that, but that’s because I’m half Canadian and therefore 50% nicer than the average American).

But is there a reason people work in coffee shops, other than to show off their assiduous lifestyle?

Well, apparently it can help you be more creative.

Researchers at the University of Illinois conducted an experiment to determine how ambient noise can affect creativity:

This paper examines how ambient noise, an important environmental variable, can affect creativity. Results from five experiments demonstrate that a moderate (70 dB) versus low (50 dB) level of ambient noise enhances performance on creative tasks and increases the buying likelihood of innovative products. A high level of noise (85 dB), on the other hand, hurts creativity. Process measures reveal that a moderate (vs. low) level of noise increases processing difficulty, inducing a higher construal level and thus promoting abstract processing, which subsequently leads to higher creativity. A high level of noise, however, reduces the extent of information processing and thus impairs creativity.

The subjects were exposed to differing levels of ambient noise and tested on their creativity using Remote Associates Tests. The researchers found that a moderate level of noise (which they classify as ~70 dB) helps raise cognitive engagement and therefore increases your creativity. Coffee shops, if you haven’t guessed yet, fall in that same decibel range and make for a perfect creativity booster.

This level of ambient noise keeps your brain in a state of heightened awareness, where it is always engaged and actively thinking, calculating, and processing data. It’s quiet enough that it’s not a distraction, and there are enough different noises going on (people talking, footsteps, doors opening and closing, coffee grinding, milk steaming, etc.) that your brain can’t focus on any single noise, so it throws them all into the background and tunes them out. Coffee shops are also “safe” – most people are comfortable in them and don’t worry about the people around them, which allows their minds to get absorbed in their work.

As programmers, we tiptoe through a weird world between art and science. We need the math and reasoning skills of a scientist but the creative process of an artist. Coffee shops can help stoke the creative side if you’re having a hard time finding the artist inside.

But what about those times when you’re 3 blocks away from the nearest Starbucks and have hit the creative wall? Enter Coffitivity. The goal is to let you throw some headphones on and simulate the experience of being in a coffee shop. They have a few different loops, ranging from morning coffee shops to university lunch hangouts.

I’ve been listening to it for a few weeks and so far I think it works pretty well. After the first few minutes I forget I have headphones on and quickly get engrossed in whatever tasks I’m working on. (I’m even listening to it as I write this)

My only qualm at the moment is that the loops seem to be pretty short. After a while it starts getting distracting hearing the same woman’s laugh every 10 minutes. Kind of annoying. But if you’re dying to hear a coffee shop in a pinch, you can’t beat it.

The case against EntityDataSource

Why does Microsoft insist on developing the EntityDataSource? I really don’t see the benefit. It’s just adding bloat to Entity Framework, especially since most people are moving away from Web Forms in favor of MVC. It was never even a good idea in the first place. It’s supposed to make data binding easier, but it ends up causing many problems.

It’s difficult to debug

It works great when it works, but when it doesn’t work… Ugh. When it breaks, it’s nearly impossible to determine the problem without blindly Googling around and trying things until they work, which is terrible. I once had to use SQL Server Profiler to monitor queries and figure out the SQL it had generated just so I could properly debug the issue.

You also can’t see the results being returned without viewing the output on the page. Manually binding allows you to view the IEnumerable returned and manipulate the results further, if necessary.

It forces data access logic in your views

If you want a list of all Users in your database then great, add an EDS and grab all of them. But what if you need to filter them? Forget about it. You have to add a Where property and hand-write the where clause yourself. Or use the <WhereParameters> element with a bunch of verbose filters:

<WhereParameters>
    <asp:SessionParameter Name="Id" DbType="Int32" SessionField="Id" />
    <asp:SessionParameter Name="Name" DbType="String" SessionField="Name" />
    <asp:ControlParameter ControlID="txtCompany" DbType="String" 
      DefaultValue="" Name="Company" PropertyName="Company" />
</WhereParameters>

It’s a mess of text and it’s impossible to determine what you’re actually selecting. Compared to:

context.Tests.Where(t => t.Id == (int)Session["Id"] 
                                  && t.Name == (string)Session["Name"] 
                                  && t.Company == txtCompany.Text);

Much cleaner and much more readable.

This also forces data access to live in your views which is a violation of Separation of Concerns. Data access should live where it belongs – in the aptly named Data Access Layer.
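
As a sketch of what that could look like (the names here – TestRepository, TestEntities, gvTests – are made up for illustration), the filter from above moves into a data access class and the page only ever sees the results:

public class TestRepository
{
    public List<Test> GetTests(int id, string name, string company)
    {
        // TestEntities stands in for your ObjectContext/DbContext
        using (var context = new TestEntities())
        {
            return context.Tests
                          .Where(t => t.Id == id && t.Name == name && t.Company == company)
                          .ToList();
        }
    }
}

The code-behind then just binds the result:

var tests = new TestRepository().GetTests((int)Session["Id"], (string)Session["Name"], txtCompany.Text);
gvTests.DataSource = tests;
gvTests.DataBind();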

It’s slow

As a quick example, I set up a page with an EntityDataSource selecting all columns from a table with 50,000 rows and putting them into a GridView. To compare, I also manually bound the GridView by selecting from the ObjectContext itself.

Here’s a sample of the code for the EDS:

<asp:EntityDataSource runat="server" ID="eds" ConnectionString="name=Test" 
    AutoGenerateWhereClause="True" DefaultContainerName="Test" 
    EntitySetName="Test" AutoSort="true" />
<asp:GridView runat="server" ID="gvEds" DataSourceID="eds" AllowPaging="True"></asp:GridView>

And the manual binding:

<asp:GridView runat="server" ID="gvNoEds" AllowPaging="true" PageSize="50"></asp:GridView>

var tests = context.Test.ToList();
gvNoEds.DataSource = tests;
gvNoEds.DataBind();

Results:

Entity Data Source    Manual
0.020342 seconds      0.006623 seconds

Sure, both are fast, but this shows that the EDS is roughly three times slower than the manual binding. In a situation where there are many concurrent users and a lot more on the page, that could be the difference between 0.1 seconds and 0.03 seconds, which is a noticeable difference.

Kill it with fire

I’m really not sure why Microsoft insists on continuously supporting EntityDataSource. I see a slow, outdated control that abstracts too much while adding a lot of complexity. Let it die.

Why use strong and em over b and i?

One question I see around the interwebs a lot is why strong and em should be used over b and i. If we look at the HTML 4 spec, it lists b and i under the Font style section, and notes:

The following HTML elements specify font information. Although they are not all deprecated, their use is discouraged in favor of style sheets.

The strong and em tags are listed under the Phrase elements section, which notes:

Phrase elements add structural information to text fragments.

Now that’s all well and good, but what does it mean?

Among other things:

b and i are visual.

What this means is that when a web browser’s HTML parser encounters a <b> tag, it knows to bold the text. It’s a physical property, meaning “use thicker ink when displaying this word.” Same with <i> – “skew this so it looks like it’s going really fast” (or something like that). These are tags whose sole purpose is to physically change the display of the text.

Okay. Great.

Well, maybe not. What about when a blind person has the page read aloud by a screen reader? The visual properties mean nothing to them. That’s where em and strong come in.

em and strong are semantic

The em tag indicates emphasis and the strong tag indicates stronger emphasis. This usually means italics and bold on a web page, but it could also tell a text-to-speech program to use a different tone of voice when it encounters the text. The tags have meaning behind them, and that meaning can be interpreted differently by different user agents.

As noted by the HTML 4 spec, b and i, although not deprecated, should be avoided because not only are they style properties that should be handled in CSS, they don’t have any semantic meaning. Use strong and em in their place.