MB vs. MiB

By jason, 29 May, 2009

So I'm reading Slashdot this afternoon when I come across this quote in someone's signature: "KB? SI units are meant to be computationally convenient, not arbitrarily assigned." This got me wondering. Apart from the differences between units and prefixes, where's the boundary between computationally convenient and arbitrarily assigned?

As long as the metric system has been around, it's had this notion of computational convenience attached to it. Ten was a natural choice for a multiplier because it's something we humans can easily deal with. All you have to do to move from a measurement with one prefix to a measurement with another is move the decimal point.

Then computers came along, with a new unit, albeit a non-SI unit, for defining storage capacity, a pseudo-volume if you will. Along with computers came the byte, or octet, depending on your language of choice. Unlike humans, computers deal with powers of 2 far more easily than they handle powers of 10. As memory capacity expanded, the computing industry needed better units for larger quantities of memory; no one wants to say "I've got 1,048,576 bytes of memory in my computer." It sounds ridiculous. Conveniently, the metric system had a suitable set of prefixes, and the kilobyte, megabyte, and gigabyte were born, meaning 2^10, 2^20, and 2^30 bytes, instead of the 10^3, 10^6, and 10^9 they mean when applied to SI units.
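
For the curious, here's a quick Python sketch (mine, purely illustrative) of just how far apart the two readings drift as the prefixes grow:

```python
# Compare the SI (decimal) and binary readings of each prefix.
for prefix, exp10, exp2 in [("kilo", 3, 10), ("mega", 6, 20), ("giga", 9, 30)]:
    si, binary = 10 ** exp10, 2 ** exp2
    gap = (binary - si) / si
    print(f"{prefix}byte: SI {si:>13,} vs. binary {binary:>13,} ({gap:.1%} apart)")
```

The gap isn't constant, either: about 2.4% at kilo-, 4.9% at mega-, 7.4% at giga-, and it keeps widening with every prefix.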

However, we have some ambiguity here, since kilo-, mega-, and giga- are used in their base-10, SI sense when applied to serial transmission speeds: a 56Kbps modem literally transmits at 56,000 bits per second, and a 1.544Mbps T1 line literally carries 1,544,000 bits per second. Disk drives further magnified the problem. A 5.25" MS-DOS formatted 360KB floppy disk is laid out as 2 sides, by 40 tracks per side, by 9 sectors per track, by 512 bytes per sector. This works out to 368,640 bytes, or 360*1024 bytes. When the 3.5" 1.44MB floppy disk was introduced, it doubled the number of tracks per side as well as the number of sectors per track, yielding 1440*1024 bytes. Realistically, under a binary convention the 1.44MB disk should have been called 1.40625MB, or 1440KB, not 1.44MB. Hard drive manufacturers never had a standard: some originally chose the SI base-10 approach, others used the base-2 approach. As far as I know, no manufacturers use the base-2 approach anymore, since 1,000,000,000,000 bytes is cheaper to manufacture than 1,099,511,627,776 bytes while still meeting a plausible legal definition of a terabyte. Obviously not a usable situation.
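
To make the floppy arithmetic concrete, here's the same geometry worked out in a few lines of Python:

```python
# Capacity from geometry: sides * tracks/side * sectors/track * bytes/sector
sector_bytes = 512

dd_525 = 2 * 40 * 9 * sector_bytes    # 5.25" 360KB disk
hd_35 = 2 * 80 * 18 * sector_bytes    # 3.5" "1.44MB" disk: doubled tracks and sectors

print(dd_525, dd_525 / 1024)          # 368640 bytes = 360 KB (binary)
print(hd_35, hd_35 / 1024)            # 1474560 bytes = 1440 KB (binary)
print(hd_35 / 1024 ** 2)              # 1.40625 binary MB, marketed as "1.44MB"
```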

Ever look at the packaging with your hard drive or your computer? Somewhere in there is a disclaimer that the capacities of the drives are computed in base-10, albeit in simpler terms. Try finding a similar disclaimer for the amount of RAM that computer has. It doesn't exist. Why not? Because the computer industry has grown up with the presumption that kilo-, mega-, and giga- imply base-2 numbering for things like RAM. Given how computers are organized, selling a memory stick with exactly 1 billion bytes of RAM would create enormous headaches for the hardware that has to map that memory, assuming the designers didn't want to leave 73,741,824 bytes of address space wasted for every gig of RAM in the system.
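
A rough sketch of why the power-of-two size is the convenient one for address decoding (my illustration, not anything from a hardware spec):

```python
binary_gig, decimal_gig = 2 ** 30, 10 ** 9

# A power-of-two capacity is addressable with an all-ones mask...
print(bin(binary_gig - 1))    # thirty 1-bits: a clean address mask
# ...while a decimal "gig" has no such clean decode,
print(bin(decimal_gig - 1))   # a ragged bit pattern
# and strands this much of the 2**30 address window it sits in:
print(f"{binary_gig - decimal_gig:,}")   # 73,741,824 bytes
```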

So, in 1998, IEC Technical Committee 25 proposed the binary prefixes: kibi-, mebi-, gibi-, tebi-, etc. The end result: hard drive manufacturers get their product specs right (although despite these standards, their lawyers still feel the need to note on the packaging how the capacity is calculated; odd, isn't it?). So when I go to the store and buy a computer with a 160GB hard drive and 2GB of RAM (which I did with my laptop 8 or 9 months ago), I can be assured that I'm getting 160,000,000,000 bytes of disk space and 2,000,000,000 bytes of memory.

Not so fast there.

Yes, I got a hard drive within a few thousand bytes of 160 billion. But if I look at the amount of RAM it has, lo and behold, it's got 2,147,483,648 bytes. I was expecting 2 billion, and instead I got 7.3% more memory than I paid for, assuming we use the IEC/IEEE/NIST-approved definitions of the prefixes. As a red-blooded American, I wonder who I can sue over that. I'm still emotionally traumatized over the matter.
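
The math behind that grievance, for anyone keeping score at home:

```python
advertised = 2 * 10 ** 9   # "2GB" read strictly with SI prefixes
actual = 2 ** 31           # what the machine actually reports

print(f"{actual:,}")                                # 2,147,483,648
print(f"{(actual - advertised) / advertised:.2%}")  # 7.37% "free" memory
```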

Interestingly enough, if you were to investigate a Timeline of binary prefixes, you'd find several references in ACM and other publications throughout the 1960s and 1970s where kilo- and mega- refer to multiples of 1024 and 1024*1024, respectively. What's even more interesting is that it took from 1998 to 2001 before a paper was published that used the new units, three additional years before the IEEE recognized them, and another four years beyond that before NIST mandated their use.

So, back to the quote in the signature that I read: "KB? SI units are meant to be computationally convenient, not arbitrarily assigned." First off, SI recognizes neither the byte nor the bit as an official SI unit. As a result, the SI definition of the prefixes is quite irrelevant to bits and bytes, notwithstanding the common terminology. If the poster knew this, he wouldn't have that quote as a signature. Now, as far as arbitrarily assigned and computationally convenient are concerned, it seems to me that 1024 or 1,048,576 are far more computationally convenient to the computer than they are to the human. Just as we move between powers of 10 by moving a decimal point, the computer can far more efficiently multiply by powers of two by moving the "binary point"; i.e., bit-shifting. Using SI prefixes to represent even powers of ten in the field of computing therefore seems far less computationally convenient and far more arbitrarily assigned than using them to represent powers of two.
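
A little Python to illustrate the shift-versus-divide point (any language with integers would do):

```python
n = 52_428_800        # a byte count: 50 * 2**20

# Binary prefixes fall out of a simple bit-shift...
print(n >> 20)        # 50 binary "megabytes"; the shift just moves the binary point
# ...while SI prefixes require an actual division.
print(n // 10 ** 6)   # 52 SI megabytes

print((3 << 10) == 3 * 1024)   # True: shifting left 10 bits multiplies by 2**10
```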

Let's not get started down the path of how they're spoken. Mebibytes? Please! Mebbe it's a byte, mebbe it isn't a byte. We just don't know. Even though we're not yet there in terms of data capacity, exbibyte sounds even more ridiculous. Imagine if these prefixes had been in use when Exabyte Corporation came into existence.

Quite frankly, it isn't a standards body that determines common use of a unit of measurement; it's the people who actually use those units that define them. Thus far, the hard drive industry has adopted the SI prefixes, with appropriate disclaimers, and has for a while. It makes economic sense to them, and consumers are accustomed to that usage. Memory vendors use SI prefixes in the binary sense, without disclaimers, and customers are just as accustomed to that usage. System builders, as a result of the decisions of both of the aforementioned manufacturers, use both, with appropriate disclaimers for the disk drives in their systems.

Perhaps the biggest kick in the face for the IEC prefixes, however, is that Microsoft still uses SI prefixes to display file sizes in Windows. I don't expect this to change any time soon, nor do I suspect that it even could change. After years of conditioning, the Windows-using population expects kilo and mega to have their binary meanings, regardless of what a standards body that is otherwise irrelevant to them may say. Yes, you can argue that those standards are relevant to the lay person, and I certainly wouldn't argue against that; however, the average computer user has no clue about standards, and really has no need or desire to know about them, as long as everything runs once it's plugged in and turned on. As long as that's the case, whatever standards may apply are irrelevant to the user.

For me, a kilobyte is, always has been, and always will be 1024 bytes, and a megabyte will always be 1,048,576 bytes. Mebi I'm just old-school, which is crazy considering I got my Computer Science degree after these prefixes were introduced, but old habits die hard, standards be damned.
