Jeremy Kun, a mathematics PhD student at the University of Illinois in Chicago, has posted a wonderful primer on probability theory for programmers on his blog. It's a subject vital to machine learning and data-mining, and it's at the heart of much of the stuff going on with Big Data. His primer is lucid and easy to follow, even for math ignoramuses like me.

For instance, suppose our probability space is $\Omega = \{1, 2, 3, 4, 5, 6\}$ and $f$ is defined by setting $f(x) = 1/6$ for all $x \in \Omega$ (here the “experiment” is rolling a single die). Then we are likely interested in more exquisite kinds of outcomes; instead of asking the probability that the outcome is 4, we might ask what is the probability that the outcome is *even*? This event would be the subset $\{2, 4, 6\}$, and if any of these are the outcome of the experiment, the event is said to *occur*. In this case we would expect the probability of the die roll being even to be 1/2 (but we have not yet formalized why this is the case).

As a quick exercise, the reader should formulate a two-dice experiment in terms of sets. What would the probability space consist of as a set? What would the probability mass function look like? What are some interesting events one might consider (if playing a game of craps)?
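One possible answer to the exercise, sketched in Python rather than taken from the primer itself. It assumes two fair, distinguishable dice, and the names `sample_space`, `pmf`, and `probability` are illustrative. The two events are the usual come-out outcomes in craps: a "natural" (a total of 7 or 11) and "craps" (a total of 2, 3, or 12).

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair (first die, second die), 36 outcomes in all.
sample_space = set(product(range(1, 7), repeat=2))

def pmf(outcome):
    """Probability mass function. Assumption: fair dice, so each pair is equally likely."""
    return Fraction(1, len(sample_space))

def probability(event):
    """An event is a subset of the sample space; add up the mass of its outcomes."""
    return sum((pmf(outcome) for outcome in event), Fraction(0))

# Events of interest on the come-out roll in craps.
natural = {o for o in sample_space if sum(o) in (7, 11)}
craps = {o for o in sample_space if sum(o) in (2, 3, 12)}

print(probability(natural))  # 2/9
print(probability(craps))    # 1/9
```

Using `Fraction` keeps the results exact, so 8 outcomes out of 36 prints as 2/9 instead of a rounded decimal.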

Probability Theory — A Primer

(*Image: Dice, a Creative Commons Attribution (2.0) image from artbystevejohnson's photostream*)

Combinatorics is the devil, and probably the reason for my current wretched pay grade.

Someone give that die owner a q-tip.

Not to mention: the drilled holes are bad news. They make the die unbalanced.

I had several years of probability and statistics, back in the 20th century. The formal treatment with set theory is a great way to make a difficult subject much, much more difficult. Take the example above – it’s a die, for chrissakes, only a mathematician could make it so confusing while not shedding any extra light.

It would make much more sense for programmers to take the probability offerings from the engineering school. Same content, but much more practical.

Agreed. I’ve done Ph.D. level statistics. This reads like a sloppy version of Casella and Berger. Just read the real thing if it’s going to be presented in this manner. I was hoping for something new.

Plus, if you’re going to go the formal math route, use proper math typesetting! Copy and pasting math symbols looks bad.

All I currently use about statistics is what I learned from playing Champions. 3d6 FTW!

I agree with @Boundgear. Formal treatment with set theory is no way to “prime” anybody for anything.

“Giant zigzag, sitting on two symbols separated by a scarab’s butt.” The diagram is three lines high, and displayed as an image.

It means:

FOREACH (o IN O)
    x += f(o)

…which is still unacceptable. It’s TERRIBLE. It would quite possibly get a programmer fired, but is considered perfectly acceptable for math professors.

Mathematicians think mathematical notation is acceptable in teaching. It is NOT acceptable, for several reasons.

1) It is not copy-pastable.

2) It is not searchable – Google is blind to these things: pasting in an omega will not give you what you want.

3) It is not linkable. So someone seeing a sum or union symbol has literally NO way of finding out what they mean. An arrow pointing to the right… what does that mean? This arrow has two bars. Hrm. Nope. I got nothing.

4) They feel it is sufficient to provide an explanation in these esoteric alchemical symbols; therefore many wiki pages are left incomplete, where otherwise an intelligible explanation might be given; thus they reduce the sum of human knowledge.

5) They choose the world’s worst variable names. The above code is NOT acceptable. When they write programs, they write them with global variables ‘a’ through ‘z’, then start again at ‘aa’, ‘ab’ – this is not a rarity, it is a commonplace amongst math professors in most faculties. It is how they think: without clarity or any need to explain their workings.

eventProbability = 0
FOREACH (outcome IN eventOutcomeList)
    eventProbability += probabilityOf(outcome)

Now that’s getting closer to the right way to do it. Without esoteric one-letter symbols to represent things, you stand a chance of being able to see at a glance what is going on.
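For what it's worth, the loop above translates almost line for line into runnable Python. This is only a minimal sketch: it assumes a fair six-sided die and uses the event "the roll is even" from the post, and the names `die_faces`, `probability_of`, and `event_outcome_list` are made up for illustration.

```python
from fractions import Fraction

die_faces = [1, 2, 3, 4, 5, 6]

def probability_of(outcome):
    # Assumption: a fair die, so every face has the same probability.
    return Fraction(1, len(die_faces))

# The event "the roll is even" is just the list of outcomes that count.
event_outcome_list = [face for face in die_faces if face % 2 == 0]

# Same shape as the pseudocode: start at zero, add each outcome's probability.
event_probability = Fraction(0)
for outcome in event_outcome_list:
    event_probability += probability_of(outcome)

print(event_probability)  # 1/2
```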

Mathemagical symbols were designed for writing quickly on blackboards. They are the WRONG TOOL for writing on computers.

“But Dewi, we have no better tool” – what, other than English, every other written language, and every single programming language? Well, if you’re not willing to use any of those, then perhaps design a better terminology and push for it to be accepted. But I can guarantee there is a clear and concise terminology in most programming languages to represent every one of those concepts, and that this terminology will at least be copyable, linkable, and searchable.

And don’t cry that the terminology is unfamiliar and doesn’t have global acceptance in the mathematical community: that is an EXCELLENT feature, for it will mean that you will link to somewhere that clearly defines all your symbols in the language of the rest of your article, and you will actually pick decent variable and function names, instead of arbitrary letters!

So how about a link to your clear and concise universal grammar/language that all can use to do everything? I personally know several gurus who would be totally willing to throw mad mad money in your (that/this) direction for such a solution to this very general problem. Seriously, dude, we’re waiting.