Two ways to compute a probability
The previous lesson named events and sample spaces. This one computes their probabilities. The curriculum is precise about TWO different formulas, one based on theory and one based on observation.
The theoretical probability of an event is computed from the structure of the experiment, assuming all outcomes are equally likely:
P_theoretical = favourable outcomes / total possible outcomes
For a fair die, P(rolling a 4) = 1/6 because the sample space has
six equally likely outcomes and exactly one of them (rolling a 4)
favours the event. No experiment required.
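The counting behind that 1/6 can be sketched in a few lines of Python. This is a minimal illustration rather than anything the curriculum prescribes: the function name `theoretical_probability` is made up for this example, and it assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def theoretical_probability(sample_space, is_favourable):
    """Favourable outcomes over total outcomes, assuming equally likely outcomes."""
    outcomes = list(sample_space)
    favourable = [o for o in outcomes if is_favourable(o)]
    return Fraction(len(favourable), len(outcomes))

die = range(1, 7)  # sample space of a fair six-sided die: 1, 2, 3, 4, 5, 6
p_four = theoretical_probability(die, lambda face: face == 4)
print(p_four)  # 1/6
```

Nothing is rolled here. The answer comes entirely from listing the outcomes and counting the favourable ones.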
The experimental probability of an event is computed from data observed across many trials of the experiment:
P_experimental = favourable trials observed / total trials run
If you roll a die 60 times and get a 4 on 12 of them, your
experimental probability of rolling a 4 is 12 / 60 = 0.20. The
theoretical probability is 1/6 ≈ 0.167. They don't match
exactly, but they're close, and over more trials they'd get
closer.
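The same calculation can be sketched in Python. The first part uses the counts from the example above (12 fours in 60 rolls). The second part simulates rolls with Python's `random` module as a stand-in for a real die. The helper name `experimental_probability`, the seed, and the number of simulated rolls are all choices made for this illustration.

```python
import random

def experimental_probability(favourable_trials, total_trials):
    """Favourable trials observed, divided by total trials run."""
    return favourable_trials / total_trials

# The worked example: 12 fours in 60 rolls.
print(experimental_probability(12, 60))  # 0.2

# Simulated rolls (assumption: random.randint stands in for a fair die).
rng = random.Random(7)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(60_000)]
fours = rolls.count(4)
p_sim = experimental_probability(fours, len(rolls))
print(round(p_sim, 3))  # lands near 1/6; the exact digits depend on the seed
```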
Probability comes in four equivalent forms
The curriculum names four representations of the same probability value: ratio, fraction, decimal, and percentage. Each one says the same thing differently.
One probability, four equivalent forms
Ratio: 3 : 8
Fraction: 3 / 8
Decimal: 0.375
Percentage: 37.5%
3 : 8 = 3/8 = 0.375 = 37.5%. Four different forms, one value. Use
whichever fits the next step in your work:
- Ratio / fraction is useful when the favourable and total outcomes are small and exact (3/8 is cleaner than 0.375).
- Decimal is useful when you're computing: multiplying probabilities, comparing magnitudes.
- Percentage is useful for everyday communication ("a 37% chance" reads naturally).
The conversion between forms uses the same rules from Lesson 12 (Percent: Discounts, Taxes, Tips): divide / multiply by 100 to move between decimal and percent; simplify the fraction or convert via long division to move between fraction and decimal.
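As a rough sketch, the four forms can be produced from one pair of counts in Python. The function name `four_forms` is hypothetical, and the ratio is written as favourable : total to match the 3 : 8 example in this lesson.

```python
from fractions import Fraction

def four_forms(favourable, total):
    """Return one probability as ratio, fraction, decimal, and percentage."""
    p = Fraction(favourable, total)  # Fraction simplifies automatically
    return {
        "ratio": f"{p.numerator} : {p.denominator}",
        "fraction": f"{p.numerator}/{p.denominator}",
        "decimal": float(p),
        "percentage": f"{float(p) * 100:g}%",
    }

forms = four_forms(3, 8)
for name, value in forms.items():
    print(name, value)
```

For 3 favourable outcomes out of 8, this prints 3 : 8, 3/8, 0.375, and 37.5%. Passing 6 and 16 gives the same four values, because the fraction is simplified first.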
Try it yourself first
Before any simulator runs hundreds of flips for you, flip ten for real. Grab a coin, commit to a prediction, and record what the coin actually does — not what you expected it to do. When your ten are done, the recorder lines your result up against a typical class.
Real coin first
Dig a real coin out of your pocket and flip it ten times. After each flip, record what actually landed — the buttons are for the coin's answer, not yours. No coin handy? The on-screen flip uses the same 50/50.
Before trial 1, commit: what should 10 flips of a fair coin produce?
What to notice: different students doing 10 flips get different results. Some will see 7 heads, others 4, others 6. None of those results means the coin is biased. Ten trials is too few to draw a conclusion. The variability is expected and normal.
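To see why a spread of results is normal, one option is to list every possible sequence of 10 flips and count the heads in each. The sketch below does that in Python, assuming a fair coin so that all 1024 sequences are equally likely. The variable names are illustrative only.

```python
from itertools import product
from collections import Counter

# Every possible sequence of 10 flips: 2**10 = 1024 equally likely outcomes.
sequences = list(product("HT", repeat=10))
heads_counts = Counter(seq.count("H") for seq in sequences)

total = len(sequences)
for heads in range(11):
    share = heads_counts[heads] / total
    print(heads, heads_counts[heads], round(share, 3))

# Favourable over total, for two events of interest.
p_exactly_5 = heads_counts[5] / total
p_4_to_6 = sum(heads_counts[h] for h in (4, 5, 6)) / total
```

Under these assumptions, exactly 5 heads shows up in only 252 of the 1024 sequences (about 25%), and 4, 5, or 6 heads covers 672 of them (about 66%). Roughly a third of students should expect something further from 5 than that.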
Predict before you run
The curriculum's Skills Bullet 2 is explicit: predict the experimental probability of an event, using theoretical probability.
That direction, theoretical to experimental, is the working order. You compute the theoretical value first, then run the experiment, then check how close the experimental result is to your prediction. Doing it the other way (running first, computing theoretical after) misses the prediction step entirely.
The simulator below enforces predict-then-run, and it doesn't hand you the answer: each setup is described by its structure — "2 win sectors of 10" — and it's your job to turn that into a probability. The true value stays hidden until your first run, so your prediction stands alone on the plot until the data arrives.
Two more things live in there. A "new 10-trial experiment" button stacks fresh 10-trial strips side by side — run it several times and compare them before you trust any single strip. And the first big run will find your longest losing streak and make you an offer about the next flip. Take the bet seriously; it's the oldest trap in probability.
Predict, then run
Pick a setup. Work out P(win) from its structure — favourable over total — and enter your prediction. Then run trials and see how the data treats your number.
The true value stays hidden until your first run. Your prediction is on its own out there.
Three things to notice:
- At 10 trials, experimental P fluctuates wildly. With a fair coin, you might see anything from 0.2 to 0.8. The convergence hasn't kicked in.
- At 100 trials, experimental P is usually within about 5 to 10 percentage points of the theoretical value. The convergence is visible.
- At 1000 trials, experimental P is usually within 2 or 3 percentage points of the theoretical value. The gap keeps shrinking as the trials pile up, though it never has to reach exactly zero.
This is the law of large numbers in action. Over many independent trials, experimental probability converges to theoretical probability. The curriculum's Understanding captures it: "Over a large number of trials, experimental probability models theoretical probability."
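The same pattern can be reproduced away from the simulator. The sketch below flips a simulated fair coin in Python and reports the experimental probability at several trial counts. The function name, the seed, and the chosen trial counts are assumptions for this illustration, and the exact figures printed depend on the seed.

```python
import random

def experimental_p_heads(trials, rng):
    """Flip a simulated fair coin `trials` times; return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

rng = random.Random(2024)  # fixed seed so the run is repeatable
results = {n: experimental_p_heads(n, rng) for n in (10, 100, 1000, 100_000)}

for n, p in results.items():
    gap = abs(p - 0.5)
    print(f"{n:>7} trials: experimental P(heads) = {p:.3f}, gap = {gap:.3f}")
```

Changing the seed changes every number, but the gap for the largest run should stay small while the 10-trial result jumps around.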
The sum-to-one rule
For any experiment, the probabilities of every possible outcome in the sample space must sum to exactly 1:
P(outcome 1) + P(outcome 2) + ... + P(last outcome) = 1
That gives a useful shortcut. If you know the probability of an event, the probability of its complement (the event NOT happening) is whatever's left over from 1:
P(NOT event) = 1 − P(event)
So if a weather forecast says P(rain) = 0.7, then
P(no rain) = 1 − 0.7 = 0.3. You don't need to know what "no rain"
covers in detail. Every outcome in the sample space is either
rain or not-rain, and the two probabilities must sum to 1.
This works for any experiment. For a die, P(rolling a 4) = 1/6,
so P(NOT rolling a 4) = 1 − 1/6 = 5/6. For a deck of cards,
P(drawing a heart) = 1/4, so P(drawing a non-heart) = 3/4. The
rule doesn't care about the details of the experiment.
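A short Python sketch of the complement shortcut, using exact fractions so no rounding creeps in. The function name `complement` is made up for this example.

```python
from fractions import Fraction

def complement(p_event):
    """P(event does NOT happen) = 1 - P(event happens)."""
    return 1 - p_event

print(complement(Fraction(1, 6)))   # 5/6  (not rolling a 4)
print(complement(Fraction(1, 4)))   # 3/4  (not drawing a heart)
print(complement(Fraction(7, 10)))  # 3/10 (no rain when P(rain) = 0.7)

# Sum-to-one check for a fair die: six outcomes at 1/6 each.
die_probabilities = [Fraction(1, 6)] * 6
print(sum(die_probabilities))  # 1
```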
One trial is unpredictable; many are predictable
Here's the part that messes with intuition. The outcome of any single trial is unknown before it occurs (except for certain or impossible events, but those aren't interesting). You can compute the probability of heads on a fair coin to arbitrary precision, but you still can't say whether the NEXT flip will be heads or tails. Knowing the probability is not the same as knowing the outcome.
What probability DOES tell you: over many trials, the proportion of favourable outcomes will converge to the theoretical value. The aggregate is predictable; the individual trial isn't.
A common trap follows from confusing these two: after five tails
in a row, the gambler's fallacy insists the coin is "due" to
land on heads on the sixth flip. But each flip is independent.
The coin has no memory; past results don't change the next
trial's probability. After five tails, P(heads on the next flip)
is still 0.5 for a fair coin. The law of large numbers does its
work over the next 995 flips, not by "correcting" the next one.
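One way to test the "due for heads" idea is to simulate a long run of flips, find every streak of five tails, and look at the flip that follows. The Python sketch below does that, assuming `random.choice` is a fair stand-in for a coin. The seed and the number of flips are arbitrary choices for this illustration.

```python
import random

rng = random.Random(11)  # fixed seed so the run is repeatable
flips = [rng.choice("HT") for _ in range(200_000)]

# Collect the flip that comes right after every run of five tails.
next_after_streak = [
    flips[i + 5]
    for i in range(len(flips) - 5)
    if flips[i:i + 5] == ["T"] * 5
]

p_heads_after_streak = next_after_streak.count("H") / len(next_after_streak)
print(len(next_after_streak), round(p_heads_after_streak, 3))
```

If the coin were "due", the proportion of heads after a streak would sit well above 0.5. In a simulation like this it should stay close to 0.5, because each flip is generated independently of the ones before it.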
Where it shows up in real life
A weather forecast that says "60% chance of snow tomorrow"
is reporting an experimental probability, derived from
historical data on days with similar conditions in Calgary: what
fraction had snow the next day. The "60%" is P_experimental = favourable / total where each historical day is one trial.
A batting average in baseball is exactly the experimental
probability of getting a hit in an at-bat. A .300 batting
average means 30% of at-bats produced a hit. Across hundreds of
trials, the experimental probability has stabilized to a value
that's a good predictor of the next at-bat (but doesn't determine
it).
Insurance pricing is built on theoretical and experimental probabilities of events like car accidents, house fires, or illness. Insurers compute the experimental probability across millions of policy-holders (each policy-holder's year counts as one trial), then set premiums so that premiums collected cover expected payouts plus profit. Probability isn't abstract here; it's how rates are set.
A lottery is the inverse case: tickets are priced so that the theoretical probability of winning, multiplied by the prize, comes to far less than the ticket price. The math is identical to insurance, just running the other direction.
Worksheet
These aren't graded. Get them right, get them wrong. The goal is fluency with both formulas, the four representational forms, the sum-to-one rule, and the independence of trials.
Practice · Not graded
MA.7.PRO.1 · Practice the idea
01 / 10
Without rolling the die at all, you compute P(rolling a 4) on a fair six-sided die as 1/6. What kind of probability did you compute?
Multiple choice: identify which kind of probability was computed from structure alone.
Common mistakes
Student says
“'The coin has landed on tails five times; heads is due.' (Gambler's fallacy.)”
What it reveals
Treats trials as if they had memory. Each flip of a fair coin is independent; past results don't change the next trial's probability. The coin doesn't 'owe' you anything.
Targeted response
Each flip is independent. After five tails, P(heads on next flip) is still 0.5 for a fair coin. The law of large numbers does its work over the next 995 flips, not by correcting the next one. Use the simulator above with the coin setup: run 1000 flips and watch the proportion settle near 0.5, even when individual streaks look 'unfair' along the way.
Student says
“Claims a coin is biased based on 12 flips. 'I got 10 heads and 2 tails. The coin is rigged.'”
What it reveals
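Trusts small-sample experimental probability as reliable. A 10-to-2 split in 12 flips is unusual for a fair coin, but not rare enough to prove anything: a fair coin produces a split at least that lopsided (in either direction) about 4% of the time, so in a class of 30 you would expect roughly one student to see it. Small samples are noisy; you need many trials to detect real bias.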
Targeted response
Stack a few 10-trial strips in the simulator above with the coin setup. The win counts bounce around from strip to strip. That's not bias; it's normal small-sample variability. To claim bias, you'd need hundreds or thousands of flips with experimental P consistently far from 0.5.
Student says
“Mixes up the formulas: uses 'total trials' as the denominator for theoretical probability, or 'total outcomes in sample space' as the denominator for experimental.”
What it reveals
The two formulas LOOK alike (both are favourable / something), but the denominator is different. Theoretical uses the count of POSSIBLE outcomes; experimental uses the count of ACTUAL trials run.
Targeted response
Theoretical = favourable / total possible outcomes (a calculation from the experiment's structure). Experimental = favourable observed / total trials run (a measurement from data). The denominators are conceptually different even though both formulas have the same shape. Always check: did the trial actually happen, or are we counting what COULD happen?
Going further
This is the end of the Grade 7 Probability strand, and the end of the Grade 7 mathematics curriculum as a whole. Together with Likelihood and Sample Space, you now have the full Grade 7 toolkit: name the sample space, identify favourable outcomes, compute theoretical probability from the structure, run trials to get experimental probability, and watch the two converge.
In Grade 8, compound events add the next layer. What's the probability of two events both happening, or at least one happening? Probability trees, multiplication rules for independent events, and addition rules for mutually exclusive events all build on the foundation you just laid.
In Grade 9, conditional probability asks: given that event A happened, what's the probability of event B? That's where most real-world probability lives: diagnostic test interpretation, weather forecasts conditional on yesterday's weather, financial risk assessments. The Grade 7 toolkit is what makes the Grade 9 toolkit make sense.
The four representational forms (ratio, fraction, decimal, percentage) and the sum-to-one rule reappear in every later grade. Get fluent with both, and the next two years' worth of probability sits on stable ground.