

Theoretical and Experimental Probability

Your 10 flips say 0.7. The math says 0.5. Which one is lying?

Grade 7

Two ways to compute a probability

The previous lesson named events and sample spaces. This one computes their probabilities. The curriculum distinguishes TWO formulas: one based on theory (the structure of the experiment) and one based on observation (data from trials).

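The theoretical probability of an event is computed from the structure of the experiment, assuming all outcomes are equally likely:

$$P_{\text{theoretical}}(E) = \frac{\text{number of favourable outcomes}}{\text{total number of possible outcomes}}$$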

For a fair die, P(rolling a 4) = 1/6 because the sample space has six equally likely outcomes and exactly one of them (rolling a 4) favours the event. No experiment required.

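The experimental probability of an event is computed from data observed across many trials of the experiment:

$$P_{\text{experimental}}(E) = \frac{\text{number of trials in which } E \text{ occurred}}{\text{total number of trials}}$$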

If you roll a die 60 times and get a 4 on 12 of them, your experimental probability of rolling a 4 is 12 / 60 = 0.20. The theoretical probability is 1/6 ≈ 0.167. They don't match exactly, but they're close, and with more trials the experimental value would tend to get closer.
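The two formulas share a shape, favourable over a total, and differ only in where the denominator comes from. Here is a minimal Python sketch of both, using the standard-library `fractions` module so the values stay exact; the function names `theoretical_probability` and `experimental_probability` are illustrative, not curriculum terms:

```python
from fractions import Fraction

def theoretical_probability(sample_space, event):
    """Favourable outcomes / all possible outcomes (assumes equally likely outcomes)."""
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))

def experimental_probability(times_event_occurred, total_trials):
    """Favourable trials observed / total trials actually run."""
    return Fraction(times_event_occurred, total_trials)

die = [1, 2, 3, 4, 5, 6]
p_theory = theoretical_probability(die, {4})   # computed from structure, no rolling needed
p_data = experimental_probability(12, 60)      # 12 fours observed in 60 rolls

print(p_theory, round(float(p_theory), 3))  # 1/6 0.167
print(p_data, float(p_data))                # 1/5 0.2
```

The theoretical function never looks at data, and the experimental one never looks at the sample space.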

Probability comes in four equivalent forms

The curriculum names four representations of the same probability value: ratio, fraction, decimal, and percentage. Each one says the same thing differently.

One probability, four equivalent forms:

  • Ratio: 3 : 8
  • Fraction: 3 / 8
  • Decimal: 0.375
  • Percentage: 37.5%

3/8 = 0.375 = 37.5%. Three different forms, one value. Use whichever fits the next step in your work:

  • Ratio / fraction is useful when the favourable and total outcomes are small and exact (3/8 is cleaner than 0.375).
  • Decimal is useful when you're computing: multiplying probabilities, comparing magnitudes.
  • Percentage is useful for everyday communication ("a 37.5% chance" reads naturally).

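The conversions use the same rules as Lesson 12 (Percent: Discounts, Taxes, Tips): multiply a decimal by 100 to get a percent, and divide a percent by 100 to get back to a decimal. To turn a fraction into a decimal, divide the numerator by the denominator (long division: 3 ÷ 8 = 0.375); to turn a decimal back into a fraction, write it over a power of 10 and simplify (0.375 = 375/1000 = 3/8).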
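A small Python sketch of these conversions for the 3/8 example. It assumes the ratio form means favourable : total (part to whole), as in the 3 : 8 example; `four_forms` is a hypothetical helper, and `Fraction` comes from the standard library:

```python
from fractions import Fraction

def four_forms(favourable, total):
    """Return one probability as (ratio, fraction, decimal, percentage)."""
    p = Fraction(favourable, total)            # Fraction simplifies automatically (6/16 -> 3/8)
    ratio = f"{p.numerator} : {p.denominator}"
    fraction = f"{p.numerator} / {p.denominator}"
    decimal = float(p)                         # numerator divided by denominator
    percentage = float(p * 100)                # decimal times 100
    return ratio, fraction, decimal, percentage

print(four_forms(3, 8))      # ('3 : 8', '3 / 8', 0.375, 37.5)
print(Fraction("0.375"))     # 3/8  (decimal back to a simplified fraction)
```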

Try it yourself first

Before any simulator runs hundreds of flips for you, flip ten for real. Grab a coin, commit to a prediction, and record what the coin actually does — not what you expected it to do. When your ten are done, the recorder lines your result up against a typical class.

Real coin first

Dig a real coin out of your pocket and flip it ten times. After each flip, record what actually landed — the buttons are for the coin's answer, not yours. No coin handy? The on-screen flip uses the same 50/50.

A real-coin trial recorder. The student first commits to a prediction about what 10 flips of a fair coin should produce, then flips a real coin (or uses the built-in on-screen flip) and records each outcome. A running tally and experimental probability accumulate. At 10 trials a completion panel echoes the prediction against the actual count and shows the result alongside eight typical classmate results.

Before trial 1, commit: what should 10 flips of a fair coin produce?

What to notice: different students doing 10 flips get different results. Some will see 7 heads, others 4, others 6. None of those results means the coin is biased. Ten trials is too few to draw a conclusion. The variability is expected and normal.
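How much variety should a class expect? For a fair coin it can be counted exactly instead of guessed. This Python sketch uses `math.comb` from the standard library to count how many of the 2^10 = 1024 equally likely flip sequences give each number of heads; counting arrangements this way goes beyond Grade 7 and is used here only to produce the numbers:

```python
from fractions import Fraction
from math import comb

FLIPS = 10

# P(exactly k heads) = (ways to choose which k of the 10 flips are heads) / 2**10
distribution = {k: Fraction(comb(FLIPS, k), 2**FLIPS) for k in range(FLIPS + 1)}

for k, p in distribution.items():
    print(f"{k:2d} heads: {float(p):.3f}")

p_exactly_5 = distribution[5]
p_7_or_more = sum(distribution[k] for k in range(7, FLIPS + 1))
print("exactly 5 heads:", float(p_exactly_5))   # about 0.246
print("7 or more heads:", float(p_7_or_more))   # about 0.172
```

Only about 1 run in 4 lands on exactly 5 heads, and about 17% of runs show 7 or more heads, so a 0.7 from ten flips is an ordinary result for a fair coin.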

Predict before you run

The curriculum's Skills Bullet 2 is explicit: predict the experimental probability of an event, using theoretical probability.

That direction, theoretical to experimental, is the working order. You compute the theoretical value first, then run the experiment, then check how close the experimental result is to your prediction. Doing it the other way (running first, computing theoretical after) misses the prediction step entirely.

The simulator below enforces predict-then-run, and it doesn't hand you the answer: each setup is described by its structure — "2 win sectors of 10" — and it's your job to turn that into a probability. The true value stays hidden until your first run, so your prediction stands alone on the plot until the data arrives.

Two more things live in there. A "new 10-trial experiment" button stacks fresh 10-trial strips side by side — run it several times and compare them before you trust any single strip. And the first big run will find your longest losing streak and make you an offer about the next flip. Take the bet seriously; it's the oldest trap in probability.

Predict, then run

Pick a setup. Work out P(win) from its structure — favourable over total — and enter your prediction. Then run trials and see how the data treats your number.

A predict-then-run probability simulator. The student picks a setup described only by its structure — a coin with 1 winning face of 2, a spinner with 2 win sectors of 10, a marble bag with 7 win marbles of 10, or a midway wheel with 9 win slots of 10 — computes and enters a prediction for P(win), then runs 10, 100, or 1000 trials. The true probability line stays hidden until the first run. A line graph plots cumulative experimental P(win); a separate panel stacks independent 10-trial experiment strips side by side to show small-sample variability; and after the first run the widget pauses on the longest losing streak and asks the student to bet on the next trial's probability, confronting the gambler's fallacy directly.

Three things to notice:

  1. At 10 trials, experimental P fluctuates wildly. With a fair coin, you might see anything from 0.2 to 0.8. The convergence hasn't kicked in.
  2. At 100 trials, experimental P for a fair coin usually lands within about 0.1 (10 percentage points) of the theoretical value, and often much closer. The convergence is visible.
  3. At 1000 trials, it lands within about 0.03 (3 percentage points) of the theoretical value roughly 19 times out of 20. The curve has flattened out, though it never stops wobbling entirely.
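A minimal simulation of the same pattern in Python, assuming a fair coin (P(win) = 0.5) and Python's `random` module as the source of randomness; `running_proportions` is a hypothetical helper, and the exact numbers printed depend on the seed:

```python
import random

def running_proportions(p_win, checkpoints, seed=None):
    """Simulate trials with P(win) = p_win and report experimental P at each checkpoint."""
    rng = random.Random(seed)
    wins = 0
    results = {}
    for trial in range(1, max(checkpoints) + 1):
        if rng.random() < p_win:      # one trial: a win with probability p_win
            wins += 1
        if trial in checkpoints:
            results[trial] = wins / trial
    return results

# Fair coin: 1 winning face of 2. Change the seed to get a different run.
for n, p in running_proportions(0.5, {10, 100, 1000}, seed=7).items():
    print(f"after {n:4d} trials: experimental P(win) = {p:.3f}")
```

Rerun it with different seeds: the 10-trial value jumps around from run to run, while the 1000-trial value stays close to 0.5.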

This is the law of large numbers in action. Over many independent trials, experimental probability converges to theoretical probability. The curriculum's Understanding captures it: "Over a large number of trials, experimental probability models theoretical probability."

The sum-to-one rule

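For any experiment, the probabilities of every possible outcome in the sample space must sum to exactly 1:

$$P(o_1) + P(o_2) + \cdots + P(o_n) = 1$$

where $o_1, \dots, o_n$ are the outcomes in the sample space.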

That gives a useful shortcut. If you know the probability of an event, the probability of its complement (the event NOT happening) is whatever's left over from 1:

$$P(\text{not } A) = 1 - P(A)$$

So if a weather forecast says P(rain) = 0.7, then P(no rain) = 1 − 0.7 = 0.3. You don't need to know what "no rain" covers in detail. Every outcome in the sample space is either rain or not-rain, and the two probabilities must sum to 1.

This works for any experiment. For a die, P(rolling a 4) = 1/6, so P(NOT rolling a 4) = 1 − 1/6 = 5/6. For a standard 52-card deck, P(drawing a heart) = 13/52 = 1/4, so P(drawing a non-heart) = 3/4. The rule doesn't care about the details of the experiment.
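The same bookkeeping in Python with exact fractions. Exact arithmetic matters here: with ordinary floats, `1 - 0.7` evaluates to `0.30000000000000004`, while the `Fraction` version gives exactly 3/10. The `complement` helper is illustrative:

```python
from fractions import Fraction

def complement(p):
    """P(not A) = 1 - P(A)."""
    return 1 - p

# Sum-to-one check: six equally likely die faces
die = {face: Fraction(1, 6) for face in range(1, 7)}
assert sum(die.values()) == 1

print(complement(Fraction(1, 6)))     # 5/6   not rolling a 4
print(complement(Fraction(13, 52)))   # 3/4   non-heart from a standard 52-card deck
print(complement(Fraction("0.7")))    # 3/10  no rain
```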

One trial is unpredictable; many are predictable

Here's the part that messes with intuition. The outcome of any single trial is unknown before it occurs (except for certain or impossible events, but those aren't interesting). You can compute the probability of heads on a fair coin to arbitrary precision, but you still can't say whether the NEXT flip will be heads or tails. Knowing the probability is not the same as knowing the outcome.

What probability DOES tell you: over many trials, the proportion of favourable outcomes will converge to the theoretical value. The aggregate is predictable; the individual trial isn't.

A common trap follows from confusing these two: after five tails in a row, the gambler's fallacy insists the coin is "due" to land on heads on the sixth flip. But each flip is independent. The coin has no memory; past results don't change the next trial's probability. After five tails, P(heads on the next flip) is still 0.5 for a fair coin. The law of large numbers works by dilution: over hundreds of later flips, a five-flip streak becomes a tiny share of the total, so the proportion drifts toward 0.5 without the next flip "correcting" anything.
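The "due for heads" claim can be tested directly. This Python sketch (a hypothetical helper built on the standard `random` module) flips a simulated fair coin 200,000 times, finds every flip that comes right after five tails in a row, and checks what fraction of those flips are heads:

```python
import random

def heads_after_tail_streak(num_flips, streak_length=5, seed=None):
    """Among flips that follow `streak_length` tails in a row, return (fraction heads, count)."""
    rng = random.Random(seed)
    flips = [rng.choice("HT") for _ in range(num_flips)]
    next_flips = [
        flips[i]
        for i in range(streak_length, num_flips)
        if all(f == "T" for f in flips[i - streak_length:i])
    ]
    if not next_flips:          # no streak found (only plausible for very short runs)
        return None, 0
    return next_flips.count("H") / len(next_flips), len(next_flips)

p, cases = heads_after_tail_streak(200_000, seed=1)
print(f"{cases} flips came right after 5 tails; {p:.3f} of them were heads")
```

If the coin "remembered" its streaks, that fraction would sit well above 0.5. Instead it lands near 0.5, across several thousand qualifying flips.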

Where it shows up in real life

A weather forecast that says "60% chance of snow tomorrow" can be read as an experimental probability, derived from historical data on days with similar conditions in Calgary: what fraction of them had snow the next day. Read that way, the "60%" is P_experimental = favourable / total, where each similar historical day is one trial.

A batting average in baseball is exactly the experimental probability of getting a hit at the plate. A .300 batting average means 30% of at-bats produced a hit. Across hundreds of trials, the experimental probability has stabilized to a value that's a good predictor of the next at-bat (but doesn't determine it).

Insurance pricing is built mostly on experimental probabilities of events like car accidents, house fires, or illness. Insurers compute these probabilities from the records of millions of policy-holders, treating each policy-holder's year as one trial, then set premiums so that the premiums collected cover expected payouts plus costs and profit. Probability isn't abstract here; it's how rates are set.

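A lottery is the inverse case: instead of paying to avoid a rare loss, the buyer pays for a rare win. Tickets are priced above their expected payout, meaning the theoretical probability of winning times the prize money is less than the ticket price. The math is identical to insurance, just running the other direction: on average, the seller collects more than it pays out.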
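To see the lottery arithmetic, here is a Python sketch with made-up numbers (a $2 ticket, a 1-in-1,000,000 chance, a $500,000 prize); they are hypothetical and not taken from any real lottery:

```python
from fractions import Fraction

# Hypothetical lottery: invented numbers for illustration only
ticket_price = 2                    # dollars
p_win = Fraction(1, 1_000_000)      # theoretical probability of winning
prize = 500_000                     # dollars

expected_payout = p_win * prize     # average payout per ticket over many tickets
print(expected_payout, float(expected_payout))                            # 1/2 0.5
print("average loss per ticket:", float(ticket_price - expected_payout))  # 1.5
```

Over many tickets, each one pays back 50 cents on average against a $2 price, and that gap is what the seller relies on.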

Worksheet

These aren't graded. Get them right, get them wrong. The goal is fluency with both formulas, the four representational forms, the sum-to-one rule, and the independence of trials.


MA.7.PRO.1

Practice the idea


Without rolling the die at all, you compute P(rolling a 4) on a fair six-sided die as 1/6. What kind of probability did you compute?

Multiple choice: identify which kind of probability was computed from structure alone.
Common mistakes

Student says

'The coin has landed on tails five times; heads is due.' (Gambler's fallacy.)

What it reveals

Treats trials as if they had memory. Each flip of a fair coin is independent; past results don't change the next trial's probability. The coin doesn't 'owe' you anything.

Targeted response

Each flip is independent. After five tails, P(heads on next flip) is still 0.5 for a fair coin. The law of large numbers works over hundreds of later flips by diluting the streak, not by correcting the next one. Use the simulator above with the coin setup: run 1000 trials and watch the proportion settle near 0.5, even when individual streaks look 'unfair' along the way.

Student says

Claims a coin is biased based on 12 flips. 'I got 10 heads and 2 tails. The coin is rigged.'

What it reveals

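Trusts small-sample experimental probability as reliable. With 12 flips, 10 heads is unusual for a fair coin but far from impossible: about 4% of 12-flip runs come out at least that lopsided toward one face or the other, so in a class of 30 students each flipping 12 times, about one such result is expected. Small samples are noisy; you need many trials to detect real bias.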

Targeted response

Stack a few 10-trial strips in the simulator above with the coin setup. The win counts bounce around from strip to strip. That's not bias; it's normal small-sample variability. To claim bias, you'd need hundreds or thousands of flips with experimental P consistently far from 0.5.
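For the 12-flip case, the chance of a result this lopsided can be counted exactly rather than estimated. A Python sketch using `math.comb` from the standard library; `prob_at_least` is a hypothetical helper:

```python
from fractions import Fraction
from math import comb

def prob_at_least(k, n):
    """P(at least k heads in n fair flips), counted exactly."""
    return Fraction(sum(comb(n, j) for j in range(k, n + 1)), 2**n)

p_heads = prob_at_least(10, 12)   # 10, 11, or 12 heads
p_either = 2 * p_heads            # or equally lopsided toward tails (the two can't overlap in 12 flips)
print(p_heads, float(p_heads))    # 79/4096, about 0.019
print(p_either, float(p_either))  # 79/2048, about 0.039
print(float(30 * p_either))       # expected lopsided results in a class of 30: about 1.16
```

Roughly 4 runs in 100 come out at least this lopsided, so across a class of 30 about one student should expect to see it with a perfectly fair coin.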

Student says

Mixes up the formulas: uses 'total trials' as the denominator for theoretical probability, or 'total outcomes in sample space' as the denominator for experimental.

What it reveals

The two formulas LOOK alike (both are favourable / something), but the denominator is different. Theoretical uses the count of POSSIBLE outcomes; experimental uses the count of ACTUAL trials run.

Targeted response

Theoretical = favourable / total possible outcomes (a calculation from the experiment's structure). Experimental = favourable observed / total trials run (a measurement from data). The denominators are conceptually different even though both formulas have the same shape. Always check: did the trial actually happen, or are we counting what COULD happen?

Going further

This is the end of the Grade 7 Probability strand, and the end of the Grade 7 mathematics curriculum as a whole. Together with Likelihood and Sample Space, you now have the full Grade 7 toolkit: name the sample space, identify favourable outcomes, compute theoretical probability from the structure, run trials to get experimental probability, and watch the two converge.

In Grade 8, compound events add the next layer. What's the probability of two events both happening, or at least one happening? Probability trees, multiplication rules for independent events, and addition rules for mutually exclusive events all build on the foundation you just laid.

In Grade 9, conditional probability asks: given that event A happened, what's the probability of event B? That's where most real-world probability lives: diagnostic test interpretation, weather forecasts conditional on yesterday's weather, financial risk assessments. The Grade 9 ideas only make sense on top of the Grade 7 toolkit.

The four representational forms (ratio, fraction, decimal, percentage) and the sum-to-one rule reappear in every later grade. Get fluent with both, and the next two years' worth of probability sits on stable ground.