Probability is the mathematical language of uncertainty. Grounded in set theory and measure theory, it provides the foundation for statistics, machine learning, information theory, and every rational approach to inference.
Sample Spaces and Events
Definition — Sample Space
The sample spaceΩ is the set of all possible outcomes of a random experiment. An event is any subset A⊆Ω.
Fair coin flip
Ω={H,T}
Event: A={H}
Rolling a die
Ω={1,2,3,4,5,6}
Event: A={2,4,6} (even)
Since events are sets, set operations apply: A∪B is the event that A or B occurs; A∩B is both; Ac is the event that A does not occur.
Kolmogorov's Axioms
Andrey Kolmogorov (1933) placed probability on a rigorous axiomatic foundation. A probability measure is a function P:F→[0,1] satisfying:
1.
Non-negativity
P(A)≥0 for all events A
2.
Normalization
P(Ω)=1
3.
Additivity
P(A∪B)=P(A)+P(B) if A∩B=∅
From these three axioms, all of probability theory follows. Key consequences:
P(∅)=0P(Ac)=1−P(A)P(A∪B)=P(A)+P(B)−P(A∩B)
Conditional Probability
Definition — Conditional Probability
The probability of event A given that event B has occurred (with P(B)>0):
P(A∣B)=P(B)P(A∩B)
Conditioning restricts the sample space to B and re-normalizes. The multiplication rule follows directly:
P(A∩B)=P(A∣B)P(B)=P(B∣A)P(A)
Law of Total Probability
If B1,B2,…,Bn partition Ω (mutually exclusive, exhaustive), then for any event A:
P(A)=i=1∑nP(A∣Bi)P(Bi)
Independence
Definition — Independence
Events A and B are independent if knowing B occurred gives no information about A:
P(A∩B)=P(A)P(B)equivalently,P(A∣B)=P(A)
Independence and mutual exclusivity are very different concepts. If P(A)>0 and P(B)>0, then A and B cannot be both independent and mutually exclusive.
Bayes' Theorem
Theorem — Bayes
P(A∣B)=P(B)P(B∣A)P(A)
Using the law of total probability to expand the denominator:
P(A∣B)=P(B∣A)P(A)+P(B∣Ac)P(Ac)P(B∣A)P(A)
Bayes' theorem is the engine of Bayesian inference: it tells us how to update a prior belief P(A) in light of new evidence B to obtain a posterior P(A∣B).
Classic example: a medical test for a disease with 1% prevalence. Sensitivity (true positive rate) = 99%, specificity (true negative rate) = 95%. If you test positive, Bayes' theorem gives P(disease∣+)≈16.7% — far lower than intuition suggests, because the disease is rare.
Random Variables
Definition — Random Variable
A random variableX is a function X:Ω→R that assigns a numerical value to each outcome. It is discrete if it takes countably many values; continuous if described by a probability density function (PDF).
Discrete: PMF
The probability mass function of a discrete random variable satisfies:
p(x)=P(X=x)≥0x∑p(x)=1
Continuous: PDF and CDF
A continuous random variable has a probability density functionf(x)≥0 with:
P(a≤X≤b)=∫abf(x)dx∫−∞∞f(x)dx=1
The cumulative distribution functionF(x)=P(X≤x) is non-decreasing, right-continuous, with F(−∞)=0 and F(∞)=1.
Expectation and Variance
Definition — Expected Value
The expected value (mean) of X is its probability-weighted average: