Consider an experiment E which can produce multiple outcomes, with uncertainty as to which one will be produced at a given time. Let the set of outcomes be denoted by Ω. [Note that E is not a function; there is no way to have uncertainty in the output of a function.]
Let Ω have a measure P defined on it (the probability of an event) and a σ-algebra of measurable subsets (the events). We require the following conditions: P(A) ≥ 0 for every event A; P(Ω) = 1; and countable additivity, P(A_{1} ∪ A_{2} ∪ ...) = P(A_{1}) + P(A_{2}) + ... for pairwise disjoint events A_{i}.
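These axioms can be checked directly on a finite toy space. A minimal sketch (the names `omega`, `P`, `events` are my own, not from the notes), taking Ω for one roll of a fair die with the uniform measure on its power set:

```python
from fractions import Fraction
from itertools import chain, combinations

# A toy probability space for one roll of a fair six-sided die.
# omega is the outcome set; P assigns a probability to every subset (event).
omega = frozenset({1, 2, 3, 4, 5, 6})

def P(event):
    """Uniform probability measure on subsets of omega."""
    return Fraction(len(event & omega), len(omega))

# Here the sigma-algebra is the full power set, so we can enumerate it.
events = [frozenset(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

assert all(P(e) >= 0 for e in events)   # non-negativity
assert P(omega) == 1                    # total mass is 1
# additivity on a pair of disjoint events
a, b = frozenset({1, 2}), frozenset({5})
assert P(a | b) == P(a) + P(b)
```

Using exact `Fraction` arithmetic keeps the axiom checks free of floating-point noise.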
A random variable is a map
X: Ω → ℝ
which is measurable (the preimage of every interval is an event).
So calling it a "variable" is technically incorrect, but that is nonetheless
a good way to think about it. It is suggestive of a number obtained as the
outcome of an experiment. For example, let E be rolling a standard six-sided
die, and let X be the value obtained. An elementary probability book might have
something like
P( X = 5 ) = ^{1}⁄_{6}

which, stated rigorously in terms of the preimage, is

P( X^{−1}( {5} ) ) = ^{1}⁄_{6}
Mathematically, there is no concept of "performing the experiment". We only deal with static relationships between sets:
Define a cumulative distribution function (cdf)

F_{X} : ℝ → [0, 1]
t ↦ P( X^{−1}( (−∞, t] ) )
f is the probability density function, or pdf (it is a "weight function", i.e. a weighted measure for integrating). It should really be written f_{X}, but that can be cumbersome. It exists for all "nice" problems, and from my point of view (and that of all elementary textbooks) it is the easiest thing to work with conceptually.
F(x) = ∫_{−∞}^{x} f(t) dt
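The cdf–pdf relation can be verified numerically. A sketch for the standard normal (my own helper names; the trapezoid rule and the −10 cutoff standing in for −∞ are assumptions of the sketch), comparing the integral of f against the closed form via the error function:

```python
import math

def f(t):
    """pdf of the standard normal distribution."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def F_numeric(x, lo=-10.0, n=20000):
    """F(x) ~ integral of f from -inf to x (trapezoid rule; lo stands in for -inf)."""
    h = (x - lo) / n
    s = 0.5 * (f(lo) + f(x)) + sum(f(lo + i * h) for i in range(1, n))
    return s * h

def F_exact(x):
    """Closed form of the standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

err = abs(F_numeric(1.0) - F_exact(1.0))
```

The two agree to high precision, which is exactly the statement F(x) = ∫ f for this "nice" problem.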
Expectation
E(X) = ∫_{−∞}^{∞} t · f(t) dt
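As a concrete check, the exponential distribution with rate λ has E(X) = 1/λ. A sketch (names and the numeric scheme are mine; the upper limit 20 stands in for ∞) computing the expectation integral for λ = 2:

```python
import math

lam = 2.0  # rate parameter; pdf f(t) = lam * exp(-lam * t) for t >= 0

def f(t):
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

def expectation(hi=20.0, n=50000):
    """E(X) ~ integral of t * f(t) dt over [0, hi] (hi stands in for infinity)."""
    h = hi / n
    s = 0.5 * (0.0 * f(0.0) + hi * f(hi)) + sum((i * h) * f(i * h) for i in range(1, n))
    return s * h

mu = expectation()   # should be close to 1/lam = 0.5
```

The pdf vanishes for t < 0, so the integral over (−∞, ∞) reduces to [0, ∞), and the exponential tail makes the truncation at 20 harmless.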
E(X) is called the expected value, but this term is misleading. It is not necessarily the value with the maximum probability weight; consider the exponential distribution. It is the "most probable value" in certain important special cases, like when the pdf is a symmetric bump function like the Normal distribution. A better term would be the "center" or "balance point" of the pdf (well, technically of the Real line, with points weighted by the pdf). Torque about the point μ = E(X) must equal 0:

∫_{−∞}^{∞} (t − μ) · f(t) dt = 0
Var(X) = ∫_{−∞}^{∞} (t − μ)^{2} · f(t) dt
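Both the zero-torque identity and the variance integral can be checked on the exponential example, where Var(X) = 1/λ². A sketch (helper names and the trapezoid scheme are mine; the upper limit 20 stands in for ∞):

```python
import math

lam = 2.0
mu = 1.0 / lam   # E(X) for the exponential distribution

def f(t):
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

def integrate(g, lo=0.0, hi=20.0, n=50000):
    """Trapezoid rule on [lo, hi]; hi stands in for infinity."""
    h = (hi - lo) / n
    s = 0.5 * (g(lo) + g(hi)) + sum(g(lo + i * h) for i in range(1, n))
    return s * h

torque = integrate(lambda t: (t - mu) * f(t))       # balance point: ~0
var = integrate(lambda t: (t - mu) ** 2 * f(t))     # ~ 1/lam**2 = 0.25
```

The first integral confirming ~0 is exactly the "torque about μ" statement; the second is the moment of inertia analogy for Var(X).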
E(X) and Var(X) are coarse measurements of a distribution! But they can tell us a lot of useful information.
Let X be an arbitrary Random Variable with

E(X) = μ, Var(X) = σ^{2}

Then (Chebyshev's inequality):

P( X ∉ (μ − t, μ + t) ) ≤ σ^{2} / t^{2} for any real t > 0
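The bound is coarse but universal. A simulation sketch (names and the choice of the uniform distribution are mine) comparing the empirical tail mass against σ²/t² for U(0, 1), where μ = 1/2 and σ² = 1/12:

```python
import random

random.seed(0)
n = 100000
# Uniform on [0, 1]: mu = 1/2, sigma^2 = 1/12.
mu, var = 0.5, 1.0 / 12.0
xs = [random.random() for _ in range(n)]

t = 0.4
empirical = sum(1 for x in xs if not (mu - t < x < mu + t)) / n
bound = var / t ** 2   # Chebyshev bound ~ 0.52

assert empirical <= bound
```

Here the true tail probability is 0.2 while the bound is about 0.52, illustrating how loose Chebyshev can be; its value is that it needs nothing beyond μ and σ².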
What does it mean for RVs to be independent? X and Y are independent when the events they generate are independent: P( X ∈ A and Y ∈ B ) = P( X ∈ A ) · P( Y ∈ B ) for all (Borel) sets A and B.
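A minimal sketch (names are mine) checking the product rule P(X ∈ A, Y ∈ B) = P(X ∈ A) · P(Y ∈ B) on the space of two fair die rolls, with X the first roll and Y the second:

```python
from fractions import Fraction
from itertools import product

# omega for two die rolls; the uniform measure makes the rolls independent.
omega = list(product(range(1, 7), repeat=2))
P = lambda event: Fraction(len(event), len(omega))

X = lambda w: w[0]   # first roll
Y = lambda w: w[1]   # second roll

A, B = {5}, {1, 2}
joint = P([w for w in omega if X(w) in A and Y(w) in B])
prodp = P([w for w in omega if X(w) in A]) * P([w for w in omega if Y(w) in B])
```

For this finite space one could enumerate all pairs (A, B) of subsets of {1, ..., 6} and verify the factorization holds for every one.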
Law of Large Numbers: for independent, identically distributed X_{1}, X_{2}, ... with E(X_{i}) = μ,

^{1}⁄_{n} ∑_{i=1}^{n} X_{i} → μ (in some sense)
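The convergence is easy to see by simulation. A sketch (function names are mine) averaging simulated die rolls, where μ = 3.5:

```python
import random

random.seed(1)
mu = 3.5   # E(X) for one fair die roll

def sample_mean(n):
    """Average of n simulated independent die rolls."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

small, large = sample_mean(100), sample_mean(200000)
# The larger sample's mean is typically much closer to mu.
```

With n = 200000 the standard deviation of the sample mean is about 1.71/√n ≈ 0.004, so the average lands very near 3.5; at n = 100 it still wanders noticeably.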
Weak law: the convergence is in probability.
Strong law: the convergence is almost sure.
The important properties of covariance are:
Cov(X, Y) = Cov(Y, X) (symmetry)
Cov(aX + bY, Z) = a · Cov(X, Z) + b · Cov(Y, Z) (bilinearity)
Cov(X, X) = Var(X)
X, Y independent ⇒ Cov(X, Y) = 0 (the converse fails)
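These properties can be exercised on simulated data. A sketch (names and the sample-covariance 1/n convention are mine) checking symmetry, Cov(X, X) = Var(X), and bilinearity in the first argument:

```python
import random

random.seed(2)
n = 50000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance (1/n convention)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

# Symmetry and Cov(X, X) = Var(X)
assert abs(cov(xs, ys) - cov(ys, xs)) < 1e-12
var_x = cov(xs, xs)

# Bilinearity in the first slot: Cov(aX + b, Y) = a * Cov(X, Y)
a, b = 3.0, 7.0
lhs = cov([a * x + b for x in xs], ys)
rhs = a * cov(xs, ys)
```

Symmetry and bilinearity hold exactly (up to floating point) for sample covariance; the independence property shows up as cov(xs, ys) being near 0 here, since the two samples were drawn separately.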