DFAs require exactly one transition from every state on every character (δ is a function). While processing a string, there is only one place you can go at every step.
NFAs allow any number of transitions from a state on a given character. The machine accepts a string x if there is any possible path from the start state to a final state using x.
One way of thinking of nondeterminism is that the machine gets to "magically" guess which path to take at any point.
Another way is that it explores all possible paths and accepts if any of them ends in a final state.
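The "explore all possible paths" view can be sketched in Python. This is only an illustration under stated assumptions: the example NFA (states q0, q1, q2 over {a, b}, accepting strings that end in "ab"), the dictionary encoding of δ, and the name accepts_by_paths are all invented for this example and do not come from the notes.

```python
# Hypothetical NFA accepting strings over {a, b} that end in "ab".
# DELTA maps (state, character) to the set of possible next states;
# a missing key means there is no transition (the empty set).
DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
START = "q0"
FINAL = {"q2"}


def accepts_by_paths(state, remaining):
    """Return True if some path from `state` that reads `remaining` ends in a final state."""
    if not remaining:
        return state in FINAL
    first, rest = remaining[0], remaining[1:]
    # Try every possible next state; a single successful path is enough.
    next_states = DELTA.get((state, first), set())
    return any(accepts_by_paths(nxt, rest) for nxt in next_states)


print(accepts_by_paths(START, "aab"))  # True
print(accepts_by_paths(START, "aba"))  # False
```

Trying every branch can take exponential time in the worst case. The sketch is meant to mirror the definition of acceptance, not to be an efficient simulator.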
Formally, the only difference between an NFA and a DFA is that in an NFA, δ is a function from Q × Σ to 2^Q, the set of all subsets of Q (instead of to Q). δ(q, a) outputs the set of states that can be reached by taking a transition labeled a from state q.
The extended transition function serves the same purpose as with a DFA. The definition is a bit more involved, but follows the same logic: to compute δ^(q, xa), first recursively find all of the states you can reach from q using string x, that is, compute δ^(q, x). Then, from each of these states, collect all of the states you can reach using a single transition on a. Formally:
δ^(q, xa) = ⋃_{qʹ ∈ δ^(q, x)} δ(qʹ, a)
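The recursive definition can be written almost directly as code. The sketch below rests on some assumptions: the base case δ^(q, ε) = {q} is the usual one for NFAs without ε-transitions but is not stated in the text above, and the example NFA, the dictionary encoding of δ, and the name delta_hat are hypothetical.

```python
# Hypothetical NFA over {a, b}: (state, character) -> set of next states.
# A missing key stands for the empty set of next states.
DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
FINAL = {"q2"}


def delta_hat(delta, q, w):
    """Return the set of states reachable from q by reading the string w."""
    if w == "":
        # Assumed base case: reading nothing leaves the machine in q.
        return {q}
    x, a = w[:-1], w[-1]  # split w into the prefix x and the last character a
    reachable = set()
    # Union of delta(q', a) over every q' in delta_hat(q, x).
    for q_prime in delta_hat(delta, q, x):
        reachable |= delta.get((q_prime, a), set())
    return reachable


print(sorted(delta_hat(DELTA, "q0", "ab")))   # ['q0', 'q2']
print(bool(delta_hat(DELTA, "q0", "ab") & FINAL))  # True: some reachable state is final
```

With this function, the NFA accepts a string w exactly when δ^(start, w) contains at least one final state, which is what the last line checks for "ab".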