As a computer scientist, I was introduced to tensors as "multidimensional arrays" in a class discussing representations of multi-channel media such as images. Like most common first approaches to the topic, this one is practically sufficient but conceptually incomplete. The truth is that this incompleteness is seldom resolved in the remainder of most students' academic journey. Here, I offer a resolution.
Let us revisit the opening scene of _Linear Algebra Done Right_:
> Linear algebra is the study of linear maps on finite-dimensional vector spaces.
Another thing to note (apart from the restriction to finite dimensions) is the lack of matrices and vectors in this description. This section will live up to this withholding; we will first look for a linear-map-centric view of the objects in linear algebra and, afterwards, gain an understanding of tensor spaces.
#### Vectors and Matrices
I assert that $v$ and $M$ are _both_ matrices, each of which simultaneously identifies a vector and a linear map. For richer support, I will establish three resources below and explain their relationship afterward.
{{% hint title="3.8. Note" %}}
The set of linear maps from a vector space $V$ over the field $\mathbb{F}$ to another vector space $W$ (over the same field) forms a vector space over $\mathbb{F}$. That is,

$$
\mathcal{L}(V, W) = \left\{ \, T : V \to W \;\; \big| \;\; T \text{ is linear} \right\}
$$

is a vector space over $\mathbb{F}$. We denote the case of linear operators on $V$ as $\mathcal{L}(V) = \mathcal{L}(V, V)$.

{{% /hint %}}
323
327
324
-
{{% hint title="3.9. Remark" %}}
328
+
{{% hint title="3.9. Note" %}}
There is a bijection between $\mathcal{L}(V, W)$ and $\mathbb{F}^{(\dim W) \times (\dim V)}$, where $V$ and $W$ are finite-dimensional vector spaces over the same field $\mathbb{F}$. In other words, for each linear map $T$ from a vector space of dimension $n$ to another of dimension $m$, there is exactly one $m$-by-$n$ matrix with entries in $\mathbb{F}$.
{{% /hint %}}
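To see 3.9 in a computational light, here is a minimal numpy sketch (the map `T` below is a made-up example, not anything from the text): the matrix representing a linear map is recovered by applying the map to each basis vector and collecting the results as columns.

```python
import numpy as np

# A hypothetical linear map T : R^3 -> R^2.
def T(x):
    return np.array([x[0] + 2 * x[1], 3 * x[2]])

# Recover the unique 2-by-3 matrix representing T in the standard
# bases by applying T to each standard basis vector of R^3.
M = np.column_stack([T(e) for e in np.eye(3)])

x = np.array([1.0, 2.0, 3.0])
assert np.allclose(M @ x, T(x))  # the matrix reproduces the map
```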
{{% hint title="3.10. Note" %}}
A vector $v$ in a space $V$ over $\mathbb{F}$ can be regarded as a linear map from the space $\mathbb{F}^1$ into $V$, via
$$
\psi_v : \mathbb{F}^1 \to V, \;\; \psi_v(\lambda) = v \lambda.
$$
When a basis for $V$ is fixed, the map $\psi_v$ is represented by an $n \times 1$ matrix (as an instance of note 3.9). This matrix is the familiar column of "coordinates" of $v$. In particular, observe that scalar multiplication can be seen as matrix multiplication with a single-dimensional vector,

$$
v \lambda =
\begin{bmatrix}
v_1 \\
\vdots \\
v_n \\
\end{bmatrix}
\begin{bmatrix}
\lambda \\
\end{bmatrix}.
$$

{{% /hint %}}
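As a quick numpy illustration of 3.10 (with made-up numbers), a vector stored as an $n \times 1$ matrix really can be scalar-multiplied via matrix multiplication with a $1 \times 1$ matrix:

```python
import numpy as np

v = np.array([[2.0], [5.0], [7.0]])  # psi_v as a 3-by-1 matrix
lam = np.array([[3.0]])              # the scalar, as a 1-by-1 matrix

# Scalar multiplication is literally matrix multiplication here.
assert np.allclose(v @ lam, 3.0 * v)
```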
$$
\text{Matrices} \cong^! \text{Linear maps}.
$$
Indeed, a vector can be turned into a matrix solely through its identifiability as a linear map, turning the linear map into the central object of our understanding of linear algebra. In summary, vectors, matrices, and linear maps all identify one another.
The statement of 3.10 is nuanced. Consider the case of a linear map $T \in \mathcal{L}(V, W)$ represented by a matrix $M$ (under fixed bases). According to 3.8 that map is a vector, but according to 3.10 it is identified by some _other_ linear map $\psi_T$, which is in turn identified by some _other_ column matrix $M^\prime$,

$$
M \xleftarrow{\; \Phi_\mathcal{B} \;} T \xrightarrow{\; \Psi_U \;} \psi_T \xrightarrow{\; \Phi_{\mathcal{B}^\prime} \;} M^\prime.
$$
In this diagram, the basis-conscious bijection $\Phi_\mathcal{B} : \mathcal{L}(V, W) \to \mathbb{F}^{(\dim W) \times (\dim V)}$ lives up to 3.9, but we silently adopted the canonical translation $\Psi_U$ of arbitrary vectors into linear maps in 3.10,
$$
\Psi_U : U \to \mathcal{L}(\mathbb{F}^1, U) \;\; \text{s.t.} \;\; \Psi_U(u) = \psi_u \; \forall \, u \in U.
$$
{{% hint title="3.11. Note" %}}
This makes sense when the vector space $U$ is finite-dimensional, as is the case whenever $U = \mathcal{L}(V, W)$ for finite-dimensional $V$ and $W$; the fact that we always interpret finite-dimensional vectors as column matrices is what makes this case of $\Psi_U$ "canonical." In other cases where matrix representations make no sense (e.g. the linear map of the Fourier transform $\mathcal{F}$ from 3.6), the choice of $\Psi_U$ will have to be more conscientious.
{{% /hint %}}
Consider now, for a vector space $U$ over $\mathbb{F}$, the set of all linear maps from $U$ into $\mathbb{F}^1$,

$$
U^* = \{ \, \varphi : U \to \mathbb{F}^1 \; | \; \varphi \text{ is linear} \}.
$$
This is called the [dual vector space](https://en.wikipedia.org/wiki/Dual_space) of $U$, which receives the special notation $U^\*$ due to how naturally it arises. Much like elements of $U$ can be represented by column matrices, the elements of $U^\*$ (which are called covectors) can be represented by row matrices (wherever matrices make sense).
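In numpy terms (a small sketch with made-up entries), applying a covector to a vector is a row-times-column product whose result is a $1 \times 1$ matrix, i.e. an element of $\mathbb{F}^1$:

```python
import numpy as np

v = np.array([[1.0], [2.0], [3.0]])  # a vector, as a 3-by-1 column
phi = np.array([[4.0, 5.0, 6.0]])    # a covector, as a 1-by-3 row

print(phi @ v)  # [[32.]] -- a 1-by-1 matrix, an element of F^1
```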
{{% hint title="3.13. Note" %}}
In any Hilbert space $H$, for every continuous linear form $\varphi \in H^*$, there is a unique vector $u \in H$ with

$$
\varphi(v) = \langle v, u \rangle \quad \forall \, v \in H.
$$

Furthermore, $||u|| = ||\varphi||$. This gives a natural identification between $H$ and $H^*$.
{{% /hint %}}
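In the finite-dimensional real case with the standard dot product, 3.13 is almost anticlimactic: the representative of a linear form is just the transpose of its row matrix. A small numpy sketch (entries made up):

```python
import numpy as np

phi = np.array([[4.0, 5.0, 6.0]])  # a linear form, as a row matrix
u = phi.T                          # its Riesz representative

v = np.array([[1.0], [2.0], [3.0]])
assert np.allclose(phi @ v, u.T @ v)  # phi(v) = <u, v>
assert np.isclose(np.linalg.norm(u), np.linalg.norm(phi))  # ||u|| = ||phi||
```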
It is not difficult to see that for any vector space $V$ over a field $\mathbb{F}$, in fact, $\Psi_V$ and $\Psi_V^\*$ are both bijections. Since they are canonical (in the sense that they are naturally defined), it is only right to assert that

$$
\text{Vectors} \cong^! \text{Linear maps} \cong^! \text{Matrices}.
$$

This is with the understanding that $m$-by-$n$ matrices with entries in $\mathbb{F}$ stand in for the linear maps they represent wherever such representations make sense.
We may also extend linear forms with the same treatment that led us to multilinear maps. In particular, if an $n$-linear map over a field $\mathbb{F}$ has the codomain of $\mathbb{F}$ itself, it is called an $n$-linear form (where all maps like this are called [multilinear forms](https://en.wikipedia.org/wiki/Multilinear_form)).
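For concreteness, here is a minimal numpy sketch of a $2$-linear (bilinear) form: a map $h : \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}$ encoded by a made-up matrix $A$ via $h(v, w) = v^\top A w$, linear in each argument separately.

```python
import numpy as np

# A hypothetical bilinear form h(v, w) = v^T A w.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

def h(v, w):
    return v @ A @ w

v, w = np.array([1.0, 2.0]), np.array([1.0, 0.0, 1.0])
# Linear in v for fixed w, and in w for fixed v:
assert np.isclose(h(2 * v, w), 2 * h(v, w))
assert np.isclose(h(v, 2 * w), 2 * h(v, w))
```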
#### Tensor Product
If one has two vector spaces $V$ and $W$ over the same field, one can naturally talk about their Cartesian product $V \times W$ (as we have been doing in the case of multilinear maps). But instead of doing that, one can talk about a third vector space $V \otimes W$ (called the [tensor product](https://en.wikipedia.org/wiki/Tensor_product) of $V$ and $W$) which has just as much expressiveness as $V \times W$ while interacting more gracefully with linear maps: multilinear maps on $V \times W$ become linear maps on $V \otimes W$.
{{% hint title="3.15. Note" %}}
The tensor product $V \otimes W$ comes equipped with a bilinear map $\varphi : V \times W \to V \otimes W$ such that, for each bilinear map $h : V \times W \to Z$ (into another vector space $Z$), there is a unique linear map $\tilde h : V \otimes W \to Z$ with $h = \tilde h \circ \varphi$. This is referred to as the [universal property](https://en.wikipedia.org/wiki/Universal_property) of the tensor product, which justifies the phrase "just as much expressiveness."
{{% /hint %}}
Every tensor product $V \otimes W$ is equipped with such a bilinear map $\varphi : V \times W \to V \otimes W$ that allows the construction of vectors in $V \otimes W$. The [outer product](https://en.wikipedia.org/wiki/Outer_product) is an example of this in finite dimensions, but in other cases one must be more creative. Confusingly, this map $\varphi$ is also called a tensor product, and $\otimes$ is predominantly used instead of $\varphi$ in notation. Summarizing,
$$
\forall (v, w) \in V \times W, \; \varphi(v, w) = v \otimes w \quad \text{where} \quad v \otimes w \in V \otimes W.
$$
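In finite dimensions, numpy's outer product realizes this map directly; a small sketch with made-up vectors:

```python
import numpy as np

v = np.array([1.0, 2.0])       # an element of R^2
w = np.array([3.0, 4.0, 5.0])  # an element of R^3

vw = np.outer(v, w)  # v (x) w, living in the 6-dimensional R^2 (x) R^3
print(vw.shape)      # (2, 3)

# Bilinearity of the tensor product map:
assert np.allclose(np.outer(2 * v, w), 2 * vw)
assert np.allclose(np.outer(v, 2 * w), 2 * vw)
```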
{{% hint title="3.16. Note" %}}
Even if the tensor product among vectors is not strictly commutative, there are canonical isomorphisms among permutations of tensor products of vector spaces. That is, for any permutation $\sigma$,

$$
V_1 \otimes V_2 \otimes \cdots \otimes V_n \cong V_{\sigma(1)} \otimes V_{\sigma(2)} \otimes \cdots \otimes V_{\sigma(n)}.
$$

Due to this symmetry, we often write tensor products as if they were commutative without loss of generality. But when one talks about actual computation or representations, order does matter.
{{% /hint %}}
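With outer products standing in for $\otimes$, the canonical isomorphism that swaps the two factors is just an axis permutation (a transpose), as this small numpy sketch shows:

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, 4.0, 5.0])

# The isomorphism V (x) W ~= W (x) V sends v (x) w to w (x) v;
# on representations, it permutes (transposes) the axes.
assert np.allclose(np.outer(v, w).T, np.outer(w, v))
```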
Taken together, 3.15 and 3.16 show that the tensor product is precisely designed to "linearize" multilinear maps. To elaborate, for any multilinear map $h : V_1 \times V_2 \times \dots \times V_n \to W$, there exists a unique linear map
$$
\tilde h : \bigotimes_i V_i \to W \;\; \text{s.t.} \;\; \tilde h(v_1 \otimes \cdots \otimes v_n) = h(v_1, \ldots, v_n).
$$
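A minimal numpy sketch of this linearization (the bilinear map and its entries are made up): the bilinear $h(v, w) = v^\top A w$ and its linearized counterpart $\tilde h$, which eats the single argument $v \otimes w$, agree everywhere.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

def h(v, w):            # bilinear in (v, w)
    return v @ A @ w

def h_tilde(t):         # linear in its single argument t in R^2 (x) R^3
    return np.sum(A * t)

v, w = np.array([1.0, 2.0]), np.array([1.0, 0.0, 1.0])
assert np.isclose(h(v, w), h_tilde(np.outer(v, w)))
```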
#### More Matrices
Now that every multilinear map can be identified with a unique linear map on a tensor product space, 3.8, 3.9, and 3.10 apply to tensor product spaces as well. I will reiterate the notes, dressed specifically for the case of tensor product spaces.
{{% hint title="3.17. Specialization of 3.8" %}}
The set of linear maps from a tensor product space $\bigotimes_i V_i$ over the field $\mathbb{F}$ to another vector space $W$ over $\mathbb{F}$ forms a vector space over $\mathbb{F}$. Symbolically,
$$
\mathcal{L}({\textstyle\bigotimes}_i V_i, W) = \left\{ \, T : \bigotimes_i V_i \to W \;\; \bigg| \;\; T \text{ is linear} \right\}
$$
is a vector space over $\mathbb{F}$. We denote the operator case as $\mathcal{L}(\bigotimes_i V_i) = \mathcal{L}(\bigotimes_i V_i, \\, \bigotimes_i V_i)$.
{{% /hint %}}
{{% hint title="3.18. Specialization of 3.9" %}}
There is a bijection between $\mathcal{L}(\bigotimes_i V_i, W)$ and $\mathbb{F}^{(\dim V_1) \times \cdots \times (\dim V_n) \times (\dim W)}$ when the $V_i$ and $W$ are finite-dimensional vector spaces over $\mathbb{F}$. That is, for each linear map from a tensor product over spaces $V_1, \\, \ldots, \\, V_n$ and into $W$ (all over a field $\mathbb{F}$), there is exactly one matrix with axis lengths $(\dim V_1, \\, \ldots, \\, \dim V_n, \\, \dim W)$ and entries in $\mathbb{F}$.
{{% /hint %}}
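As a numpy sketch of 3.18 (dimensions and entries made up): a linear map in $\mathcal{L}(\mathbb{R}^2 \otimes \mathbb{R}^3, \mathbb{R}^4)$ stored as an array with axis lengths $(2, 3, 4)$, applied with `einsum`:

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((2, 3, 4))  # shape (dim V1, dim V2, dim W)

v = np.array([1.0, 2.0])
w = np.array([3.0, 4.0, 5.0])

# Apply T to the vector v (x) w; the result lives in R^4.
out = np.einsum('ijk,ij->k', T, np.outer(v, w))
print(out.shape)  # (4,)
```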
Often, $(\dim V_1, \\, \ldots, \\, \dim V_n, \\, \dim W)$ from 3.18 is referred to as the shape of the matrix. Each entry of the shape tuple can be viewed as the side length of a pictographical embedding of the matrix in $\mathbb{R}^{n + 1}$. For example, the matrix in $\mathbb{R}^{2 \times 3}$
$$
M =
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
\end{bmatrix}
$$
is said to have shape $(2, 3)$ -- once it is "drawn on paper," its "side lengths" are $2$ and $3$. Keeping the spirit of "pictographical" representation, people often call this a $2$-dimensional array, as it can be neatly "drawn" in two dimensions. But concretely, this matrix identifies a vector in a $6$-dimensional vector space. This ambiguity is often resolved by calling each shape entry an "axis" instead of a dimension, understanding that it refers to the visual axis of $\mathbb{R}^n$ where we would pictographically embed it.
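numpy exposes exactly these three quantities, which is a handy way to keep them apart (a trivial sketch):

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

print(M.shape)  # (2, 3) -- the "side lengths" of the drawing
print(M.ndim)   # 2      -- the number of axes
print(M.size)   # 6      -- the dimension of the space M identifies a vector in
```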
{{% hint title="3.19. Note" %}}
Software libraries that represent matrices often make the choice of calling them either $n$-dimensional arrays or simply tensors in an effort to maintain generality. Arguably, these terms are both misnomers. Personally, I think they should have just called them all matrices.
{{% /hint %}}
{{% hint title="3.20. Note" %}}
Appending a trailing $1$ to the shape of a matrix does not change the underlying object. Concretely, a matrix of shape $(\alpha_1, \, \ldots, \, \alpha_n)$ can be naturally identified with one of shape $(\beta_1, \, \ldots, \, \beta_m)$ whenever $\prod_i \alpha_i = \prod_i \beta_i$. This is a result of canonical isomorphisms such as $\bigotimes_i V_i \cong \mathbb{F}^{\prod_i \dim V_i}$.

{{% /hint %}}
Some bijections of the form $f : \bigotimes_i A_i \to \bigotimes_i B_i$ are often referred to as "reshapings." This concept is hard to formalize, but pictographically graspable. For example, this is one way to reshape $M$:
$$
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
\end{bmatrix}
\xrightarrow{f}
\begin{bmatrix}
1 & 2 & 3 & 4 & 5 & 6 \\
\end{bmatrix}
\xrightarrow{g}
\begin{bmatrix}
1 & 2 \\
3 & 4 \\
5 & 6 \\
\end{bmatrix}.
$$
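In numpy (whose default memory layout is row-major, matching the picture), the reshapings $f$ and $g$ above are one-liners:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

flat = M.reshape(6)        # f : shape (2, 3) -> shape (6,)
back = flat.reshape(3, 2)  # g : shape (6,) -> shape (3, 2)
print(back)
# [[1 2]
#  [3 4]
#  [5 6]]
```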
{{% hint title="3.21. Specialization of 3.10" %}}
Every vector $v$ in a tensor product space $\bigotimes_i V_i$ can be seen as a linear map from the space $\mathbb{F}^1$ into $\bigotimes_i V_i$ through the definition

$$
\psi_v : \mathbb{F}^1 \to \bigotimes_i V_i, \;\; \psi_v(\lambda) = v \lambda.
$$

With a basis for $\bigotimes_i V_i$ fixed, the map $\psi_v$ can be represented by a matrix of shape $(1, \\, \prod_i \dim V_i)$ (as an instance of note 3.18). However, in the context of tensor product spaces, it is common to represent $\psi_v$ using a matrix of shape $(\dim V_1, \\, \ldots, \\, \dim V_n)$ (invoking 3.20). This is done to facilitate descriptions of computations involving the vector in question.
{{% /hint %}}
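A quick numpy sketch of 3.21's two representations (vectors made up): the column matrix of shape $(6, 1)$ and the customary shape-$(2, 3)$ matrix are reshapes of one another.

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, 4.0, 5.0])

custom = np.outer(v, w)        # shape (2, 3), as in 3.21
column = custom.reshape(6, 1)  # psi_v as a column matrix
assert np.allclose(column.reshape(2, 3), custom)
```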
#### Actual Tensors
Many people refer to vectors in tensor product spaces as tensors, especially in computationally-oriented scientific disciplines. This population has recently grown in number (and maybe even into a majority) thanks to the increasing availability of efficient computers and their applications. But traditionally, a [tensor](https://en.wikipedia.org/wiki/Tensor_(intrinsic_definition)) is a multilinear map associated with a vector space $V$ over $\mathbb{F}$ of the form

$$
T : (\times^m \, V^*) \times (\times^n \, V) \to \mathbb{F}.
$$

Here, $(m, \\, n)$ is called the "type" of the tensor $T$. Perhaps the most important restriction of this definition is that we are only talking about a single vector space $V$, making it invalid to call a map $\mathbb{R}^2 \otimes \mathbb{R}^3 \to \mathbb{R}$ a tensor (without first invoking some isomorphism).
{{% hint title="3.22. Note" %}}
{{% /hint %}}
{{% hint title="3.23. Note" %}}
The vector-valued tensor product $\otimes : V \times W \to V \otimes W$ provides the isomorphism
$$
\{f : V \times W \to Z \; | \; f \text{ is bilinear} \} \cong \{g : V \otimes W \to Z \; | \; g \text{ is linear} \}
$$
as a corollary of 3.15. Furthermore, it is canonical in the sense that the vector-valued tensor product $\otimes$ is fixed once and for all, with no choice of basis involved. In the case of $V = W$, $\\; \\{g : V \otimes V \to \mathbb{F}^1 \\; | \\; g \text{ is linear} \\} = \mathcal{L}(V \otimes V, \mathbb{F}^1) = (V \otimes V)^* \cong V^* \otimes V^*$. With this we can see that, for all instances of the tensor definition above, the tensor $T$ may equivalently be regarded as a vector in a tensor product space built from copies of $V$ and $V^*$.

{{% /hint %}}

{{% hint title="3.24. Note" %}}

Let us observe the types of some known tensors of form $T : (\times^n \\, V) \to \mathbb{F}$.
1. Any scalar $\lambda : \varnothing \to \mathbb{F}$ is a tensor of type $(0, \\, 0)$.
2. Any linear form $\varphi : v \mapsto \varphi(v)$ (i.e. any covector) is a tensor of type $(0, \\, 1)$.
3. Any bilinear form $b : (v, w) \mapsto v^\top A w$, such as the one underlying a quadratic form $q : v \mapsto v^\top A v$, is a tensor of type $(0, \\, 2)$.
4. Any inner product $\langle \cdot, \cdot \rangle : (v, w) \mapsto \langle v, w \rangle$ is a tensor of type $(0, \\, 2)$.
{{% /hint %}}
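As a computational aside (a sketch under the standard dot product on $\mathbb{R}^3$), a type-$(0, 2)$ tensor like the inner product is stored with shape $(3, 3)$ and applied to its two vector arguments by contraction:

```python
import numpy as np

g = np.eye(3)  # the standard inner product as a type-(0, 2) tensor

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

inner = np.einsum('i,ij,j->', v, g, w)  # contract both arguments
assert np.isclose(inner, v @ w)
```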
All the tensors in 3.24 are scalar-valued. Now I will show how we can see that vector-valued linear maps are also tensors. Consider a tensor-valued multilinear map of the form

$$
T : (\times^a \, V) \times (\times^b \, V^*) \to \left( \otimes^c \, V \right) \otimes \left( \otimes^d \, V^* \right).
$$
By showing that $T$ identifies a tensor, we will also show that an arbitrary linear map defined using a single vector space $V$ identifies a tensor (including tensor-valued maps, via 3.23). First, we define

$$
g(v_1, \ldots, v_a, \, w_1, \ldots, w_b, \, \varphi) = \varphi \left( T(v_1, \ldots, v_a, w_1, \ldots, w_b) \right),
$$
where $v_i \in V$, $w_i \in V^\*$, and $\varphi \in (\left( \otimes^c \\, V \right) \otimes \left( \otimes^d \\, V^* \right))^*$. Thanks to 3.15, there is a unique linear map $\tilde g$ on the corresponding tensor product space that agrees with $g$ on pure tensors.