Orthogonal Projections on Hyperplanes Intertwined With Unitaries

Fix a point in a finite-dimensional complex vector space and consider the sequence of iterates of this point under the composition of a unitary map with the orthogonal projection on the hyperplane orthogonal to the starting point. We prove that, generically, the series of the squared norms of these iterates sums to the dimension of the underlying space. This leads us to construct a (device-dependent) dimension witness for quantum systems which involves the probabilities of obtaining certain strings of outcomes in a sequential yes-no measurement. The exact formula for this series in non-generic cases is provided as well as its analogue in the real case.


1. Results
Alice and Bob live in two antipodal cities, say Alaejos in Spain (A for Alice) and Wellington in New Zealand (B for Bob), lying on the latitudes ϕ ≈ 41° N and S, respectively. Alice, an addicted traveller, sets off from A and moves eastward along the parallel to some point C. By λ ∈ [0, 2π) we denote the difference of longitudes (in the sense of [9, Problem VIII, p. 170]) of A and C. At C, Alice tosses a biased coin to choose her destination, deciding to either return to A or travel to B. We assume that the coin's bias is such that her odds of returning home are inversely proportional to the ratio of the squared (Euclidean) distances between C and the potential destinations A and B. That is, putting p for the probability of Alice going from C to A, we have p/(1 − p) = |CB|^2/|CA|^2. Hence, Alice has a tendency to go to the place closer to her current location. In fact, since the angle at C subtends a diameter of the (unit) globe, we have |CA|^2 + |CB|^2 = |AB|^2 = 4, and so |CA| = 2√(1 − p) and |CB| = 2√p. To express p in terms of the central angles, one can apply the haversine and havercosine functions, so appreciated by navigators of all ages (haversin x := sin^2(x/2), havercosin x := cos^2(x/2)). From the law of cosines it follows that p = havercosin(θ_CA) and 1 − p = havercosin(θ_CB), where θ_CA and θ_CB are the central angles between C and A, and between C and B, respectively. To express p in terms of geographic coordinates, we call on the renowned haversine formula [11], obtaining p = 1 − cos^2 ϕ · haversin λ.
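The consistency of the two expressions for p can be checked numerically. The sketch below (plain Python; the concrete longitude value is our arbitrary choice) places A, B, and C on the unit sphere and compares p := |CB|^2/4 with the haversine expression.

```python
import math

def sph(lat, lon):
    # Cartesian coordinates of a point on the unit sphere
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def dist2(u, v):
    # squared Euclidean (chordal) distance
    return sum((a - b) ** 2 for a, b in zip(u, v))

phi = math.radians(41.0)   # latitude of A (B lies at -phi, antipodally)
lam = 1.2                  # difference of longitudes of A and C (arbitrary)

A = sph(phi, 0.0)
B = sph(-phi, math.pi)     # the antipode of A
C = sph(phi, lam)

# Since |CA|^2 + |CB|^2 = |AB|^2 = 4 and p/(1-p) = |CB|^2/|CA|^2,
# we get p = |CB|^2 / 4.
p = dist2(C, B) / 4.0

# Haversine formula from the text: p = 1 - cos^2(phi) * haversin(lam)
haversin = lambda x: math.sin(x / 2) ** 2
p_hav = 1.0 - math.cos(phi) ** 2 * haversin(lam)
```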
Should fate send Alice back to A, her trip is complete and she is done with travelling (at least for some time). Assume she finds herself at B. As much as she loves visiting Bob, sooner or later she needs to get back home. So one day Alice departs to the east along the parallel, arriving at point D such that the difference of longitudes of B and D is again equal to λ, i.e., D is antipodal to C. However, once at D, she decides on her destination in the same manner as before; namely, she goes from D to A with probability havercosin(θ DA ) = 1 − p, or to B with probability havercosin(θ DB ) = p, where θ DA (= θ CB ) and θ DB (= θ CA ) are the central angles between D and A, and between D and B, respectively. In the latter case, having spent a few extra days at Bob's place, our vacillating traveller again makes a journey to D, repeating this procedure until eventually returning to A.
One can now ask: what is the average number of times Alice will visit Bob before getting home? Somewhat surprisingly, the answer depends neither on ϕ, i.e., on the location of the antipodal cities, nor on λ, and it is always 1, unless ϕ = ±π/2 or λ = 0, in which case all Alice's adventures are imaginary. Indeed, generically, we deal here with an irreducible two-state (A and B) symmetric Markov chain in which the mean return time to A is equal to 2, see Fig. 2.
From the characters' names one might get the impression that (quantum) information theory is somehow involved here, and this is indeed the case. Namely, let us replace the globe with the unit Bloch sphere S^2, which is isomorphic to CP^1 [6, p. 61]. For |z⟩ ∈ CP^1 we denote the corresponding element of S^2 by r_z. We then have |⟨w|z⟩|^2 = (r_w · r_z + 1)/2 for |w⟩, |z⟩ ∈ CP^1, see [6, p. 63]. Next, we swap the antipodal cities A and B for the Bloch vectors r_{z_0}, r_{z_1}, where r_{z_1} = −r_{z_0}, related to the elements of the orthonormal projective basis {|z_0⟩, |z_1⟩} of CP^1, and travels along parallels for the rotation O_λ through the angle λ about the N–S axis of S^2. By U we denote the (projective) unitary operator corresponding to this rotation via r_{Uz} = O_λ(r_z), where |z⟩ ∈ CP^1 [6, p. 88]. Finally, the coin tossing is swapped for the rank-1 projection-valued measurement (PVM) consisting of P_0, P_1 such that P_0 + P_1 = I and related to the basis, i.e., P_0|z⟩ = |z_0⟩ and P_1|z⟩ = |z_1⟩ for |z⟩ ∈ CP^1.

Fig. 2. Alice's travelling as a symmetric two-state Markov chain. Clearly, the chain is irreducible iff p < 1.
We analyse the situation where successive measurements are performed on a qubit, i.e., on a two-dimensional quantum system, whose evolution between two subsequent measurements is governed by U. Assume that |z_0⟩ is the initial state of the system and the instrument describing the measurement process is repeatable. It follows that the probability P_{i_1,…,i_n} of obtaining a string of measurement outcomes (i_1, …, i_n), where i_m ∈ {0, 1} for m = 1, …, n and n ∈ N \ {0}, is given by the celebrated Wigner formula [19]:

P_{i_1,…,i_n} = ||P_{i_n} U ⋯ P_{i_1} U z_0||^2. (1)

It follows that

P_{i_1,…,i_n} = p_{0 i_1} p_{i_1 i_2} ⋯ p_{i_{n−1} i_n}, (2)

where p_{jl} := ||P_l U z_j||^2 = |⟨z_l|U|z_j⟩|^2 with j, l ∈ {0, 1} is the probability that we obtain l as the measurement outcome, provided that the preceding measurement yielded j [13,14]. Thus, we have a Markov chain on the set of symbols 0 and 1 with the initial distribution concentrated at 0 and the doubly stochastic transition matrix P := (p_{jl})_{j,l=0,1}. Note that the combined evolution of states is also Markovian with two states |z_0⟩ and |z_1⟩, the initial distribution concentrated at |z_0⟩, and the same transition matrix P. We put a_0 := 1 and

a_n := P_{1…1} (n ones) for n ∈ N \ {0}, (3)

and

b_n := P_{1…1 0} (n ones followed by 0) = a_n − a_{n+1} for n ∈ N. (4)
That is, b_n is the probability of obtaining the outcome 0 for the first time in the (n + 1)-th measurement (i.e., the probability that Alice returns home only after having landed n times in B). The mean return time to |z_0⟩ (i.e., one plus the average number of visits Alice pays to Bob) is given by

M := Σ_{n=0}^∞ (n + 1) b_n. (5)

Summation by parts (Abel transformation) allows us to express M as follows:

M = Σ_{n=0}^∞ a_n − lim_{n→∞} n a_n. (6)

In order to calculate M, we first need to determine the transition matrix. Observe that p_{00} = |⟨z_0|U|z_0⟩|^2 = havercosin(θ_CA) = p, and so p_{01} = p_{10} = 1 − p and p_{11} = p. Therefore, a_n = (1 − p) p^{n−1} and b_n = (1 − p)^2 p^{n−1} for n ∈ N \ {0}, and b_0 = p. Thus, from (5) we obtain: if p = 1, then M = 1, and if p < 1, then M = (1 − p)^2 (1/(1 − p))^2 + (1 − p) + p = 2. Observe that p = 1 iff havercosin(θ_CA) = 1 iff cos^2 ϕ · haversin λ = 0 iff ϕ = ±π/2 or λ = 0, which is in turn equivalent to U|z_0⟩ = |z_0⟩. On the other hand, we have lim_{n→∞} n a_n = 0, so it follows from (1) and (6) that M = Σ_{n=0}^∞ ||(P_1 U)^n z_0||^2. In consequence, we obtain

Σ_{n=0}^∞ ||(P_1 U)^n z_0||^2 = 2,

independently of U and |z_0⟩ (provided that U|z_0⟩ ≠ |z_0⟩). Recall that P_1 is the orthogonal projection on the hyperplane orthogonal to z_0.
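The qubit computation above can be verified numerically. In the following sketch (Python/NumPy; the choice of U as a real rotation by the arbitrary angle θ = 0.7 is ours), both the partial sums of Σ_n ||(P_1 U)^n z_0||^2 and the truncated mean return time M approach 2.

```python
import numpy as np

theta = 0.7                                     # arbitrary rotation angle
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # a unitary on C^2

z0 = np.array([1.0, 0.0])
P1 = np.eye(2) - np.outer(z0, z0)                # projection onto span{z0}^⊥

p = abs(U[0, 0]) ** 2                            # p = |<z0|U|z0>|^2

# a_n = ||(P1 U)^n z0||^2; their sum should tend to dim C^2 = 2
v, S, a = z0.astype(complex), 0.0, []
for n in range(200):
    a.append(np.vdot(v, v).real)
    S += a[-1]
    v = P1 @ (U @ v)

# truncated mean return time M = sum_n (n+1) b_n with b_n = a_n - a_{n+1}
M = sum((n + 1) * (a[n] - a[n + 1]) for n in range(len(a) - 1))
```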
The primary aim of the present paper is to extend this elementary result to higher dimensions. For simplicity, from now on we abandon the projective approach and stick to Euclidean spaces. We claim that for a generic choice of U ∈ U(C^d) and a unit vector z ∈ C^d we have

Σ_{n=0}^∞ ||(P U)^n z||^2 = d, (7)

where P stands for the orthogonal projection in the direction of z, i.e., on Θ := span{z}^⊥. More specifically, the following theorem holds.

Theorem 1. Let U ∈ U(C^d), let z ∈ C^d be a unit vector, and for λ ∈ σ(U) let V_λ denote the eigenspace of U corresponding to λ. Then

Σ_{n=0}^∞ ||(P U)^n z||^2 = d − Σ_{λ∈σ(U)} dim(Θ ∩ V_λ).
It is straightforward to verify Theorem 1 in the particular case of z being an eigenvector of U. In the series at the left-hand side all terms but the first (which is equal to 1) vanish, and the sum at the right-hand side gives d − 1, because Θ contains d − 1 linearly independent eigenvectors of U. In general, however, the proof is more demanding, see Sec. 2.
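Theorem 1 lends itself to a direct numerical test. The sketch below (Python/NumPy; the dimension d = 4 and the random seed are arbitrary choices of ours) draws a Haar-random unitary, checks the exact finite-N identity Σ_{k<N} ||(P U)^k z||^2 = d − tr((P U)^N ((P U)^N)^*) that underlies the proof in Sec. 2, and confirms that the series collapses when z is an eigenvector of U.

```python
import numpy as np

d = 4
rng = np.random.default_rng(7)

# Haar-random unitary via QR of a complex Ginibre matrix
g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
q, r = np.linalg.qr(g)
U = q * (np.diag(r) / np.abs(np.diag(r)))

z = rng.standard_normal(d) + 1j * rng.standard_normal(d)
z /= np.linalg.norm(z)
P = np.eye(d) - np.outer(z, z.conj())   # orthogonal projection onto span{z}^⊥

PU = P @ U
N = 2000
S, v, A = 0.0, z.copy(), np.eye(d, dtype=complex)
for _ in range(N):
    S += np.vdot(v, v).real             # accumulates sum_{k<N} ||(PU)^k z||^2
    v = PU @ v
    A = PU @ A                          # A = (PU)^N after the loop
alpha_N = np.linalg.norm(A, 'fro') ** 2  # tr((PU)^N ((PU)^N)^*)

# Non-generic case: if z is an eigenvector w of U, then (P_w U) w = 0,
# so the series collapses to its first term 1.
w = np.linalg.eig(U)[1][:, 0]
Pw = np.eye(d) - np.outer(w, w.conj())
```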
Theorem 1 has a quantum-mechanical interpretation: in essence, the same as in the qubit case (though geographical analogies are no longer possible). Namely, instead of a PVM consisting of two rank-1 projections, we now have a PVM comprising one projection of rank 1 and one projection of rank d − 1, as well as the Lüders instrument corresponding to this PVM [2]. In consequence, the two-state Markov chain is replaced by an aggregated Markov chain [12] with two outcomes 0 and 1 and with hidden state space given by the disjoint union of a point, corresponding to z_0, and an (at most) countable subset of the projective space CP^{d−2}, which takes the place of z_1, see Fig. 3. Note that (1) and (3)–(6) are still valid; however, (2) no longer holds, as the symbolic dynamics is no longer Markovian.

Fig. 3. An aggregated Markov chain in the case of d = 3. By P_{i_n | i_1 … i_{n−1}} we denote the conditional probability of the system outputting i_n, provided that so far it has emitted the outcomes i_1, …, i_{n−1}.
Accordingly, Theorem 1 provides another operational (physical) meaning to the number of quantum degrees of freedom, i.e., to the dimensionality of the Hilbert space underlying the quantum system. The question of how to determine this dimension is not only of theoretical interest but also of utmost practical importance, because in quantum information theory the system's dimension is regarded as an important resource: in higher-dimensional spaces more powerful protocols are available. This long-standing problem has been addressed from many perspectives, see, e.g., [1,3,4,7,10,16,17,20]. Theorem 1 offers the possibility of estimating (from below) the dimension of the system from the statistics of a projective measurement performed on this system. See Section 3 for further discussion.
In this context, it is noteworthy that the convergence in Theorem 1 is geometric. Namely, let ρ stand for the spectral radius of an operator, i.e., the largest absolute value of its eigenvalues. In Lemma 3 we will show that ρ(P U|_{W^⊥ ∩ Θ}) < 1, where W := ⊕_{λ∈σ(U)} (Θ ∩ V_λ); Theorem 2, whose proof is also given in Sec. 2, asserts that the partial sums of the series in Theorem 1 converge to their limit at a geometric rate governed by this spectral radius.

A result analogous to Theorem 1 holds for real vector spaces; the formula is slightly more complicated, though, because orthogonal matrices need not be diagonalizable. Let R be an orthogonal operator on R^d and let σ(R) denote its real spectrum; obviously, σ(R) ⊆ {−1, 1}. By W_1 and W_{−1} we denote the (possibly trivial) eigenspaces of R corresponding to the eigenvalues 1 and −1, respectively. The orthogonal complement of W_1 ⊕ W_{−1} decomposes into two-dimensional R-invariant subspaces A_1, …, A_k with the property that for every j = 1, …, k there exists ϕ_j ∈ R such that with respect to every orthonormal basis of A_j the restriction R|_{A_j} is represented by the matrix ((cos ϕ_j, ∓sin ϕ_j), (±sin ϕ_j, cos ϕ_j)), with the sign depending on the orientation of the basis. Clearly, W_1, W_{−1}, A_1, …, A_k constitute an orthogonal decomposition of R^d. Also, let z ∈ R^d be a unit vector. In the real case, by Θ we denote the orthogonal complement of z in R^d, and by P the orthogonal projection on Θ.
As before, see Sec. 2 for the proof.
Note that in the generic case we have Σ_{n=0}^∞ ||(P R)^n z||^2 = d.
Let us illustrate Theorem 3 with the following example.
Example. We fix a unit vector z ∈ R^2 and let P stand for the orthogonal projection on Θ := span{z}^⊥. Under the standard identification of R^2 with C we have z = e^{iθ} for some θ ∈ R, so Θ = e^{i(θ+π/2)} R. It is straightforward to verify that P acts as P(r e^{iκ}) = r sin(κ − θ) e^{i(θ+π/2)}, where r ≥ 0 and κ ∈ R. We investigate S := Σ_{n=0}^∞ ||(P R)^n z||^2 with R assumed to be an orthogonal operator on R^2, i.e., a rotation or a reflection.
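The example can be accompanied by a numerical check (plain Python; the angles are arbitrary choices of ours). For a generic rotation or reflection R of R^2 the series sums to 2, while for a reflection whose axis contains z (so that z is an eigenvector of R) it collapses to 1.

```python
import math

def series(R, n_terms=200):
    # S = sum_n ||(P R)^n z||^2 in R^2 with z = (1, 0),
    # so P kills the first coordinate (projection onto span{(0, 1)})
    x, y = 1.0, 0.0
    total = 0.0
    for _ in range(n_terms):
        total += x * x + y * y
        # apply R, then P
        x, y = R[0][0] * x + R[0][1] * y, R[1][0] * x + R[1][1] * y
        x = 0.0
    return total

alpha = 0.9                          # generic rotation angle
rot = [[math.cos(alpha), -math.sin(alpha)],
       [math.sin(alpha),  math.cos(alpha)]]

psi = 0.4                            # reflection across the line at angle psi
refl = [[math.cos(2 * psi),  math.sin(2 * psi)],
        [math.sin(2 * psi), -math.cos(2 * psi)]]

S_rot = series(rot)
S_refl = series(refl)
S_eig = series([[1.0, 0.0], [0.0, -1.0]])   # z is an eigenvector here
```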

2. Proofs
Adopting the standard convention that raising any operator to the zeroth power yields the identity, we see that both theorems are trivial for d = 1. Thus, in order to avoid trivial statements, we assume that d ≥ 2. Throughout this section we adopt the following notation. As before, the orthogonal complement of z in C^d (or, in the proof of Theorem 3, in R^d) is denoted by Θ, the orthogonal projection on Θ by P, and W := ⊕_{λ∈σ(U)} (Θ ∩ V_λ) is the maximal subspace of Θ invariant under U. By ||·|| we denote the spectral norm on L(C^d), i.e., the operator norm induced on the space of linear transformations of C^d by the Euclidean norm, while ρ stands for the spectral radius of an operator.

Lemma 1.
If v ∈ Θ is an eigenvector of P U with eigenvalue µ ∈ C and |µ| = 1, then v is an eigenvector of U with eigenvalue µ.

Proof. We have ||v|| = |µ| ||v|| = ||P U v|| ≤ ||U v|| = ||v||, hence ||P U v|| = ||U v||. Since P is the orthogonal projection on Θ, this equality forces U v ∈ Θ, and so U v = P U v = µv. □
Lemma 2. The subspace W^⊥ ∩ Θ is invariant under P U.

Proof. Clearly, as P is a projection on Θ, it is sufficient to show that W^⊥ is invariant under P U. Letting u ∈ W^⊥, we obtain U u ∈ W^⊥ due to the invariance of W^⊥ under U. Thus, ⟨P U u|w⟩ = ⟨U u|P w⟩ = ⟨U u|w⟩ = 0 for every w ∈ W, and so P U u ∈ W^⊥. □

Lemma 3. P U is an endomorphism on W^⊥ ∩ Θ and ρ(P U|_{W^⊥ ∩ Θ}) < 1.
Proof. The fact that P U is an endomorphism on W^⊥ ∩ Θ follows easily from Lemma 2. Moreover, since ||P U|| ≤ 1, every eigenvalue µ of P U|_{W^⊥ ∩ Θ} satisfies |µ| ≤ 1. If we had |µ| = 1 for some eigenvalue µ with eigenvector v ∈ W^⊥ ∩ Θ, then, by Lemma 1, v would be an eigenvector of U lying in Θ, and so v ∈ W, a contradiction. Hence ρ(P U|_{W^⊥ ∩ Θ}) < 1. □
The preceding three lemmas pave the way for the following result, which not only is a crucial step in the proof of Theorem 1, but also plays a key role in studying the symbolic dynamics generated by the quantum system under consideration [15, Sec. 1.3].

Lemma 4. We have lim_{k→∞} tr((P U)^k ((P U)^k)^*) = Σ_{λ∈σ(U)} dim(Θ ∩ V_λ).
Proof. Clearly, dim W = Σ_{λ∈σ(U)} dim(Θ ∩ V_λ), so the above claim can be rewritten as lim_{k→∞} α_k = dim W, where α_k := tr((P U)^k ((P U)^k)^*) for k ∈ N. First, recall that W is invariant under U. We observe that P U|_W = U|_W ; indeed, for w ∈ W we obtain U w ∈ W ⊂ Θ, and thus P U w = U w. In consequence, P U|_W is unitary, and so

||(P U)^n w|| = ||w|| for every w ∈ W and n ∈ N. (8)

Next, from Lemma 3 it follows that (P U|_{W^⊥ ∩ Θ})^n → 0 as n → ∞. Hence, via Lemma 2, for every u ∈ W^⊥ we obtain

(P U)^n u = (P U)^{n−1} (P U u) → 0 as n → ∞, (9)

since P U u ∈ W^⊥ ∩ Θ. Now, we put d̃ := dim W and choose an orthonormal basis {w_1, …, w_d} of C^d such that W = span{w_1, …, w_d̃} and W^⊥ = span{w_{d̃+1}, …, w_d}. It follows that

α_k = Σ_{i=1}^d ||(P U)^k w_i||^2 for k ∈ N.

Applying (8) and (9), we obtain lim_{k→∞} α_k = d̃, as desired. □

Proof of Theorem 1. Let us show that for every n ∈ N we have

Σ_{k=0}^{n−1} ||(P U)^k z||^2 = d − α_n,

where, as before, α_k stands for tr((P U)^k ((P U)^k)^*), k ∈ N. Put P_z := I − P, i.e., P_z is the orthogonal projection on span{z}. It follows that

α_{k+1} = tr((P U)^k P ((P U)^k)^*) = α_k − tr((P U)^k P_z ((P U)^k)^*) = α_k − ||(P U)^k z||^2

for k ∈ N. Since α_0 = tr I = d, summing the above over k = 0, …, n − 1 yields the claimed identity for every n ∈ N, as desired. As a consequence, we have

Σ_{n=0}^∞ ||(P U)^n z||^2 = d − lim_{n→∞} α_n.

To conclude the proof, it suffices to apply Lemma 4. □

3. Applications
As we mentioned above, Theorem 1 can be used to estimate the dimension d of the Hilbert space underlying a quantum system. Consider a yes-no measurement (elementary test) represented by a PVM Π = {|z z| , I − |z z|}, where z is a unit vector from C d , along with the corresponding Lüders instrument. One can think of applying Π as posing the question whether the system is in state |z or not [8]. This measurement is performed repeatedly in an isochronous manner and between each two subsequent measurements the system undergoes deterministic time evolution governed by a unitary operator U . An example of such a system (for d = 3) is a spin-1 particle subject to a magnetic field rotating the spin, with the measurement answering the yes-no question whether the square of the spin component along a given axis is zero [18].
Assume that the initial state of the system is |z⟩⟨z|. In the current context, (1) reads

P_{1…1} (n ones) = ||(P U)^n z||^2, (18)

where P := I − |z⟩⟨z| and n ∈ N \ {0}. Hence, from (7) we obtain the following direct formula:

d = Σ_{n=0}^∞ P_{1…1} (n ones). (19)

This opens the way to determining the dimension of the underlying Hilbert space from the joint probabilities of the measurement results, which can be estimated from repeated runs of the experiment. Clearly, the reliability of the resulting dimension witness depends on the measuring device indeed implementing a PVM consisting of two projections, one of them one-dimensional.
In practice, we observe the sequence of partial sums S_N(U) := Σ_{n=0}^{N−1} ||(P U)^n z||^2, which tends to d at least geometrically, with the error decaying as r^{2N}/(1 − r^2) as N → ∞, where r is any number exceeding the spectral radius of P U|_{W^⊥ ∩ Θ}, see Theorem 2. If an eigenspace of U intersects Θ non-trivially, then S_N(U) does not converge to d; if an eigenspace is merely very close to Θ, the convergence is very slow, as ρ(P U|_{W^⊥ ∩ Θ}) is then very close to one. As a way to circumvent this problem, we propose to consider several different unitaries simultaneously, see Appendix.
Let us also point out that the sequence of ceilings ⌈S_N(U)⌉ of the partial sums reaches d even faster than the partial sums themselves converge, eventually hitting d exactly for some finite N.
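This effect is easy to observe numerically. In the sketch below (Python/NumPy; d = 3 and the seed are arbitrary choices of ours) we record the first step N at which ⌈S_N(U)⌉ = d and the first step at which S_N(U) itself comes within 0.01 of d; by construction the former never exceeds the latter.

```python
import math
import numpy as np

d = 3
rng = np.random.default_rng(1)
g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
q, r = np.linalg.qr(g)
U = q * (np.diag(r) / np.abs(np.diag(r)))   # Haar-random unitary

z = np.zeros(d, dtype=complex); z[0] = 1.0
P = np.eye(d) - np.outer(z, z.conj())
PU = P @ U

S, v = 0.0, z.copy()
first_ceil = None    # first N with ceil(S_N) = d
first_close = None   # first N with d - S_N < 0.01
for N in range(1, 3001):
    S += np.vdot(v, v).real
    v = PU @ v
    # tiny guard against S overshooting an integer by floating-point error
    if first_ceil is None and math.ceil(S - 1e-9) == d:
        first_ceil = N
    if first_close is None and d - S < 0.01:
        first_close = N
```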
Actually, the joint probabilities required to find d via (19) can be inferred from a single sequence of outcomes. Namely, the outcome 0 identifies the underlying (hidden) quantum state as |z⟩, i.e., as the initial state of the measurement protocol. Hence, whenever 0 appears in the sequence of outcomes, the system is reset to the initial setting.
Alternatively, d can be computed as the mean return time to |z⟩. To see this, combine (6) and (18), and invoke Theorem 2 to verify that the limit in (6) vanishes. The mean return time to |z⟩ can be estimated from a single sequence of measurement outcomes by the Monte Carlo method as the average distance between consecutive occurrences of 0 or, equivalently, as one plus the average length of a run of 1's. Namely, let 1 ≤ j_1 < j_2 < … stand for the positions in the sequence of outcomes occupied by 0's. Put T_1 := j_1 and T_k := j_k − j_{k−1} for the distance between the (k − 1)-th and the k-th occurrence of 0 (k ≥ 2). By the strong law of large numbers, we get (T_1 + … + T_k)/k → d as k → ∞ almost surely, as desired. To illustrate the slow-convergence problem, we took 100 unitary matrices in dimension d = 15 generated from the Haar distribution (CUE). It turns out that N = 698 steps had to be executed in order for all these unitary matrices to point to the actual dimension of the system, see Fig. 4. However, one can argue that the correct result could have been identified much earlier from the shape of these distributions.
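The Monte Carlo estimation of d via the mean return time can be sketched as follows (Python/NumPy; the dimension, seed, and number of steps are our choices). We simulate the sequential Lüders measurement directly on the state vector and average the gaps between consecutive occurrences of the outcome 0.

```python
import numpy as np

d = 3
rng = np.random.default_rng(42)
g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
q, r = np.linalg.qr(g)
U = q * (np.diag(r) / np.abs(np.diag(r)))   # Haar-random unitary

z = np.zeros(d, dtype=complex); z[0] = 1.0

# Repeated measurement of the PVM {|z><z|, I - |z><z|} interleaved
# with the evolution U, starting from |z>.
psi = z.copy()
gaps, last_zero, steps = [], 0, 120_000
for t in range(1, steps + 1):
    psi = U @ psi
    p0 = abs(np.vdot(z, psi)) ** 2          # probability of outcome 0
    if rng.random() < p0:
        psi = z.copy()                      # collapse onto |z> (reset)
        gaps.append(t - last_zero)
        last_zero = t
    else:
        psi = psi - np.vdot(z, psi) * z     # collapse onto span{z}^⊥
        psi /= np.linalg.norm(psi)

mean_return = sum(gaps) / len(gaps)         # estimates the mean return time d
```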
In this vein, we propose to take d̂ := max_{i=1,…,M} d_Ñ(U_i), where d_N(U) := ⌈S_N(U)⌉, the unitaries U_1, …, U_M are drawn independently from the Haar distribution, and the number of executed steps Ñ is the first N such that the value max_{i=1,…,M} d_N(U_i) has remained unchanged for s + 1 consecutive steps and has been attained by at least βM of the unitaries; here β ∈ [0, 1] and s ∈ N \ {0} are parameters. That is, in terms of the barplots, the far-right bar is required to remain stable (not to move further right) for s + 1 consecutive steps and to contain at least βM of all observations. Clearly, this stopping criterion is always met, since for each i = 1, …, M the sequence {d_N(U_i)}_{N=1}^∞ is non-decreasing and equal to d from some N onwards.
Obviously, there is a trade-off between accuracy and time-efficiency. The parameters M , β, s can be used to find a balance between increasing the probability of the algorithm returning the correct estimate of the system's dimension (by increasing the parameters) and decreasing the number of executed steps (by decreasing the parameters).
We ran this algorithm 1,000 times for d = 2, . . . , 30 with parameters M = 100, β = 0.5, s = 1 and unitary matrices generated from the Haar distribution. The accuracy was 100% and the average number of executed steps is plotted in Fig. 5.
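For illustration, a possible implementation of this stopping rule reads as follows (Python/NumPy; the helper names `estimate_dimension` and `haar_unitary`, the parameter values, and the exact formalization of the stopping criterion are our reading of the description above, not code from the paper).

```python
import math
import numpy as np

def haar_unitary(d, rng):
    # QR of a complex Ginibre matrix yields a Haar-distributed unitary
    g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    q, r = np.linalg.qr(g)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def estimate_dimension(unitaries, d, beta=0.5, s=1, max_steps=10_000):
    # d enters only through the ambient space C^d in which z and P live;
    # the estimate itself comes from the partial sums S_N(U_i).
    M = len(unitaries)
    z = np.zeros(d, dtype=complex); z[0] = 1.0
    P = np.eye(d) - np.outer(z, z.conj())
    vs = [z.copy() for _ in range(M)]
    sums = [0.0] * M
    prev_max, stable = None, 0
    for N in range(1, max_steps + 1):
        for i, U in enumerate(unitaries):
            sums[i] += np.vdot(vs[i], vs[i]).real
            vs[i] = P @ (U @ vs[i])
        # guard against floating-point overshoot past an integer
        ceils = [math.ceil(S - 1e-9) for S in sums]
        cur_max = max(ceils)
        stable = stable + 1 if cur_max == prev_max else 0
        prev_max = cur_max
        # far-right bar stable for s+1 consecutive steps, holding >= beta*M
        if stable >= s and ceils.count(cur_max) >= beta * M:
            return cur_max, N
    return prev_max, max_steps

rng = np.random.default_rng(3)
d = 6
Us = [haar_unitary(d, rng) for _ in range(20)]
d_hat, N_used = estimate_dimension(Us, d, beta=0.9, s=3)
```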