
Universal Turing Machine and Its Simulation

The theorem we will show is the following:

Theorem: There exists a Turing Machine $\mathcal{U}$ such that for every $x,\alpha\in \{0,1\}^*$, $\mathcal{U}(x,\alpha)=M_{\alpha}(x)$, where $M_{\alpha}$ denotes the Turing Machine represented by $\alpha$. Moreover, if $M_{\alpha}$ halts on input $x$ within $T$ steps, then $\mathcal{U}(x,\alpha)$ halts within $cT\log T$ steps, where $c$ is a number independent of $|x|$ and depending only on $M_{\alpha}$'s alphabet size, number of tapes, and number of states.

For this, we need some equivalences between special Turing machine models.

Theorem 1: For every $f:\{0,1\}^*\to \{0,1\}$ and time-constructible $T:\mathbb{N}\to \mathbb{N}$, if $f$ is computable in time $T(n)$ by a Turing Machine $M$ using alphabet $\Gamma$, then it is computable in time $4\log|\Gamma|\,T(n)$ by a Turing Machine $\hat{M}$ using the alphabet $\{\triangleright, \square,0,1\}$.

Theorem 2: Define a single-tape Turing machine to be a Turing Machine that has only one read-write tape, which is used as input, work, and output tape. For every $f:\{0,1\}^*\to \{0,1\}$ and time-constructible $T:\mathbb{N}\to \mathbb{N}$, if $f$ is computable in time $T(n)$ by a Turing Machine $M$ using $k$ tapes, then it is computable in time $5kT^2(n)$ by a single-tape Turing Machine $\hat{M}$.

Theorem 3: Define a bidirectional Turing machine to be a Turing Machine whose tapes are infinite in both directions. For every $f:\{0,1\}^*\to \{0,1\}$ and time-constructible $T:\mathbb{N}\to \mathbb{N}$, if $f$ is computable in time $T(n)$ by a bidirectional Turing Machine $M$ using $k$ tapes, then it is computable in time $4T(n)$ by a standard Turing Machine $\hat{M}$.
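One standard way to prove a statement like Theorem 3 (not necessarily the route behind the exact $4T(n)$ constant) is to "fold" each two-way infinite tape onto a one-way infinite one by interleaving locations. A minimal sketch of the index map, with a hypothetical helper name:

```python
# A minimal sketch (hypothetical helper, not from the post) of the folding
# trick: interleave the bidirectional tape's locations onto a one-way tape.

def fold(location: int) -> int:
    """Map a bidirectional location ..., -2, -1, 0, 1, 2, ... to a
    one-way location 0, 1, 2, ..."""
    if location >= 0:
        return 2 * location        # 0, 1, 2, ... -> 0, 2, 4, ...
    return -2 * location - 1       # -1, -2, ... -> 1, 3, ...

# One move of the bidirectional head becomes at most two moves of the
# one-way head, giving the constant-factor slowdown the theorem claims.
assert [fold(i) for i in (-2, -1, 0, 1, 2)] == [3, 1, 0, 2, 4]
```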


Now, coming back to the proof, we'll first prove a weaker version.

Weaker Version: $cT\log T$ is replaced by $cT^2$.

Proof of Weaker Version: Our universal TM $\mathcal{U}$ is given an input $(x, \alpha)$, where $\alpha$ represents some TM $M$, and needs to output $M(x)$. By Theorem 1 and Theorem 2, we may assume

  1. $M$ has a single work tape (in addition to the input and output tapes), and
  2. $M$ uses the alphabet $\{\triangleright, \square,0,1\}$.

Note that these transformations may introduce a quadratic slowdown; this is where the $cT^2$ bound comes from.

    The TM $\mathcal{U}$ uses the alphabet $\{\triangleright, \square, 0,1\}$ and three work tapes in addition to its input and output tape. $\mathcal{U}$ uses its input tape, output tape, and one of the work tapes in the same way $M$ uses its three tapes. In addition, $\mathcal{U}$ will use its first extra work tape to store the table of values of $M$'s transition function, and its other extra work tape to store the current state of $M$.

Simulation: To simulate one computational step of $M$, $\mathcal{U}$ scans the table of $M$'s transition function and the current state to find the new state, the symbols to be written, and the head movements, which it then executes.
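A minimal Python sketch of this lookup-and-apply loop (all names hypothetical, boundary handling omitted):

```python
# One simulated step of M: look up the transition for the current state and
# the symbols under the heads, then write, move, and change state.

def simulate_step(table, state, tapes, heads):
    """table maps (state, read_symbols) -> (new_state, write_symbols, moves),
    with each move in {-1, 0, +1}; tapes are lists of symbols."""
    read = tuple(tapes[i][heads[i]] for i in range(len(tapes)))
    new_state, write, moves = table[(state, read)]
    for i in range(len(tapes)):
        tapes[i][heads[i]] = write[i]   # write the new symbol
        heads[i] += moves[i]            # move the head (bounds not checked)
    return new_state
```

The time for one such step depends only on the size of $M$'s transition table, not on $|x|$, so the quadratic loss in the weak bound comes entirely from the single-tape reduction of Theorem 2, not from this lookup.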


Now we will prove the actual theorem.

Proof of Actual Theorem: The general structure of $\mathcal{U}$ will be as in the $cT^2$ case. $\mathcal{U}$ will use its input and output tapes in the same way $M$ does and will also have extra work tapes to store $M$'s transition table and current state and to encode the contents of $M$'s work tapes. The main obstacle we need to overcome is that we cannot use Theorem 2 to reduce the number of $M$'s work tapes to one, since that reduction by itself already introduces a quadratic slowdown.

    Let $k$ be the number of tapes that $M$ uses (apart from its input and output tapes) and $\Gamma$ its alphabet. We may assume that $\mathcal{U}$ uses the alphabet $\Gamma^k$. Thus we can encode in each cell of $\mathcal{U}$'s main work tape $k$ symbols of $\Gamma$, each corresponding to a symbol from one of $M$'s tapes. This means that we can think of $\mathcal{U}$'s main work tape not as a single tape but rather as $k$ parallel tapes; that is, we can think of $\mathcal{U}$ as having $k$ tapes with the property that in each step all their read-write heads go in unison either one location to the left, one location to the right, or stay in place. 
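A tiny sketch of this packing (toy data, hypothetical names):

```python
# Each cell of U's main work tape holds a k-tuple over Gamma, one component
# per parallel tape, so one physical head reads k symbols at once.

k = 3
m_tapes = [list("010"), list("110"), list("001")]   # toy contents of M's k tapes

# Cell j of U's work tape holds the j-th symbol of each of M's tapes.
packed = [tuple(t[j] for t in m_tapes) for j in range(len(m_tapes[0]))]

assert packed[0] == ("0", "1", "0")   # all k symbols at location 0, one read
```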

We still have to deal with the fact that $M$'s $k$ read-write heads can each move independently to the left or right, whereas $\mathcal{U}$ 's parallel tapes are forced to move together.

Idea: Since we cannot move $\mathcal{U}$'s read-write head in different directions at once, we simply move the parallel tapes ``under'' the head.

    To simulate a single step of $M$, we shift all the nonblank symbols in each of these parallel tapes until the head's position in these parallel tapes corresponds to the heads' positions on $M$'s $k$ tapes. For example, if $k=3$ and in some particular step $M$'s transition function specifies the movements $L, R, R$, then $\mathcal{U}$ will shift all the nonblank entries of its first parallel tape one cell to the right, and shift the nonblank entries of its second and third tapes one cell to the left. This approach is still not good enough to get an $O(T\log T)$-time simulation: there may be as many as $T$ nonblank symbols in each parallel tape, so each shift operation may cost $\mathcal{U}$ as much as $T$ operations per step of $M$, resulting in a $\Theta(T^2)$-time simulation.
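To see the cost concretely, here is a toy version (hypothetical code) of the naive shift:

```python
# Shifting every symbol of a parallel tape one cell to the right is linear
# in the tape's occupied length, and up to T cells may be occupied, so T
# simulated steps cost Theta(T^2) in the worst case.

def naive_shift_right(tape, blank="_"):
    """O(len(tape)) work per call; the last cell falls off this toy tape."""
    return [blank] + tape[:-1]

assert naive_shift_right(list("010_")) == ["_", "0", "1", "0"]
```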

We will deal with this problem by encoding the information on the tapes in a way that allows us to amortize the work performing a shift.

Idea: We will ensure that we do not need to move all the nonblank symbols of the tape in each shift operation. Specifically, we will encode the information in a way that allows half of the shift operations to be performed using $2c$ steps, a quarter of them using $4c$ steps, and more generally a $2^{-i}$ fraction of the operations to take $2^i c$ steps, leading to a simulation in roughly $cT\log T$ time.
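Spelling this count out over $T$ shift operations gives exactly the claimed bound: $$\sum_{i=1}^{\log T}\underbrace{2^{-i}\,T}_{\#\text{ shifts costing }2^i c}\cdot 2^i c=\sum_{i=1}^{\log T}cT=cT\log T.$$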

Encoding $M$'s tapes on $\mathcal{U}$'s tape:

We add a special kind of blank symbol $\boxtimes$ to the alphabet of $\mathcal{U}$'s parallel tapes, with the property that it is ignored in the simulation. For example, if one of $M$'s tapes contains $010$, then this can be encoded on the corresponding parallel tape as $0\boxtimes 10$, or $0\boxtimes\boxtimes 1\boxtimes 0$, and so on, or just $010$.

    For convenience, we think of $\mathcal{U}$'s parallel tapes as infinite in both the left and right directions (using Theorem 3). Thus we index their locations by $0,\pm 1,\pm 2,\dots$ Normally we keep $\mathcal{U}$'s head on location 0 of these parallel tapes. We will only move it temporarily to perform a shift when, following our general approach, we simulate a left head movement by shifting the tape to the right and vice versa. At the end of the shift, we return the head to location 0.

    We split each of $\mathcal{U}$'s parallel tapes into zones that we denote by $R_0$, $L_0$, $R_1$, $L_1,\dots$ (We'll only need to go up to $R_{\log T}$, $L_{\log T}$: since the Turing Machine runs for time $T$, it can access at most $T$ cells of each tape.) The cell at location 0 does not belong to any zone. Zone $R_0$ contains the two cells immediately to the right of location 0 (i.e., locations $+1,+2$). Zone $R_1$ contains the four cells $+3,+4,+5,+6$. Generally, for every $i\geq 1$, zone $R_i$ contains the $2\times 2^i$ cells immediately to the right of $R_{i-1}$ (i.e., locations $2^{i+1}-1,\dots, 2^{i+2}-2$). Similarly, zone $L_0$ contains the two cells indexed $-1$ and $-2$, and generally zone $L_i$ contains the cells $-2^{i+2}+2,\dots,-2^{i+1}+1$.
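These boundaries are easy to sanity-check (hypothetical helper names):

```python
# Zone R_i spans locations 2^(i+1) - 1 .. 2^(i+2) - 2 and L_i mirrors it on
# the negative side; each zone has exactly 2 * 2^i cells.

def zone_R(i: int) -> range:
    return range(2 ** (i + 1) - 1, 2 ** (i + 2) - 1)

def zone_L(i: int) -> range:
    return range(-(2 ** (i + 2)) + 2, -(2 ** (i + 1)) + 2)

assert list(zone_R(0)) == [1, 2]
assert list(zone_R(1)) == [3, 4, 5, 6]
assert list(zone_L(0)) == [-2, -1]
assert list(zone_L(1)) == [-6, -5, -4, -3]
assert all(len(zone_R(i)) == 2 * 2 ** i for i in range(8))
```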

Invariants: We shall always maintain the following invariants

  • Each of the zones is either empty, full, or half-full with non-$\boxtimes$-symbols. That is, the number of symbols in zone $R_i$ that are not $\boxtimes$ is either $0$, $2^i$, or $2\times 2^i$, and the same holds for $L_i$. (We treat the ordinary blank symbol $\square$ the same as any other symbol in $\Gamma$; in particular, a zone full of $\square$'s is considered full.) We assume that initially all the zones are half-full. We can ensure this by filling half of each zone with $\boxtimes$ symbols the first time we encounter it.
  • The total number of non-$\boxtimes$-symbols in $R_i \cup L_i$ is $2 \times 2^i$. That is, either $R_i$ is empty and $L_i$ is full, or $R_i$ is full and $L_i$ is empty, or they are both half-full.
  • Location 0 always contains a non-$\boxtimes$-symbol.

Performing a Shift: Now when performing the shifts, we do not always have to move the entire tape, but we can restrict ourselves to only using some of the zones.

    We illustrate this by showing how $\mathcal{U}$ performs a left shift on the first of its parallel tapes; a runnable sketch of the whole procedure follows the steps below.

  1. $\mathcal{U}$ finds the smallest $i_0$ such that $R_{i_0}$ is not empty (hence $R_0,R_1,\dots,R_{i_0-1}$ are empty). Note that, by the second invariant, this is also the smallest $i_0$ such that $L_{i_0}$ is not full. We call this number $i_0$ the index of this particular shift. 
  2. $\mathcal{U}$ puts the leftmost non-$\boxtimes$-symbol of $R_{i_0}$ in position 0. After this, $2\times 2^{i_0}-1$ non-$\boxtimes$-symbols remain in $R_{i_0}$ if $R_{i_0}$ was full, and $2^{i_0}-1$ if it was half-full. 
    • If $R_{i_0}$ was full before, we shift the leftmost $2^{i_0}-1$ of its remaining non-$\boxtimes$-symbols into the zones $R_0,\dots,R_{i_0-1}$, each of which becomes half-full: their total space is $\sum\limits_{i=0}^{i_0-1}2\times 2^i=2(2^{i_0}-1)$, and we fill exactly half of it. And $R_{i_0}$ also becomes half-full, because $2^{i_0}$ non-$\boxtimes$-symbols remain in it. 
    • If $R_{i_0}$ was half-full before, we shift the remaining $2^{i_0}-1$ non-$\boxtimes$-symbols of $R_{i_0}$ into the zones $R_0,\dots, R_{i_0-1}$, filling exactly half of each zone $R_i$ with $i\in \{0,\dots, i_0-1\}$. Now the zone $R_{i_0}$ is empty.
  3. $\mathcal{U}$ performs a symmetric operation to the left of position 0. That is, for $j$ starting from $i_0-1$ down to 0, $\mathcal{U}$ iteratively moves the $2\times 2^j$ symbols of $L_j$ to fill the cells of $L_{j+1}$. Finally, $\mathcal{U}$ moves the symbol originally in position 0 to $L_0$. 
    • If $R_{i_0}$ was full before, then $L_{i_0}$ was empty and all of $L_0,\dots, L_{i_0-1}$ were full. After the shifting, each $L_{j+1}$ contains the $2\times 2^j$ symbols of the old $L_j$, which is exactly half of its $2\times 2^{j+1}$ cells; in particular, $L_{i_0}$ becomes half-full. And since each $L_j$ was first emptied into $L_{j+1}$ and then received the contents of $L_{j-1}$ (or, for $L_0$, the symbol from position 0), all of $L_0,\dots,L_{i_0-1}$ end up half-full. 
    • If $R_{i_0}$ was half-full before, then $L_{i_0}$ was also half-full and all of $L_0,\dots, L_{i_0-1}$ were full. After the shifting, $L_{i_0}$ receives the $2\times 2^{i_0-1}$ symbols of the old $L_{i_0-1}$ on top of the $2^{i_0}$ it already had, so $L_{i_0}$ becomes full. As before, all of $L_0,\dots,L_{i_0-1}$ end up half-full.
  4. At the end of the shift, all of the zones $R_0,L_0,R_1,L_1,\dots,R_{i_0-1},L_{i_0-1}$ are half-full, $R_{i_0}$ has $2^{i_0}$ fewer non-$\boxtimes$-symbols, and $L_{i_0}$ has $2^{i_0}$ additional non-$\boxtimes$-symbols. Thus our invariants are maintained.
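Here is the promised sketch: a runnable toy version of this left shift (data structures hypothetical, not the author's). Each zone is stored as the left-to-right list of its non-$\boxtimes$ symbols, so half-full for $R_i$ or $L_i$ means the list has $2^i$ entries and full means $2\times 2^i$.

```python
def left_shift(center, R, L):
    """Shift one parallel tape a cell to the left (a rightward head move of
    M).  center is the symbol at location 0; mutates R and L in place."""
    # Step 1: the index of the shift.
    i0 = next(i for i in range(len(R)) if R[i])
    # Step 2: the leftmost non-boxtimes symbol of R_{i0} goes to location 0;
    # the next 2**i0 - 1 symbols refill R_0 .. R_{i0-1} to half-full.
    new_center, rest = R[i0][0], R[i0][1:]
    pos = 0
    for j in range(i0):
        R[j] = rest[pos:pos + 2 ** j]   # exactly half of zone R_j
        pos += 2 ** j
    R[i0] = rest[pos:]                  # half-full if it was full, else empty
    # Step 3: symmetric cascade on the left; the old location-0 symbol
    # finally lands in L_0.
    for j in range(i0 - 1, -1, -1):
        L[j + 1] = L[j + 1] + L[j]      # L_j's cells sit right of L_{j+1}'s
        L[j] = []
    L[0] = L[0] + [center]
    return new_center

# Toy run: three zones per side, everything initially half-full.
R = [["a"], ["b", "c"], ["d", "e", "f", "g"]]
L = [["x"], ["y", "z"], ["p", "q", "r", "s"]]
c0 = left_shift("0", R, L)
assert c0 == "a" and R[0] == [] and L[0] == ["x", "0"]   # step 4's invariants
```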

Complexity Analysis: The total cost of performing a shift is proportional to the total size of all the zones involved, $R_0,L_0,\dots,R_{i_0},L_{i_0}$. That is, $$O\left(\sum_{j=0}^{i_0}2\times 2^j  \right)=O\left(2^{i_0}\right)\text{ operations.}$$After performing a shift with index $i$, the zones $R_0,L_0,\dots, R_{i-1},L_{i-1}$ are half-full, which means it will take at least $2^i-1$ left shifts before the zones $L_0,\dots,L_{i-1}$ become empty, or at least $2^i-1$ right shifts before the zones $R_0,\dots,R_{i-1}$ become empty. Hence, once we perform a shift with index $i$, the next $2^i-1$ shifts of that particular parallel tape will all have index less than $i$. This means that, for each of the parallel tapes, at most a $\frac{1}{2^i}$ fraction of the total number of shifts have index $i$. Since we perform at most $T$ shifts per tape and the highest possible index is $\log T$, the total work in the course of simulating $T$ steps of $M$ is $$O\left( \sum_{i=1}^{\log T} \frac{T}{2^{i}}\,2^i \right)=O(T\log T).$$Since the number $k$ of parallel tapes depends only on $M$, the overall bound is $cT\log T$ as claimed.
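As a sanity check on the amortized bound (hypothetical counting code, not from the original), one can charge each shift a cost proportional to the zones it touches and watch the total grow like $T\log T$:

```python
import math

def total_shift_cost(T: int) -> int:
    """Run T consecutive left shifts on zone occupancy counts and sum a
    cost of ~2**i0 charged per shift of index i0."""
    m = math.ceil(math.log2(T)) + 1          # zones 0..m hold > T symbols
    r = [2 ** i for i in range(m + 1)]       # all zones start half-full
    l = [2 ** i for i in range(m + 1)]
    total = 0
    for _ in range(T):
        i0 = next(i for i in range(m + 1) if r[i] > 0)
        r[i0] -= 2 ** i0                     # R_{i0} loses 2**i0 symbols...
        l[i0] += 2 ** i0                     # ...which end up on the L side
        for j in range(i0):                  # lower zones become half-full
            r[j], l[j] = 2 ** j, 2 ** j
        total += 2 ** (i0 + 2)               # cost ~ size of touched zones
    return total

for T in (2 ** 8, 2 ** 12, 2 ** 16):
    ratio = total_shift_cost(T) / (T * math.log2(T))
    print(T, ratio)                          # the ratio stays bounded
```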
