General Grammars and Turing Machines

(Content adapted from Critchlow & Eck)

General Grammars

At the beginning of this chapter the general idea of a grammar as a set of rewriting or production rules was introduced. For most of the chapter, however, we have restricted our attention to context-free grammars, in which production rules must be of the form $A\longrightarrow x$ where $A$ is a non-terminal symbol. In this section, we will consider general grammars, that is, grammars in which there is no such restriction on the form of production rules. For a general grammar, a production rule has the form $u\longrightarrow x$, where $u$ is a string that can contain both terminal and non-terminal symbols. For convenience, we will assume that $u$ contains at least one non-terminal symbol, although even this restriction could be lifted without changing the class of languages that can be generated by grammars. Note that a context-free grammar is, in fact, an example of a general grammar, since production rules in a general grammar are allowed to be of the form $A\longrightarrow x$. They just don't have to be of this form. I will use the unmodified term grammar to refer to general grammars.1 The definition of grammar is identical to the definition of context-free grammar, except for the form of the production rules:

A grammar is a 4-tuple $(V,\Sigma,P,S)$, where:

  • $V$ is a finite set of symbols. The elements of $V$ are the non-terminal symbols of the grammar.
  • $\Sigma$ is a finite set of symbols such that $V\cap\Sigma=\emptyset$. The elements of $\Sigma$ are the terminal symbols of the grammar.
  • $P$ is a set of production rules. Each rule is of the form $u\longrightarrow x$ where $u$ and $x$ are strings in $(V\cup\Sigma)^*$ and $u$ contains at least one symbol from $V$.
  • $S\in V$. $S$ is the start symbol of the grammar.

Suppose $G$ is a grammar. Just as in the context-free case, the language generated by $G$ is denoted by $L(G)$ and is defined as $L(G)=\{x\in\Sigma^*\;|\; S\Longrightarrow_G^* x\}$. That is, a string $x$ is in $L(G)$ if and only if $x$ is a string of terminal symbols and there is a derivation that produces $x$ from the start symbol, $S$, in one or more steps.

The natural question is whether there are languages that can be generated by general grammars but that cannot be generated by context-free grammars. A simple example of a language that is not context-free is $\{a^nb^nc^n\;|\; n\in\N\}$. This language can be generated by a general grammar that first generates a string of the form $XABCABC\cdots ABC$, where there are $n$ groups of $ABC$ for some $n\geq 0$. Then we use general grammar rules that allow $A$'s, $B$'s, and $C$'s to switch places, until the string looks like $XAA\cdots ABB\cdots BCC\cdots C$. Finally, another set of rules essentially traverses the string from left to right, changing the non-terminals $A$, $B$, and $C$ into the corresponding terminals $a$, $b$, and $c$. Here is a grammar that does this:

$$\begin{aligned}
S &\longrightarrow SABC\\
S &\longrightarrow X\\
BA &\longrightarrow AB\\
CA &\longrightarrow AC\\
CB &\longrightarrow BC\\
XA &\longrightarrow aX\\
X &\longrightarrow Y\\
YB &\longrightarrow bY\\
Y &\longrightarrow Z\\
ZC &\longrightarrow cZ\\
Z &\longrightarrow \varepsilon
\end{aligned}$$

Here, the first two rules produce one of the strings $X$, $XABC$, $XABCABC$, $XABCABCABC$, and so on. The next three rules allow $A$'s to move to the left and $C$'s to move to the right, producing a string of the form $XA^nB^nC^n$, for some $n\in\N$. The rule $XA\longrightarrow aX$ allows the $X$ to move through the $A$'s from left to right, converting $A$'s to $a$'s as it goes. After converting the $A$'s, the $X$ can be transformed into a $Y$. The $Y$ will then move through the $B$'s, converting them to $b$'s. Then, the $Y$ is transformed into a $Z$, which is responsible for converting $C$'s to $c$'s. Finally, an application of the rule $Z\longrightarrow\varepsilon$ removes the $Z$, leaving the string $a^nb^nc^n$.

Note that if the rule $X\longrightarrow Y$ is applied before all the $A$'s have been converted to $a$'s, then there is no way for the remaining $A$'s to be converted to $a$'s or otherwise removed from the string. This means that the derivation has entered a dead end, which can never produce a string that consists of terminal symbols only. The only derivations that can produce strings in the language generated by the grammar are derivations in which the $X$ moves past all the $A$'s, converting them all to $a$'s. At this point in the derivation, the string is of the form $a^nXu$ where $u$ is a string consisting entirely of $B$'s and $C$'s. Now the rule $X\longrightarrow Y$ can be applied, producing the string $a^nYu$. Then, if a string of terminal symbols is ever to be produced, the $Y$ must move past all the $B$'s, producing the string $a^nb^nYC^n$. You can see that the use of three separate non-terminals, $X$, $Y$, and $Z$, is essential for forcing the symbols in $a^nb^nc^n$ into the correct order.
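To see the derivation process in action, here is a small Python sketch (my own illustration, not part of the text; only the rule set is taken from the grammar above). It performs a breadth-first search over sentential forms, applying each rule at every position where its left-hand side occurs, and collects every all-terminal string it reaches within a length bound:

```python
from collections import deque

# Production rules of the grammar above, as (left-hand side, right-hand side).
RULES = [("S", "SABC"), ("S", "X"), ("BA", "AB"), ("CA", "AC"), ("CB", "BC"),
         ("XA", "aX"), ("X", "Y"), ("YB", "bY"), ("Y", "Z"), ("ZC", "cZ"),
         ("Z", "")]

def derivable(max_len):
    """Breadth-first search over sentential forms starting from S; returns the
    set of all-terminal strings of length <= max_len that the grammar derives."""
    seen = {"S"}
    queue = deque(seen)
    results = set()
    while queue:
        s = queue.popleft()
        if all(c in "abc" for c in s):       # no non-terminals left: s is in L(G)
            results.add(s)
            continue
        for lhs, rhs in RULES:
            i = s.find(lhs)
            while i != -1:                   # apply the rule at every occurrence
                t = s[:i] + rhs + s[i + len(lhs):]
                # allow one extra symbol for the X/Y/Z marker still in the string
                if len(t) <= max_len + 1 and t not in seen:
                    seen.add(t)
                    queue.append(t)
                i = s.find(lhs, i + 1)
    return results

print(sorted(derivable(6)))   # ['', 'aabbcc', 'abc']
```

The search confirms the claim above: the only terminal strings the grammar can reach (up to length 6) are $\varepsilon$, $abc$, and $aabbcc$; every derivation that fires $X\longrightarrow Y$ too early strands a non-terminal and never contributes a result.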

Turing Machines

We saw hints in the previous section that "computation" is a more general concept than we might have thought. General grammars, which at first encounter don't seem to have much to do with algorithms or computing, turn out to be able to do things that are similar to the tasks carried out by computer programs. In this section, we will see that general grammars are precisely equivalent to computer programs in terms of their computational power, and that both are equivalent to a particularly simple model of computation known as a Turing machine. We shall also see that there are limits to what can be done by computing.

Historically, the theoretical study of computing began before computers existed. One of the early models of computation was developed in the 1930s by the British mathematician, Alan Turing, who was interested in studying the theoretical abilities and limitations of computation. His model for computation is a very simple abstract computing machine which has come to be known as a Turing machine. While Turing machines are not applicable in the same way that regular expressions, finite-state automata, and grammars are applicable, their use as a fundamental model for computation means that every computer scientist should be familiar with them, at least in a general way.

A Turing machine is really not much more complicated than a finite-state automaton or a pushdown automaton.2 Like an FSA, a Turing machine has a finite number of possible states, and it changes from state to state as it computes. However, a Turing machine also has an infinitely long tape that it can use for input and output. The tape extends to infinity in both directions. The tape is divided into cells, which are in one-to-one correspondence with the integers, $\Z$. Each cell can either be blank or it can hold a symbol from a specified alphabet. The Turing machine can move back and forth along this tape, reading and writing symbols and changing state. It can read only one cell at a time, and possibly write a new value in that cell. After doing this, it can change state and it can move by one cell either to the left or to the right. This is how the Turing machine computes. To use a Turing machine, you would write some input on its tape, start the machine, and let it compute until it halts. Whatever is written on the tape at that time is the output of the computation.

Although the tape is infinite, only a finite number of cells can be non-blank at any given time. If you don't like the idea of an infinite tape, you can think of a finite tape that can be extended to an arbitrarily large size as the Turing machine computes: If the Turing machine gets to either end of the tape, it will pause and wait politely until you add a new section of tape. In other words, it's not important that the Turing machine have an infinite amount of memory, only that it can use as much memory as it needs for a given computation, up to any arbitrarily large size. In this way, a Turing machine is like a computer that can ask you to buy it a new disk drive whenever it needs more storage space to continue a computation.3

A given Turing machine has a fixed, finite set of states. One of these states is designated as the start state. This is the state in which the Turing machine begins a computation. Another special state is the halt state. The Turing machine's computation ends when it enters its halt state. It is possible that a computation might never end because the machine never enters the halt state. This is analogous to an infinite loop in a computer program.

At each step in its computation, the Turing machine reads the contents of the tape cell where it is located. Depending on its state and the symbol that it reads, the machine writes a symbol (possibly the same symbol) to the cell, moves one cell either to the left or to the right, and (possibly) changes its state. The output symbol, direction of motion, and new state are determined by the current state and the input symbol. Note that either the input symbol, the output symbol, or both, can be blank. A Turing machine has a fixed set of rules that tell it how to compute. Each rule specifies the output symbol, direction of motion, and new state for some combination of current state and input symbol. The machine has a rule for every possible combination of current state and input symbol, except that there are no rules for what happens if the current state is the halt state. Of course, once the machine enters the halt state, its computation is complete and the machine simply stops.

I will use the character # to represent a blank in a way that makes it visible. I will always use $h$ to represent the halt state. I will indicate the directions, left and right, with $L$ and $R$, so that $\{L,R\}$ is the set of possible directions of motion. With these conventions, we can give the formal definition of a Turing machine as follows:

A Turing machine is a 4-tuple $(Q,\Lambda,q_0,\delta)$, where:

  • $Q$ is a finite set of states, including the halt state, $h$.
  • $\Lambda$ is an alphabet which includes the blank symbol, #.
  • $q_0\in Q$ is the start state.
  • $\delta\colon (Q\smallsetminus\{h\})\times\Lambda \to \Lambda\times\{L,R\}\times Q$ is the transition function. The fact that $\delta(q,\sigma)=(\tau,d,r)$ means that when the Turing machine is in state $q$ and reads the symbol $\sigma$, it writes the symbol $\tau$, moves one cell in the direction $d$, and enters state $r$.
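This definition translates almost line for line into a short simulator. The sketch below is my own illustration, not part of the text: it represents $\delta$ as a Python dictionary mapping (state, symbol) pairs to (write, move, next-state) triples, uses a dictionary of cells for the two-way-infinite tape, and reads the output from the halt position rightward (matching the output convention discussed later in this section). As an example it runs the $a$/$b$-swapping machine described just below:

```python
def run_tm(delta, w, start="q0", halt="h", blank="#", max_steps=100_000):
    """Run a Turing machine on input w. delta maps (state, symbol) to
    (symbol-to-write, direction 'L' or 'R', next state). Returns the string
    of non-blank symbols starting at the halt position."""
    tape = {i: c for i, c in enumerate(w)}   # cells indexed by integers; missing = blank
    pos, state = 0, start
    for _ in range(max_steps):
        if state == halt:
            out = []
            while tape.get(pos, blank) != blank:
                out.append(tape[pos])
                pos += 1
            return "".join(out)
        sym = tape.get(pos, blank)
        write, move, state = delta[(state, sym)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    raise RuntimeError("machine did not halt within max_steps")

# The machine described below: swap a's and b's while moving right,
# leave blanks unchanged, and on reading a c move left and halt.
SWAP = {("q0", "a"): ("b", "R", "q0"),
        ("q0", "b"): ("a", "R", "q0"),
        ("q0", "#"): ("#", "R", "q0"),
        ("q0", "c"): ("c", "L", "h")}

print(run_tm(SWAP, "abc"))   # prints "ac"
```

On input $abc$ the machine halts one cell to the left of the $c$, so reading from the halt position gives $ac$; the swapped $b$ in the leftmost cell is still on the tape, just to the left of the head.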

Even though this is the formal definition, it's easier to work with a transition diagram representation of Turing machines. The transition diagram for a Turing machine is similar to the transition diagram for a DFA. However, there are no "accepting" states (only a halt state). Furthermore, there must be a way to specify the output symbol and the direction of motion for each step of the computation. We do this by labeling arrows with notations of the form $(\sigma,\tau,L)$ and $(\sigma,\tau,R)$, where $\sigma$ and $\tau$ are symbols in the Turing machine's alphabet. For example,

Turing Machine transition

indicates that when the machine is in state $q_0$ and reads an $a$, it writes a $b$, moves left, and enters state $h$.

Here, for example, is a transition diagram for a simple Turing machine that moves to the right, changing $a$'s to $b$'s and vice versa, until it finds a $c$. It leaves blanks (#'s) unchanged. When and if the machine encounters a $c$, it moves to the left and halts:

Turing Machine example

To simplify the diagrams, I will leave out any transitions that are not relevant to the computation that I want the machine to perform. You can assume that the action for any omitted transition is to write the same symbol that was read, move right, and halt.

For another example, shown below is a transition diagram for a Turing machine that makes a copy of a string of $a$'s and $b$'s. To use this machine, you would write a string of $a$'s and $b$'s on its tape, place the machine on the first character of the string, and start the machine in its start state, $q_0$. When the machine halts, there will be two copies of the string on the tape, separated by a blank. The machine will be positioned on the first character of the leftmost copy of the string. Note that this machine uses $c$'s and $d$'s in addition to $a$'s and $b$'s. While it is copying the input string, it temporarily changes the $a$'s and $b$'s that it has copied to $c$'s and $d$'s, respectively. In this way it can keep track of which characters it has already copied. After the string has been copied, the machine changes the $c$'s and $d$'s back to $a$'s and $b$'s before halting.

Turing Machine example

In this machine, state $q_0$ checks whether the next character is an $a$, a $b$, or a # (indicating the end of the string). States $q_1$ and $q_2$ add an $a$ to the end of the new string, and states $q_3$ and $q_4$ do the same thing with a $b$. States $q_5$ and $q_6$ return the machine to the next character in the input string. When the end of the input string is reached, state $q_7$ will move the machine back to the start of the input string, changing $c$'s and $d$'s back to $a$'s and $b$'s as it goes. Finally, when the machine hits the # that precedes the input string, it moves to the right and halts. This leaves it back at the first character of the input string. It would be a good idea to work through the execution of this machine for a few sample input strings. You should also check that it works even for an input string of length zero.
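The description above pins the machine's behavior down closely enough to write out one possible transition table for it. The sketch below is a reconstruction from that description, not a transcription of the diagram, so its details may not match the pictured machine exactly; the states play the roles just described. A minimal simulator (which here returns the final non-blank portion of the tape, so that both copies are visible) lets you work through the sample runs suggested above:

```python
def run_tm(delta, w, start="q0", halt="h", max_steps=100_000):
    """Minimal Turing-machine simulator; returns the final tape contents,
    trimmed of leading and trailing blanks."""
    tape = {i: c for i, c in enumerate(w)}
    pos, state = 0, start
    for _ in range(max_steps):
        if state == halt:
            if not tape:
                return ""
            lo, hi = min(tape), max(tape)
            return "".join(tape.get(i, "#") for i in range(lo, hi + 1)).strip("#")
        sym = tape.get(pos, "#")
        write, move, state = delta[(state, sym)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    raise RuntimeError("machine did not halt within max_steps")

COPY = {
    # q0: dispatch on the next unprocessed character (or the end of the input)
    ("q0", "a"): ("c", "R", "q1"), ("q0", "b"): ("d", "R", "q3"),
    ("q0", "#"): ("#", "L", "q7"),
    # q1/q2: skip to the blank past the copy and append an a there
    ("q1", "a"): ("a", "R", "q1"), ("q1", "b"): ("b", "R", "q1"),
    ("q1", "#"): ("#", "R", "q2"),
    ("q2", "a"): ("a", "R", "q2"), ("q2", "b"): ("b", "R", "q2"),
    ("q2", "#"): ("a", "L", "q5"),
    # q3/q4: the same, appending a b
    ("q3", "a"): ("a", "R", "q3"), ("q3", "b"): ("b", "R", "q3"),
    ("q3", "#"): ("#", "R", "q4"),
    ("q4", "a"): ("a", "R", "q4"), ("q4", "b"): ("b", "R", "q4"),
    ("q4", "#"): ("b", "L", "q5"),
    # q5/q6: walk left back to the next unprocessed input character
    ("q5", "a"): ("a", "L", "q5"), ("q5", "b"): ("b", "L", "q5"),
    ("q5", "#"): ("#", "L", "q6"),
    ("q6", "a"): ("a", "L", "q6"), ("q6", "b"): ("b", "L", "q6"),
    ("q6", "c"): ("c", "R", "q0"), ("q6", "d"): ("d", "R", "q0"),
    # q7: sweep left restoring c->a and d->b, then halt on the first character
    ("q7", "c"): ("a", "L", "q7"), ("q7", "d"): ("b", "L", "q7"),
    ("q7", "#"): ("#", "R", "h"),
}

print(run_tm(COPY, "ab"))   # prints "ab#ab": two copies separated by a blank
print(repr(run_tm(COPY, "")))   # the empty string also works
```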


Our primary interest in Turing machines is as language processors. Suppose that $w$ is a string over an alphabet $\Sigma$. We will assume that $\Sigma$ does not contain the blank symbol. We can use $w$ as input to a Turing machine $M=(Q,\Lambda,q_0,\delta)$ provided that $\Sigma\subseteq\Lambda$. To use $w$ as input for $M$, we will write $w$ on $M$'s tape and assume that the remainder of the tape is blank. We place the machine on the cell containing the first character of the string, except that if $w=\varepsilon$ then we simply place the machine on a completely blank tape. Then we start the machine in its initial state, $q_0$, and see what computation it performs. We refer to this setup as "running $M$ with input $w$."

When $M$ is run with input $w$, it is possible that it will just keep running forever without halting. In that case, it doesn't make sense to ask about the output of the computation. Suppose however that $M$ does halt on input $w$. Suppose, furthermore, that when $M$ halts, its tape is blank except for a string $x$ of non-blank symbols, and that the machine is located on the first character of $x$. In this case, we will say that "$M$ halts with output $x$." In addition, if $M$ halts with an entirely blank tape, we say that "$M$ halts with output $\varepsilon$." Note that when we run $M$ with input $w$, one of three things can happen: (1) $M$ might halt with some string as output; (2) $M$ might fail to halt; or (3) $M$ might halt in some configuration that doesn't count as outputting any string.

The fact that a Turing machine can produce an output value allows us for the first time to deal with computation of functions. A function $f\colon A\to B$ takes an input value in the set $A$ and produces an output value in the set $B$. If the sets are sets of strings, we can now ask whether the values of the function can be computed by a Turing machine. That is, is there a Turing machine $M$ such that, given any string $w$ in the domain of $f$ as input, $M$ will compute as its output the string $f(w)$? If this is the case, then we say that $f$ is a Turing-computable function.

Suppose that $\Sigma$ and $\Gamma$ are alphabets that do not contain # and that $f$ is a function from $\Sigma^*$ to $\Gamma^*$. We say that $f$ is Turing-computable if there is a Turing machine $M=(Q,\Lambda,q_0,\delta)$ such that $\Sigma\subseteq\Lambda$ and $\Gamma\subseteq\Lambda$ and for each string $w\in\Sigma^*$, when $M$ is run with input $w$, it halts with output $f(w)$. In this case, we say that $M$ computes the function $f$.

For example, let $\Sigma=\{a\}$ and define $f\colon\Sigma^*\to\Sigma^*$ by $f(a^n)=a^{2n}$, for $n\in\N$. Then $f$ is Turing-computable since it is computed by this Turing machine:
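As a concrete complement to the diagram, here is one machine computing $f(a^n)=a^{2n}$, written as a transition table together with a minimal simulator (my own construction; it need not match the pictured machine). It marks each input $a$ as an $X$, appends a matching $b$ at the right end of the tape, and finally rewrites all the $b$'s and $X$'s as $a$'s:

```python
def run_tm(delta, w, start="q0", halt="h", max_steps=100_000):
    """Minimal simulator: returns the non-blank string at the halt position."""
    tape = {i: c for i, c in enumerate(w)}
    pos, state = 0, start
    for _ in range(max_steps):
        if state == halt:
            out = []
            while tape.get(pos, "#") != "#":
                out.append(tape[pos])
                pos += 1
            return "".join(out)
        sym = tape.get(pos, "#")
        write, move, state = delta[(state, sym)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    raise RuntimeError("machine did not halt within max_steps")

DOUBLE = {
    # q0: mark the next a; on b, every a has been matched, start the final sweep
    ("q0", "a"): ("X", "R", "q1"), ("q0", "b"): ("a", "R", "q3"),
    ("q0", "#"): ("#", "R", "h"),            # n = 0: halt on a blank tape
    # q1: move right past the rest of the string, then append a b
    ("q1", "a"): ("a", "R", "q1"), ("q1", "b"): ("b", "R", "q1"),
    ("q1", "#"): ("b", "L", "q2"),
    # q2: move left back to the last marked X, then step right to continue
    ("q2", "a"): ("a", "L", "q2"), ("q2", "b"): ("b", "L", "q2"),
    ("q2", "X"): ("X", "R", "q0"),
    # q3: rewrite the remaining b's as a's
    ("q3", "b"): ("a", "R", "q3"), ("q3", "#"): ("#", "L", "q4"),
    # q4: sweep left rewriting X's as a's, halting on the first character
    ("q4", "a"): ("a", "L", "q4"), ("q4", "X"): ("a", "L", "q4"),
    ("q4", "#"): ("#", "R", "h"),
}

print(run_tm(DOUBLE, "aaa"))   # prints "aaaaaa"
```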

Turing Machine example

We can also use Turing machines to define "computable languages." There are actually two different notions of Turing-computability for languages. One is based on the idea of Turing-computability for functions. Suppose that $\Sigma$ is an alphabet and that $L\subseteq\Sigma^*$. The characteristic function of $L$ is the function $\chi_L\colon\Sigma^*\to\{0,1\}$ defined by the fact that $\chi_L(w)=1$ if $w\in L$ and $\chi_L(w)=0$ if $w\notin L$. Note that given the function $\chi_L$, $L$ can be obtained as the set $L=\{w\in\Sigma^*\;|\;\chi_L(w)=1\}$. Given a language $L$, we can ask whether the corresponding function $\chi_L$ is Turing-computable. If so, then we can use a Turing machine to decide whether or not a given string $w$ is in $L$. Just run the machine with input $w$. It will halt with output $\chi_L(w)$. (That is, it will halt and when it does so, the tape will be blank except for a 0 or a 1, and the machine will be positioned on the 0 or 1.) If the machine halts with output 1, then $w\in L$. If the machine halts with output 0, then $w\notin L$.

Let $\Sigma$ be an alphabet that does not contain # and let $L$ be a language over $\Sigma$. We say that $L$ is Turing-decidable if there is a Turing machine $M=(Q,\Lambda,q_0,\delta)$ such that $\Sigma\subseteq\Lambda$, $\{0,1\}\subseteq\Lambda$, and for each $w\in\Sigma^*$, when $M$ is run with input $w$, it halts with output $\chi_L(w)$. (That is, it halts with output 0 or 1, and the output is 0 if $w\notin L$ and is 1 if $w\in L$.) In this case, we say that $M$ decides the language $L$.
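As a small illustration of this definition (my own example, not from the text), here is a decider for the language of strings over $\{a,b\}$ that contain an even number of $a$'s. It erases the input while tracking the parity of the $a$'s in its state, writes the answer digit at the end, and halts positioned on that digit, exactly as the definition requires:

```python
def run_tm(delta, w, start="q0", halt="h", max_steps=100_000):
    """Minimal simulator: returns the non-blank string at the halt position."""
    tape = {i: c for i, c in enumerate(w)}
    pos, state = 0, start
    for _ in range(max_steps):
        if state == halt:
            out = []
            while tape.get(pos, "#") != "#":
                out.append(tape[pos])
                pos += 1
            return "".join(out)
        sym = tape.get(pos, "#")
        write, move, state = delta[(state, sym)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    raise RuntimeError("machine did not halt within max_steps")

# Decider for L = { w in {a,b}* : w contains an even number of a's }.
# qe / qo record whether an even / odd number of a's has been seen so far.
EVEN_AS = {
    ("qe", "a"): ("#", "R", "qo"), ("qe", "b"): ("#", "R", "qe"),
    ("qe", "#"): ("1", "R", "qf"),           # even parity at the end: output 1
    ("qo", "a"): ("#", "R", "qe"), ("qo", "b"): ("#", "R", "qo"),
    ("qo", "#"): ("0", "R", "qf"),           # odd parity at the end: output 0
    ("qf", "#"): ("#", "L", "h"),            # step back onto the output digit
}

print(run_tm(EVEN_AS, "abba", start="qe"))   # prints "1": two a's, in L
print(run_tm(EVEN_AS, "a", start="qe"))      # prints "0": one a, not in L
```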

The second notion of computability for languages is based on the interesting fact that it is possible for a Turing machine to run forever, without ever halting. Whenever we run a Turing machine $M$ with input $w$, we can ask the question, will $M$ ever halt or will it run forever? If $M$ halts on input $w$, we will say that $M$ "accepts" $w$. We can then look at all the strings over a given alphabet that are accepted by a given Turing machine. This leads to the notion of Turing-acceptable languages.

Let $\Sigma$ be an alphabet that does not contain #, and let $L$ be a language over $\Sigma$. We say that $L$ is Turing-acceptable if there is a Turing machine $M=(Q,\Lambda,q_0,\delta)$ such that $\Sigma\subseteq\Lambda$, and for each $w\in\Sigma^*$, $M$ halts on input $w$ if and only if $w\in L$. In this case, we say that $M$ accepts the language $L$.

It should be clear that any Turing-decidable language is Turing-acceptable. In fact, if $L$ is a language over an alphabet $\Sigma$, and if $M$ is a Turing machine that decides $L$, then it is easy to modify $M$ to produce a Turing machine that accepts $L$. At the point where $M$ enters the halt state with output 0, the new machine should enter a new state in which it simply moves to the right forever, without ever halting. Given an input $w\in\Sigma^*$, the modified machine will halt if and only if $M$ halts with output 1, that is, if and only if $w\in L$.
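This construction can be carried out mechanically on a transition table. In the sketch below (my own illustration, under the conventions used in this section), every transition into the halt state is redirected to a checking state that reads the output digit: on a 1 it halts, and on a 0 it enters a state that moves right forever. The decider used as the example is one for "even number of $a$'s" over $\{a,b\}$; a step-bounded run loop makes the halting/non-halting difference observable:

```python
def halts(delta, w, start, halt="h", max_steps=10_000):
    """Run the machine for at most max_steps; report whether it halted."""
    tape = {i: c for i, c in enumerate(w)}
    pos, state = 0, start
    for _ in range(max_steps):
        if state == halt:
            return True
        sym = tape.get(pos, "#")
        write, move, state = delta[(state, sym)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    return False   # still running; the real machine would run forever

def decider_to_acceptor(delta, symbols, halt="h"):
    """Redirect every halting transition into a check state: halt if the
    output digit is 1, loop right forever if it is 0."""
    new = {(q, s): (t, d, "chk" if r == halt else r)
           for (q, s), (t, d, r) in delta.items()}
    for s in symbols:
        new[("chk", s)] = (s, "R", halt if s == "1" else "loop")
        new[("loop", s)] = (s, "R", "loop")
    return new

# A decider for "even number of a's": erase the input tracking parity in the
# state, write the answer digit, and halt positioned on it.
EVEN_AS = {
    ("qe", "a"): ("#", "R", "qo"), ("qe", "b"): ("#", "R", "qe"),
    ("qe", "#"): ("1", "R", "qf"),
    ("qo", "a"): ("#", "R", "qe"), ("qo", "b"): ("#", "R", "qo"),
    ("qo", "#"): ("0", "R", "qf"),
    ("qf", "#"): ("#", "L", "h"),
}

ACC = decider_to_acceptor(EVEN_AS, "ab#01")
print(halts(ACC, "abba", "qe"))   # True: two a's, so the acceptor halts
print(halts(ACC, "a", "qe"))      # False: one a, so it moves right forever
```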

Exercises

  1. Let $\Sigma=\{a\}$. Draw a transition diagram for a Turing machine that computes the function $f\colon\Sigma^*\to\Sigma^*$ where $f(a^n)=a^{3n}$, for $n\in\N$. Draw a transition diagram for a Turing machine that computes the function $f\colon\Sigma^*\to\Sigma^*$ where $f(a^n)=a^{3n+1}$, for $n\in\N$.

  2. Let $\Sigma=\{a,b\}$. Draw a transition diagram for a Turing machine that computes the function $f\colon\Sigma^*\to\Sigma^*$ where $f(w)=w^R$.

  3. Suppose that $\Sigma$, $\Gamma$, and $\Xi$ are alphabets and that $f\colon\Sigma^*\to\Gamma^*$ and $g\colon\Gamma^*\to\Xi^*$ are Turing-computable functions. Show that $g\circ f$ is Turing-computable.

  4. We have defined computability for functions $f\colon\Sigma^*\to\Gamma^*$, where $\Sigma$ and $\Gamma$ are alphabets. How could Turing machines be used to define computable functions from $\N$ to $\N$? (Hint: Consider the alphabet $\Sigma=\{a\}$.)

  5. Let $\Sigma$ be an alphabet and let $L$ be a language over $\Sigma$. Show that $L$ is Turing-decidable if and only if its complement, $\overline{L}$, is Turing-decidable.

  6. Draw a transition diagram for a Turing machine which decides the language $\{a^nb^n\;|\;n\in\N\}$. (Hint: Change the $a$'s and $b$'s to \$'s in pairs.) Explain in general terms how to make a Turing machine that decides the language $\{a^nb^nc^n\;|\;n\in\N\}$.

  7. Draw a transition diagram for a Turing machine which decides the language $\{a^nb^m\;|\;n>0$ and $m$ is a multiple of $n\}$. (Hint: Erase $n$ $b$'s at a time.)

  8. Based on your answer to the previous problem and the copying machine presented in this section, describe in general terms how you would build a Turing machine to decide the language $\{a^p\;|\;p$ is a prime number$\}$.

  9. Let $g\colon\{a\}^*\to\{0,1\}^*$ be the function such that for each $n\in\N$, $g(a^n)$ is the representation of $n$ as a binary number. Draw a transition diagram for a Turing machine that computes $g$.


  1. There is another special type of grammar that is intermediate between context-free grammars and general grammars. In a so-called context-sensitive grammar, every production rule is of the form $u\longrightarrow x$ where $|x|\ge|u|$. We will not cover context-sensitive grammars in this text.
  2. In fact, Turing machines can be shown to be equivalent in their computational power to pushdown automata with two independent stacks.
  3. The tape of a Turing machine can be used to store arbitrarily large amounts of information in a straightforward way. Although we can imagine using an arbitrary amount of memory with a computer, it's not so easy. Computers aren't set up to keep track of unlimited amounts of data. If you think about how it might be done, you probably won't come up with anything better than an infinite tape. (The problem is that computers use integer-valued addresses to keep track of data locations. If a limit is put on the number of bits in an address, then only a fixed, finite amount of data can be addressed. If no limit is put on the number of bits in an address, then we are right back to the problem of storing an arbitrarily large piece of data just to represent an address!)