Introduction to Theory of Computation

Theory of Computation (TOC) is a branch of computer science that studies which problems can be solved by computational models and algorithms, and how efficiently. It focuses on understanding the fundamental capabilities and limitations of computers.

Key Areas of TOC

  • Automata Theory: Study of abstract machines and the computational problems they can solve
  • Computability Theory: What can and cannot be computed
  • Complexity Theory: Classification of problems by their inherent difficulty
  • Formal Languages: Study of grammar and language recognition

Importance of TOC

  • Foundation for compiler design
  • Basis for programming language theory
  • Essential for algorithm analysis
  • Foundation for artificial intelligence
  • Critical for understanding computational limits

Formal Languages

A formal language is a set of strings of symbols that may be constrained by rules specific to that language.

Basic Terminology

  • Alphabet (Σ): Finite set of symbols
  • String: Finite sequence of symbols from alphabet
  • Language: Set of strings over an alphabet
  • Empty String (ε): String with zero symbols
  • Kleene Star (Σ*): Set of all strings over alphabet Σ
  • Kleene Plus (Σ+): Set of all non-empty strings over Σ
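The definitions above can be made concrete with a small Python sketch. Since Σ* is infinite, the helper below (an illustrative function name, not standard terminology) enumerates strings only up to a length bound:

```python
from itertools import product

def sigma_star(alphabet, max_len):
    """Enumerate strings of Sigma* up to a length bound (Sigma* itself is infinite)."""
    strings = []
    for n in range(max_len + 1):
        for combo in product(alphabet, repeat=n):
            strings.append("".join(combo))
    return strings

sigma = ["a", "b"]
star = sigma_star(sigma, 2)    # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
plus = [w for w in star if w]  # Sigma+ drops the empty string ε
```

Note how ε appears as the empty Python string `""`, and Σ⁺ is simply Σ* with ε removed.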

Operations on Languages

  • Union: L₁ ∪ L₂ = {w | w ∈ L₁ or w ∈ L₂}
  • Intersection: L₁ ∩ L₂ = {w | w ∈ L₁ and w ∈ L₂}
  • Concatenation: L₁L₂ = {w₁w₂ | w₁ ∈ L₁ and w₂ ∈ L₂}
  • Kleene Star: L* = {ε} ∪ L ∪ LL ∪ LLL ∪ ...
  • Complement: L' = {w ∈ Σ* | w ∉ L}
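For finite languages, these operations map directly onto Python set operations. This is only a sketch: real languages may be infinite, so Kleene star is approximated here by a bounded number of concatenation rounds:

```python
# Languages represented as finite Python sets of strings
L1 = {"a", "ab"}
L2 = {"b", "ab"}

union = L1 | L2                                  # {'a', 'ab', 'b'}
intersection = L1 & L2                           # {'ab'}
concatenation = {x + y for x in L1 for y in L2}  # {'ab', 'aab', 'abb', 'abab'}

def kleene_star(L, max_rounds):
    """Approximate L* = {ε} ∪ L ∪ LL ∪ ... by a bounded number of rounds."""
    result = {""}   # ε is always in L*
    layer = {""}
    for _ in range(max_rounds):
        layer = {x + y for x in layer for y in L}
        result |= layer
    return result
```

Complement is the one operation that cannot be computed this way, since Σ* \ L is infinite whenever Σ is non-empty.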

Formal Language Definition

Given an alphabet Σ, a language L is any subset of Σ*.

Σ* = {ε} ∪ Σ ∪ Σ² ∪ Σ³ ∪ ...

Grammars in Formal Languages

A grammar is a set of production rules for strings in a formal language. It describes how to form strings from the language's alphabet that are valid according to the language's syntax.

Components of a Grammar

  • Variables (V): Non-terminal symbols
  • Terminals (T): Basic symbols from which strings are formed
  • Start Symbol (S): Special variable where production begins
  • Production Rules (P): Rules for replacing variables

Formal Grammar Definition

A grammar G is a 4-tuple: G = (V, T, P, S)

Where:
V = Finite set of variables (non-terminals)
T = Finite set of terminals
P = Finite set of production rules
S = Start symbol (S ∈ V)

Grammar Example

Grammar: G = ({S, A}, {a, b}, P, S)

Production Rules P:
S → aA
A → bA | ε

This grammar generates the language ab* (a single a followed by zero or more b's).
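One way to see which strings the grammar above derives is to enumerate derivations mechanically. The sketch below (with an illustrative helper name `generate`) expands the leftmost variable breadth-first, pruning sentential forms that are already too long:

```python
from collections import deque

def generate(rules, start, max_len):
    """Breadth-first derivation: expand the leftmost variable until only terminals remain."""
    results = set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        var = next((c for c in form if c in rules), None)
        if var is None:                       # no variables left: a terminal string
            if len(form) <= max_len:
                results.add(form)
            continue
        if len(form.replace(var, "")) > max_len:
            continue                          # prune forms already longer than the bound
        i = form.index(var)
        for rhs in rules[var]:
            queue.append(form[:i] + rhs + form[i + 1:])
    return results

rules = {"S": ["aA"], "A": ["bA", ""]}        # ε written as the empty string
words = sorted(generate(rules, "S", 4))       # ['a', 'ab', 'abb', 'abbb']
```

The output is exactly one a followed by zero or more b's, confirming the language generated.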

Finite Automata

Finite automata are abstract machines that recognize regular languages. They have finite memory and process input strings symbol by symbol.

Deterministic Finite Automata (DFA)

  • Exactly one transition for each state and input symbol
  • No ε-transitions
  • Simple and efficient implementation

DFA Formal Definition

A DFA is a 5-tuple: M = (Q, Σ, δ, q₀, F)

Where:
Q = Finite set of states
Σ = Finite input alphabet
δ = Transition function: Q × Σ → Q
q₀ = Initial state (q₀ ∈ Q)
F = Set of final states (F ⊆ Q)

Non-deterministic Finite Automata (NFA)

  • Can have zero, one, or multiple transitions for a state and input symbol
  • May have ε-transitions
  • Easier to design for complex patterns

NFA Formal Definition

An NFA is a 5-tuple: M = (Q, Σ, δ, q₀, F)

Where:
Q = Finite set of states
Σ = Finite input alphabet
δ = Transition function: Q × (Σ ∪ {ε}) → 2^Q
q₀ = Initial state (q₀ ∈ Q)
F = Set of final states (F ⊆ Q)

Exam Tip: Every NFA can be converted to an equivalent DFA using the subset construction method, though the DFA may have exponentially more states.
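The subset construction mentioned in the tip can be sketched in a few lines: each DFA state is a frozenset of NFA states, and the construction explores only the reachable subsets. This version omits ε-transitions for brevity, and the transition-table encoding is an assumption of the sketch:

```python
def nfa_to_dfa(nfa_delta, start, alphabet):
    """Subset construction sketch: DFA states are frozensets of NFA states.
    nfa_delta maps (state, symbol) -> set of NFA states; ε-moves omitted.
    A DFA state is final iff it contains an NFA final state."""
    start_set = frozenset([start])
    dfa = {}
    frontier = [start_set]
    while frontier:
        S = frontier.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:
            target = frozenset(q2 for q in S for q2 in nfa_delta.get((q, a), set()))
            dfa[S][a] = target
            frontier.append(target)
    return dfa

# NFA for strings ending in '01': guess when the final '01' starts
delta = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
dfa = nfa_to_dfa(delta, "q0", ["0", "1"])
```

Here only 3 of the 2³ = 8 possible subsets are reachable, which is typical: the exponential blow-up is a worst case, not the rule.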

DFA vs NFA Comparison

Understanding the differences between DFA and NFA is crucial for automata theory.

Aspect           | DFA                                          | NFA
Transitions      | Exactly one per (state, symbol) pair         | Zero, one, or many per (state, symbol) pair
ε-transitions    | Not allowed                                  | Allowed
Determinism      | Deterministic                                | Non-deterministic
Implementation   | Easy to implement directly                   | Harder to implement directly
State complexity | May need up to exponentially more states     | Often more compact for the same language
Power            | Recognizes exactly the regular languages     | Recognizes exactly the regular languages

DFA Example

DFA that accepts strings ending with '01'

States: q₀, q₁, q₂
Alphabet: {0, 1}
Start: q₀
Final: q₂

Transition Table:
δ(q₀, 0) = q₁, δ(q₀, 1) = q₀
δ(q₁, 0) = q₁, δ(q₁, 1) = q₂
δ(q₂, 0) = q₁, δ(q₂, 1) = q₀
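The transition table above can be run directly. A minimal DFA simulator (the function name `run_dfa` is illustrative) just follows one transition per input symbol and checks whether the last state is final:

```python
def run_dfa(delta, start, finals, w):
    """Simulate a DFA: follow exactly one transition per input symbol."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

# Transition table of the 'ends with 01' DFA
delta = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q0",
}
print(run_dfa(delta, "q0", {"q2"}, "1101"))  # True: ends with 01
print(run_dfa(delta, "q0", {"q2"}, "0110"))  # False: ends with 10
```

Intuitively, q₀ means "no progress", q₁ means "just saw a 0", and q₂ means "just saw 01".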

Note: Despite their differences in design, DFAs and NFAs have equivalent computational power - both recognize exactly the regular languages.

Regular Expressions

Regular expressions provide a concise and flexible way to describe patterns in strings and define regular languages.

Basic Regular Expression Operations

  • Union (| or +): R₁ | R₂ matches R₁ or R₂ (formal notation often writes union as R₁ + R₂)
  • Concatenation: R₁R₂ matches R₁ followed by R₂
  • Kleene Star (*): R* matches zero or more occurrences of R
  • Kleene Plus (+): R+ matches one or more occurrences of R
  • Optional (?): R? matches zero or one occurrence of R

Regular Expression Examples

Regular Expression | Language Description
a*                 | Zero or more a's
a+b                | One or more a's followed by a single b (+ here is Kleene plus)
(a|b)*             | Any string of a's and b's (including ε)
a(a|b)*b           | Starts with a, ends with b
(0|1)*00(0|1)*     | Contains the substring '00'
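These formal patterns carry over almost verbatim to Python's `re` module, since `|`, `*`, `+`, and parentheses have the same meaning there. Using `re.fullmatch` (which requires the entire string to match, like formal language membership):

```python
import re

# Each formal pattern above, with example strings it should accept
patterns = {
    "a*": ["", "a", "aaa"],
    "a+b": ["ab", "aaab"],
    "(a|b)*": ["", "ab", "baba"],
    "a(a|b)*b": ["ab", "abab", "aabbb"],
    "(0|1)*00(0|1)*": ["00", "10011"],
}
for pattern, examples in patterns.items():
    assert all(re.fullmatch(pattern, w) for w in examples)

assert re.fullmatch("a(a|b)*b", "ba") is None  # must start with a
```

Practical regex engines add many non-regular extensions (backreferences, lookahead), but the core operators here describe exactly the regular languages.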

Equivalence with Finite Automata

  • Every regular expression can be converted to an NFA
  • Every DFA/NFA can be converted to a regular expression
  • Regular expressions = Regular languages = Languages accepted by finite automata
Important: The set of languages described by regular expressions is exactly the set of regular languages, which are exactly the languages accepted by finite automata.

Pumping Lemma for Regular Languages

The Pumping Lemma is used to prove that certain languages are not regular.

Pumping Lemma Statement

If L is a regular language, then there exists a constant n (pumping length) such that for every string w in L with |w| ≥ n, w can be divided into three parts, w = xyz, satisfying:

  1. |xy| ≤ n
  2. |y| ≥ 1
  3. For all k ≥ 0, the string xy^kz is also in L

Using Pumping Lemma to Prove Non-regularity

  1. Assume L is regular (for contradiction)
  2. Let n be the pumping length
  3. Choose a string w in L with |w| ≥ n
  4. Show that for all possible divisions w = xyz satisfying conditions 1 and 2, there exists some k such that xy^kz ∉ L
  5. Conclude that L is not regular
Exam Tip: Common languages to prove non-regular using pumping lemma: {a^nb^n | n ≥ 0}, {ww | w ∈ {a,b}*}, {a^(n²) | n ≥ 0}.

Pumping Lemma Example

Prove L = {a^nb^n | n ≥ 0} is not regular

Assume L is regular with pumping length n
Choose w = a^nb^n
For any division w = xyz with |xy| ≤ n and |y| ≥ 1:
y consists only of a's
Pump y: xy^2z has more a's than b's
Therefore, xy^2z ∉ L
Contradiction! So L is not regular.
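The case analysis in this proof can be checked exhaustively for a concrete n: every split w = xyz with |xy| ≤ n and |y| ≥ 1 puts y inside the a-block, and pumping then breaks membership. A small sketch (the membership helper `in_L` is illustrative):

```python
def in_L(w):
    """Membership in L = { a^n b^n : n >= 0 }."""
    k = w.find("b")
    if k == -1:
        k = len(w)
    return w == "a" * k + "b" * k

n = 5
w = "a" * n + "b" * n
assert in_L(w)

# Every split w = xyz with |xy| <= n and |y| >= 1 puts y inside the a's
for i in range(n):                  # x = w[:i]
    for j in range(i + 1, n + 1):   # y = w[i:j], non-empty
        x, y, z = w[:i], w[i:j], w[j:]
        assert not in_L(x + y * 2 + z)   # pumping up adds a's but no b's
```

This is not a proof for all n, of course, but it makes the "y consists only of a's" step of the argument concrete.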

Context-Free Grammars

Context-Free Grammars (CFG) are more powerful than regular grammars and can describe nested structures and programming language syntax.

CFG Components

  • Variables: Non-terminal symbols
  • Terminals: Basic symbols
  • Productions: Rules of form A → α, where A is a variable and α is a string of variables and terminals
  • Start Symbol: Special variable where derivation begins

CFG Formal Definition

A CFG is a 4-tuple: G = (V, T, P, S)

Where:
V = Finite set of variables
T = Finite set of terminals
P = Finite set of productions of form A → α
S = Start symbol (S ∈ V)

CFG Examples

CFG for balanced parentheses:

S → (S) | SS | ε
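Membership in this language can be sketched with a simple counter, which stands in for the stack that a pushdown automaton (introduced below) would maintain; the helper name `balanced` is illustrative:

```python
def balanced(w):
    """Check membership in the language of S -> (S) | SS | ε.
    The counter plays the role of a PDA's stack depth."""
    depth = 0
    for c in w:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:        # a ')' with no matching '('
                return False
        else:
            return False         # symbol outside the alphabet {(, )}
    return depth == 0            # every '(' was closed
```

No DFA can do this: tracking arbitrary nesting depth needs unbounded memory, which is exactly why this language is context-free but not regular.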

CFG for arithmetic expressions:

E → E + E | E * E | (E) | id

Derivations and Parse Trees

  • Leftmost Derivation: Always replace the leftmost variable
  • Rightmost Derivation: Always replace the rightmost variable
  • Parse Tree: Tree representation of derivation
  • Ambiguous Grammar: Grammar with multiple parse trees for same string

Pushdown Automata

Pushdown Automata (PDA) are finite automata equipped with a stack, which gives them additional memory capability to recognize context-free languages.

PDA Components

  • Finite Control: States and transition function
  • Input Tape: Contains input string
  • Stack: LIFO data structure for storage

PDA Formal Definition

A PDA is a 7-tuple: M = (Q, Σ, Γ, δ, q₀, Z₀, F)

Where:
Q = Finite set of states
Σ = Input alphabet
Γ = Stack alphabet
δ = Transition function: Q × (Σ ∪ {ε}) × Γ → finite subsets of Q × Γ*
q₀ = Initial state
Z₀ = Initial stack symbol
F = Set of final states
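The 7-tuple can be exercised with a tiny deterministic sketch for {aⁿbⁿ | n ≥ 0}: push a stack symbol per a, pop one per b, and accept when only the bottom marker Z₀ remains. The function name `pda_anbn` and the two-state encoding are assumptions of this sketch:

```python
def pda_anbn(w):
    """PDA sketch for { a^n b^n : n >= 0 }: push an A per 'a', pop per 'b',
    accept when the input is consumed and only Z0 remains on the stack."""
    stack = ["Z0"]                 # Z0 is the initial stack symbol
    state = "push"                 # 'push' while reading a's, 'pop' while reading b's
    for c in w:
        if state == "push" and c == "a":
            stack.append("A")
        elif c == "b" and stack[-1] == "A":
            state = "pop"
            stack.pop()
        else:
            return False           # e.g. an 'a' after a 'b', or too many b's
    return stack == ["Z0"]
```

The stack is what lifts this machine beyond finite automata: the number of pushed A's records n, which no finite set of states could.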

PDA Acceptance Modes

  • Final State Acceptance: PDA accepts if it reaches a final state
  • Empty Stack Acceptance: PDA accepts if stack becomes empty
  • Both: Both final state and empty stack
Note: The set of languages accepted by PDAs is exactly the set of context-free languages. Every CFG has an equivalent PDA and vice versa.

Turing Machines

Turing Machines are the most powerful automata in the Chomsky hierarchy and form the theoretical basis for modern computers.

Turing Machine Components

  • Tape: Infinite in both directions, divided into cells
  • Head: Reads/writes symbols and moves left/right
  • Finite Control: States and transition function
  • Alphabet: Finite set of tape symbols

Turing Machine Formal Definition

A TM is a 7-tuple: M = (Q, Σ, Γ, δ, q₀, B, F)

Where:
Q = Finite set of states
Σ = Input alphabet (Σ ⊆ Γ)
Γ = Tape alphabet
δ = Transition function: Q × Γ → Q × Γ × {L, R}
q₀ = Initial state
B = Blank symbol (B ∈ Γ)
F = Set of final states
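A generic simulator for this 7-tuple is short: the tape is a dictionary from positions to symbols (blank by default), and δ is a lookup table. The machine halts when no transition applies. The helper name `run_tm` and the bit-flipping example machine are illustrative:

```python
def run_tm(delta, start, accept, tape_str, blank="B", max_steps=1000):
    """Simulate a single-tape TM; delta maps (state, symbol) -> (state, symbol, 'L' or 'R').
    Halts when no transition applies; accepts if the halting state is in `accept`."""
    tape = dict(enumerate(tape_str))
    state, head = start, 0
    for _ in range(max_steps):
        symbol = tape.get(head, blank)
        if (state, symbol) not in delta:
            return state in accept, "".join(tape[i] for i in sorted(tape))
        state, written, move = delta[(state, symbol)]
        tape[head] = written
        head += 1 if move == "R" else -1
    raise RuntimeError("step limit exceeded")

# A tiny machine that flips every bit, then halts on the first blank
delta = {
    ("q0", "0"): ("q0", "1", "R"),
    ("q0", "1"): ("q0", "0", "R"),
}
accepted, tape_out = run_tm(delta, "q0", {"q0"}, "0110")
```

Unlike a DFA or PDA, the machine can also write symbols and move left, which is what makes the tape usable as unbounded working memory.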

Turing Machine Capabilities

  • Can recognize recursively enumerable languages
  • Can compute any computable function
  • Serves as theoretical model for modern computers
  • Basis for the Church-Turing thesis

Variants of Turing Machines

  • Multi-tape TM: Multiple tapes with independent heads
  • Non-deterministic TM: Multiple possible transitions
  • Universal TM: Can simulate any other TM
Important: All variants of Turing Machines have the same computational power - they can recognize the same class of languages (recursively enumerable languages).

Chomsky Hierarchy

The Chomsky Hierarchy classifies formal grammars and their corresponding languages and automata into four types based on their generative power.

Type   | Grammar           | Language               | Automaton                | Production Rules
Type 0 | Unrestricted      | Recursively Enumerable | Turing Machine           | α → β (α contains at least one variable)
Type 1 | Context-Sensitive | Context-Sensitive      | Linear Bounded Automaton | αAβ → αγβ (γ ≠ ε)
Type 2 | Context-Free      | Context-Free           | Pushdown Automaton       | A → γ
Type 3 | Regular           | Regular                | Finite Automaton         | A → aB, A → a

Language Hierarchy

Regular Languages ⊂ Context-Free Languages ⊂ Context-Sensitive Languages ⊂ Recursively Enumerable Languages

Exam Tip: Remember the proper inclusions: Regular ⊂ Context-Free ⊂ Context-Sensitive ⊂ Recursively Enumerable. Each class is strictly more powerful than the previous one.

Computability Theory

Computability theory studies which problems can be solved by algorithms and which cannot.

Key Concepts

  • Decidable Problem: Problem for which an algorithm exists that always gives correct yes/no answer
  • Undecidable Problem: No algorithm exists that solves the problem for all inputs
  • Recognizable Language: Language for which a TM exists that accepts every string in the language, but may run forever on strings not in the language
  • Co-recognizable Language: Language whose complement is recognizable

Famous Undecidable Problems

  • Halting Problem: Given a program and input, determine if it halts
  • Post Correspondence Problem: Given dominoes with strings, determine if a sequence exists where top and bottom strings match
  • Rice's Theorem: All non-trivial semantic properties of programs are undecidable

Halting Problem Proof

Assume H is a TM that decides the halting problem:
H(M, w) accepts if M halts on input w, and rejects if M runs forever on w
Construct a TM D that, on input a machine description M:
- Runs H(M, M)
- If H accepts, loops forever
- If H rejects, halts
Now run D on its own description, D(D):
If D halts on D, then H(D, D) accepts, so D(D) loops forever - contradiction!
If D loops on D, then H(D, D) rejects, so D(D) halts - contradiction!
Therefore, H cannot exist.

Important: The Halting Problem is the most famous undecidable problem and serves as the basis for proving many other problems undecidable through reduction.