
Encoder-Decoder Models

Encoder-decoder architectures for translation and sequence tasks.

Overview

Encoder-decoder models combine both encoder and decoder components of the Transformer architecture. These models, exemplified by T5 (Text-to-Text Transfer Transformer), BART, and others, excel at sequence-to-sequence tasks like translation and summarization.

Key characteristics

  • Sequence-to-Sequence: Process an input sequence and generate an output sequence of potentially different length
  • Cross-Attention: Decoder attends to encoder outputs to condition on the input sequence (see the sketch after this list)
  • Versatile Tasks: Can handle various tasks by framing them as text-to-text problems
  • Bidirectional Encoder: Encoder processes input in both directions for full context understanding
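
The cross-attention point is easiest to see in code. Below is a minimal single-head sketch in TypeScript over plain number arrays; it assumes toy dimensions and omits the learned query/key/value projections a real model applies, so the encoder outputs stand in directly for keys and values.

type Matrix = number[][];

// Multiply an (m x k) matrix by a (k x n) matrix.
function matmul(a: Matrix, b: Matrix): Matrix {
  return a.map((row) =>
    b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0)),
  );
}

function transpose(m: Matrix): Matrix {
  return m[0].map((_, j) => m.map((row) => row[j]));
}

// Numerically stable row-wise softmax.
function softmaxRows(m: Matrix): Matrix {
  return m.map((row) => {
    const max = Math.max(...row);
    const exps = row.map((v) => Math.exp(v - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map((v) => v / sum);
  });
}

// Cross-attention: decoder states supply the queries; encoder outputs act
// as keys and values. scores = Q K^T / sqrt(d); result = softmax(scores) V.
function crossAttention(decoderStates: Matrix, encoderOutputs: Matrix): Matrix {
  const d = encoderOutputs[0].length;
  const scores = matmul(decoderStates, transpose(encoderOutputs)).map((row) =>
    row.map((v) => v / Math.sqrt(d)),
  );
  return matmul(softmaxRows(scores), encoderOutputs);
}

Each decoder row of the result is a weighted mix of encoder positions, which is exactly how the decoder conditions on the input.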

Popular models

  • T5 (Text-to-Text Transfer Transformer): Frames all NLP tasks as text generation (examples after this list)
  • BART (Bidirectional and Auto-Regressive Transformers): Combines bidirectional encoding with autoregressive decoding
  • mBART: Multilingual variant of BART for machine translation
  • Pegasus: Pre-trained for abstractive summarization
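
T5's text-to-text framing is concrete enough to show as data: every task becomes an input string with a short task prefix and an output string. The prefixes below follow the format shown in the T5 paper; the TypeScript wrapper is purely illustrative, and the bracketed placeholders are not real model output.

// Every task is framed as "prefix + input text -> output text".
const t5Examples = [
  { input: "translate English to German: That is good.", output: "Das ist gut." },
  { input: "summarize: <article text>", output: "<short summary>" },
  { input: "cola sentence: The course is jumping well.", output: "not acceptable" },
];
console.log(t5Examples.map((e) => `${e.input}  =>  ${e.output}`).join("\n"));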

Core steps

  1. Encoder Processing: Input sequence is processed bidirectionally through encoder layers
  2. Cross-Attention: Decoder layers attend to encoder outputs to condition on input
  3. Autoregressive Generation: Decoder generates output tokens one at a time (a greedy-decoding sketch follows this list)
  4. Output Projection: Final representations are projected to vocabulary logits
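
Step 3 is just a loop when sketched in code. The decoderStep function below is a hypothetical stand-in for a single decoder forward pass (masked self-attention over the prefix plus cross-attention over the encoder outputs) that returns next-token logits; the BOS/EOS token ids are likewise assumed.

const BOS = 0; // assumed start-of-sequence token id
const EOS = 1; // assumed end-of-sequence token id

function argmax(logits: number[]): number {
  return logits.reduce((best, v, i) => (v > logits[best] ? i : best), 0);
}

// Greedy decoding: feed the growing output prefix back into the decoder
// until it emits EOS or the length budget runs out.
function greedyDecode(
  decoderStep: (encoderOutputs: number[][], prefix: number[]) => number[],
  encoderOutputs: number[][],
  maxLength = 64,
): number[] {
  const output = [BOS];
  while (output.length < maxLength) {
    const logits = decoderStep(encoderOutputs, output);
    const next = argmax(logits);
    output.push(next);
    if (next === EOS) break;
  }
  return output;
}

Swapping argmax for sampling, or keeping several candidate prefixes (beam search), changes the decoding strategy without touching the architecture.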

Encoder-decoder trade-offs

  • Input understanding: The encoder reads the full input bidirectionally, whereas decoder-only models process it causally
  • Generation: Autoregressive in both architectures; output tokens are produced one at a time either way
  • Pretraining efficiency: Generally lower than decoder-only models, which typically receive a training signal from every token of a single stream
  • Fine-tuning flexibility: Often fine-tuned per task, whereas decoder-only models are more commonly instruction-tuned across many tasks at once

Training objectives

  • Prefix LM: Predict the continuation of a given prefix, with the prefix attended bidirectionally
  • Infilling: Reconstruct masked spans of the input, as in T5's span corruption (see the sketch after this list)
  • Seq2seq LM: Standard encoder-decoder training on paired input/output sequences, such as translation pairs
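
Infilling is the objective T5 pretrains on (span corruption): selected spans are removed from the input and replaced with sentinel tokens, and the target spells out the removed spans after their sentinels. The sketch below hard-codes the spans for clarity; in practice they are sampled randomly and the sentinels are dedicated vocabulary entries.

// Span corruption: replace chosen spans with sentinels in the input and
// emit "<sentinel> <dropped tokens>" pairs as the target. Spans are
// half-open [start, end) index ranges, assumed sorted and disjoint.
function spanCorrupt(
  tokens: string[],
  spans: Array<[number, number]>,
): { input: string; target: string } {
  const input: string[] = [];
  const target: string[] = [];
  let cursor = 0;
  spans.forEach(([start, end], i) => {
    input.push(...tokens.slice(cursor, start), `<extra_id_${i}>`);
    target.push(`<extra_id_${i}>`, ...tokens.slice(start, end));
    cursor = end;
  });
  input.push(...tokens.slice(cursor));
  target.push(`<extra_id_${spans.length}>`); // final sentinel closes the target
  return { input: input.join(" "), target: target.join(" ") };
}

// input:  "Thank you <extra_id_0> me to your party <extra_id_1> week"
// target: "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"
const corrupted = spanCorrupt(
  "Thank you for inviting me to your party last week".split(" "),
  [[2, 4], [8, 9]],
);
console.log(corrupted.input, "=>", corrupted.target);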

Architecture overview

Encoder-decoder models consist of an encoder that processes the input sequence bidirectionally, and a decoder that generates the output autoregressively while attending to the encoder's representations.
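
The bidirectional-versus-autoregressive split comes down to the self-attention masks, as the minimal sketch below shows (true means the position may be attended to).

// Encoder self-attention: full mask, every position sees every position.
function encoderMask(len: number): boolean[][] {
  return Array.from({ length: len }, () =>
    Array.from({ length: len }, () => true),
  );
}

// Decoder self-attention: causal mask, position i sees only positions <= i.
function causalMask(len: number): boolean[][] {
  return Array.from({ length: len }, (_, i) =>
    Array.from({ length: len }, (_, j) => j <= i),
  );
}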

Applications

Sequence-to-Sequence Tasks

  • Machine Translation: Translating text between languages (a client sketch follows this list)
  • Summarization: Generating concise summaries of documents
  • Question Generation: Generating questions from given contexts
  • Text Simplification: Rewriting text in simpler terms
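
As a concrete example, the recipe client at the end of this page can be specialized to translation. This variant assumes the same OPENAI_API_KEY and gpt-4o-mini setup as that code and simply frames the task as text-to-text.

import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Instruction in the system message, source text as the user message,
// translated text as the model output.
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "system",
      content: "Translate the user's text from English to German. Reply with the translation only.",
    },
    { role: "user", content: "The weather is nice today." },
  ],
});

console.log(response.choices[0]?.message?.content?.trim() ?? "");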

Generation Tasks

  • Data-to-Text: Generating text from structured data or tables
  • Code Generation: Generating code from natural language descriptions
  • Dialogue Generation: Generating conversational responses

Industry applications

  • Translation Services: Real-time and document translation across languages
  • Content Summarization: News summarization, document condensation
  • Data Extraction: Extracting structured information from unstructured text

Encoder-Decoder Models Implementation

// Encoder-Decoder Models recipe using OpenAI
// Install: bun add openai

import OpenAI from "openai";

async function main() {
  const input = "Add your prompt here.";
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  const system = "You are a senior AI engineer and technical writer. Explain how the architecture applies to the request and outline practical implementation guidance. Recipe: Encoder-Decoder Models. Description: Encoder-decoder architectures for translation and sequence tasks. Focus: T5-style. Provide actionable, implementation-ready guidance.";
  const user = `Request: ${input}`;

  const openaiResponse = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: system },
      { role: "user", content: user },
    ],
  });

  const openaiText = openaiResponse.choices[0]?.message?.content?.trim() ?? "";

  console.log(openaiText);
}

main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});