Read Models, Projections, and Integration Pipelines

In the previous lesson, the focus was on event sourcing and Akka Persistence: commands arrive, decisions are made, events are stored, and state can be rebuilt from that event history. That gives you a strong write-side model. It does not, by itself, give the rest of the business a convenient way to use that information.

This is where many event-sourced systems either become genuinely useful or become operationally awkward.

A persisted event log is excellent as a source of truth. It is usually a poor interface for dashboards, search screens, customer timelines, compliance reports, fraud analysis, or downstream integrations. Those consumers want something else:

  • a query-friendly view of the data
  • a denormalized representation shaped around reads
  • a stream of changes they can process independently
  • a reliable way to keep secondary systems up to date

That is the purpose of read models and projections.

In this lesson, we will look at how Akka-based systems turn persisted events into practical business outputs. We will make the distinction between write-side truth and read-side usability concrete, explain what projections are actually doing, and walk through the design constraints that matter in production: offset tracking, duplicate handling, lag, replay safety, and integration boundaries.

Why the Event Log Is Not the Whole Product

Event sourcing is attractive because it preserves historical truth. Instead of storing only the latest row state, you store the meaningful events that led to the current state:

  • PaymentAuthorized
  • FundsCaptured
  • FraudReviewRequested
  • FraudReviewCleared
  • RefundIssued

That history is valuable. It helps with auditability, debugging, recovery, and long-lived business workflows.

But imagine what happens if you ask a product team, support team, or finance team to build directly on top of that raw event stream.

They rarely want to answer questions like these by replaying every event on demand:

  • show me all payments currently pending review
  • display the current order timeline for customer cust-442
  • list suspicious transactions above a threshold in the last hour
  • search for all accounts updated by a particular merchant integration
  • render a dashboard of daily authorization rates

These are read concerns, not write concerns.

The event log answers: what happened?

The read side usually answers:

  • what is the current business view?
  • what can be searched quickly?
  • what should another system consume next?

That difference matters because event-sourced systems become painful when teams pretend one data shape can serve every purpose. It usually cannot.

What a Read Model Actually Is

A read model is a data structure built for consumption, not for authoritative decision-making.

That means it is allowed to be:

  • denormalized
  • duplicated from the source of truth
  • optimized for queries
  • shaped around one audience or use case
  • rebuilt if necessary

This is an important mindset shift. In an event-sourced architecture, the write side and the read side often have different responsibilities.

The write side is about correctness:

  • validate commands
  • decide business outcomes
  • persist events
  • rebuild authoritative state

The read side is about usability:

  • fast lookup
  • summaries
  • reporting
  • search
  • operational views
  • external feeds

For example, a payment system may have several read models at once:

  • a customer timeline table for support agents
  • a merchant settlement summary for finance workflows
  • a fraud review queue for analysts
  • a search index for operations staff
  • a warehouse feed for analytics

All of those may be derived from the same persisted event stream. None of them needs to be the canonical source of truth.

Projections Are the Bridge Between Events and Reads

If the event log is the source and the read model is the destination, the projection is the mechanism that moves information from one to the other.

In practical terms, a projection is a consumer of persisted events that:

  • reads events from a source such as Akka Persistence Query or an event stream
  • keeps track of how far it has progressed
  • transforms events into updates for a read model or downstream system
  • handles restart and replay behavior safely

That sounds simple, but it carries a lot of operational responsibility.

A projection is not just a loop that says "for each event, write to another table." In production, it also needs to answer:

  • how do we avoid losing our place?
  • what happens when the handler crashes halfway through?
  • what if the same event is processed twice?
  • how do we rebuild a read model after a schema change?
  • how do we detect projection lag?

This is why the projection layer deserves explicit design instead of being treated as glue code.

A Concrete Scenario: Payment Events Feeding Three Consumers

Suppose you run a payments platform with an event-sourced payment entity. The entity persists events such as:

  • PaymentCreated
  • PaymentAuthorized
  • PaymentCaptured
  • PaymentFailed
  • RefundIssued

Now the rest of the business needs three different outputs:

  1. A support-facing payment timeline screen.
  2. A fraud-analysis table that highlights patterns across merchants.
  3. A downstream integration feed that pushes relevant events into a settlement system.

These consumers do not need the same representation.

The support screen wants one row per payment with current status, timestamps, and customer context.

The fraud-analysis flow wants a stream of normalized facts, maybe enriched with merchant metadata and risk flags.

The settlement integration wants only capture and refund events, transformed into a contract another system understands.

Trying to satisfy all three directly from entity state or direct database queries usually leads to one of two failures:

  • the entity becomes bloated with read-side responsibilities it should not own
  • every downstream consumer invents its own inconsistent interpretation of the event stream

Projections solve this by giving each downstream concern a deliberate translation layer.

A Small Event-Sourced Domain Example

Here is a simplified payment event model:

import java.time.Instant

object PaymentDomain {
  sealed trait Event {
    def paymentId: String
    def occurredAt: Instant
  }

  final case class PaymentCreated(
      paymentId: String,
      merchantId: String,
      customerId: String,
      amount: BigDecimal,
      occurredAt: Instant
  ) extends Event

  final case class PaymentAuthorized(
      paymentId: String,
      authorizationCode: String,
      occurredAt: Instant
  ) extends Event

  final case class PaymentCaptured(
      paymentId: String,
      capturedAmount: BigDecimal,
      occurredAt: Instant
  ) extends Event

  final case class PaymentFailed(
      paymentId: String,
      reason: String,
      occurredAt: Instant
  ) extends Event

  final case class RefundIssued(
      paymentId: String,
      refundAmount: BigDecimal,
      occurredAt: Instant
  ) extends Event
}

The write-side entity persists these events because they represent meaningful business facts.

The projection layer then interprets them for a specific read concern.

Building a Payment Summary Read Model

Suppose support agents need a fast view that answers:

  • current payment status
  • merchant and customer IDs
  • amount
  • last updated time
  • failure reason if present

That does not require replaying the full event stream on every page load. It is a good fit for a projection-maintained table.

The read model might look like this conceptually:

final case class PaymentSummaryRow(
    paymentId: String,
    merchantId: String,
    customerId: String,
    amount: BigDecimal,
    status: String,
    lastUpdatedAt: Instant,
    failureReason: Option[String]
)
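The status column of that row is effectively a left fold over the event history. Here is a standalone sketch with deliberately simplified event shapes (not the full PaymentDomain model above) showing that "current status" is just the last status-bearing event:

```scala
sealed trait Ev
case object Created extends Ev
case object Authorized extends Ev
case object Captured extends Ev
final case class Failed(reason: String) extends Ev

// The status a support agent sees is a left fold over history: each event
// overwrites the previous status.
def currentStatus(events: Seq[Ev]): String =
  events.foldLeft("UNKNOWN") {
    case (_, Created)    => "CREATED"
    case (_, Authorized) => "AUTHORIZED"
    case (_, Captured)   => "CAPTURED"
    case (_, Failed(_))  => "FAILED"
  }
```

A projection simply maintains the result of that fold incrementally, one event at a time, instead of recomputing it on every page load.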

Now define a repository that can upsert the row based on incoming events:

import java.time.Instant
import scala.concurrent.Future

trait PaymentSummaryRepository {
  def createPayment(
      paymentId: String,
      merchantId: String,
      customerId: String,
      amount: BigDecimal,
      occurredAt: Instant
  ): Future[Unit]

  def markAuthorized(
      paymentId: String,
      occurredAt: Instant
  ): Future[Unit]

  def markCaptured(
      paymentId: String,
      capturedAmount: BigDecimal,
      occurredAt: Instant
  ): Future[Unit]

  def markFailed(
      paymentId: String,
      reason: String,
      occurredAt: Instant
  ): Future[Unit]

  def markRefunded(
      paymentId: String,
      refundAmount: BigDecimal,
      occurredAt: Instant
  ): Future[Unit]
}


The projection handler becomes a translation layer:

import akka.Done
import akka.projection.eventsourced.EventEnvelope

import scala.concurrent.{ExecutionContext, Future}

class PaymentSummaryProjectionHandler(
    repository: PaymentSummaryRepository
)(implicit ec: ExecutionContext) {

  def process(envelope: EventEnvelope[PaymentDomain.Event]): Future[Done] = {
    envelope.event match {
      case PaymentDomain.PaymentCreated(paymentId, merchantId, customerId, amount, occurredAt) =>
        repository
          .createPayment(paymentId, merchantId, customerId, amount, occurredAt)
          .map(_ => Done)

      case PaymentDomain.PaymentAuthorized(paymentId, _, occurredAt) =>
        repository
          .markAuthorized(paymentId, occurredAt)
          .map(_ => Done)

      case PaymentDomain.PaymentCaptured(paymentId, capturedAmount, occurredAt) =>
        repository
          .markCaptured(paymentId, capturedAmount, occurredAt)
          .map(_ => Done)

      case PaymentDomain.PaymentFailed(paymentId, reason, occurredAt) =>
        repository
          .markFailed(paymentId, reason, occurredAt)
          .map(_ => Done)

      case PaymentDomain.RefundIssued(paymentId, refundAmount, occurredAt) =>
        repository
          .markRefunded(paymentId, refundAmount, occurredAt)
          .map(_ => Done)
    }
  }
}

This is intentionally straightforward. That is a feature, not a weakness.

Projection code should usually be boring. It should map durable facts into durable consequences. If the handler becomes a second hidden domain model full of complicated business decisions, your system boundaries are probably blurring.

Where Akka Projections Fits

Akka Projections provides infrastructure around this pattern so teams do not have to hand-roll every concern themselves.

The important ideas are:

  • a source provider that reads envelopes from the event stream
  • a handler that processes each envelope
  • an offset store so progress can be resumed after restart
  • execution models for exactly how events are pulled and handled

At a high level, the wiring looks like this:

import akka.Done
import akka.actor.typed.ActorSystem
import akka.projection.{ProjectionBehavior, ProjectionId}
import akka.projection.cassandra.scaladsl.CassandraProjection
import akka.projection.eventsourced.EventEnvelope
import akka.projection.eventsourced.scaladsl.EventSourcedProvider
import akka.projection.scaladsl.Handler

import scala.concurrent.Future

def startProjection(system: ActorSystem[_], repository: PaymentSummaryRepository): Unit = {
  val sourceProvider =
    EventSourcedProvider.eventsByTag[PaymentDomain.Event](
      system = system,
      readJournalPluginId = "akka.persistence.cassandra.query",
      tag = "payment"
    )

  val projection =
    CassandraProjection.atLeastOnce(
      projectionId = ProjectionId("payment-summary", "payment"),
      sourceProvider = sourceProvider,
      handler = () =>
        new Handler[EventEnvelope[PaymentDomain.Event]] {
          private val delegate =
            new PaymentSummaryProjectionHandler(repository)(system.executionContext)

          override def process(envelope: EventEnvelope[PaymentDomain.Event]): Future[Done] =
            delegate.process(envelope)
        }
    )

  system.systemActorOf(ProjectionBehavior(projection), "payment-summary-projection")
}

The exact API surface may vary with your storage plugin and Akka version, but the architectural purpose stays the same: consume persisted events, transform them safely, and record progress.

Offsets Are Part of Your Correctness Story

Once a projection exists, offset tracking becomes just as important as the handler logic.

An offset answers a simple question: how far has this projection successfully processed the stream?

That answer controls whether the system can recover cleanly after:

  • a node restart
  • a deployment
  • a transient database failure
  • a crash halfway through a batch

This is why a projection usually needs durable progress tracking, not just an in-memory pointer.

If offsets are wrong, the system will either:

  • skip events and leave the read model incomplete
  • reprocess too much and create duplicates or inconsistencies

Neither failure mode is theoretical.

The important design question is not "does the framework track offsets?" The important question is whether your write to the read model and your acknowledgment of progress form a safe unit for your workload.
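The ordering of those two steps can be made concrete with a toy projection loop, with no framework involved and purely illustrative names. The read-model write happens first and the offset is recorded second, so a crash between the two causes a replay on restart rather than a skipped event:

```scala
import scala.collection.mutable

// Toy projection loop with explicit ordering: apply the event to the read
// model first, record the offset second. A crash between the two steps means
// the event is processed again on restart (a duplicate), never skipped.
final class DemoProjection(events: Vector[String]) {
  val applied: mutable.Buffer[String] = mutable.Buffer.empty
  var offset: Int = 0 // would live in a durable offset store in production

  def run(crashAt: Option[Int] = None): Unit =
    try {
      while (offset < events.length) {
        applied += events(offset) // 1. read-model write
        if (crashAt.contains(offset)) throw new RuntimeException("simulated crash")
        offset += 1 // 2. progress acknowledged
      }
    } catch {
      case _: RuntimeException => () // simulated process death; offset unchanged
    }
}
```

Running this with a simulated crash mid-event and then restarting shows the at-least-once outcome: the interrupted event appears twice in the read model, but nothing is lost. Flipping the two steps would produce the opposite, and worse, failure mode: silent event loss.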

At-Least-Once Means Duplicates Are Normal

Many projection setups are at-least-once. That means an event may be processed more than once, especially around crashes and retries.

Teams sometimes hear that and immediately ask for exactly-once behavior everywhere. The better first question is whether the handler is idempotent.

Idempotent projection design usually matters more than chasing a perfect delivery slogan.

Examples of projection actions that can be made idempotent:

  • upsert a summary row by entity ID
  • store the latest known status for a payment
  • insert an event into a table with a unique event identifier
  • ignore processing if a downstream record for the envelope offset already exists

Examples that are more dangerous:

  • increment counters blindly on every replay
  • send emails directly from the projection without deduplication
  • call an external billing API that cannot tolerate duplicates
  • append to downstream state without any idempotency key

If your projection updates a search index or dashboard table, replay is usually acceptable if the writes are shaped carefully.

If your projection triggers external side effects, the bar is higher.
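The contrast between replay-safe and replay-unsafe writes can be sketched in a few lines. The names here are illustrative and no framework is involved; the point is that an upsert keyed by payment ID and a counter deduplicated by event ID both converge under redelivery, while a blind increment would not:

```scala
import scala.collection.mutable

object ReplaySafety {
  // Replay-safe: an upsert keyed by payment ID converges no matter how many
  // times the same event is delivered.
  private val statusByPayment = mutable.Map.empty[String, String]
  def upsertStatus(paymentId: String, status: String): Unit =
    statusByPayment.update(paymentId, status)
  def rowCount: Int = statusByPayment.size

  // Replay-safe counting: count distinct event IDs, not deliveries. A blind
  // `counter += 1` per delivery would drift upward on every replay.
  private val seenEventIds = mutable.Set.empty[String]
  private var captured = 0
  def countCapture(eventId: String): Unit =
    if (seenEventIds.add(eventId)) captured += 1 // duplicates are ignored
  def capturedCount: Int = captured
}
```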

Internal Read Models and External Integrations Are Not the Same

One of the most useful architectural distinctions is this:

  • some projections build internal read models
  • some projections feed external systems

Those are related, but they do not have identical constraints.

Internal Read Models

These usually support your own product or operators:

  • admin dashboards
  • customer timelines
  • reporting tables
  • search indexes
  • queue views for human workflows

For these projections, replayability is often a major advantage. If the table gets corrupted or the schema changes, you can rebuild from the event history.

External Integration Pipelines

These push facts into other systems:

  • data warehouses
  • compliance exports
  • notification platforms
  • fraud engines
  • settlement processors

Here, replay is more delicate. The downstream system may not be happy if yesterday's capture event is delivered again without clear deduplication semantics.

This is why mature systems often keep the projection stage and the side-effect stage separate.

For example:

  1. A projection writes a durable integration-outbox table.
  2. A separate delivery component reads that outbox.
  3. Delivery applies retries, backoff, and idempotency keys suited to the external target.

That design is often safer than having the projection call third-party systems directly.
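A minimal in-memory sketch of that split follows; the table shape and method names are illustrative, not a library API. The projection side only performs an idempotent insert, and a separate delivery loop owns retries against the external target:

```scala
import scala.collection.mutable

final case class OutboxRow(eventId: String, payload: String, var delivered: Boolean = false)

class SettlementOutbox {
  private val rows = mutable.LinkedHashMap.empty[String, OutboxRow]

  // Called from the projection handler: a durable, idempotent insert keyed by
  // event ID, so at-least-once redelivery cannot enqueue the same fact twice.
  def enqueue(eventId: String, payload: String): Unit =
    rows.getOrElseUpdate(eventId, OutboxRow(eventId, payload))

  // Called from the delivery component on its own schedule. Failed sends stay
  // pending for retry; projection progress is unaffected either way.
  def deliverPending(send: String => Boolean): Unit =
    rows.values.filterNot(_.delivered).foreach { row =>
      if (send(row.payload)) row.delivered = true
    }

  def pendingCount: Int = rows.values.count(!_.delivered)
}
```

Note how a downstream outage shows up as a growing `pendingCount` rather than as a stalled projection: the two stages fail independently and can be monitored independently.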

A Practical Pipeline Shape

Suppose persisted payment events need to reach an external settlement system. A robust design often looks like this:

  1. The event-sourced payment entity persists PaymentCaptured.
  2. A projection reads the event and writes a normalized settlement record into an internal outbox table.
  3. Another component reads the outbox and calls the settlement API.
  4. Delivery status is tracked separately from projection progress.

That may feel more indirect than directly calling the external service inside the projection handler. It is also usually easier to operate.

Why?

  • projection progress stays tied to durable internal writes
  • external retries do not block event consumption forever
  • failed deliveries can be inspected and replayed independently
  • integration-specific backoff rules do not leak into the read-model layer

This is what the lesson title means by integration pipelines. The event log is not just feeding a UI table. It can feed a chain of reliable downstream steps, each with its own responsibilities.

Rebuilds Should Be Expected, Not Feared

One of the strongest advantages of persisted events is the ability to rebuild derived views.

That becomes valuable when:

  • you add new fields to a dashboard projection
  • you change how a fraud score summary is computed
  • a search index needs to be regenerated
  • a read-side table was populated incorrectly due to a bug

If the read model is truly derived, rebuild should be a normal operational tool.

This is another reason to keep business decisions on the write side and keep projections focused on derivation. If the projection has grown into a fragile second source of business truth, rebuild becomes risky instead of routine.

Rebuilds still require planning:

  • how much historical data must be replayed?
  • can the target table be rebuilt in place, or do you need a parallel table and cutover?
  • what happens to live traffic during replay?
  • do you need rate limiting to avoid crushing downstream storage?

These are operational design questions, not just coding details.
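The parallel-table-and-cutover option can be sketched with an atomic reference standing in for the table switch. This is an illustrative shape, not a storage implementation: readers always see a complete view, and the cutover is a single swap:

```scala
import java.util.concurrent.atomic.AtomicReference

final class SummaryStore {
  // Readers always see a complete view: either the old table or the new one.
  private val active = new AtomicReference(Map.empty[String, String])

  def lookup(paymentId: String): Option[String] = active.get.get(paymentId)

  // Replay history into a fresh view off to the side, then cut over in one
  // atomic swap. Live reads are never served from a half-built table.
  def rebuild(replayedStatuses: Seq[(String, String)]): Unit = {
    val fresh = replayedStatuses.foldLeft(Map.empty[String, String]) {
      case (acc, (id, status)) => acc.updated(id, status)
    }
    active.set(fresh)
  }
}
```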

Lag Is a Business and Operations Concern

Every projection introduces some degree of eventual consistency.

That is usually fine, but only if the team understands what the lag means.

For example:

  • a support dashboard may tolerate a few seconds of lag
  • a fraud decision feed may need much tighter timing
  • a settlement export may have regulatory deadlines
  • a customer timeline may be acceptable as long as it catches up quickly after incidents

This is why projection monitoring matters.

Operators often need visibility into:

  • current offset position
  • time lag behind the newest event
  • handler error rates
  • replay throughput
  • dead-letter or failed-delivery counts in downstream stages

If a projection is business-critical, "it usually catches up" is not an operational strategy.
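Time lag in particular is cheap to compute if the projection records the timestamp of the last event it processed. A small sketch, with illustrative field names:

```scala
import java.time.{Duration, Instant}

// Lag = how far the projection's last processed event trails the newest
// persisted event. Comparing it against a threshold gives a simple alert.
final case class ProjectionStatus(lastProcessedEventAt: Instant, newestEventAt: Instant) {
  def lag: Duration = Duration.between(lastProcessedEventAt, newestEventAt)
  def lagExceeds(threshold: Duration): Boolean = lag.compareTo(threshold) > 0
}
```

Different consumers can then apply different thresholds to the same measurement: seconds for a fraud feed, minutes for a support dashboard.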

Schema Evolution Requires Discipline

As systems live longer, event schemas and read-model schemas both change.

That raises practical questions:

  • can old events still be interpreted correctly?
  • can new handlers replay old history without breaking?
  • do downstream tables need backfills or versioned transformations?

This is one reason explicit event contracts matter so much. If events are vague, inconsistent, or overly tied to internal implementation details, downstream projections become brittle.

Good projection design assumes that:

  • events outlive current code structure
  • handlers may need to replay old data years later
  • read models will evolve several times over the lifetime of the system

The more disciplined the event vocabulary, the easier those changes are to manage.
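One common discipline is versioned decoding with explicit defaults for fields added later. A hypothetical sketch, in which the v1/v2 split and the "USD" default are assumptions for illustration, not part of the payment model above:

```scala
// A capture event whose wire format gained a `currency` field in v2.
// Old v1 payloads must still decode cleanly during replay.
final case class PaymentCapturedV2(paymentId: String, amount: BigDecimal, currency: String)

def decodeCaptured(
    version: Int,
    paymentId: String,
    amount: BigDecimal,
    currency: Option[String] // absent in v1 payloads
): PaymentCapturedV2 =
  version match {
    case 1 => PaymentCapturedV2(paymentId, amount, currency = "USD") // legacy default
    case _ => PaymentCapturedV2(paymentId, amount, currency.getOrElse("USD"))
  }
```

Keeping that defaulting in one explicit decode step, rather than scattered through handlers, is what lets the same projection replay years of history without breaking.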

When Not to Build Another Read Model

Once teams learn projections, there is a temptation to create a new read model for every question anyone asks.

That can go too far.

A new read model is worth it when:

  • the query is important and frequent
  • the access pattern differs materially from the write model
  • replay and maintenance cost are justified by the value
  • the projection boundary is clear

It is usually not worth it when:

  • one direct query against existing storage is enough
  • the use case is temporary or rarely used
  • the read model duplicates another projection with minimal benefit
  • the team cannot actually operate more derived infrastructure confidently

This is the same pragmatic rule that appears throughout Akka architecture. Powerful patterns are useful when they solve a real problem repeatedly. They become accidental complexity when introduced for architectural aesthetics.

A Good Mental Model for Lesson Fourteen

If lesson thirteen was about storing historical truth, lesson fourteen is about making that truth useful.

The progression should look like this:

  1. Commands hit a state-owning entity.
  2. The entity decides and persists domain events.
  3. Projections consume those events.
  4. Read models and integration pipelines turn them into usable outputs.

That separation gives you several advantages:

  • the write side stays focused on correctness
  • the read side stays optimized for access patterns
  • downstream consumers can evolve independently
  • rebuilds and reprocessing become possible
  • operational visibility becomes more deliberate

It also forces intellectual honesty. Once you introduce projections, you have to think seriously about eventual consistency, offsets, replay, and duplicate handling. Those are not annoyances around the edges. They are part of the architecture.

Summary

Read models exist because the event log is a source of truth, not a universal query interface. Projections exist because persisted events need a reliable path into dashboards, search systems, operational tables, and downstream integrations.

In Akka-based systems, that means treating projections as real infrastructure: they need offset management, idempotent handlers, replay-safe design, and clear boundaries between internal derived views and external side effects.

If you keep the write side authoritative and the projection layer deliberately boring, you get the main benefits of event-driven architecture without turning every consumer into a bespoke interpretation of history.

In the next lesson, we will move further into production reality: observability, testing, and operating Akka systems when mailboxes back up, consumers slow down, and you need confidence that actors, streams, and persistent behaviors are still behaving under pressure.