ZIO in Production: Logging, Monitoring, and Deployment

Why Production Concerns Matter

You've built a ZIO application. It works perfectly on your machine. But production is different:

  • Debugging: When something breaks at 3 AM, you need logs
  • Performance: You need metrics to identify bottlenecks
  • Configuration: Different settings for dev, staging, and production
  • Reliability: Graceful shutdown prevents data loss
  • Observability: You can't fix what you can't see

This lesson transforms your ZIO application from "works on my machine" to production-ready.

Logging with ZIO

The Problem with Traditional Logging

// Traditional approach - manual, no structure
def processOrder(orderId: String): Unit = {
  println(s"Processing order $orderId")  // Lost in production
  // What thread? What timestamp? What level?
}

ZIO provides structured, type-safe logging built into the effect system.

Basic Logging

import zio._

object LoggingExample extends ZIOAppDefault {

  val program = for {
    _ <- ZIO.logInfo("Application started")
    _ <- ZIO.logDebug("Debug information")
    _ <- ZIO.logWarning("Something seems off")
    _ <- ZIO.logError("An error occurred")
  } yield ()

  def run = program
}

Output includes timestamp, level, and fiber information automatically. Note that the default logger filters out DEBUG messages, so only the other three lines appear; see Log Levels and Filtering below.
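
With the default console logger the output looks roughly like this (the exact format varies across ZIO versions):

timestamp=2024-01-15T10:30:00Z level=INFO thread=#zio-fiber-6 message="Application started"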

Structured Logging

Add context to your logs:

def processOrder(orderId: String, userId: String): Task[Unit] = 
  ZIO.logSpan("process-order") {
    for {
      _ <- ZIO.logInfo(s"Processing order") @@ 
           ZIOAspect.annotated("orderId", orderId) @@
           ZIOAspect.annotated("userId", userId)

      _ <- validateOrder(orderId)
      _ <- chargeCustomer(userId)
      _ <- shipOrder(orderId)

      _ <- ZIO.logInfo("Order completed successfully")
    } yield ()
  }

All logs within the span include the annotations. This makes debugging distributed systems much easier.
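
If you prefer plain wrappers over aspects, ZIO core's ZIO.logAnnotate does the same job; a minimal sketch reusing the hypothetical validateOrder from above:

// Every log inside the block carries both annotations
def processOrderAnnotated(orderId: String, userId: String): Task[Unit] =
  ZIO.logAnnotate("orderId", orderId) {
    ZIO.logAnnotate("userId", userId) {
      ZIO.logInfo("Processing order") *> validateOrder(orderId)
    }
  }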

Log Levels and Filtering

import zio._

object ConfiguredLogging extends ZIOAppDefault {

  // Replace the default logger with one that drops everything below INFO.
  // (zio-logging can also drive the level from external configuration.)
  override val bootstrap =
    Runtime.removeDefaultLoggers >>> Runtime.addLogger(
      ZLogger.default.map(println(_)).filterLogLevel(_ >= LogLevel.Info)
    )

  val program = for {
    _ <- ZIO.logDebug("This won't appear (below INFO)")
    _ <- ZIO.logInfo("This appears")
    _ <- ZIO.logError("This definitely appears")
  } yield ()

  def run = program
}

Custom Logger

import zio.logging._

// JSON logger for production (zio-logging)
val jsonLogger = Runtime.removeDefaultLoggers >>>
  consoleJson(
    LogFormat.default,
    LogLevel.Info
  )

object ProductionApp extends ZIOAppDefault {
  override val bootstrap = jsonLogger

  val program = for {
    _ <- ZIO.logInfo("Server started")
    _ <- ZIO.logInfo("Ready to accept connections") @@
         ZIOAspect.annotated("port", "8080")
  } yield ()

  def run = program
}

Output:

{"timestamp":"2024-01-15T10:30:00Z","level":"INFO","message":"Server started"}
{"timestamp":"2024-01-15T10:30:01Z","level":"INFO","message":"Ready to accept connections","port":"8080"}

Integration with SLF4J

For existing infrastructure:

import zio.logging.backend.SLF4J

val slf4jLogger = SLF4J.slf4j

object LegacyIntegration extends ZIOAppDefault {
  override val bootstrap = Runtime.removeDefaultLoggers >>> slf4jLogger

  def run = ZIO.logInfo("Logs to SLF4J backend")
}

Now your ZIO logs go through Logback, Log4j, or any SLF4J backend.

Metrics and Monitoring

Why Metrics?

Logs tell you what happened. Metrics tell you how well it's happening:

  • Request latency
  • Error rates
  • Resource usage
  • Business metrics (orders/second, revenue, etc.)

Built-in Metrics

import zio._
import zio.metrics._

// Metrics apply to effects as aspects. fromConst(1L) turns the counter
// into an aspect that adds one every time the effect runs.
val requestCounter = Metric.counter("http_requests_total").fromConst(1L)

def handleRequest(request: Request): Task[Response] = 
  processRequest(request) @@ requestCounter

Every run of the effect bumps the counter; ZIO records the values with no manual bookkeeping.
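
ZIO core also ships ready-made JVM metrics (heap, GC, threads). A sketch, assuming zio.metrics.jvm.DefaultJvmMetrics from ZIO core (the layer's exact shape varies slightly across 2.x releases):

import zio._
import zio.metrics.jvm.DefaultJvmMetrics

object AppWithJvmMetrics extends ZIOAppDefault {
  // DefaultJvmMetrics.live starts collectors for heap, GC, threads, etc.
  def run =
    ZIO.logInfo("Serving with JVM metrics enabled")
      .provideLayer(DefaultJvmMetrics.live)
}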

Custom Metrics

// Counter: things that increase
val orderCounter = Metric.counter("orders_processed")

// Gauge: values that go up and down
val activeConnections = Metric.gauge("active_connections")

// Histogram: distribution of values
val requestDuration = Metric.histogram(
  "request_duration_seconds",
  MetricKeyType.Histogram.Boundaries.linear(0.0, 0.1, 10)
)

// Summary: quantiles
val responseSize = Metric.summary(
  "response_size_bytes",
  maxAge = 10.minutes,
  maxSize = 1000,
  error = 0.01,
  quantiles = Chunk(0.5, 0.9, 0.99)
)

Using Metrics

def processOrder(order: Order): Task[Unit] = 
  for {
    start <- Clock.nanoTime
    _     <- validateOrder(order)
    _     <- saveOrder(order)
    end   <- Clock.nanoTime

    durationSeconds = (end - start) / 1e9
    _ <- orderCounter.increment
    _ <- requestDuration.update(durationSeconds)
  } yield ()
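
Recording the duration by hand works, but metrics also compose as aspects. A sketch using ZIO core's Metric.timer with trackDuration, reusing processOrder from above:

import java.time.temporal.ChronoUnit
import zio._
import zio.metrics._

// A histogram-backed timer recording in milliseconds
val orderTimer = Metric.timer("order_duration", ChronoUnit.MILLIS)

// trackDuration measures the wall-clock time of the wrapped effect
def processOrderTimed(order: Order): Task[Unit] =
  processOrder(order) @@ orderTimer.trackDuration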

Prometheus Integration

Using the zio-metrics-connectors library (API sketched from its Prometheus connector):

import zio._
import zio.http._
import zio.metrics.connectors.{prometheus, MetricsConfig}
import zio.metrics.connectors.prometheus.PrometheusPublisher

object MetricsServer extends ZIOAppDefault {

  // The PrometheusPublisher service holds the current scrape output
  val metricsApp = Http.collectZIO[Request] {
    case Method.GET -> Root / "metrics" =>
      ZIO.serviceWithZIO[PrometheusPublisher](_.get).map(Response.text(_))
  }

  val app = for {
    _ <- ZIO.logInfo("Starting metrics server on :9090")
    _ <- Server.serve(metricsApp)
  } yield ()

  def run = app.provide(
    Server.defaultWithPort(9090),
    prometheus.publisherLayer,
    prometheus.prometheusLayer,
    ZLayer.succeed(MetricsConfig(5.seconds)) // how often metrics are polled
  )
}

Now Prometheus can scrape http://localhost:9090/metrics.

Health Checks

case class HealthStatus(
  database: Boolean,
  redis: Boolean,
  overallHealth: String
)

def checkHealth: Task[HealthStatus] = 
  for {
    dbHealth    <- checkDatabase.either.map(_.isRight)
    redisHealth <- checkRedis.either.map(_.isRight)

    overall = if (dbHealth && redisHealth) "healthy" 
              else if (dbHealth || redisHealth) "degraded"
              else "unhealthy"
  } yield HealthStatus(dbHealth, redisHealth, overall)

val healthEndpoint = Http.collectZIO[Request] {
  case Method.GET -> Root / "health" =>
    checkHealth.map { status =>
      val code = status.overallHealth match {
        case "healthy"   => Status.Ok
        case "degraded"  => Status.Ok
        case _           => Status.ServiceUnavailable
      }
      // Hand-rolled JSON for brevity; a real app would derive a codec
      // (e.g. with zio-json) instead of relying on toString
      val body =
        s"""{"database":${status.database},"redis":${status.redis},"status":"${status.overallHealth}"}"""
      Response.json(body).withStatus(code)
    }
}

Kubernetes and load balancers use this to route traffic.

Configuration Management

The Problem with Hard-Coded Config

object BadConfig {
  val dbHost = "localhost"        // Different in production!
  val dbPort = 5432               // Might change
  val apiKey = "secret123"        // NEVER hard-code secrets!
  val maxConnections = 10         // Should be configurable
}

ZIO Config

Type-safe configuration from environment variables, files, or system properties:

import zio.config._
import zio.config.magnolia._
import zio.config.typesafe._

case class DatabaseConfig(
  host: String,
  port: Int,
  database: String,
  username: String,
  password: String,
  maxConnections: Int
)

case class ServerConfig(
  host: String,
  port: Int
)

case class AppConfig(
  database: DatabaseConfig,
  server: ServerConfig
)

// Automatic derivation
val configDescriptor = descriptor[AppConfig]

Loading Configuration

val configLayer: Layer[ReadError[String], AppConfig] = 
  ZLayer {
    read(
      configDescriptor.from(
        TypesafeConfigSource.fromResourcePath
          .orElse(ConfigSource.fromSystemEnv())
      )
    )
  }

Using Configuration

def startApp: ZIO[AppConfig, Throwable, Unit] = 
  for {
    config <- ZIO.service[AppConfig]
    _      <- ZIO.logInfo(s"Starting server on ${config.server.host}:${config.server.port}")
    _      <- ZIO.logInfo(s"Connecting to database at ${config.database.host}")

    // Use config.database.maxConnections, etc.
  } yield ()

Configuration File (application.conf)

database {
  host = "localhost"
  host = ${?DB_HOST}

  port = 5432
  port = ${?DB_PORT}

  database = "myapp"
  username = "user"
  password = "pass"
  password = ${?DB_PASSWORD}

  max-connections = 10
}

server {
  host = "0.0.0.0"
  port = 8080
  port = ${?PORT}
}

Environment variables override defaults. Perfect for Docker and Kubernetes.

Validation

val validatedConfig = ZLayer {
  for {
    config <- read(configDescriptor.from(TypesafeConfigSource.fromResourcePath))

    // Validate
    _ <- ZIO.when(config.database.maxConnections < 1)(
           ZIO.fail(new IllegalArgumentException(
             "maxConnections must be positive"
           ))
         )

    _ <- ZIO.when(config.server.port < 1024 || config.server.port > 65535)(
           ZIO.fail(new IllegalArgumentException(
             "port must be between 1024 and 65535"
           ))
         )
  } yield config
}

Runtime Configuration

Custom Runtime

ZIO 2 configures the runtime through bootstrap layers rather than a mutable RuntimeConfig:

import zio._

object CustomRuntimeApp extends ZIOAppDefault {

  // Each layer toggles a runtime flag for this application
  override val bootstrap =
    Runtime.enableRuntimeMetrics ++ Runtime.enableOpSupervision

  def run = ZIO.logInfo("Running with runtime metrics and op supervision")
}

Thread Pool Configuration

import java.util.concurrent.Executors
import zio._

val blockingExecutor =
  Executor.fromJavaExecutor(Executors.newCachedThreadPool())

// Install it as the runtime's blocking executor via a bootstrap layer
val customBlockingLayer = Runtime.setBlockingExecutor(blockingExecutor)

// Use for blocking operations (shorthand for ZIO.blocking(ZIO.attempt(...)))
def blockingOp: Task[Unit] = 
  ZIO.attemptBlocking {
    // Long-running blocking operation
    Thread.sleep(1000)
  }

Fatal Error Handling

object FatalHandlingApp extends ZIOAppDefault {

  override val bootstrap = Runtime.setReportFatal { throwable =>
    // Log to external system
    println(s"FATAL: ${throwable.getMessage}")
    // Notify the ops team (sendPagerDutyAlert is a hypothetical helper)
    sendPagerDutyAlert(throwable)
    // reportFatal must not return, so rethrow
    throw throwable
  }

  def run = ZIO.logInfo("Running with a custom fatal-error reporter")
}

When the runtime hits a truly fatal error (a VirtualMachineError such as OutOfMemoryError), your reporter runs before the process dies.

Deployment Strategies

Graceful Shutdown

import zio.http._

object GracefulServer extends ZIOAppDefault {

  def app: Http[Any, Nothing, Request, Response] = ???

  val server = for {
    _ <- ZIO.logInfo("Starting HTTP server...")
    _ <- Server.serve(app)
  } yield ()

  val withGracefulShutdown = server.onInterrupt {
    for {
      _ <- ZIO.logInfo("Shutdown signal received")
      _ <- ZIO.logInfo("Finishing in-flight requests...")
      _ <- ZIO.sleep(5.seconds)  // Grace period
      _ <- ZIO.logInfo("Closing connections...")
      _ <- closeAllConnections
      _ <- ZIO.logInfo("Server stopped cleanly")
    } yield ()
  }

  def run = withGracefulShutdown
}

When Docker sends SIGTERM, the server finishes current requests before stopping.
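
onInterrupt is good for logging, but for the resources themselves prefer ZIO.acquireRelease: its finalizer runs on success, failure, and interruption alike. A minimal sketch with hypothetical startServer/stopServer helpers:

import zio._

// The finalizer registered here also runs when SIGTERM interrupts the fiber
def serve: ZIO[Scope, Throwable, Nothing] =
  ZIO.acquireRelease(startServer)(server => stopServer(server).orDie) *>
    ZIO.logInfo("Server running") *>
    ZIO.never

def runServer: Task[Unit] =
  ZIO.scoped(serve)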

Signal Handling

import sun.misc.{Signal, SignalHandler}

def installSignalHandlers(shutdown: UIO[Unit]): Task[Unit] = 
  ZIO.attempt {
    val handler = new SignalHandler {
      def handle(signal: Signal): Unit = {
        println(s"Received signal: ${signal.getName}")
        Unsafe.unsafe { implicit unsafe =>
          Runtime.default.unsafe.run(shutdown)
        }
      }
    }

    Signal.handle(new Signal("TERM"), handler)
    Signal.handle(new Signal("INT"), handler)
  }
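
Note that ZIOAppDefault already installs handlers that interrupt the main fiber on SIGINT/SIGTERM, so finalizers and onInterrupt logic run without any of this; reach for manual signal handling only when you need custom per-signal behavior.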

Containerization with Docker

Dockerfile:

FROM hseeberger/scala-sbt:11.0.16_1.8.2_2.13.10 AS builder

WORKDIR /app
COPY . .
RUN sbt assembly

FROM openjdk:11-jre-slim

WORKDIR /app

# The slim JRE image has no curl; install it for the HEALTHCHECK below
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

COPY --from=builder /app/target/scala-2.13/myapp.jar .

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

EXPOSE 8080

CMD ["java", "-jar", "myapp.jar"]

docker-compose.yml:

version: '3.8'

services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_PASSWORD=${DB_PASSWORD}
      - LOG_LEVEL=INFO
    depends_on:
      - postgres
    restart: unless-stopped

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data

volumes:
  postgres-data:

Kubernetes Deployment

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: zio-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: zio-app
  template:
    metadata:
      labels:
        app: zio-app
    spec:
      containers:
      - name: app
        image: myorg/zio-app:1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: DB_HOST
          value: postgres-service
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: password

        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10

        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5

        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "512Mi"
            cpu: "500m"

Why separate liveness and readiness probes? Readiness gates traffic: a pod that isn't ready receives no requests, which matters during startup or while a dependency is down. Liveness restarts pods that are wedged beyond recovery. A sketch of separate endpoints follows.
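
A sketch of separate probe endpoints in the zio-http style used earlier (checkDatabase and checkRedis are the placeholder checks from the health-check section); point livenessProbe at /live and readinessProbe at /ready if you adopt it:

import zio._
import zio.http._

val probeRoutes = Http.collectZIO[Request] {
  // Liveness: the process is responsive
  case Method.GET -> Root / "live" =>
    ZIO.succeed(Response.ok)

  // Readiness: dependencies are reachable
  case Method.GET -> Root / "ready" =>
    (checkDatabase *> checkRedis)
      .as(Response.ok)
      .catchAll(_ => ZIO.succeed(Response.status(Status.ServiceUnavailable)))
}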

Complete Production-Ready Application

Let's put it all together:

import zio._
import zio.http._
import zio.metrics._
import zio.config._

case class AppConfig(
  server: ServerConfig,
  database: DatabaseConfig
)

trait UserService {
  def getUser(id: String): Task[User]
  def createUser(user: User): Task[Unit]
}

object ProductionApp extends ZIOAppDefault {

  // Configuration
  val configLayer: Layer[ReadError[String], AppConfig] = 
    ZLayer.fromZIO(loadConfig)

  // Services
  val databaseLayer: ZLayer[AppConfig, Throwable, DatabaseService] = ???
  val userServiceLayer: ZLayer[DatabaseService, Nothing, UserService] = ???

  // HTTP Routes
  val routes = Http.collectZIO[Request] {
    case Method.GET -> Root / "users" / id =>
      (for {
        user <- ZIO.serviceWithZIO[UserService](_.getUser(id))
        _    <- ZIO.succeed(()) @@ Metric.counter("user_requests").increment
      } yield Response.json(user.toJson))
        .catchAll { error =>
          ZIO.logError(s"Failed to get user: $error") *>
          ZIO.succeed(Response.status(Status.InternalServerError))
        }

    case Method.GET -> Root / "health" =>
      checkHealth.map { status =>
        Response.json(status.toJson)
      }

    case Method.GET -> Root / "metrics" =>
      ZIO.succeed(Response.text(getMetrics))
  }

  // Main application
  val app = for {
    config <- ZIO.service[AppConfig]
    _      <- ZIO.logInfo(s"Starting server on port ${config.server.port}")
    _      <- Server.serve(routes)
  } yield ()

  val withShutdown = app.onInterrupt {
    ZIO.logInfo("Graceful shutdown initiated") *>
    closeResources *>
    ZIO.logInfo("Shutdown complete")
  }

  def run = withShutdown.provide(
    configLayer,
    databaseLayer,
    userServiceLayer,
    Server.defaultWithPort(8080)
  )
}

Monitoring in Action

When you deploy this, you'll see:

Logs (JSON format):

{"timestamp":"2024-01-15T10:30:00Z","level":"INFO","message":"Starting server on port 8080"}
{"timestamp":"2024-01-15T10:30:05Z","level":"INFO","message":"Database connected","host":"postgres"}
{"timestamp":"2024-01-15T10:30:10Z","level":"INFO","message":"Request processed","userId":"123","duration_ms":45}

Metrics (Prometheus format):

# HELP user_requests_total Total user requests
# TYPE user_requests_total counter
user_requests_total 1523

# HELP request_duration_seconds Request duration in seconds
# TYPE request_duration_seconds histogram
request_duration_seconds_bucket{le="0.1"} 1234
request_duration_seconds_bucket{le="0.5"} 1500
request_duration_seconds_sum 678.9
request_duration_seconds_count 1523

Health Check:

{
  "status": "healthy",
  "database": true,
  "redis": true,
  "uptime_seconds": 3600
}

Best Practices

1. Always Log Structured Data

// Bad
ZIO.logInfo(s"User $userId created order $orderId")

// Good
ZIO.logInfo("Order created") @@
  ZIOAspect.annotated("userId", userId) @@
  ZIOAspect.annotated("orderId", orderId)

2. Set Appropriate Log Levels

  • DEBUG: Detailed flow for debugging
  • INFO: Normal application events
  • WARNING: Concerning but not critical
  • ERROR: Failures requiring attention

3. Monitor What Matters

Don't track everything. Focus on the following; a RED-style sketch follows the list:

  • Request rate and latency (RED method)
  • Error rates
  • Resource usage (CPU, memory, connections)
  • Business metrics (orders, revenue, active users)
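
A RED-style sketch using ZIO core metrics (metric names here are illustrative):

import java.time.temporal.ChronoUnit
import zio._
import zio.metrics._

// Rate, Errors, Duration for a handler
val rate       = Metric.counter("requests_total").fromConst(1L)
val errorCount = Metric.counter("request_errors_total")
val duration   = Metric.timer("request_duration", ChronoUnit.MILLIS)

def instrumented[R, A](handler: ZIO[R, Throwable, A]): ZIO[R, Throwable, A] =
  (handler @@ rate @@ duration.trackDuration)
    .tapError(_ => errorCount.increment)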

4. Use Health Checks Correctly

// Liveness: Is the app alive?
def livenessCheck: UIO[Boolean] = ZIO.succeed(true)

// Readiness: Can the app serve traffic?
def readinessCheck: Task[Boolean] = 
  checkDatabase *> checkRedis *> ZIO.succeed(true)

5. Externalize All Configuration

Never hard-code:

  • Hostnames
  • Ports
  • Credentials
  • Feature flags
  • Timeouts

6. Test Your Shutdown

test("graceful shutdown completes in-flight requests") {
  for {
    fiber <- longRunningRequest.fork
    _     <- ZIO.sleep(1.second)
    _     <- fiber.interrupt
    _     <- verifyRequestCompleted
  } yield assertCompletes
}

Common Production Issues

Memory Leaks

// Bad: accumulates forever
var cache: Map[String, User] = Map.empty

// Good: bounded cache
val cache = Ref.make(Map.empty[String, User]).flatMap { ref =>
  ZIO.succeed(new CacheService {
    def get(id: String): Task[Option[User]] = 
      ref.get.map(_.get(id))

    def put(id: String, user: User): Task[Unit] = 
      ref.update { cache =>
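        // Naive eviction: drops an arbitrary entry at the bound;
        // a real cache would use LRU/TTL (e.g. zio-cache)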
        if (cache.size >= 1000) cache.tail + (id -> user)
        else cache + (id -> user)
      }
  })
}

Connection Pool Exhaustion

val dbPool = ZLayer.scoped {
  ZIO.acquireRelease(
    createConnectionPool(
      minSize = 5,
      maxSize = 20,
      timeout = 30.seconds
    )
  )(pool => closePool(pool).orDie)
}

Cascading Failures

// Timeout, bounded retry with backoff, and a fallback. This is not a
// full circuit breaker (no fail-fast state); see the sketch below.
def callExternalService: Task[Response] = 
  makeRequest
    .timeout(5.seconds)
    .retry(Schedule.exponential(1.second) && Schedule.recurs(3))
    .catchAll { error =>
      ZIO.logError(s"Service unavailable: $error") *>
      ZIO.succeed(Response.serviceUnavailable)
    }
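
For completeness, here is a minimal Ref-based circuit-breaker sketch that adds the fail-fast state; it is illustrative only, and production code should use a dedicated library such as rezilience:

import zio._

object MiniBreaker {
  sealed trait State
  final case class Closed(failures: Int)          extends State
  final case class Open(since: java.time.Instant) extends State

  def make(maxFailures: Int, resetAfter: Duration): UIO[MiniBreaker] =
    Ref.make[State](Closed(0)).map(new MiniBreaker(_, maxFailures, resetAfter))
}

final class MiniBreaker(
  state: Ref[MiniBreaker.State],
  maxFailures: Int,
  resetAfter: Duration
) {
  import MiniBreaker._

  // Fails fast while the circuit is open; otherwise runs the effect,
  // counting consecutive failures and resetting the count on success
  def apply[R, A](effect: ZIO[R, Throwable, A]): ZIO[R, Throwable, A] =
    Clock.instant.flatMap { now =>
      state.get.flatMap {
        case Open(since) if now.isBefore(since.plus(resetAfter)) =>
          ZIO.fail(new RuntimeException("circuit open, failing fast"))
        case _ =>
          effect.tapBoth(
            _ => state.update {
              case Closed(n) if n + 1 >= maxFailures => Open(now)
              case Closed(n)                         => Closed(n + 1)
              case Open(_)                           => Open(now)
            },
            _ => state.set(Closed(0))
          )
      }
    }
}

// Usage: MiniBreaker.make(5, 30.seconds).flatMap(cb => cb(callExternalService))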

Key Takeaways

  • Logging: Structured, contextual logging makes debugging possible
  • Metrics: Track what matters for performance and reliability
  • Configuration: Externalize everything, validate early
  • Health Checks: Separate liveness from readiness
  • Graceful Shutdown: Always clean up resources properly
  • Observability: You can't fix what you can't see

What's Next?

You now know how to deploy ZIO applications to production with proper observability and reliability. In Lesson 10: Building a Complete ZIO Application, we'll combine everything you've learned to build a full-featured application from scratch—architecture, testing, deployment, and all.

Ready to build the capstone project? Let's finish strong!

Additional Resources