ZIO in Production: Logging, Monitoring, and Deployment

Why Production Concerns Matter

You've built a ZIO application. It works perfectly on your machine. But production is different:

  • Debugging: When something breaks at 3 AM, you need logs
  • Performance: You need metrics to identify bottlenecks
  • Configuration: Different settings for dev, staging, and production
  • Reliability: Graceful shutdown prevents data loss
  • Observability: You can't fix what you can't see

This lesson transforms your ZIO application from "works on my machine" to production-ready.

Logging with ZIO

The Problem with Traditional Logging

// Traditional approach - manual, no structure
def processOrder(orderId: String): Unit = {
  println(s"Processing order $orderId")  // Lost in production
  // What thread? What timestamp? What level?
}

ZIO provides structured, type-safe logging built into the effect system.

Basic Logging

import zio._

object LoggingExample extends ZIOAppDefault {

  val program = for {
    _ <- ZIO.logInfo("Application started")
    _ <- ZIO.logDebug("Debug information")
    _ <- ZIO.logWarning("Something seems off")
    _ <- ZIO.logError("An error occurred")
  } yield ()

  def run = program
}

Output includes timestamp, level, and fiber information automatically. Note that the default logger filters out DEBUG messages, so only the other three lines appear; see Log Levels and Filtering below.
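
With the default console logger the output looks roughly like this (the exact format varies across ZIO versions):

timestamp=2024-01-15T10:30:00Z level=INFO thread=#zio-fiber-6 message="Application started"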

Structured Logging

Add context to your logs:

def processOrder(orderId: String, userId: String): Task[Unit] = 
  ZIO.logSpan("process-order") {
    for {
      _ <- ZIO.logInfo(s"Processing order") @@ 
           ZIOAspect.annotated("orderId", orderId) @@
           ZIOAspect.annotated("userId", userId)

      _ <- validateOrder(orderId)
      _ <- chargeCustomer(userId)
      _ <- shipOrder(orderId)

      _ <- ZIO.logInfo("Order completed successfully")
    } yield ()
  }

All logs within the span include the annotations. This makes debugging distributed systems much easier.
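
If you prefer plain wrappers over aspects, ZIO core's ZIO.logAnnotate does the same job; a minimal sketch reusing the hypothetical validateOrder from above:

// Every log inside the block carries both annotations
def processOrderAnnotated(orderId: String, userId: String): Task[Unit] =
  ZIO.logAnnotate("orderId", orderId) {
    ZIO.logAnnotate("userId", userId) {
      ZIO.logInfo("Processing order") *> validateOrder(orderId)
    }
  }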

Log Levels and Filtering

import zio._

object ConfiguredLogging extends ZIOAppDefault {

  // Replace the default logger with one that drops everything below INFO.
  // (zio-logging can also drive the level from external configuration.)
  override val bootstrap =
    Runtime.removeDefaultLoggers >>> Runtime.addLogger(
      ZLogger.default.map(println(_)).filterLogLevel(_ >= LogLevel.Info)
    )

  val program = for {
    _ <- ZIO.logDebug("This won't appear (below INFO)")
    _ <- ZIO.logInfo("This appears")
    _ <- ZIO.logError("This definitely appears")
  } yield ()

  def run = program
}

Custom Logger

import zio.logging._

// JSON logger for production (zio-logging)
val jsonLogger = Runtime.removeDefaultLoggers >>>
  consoleJson(
    LogFormat.default,
    LogLevel.Info
  )

object ProductionApp extends ZIOAppDefault {
  override val bootstrap = jsonLogger

  val program = for {
    _ <- ZIO.logInfo("Server started")
    _ <- ZIO.logInfo("Ready to accept connections") @@
         ZIOAspect.annotated("port", "8080")
  } yield ()

  def run = program
}

Output:

{"timestamp":"2024-01-15T10:30:00Z","level":"INFO","message":"Server started"}
{"timestamp":"2024-01-15T10:30:01Z","level":"INFO","message":"Ready to accept connections","port":"8080"}

Integration with SLF4J

For existing infrastructure:

import zio.logging.backend.SLF4J

val slf4jLogger = SLF4J.slf4j

object LegacyIntegration extends ZIOAppDefault {
  override val bootstrap = Runtime.removeDefaultLoggers >>> slf4jLogger

  def run = ZIO.logInfo("Logs to SLF4J backend")
}

Now your ZIO logs go through Logback, Log4j, or any SLF4J backend.

Metrics and Monitoring

Why Metrics?

Logs tell you what happened. Metrics tell you how well it's happening:

  • Request latency
  • Error rates
  • Resource usage
  • Business metrics (orders/second, revenue, etc.)

Built-in Metrics

import zio._
import zio.metrics._

// Metrics apply to effects as aspects. fromConst(1L) turns the counter
// into an aspect that adds one every time the effect runs.
val requestCounter = Metric.counter("http_requests_total").fromConst(1L)

def handleRequest(request: Request): Task[Response] = 
  processRequest(request) @@ requestCounter

Every run of the effect bumps the counter; ZIO records the values with no manual bookkeeping.
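
ZIO core also ships ready-made JVM metrics (heap, GC, threads). A sketch, assuming zio.metrics.jvm.DefaultJvmMetrics from ZIO core (the layer's exact shape varies slightly across 2.x releases):

import zio._
import zio.metrics.jvm.DefaultJvmMetrics

object AppWithJvmMetrics extends ZIOAppDefault {
  // DefaultJvmMetrics.live starts collectors for heap, GC, threads, etc.
  def run =
    ZIO.logInfo("Serving with JVM metrics enabled")
      .provideLayer(DefaultJvmMetrics.live)
}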

Custom Metrics

// Counter: things that increase
val orderCounter = Metric.counter("orders_processed")

// Gauge: values that go up and down
val activeConnections = Metric.gauge("active_connections")

// Histogram: distribution of values
val requestDuration = Metric.histogram(
  "request_duration_seconds",
  MetricKeyType.Histogram.Boundaries.linear(0.0, 0.1, 10)
)

// Summary: quantiles
val responseSize = Metric.summary(
  "response_size_bytes",
  maxAge = 10.minutes,
  maxSize = 1000,
  error = 0.01,
  quantiles = Chunk(0.5, 0.9, 0.99)
)

Using Metrics

def processOrder(order: Order): Task[Unit] = 
  for {
    start <- Clock.nanoTime
    _     <- validateOrder(order)
    _     <- saveOrder(order)
    end   <- Clock.nanoTime

    durationSeconds = (end - start) / 1e9
    _ <- orderCounter.increment
    _ <- requestDuration.update(durationSeconds)
  } yield ()
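
Recording the duration by hand works, but metrics also compose as aspects. A sketch using ZIO core's Metric.timer with trackDuration, reusing processOrder from above:

import java.time.temporal.ChronoUnit
import zio._
import zio.metrics._

// A histogram-backed timer recording in milliseconds
val orderTimer = Metric.timer("order_duration", ChronoUnit.MILLIS)

// trackDuration measures the wall-clock time of the wrapped effect
def processOrderTimed(order: Order): Task[Unit] =
  processOrder(order) @@ orderTimer.trackDuration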

Prometheus Integration

Using the zio-metrics-connectors library (API sketched from its Prometheus connector):

import zio._
import zio.http._
import zio.metrics.connectors.{prometheus, MetricsConfig}
import zio.metrics.connectors.prometheus.PrometheusPublisher

object MetricsServer extends ZIOAppDefault {

  // The PrometheusPublisher service holds the current scrape output
  val metricsApp = Http.collectZIO[Request] {
    case Method.GET -> Root / "metrics" =>
      ZIO.serviceWithZIO[PrometheusPublisher](_.get).map(Response.text(_))
  }

  val app = for {
    _ <- ZIO.logInfo("Starting metrics server on :9090")
    _ <- Server.serve(metricsApp)
  } yield ()

  def run = app.provide(
    Server.defaultWithPort(9090),
    prometheus.publisherLayer,
    prometheus.prometheusLayer,
    ZLayer.succeed(MetricsConfig(5.seconds)) // how often metrics are polled
  )
}

Now Prometheus can scrape http://localhost:9090/metrics.

Health Checks

case class HealthStatus(
  database: Boolean,
  redis: Boolean,
  overallHealth: String
)

def checkHealth: Task[HealthStatus] = 
  for {
    dbHealth    <- checkDatabase.either.map(_.isRight)
    redisHealth <- checkRedis.either.map(_.isRight)

    overall = if (dbHealth && redisHealth) "healthy" 
              else if (dbHealth || redisHealth) "degraded"
              else "unhealthy"
  } yield HealthStatus(dbHealth, redisHealth, overall)

val healthEndpoint = Http.collectZIO[Request] {
  case Method.GET -> Root / "health" =>
    checkHealth.map { status =>
      val code = status.overallHealth match {
        case "healthy"   => Status.Ok
        case "degraded"  => Status.Ok
        case _           => Status.ServiceUnavailable
      }
      // Hand-rolled JSON for brevity; a real app would derive a codec
      // (e.g. with zio-json) instead of relying on toString
      val body =
        s"""{"database":${status.database},"redis":${status.redis},"status":"${status.overallHealth}"}"""
      Response.json(body).withStatus(code)
    }
}

Kubernetes and load balancers use this to route traffic.

Configuration Management

The Problem with Hard-Coded Config

object BadConfig {
  val dbHost = "localhost"        // Different in production!
  val dbPort = 5432               // Might change
  val apiKey = "secret123"        // NEVER hard-code secrets!
  val maxConnections = 10         // Should be configurable
}

ZIO Config

Type-safe configuration from environment variables, files, or system properties:

import zio.config._
import zio.config.magnolia._
import zio.config.typesafe._

case class DatabaseConfig(
  host: String,
  port: Int,
  database: String,
  username: String,
  password: String,
  maxConnections: Int
)

case class ServerConfig(
  host: String,
  port: Int
)

case class AppConfig(
  database: DatabaseConfig,
  server: ServerConfig
)

// Automatic derivation
val configDescriptor = descriptor[AppConfig]

Loading Configuration

val configLayer: Layer[ReadError[String], AppConfig] = 
  ZLayer {
    read(
      configDescriptor.from(
        TypesafeConfigSource.fromResourcePath
          .orElse(ConfigSource.fromSystemEnv())
      )
    )
  }

Using Configuration

def startApp: ZIO[AppConfig, Throwable, Unit] = 
  for {
    config <- ZIO.service[AppConfig]
    _      <- ZIO.logInfo(s"Starting server on ${config.server.host}:${config.server.port}")
    _      <- ZIO.logInfo(s"Connecting to database at ${config.database.host}")

    // Use config.database.maxConnections, etc.
  } yield ()

Configuration File (application.conf)

database {
  host = "localhost"
  host = ${?DB_HOST}

  port = 5432
  port = ${?DB_PORT}

  database = "myapp"
  username = "user"
  password = "pass"
  password = ${?DB_PASSWORD}

  max-connections = 10
}

server {
  host = "0.0.0.0"
  port = 8080
  port = ${?PORT}
}

Environment variables override defaults. Perfect for Docker and Kubernetes.

Validation

val validatedConfig = ZLayer {
  for {
    config <- read(configDescriptor.from(TypesafeConfigSource.fromResourcePath))

    // Validate
    _ <- ZIO.when(config.database.maxConnections < 1)(
           ZIO.fail(new IllegalArgumentException(
             "maxConnections must be positive"
           ))
         )

    _ <- ZIO.when(config.server.port < 1024 || config.server.port > 65535)(
           ZIO.fail(new IllegalArgumentException(
             "port must be between 1024 and 65535"
           ))
         )
  } yield config
}

Runtime Configuration

Custom Runtime

ZIO 2 configures the runtime through bootstrap layers rather than a mutable RuntimeConfig:

import zio._

object CustomRuntimeApp extends ZIOAppDefault {

  // Each layer toggles a runtime flag for this application
  override val bootstrap =
    Runtime.enableRuntimeMetrics ++ Runtime.enableOpSupervision

  def run = ZIO.logInfo("Running with runtime metrics and op supervision")
}

Thread Pool Configuration

import java.util.concurrent.Executors
import zio._

val blockingExecutor =
  Executor.fromJavaExecutor(Executors.newCachedThreadPool())

// Install it as the runtime's blocking executor via a bootstrap layer
val customBlockingLayer = Runtime.setBlockingExecutor(blockingExecutor)

// Use for blocking operations (shorthand for ZIO.blocking(ZIO.attempt(...)))
def blockingOp: Task[Unit] = 
  ZIO.attemptBlocking {
    // Long-running blocking operation
    Thread.sleep(1000)
  }

Fatal Error Handling

object FatalHandlingApp extends ZIOAppDefault {

  override val bootstrap = Runtime.setReportFatal { throwable =>
    // Log to external system
    println(s"FATAL: ${throwable.getMessage}")
    // Notify the ops team (sendPagerDutyAlert is a hypothetical helper)
    sendPagerDutyAlert(throwable)
    // reportFatal must not return, so rethrow
    throw throwable
  }

  def run = ZIO.logInfo("Running with a custom fatal-error reporter")
}

When the runtime hits a truly fatal error (a VirtualMachineError such as OutOfMemoryError), your reporter runs before the process dies.

Deployment Strategies

Graceful Shutdown

import zio.http._

object GracefulServer extends ZIOAppDefault {

  def app: Http[Any, Nothing, Request, Response] = ???

  val server = for {
    _ <- ZIO.logInfo("Starting HTTP server...")
    _ <- Server.serve(app)
  } yield ()

  val withGracefulShutdown = server.onInterrupt {
    for {
      _ <- ZIO.logInfo("Shutdown signal received")
      _ <- ZIO.logInfo("Finishing in-flight requests...")
      _ <- ZIO.sleep(5.seconds)  // Grace period
      _ <- ZIO.logInfo("Closing connections...")
      _ <- closeAllConnections
      _ <- ZIO.logInfo("Server stopped cleanly")
    } yield ()
  }

  def run = withGracefulShutdown
}

When Docker sends SIGTERM, the server finishes current requests before stopping.
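
onInterrupt is good for logging, but for the resources themselves prefer ZIO.acquireRelease: its finalizer runs on success, failure, and interruption alike. A minimal sketch with hypothetical startServer/stopServer helpers:

import zio._

// The finalizer registered here also runs when SIGTERM interrupts the fiber
def serve: ZIO[Scope, Throwable, Nothing] =
  ZIO.acquireRelease(startServer)(server => stopServer(server).orDie) *>
    ZIO.logInfo("Server running") *>
    ZIO.never

def runServer: Task[Unit] =
  ZIO.scoped(serve)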

Signal Handling

import sun.misc.{Signal, SignalHandler}

def installSignalHandlers(shutdown: UIO[Unit]): Task[Unit] = 
  ZIO.attempt {
    val handler = new SignalHandler {
      def handle(signal: Signal): Unit = {
        println(s"Received signal: ${signal.getName}")
        Unsafe.unsafe { implicit unsafe =>
          Runtime.default.unsafe.run(shutdown)
        }
      }
    }

    Signal.handle(new Signal("TERM"), handler)
    Signal.handle(new Signal("INT"), handler)
  }
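
Note that ZIOAppDefault already installs handlers that interrupt the main fiber on SIGINT/SIGTERM, so finalizers and onInterrupt logic run without any of this; reach for manual signal handling only when you need custom per-signal behavior.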

Containerization with Docker

Dockerfile:

FROM hseeberger/scala-sbt:11.0.16_1.8.2_2.13.10 AS builder

WORKDIR /app
COPY . .
RUN sbt assembly

FROM openjdk:11-jre-slim

WORKDIR /app

# The slim JRE image has no curl; install it for the HEALTHCHECK below
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

COPY --from=builder /app/target/scala-2.13/myapp.jar .

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

EXPOSE 8080

CMD ["java", "-jar", "myapp.jar"]

docker-compose.yml:

version: '3.8'

services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_PASSWORD=${DB_PASSWORD}
      - LOG_LEVEL=INFO
    depends_on:
      - postgres
    restart: unless-stopped

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data

volumes:
  postgres-data:

Kubernetes Deployment

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: zio-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: zio-app
  template:
    metadata:
      labels:
        app: zio-app
    spec:
      containers:
      - name: app
        image: myorg/zio-app:1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: DB_HOST
          value: postgres-service
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: password

        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10

        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5

        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "512Mi"
            cpu: "500m"

Why separate liveness and readiness probes? Readiness gates traffic: a pod that isn't ready receives no requests, which matters during startup or while a dependency is down. Liveness restarts pods that are wedged beyond recovery. A sketch of separate endpoints follows.
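
A sketch of separate probe endpoints in the zio-http style used earlier (checkDatabase and checkRedis are the placeholder checks from the health-check section); point livenessProbe at /live and readinessProbe at /ready if you adopt it:

import zio._
import zio.http._

val probeRoutes = Http.collectZIO[Request] {
  // Liveness: the process is responsive
  case Method.GET -> Root / "live" =>
    ZIO.succeed(Response.ok)

  // Readiness: dependencies are reachable
  case Method.GET -> Root / "ready" =>
    (checkDatabase *> checkRedis)
      .as(Response.ok)
      .catchAll(_ => ZIO.succeed(Response.status(Status.ServiceUnavailable)))
}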

Complete Production-Ready Application

Let's put it all together:

import zio._
import zio.http._
import zio.metrics._
import zio.config._

case class AppConfig(
  server: ServerConfig,
  database: DatabaseConfig
)

trait UserService {
  def getUser(id: String): Task[User]
  def createUser(user: User): Task[Unit]
}

object ProductionApp extends ZIOAppDefault {

  // Configuration
  val configLayer: Layer[ReadError[String], AppConfig] = 
    ZLayer.fromZIO(loadConfig)

  // Services
  val databaseLayer: ZLayer[AppConfig, Throwable, DatabaseService] = ???
  val userServiceLayer: ZLayer[DatabaseService, Nothing, UserService] = ???

  // HTTP Routes
  val routes = Http.collectZIO[Request] {
    case Method.GET -> Root / "users" / id =>
      (for {
        user <- ZIO.serviceWithZIO[UserService](_.getUser(id))
        _    <- ZIO.succeed(()) @@ Metric.counter("user_requests").increment
      } yield Response.json(user.toJson))
        .catchAll { error =>
          ZIO.logError(s"Failed to get user: $error") *>
          ZIO.succeed(Response.status(Status.InternalServerError))
        }

    case Method.GET -> Root / "health" =>
      checkHealth.map { status =>
        Response.json(status.toJson)
      }

    case Method.GET -> Root / "metrics" =>
      ZIO.succeed(Response.text(getMetrics))
  }

  // Main application
  val app = for {
    config <- ZIO.service[AppConfig]
    _      <- ZIO.logInfo(s"Starting server on port ${config.server.port}")
    _      <- Server.serve(routes)
  } yield ()

  val withShutdown = app.onInterrupt {
    ZIO.logInfo("Graceful shutdown initiated") *>
    closeResources *>
    ZIO.logInfo("Shutdown complete")
  }

  def run = withShutdown.provide(
    configLayer,
    databaseLayer,
    userServiceLayer,
    Server.defaultWithPort(8080)
  )
}

Monitoring in Action

When you deploy this, you'll see:

Logs (JSON format):

{"timestamp":"2024-01-15T10:30:00Z","level":"INFO","message":"Starting server on port 8080"}
{"timestamp":"2024-01-15T10:30:05Z","level":"INFO","message":"Database connected","host":"postgres"}
{"timestamp":"2024-01-15T10:30:10Z","level":"INFO","message":"Request processed","userId":"123","duration_ms":45}

Metrics (Prometheus format):

# HELP user_requests_total Total user requests
# TYPE user_requests_total counter
user_requests_total 1523

# HELP request_duration_seconds Request duration in seconds
# TYPE request_duration_seconds histogram
request_duration_seconds_bucket{le="0.1"} 1234
request_duration_seconds_bucket{le="0.5"} 1500
request_duration_seconds_sum 678.9
request_duration_seconds_count 1523

Health Check:

{
  "status": "healthy",
  "database": true,
  "redis": true,
  "uptime_seconds": 3600
}

Best Practices

1. Always Log Structured Data

// Bad
ZIO.logInfo(s"User $userId created order $orderId")

// Good
ZIO.logInfo("Order created") @@
  ZIOAspect.annotated("userId", userId) @@
  ZIOAspect.annotated("orderId", orderId)

2. Set Appropriate Log Levels

  • DEBUG: Detailed flow for debugging
  • INFO: Normal application events
  • WARNING: Concerning but not critical
  • ERROR: Failures requiring attention

3. Monitor What Matters

Don't track everything. Focus on the following; a RED-style sketch follows the list:

  • Request rate and latency (RED method)
  • Error rates
  • Resource usage (CPU, memory, connections)
  • Business metrics (orders, revenue, active users)
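
A RED-style sketch using ZIO core metrics (metric names here are illustrative):

import java.time.temporal.ChronoUnit
import zio._
import zio.metrics._

// Rate, Errors, Duration for a handler
val rate       = Metric.counter("requests_total").fromConst(1L)
val errorCount = Metric.counter("request_errors_total")
val duration   = Metric.timer("request_duration", ChronoUnit.MILLIS)

def instrumented[R, A](handler: ZIO[R, Throwable, A]): ZIO[R, Throwable, A] =
  (handler @@ rate @@ duration.trackDuration)
    .tapError(_ => errorCount.increment)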

4. Use Health Checks Correctly

// Liveness: Is the app alive?
def livenessCheck: UIO[Boolean] = ZIO.succeed(true)

// Readiness: Can the app serve traffic?
def readinessCheck: Task[Boolean] = 
  checkDatabase *> checkRedis *> ZIO.succeed(true)

5. Externalize All Configuration

Never hard-code:

  • Hostnames
  • Ports
  • Credentials
  • Feature flags
  • Timeouts

6. Test Your Shutdown

test("graceful shutdown completes in-flight requests") {
  for {
    fiber <- longRunningRequest.fork
    _     <- ZIO.sleep(1.second)
    _     <- fiber.interrupt
    _     <- verifyRequestCompleted
  } yield assertCompletes
}

Common Production Issues

Memory Leaks

// Bad: accumulates forever
var cache: Map[String, User] = Map.empty

// Good: bounded cache
val cache = Ref.make(Map.empty[String, User]).flatMap { ref =>
  ZIO.succeed(new CacheService {
    def get(id: String): Task[Option[User]] = 
      ref.get.map(_.get(id))

    def put(id: String, user: User): Task[Unit] = 
      ref.update { cache =>
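        // Naive eviction: drops an arbitrary entry at the bound;
        // a real cache would use LRU/TTL (e.g. zio-cache)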
        if (cache.size >= 1000) cache.tail + (id -> user)
        else cache + (id -> user)
      }
  })
}

Connection Pool Exhaustion

val dbPool = ZLayer.scoped {
  ZIO.acquireRelease(
    createConnectionPool(
      minSize = 5,
      maxSize = 20,
      timeout = 30.seconds
    )
  )(pool => closePool(pool).orDie)
}

Cascading Failures

// Timeout, bounded retry with backoff, and a fallback. This is not a
// full circuit breaker (no fail-fast state); see the sketch below.
def callExternalService: Task[Response] = 
  makeRequest
    .timeout(5.seconds)
    .retry(Schedule.exponential(1.second) && Schedule.recurs(3))
    .catchAll { error =>
      ZIO.logError(s"Service unavailable: $error") *>
      ZIO.succeed(Response.serviceUnavailable)
    }
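
For completeness, here is a minimal Ref-based circuit-breaker sketch that adds the fail-fast state; it is illustrative only, and production code should use a dedicated library such as rezilience:

import zio._

object MiniBreaker {
  sealed trait State
  final case class Closed(failures: Int)          extends State
  final case class Open(since: java.time.Instant) extends State

  def make(maxFailures: Int, resetAfter: Duration): UIO[MiniBreaker] =
    Ref.make[State](Closed(0)).map(new MiniBreaker(_, maxFailures, resetAfter))
}

final class MiniBreaker(
  state: Ref[MiniBreaker.State],
  maxFailures: Int,
  resetAfter: Duration
) {
  import MiniBreaker._

  // Fails fast while the circuit is open; otherwise runs the effect,
  // counting consecutive failures and resetting the count on success
  def apply[R, A](effect: ZIO[R, Throwable, A]): ZIO[R, Throwable, A] =
    Clock.instant.flatMap { now =>
      state.get.flatMap {
        case Open(since) if now.isBefore(since.plus(resetAfter)) =>
          ZIO.fail(new RuntimeException("circuit open, failing fast"))
        case _ =>
          effect.tapBoth(
            _ => state.update {
              case Closed(n) if n + 1 >= maxFailures => Open(now)
              case Closed(n)                         => Closed(n + 1)
              case Open(_)                           => Open(now)
            },
            _ => state.set(Closed(0))
          )
      }
    }
}

// Usage: MiniBreaker.make(5, 30.seconds).flatMap(cb => cb(callExternalService))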

Key Takeaways

  • Logging: Structured, contextual logging makes debugging possible
  • Metrics: Track what matters for performance and reliability
  • Configuration: Externalize everything, validate early
  • Health Checks: Separate liveness from readiness
  • Graceful Shutdown: Always clean up resources properly
  • Observability: You can't fix what you can't see

What's Next?

You now know how to deploy ZIO applications to production with proper observability and reliability. In Lesson 10: Building a Complete ZIO Application, we'll combine everything you've learned to build a full-featured application from scratch—architecture, testing, deployment, and all.

Ready to build the capstone project? Let's finish strong!

Additional Resources