ZIO in Production: Logging, Monitoring, and Deployment
Why Production Concerns Matter
You've built a ZIO application. It works perfectly on your machine. But production is different:
- Debugging: When something breaks at 3 AM, you need logs
- Performance: You need metrics to identify bottlenecks
- Configuration: Different settings for dev, staging, and production
- Reliability: Graceful shutdown prevents data loss
- Observability: You can't fix what you can't see
This lesson transforms your ZIO application from "works on my machine" to production-ready.
Logging with ZIO
The Problem with Traditional Logging
// Traditional approach - manual, no structure
def processOrder(orderId: String): Unit = {
  println(s"Processing order $orderId") // Lost in production
  // What thread? What timestamp? What level?
}
ZIO provides structured, type-safe logging built into the effect system.
Basic Logging
import zio._

object LoggingExample extends ZIOAppDefault {
  val program = for {
    _ <- ZIO.logInfo("Application started")
    _ <- ZIO.logDebug("Debug information")
    _ <- ZIO.logWarning("Something seems off")
    _ <- ZIO.logError("An error occurred")
  } yield ()

  def run = program
}
Output includes timestamp, level, and fiber information automatically.
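With the default logger that looks roughly like this (the exact fields vary by ZIO version):

timestamp=2024-01-15T10:30:00Z level=INFO thread=#zio-fiber-4 message="Application started"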
Structured Logging
Add context to your logs:
def processOrder(orderId: String, userId: String): Task[Unit] =
  ZIO.logSpan("process-order") {
    for {
      _ <- ZIO.logInfo("Processing order")
      _ <- validateOrder(orderId)
      _ <- chargeCustomer(userId)
      _ <- shipOrder(orderId)
      _ <- ZIO.logInfo("Order completed successfully")
    } yield ()
  } @@
    ZIOAspect.annotated("orderId", orderId) @@
    ZIOAspect.annotated("userId", userId)

Because the aspects wrap the whole span, every log line inside it carries both annotations. This makes debugging distributed systems much easier.
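If you prefer a plain method call over aspects, ZIO.logAnnotate scopes an annotation over an entire effect the same way; a minimal sketch:

def processOrderAnnotated(orderId: String): Task[Unit] =
  ZIO.logAnnotate("orderId", orderId) {
    for {
      _ <- ZIO.logInfo("Validating")        // carries orderId
      _ <- ZIO.logInfo("Charging customer") // carries orderId
    } yield ()
  }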
Log Levels and Filtering
import zio._

object ConfiguredLogging extends ZIOAppDefault {
  // Replace the default logger with one that drops everything below INFO
  override val bootstrap =
    Runtime.removeDefaultLoggers >>> Runtime.addLogger(
      ZLogger.default.map(println(_)).filterLogLevel(_ >= LogLevel.Info)
    )

  val program = for {
    _ <- ZIO.logDebug("This won't appear (below INFO)")
    _ <- ZIO.logInfo("This appears")
    _ <- ZIO.logError("This definitely appears")
  } yield ()

  def run = program
}
Custom Logger
import zio.logging._

// JSON logger for production. Helper names and signatures have shifted
// across zio-logging versions; this uses the consoleJson helper
val jsonLogger = Runtime.removeDefaultLoggers >>>
  consoleJson(LogFormat.default, LogLevel.Info)

object ProductionApp extends ZIOAppDefault {
  override val bootstrap = jsonLogger

  val program = for {
    _ <- ZIO.logInfo("Server started")
    _ <- ZIO.logInfo("Ready to accept connections") @@
           ZIOAspect.annotated("port", "8080")
  } yield ()

  def run = program
}
Output:
{"timestamp":"2024-01-15T10:30:00Z","level":"INFO","message":"Server started"}
{"timestamp":"2024-01-15T10:30:01Z","level":"INFO","message":"Ready to accept connections","port":"8080"}
Integration with SLF4J
For existing infrastructure:
import zio.logging.backend.SLF4J

object LegacyIntegration extends ZIOAppDefault {
  override val bootstrap = Runtime.removeDefaultLoggers >>> SLF4J.slf4j

  def run = ZIO.logInfo("Logs to SLF4J backend")
}
Now your ZIO logs go through Logback, Log4j, or any SLF4J backend.
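This assumes the zio-logging SLF4J bridge and a backend are on the classpath; in sbt, roughly (versions are placeholders):

libraryDependencies ++= Seq(
  "dev.zio"        %% "zio-logging-slf4j" % "<zio-logging-version>",
  "ch.qos.logback"  % "logback-classic"   % "<logback-version>"
)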
Metrics and Monitoring
Why Metrics?
Logs tell you what happened. Metrics tell you how well it's happening:
- Request latency
- Error rates
- Resource usage
- Business metrics (orders/second, revenue, etc.)
Built-in Metrics
import zio._
import zio.metrics._

// A counter that adds 1 each time the effect runs; fromConst adapts the
// Long-valued counter so it can be applied to any effect with @@
val requestCounter = Metric.counter("http_requests_total").fromConst(1)

def handleRequest(request: Request): Task[Response] =
  processRequest(request) @@ requestCounter

The aspect updates the counter every time the effect succeeds; no manual bookkeeping is needed.
Custom Metrics
// Counter: things that only increase
val orderCounter = Metric.counter("orders_processed")

// Gauge: values that go up and down
val activeConnections = Metric.gauge("active_connections")

// Histogram: distribution of values across fixed buckets
val requestDuration = Metric.histogram(
  "request_duration_seconds",
  MetricKeyType.Histogram.Boundaries.linear(0.0, 0.1, 10)
)

// Summary: quantiles over a sliding time window
val responseSize = Metric.summary(
  "response_size_bytes",
  maxAge = 10.minutes,
  maxSize = 1000,
  error = 0.01,
  quantiles = Chunk(0.5, 0.9, 0.99)
)
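Metrics can also carry dimensions as tags, which Prometheus-style backends turn into labels; a small sketch (the label values here are illustrative):

val usOrders = Metric.counter("orders_processed").tagged("region", "us-east-1")

def countOrder(region: String): UIO[Unit] =
  Metric.counter("orders_processed").tagged("region", region).increment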
Using Metrics
def processOrder(order: Order): Task[Unit] =
  for {
    start  <- Clock.nanoTime
    _      <- validateOrder(order)
    _      <- saveOrder(order)
    end    <- Clock.nanoTime
    seconds = (end - start) / 1e9
    _      <- orderCounter.increment
    _      <- requestDuration.update(seconds)
  } yield ()
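Manual timing works, but ZIO can measure the duration for you. A sketch using trackDurationWith, which feeds the measured Duration into the histogram defined above (check your ZIO version for the exact aspect name):

def processOrderTimed(order: Order): Task[Unit] =
  (validateOrder(order) *> saveOrder(order)) @@
    requestDuration.trackDurationWith(_.toNanos / 1e9) // record seconds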
Prometheus Integration
import zio.http._
import zio.metrics.connectors.{prometheus, MetricsConfig}
import zio.metrics.connectors.prometheus.PrometheusPublisher

object MetricsServer extends ZIOAppDefault {
  // Serve whatever the Prometheus publisher has most recently rendered
  // (zio-metrics-connectors; the API may differ slightly across versions)
  val metricsApp = Http.collectZIO[Request] {
    case Method.GET -> Root / "metrics" =>
      ZIO.serviceWithZIO[PrometheusPublisher](_.get).map(Response.text(_))
  }

  val app = for {
    _ <- ZIO.logInfo("Starting metrics server on :9090")
    _ <- Server.serve(metricsApp)
  } yield ()

  def run = app.provide(
    Server.defaultWithPort(9090),
    prometheus.publisherLayer,
    prometheus.prometheusLayer,
    ZLayer.succeed(MetricsConfig(5.seconds)) // how often metrics are published
  )
}
Now Prometheus can scrape http://localhost:9090/metrics.
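The connector lives in a separate module; in sbt, roughly (the version is a placeholder):

libraryDependencies += "dev.zio" %% "zio-metrics-connectors" % "<version>"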
Health Checks
case class HealthStatus(
  database: Boolean,
  redis: Boolean,
  overallHealth: String
)

def checkHealth: Task[HealthStatus] =
  for {
    dbHealth    <- checkDatabase.either.map(_.isRight)
    redisHealth <- checkRedis.either.map(_.isRight)
    overall      = if (dbHealth && redisHealth) "healthy"
                   else if (dbHealth || redisHealth) "degraded"
                   else "unhealthy"
  } yield HealthStatus(dbHealth, redisHealth, overall)

val healthEndpoint = Http.collectZIO[Request] {
  case Method.GET -> Root / "health" =>
    checkHealth.map { status =>
      val code = status.overallHealth match {
        case "healthy"  => Status.Ok
        case "degraded" => Status.Ok // degraded still accepts traffic
        case _          => Status.ServiceUnavailable
      }
      // assumes a zio-json encoder for HealthStatus is in scope
      Response.json(status.toJson).withStatus(code)
    }
}
Kubernetes and load balancers use this to route traffic.
Configuration Management
The Problem with Hard-Coded Config
object BadConfig {
  val dbHost         = "localhost" // Different in production!
  val dbPort         = 5432        // Might change
  val apiKey         = "secret123" // NEVER hard-code secrets!
  val maxConnections = 10          // Should be configurable
}
ZIO Config
Type-safe configuration from environment variables, files, or system properties:
import zio.config._
import zio.config.magnolia._
import zio.config.typesafe._

case class DatabaseConfig(
  host: String,
  port: Int,
  database: String,
  username: String,
  password: String,
  maxConnections: Int
)

case class ServerConfig(
  host: String,
  port: Int
)

case class AppConfig(
  database: DatabaseConfig,
  server: ServerConfig
)

// Automatic derivation (zio-config 3.x)
val configDescriptor = descriptor[AppConfig]
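Note that descriptor is the zio-config 3.x API. zio-config 4.x derives ZIO core Config values instead; a sketch:

import zio.Config
import zio.config.magnolia.deriveConfig

val appConfig: Config[AppConfig] = deriveConfig[AppConfig]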
Loading Configuration
val configLayer: Layer[ReadError[String], AppConfig] =
  ZLayer {
    read(
      configDescriptor.from(
        ConfigSource.fromResourcePath
          .orElse(ConfigSource.fromSystemEnv)
      )
    )
  }
Using Configuration
def startApp: ZIO[AppConfig, Throwable, Unit] =
  for {
    config <- ZIO.service[AppConfig]
    _      <- ZIO.logInfo(s"Starting server on ${config.server.host}:${config.server.port}")
    _      <- ZIO.logInfo(s"Connecting to database at ${config.database.host}")
    // Use config.database.maxConnections, etc.
  } yield ()
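Wiring it together is a single provide call; assuming the configLayer defined above:

object ConfiguredApp extends ZIOAppDefault {
  def run = startApp.provide(configLayer)
}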
Configuration File (application.conf)
database {
  host = "localhost"
  host = ${?DB_HOST}
  port = 5432
  port = ${?DB_PORT}
  database = "myapp"
  username = "user"
  password = "pass"
  password = ${?DB_PASSWORD}
  max-connections = 10
}

server {
  host = "0.0.0.0"
  port = 8080
  port = ${?PORT}
}
Environment variables override defaults. Perfect for Docker and Kubernetes.
Validation
val validatedConfig = ZLayer {
  for {
    config <- read(configDescriptor.from(ConfigSource.fromResourcePath))
    // Validate before anything else uses the config
    _ <- ZIO.when(config.database.maxConnections < 1)(
           ZIO.fail(new IllegalArgumentException(
             "maxConnections must be positive"
           ))
         )
    _ <- ZIO.when(config.server.port < 1024 || config.server.port > 65535)(
           ZIO.fail(new IllegalArgumentException(
             "port must be between 1024 and 65535"
           ))
         )
  } yield config
}
Runtime Configuration
Custom Runtime
object CustomRuntimeApp extends ZIOAppDefault {
  // In ZIO 2, runtime behavior is customized with layers composed into
  // bootstrap rather than by copying a RuntimeConfig
  override val bootstrap =
    Runtime.enableRuntimeMetrics ++ Runtime.enableOpSupervision

  def run = ZIO.logInfo("Running with runtime metrics and op supervision")
}
Thread Pool Configuration
import java.util.concurrent.Executors

// Route ZIO's blocking work to a custom thread pool via bootstrap
val blockingExecutor =
  Executor.fromJavaExecutor(Executors.newCachedThreadPool())

val customBlocking = Runtime.setBlockingExecutor(blockingExecutor)

// Use for blocking operations
def blockingOp: Task[Unit] =
  ZIO.attemptBlocking {
    // Long-running blocking operation
    Thread.sleep(1000)
  }
Fatal Error Handling
// Runs when the runtime encounters a fatal error (e.g. an OutOfMemoryError)
val reportFatal = Runtime.setReportFatal { cause =>
  // Log to external system
  println(s"FATAL: $cause")
  // Notify ops team (hypothetical helper)
  sendPagerDutyAlert(cause)
  // Nothing can continue after a fatal error
  throw cause
}

When the runtime hits a fatal error such as an OutOfMemoryError, you get notified immediately.
Deployment Strategies
Graceful Shutdown
import zio.http._

object GracefulServer extends ZIOAppDefault {
  def app: Http[Any, Nothing, Request, Response] = ???

  val server = for {
    _ <- ZIO.logInfo("Starting HTTP server...")
    _ <- Server.serve(app)
  } yield ()

  val withGracefulShutdown = server.onInterrupt {
    for {
      _ <- ZIO.logInfo("Shutdown signal received")
      _ <- ZIO.logInfo("Finishing in-flight requests...")
      _ <- ZIO.sleep(5.seconds) // Grace period
      _ <- ZIO.logInfo("Closing connections...")
      _ <- closeAllConnections.orDie
      _ <- ZIO.logInfo("Server stopped cleanly")
    } yield ()
  }

  def run = withGracefulShutdown.provide(Server.default)
}
When Docker sends SIGTERM, the server finishes current requests before stopping.
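An alternative sketch: register the cleanup as a Scope finalizer, so it runs on interruption and on normal exit alike (closeAllConnections as above):

def scopedServer(serve: Task[Unit]): ZIO[Scope, Throwable, Unit] =
  for {
    _ <- ZIO.addFinalizer(
           ZIO.logInfo("Closing connections...") *> closeAllConnections.orDie
         )
    _ <- serve
  } yield ()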
Signal Handling
import sun.misc.{Signal, SignalHandler}

// Note: ZIOAppDefault already converts SIGTERM/SIGINT into fiber
// interruption, so manual handlers like this are rarely needed
def installSignalHandlers(shutdown: UIO[Unit]): Task[Unit] =
  ZIO.attempt {
    val handler = new SignalHandler {
      def handle(signal: Signal): Unit = {
        println(s"Received signal: ${signal.getName}")
        Unsafe.unsafe { implicit unsafe =>
          Runtime.default.unsafe.run(shutdown).getOrThrowFiberFailure()
        }
      }
    }
    Signal.handle(new Signal("TERM"), handler)
    Signal.handle(new Signal("INT"), handler)
  }
Containerization with Docker
Dockerfile:
FROM hseeberger/scala-sbt:11.0.16_1.8.2_2.13.10 AS builder
WORKDIR /app
COPY . .
RUN sbt assembly

FROM openjdk:11-jre-slim
WORKDIR /app
COPY --from=builder /app/target/scala-2.13/myapp.jar .

# curl is not included in the slim image; install it for the health check
RUN apt-get update && apt-get install -y --no-install-recommends curl \
  && rm -rf /var/lib/apt/lists/*

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

EXPOSE 8080
CMD ["java", "-jar", "myapp.jar"]
docker-compose.yml:
version: '3.8'
services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_PASSWORD=${DB_PASSWORD}
      - LOG_LEVEL=INFO
    depends_on:
      - postgres
    restart: unless-stopped

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data

volumes:
  postgres-data:
Kubernetes Deployment
deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zio-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: zio-app
  template:
    metadata:
      labels:
        app: zio-app
    spec:
      containers:
        - name: app
          image: myorg/zio-app:1.0.0
          ports:
            - containerPort: 8080
          env:
            - name: DB_HOST
              value: postgres-service
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: password
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            requests:
              memory: "256Mi"
              cpu: "200m"
            limits:
              memory: "512Mi"
              cpu: "500m"
Why separate liveness and readiness probes? Readiness gates traffic: the pod only receives requests while it reports ready, which keeps clients away during startup. Liveness restarts pods that have hung or crashed.
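To give the graceful-shutdown logic from earlier time to finish, also set the pod's grace period; Kubernetes sends SIGTERM, waits this long, then sends SIGKILL:

    spec:
      terminationGracePeriodSeconds: 30  # the default; raise it if shutdown needs longer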
Complete Production-Ready Application
Let's put it all together:
import zio._
import zio.http._
import zio.metrics._
import zio.config._

case class AppConfig(
  server: ServerConfig,
  database: DatabaseConfig
)

trait UserService {
  def getUser(id: String): Task[User]
  def createUser(user: User): Task[Unit]
}

object ProductionApp extends ZIOAppDefault {
  // Configuration
  val configLayer: Layer[ReadError[String], AppConfig] =
    ZLayer.fromZIO(loadConfig)

  // Services
  val databaseLayer: ZLayer[AppConfig, Throwable, DatabaseService] = ???
  val userServiceLayer: ZLayer[DatabaseService, Nothing, UserService] = ???

  // HTTP Routes
  val routes = Http.collectZIO[Request] {
    case Method.GET -> Root / "users" / id =>
      (for {
        user <- ZIO.serviceWithZIO[UserService](_.getUser(id))
        _    <- Metric.counter("user_requests").increment
      } yield Response.json(user.toJson))
        .catchAll { error =>
          ZIO.logError(s"Failed to get user: $error") *>
            ZIO.succeed(Response.status(Status.InternalServerError))
        }

    case Method.GET -> Root / "health" =>
      checkHealth.map(status => Response.json(status.toJson))

    case Method.GET -> Root / "metrics" =>
      ZIO.succeed(Response.text(getMetrics))
  }

  // Main application
  val app = for {
    config <- ZIO.service[AppConfig]
    _      <- ZIO.logInfo(s"Starting server on port ${config.server.port}")
    _      <- Server.serve(routes)
  } yield ()

  val withShutdown = app.onInterrupt {
    ZIO.logInfo("Graceful shutdown initiated") *>
      closeResources.orDie *>
      ZIO.logInfo("Shutdown complete")
  }

  def run = withShutdown.provide(
    configLayer,
    databaseLayer,
    userServiceLayer,
    Server.defaultWithPort(8080)
  )
}
Monitoring in Action
When you deploy this, you'll see:
Logs (JSON format):
{"timestamp":"2024-01-15T10:30:00Z","level":"INFO","message":"Starting server on port 8080"}
{"timestamp":"2024-01-15T10:30:05Z","level":"INFO","message":"Database connected","host":"postgres"}
{"timestamp":"2024-01-15T10:30:10Z","level":"INFO","message":"Request processed","userId":"123","duration_ms":45}
Metrics (Prometheus format):
# HELP user_requests_total Total user requests
# TYPE user_requests_total counter
user_requests_total 1523
# HELP request_duration_seconds Request duration in seconds
# TYPE request_duration_seconds histogram
request_duration_seconds_bucket{le="0.1"} 1234
request_duration_seconds_bucket{le="0.5"} 1500
request_duration_seconds_sum 678.9
request_duration_seconds_count 1523
Health Check:
{
  "status": "healthy",
  "database": true,
  "redis": true,
  "uptime_seconds": 3600
}
Best Practices
1. Always Log Structured Data
// Bad
ZIO.logInfo(s"User $userId created order $orderId")
// Good
ZIO.logInfo("Order created") @@
ZIOAspect.annotated("userId", userId) @@
ZIOAspect.annotated("orderId", orderId)
2. Set Appropriate Log Levels
- DEBUG: Detailed flow for debugging
- INFO: Normal application events
- WARNING: Concerning but not critical
- ERROR: Failures requiring attention
3. Monitor What Matters
Don't track everything. Focus on:
- Request rate and latency (RED method; see the sketch after this list)
- Error rates
- Resource usage (CPU, memory, connections)
- Business metrics (orders, revenue, active users)
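A minimal sketch of the three RED metrics (rate, errors, duration), with illustrative names; trackDurationWith is discussed in the metrics section above:

import zio._
import zio.metrics._

val redRequests = Metric.counter("http_requests_total").fromConst(1)
val redErrors   = Metric.counter("http_errors_total")
val redDuration = Metric.histogram(
  "http_request_duration_seconds",
  MetricKeyType.Histogram.Boundaries.exponential(0.001, 2.0, 16)
)

def instrumented[R, A](handler: ZIO[R, Throwable, A]): ZIO[R, Throwable, A] =
  (handler @@ redRequests @@ redDuration.trackDurationWith(_.toNanos / 1e9))
    .tapError(_ => redErrors.increment)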
4. Use Health Checks Correctly
// Liveness: Is the app alive?
def livenessCheck: UIO[Boolean] = ZIO.succeed(true)
// Readiness: Can the app serve traffic?
def readinessCheck: Task[Boolean] =
checkDatabase *> checkRedis *> ZIO.succeed(true)
5. Externalize All Configuration
Never hard-code:
- Hostnames
- Ports
- Credentials
- Feature flags
- Timeouts
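For simple values, ZIO core ships a built-in Config type that needs no extra library; a sketch with illustrative key names:

val requestTimeout: Config[Duration] =
  Config.duration("REQUEST_TIMEOUT").withDefault(30.seconds)

val loadTimeout: IO[Config.Error, Duration] = ZIO.config(requestTimeout)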
6. Test Your Shutdown
test("graceful shutdown completes in-flight requests") {
for {
fiber <- longRunningRequest.fork
_ <- ZIO.sleep(1.second)
_ <- fiber.interrupt
_ <- verifyRequestCompleted
} yield assertTrue(true)
}
Common Production Issues
Memory Leaks
// Bad: accumulates forever
var cache: Map[String, User] = Map.empty

// Good: bounded cache (sketch; `tail` evicts an arbitrary entry, so
// production code would use a real LRU or a caching library)
val cache = Ref.make(Map.empty[String, User]).flatMap { ref =>
  ZIO.succeed(new CacheService {
    def get(id: String): Task[Option[User]] =
      ref.get.map(_.get(id))

    def put(id: String, user: User): Task[Unit] =
      ref.update { cache =>
        if (cache.size >= 1000) cache.tail + (id -> user)
        else cache + (id -> user)
      }
  })
}
Connection Pool Exhaustion
val dbPool = ZLayer.scoped {
  ZIO.acquireRelease(
    createConnectionPool(
      minSize = 5,
      maxSize = 20,
      timeout = 30.seconds
    )
  )(pool => closePool(pool).orDie)
}
Cascading Failures
// Timeouts, bounded retries, and a fallback keep one failing dependency
// from taking the whole system down (a true circuit breaker additionally
// tracks failure state; see libraries such as rezilience)
def callExternalService: Task[Response] =
  makeRequest
    .timeoutFail(new java.util.concurrent.TimeoutException("upstream timed out"))(5.seconds)
    .retry(Schedule.exponential(1.second) && Schedule.recurs(3))
    .catchAll { error =>
      ZIO.logError(s"Service unavailable: $error") *>
        ZIO.succeed(Response.status(Status.ServiceUnavailable))
    }
Key Takeaways
- Logging: Structured, contextual logging makes debugging possible
- Metrics: Track what matters for performance and reliability
- Configuration: Externalize everything, validate early
- Health Checks: Separate liveness from readiness
- Graceful Shutdown: Always clean up resources properly
- Observability: You can't fix what you can't see
What's Next?
You now know how to deploy ZIO applications to production with proper observability and reliability. In Lesson 10: Building a Complete ZIO Application, we'll combine everything you've learned to build a full-featured application from scratch—architecture, testing, deployment, and all.
Ready to build the capstone project? Let's finish strong!
Additional Resources
- ZIO Logging Documentation
- ZIO Metrics Documentation
- ZIO Config Documentation
- Production Best Practices
- The Twelve-Factor App - Essential reading for production apps