Skip to content

fix(lapis): health check: consider LAPIS "UP" even when SILO is down#1594

Open
fengelniederhammer wants to merge 1 commit intomainfrom
upStatus
Open

fix(lapis): health check: consider LAPIS "UP" even when SILO is down#1594
fengelniederhammer wants to merge 1 commit intomainfrom
upStatus

Conversation

@fengelniederhammer
Copy link
Contributor

PR Checklist

- [ ] All necessary documentation has been adapted.
- [ ] All necessary changes are explained in the llms.txt.
- [ ] The implemented feature is covered by an appropriate test.

@vercel
Copy link

vercel bot commented Mar 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
lapis Ready Ready Preview, Comment Mar 12, 2026 10:48am

Request Review

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts LAPIS’ Spring Boot Actuator health reporting so that the service remains reported as UP even if the dependent SILO service is unavailable, while still surfacing SILO-related diagnostics in health details.

Changes:

  • Always returns Health.up() from SiloHealthIndicator.
  • Adds SILO diagnostic details (versions on success; siloStatus/error info on failure) instead of marking overall health as DOWN.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +25 to +37
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO not reachable")
.withDetail("message", e.message)
} catch (e: SiloUnavailableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO unavailable (HTTP 503)")
.withDetail("message", e.message)
.withDetail("retryAfter", e.retryAfter)
} catch (e: Exception) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "Unexpected error checking SILO")
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"DOWN" is hard-coded multiple times for siloStatus. To avoid typos and keep alignment with Spring Actuator status codes, consider using the Actuator Status constants (e.g., Status.DOWN.code / Status.UP.code) or a single local constant.

Copilot uses AI. Check for mistakes.
.let {
try {
val info = cachedSiloClient.callInfo()
it
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When SILO is reachable, the response doesn’t include a siloStatus detail, but it is present (as "DOWN") in all failure cases. This makes the health JSON schema inconsistent for consumers; consider always setting siloStatus (e.g., "UP" on success, "DOWN" on failure) so clients don’t have to infer from missing fields.

Suggested change
it
it
.withDetail("siloStatus", "UP")

Copilot uses AI. Check for mistakes.
Comment on lines +15 to +41
Health
.up() // LAPIS should always be "up", independent of SILO.
.let {
try {
val info = cachedSiloClient.callInfo()
it
.withDetail("dataVersion", info.dataVersion)
.withDetail("siloVersion", info.siloVersion ?: "unknown")
} catch (e: SiloNotReachableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO not reachable")
.withDetail("message", e.message)
} catch (e: SiloUnavailableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO unavailable (HTTP 503)")
.withDetail("message", e.message)
.withDetail("retryAfter", e.retryAfter)
} catch (e: Exception) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "Unexpected error checking SILO")
.withDetail("message", e.message)
}
}
.build()
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SiloHealthIndicator now always reports Spring Actuator status UP (even when SILO is down) and only encodes SILO failure via details. This is easy to misinterpret because /actuator/health components will still show this indicator as UP; if the intent is to keep overall LAPIS health UP, consider moving SILO checks into a separate health group/contributor (e.g., readiness) or otherwise returning a real DOWN status for SILO while excluding it from the liveness group.

Suggested change
Health
.up() // LAPIS should always be "up", independent of SILO.
.let {
try {
val info = cachedSiloClient.callInfo()
it
.withDetail("dataVersion", info.dataVersion)
.withDetail("siloVersion", info.siloVersion ?: "unknown")
} catch (e: SiloNotReachableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO not reachable")
.withDetail("message", e.message)
} catch (e: SiloUnavailableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO unavailable (HTTP 503)")
.withDetail("message", e.message)
.withDetail("retryAfter", e.retryAfter)
} catch (e: Exception) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "Unexpected error checking SILO")
.withDetail("message", e.message)
}
}
.build()
try {
val info = cachedSiloClient.callInfo()
Health.up()
.withDetail("dataVersion", info.dataVersion)
.withDetail("siloVersion", info.siloVersion ?: "unknown")
.build()
} catch (e: SiloNotReachableException) {
Health.down()
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO not reachable")
.withDetail("message", e.message)
.build()
} catch (e: SiloUnavailableException) {
Health.down()
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO unavailable (HTTP 503)")
.withDetail("message", e.message)
.withDetail("retryAfter", e.retryAfter)
.build()
} catch (e: Exception) {
Health.down()
.withDetail("siloStatus", "DOWN")
.withDetail("error", "Unexpected error checking SILO")
.withDetail("message", e.message)
.build()
}

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@fhennig fhennig Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds valid; we're in the 'SiloHealthIndicator', shouldn't there be a LapisHealthIndicator somewhere maybe?

Edit: This was in response to what ChatGPT said above.

Comment on lines +24 to +33
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO not reachable")
.withDetail("message", e.message)
} catch (e: SiloUnavailableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO unavailable (HTTP 503)")
.withDetail("message", e.message)
.withDetail("retryAfter", e.retryAfter)
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The health endpoint is configured to show-details=always and /actuator/** is publicly accessible, so including raw exception messages (e.message) here can leak internal connectivity details (URIs, exception classes, etc.). Consider removing message from the public health response or replacing it with a sanitized/static value and logging the full exception server-side instead.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@fhennig fhennig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about having UP, DOWN and DEGRADED?

Comment on lines +15 to +41
Health
.up() // LAPIS should always be "up", independent of SILO.
.let {
try {
val info = cachedSiloClient.callInfo()
it
.withDetail("dataVersion", info.dataVersion)
.withDetail("siloVersion", info.siloVersion ?: "unknown")
} catch (e: SiloNotReachableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO not reachable")
.withDetail("message", e.message)
} catch (e: SiloUnavailableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO unavailable (HTTP 503)")
.withDetail("message", e.message)
.withDetail("retryAfter", e.retryAfter)
} catch (e: Exception) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "Unexpected error checking SILO")
.withDetail("message", e.message)
}
}
.build()
Copy link
Contributor

@fhennig fhennig Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds valid; we're in the 'SiloHealthIndicator', shouldn't there be a LapisHealthIndicator somewhere maybe?

Edit: This was in response to what ChatGPT said above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants