fix(lapis): health check: consider LAPIS "UP" even when SILO is down #1594

fengelniederhammer wants to merge 1 commit into main
Conversation
Pull request overview
Adjusts LAPIS’ Spring Boot Actuator health reporting so that the service remains reported as UP even if the dependent SILO service is unavailable, while still surfacing SILO-related diagnostics in health details.
Changes:
- Always returns `Health.up()` from `SiloHealthIndicator`.
- Adds SILO diagnostic details (versions on success; `siloStatus`/error info on failure) instead of marking overall health as DOWN.
```kotlin
        .withDetail("siloStatus", "DOWN")
        .withDetail("error", "SILO not reachable")
        .withDetail("message", e.message)
} catch (e: SiloUnavailableException) {
    it
        .withDetail("siloStatus", "DOWN")
        .withDetail("error", "SILO unavailable (HTTP 503)")
        .withDetail("message", e.message)
        .withDetail("retryAfter", e.retryAfter)
} catch (e: Exception) {
    it
        .withDetail("siloStatus", "DOWN")
        .withDetail("error", "Unexpected error checking SILO")
```
"DOWN" is hard-coded multiple times for siloStatus. To avoid typos and keep alignment with Spring Actuator status codes, consider using the Actuator Status constants (e.g., Status.DOWN.code / Status.UP.code) or a single local constant.
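One way to centralize that, sketched in plain Kotlin (the constant `STATUS_DOWN` and the helper `siloDownDetails` are illustrative names, not from this PR; `STATUS_DOWN` mirrors Spring Actuator's `Status.DOWN.code`):

```kotlin
// Illustrative sketch: one local constant instead of repeating the "DOWN" literal.
val STATUS_DOWN = "DOWN" // mirrors Spring Actuator's Status.DOWN.code

// Builds the shared failure details once, so every catch branch stays consistent.
fun siloDownDetails(error: String, message: String?): Map<String, Any> =
    buildMap {
        put("siloStatus", STATUS_DOWN)
        put("error", error)
        message?.let { put("message", it) }
    }
```

Each catch branch would then only contribute its branch-specific details (e.g. `retryAfter`) on top of this shared map.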
```kotlin
.let {
    try {
        val info = cachedSiloClient.callInfo()
        it
```
When SILO is reachable, the response doesn’t include a siloStatus detail, but it is present (as "DOWN") in all failure cases. This makes the health JSON schema inconsistent for consumers; consider always setting siloStatus (e.g., "UP" on success, "DOWN" on failure) so clients don’t have to infer from missing fields.
Suggested change:

```kotlin
it
    .withDetail("siloStatus", "UP")
```
```kotlin
Health
    .up() // LAPIS should always be "up", independent of SILO.
    .let {
        try {
            val info = cachedSiloClient.callInfo()
            it
                .withDetail("dataVersion", info.dataVersion)
                .withDetail("siloVersion", info.siloVersion ?: "unknown")
        } catch (e: SiloNotReachableException) {
            it
                .withDetail("siloStatus", "DOWN")
                .withDetail("error", "SILO not reachable")
                .withDetail("message", e.message)
        } catch (e: SiloUnavailableException) {
            it
                .withDetail("siloStatus", "DOWN")
                .withDetail("error", "SILO unavailable (HTTP 503)")
                .withDetail("message", e.message)
                .withDetail("retryAfter", e.retryAfter)
        } catch (e: Exception) {
            it
                .withDetail("siloStatus", "DOWN")
                .withDetail("error", "Unexpected error checking SILO")
                .withDetail("message", e.message)
        }
    }
    .build()
```
SiloHealthIndicator now always reports Spring Actuator status UP (even when SILO is down) and only encodes SILO failure via details. This is easy to misinterpret because /actuator/health components will still show this indicator as UP; if the intent is to keep overall LAPIS health UP, consider moving SILO checks into a separate health group/contributor (e.g., readiness) or otherwise returning a real DOWN status for SILO while excluding it from the liveness group.
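If the team goes that route, a minimal configuration sketch could look like the following. The indicator name `silo` and the group contents are assumptions (Spring derives the indicator name from the `SiloHealthIndicator` bean name), not something defined in this PR:

```yaml
# Sketch: let the SILO indicator report a real DOWN status, but only include it
# in the readiness group, so the liveness group (and LAPIS itself) stays UP
# while SILO is down.
management:
  endpoint:
    health:
      group:
        liveness:
          include: ping
        readiness:
          include: silo
```

With a setup like this, `/actuator/health/liveness` would not be affected by SILO outages, while `/actuator/health/readiness` would still surface them honestly as DOWN.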
The diff context for this thread shows the new implementation (quoted above) replacing the previous one, which reported a real DOWN status:

```kotlin
try {
    val info = cachedSiloClient.callInfo()
    Health.up()
        .withDetail("dataVersion", info.dataVersion)
        .withDetail("siloVersion", info.siloVersion ?: "unknown")
        .build()
} catch (e: SiloNotReachableException) {
    Health.down()
        .withDetail("siloStatus", "DOWN")
        .withDetail("error", "SILO not reachable")
        .withDetail("message", e.message)
        .build()
} catch (e: SiloUnavailableException) {
    Health.down()
        .withDetail("siloStatus", "DOWN")
        .withDetail("error", "SILO unavailable (HTTP 503)")
        .withDetail("message", e.message)
        .withDetail("retryAfter", e.retryAfter)
        .build()
} catch (e: Exception) {
    Health.down()
        .withDetail("siloStatus", "DOWN")
        .withDetail("error", "Unexpected error checking SILO")
        .withDetail("message", e.message)
        .build()
}
```
This sounds valid; we're in the `SiloHealthIndicator`, so shouldn't there maybe be a separate `LapisHealthIndicator` somewhere?
Edit: This was in response to what ChatGPT said above.
```kotlin
it
    .withDetail("siloStatus", "DOWN")
    .withDetail("error", "SILO not reachable")
    .withDetail("message", e.message)
} catch (e: SiloUnavailableException) {
    it
        .withDetail("siloStatus", "DOWN")
        .withDetail("error", "SILO unavailable (HTTP 503)")
        .withDetail("message", e.message)
        .withDetail("retryAfter", e.retryAfter)
```
The health endpoint is configured to show-details=always and /actuator/** is publicly accessible, so including raw exception messages (e.message) here can leak internal connectivity details (URIs, exception classes, etc.). Consider removing message from the public health response or replacing it with a sanitized/static value and logging the full exception server-side instead.
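A hedged sketch of that approach in plain Kotlin; the exception class below is a local stand-in for the PR's `SiloNotReachableException`, and the stderr call stands in for a real server-side logger:

```kotlin
// Sketch: map exceptions to static, public-safe strings; the raw e.message
// (which may contain internal URIs) goes to the server-side log only.
class SiloNotReachableException(message: String) : RuntimeException(message)

fun publicErrorDetail(e: Exception): String {
    // full detail stays server-side (stand-in for a real logger call)
    System.err.println("SILO health check failed: ${e::class.simpleName}: ${e.message}")
    return when (e) {
        is SiloNotReachableException -> "SILO not reachable"
        else -> "Unexpected error checking SILO"
    }
}
```

The health response would then use `withDetail("error", publicErrorDetail(e))` and drop the `message` detail entirely.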
fhennig left a comment:
What do you think about having UP, DOWN and DEGRADED?
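Spring Actuator does support custom statuses (e.g. `Status("DEGRADED")`), though the aggregation order then needs to be configured via `management.endpoint.health.status.order`. A stand-alone sketch of the three-state mapping (the names here are illustrative, not from this PR):

```kotlin
// Stand-alone sketch of a three-state health model; a real implementation
// would return Spring's Status("DEGRADED") with a configured status order.
enum class ServiceHealth { UP, DEGRADED, DOWN }

fun overallStatus(lapisUp: Boolean, siloUp: Boolean): ServiceHealth = when {
    !lapisUp -> ServiceHealth.DOWN     // LAPIS itself is broken
    !siloUp -> ServiceHealth.DEGRADED  // LAPIS is fine, but SILO is unavailable
    else -> ServiceHealth.UP
}
```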
PR Checklist
- [ ] All necessary documentation has been adapted.
- [ ] All necessary changes are explained in the llms.txt.
- [ ] The implemented feature is covered by an appropriate test.