<?xml version='1.0' encoding='utf-8'?>
<rfc category="info" docName="draft-hays-http-528-outbound-dependency-failed-00" ipr="trust200902" xml:lang="en" sortRefs="true" submissionType="IETF" consensus="true" symRefs="true" tocInclude="true" version="3">
  <front>
        <title abbrev="HTTP 528 Outbound Dependency Failed">
            HTTP 528 Outbound Dependency Failed
        </title>

        <author fullname="Jack Hays" initials="J." surname="Hays">
            <organization>PayPal, Inc.</organization>
            <address>
                <email>jackhays@paypal.com</email>
            </address>
        </author>

        <date year="2025" month="October" day="19"/>

        <abstract>
            <t>
                This document defines a new HTTP 5xx status code, 528 (Outbound Dependency Failed),
                used to indicate that a server received, parsed, and processed a request correctly,
                but could not complete it because a required downstream dependency malfunctioned
                or was in a non-transient failure state.
            </t>
            <t>
                Unlike 500 (Internal Server Error), which is intended for actual faults
                within the responding service (including improper error handling), 528 explicitly
                signals that the responding service operated as intended and that the failure lies in a
                dependency. Unlike 503 (Service Unavailable), 528 does not imply a temporary
                condition that is expected to recover without intervention.
            </t>
        </abstract>
    </front>

    <middle>
        <section anchor="intro">
            <name>Introduction</name>
            <t>
                In distributed systems, services commonly rely on downstream resources (other services,
                data stores, message brokers) to fulfill client requests. When those dependencies fail,
                many implementations surface a generic 500 (Internal Server Error), conflating true
                internal faults with dependency failures and degrading observability and automated
                remediation.
            </t>
            <t>
                HTTP status codes such as 502 (Bad Gateway) and 504 (Gateway Timeout), defined in
                <xref target="RFC9110"/>, describe failures in intermediary behavior on the inbound path,
                while 503 (Service Unavailable) communicates temporary unavailability. None of them precisely
                conveys that the responding service behaved correctly yet could not complete the request due
                to a non-transient malfunction in an outbound dependency.
            </t>
            <t>
                This specification introduces 528 (Outbound Dependency Failed) to distinguish such conditions
                from internal failures (500) and from temporary unavailability (503).
            </t>
        </section>

        <section anchor="definition">
            <name>Definition</name>
            <t>
                The 528 (Outbound Dependency Failed) status code indicates that the server successfully
                received, understood, and executed its own application logic for the request, but could not
                complete the response because a required downstream or external dependency experienced a
                failure that is not presumed to be transient.
            </t>

            <figure>
                <name>Example 528 Response</name>
                <artwork>
HTTP/1.1 528 Outbound Dependency Failed
Content-Type: application/json

{
  "error": "OUTBOUND_DEPENDENCY_FAILED",
  "dependency": "user-profile-service",
  "dependency_status": 500,
  "message": "Downstream dependency returned HTTP 500",
  "correlation_id": "abc-123"
}
                </artwork>
            </figure>

            <t>The 528 code allows clients and operators to differentiate:</t>
            <ul>
                <li>500 — internal error within this service (including improper error handling)</li>
                <li>503 — temporary unavailability (retry may be appropriate)</li>
                <li>528 — this service operated correctly; a required downstream dependency failed in a condition not assumed to be temporary</li>
            </ul>
        </section>

        <section anchor="semantics">
            <name>Semantics and Use</name>
            <t>Servers SHOULD return 528 when all of the following are true:</t>
            <ul>
                <li>The server's own request handling logic executed as intended.</li>
                <li>Completion of the response required a call to a downstream or external dependency.</li>
                <li>The dependency failed (for example, it returned a 5xx status, was misconfigured, was unreachable
                    for a non-transient reason, or produced invalid data) in a way that prevents successful completion.</li>
            </ul>
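            <t>
                As a non-normative illustration, the three conditions above can be sketched as a small
                decision function. The DependencyOutcome type, its fields, and the choose_status function
                are hypothetical names introduced for this example. Returning 503 for a failure that the
                server judges to be transient, and 200 on success, are assumptions of the sketch; how a
                server decides that a failure is non-transient is left to the implementation.
            </t>
            <sourcecode type="python"><![CDATA[
from dataclasses import dataclass


@dataclass
class DependencyOutcome:
    """Hypothetical summary of one outbound dependency call."""
    required: bool   # the response cannot be completed without it
    failed: bool     # 5xx, unreachable, misconfigured, invalid data
    transient: bool  # the server's own judgment of the failure


def choose_status(own_logic_ok: bool, dep: DependencyOutcome) -> int:
    """Map the outcome of request handling to a status code."""
    if not own_logic_ok:
        return 500  # fault within this service
    if not (dep.required and dep.failed):
        return 200  # assumption: nothing prevented completion
    if dep.transient:
        return 503  # assumption: temporary, retry may help
    return 528      # service behaved correctly; dependency failed
]]></sourcecode>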
            <t>
                The 528 status code is not a signal of transient unavailability. Clients SHOULD NOT
                assume that a retry will succeed until an operator or automated system has restored the
                failing dependency. In contrast, 503 commonly represents a temporary condition for which
                retrying with backoff may be appropriate.
            </t>
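            <t>
                As a non-normative illustration of this retry guidance, the following Python sketch shows
                one way a client might treat 528 differently from 503. The function names, the attempt
                limit, and the backoff parameters are assumptions made for this example and are not
                defined by this document.
            </t>
            <sourcecode type="python"><![CDATA[
def should_retry(status: int, attempt: int,
                 max_attempts: int = 3) -> bool:
    """Return True if the client should retry automatically."""
    if status == 528:
        return False  # not presumed transient; await intervention
    if status == 503:
        return attempt < max_attempts  # temporary; retry with backoff
    return False  # other status codes are out of scope here


def backoff_seconds(attempt: int, base: float = 0.5,
                    cap: float = 30.0) -> float:
    """Exponential backoff delay before retry `attempt` (1-based)."""
    return min(cap, base * 2 ** (attempt - 1))
]]></sourcecode>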
            <t>
                Implementations MAY include machine-readable fields identifying the affected dependency and any
                observed downstream status code or other relevant diagnostics. Care should be taken to avoid
                leaking sensitive topology details (see <xref target="security"/>).
            </t>
        </section>

        <section anchor="interop">
            <name>Interoperability Notes</name>
            <t>
                502 (Bad Gateway) describes an intermediary receiving an invalid response from an upstream
                server on the inbound path. It does not semantically cover an origin service’s failure to
                complete due to its own outbound dependency malfunction.
            </t>
            <t>
                503 (Service Unavailable) is widely interpreted as a temporary condition (overload or maintenance)
                that may self-resolve. The 528 status code makes no such implication and is intended for dependency
                failures that typically require corrective action.
            </t>
        </section>

        <section anchor="ops">
            <name>Operational Guidance</name>
            <t>
                For automated remediation, a 528 response SHOULD trigger circuit-breaker transitions, alerting,
                or failover logic rather than blind immediate retries. Downstream recovery SHOULD be confirmed
                prior to resuming normal traffic patterns.
            </t>
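            <t>
                One possible shape for such remediation is a per-dependency circuit breaker. The following
                Python sketch is non-normative: the CircuitBreaker class, the probe interval, and the choice
                to open on the first 528 and to close on any status below 500 are assumptions made for
                illustration. The sketch also does not limit the number of concurrent probe requests, which
                a production breaker would typically do.
            </t>
            <sourcecode type="python"><![CDATA[
import time


class CircuitBreaker:
    """Hypothetical breaker that opens on the first 528 response."""

    def __init__(self, probe_interval: float = 30.0,
                 clock=time.monotonic):
        self.probe_interval = probe_interval
        self.clock = clock
        self.opened_at = None  # None means closed (normal traffic)

    def record(self, status: int) -> None:
        """Update breaker state from a response status code."""
        if status == 528:
            self.opened_at = self.clock()  # open, or re-open
        elif status < 500:
            self.opened_at = None  # recovery confirmed; close

    def allow_request(self) -> bool:
        """Closed: allow. Open: allow only after the interval."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.probe_interval
]]></sourcecode>
            <t>
                In this sketch, a caller checks allow_request before each call and passes the resulting
                status code to record, so a further 528 from a probe restarts the interval and a successful
                probe resumes normal traffic.
            </t>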
            <t>
                Services SHOULD continue to use 500 exclusively for actual internal faults, including unhandled
                exceptions and improper error handling such as blanket catch-log-and-return-500 patterns.
                Returning 500 for a dependency failure masks the real cause and undermines operations driven
                by service level objectives (SLOs).
            </t>
        </section>

        <section anchor="security">
            <name>Security Considerations</name>
            <t>
                Revealing dependency names, network locations, or detailed failure diagnostics in responses may
                expose internal topology. Servers SHOULD minimize sensitive details in public-facing responses and
                prefer correlation identifiers, with full diagnostics recorded in logs accessible to operators.
            </t>
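            <t>
                The following non-normative Python sketch shows one way to apply this guidance. It reuses
                the field names from the example response in <xref target="definition"/>; the build_528_body
                function, the "public" flag, and the logger name are hypothetical. Full diagnostics are
                always logged, while public-facing responses carry only the error token and a correlation
                identifier.
            </t>
            <sourcecode type="python"><![CDATA[
import json
import logging
import uuid

log = logging.getLogger("outbound")  # hypothetical logger name


def build_528_body(dependency: str, dependency_status: int,
                   public: bool) -> str:
    """Return a JSON body for a 528 response."""
    correlation_id = str(uuid.uuid4())
    # Operators can find the full diagnostics via the correlation id.
    log.error("528 correlation_id=%s dependency=%s status=%s",
              correlation_id, dependency, dependency_status)
    body = {
        "error": "OUTBOUND_DEPENDENCY_FAILED",
        "correlation_id": correlation_id,
    }
    if not public:
        # Only trusted callers see topology-revealing details.
        body["dependency"] = dependency
        body["dependency_status"] = dependency_status
    return json.dumps(body)
]]></sourcecode>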
        </section>

        <section anchor="contrib">
            <name>Conformance and Requirements Language</name>
            <t>
                The key words “MUST”, “MUST NOT”, “SHOULD”, “SHOULD NOT”, and “MAY” in this document are to be
                interpreted as described in BCP 14 <xref target="RFC8174"/> when, and only when, they appear in
                all capitals, as shown here.
            </t>
        </section>

        <section anchor="iana">
            <name>IANA Considerations</name>
            <t>
                This document requests the registration of the following entry in the “Hypertext Transfer Protocol
                (HTTP) Status Code Registry”:
            </t>
            <figure>
                <artwork>
Value: 528
Description: Outbound Dependency Failed
Reference: [this document]
                </artwork>
            </figure>
        </section>

        <section anchor="ack">
            <name>Acknowledgments</name>
            <t>
                The author thanks members of the IETF HTTP Working Group for discussions on status code semantics
                and the operational boundary between origin services and their dependencies.
            </t>
        </section>
    </middle>

    <back>
        <references>
            <name>Normative References</name>

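            <reference anchor="RFC9110" target="https://www.rfc-editor.org/rfc/rfc9110">
                <front>
                    <title>HTTP Semantics</title>
                    <author initials="R." surname="Fielding" fullname="Roy T. Fielding" role="editor"/>
                    <author initials="M." surname="Nottingham" fullname="Mark Nottingham" role="editor"/>
                    <author initials="J." surname="Reschke" fullname="Julian F. Reschke" role="editor"/>
                    <date year="2022" month="June"/>
                </front>
                <seriesInfo name="STD" value="97"/>
                <seriesInfo name="RFC" value="9110"/>
                <seriesInfo name="DOI" value="10.17487/RFC9110"/>
            </reference>

            <reference anchor="RFC8174" target="https://www.rfc-editor.org/rfc/rfc8174">
                <front>
                    <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
                    <author initials="B." surname="Leiba" fullname="Barry Leiba"/>
                    <date year="2017" month="May"/>
                </front>
                <seriesInfo name="BCP" value="14"/>
                <seriesInfo name="RFC" value="8174"/>
                <seriesInfo name="DOI" value="10.17487/RFC8174"/>
            </reference>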
        </references>
    </back>
</rfc>