GitOps for MCP Servers — ArgoCD, Flux, and Automated Deployment Pipelines

GitOps turns your Git repository into the authoritative source of truth for your MCP server's cluster state. Every configuration change goes through a pull request, ArgoCD or Flux reconciles the live cluster to match, and PostSync hooks verify the MCP protocol is intact after every deployment. This guide covers the complete setup from Application manifests to secrets management and continuous external monitoring.

TL;DR

GitOps treats your MCP server's Kubernetes manifests as the source of truth in Git. ArgoCD or Flux watches the repository and continuously syncs the live cluster state to match whatever is committed. A PostSync hook runs an MCP protocol probe immediately after every sync, verifying that the deployed server responds correctly to initialize requests. If the hook fails, ArgoCD marks the sync as Failed and the application appears degraded in the UI — no silently broken deployments. AliveMCP provides continuous external monitoring between syncs, catching failures that appear hours after a successful deploy: rotated secrets, memory leaks on new nodes, certificate expiry, and upstream API changes that no PostSync hook ever sees.

GitOps principles applied to MCP servers

GitOps is the practice of using a Git repository as the single source of truth for the desired state of a system. For a Kubernetes-hosted MCP server this means that every resource (Deployment, ConfigMap, HorizontalPodAutoscaler, Ingress, and even references to Secrets) lives in a version-controlled repository. No human should ever run kubectl apply directly against a production cluster. Instead, a developer opens a pull request that changes a manifest, a reviewer approves it, it merges to the main branch, and a GitOps controller running inside the cluster detects the difference and applies it automatically.

This model has concrete benefits for MCP server operations. Rollbacks are a single git revert followed by a merge, not a frantic sequence of manual kubectl commands at 2 AM. The Git history is a complete audit trail of every cluster change, with the author, timestamp, and review thread attached. Staging and production environments share the same manifest structure but point to different directories or branches, making promotion predictable and reviewable.

The traditional alternative — push-based CI/CD where a GitHub Actions workflow or Jenkins pipeline runs kubectl apply directly — works, but it introduces a set of problems that compound as teams and clusters grow. The table below compares the two approaches specifically in the context of MCP server deployments:

Dimension	Push-based CI/CD (GitHub Actions, Jenkins)	Pull-based GitOps (ArgoCD, Flux)
Access model	CI runner holds a long-lived kubeconfig credential with write access to the cluster. That credential must be stored in CI secrets and rotated manually.	The GitOps controller runs inside the cluster and pulls changes from Git. No external system ever holds cluster credentials.
Drift detection	None by default. If someone runs a manual `kubectl edit` in production, no pipeline knows or reverts it. The drift is discovered the next time a deploy overwrites it, or never.	Continuous. ArgoCD polls Git every three minutes by default and Flux on each source's configured interval; both can also be triggered immediately by a Git webhook. Manual changes to live resources are reverted automatically when ArgoCD's `selfHeal` is enabled, and on the next reconciliation in Flux.
Rollback mechanism	Re-run an older pipeline, or manually kubectl-apply an older manifest. Both require intervention and are error-prone under pressure.	`git revert <commit>` followed by a merge. The controller applies the reverted state on its next reconciliation: within seconds when a Git webhook triggers it, otherwise within the polling interval. Full audit trail preserved.
Audit trail	CI logs record what the pipeline did, but the connection between a Git commit and a cluster change depends on how carefully the pipeline is written.	Every cluster change is traceable to a Git commit, PR, and code review. ArgoCD and Flux both record sync history with Git SHA references.
Multi-cluster support	Each cluster needs its own credentials stored in CI. Adding a cluster means updating CI configuration in multiple places.	ArgoCD's ApplicationSet and Flux's multi-tenancy model let you declaratively manage dozens of clusters from a single control plane.

For MCP servers specifically, the drift detection property is especially valuable. MCP servers often carry fine-grained configuration — tool definitions, allowed origins, rate-limit parameters — in ConfigMaps or environment variables. A developer who "just quickly" edits a ConfigMap in production to test something creates a divergence that may not be noticed until the next planned deployment overwrites it (or doesn't, if the change is never committed). With GitOps and selfHeal: true, that drift is reverted automatically, and the developer gets a clear signal: if you want this change, commit it.

ArgoCD Application for an MCP server

An ArgoCD Application is a Kubernetes custom resource that tells ArgoCD where to find manifests and where to apply them. Here is a production-ready Application for an MCP server deployment:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mcp-server
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/mcp-server-infra
    targetRevision: main
    path: kubernetes/mcp-server
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

Let's break down the key fields. The source block points to the Git repository and path where the Kubernetes manifests live. targetRevision: main means ArgoCD tracks the main branch; for a more controlled promotion workflow you might use a tag or a separate production branch. The destination block tells ArgoCD which cluster and namespace to deploy into — https://kubernetes.default.svc is the in-cluster API server endpoint, used when ArgoCD itself runs in the same cluster as the MCP server.

The syncPolicy.automated section enables automatic sync — ArgoCD will apply changes from Git without requiring a human to click "Sync" in the UI. The prune: true flag means resources that exist in the cluster but have been removed from Git will be deleted, keeping the cluster clean. Without pruning, deleted manifests leave orphaned resources behind.

The selfHeal: true flag is the one that prevents configuration drift. If an engineer manually changes a Pod template spec, a ConfigMap value, or an environment variable directly in the production cluster, ArgoCD detects the divergence (its application controller watches the live resources it manages, so this is usually quick) and reverts the live resource to what Git says it should be. For MCP servers this matters because protocol behavior is sensitive to configuration: an environment variable like MCP_MAX_TOOLS=50 changed to 500 in production can cause subtle behavioral differences that are hard to trace without a full audit trail.
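ArgoCD diffs complete Kubernetes objects, but the core idea of selfHeal fits in a few lines. The Python sketch below is illustrative only: it assumes a ConfigMap's data has been reduced to a flat dictionary of strings, and the DEBUG key stands in for a hypothetical manual edit made directly in the cluster.

```python
from typing import Dict, Optional, Tuple

Config = Dict[str, str]
Drift = Dict[str, Tuple[Optional[str], Optional[str]]]


def detect_drift(desired: Config, live: Config) -> Drift:
    """Return {key: (value_in_git, value_in_cluster)} for every key that differs.

    None means the key is missing on that side.
    """
    drift: Drift = {}
    for key in sorted(desired.keys() | live.keys()):
        want, have = desired.get(key), live.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


def self_heal(desired: Config) -> Config:
    """With selfHeal enabled, the live object is simply reset to Git's copy."""
    return dict(desired)


if __name__ == "__main__":
    git_state = {"MCP_MAX_TOOLS": "50", "LOG_LEVEL": "info"}
    live_state = {"MCP_MAX_TOOLS": "500", "LOG_LEVEL": "info", "DEBUG": "1"}
    print(detect_drift(git_state, live_state))
    # {'DEBUG': (None, '1'), 'MCP_MAX_TOOLS': ('50', '500')}
    print(detect_drift(git_state, self_heal(git_state)))  # {}
```

Note that self-healing discards the manual edit entirely; the only way to keep it is to commit it.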

The retry block handles transient failures: a temporary API server hiccup, a webhook admission timeout, or a brief network partition between ArgoCD and the Git host. With duration: 5s and factor: 2, ArgoCD waits 5s, 10s, 20s, 40s, and 80s before its five retry attempts (about two and a half minutes of waiting in total) and then marks the sync as failed. maxDuration: 3m caps any single wait, so it only comes into play if you raise the limit. For MCP servers deployed in shared clusters where other workloads can briefly saturate the API server, this tolerance is important.
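The wait schedule can be checked with a few lines of Python. This assumes each wait is duration * factor**n capped at maxDuration, which is how the three fields are meant to combine; it models the arithmetic only and is not ArgoCD's code.

```python
from typing import List


def backoff_schedule(duration_s: float, factor: float,
                     max_duration_s: float, limit: int) -> List[float]:
    """Wait in seconds before each of `limit` retry attempts."""
    return [min(duration_s * factor ** n, max_duration_s) for n in range(limit)]


if __name__ == "__main__":
    waits = backoff_schedule(duration_s=5, factor=2, max_duration_s=180, limit=5)
    print(waits)       # [5, 10, 20, 40, 80]
    print(sum(waits))  # 155
```

Raising limit to 7 would make the last two waits 160s and 180s, the first point at which the 3m cap changes anything.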

The repository at github.com/myorg/mcp-server-infra would typically contain a kubernetes/mcp-server/ directory with a kustomization.yaml and a set of manifest files: deployment.yaml, service.yaml, configmap.yaml, hpa.yaml, and ingress.yaml. ArgoCD natively understands Kustomize, so if the directory contains a kustomization.yaml, ArgoCD runs kustomize build automatically before applying.

ArgoCD PostSync hook for MCP protocol verification

A PostSync hook is a Kubernetes Job annotated with argocd.argoproj.io/hook: PostSync. ArgoCD runs it after all resources in the sync have been applied and have reached a healthy state. If the hook Job fails, ArgoCD marks the sync operation as Failed, even though the resources themselves were applied successfully. The failure is visible in the ArgoCD UI, and any notification you have configured for failed syncs fires immediately.

For MCP servers the most useful PostSync hook is an MCP protocol probe — a lightweight Job that sends an initialize JSON-RPC request to the newly deployed server and checks that the response contains a valid protocolVersion:

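apiVersion: batch/v1
kind: Job
metadata:
  name: mcp-protocol-verify
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  # Fail fast: the default backoffLimit of 6 would retry the probe several
  # times and delay the Failed signal by minutes.
  backoffLimit: 1
  template:
    spec:
      containers:
      - name: mcp-probe
        image: curlimages/curl:8.6.0
        command: [sh, -c]
        args:
          - |
            # Give the server time to finish initializing (see below).
            sleep 10
            # initialize requires protocolVersion, capabilities, and clientInfo.
            # The Accept header covers servers that reply with JSON or an SSE stream.
            curl -sf -X POST http://mcp-server.production.svc.cluster.local:3000/ \
              -H 'Content-Type: application/json' \
              -H 'Accept: application/json, text/event-stream' \
              -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"argocd-postsync","version":"1.0"}}}' \
              | grep -q '"protocolVersion"' \
              && echo "MCP protocol sync verified" \
              || exit 1
      restartPolicy: Never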

The argocd.argoproj.io/hook-delete-policy: BeforeHookCreation annotation ensures that the Job from the previous sync run is cleaned up before the new one is created. Without this, successive syncs would accumulate Job resources in the namespace. You could alternatively use HookSucceeded to clean up after a passing probe, but BeforeHookCreation makes it easier to inspect the logs from the most recent run if a sync fails.

The ten-second sleep at the start of the probe command is intentional. Even after Kubernetes reports a Pod as Running and its readiness probe has passed, there is a brief window where the MCP server process is initializing its tool registry, loading configuration, and establishing connections to upstream APIs. Sending the probe too early can produce a false failure. Ten seconds is conservative for most MCP servers; if your server has a longer startup path you might increase this to 30 seconds.
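A fixed sleep is either too short for a slow start or wasted time on a fast one. One alternative, sketched here in Python under the assumption that `check` is any callable returning True once the server answers correctly (for example, an initialize probe), is to poll until a deadline. The timeout and interval defaults are illustrative, and the clock and sleep parameters exist only so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as check() succeeds, or False once timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In the Job, the same idea becomes a shell loop around the curl command: the probe passes as soon as the server is ready and still fails within a bounded time if it never is.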

The probe itself is minimal by design. It sends a standard initialize request and checks for the presence of "protocolVersion" in the response, a field every valid initialize result includes. If the server returns HTTP 500 (curl -f suppresses the body on HTTP errors), a JSON-RPC error, or any body without that field, grep -q finds no match and the probe exits with code 1, failing the Job. Because grep only searches for the string, it does not check the content type or whether the body is valid JSON. For a stricter check, validate the exact protocol version value, or send a follow-up tools/list request and confirm that specific tools are present.
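For reference, here is the same request plus the stricter check in Python, using only the standard library. It is a sketch under stated assumptions: the server accepts JSON-RPC over HTTP POST and answers either with application/json or with a text/event-stream body whose first data: line holds the response, closing the stream afterwards. The MCP specification lists capabilities as a required initialize parameter, so the request includes an empty object. Pinning EXPECTED_VERSION makes the check fail if the server negotiates a different protocol version, which is the point of a strict check but may be too strict for servers that have moved to a newer version.

```python
import json
import urllib.request
from typing import Optional

EXPECTED_VERSION = "2024-11-05"  # version the probe requests; pinned for the strict check


def build_initialize_request(protocol_version: str = EXPECTED_VERSION) -> bytes:
    """JSON-RPC initialize request body."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,
            "capabilities": {},
            "clientInfo": {"name": "argocd-postsync", "version": "1.0"},
        },
    }).encode("utf-8")


def extract_payload(raw: str, content_type: str) -> Optional[dict]:
    """Return the JSON-RPC object from a JSON body or the first SSE data line."""
    if content_type.startswith("text/event-stream"):
        data = [line[len("data:"):].strip() for line in raw.splitlines()
                if line.startswith("data:")]
        if not data:
            return None
        raw = data[0]
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return payload if isinstance(payload, dict) else None


def is_valid_initialize_response(payload: Optional[dict],
                                 expected_version: Optional[str] = None) -> bool:
    """True for a JSON-RPC result carrying protocolVersion (optionally a specific one)."""
    if not payload or payload.get("jsonrpc") != "2.0" or "error" in payload:
        return False
    result = payload.get("result")
    if not isinstance(result, dict) or "protocolVersion" not in result:
        return False
    return expected_version is None or result["protocolVersion"] == expected_version


def probe(url: str, timeout_s: float = 5.0) -> bool:
    """POST initialize to url; network, HTTP, and timeout errors count as failure."""
    req = urllib.request.Request(
        url,
        data=build_initialize_request(),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json, text/event-stream"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            raw = resp.read().decode("utf-8")
            content_type = resp.headers.get("Content-Type", "")
    except OSError:  # URLError, HTTPError, and socket timeouts are OSError subclasses
        return False
    return is_valid_initialize_response(extract_payload(raw, content_type), EXPECTED_VERSION)
```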

When this hook fails, the ArgoCD dashboard shows the MCP server application's last sync operation as Failed. Anyone watching the application, including ArgoCD's built-in notification controller or a Slack integration, sees the failure immediately. ArgoCD does not roll anything back: the Deployment's rolling update has already completed, so the new Pods keep serving traffic. What the failure signals is that the deployment succeeded at the infrastructure level but the MCP protocol is not responding correctly. That distinction matters: it directs the on-call engineer to application logs and MCP initialization errors rather than infrastructure provisioning.

Flux CD alternative

Flux CD takes a different architectural approach from ArgoCD. Where ArgoCD ships as an integrated system with a web UI, Flux is a set of small, composable controllers (the Source Controller, the Kustomize Controller, the Helm Controller, the Notification Controller, and the image reflector and image automation controllers), each managing a narrow concern. Teams that prefer a CLI-first workflow, want tighter integration with Kubernetes RBAC for multi-tenant clusters, or are running in environments where a rich web UI is a liability tend to gravitate toward Flux.

Setting up GitOps for an MCP server in Flux requires two resources: a GitRepository that defines the source, and a Kustomization that defines what to apply from that source and where:

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: mcp-server-infra
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/myorg/mcp-server-infra
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: mcp-server
  namespace: flux-system
spec:
  interval: 5m
  path: ./kubernetes/mcp-server
  prune: true
  sourceRef:
    kind: GitRepository
    name: mcp-server-infra
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: mcp-server
      namespace: production

The GitRepository polls the Git remote every minute (interval: 1m) and caches the latest commit. The Kustomization re-runs every five minutes, but it also reacts to changes in the GitRepository — so in practice a merge to main triggers a reconciliation within about a minute rather than waiting for the five-minute polling cycle.

The healthChecks field is one of Flux's most useful features for MCP server deployments. It instructs the Kustomize Controller to wait for the named Deployment rollout to complete before marking the Kustomization as Ready. Flux evaluates health with the kstatus library, which applies rules similar to kubectl rollout status: the Deployment's observed generation must match its spec generation, and the desired number of updated Pods must be running and passing their readiness probe. Only then does the Kustomization transition to Ready: True.
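The rollout rules can be written out as a small Python function. It is modeled on the checks kubectl rollout status performs and takes a Deployment as a plain dict (for example, parsed from kubectl get deployment mcp-server -o json); it is not Flux's kstatus code, which covers more resource kinds and edge cases.

```python
from typing import Tuple


def rollout_complete(deploy: dict) -> Tuple[bool, str]:
    """Return (done, reason) for a Deployment object given as a dict."""
    meta, spec = deploy.get("metadata", {}), deploy.get("spec", {})
    status = deploy.get("status", {})
    # 1. The controller must have seen the latest spec.
    if status.get("observedGeneration", 0) < meta.get("generation", 0):
        return False, "waiting for the controller to observe the new spec"
    # 2. A rollout past progressDeadlineSeconds is a hard failure.
    for cond in status.get("conditions", []):
        if cond.get("type") == "Progressing" and cond.get("reason") == "ProgressDeadlineExceeded":
            return False, "rollout exceeded its progress deadline"
    desired = spec.get("replicas", 1)  # Kubernetes defaults replicas to 1
    updated = status.get("updatedReplicas", 0)
    # 3. Every desired replica must run the new Pod template...
    if updated < desired:
        return False, f"{updated} of {desired} replicas updated"
    # 4. ...old replicas must be gone...
    if status.get("replicas", 0) > updated:
        return False, "old replicas are pending termination"
    # 5. ...and the new ones must be available (Ready for minReadySeconds).
    if status.get("availableReplicas", 0) < updated:
        return False, "updated replicas are not yet available"
    return True, "rollout complete"
```

A Pod stuck in CrashLoopBackOff never becomes available, so step 5 keeps the rollout, and therefore the Kustomization, from reporting Ready.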

This means that a failed rollout, where the new MCP server Pods crash on startup, fail their readiness probe, or hit an image pull error, leaves the Kustomization reporting Ready: False with a descriptive error message. Flux's notification controller can forward this status to Slack, PagerDuty, or any webhook. You get a safety net comparable to ArgoCD's PostSync hook through Kubernetes-native health checking rather than a custom probe Job, with one difference: a Deployment health check proves that Pods passed their readiness probes, not that the server answers an MCP initialize request.

Flux also supports PreSync-equivalent behavior through a feature called dependency ordering: if you have a separate Kustomization for database migrations, you can declare dependsOn: [{name: mcp-db-migrations}] in the application Kustomization, and Flux will not apply the application manifests until the migration Kustomization is Ready. This is the correct pattern for MCP servers that require schema changes before the new application version can start.
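The ordering that dependsOn produces is a topological sort. As a minimal Python sketch, assuming each Kustomization is reduced to its name and the set of names it depends on, the standard library's graphlib computes a valid apply order and rejects cycles. Flux additionally waits for each dependency to be Ready before applying its dependents, which this sketch does not model.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def apply_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every Kustomization follows its dependencies."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        raise ValueError("dependsOn declarations form a cycle") from exc


if __name__ == "__main__":
    graph = {
        "mcp-server": {"mcp-db-migrations"},
        "mcp-db-migrations": set(),
    }
    print(apply_order(graph))  # ['mcp-db-migrations', 'mcp-server']
```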

Image automation with Flux

One of the most powerful features of Flux for continuous delivery is its image automation subsystem. Without image automation, the typical GitOps workflow requires a CI pipeline to open a pull request updating the image tag in the manifest repository whenever a new Docker image is built. This works, but it adds friction: the CI pipeline needs repository write access, the PR needs to be merged, and there is a delay between image publication and deployment.

Flux's image automation creates a direct feedback loop: when a new image is pushed to a container registry, Flux detects it, commits an update to the infra repository with the new image tag, and then immediately reconciles the cluster to use the updated tag. The CI pipeline only needs to push the image to the registry — it never touches the Git repository or the cluster directly.

Setting up image automation requires three resources. An ImageRepository watches the container registry for new tags (not shown here for brevity, as it mainly requires the registry URL and optional credentials). An ImagePolicy defines the selection criteria for which tag to use:

apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: mcp-server
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: mcp-server
  policy:
    semver:
      range: '>=1.0.0'

This policy selects the highest stable semver tag at or above 1.0.0. You could alternatively use range: '>=1.2.0 <2.0.0' to constrain to patch and minor updates within a major version, which is appropriate for production MCP servers where a major version bump might include breaking changes to the tool API.
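To see how a range narrows the candidate tags, here is a simplified Python model of semver-range selection. It is not Flux's implementation: it only accepts plain MAJOR.MINOR.PATCH tags with an optional leading v, skips prerelease and non-semver tags such as 1.4.0-rc.1 or latest, and treats the range as space-separated comparators that must all hold.

```python
import operator
import re
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]

_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")  # prerelease tags do not match
_COMPARATOR = re.compile(r"^(>=|<=|>|<|=)?v?(\d+\.\d+\.\d+)$")
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
        "<": operator.lt, "=": operator.eq}


def parse(tag: str) -> Optional[Version]:
    m = _TAG.match(tag)
    return (int(m.group(1)), int(m.group(2)), int(m.group(3))) if m else None


def satisfies(version: Version, range_expr: str) -> bool:
    """Every space-separated comparator in range_expr must hold."""
    for token in range_expr.split():
        m = _COMPARATOR.match(token)
        if not m:
            raise ValueError(f"unsupported comparator: {token!r}")
        bound = parse(m.group(2))
        if not _OPS[m.group(1) or "="](version, bound):
            return False
    return True


def select_latest(tags: Iterable[str], range_expr: str) -> Optional[str]:
    """Return the highest tag satisfying range_expr, or None."""
    parsed = ((parse(t), t) for t in tags)
    candidates = [(v, t) for v, t in parsed if v is not None and satisfies(v, range_expr)]
    return max(candidates)[1] if candidates else None
```

With the tags 1.1.9, 1.2.0, v1.3.7, 1.4.0-rc.1, 2.0.0, and latest, the range '>=1.2.0 <2.0.0' selects v1.3.7, while '>=1.0.0' picks 2.0.0, exactly the major-version jump the narrower range guards against.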

The ImageUpdateAutomation resource closes the loop between the policy and the infra repository. It points at the GitRepository and a path within it, and image fields in the manifests opt in to updates through a marker comment naming the policy, such as # {"$imagepolicy": "flux-system:mcp-server"}. When the ImagePolicy selects a new tag, Flux rewrites the marked image references and pushes a commit to the repository. The commit message is configurable; a typical format is chore(image): update mcp-server to v1.3.7, which gives a clear signal in the Git log of what changed and why.

This architecture neatly solves the credentials problem that plagues push-based CD pipelines for MCP servers. Your CI environment — GitHub Actions, CircleCI, whatever you use — needs only permission to push images to your container registry. It never holds a kubeconfig, a service account token, or any other Kubernetes credential. The cluster credentials stay entirely within the cluster, managed by Flux. This is a meaningful security improvement, especially for teams running MCP servers that have access to sensitive external APIs or internal data systems.

One important operational consideration: commits from Flux's image automation are authored by the bot identity configured in the ImageUpdateAutomation rather than by a human. Configure branch protection so that this identity can push to the branch it targets, which is either main directly or a separate branch (set with spec.git.push.branch) that feeds into main via a merge. Flux does not open pull requests itself, so teams that want to preserve the review step pair the push branch with a CI job that opens a pull request for each image update; others trust their CI test suite enough to let Flux commit directly to main. For MCP servers in production, the pull request model gives you one extra check before the deploy.

Secrets in a GitOps workflow

The single most common mistake teams make when adopting GitOps is committing secrets to the infra repository. Since the whole point of GitOps is that everything is in Git, there is a natural temptation to put Kubernetes Secret manifests in the repo alongside the Deployment and ConfigMap. Kubernetes Secret resources are base64-encoded, not encrypted — anyone with read access to the repository can decode them in seconds. For MCP servers that hold API keys for language models, database credentials, or OAuth client secrets, this is a critical vulnerability.
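A short demonstration makes the point: decoding a committed Secret's data requires no key at all. The value below is a made-up placeholder, not a real credential.

```python
import base64

# What the `data:` field of a committed Secret manifest might contain (placeholder value).
SECRET_MANIFEST_DATA = {
    "API_KEY": base64.b64encode(b"sk-example-not-a-real-key").decode("ascii"),
}


def decode_secret_data(data: dict) -> dict:
    """Anyone with read access to the repository can do this."""
    return {k: base64.b64decode(v).decode("utf-8") for k, v in data.items()}


print(SECRET_MANIFEST_DATA["API_KEY"])                        # looks opaque...
print(decode_secret_data(SECRET_MANIFEST_DATA)["API_KEY"])    # sk-example-not-a-real-key
```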

The correct approach is to commit only a reference to the secret, not the secret itself. There are two widely-used solutions: Sealed Secrets (which encrypts the secret with a cluster-specific key, making the encrypted form safe to commit) and External Secrets Operator (which stores secrets in an external vault — AWS Secrets Manager, HashiCorp Vault, Google Secret Manager — and creates Kubernetes Secrets by pulling values at runtime).

External Secrets Operator is generally preferred by teams already using a cloud secrets service, as many production MCP server deployments do. The ExternalSecret manifest is safe to commit to Git because it contains only metadata: the key of the external secret and the mapping between external properties and Kubernetes Secret keys:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: mcp-server-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: mcp-server-env
  data:
    - secretKey: API_KEY
      remoteRef:
        key: production/mcp-server
        property: api_key

This manifest creates a Kubernetes Secret named mcp-server-env in the production namespace. The actual value of API_KEY is fetched from AWS Secrets Manager at the path production/mcp-server, property api_key. The refreshInterval: 1h means the External Secrets Operator re-reads the value from AWS every hour, so a key your security team rotates in AWS Secrets Manager reaches the Kubernetes Secret within an hour. Whether the MCP server sees it without a redeployment depends on how it consumes the Secret: values injected as environment variables never change in a running container, whereas a Secret mounted as a volume (without subPath) is updated in place, so a server that rereads the file, or reloads on a signal such as SIGHUP, can pick up the new value.
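If the Secret is mounted as a volume, the server can notice a rotation by rereading the file. The Python class below is a minimal sketch under that assumption; the mount path /var/run/secrets/mcp/API_KEY and the reconnect_upstream callback are hypothetical, and rereading on every call keeps the example short where a real server might poll on a timer.

```python
from pathlib import Path
from typing import Callable, Optional


class SecretFile:
    """Reads a mounted secret file and reports when its content changes."""

    def __init__(self, path: str, on_change: Optional[Callable[[str], None]] = None):
        self.path = Path(path)
        self.on_change = on_change
        self._value: Optional[str] = None

    def get(self) -> str:
        # Reread each time: the kubelet replaces the mounted file when the Secret changes.
        value = self.path.read_text(encoding="utf-8").strip()
        if self._value is not None and value != self._value and self.on_change:
            self.on_change(value)
        self._value = value
        return value
```

A server would construct it once, for example SecretFile("/var/run/secrets/mcp/API_KEY", on_change=reconnect_upstream), and call get() wherever it previously read the key at startup.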

There is an important operational implication here that ties directly to the monitoring discussion below: the refreshInterval creates a window where a secret rotation in AWS can silently break your MCP server in ways that no GitOps sync event will detect. The ExternalSecret updates and the Kubernetes Secret updates, but the MCP server Pods do not restart. If the application reads the secret only at startup, each running Pod keeps the old value until it is restarted, while any Pod that starts later (after a crash, a scale-up, or a node drain) gets the new one. If the rotation went wrong (a typo in the new value, a misconfigured IAM policy, or a key revoked upstream before the AWS secret was updated), some Pods fail to authenticate to the upstream API while others keep working, so failures appear intermittently depending on which Pod serves a request. This is exactly the class of failure that continuous external probing is designed to catch.

AliveMCP as an external health signal in GitOps pipelines

PostSync hooks and Flux health checks are excellent tools for validating a deployment at the moment it happens. They answer the question: "Was the deployment successful?" But they do not answer the question that matters at 3 AM: "Is the MCP server working right now?"

The gap between these two questions is larger than it might appear. Consider the failure modes that emerge after a successful sync:

Secret rotation: As described above, AWS Secrets Manager rotates an API key. The ExternalSecret refreshes within an hour. The MCP server Pods are not restarted. The next request that requires the rotated key fails. The PostSync hook from six hours ago passed — it was testing the state of the cluster at sync time, not now.

Node replacement: A cloud provider replaces a node in your cluster due to underlying hardware maintenance. The new node has a slightly different kernel version. Your MCP server has a native dependency (an NPM package with a C++ extension, for example) that exhibits a memory leak on the new kernel. The Deployment is healthy by all Kubernetes metrics: Pods are Running, readiness probes pass, and CPU and memory look nominally fine at first. Over twelve hours, though, the memory leak grows until the OOM killer terminates the process, and each time that happens the affected Pod is unavailable for seconds to minutes while it restarts.

Certificate expiry: A TLS certificate for an upstream API your MCP server depends on expires. The MCP server itself is running and responding to health checks, but all tool calls that use that upstream API fail with a certificate error. The GitOps sync history is clean — nothing has changed in Git.

Upstream API changes: An external service your MCP server wraps pushes a breaking API change without notice. Tool calls begin returning errors. The MCP server process is healthy, the sync is clean, but users cannot use the tool.

None of these failures produce a GitOps sync event. None of them are caught by a PostSync hook. They are caught by an external probe that continuously tests the full MCP protocol stack from outside the cluster.

AliveMCP probes your MCP server's endpoint every minute, sending real MCP protocol requests — including initialize and optionally tools/list — and verifying the responses. It measures response latency, checks for protocol-level errors, and detects transport failures like dropped connections and TLS handshake failures. When a probe fails, AliveMCP pages you immediately via Slack, PagerDuty, email, or webhook.

The practical workflow that combines GitOps with AliveMCP monitoring looks like this: ArgoCD or Flux provides visibility into sync state — is the cluster running what Git says it should run? AliveMCP provides visibility into protocol health — is the MCP server actually responding correctly to clients right now? These are two orthogonal signals that answer different questions.

When AliveMCP fires an alert, the first diagnostic step is to check the ArgoCD Application status. If the application is OutOfSync, someone made an unauthorized change or a recent automatic sync failed — start there. If the application is Synced and Healthy, the problem is not a deployment or configuration drift issue; it is a runtime failure in one of the categories above. That narrows the investigation significantly: look at Pod logs, check ExternalSecret sync events, review upstream API status pages.
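The triage rule above can be encoded as a small lookup for a runbook or chat bot. This Python sketch assumes the sync and health strings ArgoCD reports (Synced, OutOfSync, Unknown; Healthy, Progressing, Degraded, and so on); the messages paraphrase the guidance in this section, and the Unknown case reflects the Git-outage behavior described in the FAQ.

```python
def first_triage_step(sync_status: str, health_status: str) -> str:
    """Map ArgoCD's sync and health status to the first place to look."""
    if sync_status == "OutOfSync":
        return "drift or failed sync: inspect the app diff and recent sync history"
    if sync_status == "Unknown":
        return "GitOps control plane degraded: check repository connectivity"
    if health_status != "Healthy":
        return "deployment problem: inspect the rollout and Pod events"
    return "runtime failure: check Pod logs, ExternalSecret events, upstream status pages"
```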

You can surface both signals in a single operations dashboard. ArgoCD exposes application health and sync status via its API, the argocd app get command, and the argocd_app_info Prometheus metric (whose health_status and sync_status labels carry the same values), making it straightforward to include in a Grafana dashboard alongside AliveMCP's uptime metrics. Many teams also embed AliveMCP's status badge in their internal runbooks and incident channels alongside the ArgoCD application link, so the on-call engineer sees both signals simultaneously when an alert fires.

A practical addition to your ArgoCD PostSync hook is a step that reports the sync event to AliveMCP via its API. This creates a timeline correlation: when you look at AliveMCP's response-time graphs, you can see a vertical marker at each GitOps sync event. If a latency spike or error rate increase lines up with a sync event, the sync is a likely cause. If the anomaly appears between syncs, you are dealing with a runtime failure unrelated to deployment.
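Whatever tool draws the markers, the underlying correlation is simple. As a Python sketch, assuming you can export sync timestamps from your GitOps history and anomaly timestamps from your monitoring, each anomaly is labeled by whether a sync preceded it within a window; the 15-minute default is an illustrative choice, not a recommendation.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


def preceding_sync(anomaly: datetime, syncs: Sequence[datetime]) -> Optional[datetime]:
    """Latest sync at or before the anomaly; `syncs` must be sorted ascending."""
    i = bisect_right(syncs, anomaly)
    return syncs[i - 1] if i else None


def classify(anomalies: Sequence[datetime], syncs: Sequence[datetime],
             window: timedelta = timedelta(minutes=15)) -> List[str]:
    ordered = sorted(syncs)
    labels = []
    for a in anomalies:
        s = preceding_sync(a, ordered)
        near = s is not None and a - s <= window
        labels.append("likely sync-related" if near else "between syncs")
    return labels
```

Only syncs before an anomaly are considered, since a deploy cannot cause a spike that started earlier.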

Frequently asked questions

Should I use ArgoCD or Flux for MCP server deployments?

Both tools handle MCP server GitOps deployments well, and the choice depends more on your team's existing tooling and preferences than on any technical deficiency in either. ArgoCD has a richer web UI with clear visual representation of application health, sync status, and resource trees, which suits teams that want visibility without writing custom dashboards. ArgoCD is also the more common choice in organizations that operate a small number of clusters where a centralized GitOps controller makes sense. Flux is GitOps-native in a stricter sense: it was designed from the ground up for pull-based reconciliation and integrates more tightly with Kubernetes RBAC for multi-tenant clusters. Flux's image automation is part of its core toolkit, whereas ArgoCD relies on the separately maintained Argo CD Image Updater. If your team already uses Flux for other workloads, adding MCP server management is straightforward. If you are starting fresh and want a good UI out of the box, ArgoCD is the easier on-ramp. Both support the PostSync hook pattern (ArgoCD natively; Flux via dependency ordering and health checks), secrets management via External Secrets Operator, and all the other patterns discussed in this guide.

How do I handle database migrations in a GitOps deployment?

Database migrations in a GitOps workflow require careful ordering: the migration must succeed before the new MCP server version that depends on the migrated schema is deployed. In ArgoCD, use a PreSync hook — a Kubernetes Job annotated with argocd.argoproj.io/hook: PreSync. The Job runs your migration tool (Flyway, Liquibase, Alembic, or a custom script) and must complete successfully before ArgoCD applies the Deployment update. If the migration job fails, ArgoCD does not proceed with the sync, and the running MCP server Pods (using the old schema-compatible version) continue to serve traffic. In Flux, use dependency ordering: create a separate Kustomization for the migration Job with its own health check, and declare it in the dependsOn field of the application Kustomization. The migration Kustomization must reach Ready: True before Flux applies the application manifests. In both cases, make your migrations backward-compatible where possible — the old application version should continue to function on the new schema for at least one deployment cycle, giving you a safe rollback window.

How do I promote from staging to production in a GitOps workflow?

The standard pattern is to maintain separate directories in the infra repository: kubernetes/staging/ and kubernetes/production/. Each directory has its own ArgoCD Application or Flux Kustomization pointing to it. Both environments share the same base manifests via Kustomize overlays, with the overlay for each environment providing environment-specific values (replicas, resource limits, ingress hostnames, and image tags). Promotion works by updating the image tag in the production overlay to match the tag that has been validated in staging. This update is made via a pull request — the PR is the promotion gate. A human reviewer (or an automated gate that checks staging's AliveMCP uptime and test suite results) approves the PR, it merges, and the GitOps controller deploys the new version to production. Never use latest as the image tag in production; it makes rollbacks ambiguous and defeats the reproducibility goal of GitOps. Use explicit semver tags or SHA-based tags so that the exact image deployed in production is always traceable to a specific build and commit.
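The no-latest rule is easy to enforce as a CI check on the production overlay. This Python sketch assumes three acceptable forms (a semver tag, a 7 to 40 character hex Git SHA tag, or an @sha256 digest); adapt the patterns to your own tagging scheme.

```python
import re

_SEMVER = re.compile(r"^v?\d+\.\d+\.\d+$")
_GIT_SHA = re.compile(r"^[0-9a-f]{7,40}$")
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def image_tag(image: str) -> str:
    """Tag of an image reference, or '' if it has none (which means latest)."""
    last = image.rsplit("/", 1)[-1]  # ignore ports in the registry host
    return last.split(":", 1)[1] if ":" in last else ""


def is_pinned(image: str) -> bool:
    """True if the reference identifies one specific, traceable build."""
    if _DIGEST.search(image):
        return True
    tag = image_tag(image)
    return bool(_SEMVER.match(tag) or _GIT_SHA.match(tag))
```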

What happens if ArgoCD cannot reach the Git repo for sync?

ArgoCD continues operating with the last successfully synchronized state. The application's sync status transitions to Unknown because ArgoCD cannot determine whether Git has changed, but the live Kubernetes resources (your MCP server Pods, Service, ConfigMap) are not modified. Running Pods are not restarted. No rollback happens. ArgoCD will alert via its notification controller (if configured) that the Git repository is unreachable, but the MCP server continues serving traffic using whatever configuration was last applied. This is the correct behavior: a temporary Git host outage should not take down your production MCP server. The implication for operations is that AliveMCP's external monitoring continues to probe and report during the window when ArgoCD's sync status is unknown, giving you confidence that the MCP server is still healthy even though the GitOps control plane is degraded. Once Git connectivity is restored, ArgoCD's next successful poll picks up and syncs any changes that accumulated during the outage.

How do I pause automatic GitOps syncing during a production incident?

To disable ArgoCD's automated sync for an MCP server application without deleting the application or its resources, run argocd app set mcp-server --sync-policy none. This removes the automated sync policy, leaving the live cluster state unchanged. You can then apply emergency fixes directly with kubectl apply or kubectl edit without ArgoCD reverting them. Once the incident is resolved, commit the fix to the infra repository, verify it looks correct in a PR review, and re-enable automated sync with argocd app set mcp-server --sync-policy automated --auto-prune --self-heal. The first sync after re-enabling will revert any manual change that was not committed to Git. For Flux, you can suspend a Kustomization with flux suspend kustomization mcp-server and resume it with flux resume kustomization mcp-server. While sync is paused, AliveMCP continues its external probes, so you have real-time visibility into whether your manual fixes are working. Document the pause and resume in your incident timeline: the Git audit trail has a gap during the manual intervention, and the incident notes fill it in.