
Remove data from OpenSearch

This runbook aims to guide you through purging specific logs/data from OpenSearch.

Throughout this guide, we assume that we want to delete all logs from the my-namespace-production namespace that were logged on 1 February 2025 (01/02/2025).

Stop the breach first

If you’re removing data from OpenSearch, this is presumably because sensitive information has been logged by mistake. Please ensure that the underlying problem has been fixed, and no more sensitive data is going to be logged by the same application, before spending time removing what’s already there.

Things to know

An important thing to understand is the difference between an index and a document. An index contains documents. In our case, documents are log entries.

The current logging strategy is to have one index per day, containing all the daily logs.

All application logs are sent to that daily index, for example: live_kubernetes_cluster-2025.02.01.

If you want your query to cover all log data stored in the OpenSearch cluster, you can use an expression like live_kubernetes_cluster-2*.
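As a small illustration of the naming scheme, the sketch below derives the daily index name from an ISO date. The function name index_for_date is hypothetical; only the live_kubernetes_cluster- prefix and the dotted date format come from this page.

```shell
# Hypothetical helper: turn an ISO date (YYYY-MM-DD) into the daily index name.
index_for_date() {
  # Index names use dots instead of dashes in the date part.
  printf 'live_kubernetes_cluster-%s\n' "$(printf '%s' "$1" | sed 's/-/./g')"
}

index_for_date 2025-02-01
```

For the example date used in this guide, this prints live_kubernetes_cluster-2025.02.01.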

Although it is possible to delete a whole index with a curator job, this guide will only cover the deletion of specific documents.

Get yourself access

Cloud Platform team members have the IAM permissions required to run the following commands from work devices. You’ll just need to authenticate as usual with aws sso login.

We’ll use curl and AWS SigV4 to authenticate our requests to OpenSearch. You can take a look at this guide on how to build curl requests. Alternatively, you can install awscurl, a curl-like tool that handles the signing for you. We’ll use it for the snippets below.

Ensure the indices are in hot storage before deleting

Documents can only be deleted from indices that are in hot storage.

By default, only the most recent day of logs (i.e. today’s index) remains in the hot tier. Older indices are automatically moved to warm and then cold storage as part of the lifecycle policy.

Once an index has moved to warm or cold storage, it becomes read-only and _delete_by_query operations will fail.

Before running any delete command, make sure the relevant indices have been restored to the hot tier.

Follow the steps here to restore indices: Restore OpenSearch indices to hot storage

⚠️ If you plan to restore multiple indices, ensure there is enough available capacity in the hot tier before proceeding.
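Before deleting, it can help to confirm which tier an index is currently in. The sketch below assumes the domain uses Amazon OpenSearch Service UltraWarm, where _hot and _warm can be used as index selectors in _cat/indices. If the domain is configured differently, the restore guide linked above remains the reference. The OPENSEARCH_URL variable and the function name are illustrative.

```shell
# Endpoint used throughout this runbook.
OPENSEARCH_URL="https://search-cp-live-app-logs-jywwr7het3xzoh5t7ajar4ho3m.eu-west-2.es.amazonaws.com"

# Hypothetical helper: list the indices in one storage tier ("hot" or "warm").
# Assumes the UltraWarm _hot/_warm index selectors are available on the domain.
list_indices_in_tier() {
  awscurl --service es \
    --region eu-west-2 \
    -XGET "${OPENSEARCH_URL}/_cat/indices/_${1}?v"
}

# Example (requires valid AWS credentials):
# list_indices_in_tier hot | grep 'live_kubernetes_cluster-2025.02.01'
```

If the index you need does not appear in the hot list, restore it before attempting the delete.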

Build your query

It is possible to reuse the body of a search query for a delete query.

Therefore, if we narrow down a search query to exactly what we want to delete, we’ve done most of the work.

OpenSearch offers a web console to test queries.

A standard query is:

GET /live_kubernetes_cluster-2025.02.01/_search
{
  "query": {
    "match": {
      "kubernetes.namespace_name.keyword": "my-namespace-production"
    }
  }
}

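Important note: the .keyword suffix targets the field’s exact-value (non-analysed) sub-field, so only exact matches are returned. Without it, the query runs against the analysed text field, where a value such as my-namespace-production is typically split into separate words, and you may end up deleting more than you intended.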

As an awscurl command, the query above would translate to:

awscurl --service es \
  --region eu-west-2 \
  -XGET "https://search-cp-live-app-logs-jywwr7het3xzoh5t7ajar4ho3m.eu-west-2.es.amazonaws.com/live_kubernetes_cluster-2025.02.01/_search" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "match": {
        "kubernetes.namespace_name.keyword": "my-namespace-production"
      }
    }
  }'

Once the result of the search query exactly fits what you want to delete, carry on to the next step.
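One way to double-check the scope before deleting is the _count API, which accepts the same query body as _search and returns only the number of matching documents. This is a sketch: the OPENSEARCH_URL variable and the count_matches function name are illustrative, not part of any existing tooling.

```shell
# Endpoint used throughout this runbook.
OPENSEARCH_URL="https://search-cp-live-app-logs-jywwr7het3xzoh5t7ajar4ho3m.eu-west-2.es.amazonaws.com"

# Hypothetical helper: count_matches <index-pattern> <json-body>
# Sends the query body to the _count endpoint of the given index pattern.
count_matches() {
  awscurl --service es \
    --region eu-west-2 \
    -XGET "${OPENSEARCH_URL}/${1}/_count" \
    -H 'Content-Type: application/json' \
    -d "$2"
}

# Example (requires valid AWS credentials):
# count_matches 'live_kubernetes_cluster-2025.02.01' \
#   '{"query":{"match":{"kubernetes.namespace_name.keyword":"my-namespace-production"}}}'
```

The response includes a count field. Noting that number gives you something to compare against once the documents have been removed.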

Removing specific logs filtered by namespace and time range

If we need to remove logs for a specific namespace within a defined time window, we can use a _delete_by_query request with a range filter on @timestamp.

In the example below:

  • gte (greater than or equal to) defines the start of the time window (inclusive)
  • lte (less than or equal to) defines the end of the time window (inclusive)

Make sure you carefully adjust both timestamps before running the command.

awscurl --service es \
  --region eu-west-2 \
  -XPOST "https://search-cp-live-app-logs-jywwr7het3xzoh5t7ajar4ho3m.eu-west-2.es.amazonaws.com/live_kubernetes_cluster-*/_delete_by_query" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "bool": {
        "must": [],
        "filter": [
          {
            "match_phrase": {
              "kubernetes.namespace_name.keyword": "my-namespace-production"
            }
          },
          {
            "range": {
              "@timestamp": {
                "gte": "2026-01-07T00:00:00.000Z",
                "lte": "2026-01-09T00:00:00.000Z",
                "format": "strict_date_optional_time"
              }
            }
          }
        ],
        "should": [],
        "must_not": []
      }
    }
  }'

Here again, .keyword is essential.

⚠️ If a data leak has occurred, it is often safer to delete all logs for the affected namespace during the impacted period rather than attempting to target individual log lines. This reduces the risk of leaving sensitive data behind due to an incomplete query.
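To reduce the risk of the search body and the delete body drifting apart, the body can be generated once and passed to both requests. The sketch below is one way to do that. The function name namespace_window_query is hypothetical, and the window shown covers the example day used in this guide.

```shell
# Hypothetical helper: namespace_window_query <namespace> <gte> <lte>
# Prints a query body matching one namespace within an inclusive time window.
# Values are inserted as-is, so they must not contain double quotes or backslashes.
namespace_window_query() {
  cat <<EOF
{
  "query": {
    "bool": {
      "filter": [
        { "match_phrase": { "kubernetes.namespace_name.keyword": "$1" } },
        { "range": { "@timestamp": { "gte": "$2", "lte": "$3", "format": "strict_date_optional_time" } } }
      ]
    }
  }
}
EOF
}

namespace_window_query my-namespace-production 2025-02-01T00:00:00.000Z 2025-02-01T23:59:59.999Z
```

The printed body can be passed to awscurl with -d "$(namespace_window_query ...)", first against _search to review the matches, then against _delete_by_query.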

Removing specific logs filtered by phrase

If you want to delete log entries which contain a certain phrase, you can use the queries below.

To search for the phrase “What is your name?” between certain dates, the awscurl command would look like this:

awscurl --service es \
  --region eu-west-2 \
  -XGET "https://search-cp-live-app-logs-jywwr7het3xzoh5t7ajar4ho3m.eu-west-2.es.amazonaws.com/live_kubernetes_cluster-2*/_search" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "bool": {
        "must": [],
        "filter": [
          {
            "multi_match": {
              "type": "phrase",
              "query": "What is your name?",
              "lenient": true
            }
          },
          {
            "range": {
              "@timestamp": {
                "format": "strict_date_optional_time",
                "gte": "2025-03-13T12:19:24.804Z",
                "lte": "2025-04-14T15:19:24.804Z"
              }
            }
          }
        ],
        "should": [],
        "must_not": []
      }
    }
  }'

To delete the log entries which contain the phrase “What is your name?” between certain dates, the awscurl command would look like this:

awscurl --service es \
  --region eu-west-2 \
  -XPOST "https://search-cp-live-app-logs-jywwr7het3xzoh5t7ajar4ho3m.eu-west-2.es.amazonaws.com/live_kubernetes_cluster-2*/_delete_by_query" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "bool": {
        "must": [],
        "filter": [
          {
            "multi_match": {
              "type": "phrase",
              "query": "What is your name?",
              "lenient": true
            }
          },
          {
            "range": {
              "@timestamp": {
                "format": "strict_date_optional_time",
                "gte": "2025-03-13T12:19:24.804Z",
                "lte": "2025-04-14T15:19:24.804Z"
              }
            }
          }
        ],
        "should": [],
        "must_not": []
      }
    }
  }'

If you have built a complex search in OpenSearch and want to delete what it matches:

  • narrow down the search in the web console
  • click on “Inspect”
  • in the popup window, click on the “Request” tab
  • copy the object which starts with "query": { into the awscurl command above, and modify the dates as needed

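If you are searching for phrases which contain single-quote characters, those will need special handling when adding the phrases to the query: the request body in the commands above is itself wrapped in single quotes, so an unescaped single quote inside the phrase would end the shell string early.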
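One possible way to handle this, sketched below, is to write the request body through a quoted here-document instead of wrapping it in single quotes, so that quote characters in the phrase need no shell escaping. The phrase and the variable names are illustrative. Double quotes and backslashes inside the phrase would still need JSON escaping.

```shell
# Write the body to a temporary file. The quoted delimiter ('EOF') disables
# shell expansion, so the single quote in the phrase is kept literally.
body_file=$(mktemp)
cat > "$body_file" <<'EOF'
{
  "query": {
    "multi_match": {
      "type": "phrase",
      "query": "What's your name?",
      "lenient": true
    }
  }
}
EOF

# Load the body into a variable and remove the temporary file.
body=$(cat "$body_file")
rm -f "$body_file"

printf '%s\n' "$body"

# The body can then be passed to awscurl in double quotes, for example:
# awscurl --service es --region eu-west-2 \
#   -XGET "https://search-cp-live-app-logs-jywwr7het3xzoh5t7ajar4ho3m.eu-west-2.es.amazonaws.com/live_kubernetes_cluster-2*/_search" \
#   -H 'Content-Type: application/json' \
#   -d "$body"
```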

This page was last reviewed on 17 February 2026. It needs to be reviewed again on 17 August 2026 by the page owner #cloud-platform-notify.