Remove data from OpenSearch
This runbook aims to guide you through purging specific logs/data from OpenSearch.
Let’s assume in this guide that we are trying to delete all logs of the my-namespace-production
namespace, from the 01/02/2025.
Stop the breach first
If you’re removing data from OpenSearch, this is presumably because sensitive information has been logged by mistake. Please ensure that the underlying problem has been fixed, and no more sensitive data is going to be logged by the same application, before spending time removing what’s already there.
Things to know
An important thing to understand is the difference between an index and a document. An index contains documents. In our case, documents are log entries.
The current logging strategy is to have one index per day, containing all the daily logs.
All application logs are sent to that daily index, ex: live_kubernetes_cluster-2025.02.01
.
If you want your query to cover all log data stored in the OpenSearch cluster, you can use an expression like live_kubernetes_cluster-2*
.
Although it is possible to delete a whole index with a curator job, this guide will only cover the deletion of specific documents.
Get yourself access
Cloud Platform team members have access to the required IAM permissions to carry out the following commands from work devices. You’ll just need to authenticate as usual with aws sso login
.
We’ll be making use of curl
and AWS Sigv4 to authenticate our requests to OpenSearch. You can take a look at this guide on how to build curl
requests. Alternatively, you can install awscurl, which provides a wrapper around curl
that handles the signing for you. We’ll use this for snippets below.
Build your query
It is possible to reuse the body of a search
query for a delete
query.
Therefore, if we narrow down a search query to exactly what we want to delete, we’ve done most of the work.
OpenSearch offers a webconsole to test queries.
A standard query is :
GET /live_kubernetes_cluster-2025.02.01/_search
{
"query": {
"match": {
"kubernetes.namespace_name.keyword": "my-namespace-production"
}
}
}
Important note:
.keyword
forces OpenSearch to only return exact matches. Without it, you may end up deleting more than you wish
As an awscurl
command, the query above would translate to:
awscurl --service es \
--region eu-west-2 \
-XGET "https://search-cp-live-app-logs-jywwr7het3xzoh5t7ajar4ho3m.eu-west-2.es.amazonaws.com/live_kubernetes_cluster-2025.02.01/_search" \
-H 'Content-Type: application/json' \
-d '{
"query": {
"match": {
"kubernetes.namespace_name.keyword": "my-namespace-production"
}
}
}'
Once the result of the search query exactly fits what you want to delete, carry on to the next step.
Delete by query
A POST
request against the _delete_by_query
endpoint will delete all documents matching the query in the request’s body.
If we want to delete all logs from the my-namespace-production
from 01/02/2021:
awscurl --service es \
--region eu-west-2 \
-XPOST "https://search-cp-live-app-logs-jywwr7het3xzoh5t7ajar4ho3m.eu-west-2.es.amazonaws.com/live_kubernetes_cluster-2025.02.01/_delete_by_query" \
-H 'Content-Type: application/json' \
-d '{
"query": {
"match": {
"kubernetes.namespace_name.keyword": "my-namespace-production"
}
}
}'
Here again, .keyword
is essential.
Removing specific logs filtered by phrase
If you want to delete log entries which contain a certain phrase, you can use the queries below.
Searching for the phrase “What is your name?” between certain dates, the curl
command would look like:
awscurl --service es \
--region eu-west-2 \
-XGET "https://search-cp-live-app-logs-jywwr7het3xzoh5t7ajar4ho3m.eu-west-2.es.amazonaws.com/live_kubernetes_cluster-2*/_search" \
-H 'Content-Type: application/json' \
-d '{
"query": {
"bool": {
"must": [],
"filter": [
{
"multi_match": {
"type": "phrase",
"query": "What is your name?",
"lenient": true
}
},
{
"range": {
"@timestamp": {
"format": "strict_date_optional_time",
"gte": "2025-03-13T12:19:24.804Z",
"lte": "2025-04-14T15:19:24.804Z"
}
}
}
],
"should": [],
"must_not": []
}
}
}'
Deleting the log entries which have the phrase “What is your name?” between certain dates, the awscurl
command would look like this:
awscurl --service es \
--region eu-west-2 \
-XPOST "https://search-cp-live-app-logs-jywwr7het3xzoh5t7ajar4ho3m.eu-west-2.es.amazonaws.com/live_kubernetes_cluster-2*/_delete_by_query" \
-H 'Content-Type: application/json' \
-d '{
"query": {
"bool": {
"must": [],
"filter": [
{
"multi_match": {
"type": "phrase",
"query": "What is your name?",
"lenient": true
}
},
{
"range": {
"@timestamp": {
"format": "strict_date_optional_time",
"gte": "2025-03-13T12:19:24.804Z",
"lte": "2025-04-14T15:19:24.804Z"
}
}
}
],
"should": [],
"must_not": []
}
}
}'
If you have complex search patterns used in OpenSearch which you want to delete,
- narrow down the search in the webconsole
- click on “Inspect”
- In the popup search window, click on the tab “Request”
- Copy and paste the object which mentions "query": {
to the above curl command and modify dates as needed
If you are searching for phrases which contain single-quote characters, those will need special handling when adding the phrases to the query
—– THE BELOW SECTION REQUIRES A REWRITE FOR OPENSEARCH —–
Deleting documents stored in warm storage
The documents can only be deleted from indices that are in hot storage. Currently past 14 days data are in hot storage. Once the indices are moved to warm/cold storage, you cannot delete particular documents as they are set as read-only. You can move the indices from warm to hot storage and then delete the documents. Follow the steps below to perform this operation.
Go to kibana -> Dev Tools
For example, if you want to delete the index live_kubernetes_ingress-2023.03.02
, run the following command
POST _ultrawarm/migration/live_kubernetes_ingress-2023.03.02/_hot
This will move the index to hot storage. You can check the status of the migration using the following command
GET _ultrawarm/migration/live_kubernetes_ingress-2023.03.02
Once the migration is complete, you can delete the documents using the procedure mentioned at the beginning of this runbook.
Once the documents are deleted, you can move the indices back to warm storage using the following command
POST _ultrawarm/migration/live_kubernetes_ingress-2023.03.02/_warm
NOTE: If you are planning to move a lot of warm indices to hot, make sure you have enough storage in hot storage.