Skip to content

Add derived_source_keep parameter for array preservation in derived source#22253

Open
praveenvenkat06 wants to merge 11 commits into
opensearch-project:mainfrom
praveenvenkat06:feature/support-field-level-array-preservation
Open

Add derived_source_keep parameter for array preservation in derived source#22253
praveenvenkat06 wants to merge 11 commits into
opensearch-project:mainfrom
praveenvenkat06:feature/support-field-level-array-preservation

Conversation

@praveenvenkat06

@praveenvenkat06 praveenvenkat06 commented Jun 20, 2026

Copy link
Copy Markdown

Description

This PR implements the derived_source_keep field-level parameter to preserve array order and duplicate values when using OpenSearch's Derived Source feature.

Related Issues

Resolves #22200

Problem Statement

When index.derived_source.enabled=true, OpenSearch reconstructs _source from indexed fields. Currently, fields using doc values lose:

  • Array element order (values are sorted)
  • Duplicate values (values are deduplicated)

Solution

Add a new field-level parameter derived_source_keep with two modes:

  • "none" (default): Uses doc values - sorted/deduplicated
  • "arrays": Uses stored fields - preserves order/duplicates

This implementation supports "arrays" mode only. In future, "all" mode for nested objects preservation can be implemented if there is demand.

Example Usage

PUT /orders
{
  "settings": {
    "index.derived_source.enabled": true
  },
  "mappings": {
    "properties": {
      "workflow_steps": {
        "type": "keyword",
        "derived_source_keep": "arrays"
      }
    }
  }
}

POST /orders/_doc/1
{
  "workflow_steps": ["created", "approved", "approved", "shipped"]
}

GET /orders/_doc/1

//Returns: ["created", "approved", "approved", "shipped"]
//Without arrays mode: ["approved", "created", "shipped"]

Key Features

  • Auto-enables store=true when derived_source_keep="arrays" is set
  • Validates and rejects conflicting store=false configuration
  • Supports keyword, numeric, date, boolean, and ip field types
  • Backward compatible (default behavior unchanged)
  • Comprehensive unit test cases added
  • Marked with @PublicApi(since = "3.8.0") for API stability

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Praveen Raj V <Venkatachalapathy.PraveenRaj@ibm.com>
Signed-off-by: Praveen Raj V <Venkatachalapathy.PraveenRaj@ibm.com>
Signed-off-by: Praveen Raj V <Venkatachalapathy.PraveenRaj@ibm.com>
…d_source_keep=arrays

Signed-off-by: Praveen Raj V <Venkatachalapathy.PraveenRaj@ibm.com>
Signed-off-by: Praveen Raj V <Venkatachalapathy.PraveenRaj@ibm.com>
Signed-off-by: Praveen Raj V <Venkatachalapathy.PraveenRaj@ibm.com>
Signed-off-by: Praveen Raj V <Venkatachalapathy.PraveenRaj@ibm.com>
@github-actions github-actions Bot added enhancement Enhancement or improvement to existing feature or request Storage Issues and PRs relating to data and metadata storage labels Jun 20, 2026
@github-actions

github-actions Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 5bea864)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The constructor logic uses Objects.requireNonNull(getDerivedFieldPreference()) to determine which fetcher to use, but getDerivedFieldPreference() depends on mappedFieldType.hasDocValues(). If hasDocValues() returns false and storedFieldFetcher is null, the assertion assert storedFieldFetcher != null; will fail. This can occur when a field has no doc values and no stored fields are configured, causing a runtime assertion error instead of a clear validation error.

    if (Objects.requireNonNull(getDerivedFieldPreference()) == FieldValueType.DOC_VALUES) {
        assert docValuesFetcher != null;
        this.fieldValueFetcher = docValuesFetcher;
    } else {
        assert storedFieldFetcher != null;
        this.fieldValueFetcher = storedFieldFetcher;
    }
}
Inconsistent State

The build method modifies this.stored parameter value after validation (this.stored.setValue(storeValue)), but then passes the original stored.getValue() to the BooleanFieldType constructor. If auto-enabling occurs, the field type receives the old value (false) instead of the updated value (true), causing a mismatch between the mapper's internal state and the field type's stored flag.

MappedFieldType ft = new BooleanFieldType(
    buildFullName(context),
    indexed.getValue(),
    stored.getValue(),
Inconsistent State

Same issue as BooleanFieldMapper: the build method sets this.store.setValue(storeValue) after validation, but then passes store.getValue() to DateFieldType constructor. The field type receives the pre-modification value, not the auto-enabled value, causing stored flag inconsistency.

DateFieldType ft = new DateFieldType(
    buildFullName(context),
    index.getValue(),
    store.getValue(),
Inconsistent State

Same issue as BooleanFieldMapper and DateFieldMapper: stored.setValue(storeValue) is called after validation, but stored.getValue() passed to IpFieldType constructor reflects the old value, not the auto-enabled value.

return new IpFieldMapper(
    name,
    new IpFieldType(
        buildFullName(context),
Inconsistent State

Same issue as other mappers: stored.setValue(storeValue) is called after validation, but the NumberFieldType constructor receives stored.getValue() which reflects the pre-modification value, not the auto-enabled value.

    MappedFieldType ft = new NumberFieldType(buildFullName(context), this);
    return new NumberFieldMapper(name, ft, multiFieldsBuilder.build(this, context), copyTo.build(), this);
}

@github-actions

github-actions Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 5bea864

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Clarify auto-enable store logic

The validation logic checks if store was explicitly set to false, but then
unconditionally sets storeValue = true and updates the parameter. This means if a
user doesn't explicitly set store, it will be auto-enabled. However, the error
message only triggers when explicitly set to false. Consider whether this behavior
is intentional, as it silently overrides the default store=false behavior without
user awareness.

server/src/main/java/org/opensearch/index/mapper/BooleanFieldMapper.java [152-159]

 if (derivedSourceKeep.requiresStoredFields()) {
     // Check if store was explicitly set to false
     if (this.stored.isSet() && !storeValue) {
         throw new MapperParsingException("Cannot set derived_source_keep='arrays' with store=false for field [" + name() + "]");
     }
-    storeValue = true;
-    this.stored.setValue(storeValue);
+    // Auto-enable store=true when not explicitly set or when set to true
+    if (!this.stored.isSet() || storeValue) {
+        storeValue = true;
+        this.stored.setValue(storeValue);
+    }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion raises a valid point about the auto-enable behavior, but the current implementation appears intentional. The code correctly throws an error when store=false is explicitly set, and auto-enables store=true otherwise. The suggested change adds unnecessary complexity without clear benefit.

Low
Improve error handling for null fetcher

The null check for storedFieldFetcher when derived_source_keep='arrays' is set may
not be sufficient. If the field mapper's build() method auto-enables store=true but
the fetcher is still null during construction, this will throw an exception. Verify
that the stored field fetcher is properly initialized before the
DerivedFieldGenerator is constructed, or handle this case more gracefully.

server/src/main/java/org/opensearch/index/mapper/DerivedFieldGenerator.java [60-67]

 if (derivedSourceKeep.requiresStoredFields()) {
     // User explicitly requested array preservation via stored fields
     if (storedFieldFetcher == null) {
-        throw new IllegalArgumentException(
-            "derived_source_keep='arrays' requires stored fields to be enabled for field [" + mappedFieldType.name() + "]"
+        throw new IllegalStateException(
+            "Internal error: stored field fetcher is null despite derived_source_keep='arrays' for field [" 
+            + mappedFieldType.name() + "]. This indicates a mapper configuration issue."
         );
     }
     this.fieldValueFetcher = storedFieldFetcher;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion to change from IllegalArgumentException to IllegalStateException is reasonable but minor. The current exception type is acceptable since it indicates invalid arguments. The improved error message adds some clarity but doesn't significantly improve the code.

Low

Previous suggestions

Suggestions up to commit 14db40c
CategorySuggestion                                                                                                                                    Impact
General
Avoid unnecessary parameter mutation

The validation logic checks if store was explicitly set to false, but then
unconditionally sets storeValue = true and updates the parameter. This means even
when store is not explicitly set, the parameter gets modified. Consider only setting
the value when it wasn't explicitly configured to avoid unnecessary mutations.

server/src/main/java/org/opensearch/index/mapper/BooleanFieldMapper.java [152-159]

 if (derivedSourceKeep.requiresStoredFields()) {
     // Check if store was explicitly set to false
     if (this.stored.isSet() && !storeValue) {
         throw new MapperParsingException("Cannot set derived_source_keep='arrays' with store=false for field [" + name() + "]");
     }
-    storeValue = true;
-    this.stored.setValue(storeValue);
+    if (!this.stored.isSet()) {
+        storeValue = true;
+        this.stored.setValue(storeValue);
+    }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that storeValue is set to true even when store is already true, causing unnecessary mutation. However, the impact is limited since the final state is correct. The improved code adds a check to only modify when not explicitly set, which is cleaner but not critical.

Medium
Suggestions up to commit 8caa18f
CategorySuggestion                                                                                                                                    Impact
General
Avoid unnecessary parameter mutation

The validation logic checks if store was explicitly set to false, but then
unconditionally sets storeValue = true and updates the parameter. This means if a
user explicitly sets store=true with derived_source_keep='arrays', the code will
still overwrite it. Consider only setting the value when it wasn't explicitly
configured to avoid unnecessary mutations.

server/src/main/java/org/opensearch/index/mapper/BooleanFieldMapper.java [152-159]

 if (derivedSourceKeep.requiresStoredFields()) {
     // Check if store was explicitly set to false
     if (this.stored.isSet() && !storeValue) {
         throw new MapperParsingException("Cannot set derived_source_keep='arrays' with store=false for field [" + name() + "]");
     }
-    storeValue = true;
-    this.stored.setValue(storeValue);
+    if (!this.stored.isSet()) {
+        storeValue = true;
+        this.stored.setValue(storeValue);
+    }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that when store=true is explicitly set with derived_source_keep='arrays', the code unnecessarily overwrites the value. Adding a check to only set the value when not explicitly configured improves code clarity and avoids redundant mutations.

Medium
Suggestions up to commit eedcf91
CategorySuggestion                                                                                                                                    Impact
Possible issue
Update parameter state after auto-enabling store

Unlike other field mappers, this code doesn't call this.stored.setValue(storeValue)
after auto-enabling store. This inconsistency means the parameter state won't
reflect the actual stored field configuration, potentially causing issues during
mapper serialization or merging operations.

server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java [271-279]

 if (derivedSourceKeep.requiresStoredFields()) {
     // Check if store was explicitly set to false
     if (this.stored.isSet() && !storeValue) {
         throw new MapperParsingException("Cannot set derived_source_keep='arrays' with store=false for field [" + name + "]");
     }
     storeValue = true;
+    this.stored.setValue(storeValue);
 }
 
 fieldtype.setStored(storeValue);
Suggestion importance[1-10]: 8

__

Why: This identifies a critical inconsistency where KeywordFieldMapper doesn't call this.stored.setValue(storeValue) after auto-enabling store, unlike other field mappers. This could cause issues during mapper serialization or merging operations where the parameter state doesn't reflect the actual configuration.

Medium
General
Avoid overwriting explicitly set store value

The validation logic checks if store was explicitly set to false, but then
unconditionally sets storeValue = true and updates the parameter. This means if a
user sets store=true explicitly, the code still overwrites it. Consider only setting
the value when it wasn't explicitly configured to avoid unnecessary mutations.

server/src/main/java/org/opensearch/index/mapper/BooleanFieldMapper.java [152-159]

 if (derivedSourceKeep.requiresStoredFields()) {
     // Check if store was explicitly set to false
     if (this.stored.isSet() && !storeValue) {
         throw new MapperParsingException("Cannot set derived_source_keep='arrays' with store=false for field [" + name() + "]");
     }
-    storeValue = true;
-    this.stored.setValue(storeValue);
+    if (!this.stored.isSet()) {
+        storeValue = true;
+        this.stored.setValue(storeValue);
+    }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that the code unconditionally sets storeValue = true even when store=true was explicitly set. However, the impact is minimal since setting an already-true value to true is harmless. The suggested optimization avoids unnecessary mutations but doesn't fix a bug.

Low
Replace redundant null check with assertion

The null check for storedFieldFetcher occurs after the mode validation, but the
stored field requirement should have been enforced during mapper construction. This
defensive check may never trigger if the mapper validation works correctly, creating
unreachable error handling code.

server/src/main/java/org/opensearch/index/mapper/DerivedFieldGenerator.java [60-67]

 if (derivedSourceKeep.requiresStoredFields()) {
     // User explicitly requested array preservation via stored fields
-    if (storedFieldFetcher == null) {
-        throw new IllegalArgumentException(
-            "derived_source_keep='arrays' requires stored fields to be enabled for field [" + mappedFieldType.name() + "]"
-        );
-    }
+    assert storedFieldFetcher != null : "stored field fetcher must be provided when derived_source_keep='arrays'";
     this.fieldValueFetcher = storedFieldFetcher;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion proposes replacing a defensive null check with an assertion. While this could simplify the code, the existing null check provides better runtime safety and clearer error messages. The suggestion is valid but offers marginal improvement.

Low

@github-actions

Copy link
Copy Markdown
Contributor

✅ Gradle check result for eedcf91: SUCCESS

@codecov

codecov Bot commented Jun 20, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 86.95652% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.33%. Comparing base (abe02c5) to head (eedcf91).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
...opensearch/index/mapper/DerivedFieldGenerator.java 53.84% 3 Missing and 3 partials ⚠️
...a/org/opensearch/index/mapper/DateFieldMapper.java 88.23% 1 Missing and 1 partial ⚠️
...rg/opensearch/index/mapper/KeywordFieldMapper.java 87.50% 1 Missing and 1 partial ⚠️
...org/opensearch/index/mapper/NumberFieldMapper.java 88.23% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #22253      +/-   ##
============================================
- Coverage     73.36%   73.33%   -0.03%     
- Complexity    75900    75918      +18     
============================================
  Files          6070     6071       +1     
  Lines        344912   345068     +156     
  Branches      49627    49651      +24     
============================================
+ Hits         253042   253065      +23     
- Misses        71706    71862     +156     
+ Partials      20164    20141      -23     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Praveen Raj V <Venkatachalapathy.PraveenRaj@ibm.com>
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 8caa18f

@github-actions

Copy link
Copy Markdown
Contributor

❌ Gradle check result for 8caa18f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 14db40c

@github-actions

Copy link
Copy Markdown
Contributor

❌ Gradle check result for 14db40c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Praveen Raj V <Venkatachalapathy.PraveenRaj@ibm.com>
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 5bea864

@github-actions

Copy link
Copy Markdown
Contributor

❌ Gradle check result for 5bea864: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhancement or improvement to existing feature or request Storage Issues and PRs relating to data and metadata storage

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[Feature Request] Support field-level array preservation for Derived Source reconstruction

2 participants