GODRIVER-3868 Update extended bson cases. #2400

Open
qingyang-hu wants to merge 3 commits into mongodb:master from qingyang-hu:godriver3868

Conversation

@qingyang-hu
Contributor

GODRIVER-3868

Summary

Update the benchmark framework to run directly against the specifications sub-repo.
This PR decompresses the tgz file and unmarshals the embedded JSON files.
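
The decompress-and-unmarshal step can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `buildArchive` and `readJSONEntries` are hypothetical names, the archive is built in memory so the example runs without the specifications sub-repo, and `encoding/json` stands in for the driver's Extended JSON unmarshaller.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// buildArchive writes files into an in-memory .tgz so the example needs no
// fixture on disk. The entry names used in main are illustrative only.
func buildArchive(files map[string]string) (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	gz := gzip.NewWriter(buf)
	tw := tar.NewWriter(gz)
	for name, body := range files {
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	// Close the tar writer before the gzip writer so all data is flushed.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf, nil
}

// readJSONEntries decompresses a .tgz stream and decodes every regular file
// as a JSON object, keyed by its path inside the archive.
func readJSONEntries(r io.Reader) (map[string]map[string]any, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, fmt.Errorf("creating gzip reader: %w", err)
	}
	defer gz.Close()

	entries := make(map[string]map[string]any)
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return entries, nil
		}
		if err != nil {
			return nil, fmt.Errorf("reading tar: %w", err)
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // skip directories and other non-file entries
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, fmt.Errorf("reading tar entry %q: %w", hdr.Name, err)
		}
		var v map[string]any
		if err := json.Unmarshal(data, &v); err != nil {
			return nil, fmt.Errorf("unmarshalling %q: %w", hdr.Name, err)
		}
		entries[hdr.Name] = v
	}
}

func main() {
	archive, err := buildArchive(map[string]string{
		"extended_bson/flat.json": `{"a": 1}`,
		"extended_bson/deep.json": `{"a": {"b": 2}}`,
	})
	if err != nil {
		panic(err)
	}
	entries, err := readJSONEntries(archive)
	if err != nil {
		panic(err)
	}
	fmt.Println("entries loaded:", len(entries))
}
```

Reading the whole archive in one pass keeps the per-file lookup cheap afterwards, which is presumably why the PR caches the parsed entries.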

Background & Motivation

We no longer need to maintain local copies of the test cases because the test data now comes from a sub-repo.

@mongodb-drivers-pr-bot
Contributor

mongodb-drivers-pr-bot Bot commented May 26, 2026

🧪 Performance Results

Commit SHA: b17879b

The following benchmark tests for version 6a1eda7b30aa0f00077a65e6 had statistically significant changes (i.e., |z-score| > 1.96):

| Benchmark | Measurement | % Change | Patch Value | Stable Region | H-Score | Z-Score |
| --- | --- | --- | --- | --- | --- | --- |
| BenchmarkBSONDeepDocumentDecoding | ns_per_op | -3.3181 | 61217.0000 | Avg: 63317.9769; Med: 63343.5000; Stdev: 1059.5067 | 0.7247 | -1.9830 |
| BenchmarkBSONDeepDocumentDecoding | ops_per_second_med | 3.2634 | 17254.4689 | Avg: 16709.1851; Med: 16683.9069; Stdev: 264.6535 | 0.7327 | 2.0604 |
| BenchmarkBSONDeepDocumentDecoding | ops_per_second_max | 2.8735 | 17768.9328 | Avg: 17272.6051; Med: 17200.4541; Stdev: 253.0210 | 0.7288 | 1.9616 |
| BenchmarkBSONDeepDocumentEncoding | total_time_seconds | -1.7487 | 1.1732 | Avg: 1.1941; Med: 1.1945; Stdev: 0.0092 | 0.7798 | -2.2618 |
| BenchmarkBSONFlatDocumentDecoding | total_time_seconds | -1.4589 | 1.1820 | Avg: 1.1995; Med: 1.1992; Stdev: 0.0063 | 0.8095 | -2.7585 |

For a comprehensive view of all microbenchmark results for this PR's commit, please check out the Evergreen perf task for this patch.
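
As a worked check of the numbers above: the bot does not state its formulas, but the first row is consistent with a z-score of (patch value - stable average) / stable standard deviation and a percent change measured against the stable average. The sketch below assumes those two formulas; `zScore` and `percentChange` are illustrative names.

```go
package main

import "fmt"

// zScore returns how many standard deviations value lies from mean.
func zScore(value, mean, stdev float64) float64 {
	return (value - mean) / stdev
}

// percentChange returns the change of value relative to mean, in percent.
func percentChange(value, mean float64) float64 {
	return (value - mean) / mean * 100
}

func main() {
	// Figures from the BenchmarkBSONDeepDocumentDecoding ns_per_op row.
	const patch, avg, stdev = 61217.0, 63317.9769, 1059.5067

	fmt.Printf("z-score:  %.4f\n", zScore(patch, avg, stdev))
	fmt.Printf("%% change: %.4f\n", percentChange(patch, avg))
}
```

The threshold |z-score| > 1.96 corresponds to the conventional two-sided 5% significance level for a normal distribution.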

@mongodb-drivers-pr-bot
Contributor

API Change Report

No changes found!

@qingyang-hu qingyang-hu marked this pull request as ready for review May 26, 2026 15:50
@qingyang-hu qingyang-hu requested a review from a team as a code owner May 26, 2026 15:50
Contributor

Copilot AI left a comment

Pull request overview

Updates the driver’s benchmark fixtures to consume the Extended BSON benchmark data directly from the specifications git submodule tarball (extended_bson.tgz) instead of relying on locally maintained copies, aligning benchmark execution with upstream spec data.

Changes:

  • Added logic in the internal benchmark framework to locate and extract a specific Extended JSON fixture from extended_bson.tgz.
  • Updated bson package benchmarks to read fixtures from extended_bson.tgz, caching parsed entries for reuse.
  • Renamed benchmark case labels from *.json.gz to *.json to reflect the new fixture source format.

Reviewed changes

Copilot reviewed 2 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| internal/cmd/benchmark/benchmark_test.go | Loads Extended BSON benchmark fixtures directly from the specifications tarball for internal benchmark runs. |
| bson/benchmark_test.go | Replaces per-file gz fixture reads with tarball extraction + cached unmarshalling for BSON package benchmarks. |

Comment on lines +215 to +223
tgzPath := filepath.Join(testdataDir(b), "specifications", "source", "benchmarking", "data", "extended_bson.tgz")

file, err := os.Open(tgzPath)
require.NoError(b, err, "failed to open %q", tgzPath)
defer file.Close()

gz, err := gzip.NewReader(file)
require.NoError(b, err, "failed to create gzip reader")
defer gz.Close()
Comment thread bson/benchmark_test.go Outdated
Comment on lines +183 to +191
tr := tar.NewReader(gz)
for {
hdr, err := tr.Next()
if errors.Is(err, io.EOF) {
break
}
if err != nil {
b.Fatalf("error reading tar: %s", err)
return nil
Comment thread bson/benchmark_test.go Outdated
defer func() {
_ = gz.Close()
}()
if extJSONFiles == nil {
Member

[nit] Suggest using sync.Once to help with readability:

func readExtJSONFile(b *testing.B, filename string) map[string]any {
    b.Helper()
    extJSONFilesOnce.Do(func() {
        // load into extJSONFiles
    })

    v, ok := extJSONFiles["extended_bson/"+filename]
    if !ok {
        b.Fatalf("file %q not found in %q", filename, extendedBSONTGZ)
        return nil
    }
    return v
}
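
To make the suggestion concrete, here is a small runnable sketch of the `sync.Once` pattern. The names `fixturesOnce`, `fixtures`, `loadFixtures`, and `fixture` are hypothetical, and the "load" is a stub that stands in for decompressing and parsing the tarball; a counter shows that it runs only once.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	fixturesOnce sync.Once
	fixtures     map[string]string
	loadCount    int // counts how often the expensive load runs
)

// loadFixtures stands in for decompressing and parsing the archive.
func loadFixtures() map[string]string {
	loadCount++
	return map[string]string{"extended_bson/deep_bson.json": `{"depth": 1}`}
}

// fixture returns the cached entry for name, filling the cache on first use.
func fixture(name string) (string, bool) {
	fixturesOnce.Do(func() { fixtures = loadFixtures() })
	v, ok := fixtures["extended_bson/"+name]
	return v, ok
}

func main() {
	for i := 0; i < 3; i++ {
		_, ok := fixture("deep_bson.json")
		fmt.Println("found:", ok)
	}
	fmt.Println("loads:", loadCount)
}
```

Compared with a `nil` check on the map, `sync.Once` also stays correct if the helper is ever called from multiple goroutines.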

Comment thread bson/benchmark_test.go Outdated
return nil
}
defer func() {
_ = file.Close()
Member

[blocking] We should check this error.

Contributor Author

I don't think the closing error affects loading, and we don't need to log it either.

Comment thread bson/benchmark_test.go Outdated
return nil
}
defer func() {
_ = gz.Close()
Member

[blocking] We should check this error.

Contributor Author

I don't think the closing error affects loading, and we don't need to log it either.
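
For reference, one common way to surface a `Close` error without disturbing the main control flow is a deferred function that writes to a named return value. This is a generic sketch, not the resolution reached in these threads; `process`, `failingCloser`, and `okCloser` are hypothetical names. In a test helper the deferred function could instead report the error through the testing handle.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// failingCloser simulates a resource whose Close reports an error.
type failingCloser struct{}

func (failingCloser) Close() error { return errors.New("close failed") }

// okCloser simulates a resource that closes cleanly.
type okCloser struct{}

func (okCloser) Close() error { return nil }

// process does its work and then closes c. A Close error is returned only
// when no earlier error occurred, so it never masks the primary failure.
func process(c io.Closer) (err error) {
	defer func() {
		if cerr := c.Close(); cerr != nil && err == nil {
			err = fmt.Errorf("closing resource: %w", cerr)
		}
	}()
	// ... read from the resource here ...
	return nil
}

func main() {
	fmt.Println(process(okCloser{}))
	fmt.Println(process(failingCloser{}))
}
```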

Comment thread bson/benchmark_test.go Outdated
if err != nil {
panic(fmt.Sprintf("error reading GZIP contents of file: %s", err))
}
extJSONFiles = make(map[string]map[string]any)
Member

[nit] It doesn't matter in this case, but for hygiene I suggest populating a local map and assigning it to extJSONFiles at the end of the sync.Once function, so an error partway through cannot leave the cache partially filled.
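
A minimal sketch of that hygiene point, with hypothetical names (`cache`, `cacheErr`, `parseAll`, `load`) and a stub parser that rejects empty names: the shared map is assigned only after every entry has parsed, so a failure leaves it `nil` rather than half-populated.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	cacheOnce sync.Once
	cache     map[string]int
	cacheErr  error
)

// parseAll builds its result in a local map and returns it only on success.
// Rejecting empty names stands in for a real parse failure.
func parseAll(names []string) (map[string]int, error) {
	local := make(map[string]int, len(names))
	for i, name := range names {
		if name == "" {
			return nil, errors.New("entry with empty name")
		}
		local[name] = i
	}
	return local, nil
}

// load fills the shared cache once; the cache is published only if every
// entry parsed, and the error is remembered for later callers.
func load(names []string) (map[string]int, error) {
	cacheOnce.Do(func() {
		local, err := parseAll(names)
		if err != nil {
			cacheErr = err
			return
		}
		cache = local
	})
	return cache, cacheErr
}

func main() {
	entries, err := load([]string{"flat_bson.json", ""})
	fmt.Println(entries == nil, err)
}
```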

Comment thread bson/benchmark_test.go Outdated
if extJSONFiles == nil {
file, err := os.Open(extendedBSONTGZ)
if err != nil {
b.Fatalf("error opening %q: %s", extendedBSONTGZ, err)
Member

Suggested change
- b.Fatalf("error opening %q: %s", extendedBSONTGZ, err)
+ b.Fatalf("error opening %q: %v", extendedBSONTGZ, err)

Comment thread bson/benchmark_test.go Outdated
}
data, err := io.ReadAll(tr)
if err != nil {
b.Fatalf("error reading tar entry %q: %s", hdr.Name, err)
Member

Suggested change
- b.Fatalf("error reading tar entry %q: %s", hdr.Name, err)
+ b.Fatalf("error reading tar entry %q: %v", hdr.Name, err)

Comment thread bson/benchmark_test.go Outdated
}
var v map[string]any
if err = UnmarshalExtJSON(data, false, &v); err != nil {
b.Fatalf("error unmarshalling %q: %s", hdr.Name, err)
Member

Suggested change
- b.Fatalf("error unmarshalling %q: %s", hdr.Name, err)
+ b.Fatalf("error unmarshalling %q: %v", hdr.Name, err)

Comment thread bson/benchmark_test.go Outdated
// readExtJSONFile reads the named JSON file from the extended_bson.tgz archive and returns it as a
// map[string]any. The first call decompresses the archive and caches all entries; subsequent calls
// only look up the cache. It calls b.Fatal on any errors.
func readExtJSONFile(b *testing.B, filename string) map[string]any {
Member

[blocking] Use testing.TB instead of *testing.B so that the loader works from both tests and benchmarks. In addition, I think we should add a test for the loader to ensure the structure is what we expect.
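
As an illustration of the `testing.TB` point, the sketch below defines a helper that accepts the interface, so `*testing.T` and `*testing.B` can both call it. The names `lookup` and `recorder` are hypothetical. The `recorder` type exists only so the example runs outside `go test`: it embeds `testing.TB` to satisfy the interface and overrides just the two methods the helper uses. Unlike the real `Fatalf`, the stand-in does not stop the calling goroutine.

```go
package main

import (
	"fmt"
	"testing"
)

// lookup returns the entry for name and fails the test or benchmark if it is
// missing. Taking testing.TB lets both *testing.T and *testing.B call it.
func lookup(tb testing.TB, entries map[string]string, name string) string {
	tb.Helper()
	v, ok := entries[name]
	if !ok {
		tb.Fatalf("file %q not found", name)
		return ""
	}
	return v
}

// recorder is a stand-in for *testing.T or *testing.B. Only Helper and Fatalf
// are implemented; any other method would panic on the nil embedded value.
type recorder struct {
	testing.TB
	failure string
}

func (r *recorder) Helper() {}

func (r *recorder) Fatalf(format string, args ...any) {
	r.failure = fmt.Sprintf(format, args...)
}

func main() {
	entries := map[string]string{"flat_bson.json": "{}"}
	r := &recorder{}

	fmt.Println(lookup(r, entries, "flat_bson.json"))
	lookup(r, entries, "missing.json")
	fmt.Println(r.failure)
}
```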

Contributor Author

We can use the hard-coded file names underneath to cross-verify the structure in the tarball is what we expect.

Comment thread bson/benchmark_test.go Outdated
- desc: "deep_bson.json.gz",
- value: readExtJSONFile("deep_bson.json.gz"),
+ desc: "deep_bson.json",
+ value: readExtJSONFile(b, "deep_bson.json"),
Member

[blocking] We don't need to hardcode the files, just iterate over the results of decompressing the tarball. We should rename readExtJSONFile to loadExtendedBSON. Then we should iterate over the results:

func TestLoadExtendedBSON(t *testing.T) {
	loadExtendedBSON(t)

	for filename := range extJSONFiles {
		t.Run(filename, func(t *testing.T) {
			// Test / Benchmark
		})
	}
}
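
One detail to keep in mind when iterating over the loaded entries: Go does not specify map iteration order, so sub-tests or sub-benchmarks created directly from a `range` over the map may run in a different order on each invocation. A small sketch of collecting and sorting the names first, with the hypothetical helper `sortedNames`:

```go
package main

import (
	"fmt"
	"sort"
)

// sortedNames returns the keys of entries in ascending order so callers can
// iterate over the fixtures deterministically.
func sortedNames(entries map[string]map[string]any) []string {
	names := make([]string, 0, len(entries))
	for name := range entries {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Illustrative entries; the values are left empty on purpose.
	entries := map[string]map[string]any{
		"extended_bson/full_bson.json": {},
		"extended_bson/deep_bson.json": {},
		"extended_bson/flat_bson.json": {},
	}
	for _, name := range sortedNames(entries) {
		fmt.Println(name)
	}
}
```

A stable order makes benchmark output easier to compare across runs.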

Comment thread bson/benchmark_test.go Outdated
// readExtJSONFile reads the named JSON file from the extended_bson.tgz archive and returns it as a
// map[string]any. The first call decompresses the archive and caches all entries; subsequent calls
// only look up the cache. It calls b.Fatal on any errors.
func readExtJSONFile(b *testing.B, filename string) map[string]any {
Member

[question] The original drivers ticket aimed to use strong types for these tests. Should we do the same? See this comment: https://github.com/mongodb/mongo-go-driver/pull/2400/changes#r3312898922

Contributor Author

The updates of strong-typed benchmarks are in "deep_bson.json.gz". They are verified in #2403.

Contributor Author

This change should only be applied for verification in #2403.

Comment thread bson/benchmark_test.go
Comment on lines +156 to +159
var once sync.Once
var onceErr error
entryErr := make(map[string]error)
results := make(map[string]map[string]any)
Member

[blocking] These variables should be global variables to avoid re-parsing the tarball.

Contributor Author

I put them in a closure to reduce the number of global variables. They are initialized only once, when loadExtendedBSON is assigned the returned function at L161. When loadExtendedBSON is called, e.g., at L246, L250, or L254, the decompression and parsing are performed only once, guarded by sync.Once.Do().
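
The closure approach described in this reply can be sketched roughly as below. The PR's actual code is not shown on this page, so this is only an approximation: `newLoader`, `lookupFixture`, and `loads` are hypothetical names, and the load function is a stub.

```go
package main

import (
	"fmt"
	"sync"
)

// newLoader returns a lookup function whose cache and sync.Once live in the
// closure instead of in package-level variables.
func newLoader(load func() map[string]string) func(name string) (string, bool) {
	var once sync.Once
	var entries map[string]string
	return func(name string) (string, bool) {
		once.Do(func() { entries = load() })
		v, ok := entries[name]
		return v, ok
	}
}

// loads counts how often the stub load function runs.
var loads int

// lookupFixture is created at package initialization, but the load itself is
// deferred until the first call.
var lookupFixture = newLoader(func() map[string]string {
	loads++
	return map[string]string{"deep_bson.json": "{}"}
})

func main() {
	lookupFixture("deep_bson.json")
	lookupFixture("deep_bson.json")
	fmt.Println("loads:", loads)
}
```

The trade-off discussed in this thread is scope versus readability: the closure hides the state from the rest of the package, while package-level variables make the one-time initialization easier to see.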

Member

@qingyang-hu Oh I didn't see that loadExtendedBSON is acting as a package initializer. Is there a reason not to just make the variables global? The IIFE solution is hard to read. For example:

var (
    loadExtendedBSONOnce      sync.Once
    loadExtendedBSONErr       error
    loadExtendedBSONEntries   map[string]map[string]any
    loadExtendedBSONEntryErrs map[string]error
)

func loadExtendedBSON(tb testing.TB, filename string) map[string]any {
    tb.Helper()
    loadExtendedBSONOnce.Do(func() {
        // ... same loading logic ...
    })
    // ... same lookup logic ...
}
