GODRIVER-3868 Update extended bson cases.#2400
Conversation
🧪 Performance ResultsCommit SHA: b17879bThe following benchmark tests for version 6a1eda7b30aa0f00077a65e6 had statistically significant changes (i.e., |z-score| > 1.96):
For a comprehensive view of all microbenchmark results for this PR's commit, please check out the Evergreen perf task for this patch. |
API Change ReportNo changes found! |
There was a problem hiding this comment.
Pull request overview
Updates the driver’s benchmark fixtures to consume the Extended BSON benchmark data directly from the specifications git submodule tarball (extended_bson.tgz) instead of relying on locally maintained copies, aligning benchmark execution with upstream spec data.
Changes:
- Added logic in the internal benchmark framework to locate and extract a specific Extended JSON fixture from
extended_bson.tgz. - Updated
bsonpackage benchmarks to read fixtures fromextended_bson.tgz, caching parsed entries for reuse. - Renamed benchmark case labels from
*.json.gzto*.jsonto reflect the new fixture source format.
Reviewed changes
Copilot reviewed 2 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| internal/cmd/benchmark/benchmark_test.go | Loads Extended BSON benchmark fixtures directly from the specifications tarball for internal benchmark runs. |
| bson/benchmark_test.go | Replaces per-file gz fixture reads with tarball extraction + cached unmarshalling for BSON package benchmarks. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| tgzPath := filepath.Join(testdataDir(b), "specifications", "source", "benchmarking", "data", "extended_bson.tgz") | ||
|
|
||
| file, err := os.Open(tgzPath) | ||
| require.NoError(b, err, "failed to open %q", tgzPath) | ||
| defer file.Close() | ||
|
|
||
| gz, err := gzip.NewReader(file) | ||
| require.NoError(b, err, "failed to create gzip reader") | ||
| defer gz.Close() |
| tr := tar.NewReader(gz) | ||
| for { | ||
| hdr, err := tr.Next() | ||
| if errors.Is(err, io.EOF) { | ||
| break | ||
| } | ||
| if err != nil { | ||
| b.Fatalf("error reading tar: %s", err) | ||
| return nil |
| defer func() { | ||
| _ = gz.Close() | ||
| }() | ||
| if extJSONFiles == nil { |
There was a problem hiding this comment.
[nit] Suggest using sync.Once to help with readability:
func readExtJSONFile(b *testing.B, filename string) map[string]any {
b.Helper()
extJSONFilesOnce.Do(func() {
// load into extJSONFiles
})
v, ok := extJSONFiles["extended_bson/"+filename]
if !ok {
b.Fatalf("file %q not found in %q", filename, extendedBSONTGZ)
return nil
}
return v
}| return nil | ||
| } | ||
| defer func() { | ||
| _ = file.Close() |
There was a problem hiding this comment.
[blocking] We should check this error.
There was a problem hiding this comment.
I don't think the closing error affects loading, and we don't need to log it either.
| return nil | ||
| } | ||
| defer func() { | ||
| _ = gz.Close() |
There was a problem hiding this comment.
[blocking] We should check this error.
There was a problem hiding this comment.
I don't think the closing error affects loading, and we don't need to log it either.
| if err != nil { | ||
| panic(fmt.Sprintf("error reading GZIP contents of file: %s", err)) | ||
| } | ||
| extJSONFiles = make(map[string]map[string]any) |
There was a problem hiding this comment.
[nit] It doesn't matter in this case but for hygiene suggest populating a local map and assigning this at the end of sync.Once, to avoid partially filling and then erroring.
| if extJSONFiles == nil { | ||
| file, err := os.Open(extendedBSONTGZ) | ||
| if err != nil { | ||
| b.Fatalf("error opening %q: %s", extendedBSONTGZ, err) |
There was a problem hiding this comment.
| b.Fatalf("error opening %q: %s", extendedBSONTGZ, err) | |
| b.Fatalf("error opening %q: %v", extendedBSONTGZ, err) |
| } | ||
| data, err := io.ReadAll(tr) | ||
| if err != nil { | ||
| b.Fatalf("error reading tar entry %q: %s", hdr.Name, err) |
There was a problem hiding this comment.
| b.Fatalf("error reading tar entry %q: %s", hdr.Name, err) | |
| b.Fatalf("error reading tar entry %q: %v", hdr.Name, err) |
| } | ||
| var v map[string]any | ||
| if err = UnmarshalExtJSON(data, false, &v); err != nil { | ||
| b.Fatalf("error unmarshalling %q: %s", hdr.Name, err) |
There was a problem hiding this comment.
| b.Fatalf("error unmarshalling %q: %s", hdr.Name, err) | |
| b.Fatalf("error unmarshalling %q: %v", hdr.Name, err) |
| // readExtJSONFile reads the named JSON file from the extended_bson.tgz archive and returns it as a | ||
| // map[string]any. The first call decompresses the archive and caches all entries; subsequent calls | ||
| // only look up the cache. It calls b.Fatal on any errors. | ||
| func readExtJSONFile(b *testing.B, filename string) map[string]any { |
There was a problem hiding this comment.
[blocking] Use testing.TB instead of *testing.B so that the loader is agnostic. In addition, I think we should make a test for the loader to ensure the structure is what we expect.
There was a problem hiding this comment.
We can use the hard-coded file names underneath to cross-verify the structure in the tarball is what we expect.
| desc: "deep_bson.json.gz", | ||
| value: readExtJSONFile("deep_bson.json.gz"), | ||
| desc: "deep_bson.json", | ||
| value: readExtJSONFile(b, "deep_bson.json"), |
There was a problem hiding this comment.
[blocking] We don't need to hardcode the files, just iterate over the results of decompressing the tarball. We should rename readExtJSONFile to loadExtendedBSON. Then we should iterate over the results:
func TestLoadExtendedBSON(t *testing.T) {
loadExtendedBSON(t)
for filename, _ := range extJSONFiles {
t.Run(filename, func(t *testing.T) {
// Test / Benchmark
})
}
}| // readExtJSONFile reads the named JSON file from the extended_bson.tgz archive and returns it as a | ||
| // map[string]any. The first call decompresses the archive and caches all entries; subsequent calls | ||
| // only look up the cache. It calls b.Fatal on any errors. | ||
| func readExtJSONFile(b *testing.B, filename string) map[string]any { |
There was a problem hiding this comment.
[question] The original drivers ticket aimed to use strong types for these tests. Should we do the same? See this comment: https://github.com/mongodb/mongo-go-driver/pull/2400/changes#r3312898922
There was a problem hiding this comment.
The updates of strong-typed benchmarks are in "deep_bson.json.gz". They are verified in #2403.
There was a problem hiding this comment.
This change should only be applied for verification in #2403.
| var once sync.Once | ||
| var onceErr error | ||
| entryErr := make(map[string]error) | ||
| results := make(map[string]map[string]any) |
There was a problem hiding this comment.
[blocking] These variables should global variables to avoid re-parsing the tarball.
There was a problem hiding this comment.
I put them in a closure to reduce global variables. They are only invoked once before initializing loadExtendedBSON with the returned function at L161. When loadExtendedBSON is called, e.g., at L246, L250, or L254, the decompressing and parsing is performed once, which is guarded by the sync.Once.Do().
There was a problem hiding this comment.
@qingyang-hu Oh I didn't see that loadExtendedBSON is acting as a package initializer. Is there a reason not to just make the variables global? The IIFE solution is hard to read. For example:
var (
loadExtendedBSONOnce sync.Once
loadExtendedBSONErr error
loadExtendedBSONEntries map[string]map[string]any
loadExtendedBSONEntryErrs map[string]error
)
func loadExtendedBSON(tb testing.TB, filename string) map[string]any {
tb.Helper()
loadExtendedBSONOnce.Do(func() {
// ... same loading logic ...
})
// ... same lookup logic ...
}
GODRIVER-3868
Summary
Update the benchmark framework to run directly against the specifications sub-repo.
This PR decompresses the tgz file and unmarshals the embedded JSON files.
Background & Motivation
We no longer need to maintain local copies for test cases because we use a sub-repo for the test data.