
Swap encryption pr6 redis #6824

Open
DevVegeta wants to merge 21 commits into GoogleCloudPlatform:master from ashishsuneja:swap-encryption-pr6-redis

@DevVegeta (Collaborator)

Add GCP/AWS support and swap toggle to kubernetes_redis_memtier_benchmark

Extend the benchmark to run on both GCP (GKE, the default) and AWS (EKS).
Pass --cloud=AWS to target EKS; pass --cloud=GCP (or omit the flag) to target GKE.

BENCHMARK_CONFIG:

  • Add AWS vm_specs (m5.xlarge for servers, c5.8xlarge for clients) alongside
    the existing GCP entries, so the benchmark runs on either cloud without a
    user config override for the nodepools.

Swap toggle (--kubernetes_redis_memtier_swap_enabled):

  • GCP: upgrades the servers nodepool to n4-highmem-32 + hyperdisk-balanced
    (160k IOPS / 2400 MiB/s) and injects the GCP iops/throughput into swap_config.
  • AWS: upgrades the servers nodepool to r6i.8xlarge + gp3
    (16k IOPS / 1000 MiB/s) and injects the AWS iops/throughput into swap_config.
  • Shared swap_config keys (enabled, plus the sysctls swappiness,
    min_free_kbytes=67584, and watermark_scale_factor) are applied to both clouds.
  • GCP-only and AWS-only disk params are guarded per cloud in the vm_spec
    loop; no GCP params leak into AWS config and vice versa.

GetConfig(): a cloud-specific if/elif branch (GCP / AWS) selects the machine
type and disk settings; boot_disk_iops/throughput are set per cloud in swap_config.

_SwapMetadata(): cloud-aware via FLAGS.cloud -- samples carry the correct
machine type and disk constants for GCP or AWS runs.

Protocol fix: an inline guard on FLAGS['memtier_protocol'].present ensures the
Redis protocol is used unless --memtier_protocol is passed explicitly, avoiding
the silent zero-ops results caused by the memcache_binary default.
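A minimal sketch of that guard, assuming it reduces to "use the flag value only when the user set it". The function name is hypothetical, and a plain boolean stands in for absl's FLAGS['memtier_protocol'].present so the example runs without absl.

```python
def resolve_memtier_protocol(flag_present: bool, flag_value: str) -> str:
    """Chooses the memtier_benchmark protocol for a Redis run.

    flag_present plays the role of FLAGS['memtier_protocol'].present, which is
    truthy only when the user passed --memtier_protocol on the command line.
    """
    if flag_present:
        # Respect an explicit user choice, even a non-Redis protocol.
        return flag_value
    # Otherwise ignore the flag's default (memcache_binary per this PR) and
    # speak the Redis protocol, which is what the Redis server understands.
    return 'redis'
```

Checking `.present` rather than comparing against the default value distinguishes "user asked for memcache_binary" from "user said nothing".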

Timeout fix: fall back to a 3600 s timeout when MEMTIER_RUN_DURATION is None
(request-count mode) instead of failing on the None value.
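The None-safe fallback can be sketched like this. The function name and the `margin_s` parameter are hypothetical; the PR does not say how the timeout is derived when a duration is set, only that 3600 s is used when it is None.

```python
from typing import Optional

_FALLBACK_TIMEOUT_S = 3600  # fallback value quoted in this PR


def memtier_timeout(run_duration_s: Optional[int], margin_s: int = 0) -> int:
    """Returns a command timeout that is safe when no duration is configured.

    run_duration_s is None in request-count mode, so there is no duration to
    derive a timeout from; the fixed fallback is used instead of raising a
    TypeError on None + int.
    """
    if run_duration_s is None:
        return _FALLBACK_TIMEOUT_S
    return run_duration_s + margin_s
```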

TypeVar: replaced PEP 695 [T] syntax (Python 3.12+) with typing.TypeVar for Python 3.10 compatibility.
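For reference, the two spellings of a generic function look like this. The function `first_match` is a hypothetical example, not code from the PR; the PEP 695 form is shown only as a comment because it is a syntax error before Python 3.12.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar('T')


# PEP 695 spelling (parses only on Python 3.12+):
#   def first_match[T](items: Sequence[T], pred: Callable[[T], bool]) -> T | None:
# Equivalent TypeVar spelling, which also runs on Python 3.10:
def first_match(items: Sequence[T], pred: Callable[[T], bool]) -> Optional[T]:
    """Returns the first item satisfying pred, or None if none does."""
    for item in items:
        if pred(item):
            return item
    return None
```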

Note: EKS swap activation via nodeadm (EksSwapConfig._Create()) is a stub
deferred to PR #6780. The benchmark runs on AWS but swap is not activated
on EKS nodes until that PR lands.

Command:
GCP:
python pkb.py --benchmarks=kubernetes_redis_memtier --project= --zones= --kubernetes_redis_memtier_swap_enabled=True

DevVegeta and others added 21 commits June 19, 2025 10:46
…il.IssueCommand

Replace all raw ['gcloud', ...] list + vm_util.IssueCommand calls in
swap_encryption_benchmark.py with PKB's existing GcloudCommand infrastructure:

- _create_benchmark_node_pool: cluster._GcloudCommand() + cmd.flags + cmd.Issue
- _delete_default_node_pool: cluster._GcloudCommand() + cmd.Issue
- _attach_swap_disk: gcp_util.GcloudCommand(_GcpZonalResource) for create+attach
- _delete_disk_by_name: gcp_util.GcloudCommand for describe/detach/delete

Add _GcpZonalResource shim: pins zone for gcloud compute operations.
GcloudCommand auto-injects --project and --zone/--region and handles auth
token refresh, matching PKB conventions.
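The idea behind the zone-pinning shim can be sketched without PKB. Everything here is hypothetical: `ZonalResourceShim` stands in for `_GcpZonalResource`, and `build_gcloud_argv` only imitates the --project/--zone injection attributed to GcloudCommand above; it is not PKB's implementation and omits auth handling.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass
class ZonalResourceShim:
    """Hypothetical stand-in for _GcpZonalResource: pins project and zone."""
    project: Optional[str]
    zone: str


def build_gcloud_argv(resource: ZonalResourceShim, *args: str) -> List[str]:
    """Builds a gcloud argv with --project/--zone taken from the resource.

    Illustrates the injection behaviour described above; this is not PKB's
    GcloudCommand implementation.
    """
    argv = ['gcloud', *args]
    if resource.project:
        argv.append(f'--project={resource.project}')
    # The shim always pins the zone, so zonal commands never fall back to
    # whatever zone the local gcloud config happens to hold.
    argv.append(f'--zone={resource.zone}')
    return argv
```

Passing a small object that only carries `project` and `zone` lets disk create/attach/delete calls reuse the same flag-injection path as the cluster without needing a full cluster object.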
…fix imports

Replace manual temp-file + kubectl apply in _deploy_daemonset() with
PKB's kubernetes_commands.ApplyManifest():

- Remove _daemonset_yaml() helper
- _deploy_daemonset() delegates to kubernetes_commands.ApplyManifest(
    'cluster/swap_encryption_daemonset.yaml.j2', **kwargs)
- Add kubernetes_commands import; remove vm_util import (now unused)
- Fix import order: providers.gcp before resources.container_service
… remove cgroup hack

Address Ajay's review comments on PR GoogleCloudPlatform#6776:

Comment #r3457877984 (linuxConfig.swapConfig):
Extend --system-config-from-file YAML with linuxConfig blocks:
  linuxConfig.swapConfig.enabled: true -- GKE sets up node-level swap
  dedicatedLocalSsdProfile.diskCount: N -- LSSD: use local NVMe for swap
  linuxConfig.sysctl: vm.swappiness=100, vm.min_free_kbytes=200,
    vm.watermark_scale_factor=500
Ref: https://cloud.google.com/kubernetes-engine/docs/how-to/node-memory-swap
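A sketch of generating the system-config file described above. The key layout (swapConfig and sysctl under linuxConfig, dedicatedLocalSsdProfile.diskCount inside swapConfig) follows this commit message; the function name, the string-valued sysctls, and the defaults are assumptions to check against the linked GKE page. Output is JSON, which YAML parsers accept, so no YAML library is needed.

```python
import json


def gke_swap_system_config(swappiness: int = 100,
                           min_free_kbytes: int = 200,
                           watermark_scale_factor: int = 500,
                           local_ssd_count: int = 0) -> str:
    """Returns text for --system-config-from-file enabling node swap.

    Defaults mirror the sysctl values listed in this commit message.
    """
    swap_config = {'enabled': True}
    if local_ssd_count:
        # LSSD variant: back swap with dedicated local NVMe disks.
        swap_config['dedicatedLocalSsdProfile'] = {'diskCount': local_ssd_count}
    config = {
        'linuxConfig': {
            'swapConfig': swap_config,
            # Assumption: sysctl values are written as strings.
            'sysctl': {
                'vm.swappiness': str(swappiness),
                'vm.min_free_kbytes': str(min_free_kbytes),
                'vm.watermark_scale_factor': str(watermark_scale_factor),
            },
        },
    }
    # JSON is valid YAML, so this text can be written straight to the file.
    return json.dumps(config, indent=2)
```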

Comment #r3457928855 (cgroup hack):
Remove memory.swap.max=max loop from swap_encryption_daemonset.yaml.j2.
With kubeletConfig.memorySwapBehavior=LimitedSwap the kubelet manages
per-container swap allocation; the cgroup hack is unnecessary.
…5); manifest moved to data/cluster and rendered via vm_util
Per Ajay's review comment on PR GoogleCloudPlatform#6758:
- Add _GKE_KUBELET_MEMORY_SWAP flag (default LimitedSwap) so the
  benchmark nodepool is created with kubeletConfig.memorySwapBehavior
  set via --system-config-from-file, enabling pod-level swap usage.
- Wrap gcloud IssueCommand in try/finally to clean up the temp YAML.
- Update nodepool creation log to include kubelet_swap value.
- Add SwapDaemonSet(resource.BaseResource) in resources/container_service/swap_daemonset.py
  - _Create(): apply Jinja2 manifest + wait for Running + /tmp/pkb_ready
  - _Delete(): in-pod swapoff/dmsetup/losetup/pkill teardown; kubectl delete
  - PodExec(): transient-reset retry, rc=137 OOM detection, pod recovery
- Add SwapNodePool(resource.BaseResource) in resources/container_service/swap_nodepool.py
  - _Create(): gcloud node-pools create with linuxConfig.swapConfig + optional swap disk
  - _Delete(): detach+delete disk; delete nodepool
  - DeleteDefaultPool(): remove dummy e2-medium pool after DaemonSet pod Running
- Rewrite benchmark to thin pattern: Prepare() uses resource.Create() + spec.resources
  - Cleanup() is empty; the PKB framework auto-deletes spec.resources
  - Run() uses daemonset.PodExec() throughout
- Addresses Zac's review: resources pattern, no infrastructure code in the benchmark file
- Fix COS_CONTAINERD -> UBUNTU_CONTAINERD (r3472549985)
- swapConfig auto-enables memorySwapBehavior=LimitedSwap (r3472513706)
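The PodExec() retry and rc=137 handling listed above can be sketched as a generic wrapper. This is a hypothetical illustration: `exec_fn` stands in for whatever runs `kubectl exec`, every non-zero code other than 137 is treated as transient, and the pod-recovery step is not modeled. Exit code 137 is 128 + 9 (SIGKILL), which is what an OOM-killed process typically reports.

```python
import time
from typing import Callable, Tuple

# (stdout, stderr, returncode) from running one command inside the pod.
ExecFn = Callable[[str], Tuple[str, str, int]]

_OOM_KILLED_RC = 137  # 128 + SIGKILL(9); typical of an OOM-killed process


class PodOomKilledError(RuntimeError):
    """Raised when the exec'd command was SIGKILLed (assumed OOM)."""


def pod_exec_with_retry(exec_fn: ExecFn, cmd: str, retries: int = 3,
                        backoff_s: float = 1.0) -> str:
    """Runs cmd via exec_fn, retrying failures other than an OOM kill."""
    last_err = ''
    for attempt in range(retries):
        stdout, stderr, rc = exec_fn(cmd)
        if rc == 0:
            return stdout
        if rc == _OOM_KILLED_RC:
            # Retrying an OOM kill would just be killed again; surface it.
            raise PodOomKilledError(f'{cmd!r} exited 137: {stderr}')
        last_err = stderr
        if attempt < retries - 1:
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f'{cmd!r} failed after {retries} attempts: {last_err}')
```

Separating the OOM case from transient failures matters for a swap benchmark: an OOM kill is a result worth reporting, not noise to retry away.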
… NodepoolSpec field

BREAKING: replaces SwapNodePool (standalone nodepool lifecycle) with the
correct PKB pattern: swap configuration declared in BENCHMARK_CONFIG and
applied by the existing GKE cluster creation flow.

New files:
- resources/container_service/swap_config.py
  - GkeSwapConfig(BaseResource): WriteLinuxConfigYaml(), ValidHyperdiskThroughput()
  - EksSwapConfig(BaseResource): stub for nodeadm config (deferred to PR GoogleCloudPlatform#6780)

Core framework changes:
- configs/container_spec.py: add SwapConfigSpec(BaseSpec) + _SwapConfigDecoder
  + swap_config field on NodepoolSpec
- resources/container_service/container.py: add swap_config attr to BaseNodePoolConfig
- resources/container_service/container_cluster.py: propagate swap_config in
  _InitializeNodePool() (mirrors sandbox_config pattern)
- providers/gcp/google_kubernetes_engine.py: _AddNodeParamsToCmd() reads
  nodepool_config.swap_config - applies --system-config-from-file,
  UBUNTU_CONTAINERD, --no-enable-autorepair, boot-disk-provisioned-iops/throughput
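What _AddNodeParamsToCmd() adds for a swap-enabled nodepool can be sketched as a pure function over an argv list. The dataclass and function names are hypothetical; the gcloud flags are the ones listed in this commit message, and the flag spellings `--image-type` and `--boot-disk-provisioned-iops/-throughput` are assumed from the gcloud node-pools create reference.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass
class SwapConfigSketch:
    """Hypothetical subset of a nodepool's swap_config."""
    system_config_path: str
    boot_disk_iops: Optional[int] = None
    boot_disk_throughput: Optional[int] = None  # MiB/s


def add_swap_node_params(cmd_args: List[str],
                         swap_config: Optional[SwapConfigSketch]) -> List[str]:
    """Returns cmd_args plus the swap-related node-pool flags, if any."""
    out = list(cmd_args)  # do not mutate the caller's list
    if swap_config is None:
        return out
    out += [
        f'--system-config-from-file={swap_config.system_config_path}',
        '--image-type=UBUNTU_CONTAINERD',  # needed for dm-crypt per this PR
        '--no-enable-autorepair',
    ]
    if swap_config.boot_disk_iops is not None:
        out.append(f'--boot-disk-provisioned-iops={swap_config.boot_disk_iops}')
    if swap_config.boot_disk_throughput is not None:
        out.append('--boot-disk-provisioned-throughput='
                   f'{swap_config.boot_disk_throughput}')
    return out
```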

Thin benchmark:
- BENCHMARK_CONFIG declares the benchmark nodepool with swap_config (no separate
  nodepool creation is needed; GKE cluster creation handles it)
- Prepare(): deploy SwapDaemonSet + delete default-pool
- Run(): verify swap_active + swap_encrypted; report samples
- Cleanup(): empty (PKB auto-deletes spec.resources)
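One way Run() could derive swap_active and swap_encrypted is sketched below. This is a heuristic under stated assumptions, not the PR's check: it parses /proc/swaps and `dmsetup table` output captured from the node, treats swap as active if /proc/swaps lists any device, and as encrypted only if every swap device is a device-mapper node and a `crypt` target exists. The function names are hypothetical.

```python
from typing import Dict, List


def parse_proc_swaps(text: str) -> List[str]:
    """Returns swap device paths from /proc/swaps content."""
    lines = text.strip().splitlines()
    # Skip the header line ("Filename Type Size Used Priority").
    return [line.split()[0] for line in lines[1:] if line.strip()]


def swap_status(proc_swaps: str, dmsetup_table: str) -> Dict[str, bool]:
    """Heuristic swap_active / swap_encrypted flags for reporting."""
    devices = parse_proc_swaps(proc_swaps)
    active = bool(devices)
    # dm-crypt swap shows up as a device-mapper node in /proc/swaps.
    on_dm = all(d.startswith(('/dev/dm-', '/dev/mapper/')) for d in devices)
    # dmsetup table lines look like "name: start length target args...".
    has_crypt_target = any(
        len(line.split()) > 3 and line.split()[3] == 'crypt'
        for line in dmsetup_table.splitlines())
    return {
        'swap_active': active,
        'swap_encrypted': active and on_dm and has_crypt_target,
    }
```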

Addresses Ajay's reviews:
- r3457826290: swap as base resource plugged into GKE cluster creation flow
- r3457877984: linuxConfig.swapConfig via --system-config-from-file (GkeSwapConfig)
- r3457928855: removed memory.swap.max hack
- r3457964593: UBUNTU_CONTAINERD set per-nodepool in _AddNodeParamsToCmd
- r3472513706: swapConfig auto-enables memorySwapBehavior=LimitedSwap
- r3472549985: UBUNTU_CONTAINERD required for dm-crypt
GkeSwapConfig and EksSwapConfig now both inherit from BaseSwapConfig(BaseResource).
Common sysctl attrs (swappiness, min_free_kbytes, watermark_scale_factor) live in
the base class. Cloud-specific attrs remain in each subclass.

Addresses Zac's review: GkeSwapConfig and EksSwapConfig should inherit from BaseSwapConfig.
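The class hierarchy described above can be sketched as follows. `BaseResourceStub` is a minimal stand-in for PKB's resource.BaseResource so the example runs on its own; `SysctlSettings()` and the no-op GKE `_Create()` are hypothetical. The EKS `_Create()` raises to reflect that nodeadm activation is still a stub pending PR #6780.

```python
import abc
from typing import Dict


class BaseResourceStub(abc.ABC):
    """Minimal stand-in for PKB's resource.BaseResource (assumption)."""

    @abc.abstractmethod
    def _Create(self) -> None:
        """Provisions the resource."""

    @abc.abstractmethod
    def _Delete(self) -> None:
        """Tears the resource down."""


class BaseSwapConfig(BaseResourceStub):
    """Holds the sysctl settings shared by every cloud."""

    def __init__(self, swappiness: int, min_free_kbytes: int,
                 watermark_scale_factor: int):
        self.swappiness = swappiness
        self.min_free_kbytes = min_free_kbytes
        self.watermark_scale_factor = watermark_scale_factor

    def SysctlSettings(self) -> Dict[str, str]:
        return {
            'vm.swappiness': str(self.swappiness),
            'vm.min_free_kbytes': str(self.min_free_kbytes),
            'vm.watermark_scale_factor': str(self.watermark_scale_factor),
        }


class GkeSwapConfig(BaseSwapConfig):
    """Adds GKE-only boot-disk performance settings."""

    def __init__(self, boot_disk_iops: int, boot_disk_throughput: int,
                 **common):
        super().__init__(**common)
        self.boot_disk_iops = boot_disk_iops
        self.boot_disk_throughput = boot_disk_throughput

    def _Create(self) -> None:
        # Assumed no-op: GKE applies swap during nodepool creation.
        pass

    def _Delete(self) -> None:
        pass


class EksSwapConfig(BaseSwapConfig):
    """Placeholder: nodeadm-based swap activation is deferred (PR #6780)."""

    def _Create(self) -> None:
        raise NotImplementedError('EKS swap via nodeadm is not implemented yet')

    def _Delete(self) -> None:
        pass
```

Putting the sysctl attributes in the base class means both providers render identical kernel tuning, while disk and bootstrap details stay in the subclass that owns them.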
…etries arg, suppress invalid-name on from_spec base class definition