aiming-lab · Wbaker7702 · Dec 4, 2025
diff --git a/Agent0/ENTERPRISE_GUIDE.md b/Agent0/ENTERPRISE_GUIDE.md
@@ -0,0 +1,64 @@
+# Enterprise Readiness Guide
+
+This guide upgrades the operational UX for **Agent0** deployments by documenting security integration, compliance expectations, lint/audit routines, and reproducible builds.
+
+## 1. UX / Operational Quality of Life
+- **Standardized environment variables**: use a `.env` or secret manager so local and CI setups share the same configuration keys.
+- **Clear paths**: keep all runtime artifacts under a single root (e.g., `$STORAGE_PATH`) to simplify cleanup and audits.
+- **Runbook-first**: keep the primary workflow in a single script or Make target to reduce tribal knowledge.
+
+### Suggested environment variables
+| Variable | Purpose |
+| --- | --- |
+| `STORAGE_PATH` | Central location for artifacts and checkpoints. |
+| `HUGGINGFACENAME` | Hugging Face token or username. |
+| `WANDB_API_KEY` | Weights & Biases API key. |
+| `SANDBOX_API_URLS` | Comma-separated list of sandbox endpoints for tool execution. |
+
+## 2. Security Integration
+- **Secrets management**: load credentials through your enterprise secret manager; avoid `.env` in production.
+- **Network policy**: restrict outbound access from training workers to only model, logging, and sandbox endpoints.
+- **Artifact integrity**: store checkpoints in immutable object storage with bucket versioning enabled.
+- **Sandbox isolation**: treat the sandbox service as untrusted execution; use network isolation and per-request rate limiting.
+
+## 3. Compliance & Audit
+- **Data lineage**: log dataset versions, question generation seeds, and filtering thresholds for every training run.
+- **Model governance**: keep a manifest with model hash, base model ID, and training configuration.
+- **Access control**: enforce RBAC on checkpoints, logs, and sandbox services.
+- **Retention**: define retention policies for generated data and intermediate artifacts.
+
+## 4. Linting & Audit Checklist
+Use these as baseline checks in CI (adjust for your environment). The accompanying `Makefile` provides `lint` and `audit` targets for quick local runs.
+
+- **Python linting**: `ruff` or `flake8` for style and static issues.
+- **Type checks**: `mypy` for critical modules.
+- **Dependency audit**: `pip-audit` or `safety` for known CVEs.
+- **License scan**: `pip-licenses` to ensure dependency compliance.
+
+## 5. Build & Release Hygiene
+- **Reproducible builds**: pin all dependencies in `requirements.txt` and use a lockfile for CI.
+- **Immutable tags**: tag releases with model checkpoint hashes.
+- **Container build**: prefer a single base image for all training and evaluation jobs to avoid drift.
+
+### Example CI sequence
+```bash
+python -m pip install -r requirements.txt
+python -m pip install ruff mypy pip-audit pip-licenses
+ruff check .
+mypy .
+pip-audit
+pip-licenses --format=markdown
+```
+
+### Example local sequence
+```bash
+python -m pip install ruff mypy pip-audit pip-licenses
+make lint
+make audit
+make build
+```
+
+## 6. Suggested Enhancements (Roadmap)
+- Extend the `Makefile` (or add a `taskfile.yml`) with further standardized commands such as `train` and `evaluate`, alongside the existing `lint`, `audit`, and `build` targets.
+- Add a `SECURITY.md` with responsible disclosure process and contact info.
+- Add CI workflows for linting and dependency audits.
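Reviewer sketch (not part of the patch): sections 1 and 3 of the guide describe a shared set of environment variables and a per-run manifest with model hash, base model ID, and training configuration. One way this could look is shown below. The helper names (`missing_env_vars`, `parse_sandbox_urls`, `sha256_of_file`, `build_manifest`), the manifest field names, and the choice to treat all four variables as required are assumptions for illustration and do not come from the Agent0 codebase.

```python
import hashlib
import json
import os
from pathlib import Path
from typing import Dict, List, Mapping

# Variable names come from the guide's table; treating all four as required
# is an assumption made for this sketch.
REQUIRED_VARS = (
    "STORAGE_PATH",
    "HUGGINGFACENAME",
    "WANDB_API_KEY",
    "SANDBOX_API_URLS",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def parse_sandbox_urls(raw: str) -> List[str]:
    """Split the comma-separated SANDBOX_API_URLS value, dropping blanks."""
    return [url.strip() for url in raw.split(",") if url.strip()]


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a checkpoint file in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(
    checkpoint: Path,
    base_model_id: str,
    dataset_version: str,
    seed: int,
    training_config: Dict[str, object],
) -> Dict[str, object]:
    """Collect governance and lineage fields (hypothetical schema)."""
    return {
        "model_sha256": sha256_of_file(checkpoint),
        "base_model_id": base_model_id,
        "dataset_version": dataset_version,
        "question_generation_seed": seed,
        "training_config": training_config,
    }
```

A training entry point could call `missing_env_vars()` at startup and fail fast if the list is non-empty, then write `json.dumps(build_manifest(...), indent=2)` next to the checkpoint under `$STORAGE_PATH`.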
diff --git a/Agent0/Makefile b/Agent0/Makefile
@@ -0,0 +1,12 @@
+.PHONY: lint audit build
+
+lint:
+	python -m ruff check .
+	python -m mypy .
+
+audit:
+	pip-audit
+	pip-licenses --format=markdown
+
+build:
+	python -m pip install -r requirements.txt --dry-run
diff --git a/Agent0/README.md b/Agent0/README.md
@@ -119,4 +119,8 @@ If you find this work helpful, please consider citing our paper:
   author={Xia, Peng and Zeng, Kaide and Liu, Jiaqi and Qin, Can and Wu, Fang and Zhou, Yiyang and Xiong, Caiming and Yao, Huaxiu},
   journal={arXiv preprint arXiv:2511.16043},
   year={2025}
-}
+}
+```
+
+## 🏢 Enterprise Readiness
+For security integration, compliance guidance, linting, and reproducible build recommendations, see [ENTERPRISE_GUIDE.md](./ENTERPRISE_GUIDE.md).
diff --git a/Agent0/curriculum_train/examples/reward_function/curriculum_reward.py b/Agent0/curriculum_train/examples/reward_function/curriculum_reward.py
@@ -1,4 +1,4 @@
-# Copyright 2024 Bytedance Ltd. and/or its affiliates
+# Copyright 2024-2026 Bytedance Ltd. and/or its affiliates
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -27,13 +27,17 @@
 from sklearn.cluster import AgglomerativeClustering
 import numpy as np
 
-STORAGE_PATH = os.getenv("STORAGE_PATH","")
+STORAGE_PATH = os.getenv("STORAGE_PATH", "")
+
 
 def _bleu_distance_matrix(sentences):
     n = len(sentences)
     dist = np.zeros((n, n))
     smoother = SmoothingFunction().method1
-    for i in tqdm(range(n), desc="  - Calculating BLEU distance matrix", leave=False):
+    for i in tqdm(
+            range(n),
+            desc="  - Calculating BLEU distance matrix",
+            leave=False):
         for j in range(i, n):
             if i == j:
                 score = 1.0
@@ -44,66 +48,81 @@ def _bleu_distance_matrix(sentences):
             dist[i, j] = dist[j, i] = 1 - score
     return dist
 
+
 def cluster_share_per_problem(
-        problems,
-        distance_threshold: float = 0.5,
-        linkage: str = "average"):
+    problems, distance_threshold: float = 0.5, linkage: str = "average"
+):
     if not problems:
         return []
-    print('start clustering')
+    print("start clustering")
     start_time = time.time()
     dist_mat = _bleu_distance_matrix(problems)
 
     clustering = AgglomerativeClustering(
         n_clusters=None,
         distance_threshold=distance_threshold,
         metric="precomputed",
-        linkage=linkage
+        linkage=linkage,
     )
     labels = clustering.fit_predict(dist_mat)
-    print(f'end clustering, time: {time.time() - start_time}')
+    print(f"end clustering, time: {time.time() - start_time}")
     total = len(problems)
     cluster_size = Counter(labels)
     cluster_ratio = {lab: sz / total for lab, sz in cluster_size.items()}
 
     proportions = [cluster_ratio[lab] for lab in labels]
     return proportions
 
+
 def generate_temp_filename(prefix="temp", suffix=".json"):
     timestamp = int(time.time() * 1000)
     rand_part = random.randint(0, 99999)
     return f"{STORAGE_PATH}/temp_results/{prefix}_{timestamp}_{rand_part}{suffix}"
+
+
 def split_list(lst, n=4):
     k, m = divmod(len(lst), n)
-    return [lst[i*k + min(i, m):(i+1)*k + min(i+1, m)] for i in range(n)]
+    return [lst[i * k + min(i, m): (i + 1) * k + min(i + 1, m)]
+            for i in range(n)]
+
 
 os.environ["NO_PROXY"] = "0.0.0.0,127.0.0.1"
 
-def fetch(index,i):
-    response = requests.get(f"http://0.0.0.0:{5000+index}/hello?name={i}")
+
+def fetch(index, i):
+    response = requests.get(f"http://0.0.0.0:{5000 + index}/hello?name={i}")
     return True
 
+
 def generate_results(data):
-    datas = split_list(data,4)
-    random_names = [generate_temp_filename(prefix=f"temp_{i}", suffix=".json") for i in range(4)]
+    datas = split_list(data, 4)
+    random_names = [
+        generate_temp_filename(
+            prefix=f"temp_{i}",
+            suffix=".json") for i in range(4)]
     for i in range(4):
-        with open(random_names[i],'w') as f:
-            json.dump(datas[i],f,indent=4)
+        with open(random_names[i], "w") as f:
+            json.dump(datas[i], f, indent=4)
 
     final_results = []
     with ThreadPoolExecutor(max_workers=4) as executor:
-        futures = [executor.submit(fetch, i,random_names[i]) for i in range(4)]
+        futures = [executor.submit(fetch, i, random_names[i])
+                   for i in range(4)]
 
-        for future in tqdm(as_completed(futures), total=len(futures), desc="  - Servers processing"):
-            future.result() # Simplified to just get the result
+        for future in tqdm(
+                as_completed(futures),
+                total=len(futures),
+                desc="  - Servers processing"):
+            future.result()  # Simplified to just get the result
 
     for i in tqdm(range(4), desc="  - Reading result files", leave=False):
-        with open(random_names[i].replace('.json','_results.json'),'r') as f:
+        with open(random_names[i].replace(".json", "_results.json"), "r") as f:
             final_results.extend(json.load(f))
     for i in range(4):
-        os.remove(random_names[i].replace('.json','_results.json'))
+        os.remove(random_names[i].replace(".json", "_results.json"))
     return final_results
 
+
 def format_reward(predict: str) -> float:
     pattern = re.compile(r"<think>.*</think>.*\\boxed\{.*\}.*", re.DOTALL)
     format_match = re.fullmatch(pattern, predict)
@@ -114,7 +133,11 @@ def accuracy_reward(predict: str, ground_truth: str) -> float:
     answer = extract_boxed_content(predict)
     return 1.0 if grade_answer(answer, ground_truth) else 0.0
 
-def calculate_tool_reward(predict: str, weight: float = 0.05, cap: int = 4) -> float:
+
+def calculate_tool_reward(
+        predict: str,
+        weight: float = 0.05,
+        cap: int = 4) -> float:
     if not predict:
         return 0.0
 
@@ -125,28 +148,54 @@ def calculate_tool_reward(predict: str, weight: float = 0.05, cap: int = 4) -> f
     return capped_calls * weight
 
 
-def compute_score(predicts: List[str], ground_truths: List[str], format_weight: float = 0.1, file_path: str = "") -> List[Dict[str, float]]:
+def compute_score(
+    predicts: List[str],
+    ground_truths: List[str],
+    format_weight: float = 0.1,
+    file_path: str = "",
+) -> List[Dict[str, float]]:
     results = []
-    with open('test.json','w') as f:
-        json.dump(predicts,f,indent=4)
+    with open("test.json", "w") as f:
+        json.dump(predicts, f, indent=4)
     for i in tqdm(range(len(predicts)), desc=" - Parsing predictions"):
-        questions = re.findall(r"<question>(.*?)</question>", predicts[i], re.DOTALL)
+        questions = re.findall(
+            r"<question>(.*?)</question>",
+            predicts[i],
+            re.DOTALL)
         answers = extract_boxed_content(predicts[i])
         if questions and answers:
             try:
                 question = questions[-1].strip()
                 answer = answers[-1].strip()
                 results.append({"question": question, "answer": answer})
-            except:
+            except Exception:
                 results.append({"question": "", "answer": ""})
         else:
             results.append({"question": "", "answer": ""})
 
     final_results = generate_results(results)
-    penalty = cluster_share_per_problem([result['question'] for result in final_results], distance_threshold=0.5)
+    penalty = cluster_share_per_problem(
+        [result["question"] for result in final_results], distance_threshold=0.5
+    )
     assert len(penalty) == len(final_results)
     scores = []
-    for i in tqdm(range(len(final_results)), desc=" - Calculating final scores"):
-        final_score = (min(final_results[i]["score"],1-final_results[i]["score"]) if final_results[i]['question'] else -1)-penalty[i]+calculate_tool_reward(predicts[i])
-        scores.append({"overall": final_score,"format": 1 if final_results[i]['question'] else 0,"accuracy": penalty[i],"tool_reward": calculate_tool_reward(predicts[i])})
-    return scores
+    for i in tqdm(range(len(final_results)),
+                  desc=" - Calculating final scores"):
+        final_score = (
+            (
+                min(final_results[i]["score"], 1 - final_results[i]["score"])
+                if final_results[i]["question"]
+                else -1
+            )
+            - penalty[i]
+            + calculate_tool_reward(predicts[i])
+        )
+        scores.append(
+            {
+                "overall": final_score,
+                "format": 1 if final_results[i]["question"] else 0,
+                "accuracy": penalty[i],
+                "tool_reward": calculate_tool_reward(predicts[i]),
+            }
+        )
+    return scores
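Reviewer sketch (not part of the patch): the per-sample expression at the end of `compute_score` combines three terms. It takes `min(score, 1 - score)` (or `-1` when no question was parsed), subtracts the share of the batch that falls in the same BLEU cluster, and adds the tool reward. The sketch below restates that arithmetic on plain numbers. The names `cluster_shares` and `curriculum_score` are hypothetical, cluster labels are passed in directly instead of being computed with `AgglomerativeClustering`, and `score` is assumed to lie in [0, 1], which the diff does not show.

```python
from collections import Counter
from typing import List, Sequence


def cluster_shares(labels: Sequence[int]) -> List[float]:
    """For each item, the fraction of all items sharing its cluster label."""
    total = len(labels)
    sizes = Counter(labels)
    return [sizes[label] / total for label in labels]


def curriculum_score(
    score: float,
    has_question: bool,
    cluster_share: float,
    tool_reward: float,
) -> float:
    """Restatement of the `final_score` expression from the diff."""
    # min(score, 1 - score) is largest at score == 0.5 and zero at 0 or 1;
    # a sample without a parsed question gets a flat -1 instead.
    base = min(score, 1 - score) if has_question else -1
    return base - cluster_share + tool_reward


shares = cluster_shares([0, 0, 1, 2])
print(shares)
print(curriculum_score(0.5, True, shares[2], 0.1))
```

Under these assumptions, a question whose `score` is near 0.5 and which sits in a small cluster receives the highest value, while near-duplicates in a large cluster are penalized. Note that in the patched code the `accuracy` key of each returned dict holds this cluster share, not an accuracy value.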
diff --git a/Agent0/curriculum_train/examples/reward_function/math.py b/Agent0/curriculum_train/examples/reward_function/math.py
@@ -1,4 +1,4 @@
-# Copyright 2024 Bytedance Ltd. and/or its affiliates
+# Copyright 2024-2026 Bytedance Ltd. and/or its affiliates
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -28,19 +28,24 @@ def accuracy_reward(predict: str, ground_truth: str) -> float:
     answer = extract_boxed_content(predict)
     try:
         return 1.0 if grade_answer(answer, ground_truth) else 0.0
-    except:
+    except Exception:
         return 0.0
 
 
-def compute_score(predicts: List[str], ground_truths: List[str], format_weight: float = 0.1) -> List[Dict[str, float]]:
+def compute_score(
+    predicts: List[str], ground_truths: List[str], format_weight: float = 0.1
+) -> List[Dict[str, float]]:
     scores = []
     for predict, ground_truth in zip(predicts, ground_truths):
-        predict = re.sub(r"\s*(<|>|/)\s*", r"\1", predict)  # handle qwen2.5vl-32b format
+        predict = re.sub(
+            r"\s*(<|>|/)\s*", r"\1", predict
+        )  # handle qwen2.5vl-32b format
         format_score = format_reward(predict)
         accuracy_score = accuracy_reward(predict, ground_truth)
         scores.append(
             {
-                "overall": (1 - format_weight) * accuracy_score + format_weight * format_score,
+                "overall": (1 - format_weight) * accuracy_score
+                + format_weight * format_score,
                 "format": format_score,
                 "accuracy": accuracy_score,
             }
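Reviewer sketch (not part of the patch): two steps in this hunk are easy to misread, namely the whitespace normalization applied before scoring and the weighted sum behind the `overall` key. The snippet below isolates both. The function names `normalize_tags` and `overall_score` are hypothetical wrappers around expressions copied from the diff.

```python
import re


def normalize_tags(predict: str) -> str:
    """Same substitution as in the diff: drop whitespace around <, > and /."""
    return re.sub(r"\s*(<|>|/)\s*", r"\1", predict)


def overall_score(
    accuracy: float, format_score: float, format_weight: float = 0.1
) -> float:
    """Weighted sum of accuracy and format scores used for "overall"."""
    return (1 - format_weight) * accuracy + format_weight * format_score


print(normalize_tags("< think > x </ think >"))
print(normalize_tags("1 / 2"))
print(overall_score(1.0, 0.0))
```

The substitution is not limited to tags: it also removes spaces around any `<`, `>`, or `/` in the answer text, so `1 / 2` becomes `1/2` before grading.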

diff --git a/Agent0/curriculum_train/examples/reward_function/r1v.py b/Agent0/curriculum_train/examples/reward_function/r1v.py
@@ -1,4 +1,4 @@
-# Copyright 2024 Bytedance Ltd. and/or its affiliates
+# Copyright 2024-2026 Bytedance Ltd. and/or its affiliates
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -19,15 +19,18 @@
 
 
 def format_reward(predict: str) -> float:
-    pattern = re.compile(r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL)
+    pattern = re.compile(
+        r"<think>.*?</think>\s*<answer>.*?</answer>",
+        re.DOTALL)
     format_match = re.fullmatch(pattern, predict)
     return 1.0 if format_match else 0.0
 
 
 def accuracy_reward(predict: str, ground_truth: str) -> float:
     try:
         content_match = re.search(r"<answer>(.*?)</answer>", predict)
-        given_answer = content_match.group(1).strip() if content_match else predict.strip()
+        given_answer = (content_match.group(1).strip()
+                        if content_match else predict.strip())
         if grade_answer(given_answer, ground_truth.strip()):
             return 1.0
 
@@ -37,11 +40,14 @@ def accuracy_reward(predict: str, ground_truth: str) -> float:
     return 0.0
 
 
-def compute_score(predict: str, ground_truth: str, format_weight: float = 0.5) -> Dict[str, float]:
+def compute_score(
+    predict: str, ground_truth: str, format_weight: float = 0.5
+) -> Dict[str, float]:
     format_score = format_reward(predict)
     accuracy_score = accuracy_reward(predict, ground_truth)
     return {
-        "overall": (1 - format_weight) * accuracy_score + format_weight * format_score,
+        "overall": (1 - format_weight) * accuracy_score
+        + format_weight * format_score,
         "format": format_score,
         "accuracy": accuracy_score,
     }
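Reviewer sketch (not part of the patch): the behavior of the two regular expressions in `r1v.py` depends on `re.fullmatch` and on whether `re.DOTALL` is set. The snippet below reproduces both patterns from the diff so the edge cases can be checked directly. `extract_answer` is a hypothetical helper that isolates the extraction step from `accuracy_reward`; the grading call itself is omitted.

```python
import re

FORMAT_PATTERN = re.compile(
    r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL
)


def format_reward(predict: str) -> float:
    """1.0 only if the whole string is a think block followed by an answer block."""
    return 1.0 if re.fullmatch(FORMAT_PATTERN, predict) else 0.0


def extract_answer(predict: str) -> str:
    """Text inside the first <answer> tag, else the stripped prediction."""
    # No re.DOTALL here (as in the diff), so an answer spanning several
    # lines does not match and the whole prediction is returned instead.
    match = re.search(r"<answer>(.*?)</answer>", predict)
    return match.group(1).strip() if match else predict.strip()


print(format_reward("<think>a</think>\n<answer>b</answer>"))
print(format_reward("<think>a</think><answer>b</answer> done"))
print(extract_answer("<think>a</think><answer> 42 </answer>"))
```

Because `fullmatch` anchors both ends, any text before `<think>` or after `</answer>` yields a format score of 0.0, while whitespace between the two blocks is allowed.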