479 changes: 237 additions & 242 deletions ARCHITECTURE.md

Large diffs are not rendered by default.

704 changes: 607 additions & 97 deletions README.md

Large diffs are not rendered by default.

75 changes: 65 additions & 10 deletions docs/features/auth-and-iam.md
@@ -2,18 +2,73 @@

## Source Files

- Middleware entry: [../../api/lib/authlib.mjs](../../api/lib/authlib.mjs)
- IAM policy: [../../api/lib/iam.mjs](../../api/lib/iam.mjs)
- Token helpers: [../../api/lib/jwtAuth.mjs](../../api/lib/jwtAuth.mjs)
- OAuth helper: [../../api/lib/googleAuthHelper.mjs](../../api/lib/googleAuthHelper.mjs)
| File | Purpose |
|------|---------|
| `api/lib/authlib.mjs` | Express middleware — `validateAdminMiddleware`, `validateStudentMiddleware` |
| `api/lib/iam.mjs` | Low-level permission check helpers (course-scoped reads) |
| `api/lib/jwtAuth.mjs` | Signs and verifies permission-snapshot JWTs |
| `api/lib/googleAuthHelper.mjs` | Verifies Google ID tokens, enforces `@berkeley.edu` domain |
| `api/lib/userlib.mjs` | Resolves user role from `users` + `course_permissions` tables |
| `api/v2/Routes/login/index.js` | `POST /api/v2/login` handler |
| `api/v2/Routes/isadmin/index.js` | `GET /api/v2/isadmin` handler |

## Authentication Flow (step by step)

1. Browser calls Google OAuth and receives an **ID token**.
2. Frontend `POST /api/v2/login` with `{ token: "<google-id-token>" }`.
Comment on lines +12 to +18

Copilot AI Apr 15, 2026

The source-file table and login-flow steps describe POST /api/v2/login with a JSON body token, but the current implementation is GET /api/v2/login with the Google ID token in the Authorization: Bearer ... header (see api/v2/Routes/login/index.js and website/src/views/login.js). Please update the docs to match the actual request method and token transport.

Suggested change
| `api/v2/Routes/login/index.js` | `POST /api/v2/login` handler |
| `api/v2/Routes/isadmin/index.js` | `GET /api/v2/isadmin` handler |
## Authentication Flow (step by step)
1. Browser calls Google OAuth and receives an **ID token**.
2. Frontend `POST /api/v2/login` with `{ token: "<google-id-token>" }`.
| `api/v2/Routes/login/index.js` | `GET /api/v2/login` handler |
| `api/v2/Routes/isadmin/index.js` | `GET /api/v2/isadmin` handler |
## Authentication Flow (step by step)
1. Browser calls Google OAuth and receives an **ID token**.
2. Frontend calls `GET /api/v2/login` with `Authorization: Bearer <google-id-token>`.

3. `googleAuthHelper.mjs` verifies the token against Google's public keys.
- Rejects if the `hd` (hosted domain) field is not `berkeley.edu`.
- Rejects if the token is expired or signature invalid.
4. `userlib.mjs` queries the `users` table by `email`. If the user does not exist yet, a new row is inserted.
5. `userlib.mjs` queries `course_permissions` to build a list of `{ course_id, permission_level }` pairs.
6. A **JWT** is signed with `JWT_SECRET` (from `.env`) and returned to the browser.
Comment on lines +22 to +24

Copilot AI Apr 15, 2026

api/lib/userlib.mjs does not currently resolve roles from users/course_permissions (it only delegates admin checks to unifiedConfig.mjs and student checks to dbHelper.mjs). Role resolution and permission snapshots are built in api/lib/iam.mjs (buildPermissionSnapshot) and used by the login route. Adjust the doc to point to the correct modules so readers don’t chase the wrong implementation.

- Payload includes permission snapshot fields such as `is_super`, `course_roles`, `has_course_admin`, `has_instructor`, and `has_student`.
- Expiry is controlled by `JWT_EXPIRES_IN` (default `12h`).
7. All subsequent requests must carry `Authorization: Bearer <jwt>`.
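
Steps 6 and 7 can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code in `api/lib/jwtAuth.mjs` (which is a Node module). The HS256 algorithm, the function names `sign_snapshot` and `verify_snapshot`, and the `iat`/`exp` claim layout are assumptions made for the example; only the snapshot field names and the `12h` default come from the text above.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _hs256(secret: str, signing_input: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_snapshot(snapshot: dict, secret: str, ttl_seconds: int, now: Optional[int] = None) -> str:
    """Hypothetical helper: sign a permission snapshot with an expiry."""
    issued = int(time.time()) if now is None else now
    payload = {**snapshot, "iat": issued, "exp": issued + ttl_seconds}
    header = {"alg": "HS256", "typ": "JWT"}  # assumed algorithm
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_hs256(secret, signing_input))}"


def verify_snapshot(token: str, secret: str, now: Optional[int] = None) -> dict:
    """Hypothetical helper: check signature first, then expiry."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _hs256(secret, f"{header_b64}.{payload_b64}")
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = int(time.time()) if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload


# Example: a 12-hour token carrying the snapshot fields named above.
token = sign_snapshot(
    {"is_super": False, "course_roles": {}, "has_student": True},
    secret="example-secret",
    ttl_seconds=12 * 3600,
)
claims = verify_snapshot(token, "example-secret")
```

The signature is compared with `hmac.compare_digest` so that verification time does not leak how many bytes matched, and the payload is only parsed after the signature check passes.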

## Middleware Behaviour

### `validateAdminMiddleware`
- Verifies JWT signature and expiry.
- Resolves effective role via IAM helpers using the JWT snapshot (and DB lookups when needed).
- Enforces admin/system permissions through `iam.mjs` checks.
- Returns `403` if any check fails.
Comment on lines +24 to +35

Copilot AI Apr 15, 2026

This doc says the JWT payload includes role and that validateAdminMiddleware checks the role claim, but the login handler signs tokens with a permission snapshot (is_super, course_roles, has_course_admin, etc.) and middleware resolves role via iam.mjs rather than trusting a role field. Please update this description to match the actual JWT claims and authorization checks.


### `validateStudentMiddleware`
- Verifies JWT signature and expiry.
- Queries `students` table to confirm the requesting email is enrolled in the target course.
- Returns `403` if not enrolled.

## IAM Roles

| Role | Source of Truth | Permissions |
|------|----------------|-------------|
| `super_admin` | `gradeview.admins` in `config.json` (migrate to DB) | All courses, all actions, GradeSync admin |
| `course_admin` | `course_permissions.permission_level = 'owner'` | Manage sync and config for bound courses |
Comment on lines +46 to +47

Copilot AI Apr 15, 2026

The IAM roles table states super_admin is sourced from gradeview.admins in config.json, but the runtime super-admin check is hard-coded to SUPER_ADMIN_EMAIL = 'weszhang@berkeley.edu' in api/lib/iam.mjs (and unifiedConfig.mjs). Update the table to reflect the current source of truth (and, if desired, document a planned migration path to DB-backed global admins).

| `instructor` | `course_permissions.permission_level = 'editor'` | View class roster and grades; no GradeSync admin |
| `ta` | `course_permissions.permission_level = 'viewer'` | Same as instructor |
| `student` | `students` table (`email + course_id`) | Own grades only, per enrolled course |
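
The role table can be read as a lookup, sketched below in Python for illustration. The function `resolve_role` and its parameters are hypothetical and do not mirror `iam.mjs`; the precedence (super admin, then staff, then student) is an assumption, since the table does not say what happens when one email is both staff and enrolled.

```python
from typing import Dict, Optional, Set

# permission_level -> role, taken from the table above
PERMISSION_LEVEL_TO_ROLE = {
    "owner": "course_admin",
    "editor": "instructor",
    "viewer": "ta",
}


def resolve_role(
    course_id: str,
    is_super: bool,
    course_levels: Dict[str, str],
    enrolled_course_ids: Set[str],
) -> Optional[str]:
    """Return the effective role for one course, or None if there is no access.

    course_levels maps course_id -> permission_level for this user;
    enrolled_course_ids holds the courses where the email is in `students`.
    """
    if is_super:
        return "super_admin"  # the only global bypass
    level = course_levels.get(course_id)
    if level in PERMISSION_LEVEL_TO_ROLE:
        return PERMISSION_LEVEL_TO_ROLE[level]
    if course_id in enrolled_course_ids:
        return "student"
    return None
```

Every non-super lookup is keyed by `course_id`, which matches the rule that access is course-scoped.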

## Key Rules

- Staff/admin authorization is DB-only (`users` + `course_permissions`).
- Student access is course-scoped.
- Super admin is the only global bypass.
- **DB is the authoritative source** for staff and student permissions.
- Config file `admins`/`instructors`/`tas` lists are **legacy** — migrate them into `users` + `course_permissions` rows and do not rely on them for runtime auth.
- Students are **not** in the `users` table; they are identified solely by their email in the `students` table.
- `super_admin` is the only role that can read across course boundaries.
- Every API query on grade data must include a `course_id` scope — never fetch grades without one.

## Adding a New Staff Member

## Related API Routes
```sql
-- 1. Insert user (if they have not logged in yet)
INSERT INTO users (email, name, role)
VALUES ('newperson@berkeley.edu', 'New Person', 'instructor')
ON CONFLICT (email) DO NOTHING;

- Login: [../../api/v2/Routes/login/index.js](../../api/v2/Routes/login/index.js)
- Admin check: [../../api/v2/Routes/isadmin/index.js](../../api/v2/Routes/isadmin/index.js)
-- 2. Grant course permission
INSERT INTO course_permissions (course_id, user_id, permission_level, granted_by)
SELECT c.id, u.id, 'editor', (SELECT id FROM users WHERE email = 'admin@berkeley.edu')
FROM courses c, users u
WHERE c.gradescope_course_id = '1098053'
AND u.email = 'newperson@berkeley.edu';
```
65 changes: 56 additions & 9 deletions docs/features/config-and-settings.md
@@ -2,16 +2,63 @@

## Source Files

- Config routes: [../../api/v2/Routes/config/index.js](../../api/v2/Routes/config/index.js)
- Unified config helpers: [../../api/lib/unifiedConfig.mjs](../../api/lib/unifiedConfig.mjs)
- Runtime config template: [../../config.example.json](../../config.example.json)
| File | Purpose |
|------|---------|
| `api/lib/unifiedConfig.mjs` | Loads and exposes `config.json` to the API |
| `api/v2/Routes/config/index.js` | REST endpoints for reading/writing course config |
| `gradesync/api/config_manager.py` | Python equivalent — loads `config.json` for GradeSync |
| `config.example.json` | Full annotated template for `config.json` |

## Current Shape
## Config File Location

- `gradeview`: auth/UI-level config
- `gradesync`: per-course sync settings + global sync settings
The unified runtime config is `config.json` at the **repository root**. In current Compose files, it is mounted read-only into the `gradesync` container:

## Notes
```yaml
# docker-compose.yml
gradesync:
volumes:
- ./config.json:/app/config.json:ro
```

- Keep permissions in DB tables, not in config files.
- Keep config format stable and versioned for safer rollout.
Changes to `config.json` require a container restart — there is no live reload.

Copilot AI Apr 15, 2026

This sentence says config changes require a container restart and that there is no live reload. That is true for GradeSync (ConfigManager loads /app/config.json at init), but the API loadUnifiedConfig() reads /api/config.json from disk on every call and will pick up changes immediately if the file is mounted. Consider clarifying the different reload behavior for API vs. GradeSync.

Suggested change
Changes to `config.json` require a container restart — there is no live reload.
Reload behavior differs by service: GradeSync loads `/app/config.json` at startup, so changes require a `gradesync` container restart; the API reads `/api/config.json` from disk on each `loadUnifiedConfig()` call, so mounted file changes are picked up on subsequent requests.


## Top-level Structure

```jsonc
{
"gradeview": { ... }, // OAuth client ID + global admin list
"gradesync": { ... } // Per-course data sync + global settings
}
```

Full field-by-field reference: see `README.md` → **Config File Reference**.

## Loading Order (API)

1. Route modules that need config import `loadUnifiedConfig()` from `api/lib/unifiedConfig.mjs` (for example, `api/v2/Routes/config/index.js`).
2. `loadUnifiedConfig()` reads `config.json` from the API container root path (`/api/config.json`).
3. Each call reads and parses `config.json` from disk.
4. Subsequent route calls re-invoke `loadUnifiedConfig()`/helper accessors in `unifiedConfig.mjs`.
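
The practical consequence of reading from disk on each call can be shown with a small Python sketch. It is an analogy, not a port of `unifiedConfig.mjs`: `load_unified_config` here is a hypothetical Python stand-in, and `StartupConfig` is an invented class that models a loader which parses the file once at initialization.

```python
import json
from pathlib import Path


def load_unified_config(path):
    """Read and parse the config file on every call (no caching)."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


class StartupConfig:
    """Parse the config file once; later edits require a new instance (a restart)."""

    def __init__(self, path):
        self.data = load_unified_config(path)
```

With the per-call loader, an edit to the file is visible on the next request; with the startup loader, the process keeps serving the values it parsed at boot until it is restarted.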

## Config vs. Database — Rule of Thumb

| Data | Where to store |
|------|----------------|
| OAuth client ID | `config.json` |
| Global admin emails | `config.json` → migrate to `users` table over time |
| Course sync source IDs and credentials | `config.json` |
| Per-course staff permissions | `course_permissions` DB table |
| Student enrollment | `students` DB table |
| Grade data | `assignments` + `submissions` DB tables |

Comment on lines +47 to +53

Copilot AI Apr 15, 2026

The “Config vs. Database” table implies the OAuth client ID lives in config.json, but Google token verification reads it from the gradeview_config DB table (api/lib/googleAuthHelper.mjs). If config.json is only a seed/migration input, it would be clearer to mark the DB as the runtime source of truth and mention the migration step explicitly.

Suggested change
| OAuth client ID | `config.json` |
| Global admin emails | `config.json` → migrate to `users` table over time |
| Course sync source IDs and credentials | `config.json` |
| Per-course staff permissions | `course_permissions` DB table |
| Student enrollment | `students` DB table |
| Grade data | `assignments` + `submissions` DB tables |
| OAuth client ID | Seed/bootstrap in `config.json`; runtime source of truth is the `gradeview_config` DB table |
| Global admin emails | Seed/bootstrap in `config.json`; migrate/sync into the `users` table, which is the runtime authority |
| Course sync source IDs and credentials | `config.json` |
| Per-course staff permissions | `course_permissions` DB table |
| Student enrollment | `students` DB table |
| Grade data | `assignments` + `submissions` DB tables |
If a setting exists in both `config.json` and a DB table, treat the DB row as the runtime source of truth. `config.json` is only the seed/bootstrap input for values that are migrated or synced into the database.

Never use `config.json` as the runtime permission authority for staff/student access checks. The DB tables are canonical.

## Adding a New Course

1. Add a new entry to `gradesync.courses[]` in `config.json` (copy an existing entry as a template).
2. Set `general.id` to a unique snake_case string.
3. Set `gradesync.sources.gradescope.course_id` to the Gradescope numeric course ID.
4. Set `gradeview.buckets` to match the course grading scale.
5. Restart the `api` and `gradesync` containers.
6. Trigger an initial sync via the GradeSync admin UI or `POST /gradesync/sync/{course_id}`.

Copilot AI Apr 15, 2026

Step 6 references triggering an initial sync via POST /gradesync/sync/{course_id}, but GradeSync sync routes are implemented under /api/sync/{course_id} (so via reverse proxy: /gradesync/api/sync/{course_id}). Update the endpoint here to match the actual FastAPI routes.

Suggested change
6. Trigger an initial sync via the GradeSync admin UI or `POST /gradesync/sync/{course_id}`.
6. Trigger an initial sync via the GradeSync admin UI or `POST /gradesync/api/sync/{course_id}`.

7. Insert the course row into the `courses` DB table if it does not appear after the first sync.
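
Steps 1 to 4 lend themselves to a quick pre-restart check. The sketch below is a hypothetical helper, not part of the repository: `check_new_course` and the snake_case pattern are assumptions, and because the shape of `gradeview.buckets` is not described here, the check only confirms that the key is present and non-empty.

```python
import re
from typing import List, Set

# Assumed definition of snake_case: lowercase letters/digits joined by single underscores
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_new_course(entry: dict, existing_ids: Set[str]) -> List[str]:
    """Return a list of problems with a new gradesync.courses[] entry (empty if none)."""
    problems = []

    course_id = entry.get("general", {}).get("id")
    if not isinstance(course_id, str) or not SNAKE_CASE.match(course_id):
        problems.append("general.id must be a snake_case string")
    elif course_id in existing_ids:
        problems.append("general.id must be unique")

    gradescope = entry.get("gradesync", {}).get("sources", {}).get("gradescope", {})
    # Accept either "1098053" or 1098053, since the stored type is not specified
    if not str(gradescope.get("course_id") or "").isdigit():
        problems.append("gradesync.sources.gradescope.course_id must be numeric")

    if not entry.get("gradeview", {}).get("buckets"):
        problems.append("gradeview.buckets must be set")

    return problems
```

Running a check like this before restarting the containers surfaces a duplicate or malformed `general.id` without waiting for a failed sync.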
84 changes: 73 additions & 11 deletions docs/features/dev-and-deploy.md
@@ -1,18 +1,80 @@
# Feature: Dev & Deployment

## Local Development
## Key Files

- Main setup: [../../README.md](../../README.md)
- Refresh script: [../../scripts/refresh.sh](../../scripts/refresh.sh)
- Dev compose: [../../docker-compose.dev.yml](../../docker-compose.dev.yml)
| File | Purpose |
|------|---------|
| `docker-compose.dev.yml` | Dev stack: bind-mounts, hot-reload, exposed debug ports |
| `docker-compose.yml` | Production stack: healthchecks, log rotation, TLS mounts |
| `Makefile` | Shortcuts for common operations |
| `scripts/dev-local.sh` | Run API + web natively; deps in Docker |
| `scripts/preflight.sh` | Full production smoke-test |
| `scripts/refresh.sh` | Pull latest images + restart |
| `scripts/deploy_to_gcp.sh` | One-shot GCE VM provisioning |

## Production-ish Deployment
## Local Development — Full Docker Mode

- Compose: [../../docker-compose.yml](../../docker-compose.yml)
- API Dockerfile: [../../api/Dockerfile](../../api/Dockerfile)
- Web Dockerfile: [../../website/server/Dockerfile](../../website/server/Dockerfile)
```bash
# First-time setup
cp .env.example .env && cp config.example.json config.json
# (fill in .env and config.json)
docker compose -f docker-compose.dev.yml up --build
```

## Rule of Thumb
All source directories are bind-mounted. Node.js services use `nodemon`; FastAPI uses `--reload`. Changes take effect without rebuilding.

Copilot AI Apr 15, 2026

This line claims the dev stack runs FastAPI with --reload, but docker-compose.dev.yml currently starts GradeSync with uvicorn ... --workers 2 (no reload). Either update the docs to reflect the current command (changes require a restart) or change the dev compose command to use --reload for the intended hot-reload behavior.

Suggested change
All source directories are bind-mounted. Node.js services use `nodemon`; FastAPI uses `--reload`. Changes take effect without rebuilding.
All source directories are bind-mounted. Node.js services use `nodemon`; GradeSync/FastAPI is currently started without `--reload`, so Python code changes require a container restart rather than a rebuild.


- Dev uses isolated local resources.
- Prod/staging changes must be migration-first and reversible.
```bash
# Tail logs for the main services
make dev-logs
# or
docker compose -f docker-compose.dev.yml logs -f api web gradesync
```

## Local Development — Native Frontend Mode

Faster React hot-reload; API and GradeSync still run in Docker.

```bash
./scripts/dev-local.sh
# or: make dev-local
```

Prerequisites: Node.js 18+ installed on the host. The script checks and frees ports 8000 and 3000 if needed.

## Production Deployment

Full walk-through: `README.md` → **Production Deployment (GCP)**.

Short version:
```bash
# On the production VM
git pull
docker compose pull # pull pre-built images from CI registry
docker compose up -d # replace running containers
```

Or run the smoke-test cycle:
```bash
make preflight # build → start → healthcheck → curl smoke tests
make preflight-down # tear down after testing
```

## Compose Differences: Dev vs. Prod

| Aspect | Dev (`docker-compose.dev.yml`) | Prod (`docker-compose.yml`) |
|--------|-------------------------------|-----------------------------|
| Source code | Bind-mounted (live edit) | Baked into image |
| Web command | `npm run react` (dev server) | `npm start` (serve built bundle) |
| API command | `npm run dev` (nodemon) | `npm start` |
| Health checks | None | All services (`wget/curl`) |
| Log rotation | None | `json-file` 10 MB × 3 |
| TLS certs | Not mounted | `/etc/letsencrypt` mounted RO |
| Proxy ports | `:80` only | `:80` + `:443` |
| DB proxy host port | `5433` (host) | Internal only |

## Rules

- Dev uses fully isolated local resources — never point `.env` at a production DB.
- Every schema change must be a numbered migration file in `gradesync/api/migrations/` **before** any application code that depends on it is deployed.
- Never use `--force`, `--no-verify`, or `DROP TABLE` in production without a backup snapshot.
- Use `make preflight` to validate production changes before flipping traffic.
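
The "numbered migration" rule depends on files being applied in numeric order rather than in plain string order. A minimal Python sketch of that ordering follows; the helper `ordered_migrations` and the filename pattern (a leading integer and a `.sql` suffix, for example `002_add_index.sql`) are assumptions, since the actual naming scheme in `gradesync/api/migrations/` is not shown here.

```python
import re
from typing import Iterable, List


def ordered_migrations(filenames: Iterable[str]) -> List[str]:
    """Return .sql files sorted by their leading number; reject duplicate numbers."""
    numbered = []
    for name in filenames:
        match = re.match(r"^(\d+)", name)
        if match and name.endswith(".sql"):
            numbered.append((int(match.group(1)), name))

    numbers = [number for number, _ in numbered]
    if len(numbers) != len(set(numbers)):
        raise ValueError("duplicate migration number")

    return [name for _, name in sorted(numbered)]
```

Sorting on the parsed integer keeps `2_x.sql` ahead of `10_y.sql`, which a plain string sort would reverse.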
99 changes: 86 additions & 13 deletions docs/features/gradesync.md
@@ -1,20 +1,93 @@
# Feature: GradeSync

## Docs
## Related Docs

- Setup guide: [../../gradesync/SETUP_DEMO.md](../../gradesync/SETUP_DEMO.md)
- Demo guide: [../../gradesync/DEMO_COURSE_README.md](../../gradesync/DEMO_COURSE_README.md)
- Start here: [../../gradesync/START_HERE.md](../../gradesync/START_HERE.md)
- Demo course creation: [../../gradesync/DEMO_COURSE_README.md](../../gradesync/DEMO_COURSE_README.md)

## Source Files
## Key Source Files

- App entry: [../../gradesync/api/app.py](../../gradesync/api/app.py)
- Config manager: [../../gradesync/api/config_manager.py](../../gradesync/api/config_manager.py)
- DB models: [../../gradesync/api/core/models.py](../../gradesync/api/core/models.py)
- Sync service: [../../gradesync/api/sync/service.py](../../gradesync/api/sync/service.py)
| File | Purpose |
|------|---------|
| `gradesync/api/app.py` | FastAPI app factory, lifespan hooks, route registration |
| `gradesync/api/config_manager.py` | Reads `config.json` mounted at `/app/config.json` |
| `gradesync/api/schemas.py` | Pydantic models for all request/response bodies |
| `gradesync/api/core/db.py` | SQLAlchemy engine + session factory |
| `gradesync/api/core/models.py` | ORM table definitions (mirror of `docs/database/schema.sql`) |
| `gradesync/api/core/ingest.py` | Core grade normalization and upsert logic |
| `gradesync/api/core/ingest_optimized.py` | Bulk-upsert path for large courses (>500 students) |
| `gradesync/api/core/exam_policy.py` | Drop-lowest / bonus policy engine |
| `gradesync/api/services/gradescope.py` | Gradescope login session + submissions scraper |
| `gradesync/api/services/prairielearn.py` | PrairieLearn REST API client |
| `gradesync/api/services/iclicker.py` | iClicker instructor login + attendance fetch |
| `gradesync/api/sync/service.py` | Orchestrates per-course sync jobs (reads config, calls services) |
Comment on lines +19 to +22

Copilot AI Apr 15, 2026

The “Key Source Files” table lists gradesync/api/services/gradescope.py, prairielearn.py, and iclicker.py, but in the current codebase these are packages/directories (gradesync/api/services/gradescope/, .../prairielearn/, .../iclicker/) with entry points like sync.py / client.py. Update the file paths so readers can navigate to the real modules.

| `gradesync/api/queries/summary.py` | Materializes `summary_sheets` table after sync |
| `gradesync/api/migrations/` | Numbered SQL migration files applied in order |

## Scope
## Sync Architecture

- Pulls external grade data
- Normalizes and writes to PostgreSQL
- Serves sync APIs used by admin flows
```
config.json → sync/service.py → services/{gradescope,prairielearn,iclicker}.py
core/ingest.py (normalize + classify by assignment_categories)
DB upsert: students, assignments, submissions
queries/summary.py (refresh summary_sheets)
```

## Configuration Fields That Drive Sync

All sync-side configuration lives under `gradesync.courses[].gradesync` in `config.json`.

| Field | Effect |
|-------|--------|
| `sources.gradescope.enabled` | Whether to pull from Gradescope |
| `sources.gradescope.course_id` | Gradescope numeric course ID (from the course URL) |
| `sources.gradescope.sync_interval_hours` | Auto-sync frequency (0 = manual only) |
| `sources.prairielearn.enabled` | Whether to pull from PrairieLearn |
| `sources.prairielearn.course_id` | PrairieLearn course instance ID |
| `sources.iclicker.enabled` | Whether to pull from iClicker |
| `sources.iclicker.course_names` | Exact iClicker course names (must match portal exactly) |
| `database.enabled` | Write results to PostgreSQL |
| `database.use_as_primary` | When `true`, API reads from DB; when `false`, API calls external APIs live |
| `assignment_categories` | Pattern rules to classify assignment titles into grade categories |
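
As an illustration of how these fields might be consumed, the Python sketch below reads one course's `gradesync` block. Both helpers are hypothetical and are not taken from `sync/service.py`; treating a missing `enabled` flag as disabled and a missing `sync_interval_hours` as `0` are assumptions.

```python
from typing import List, Optional

KNOWN_SOURCES = ("gradescope", "prairielearn", "iclicker")


def enabled_sources(course_sync: dict) -> List[str]:
    """List the sources whose `enabled` flag is exactly true."""
    sources = course_sync.get("sources", {})
    return [
        name
        for name in KNOWN_SOURCES
        if sources.get(name, {}).get("enabled") is True
    ]


def auto_sync_seconds(course_sync: dict) -> Optional[int]:
    """Return the Gradescope auto-sync period in seconds, or None for manual-only."""
    gradescope = course_sync.get("sources", {}).get("gradescope", {})
    hours = gradescope.get("sync_interval_hours", 0)
    return None if hours <= 0 else int(hours * 3600)


# Example block shaped like gradesync.courses[].gradesync
example = {
    "sources": {
        "gradescope": {"enabled": True, "course_id": "1098053", "sync_interval_hours": 6},
        "prairielearn": {"enabled": False},
    }
}
sources_to_pull = enabled_sources(example)
period = auto_sync_seconds(example)
```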

## Triggering a Sync

```bash
# Manual sync for one course (replace COURSE_ID with internal id like cs10_fa25)
curl -X POST http://localhost/gradesync/sync/COURSE_ID \
-H "Authorization: Bearer <admin-jwt>"

# Check sync status
curl http://localhost/gradesync/status/COURSE_ID \
-H "Authorization: Bearer <admin-jwt>"
Comment on lines +60 to +67

Copilot AI Apr 15, 2026

The sync trigger examples use endpoints that do not exist in the FastAPI app. GradeSync defines sync routes under /api/sync/{course_id} (see gradesync/api/app.py), so through the reverse proxy the URL would be /gradesync/api/sync/{course_id} (or direct http://localhost:8001/api/sync/{course_id} in the full stack). Also, the example mentions an admin JWT header, but GradeSync routes are not currently protected by JWT auth, and there is no /status/{course_id} route. Please update the examples to match the implemented routes/auth model.

```

Or use the GradeSync admin panel in the web UI (accessible to `super_admin` and `course_admin`).

## Environment Variables Required

| Variable | Used by |
|----------|---------|
| `GRADESCOPE_EMAIL` | `services/gradescope.py` — login |
| `GRADESCOPE_PASSWORD` | `services/gradescope.py` — login |
| `PL_API_TOKEN` | `services/prairielearn.py` — bearer auth |
| `ICLICKER_USERNAME` | `services/iclicker.py` — login |
| `ICLICKER_PASSWORD` | `services/iclicker.py` — login |
| `DATABASE_URL` or `POSTGRES_*` | `core/db.py` — SQLAlchemy engine |

## Demo / Testing Data

To validate demo/sandbox setup, run a manual sync against a configured course:

```bash
cd gradesync
python sync_grades.py --list
python sync_grades.py <course_id>
```
Comment on lines +83 to +91

Copilot AI Apr 15, 2026

This doc instructs running python create_demo_course.py ..., but there is no create_demo_course.py in the repository (searching the repo returns no matches). Either add/restore the script, or update the documentation to reference the actual demo data workflow that exists today (e.g., manual sync against a sandbox course).


See [SETUP_DEMO.md](../../gradesync/SETUP_DEMO.md) for demo workflow details.