Merged
5 changes: 3 additions & 2 deletions .github/codeql/codeql-config.yml
@@ -4,9 +4,10 @@ name: STAR CodeQL configuration
  # Issues in bundled libraries (htslib, SIMDe, opal, SGT) must be tracked
  # and fixed upstream; scanning them produces noise, not actionable findings.
  paths-ignore:
- - source/htslib # bundled HTSlib 1.21 (upstream: samtools/htslib)
+ - '**/htslib/**' # bundled HTSlib 1.21 (upstream: samtools/htslib)
  - source/SimpleGoodTuring # bundled SGT implementation
- - source/build # CMake build tree — contains FetchContent deps (parasail, doctest, zlib)
+ - '**/build/**' # CMake build tree
+ - '**/_deps/**' # FetchContent deps (parasail, doctest, zlib, cpp-httplib, nlohmann/json)

  queries:
  - uses: security-extended # OWASP top-10 + security-and-quality suite
39 changes: 37 additions & 2 deletions .github/workflows/build.yml
@@ -2,9 +2,9 @@ name: Build and Test

  on:
  push:
- branches: [master]
+ branches: [main]
  pull_request:
- branches: [master]
+ branches: [main]

  jobs:
  build-linux:
@@ -109,3 +109,38 @@ jobs:
  with:
  name: STAR-windows-x86_64
  path: source\build\STAR.exe
+
+ # Verify the genomeGenerate suffix-array rewrite (PR #2687) produces a
+ # byte-identical index vs the baseline on main, and is deterministic across
+ # thread counts and chunk layouts.
+ validate-genome-index:
+ runs-on: ubuntu-latest
+ steps:
+ - uses: actions/checkout@v7
+ with:
+ fetch-depth: 0 # need main history to build the baseline
+
+ - name: Install dependencies
+ run: |
+ sudo apt-get update
+ sudo apt-get install -y zlib1g-dev ninja-build
+
+ - name: Build candidate (this branch)
+ run: |
+ cd source
+ cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DSTAR_BUILD_TESTS=OFF
+ cmake --build build
+ cp build/STAR "$GITHUB_WORKSPACE/STAR_new"
+
+ - name: Build baseline (main, pre-#2687)
+ run: |
+ git worktree add /tmp/star-main origin/main
+ cd /tmp/star-main/source
+ cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DSTAR_BUILD_TESTS=OFF
+ cmake --build build
+ cp build/STAR "$GITHUB_WORKSPACE/STAR_old"
+
+ - name: Genome-index equivalence (old vs new, 1 vs 16 threads, multi-chunk)
+ run: |
+ chmod +x extras/tests/scripts/validate_genome_equivalence.sh
+ extras/tests/scripts/validate_genome_equivalence.sh "$GITHUB_WORKSPACE/STAR_old" "$GITHUB_WORKSPACE/STAR_new"
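
The equivalence script invoked in the last step is not part of this diff. As a rough illustration of the kind of check the job comment describes (byte-identical index output), here is a minimal sketch that compares two index directories file by file with `cmp`. The function name `compare_index_dirs` and the flat directory layout are assumptions made for this example; this is not the contents of `validate_genome_equivalence.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real validate_genome_equivalence.sh.
# Compares every regular file in two index directories byte for byte.
compare_index_dirs() {
  local old_dir=$1 new_dir=$2 status=0 f name

  # Every file in the old index must exist in the new one with identical bytes.
  for f in "$old_dir"/*; do
    [ -f "$f" ] || continue
    name=${f##*/}
    if ! cmp -s "$f" "$new_dir/$name"; then
      echo "DIFF: $name"
      status=1
    fi
  done

  # Files that exist only in the new index also count as a mismatch.
  for f in "$new_dir"/*; do
    [ -f "$f" ] || continue
    name=${f##*/}
    [ -f "$old_dir/$name" ] || { echo "EXTRA: $name"; status=1; }
  done

  return "$status"
}
```

Called as `compare_index_dirs genome_old genome_new`, it returns non-zero if any file differs or is missing on either side. A real check might also need to exclude files that legitimately vary between runs.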
4 changes: 2 additions & 2 deletions .github/workflows/codeql.yml
@@ -2,9 +2,9 @@ name: CodeQL Security Analysis

  on:
  push:
- branches: [master]
+ branches: [main]
  pull_request:
- branches: [master]
+ branches: [main]
  schedule:
  - cron: '0 6 * * 1' # Weekly Monday 6am UTC

6 changes: 3 additions & 3 deletions .github/workflows/release.yml
@@ -6,7 +6,7 @@ on:
  workflow_dispatch:
  inputs:
  tag_name:
- description: 'Release tag (e.g. v2.7.11c). Leave blank to auto-generate from commit hash.'
+ description: 'Release tag (e.g. v0.0.1). Leave blank to auto-generate from commit hash.'
  required: false
  default: ''

@@ -184,14 +184,14 @@ jobs:
  echo "tag=${{ inputs.tag_name }}" >> $GITHUB_OUTPUT
  else
  SHORT_SHA=$(git rev-parse --short HEAD)
- echo "tag=2.7.11c_${SHORT_SHA}" >> $GITHUB_OUTPUT
+ echo "tag=0.0.1_${SHORT_SHA}" >> $GITHUB_OUTPUT
  fi

  - name: Create GitHub Release
  uses: softprops/action-gh-release@v3
  with:
  tag_name: ${{ steps.tag.outputs.tag }}
- name: STAR ${{ steps.tag.outputs.tag }}
+ name: STAR-Cross ${{ steps.tag.outputs.tag }}
  generate_release_notes: true
  files: |
  STAR-linux-x86_64
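
The tag step in the release workflow boils down to a two-way choice: use the tag supplied through `workflow_dispatch` if there is one, otherwise build one from the `0.0.1` prefix and the short commit hash. The hunk does not show the `if` condition, so the non-empty test below is an assumption, as is the function name `resolve_tag`. A minimal sketch:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the fallback shown in the release workflow.
# $1: tag requested by the user (may be empty)
# $2: short commit hash, e.g. from `git rev-parse --short HEAD`
resolve_tag() {
  local requested=$1 short_sha=$2
  if [ -n "$requested" ]; then
    # Assumed condition: an explicit tag always wins.
    printf '%s\n' "$requested"
  else
    # Fallback format taken from the new workflow line: 0.0.1_<short sha>
    printf '0.0.1_%s\n' "$short_sha"
  fi
}
```

For instance, `resolve_tag "" "$(git rev-parse --short HEAD)"` prints the auto-generated form, while `resolve_tag v0.0.1 "$(git rev-parse --short HEAD)"` prints `v0.0.1` unchanged.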
13 changes: 8 additions & 5 deletions README.md
@@ -1,15 +1,16 @@
- STAR 2.7.11c (Community Fork)
+ STAR-Cross (Community Fork of STAR)
  ==========
  Spliced Transcripts Alignment to a Reference
  © Alexander Dobin, 2009-2024
  https://www.ncbi.nlm.nih.gov/pubmed/23104886

  > **Fork Notice:** The upstream STAR repository (`alexdobin/STAR`) appears to be unmaintained as of 2025
  > (see [community discussion](https://www.reddit.com/r/bioinformatics/comments/1joyd0p/the_star_aligner_is_unmaintained_now/)).
- > This fork maintains full output compatibility with STAR 2.7.11b while adding **Windows native support**,
- > **macOS ARM (Apple Silicon) support**, and upstream bug fixes.
- > All changes are validated to produce byte-identical results to the original 2.7.11b release.
- > Release binaries are versioned as `2.7.11c_<commit>` for traceability.
+ > **STAR-Cross** is a cross-platform community fork derived from STAR 2.7.11c. It maintains output
+ > compatibility with STAR 2.7.11b while adding **Windows native support**, **macOS ARM (Apple Silicon)
+ > support**, **big-endian support**, **referenceless CRAM output**, and upstream bug/perf fixes.
+ > Alignment output is validated to be byte-identical to the original 2.7.11b release.
+ > Releases start at **v0.0.1**; the binary reports its version as `STAR-Cross 0.0.1_<commit>`.

  ORIGINAL AUTHOR
  ===============
@@ -338,6 +339,7 @@ FORK CHANGES
  * **Big-endian support** (`source/byteOrder.h`): the genome, suffix array and packed arrays are accessed as a little-endian byte stream regardless of host byte order, fixing the "next index is smaller than previous" failure on big-endian hosts (s390x, ppc64). Guarded so little-endian builds keep the native single-instruction load (zero performance/behavior change); only known big-endian compiles take the portable byte-wise path. Ported from the patch in upstream issue #2690. *Compile-validated only — no big-endian runner in CI.*

  ### Performance Optimizations
+ * Multicore `genomeGenerate` suffix-array build (upstream PR #2687): parallel prefix-bucketed chunk sort with sub-binning, optional in-memory chunk retention, and a "skip first word" comparator fast-path. Index output is **byte-identical** to the previous builder (verified in CI across thread counts and chunk layouts via `extras/tests/scripts/validate_genome_equivalence.sh`). Reconciled with the big-endian-safe comparator and MSVC (no native `__uint128`).
  * MSVC compiler: `/O2 /Ob2 /Oi /GL` with `/LTCG` link-time optimization (Windows)
  * SRW locks replacing CRITICAL_SECTION (faster mutex, Windows)
  * 4MB ifstream read buffer for FASTQ input (Windows)
@@ -357,6 +359,7 @@ FORK CHANGES
  - Trim stitched transcripts to the junction-relevant side before rescoring
  - Fix cross-mate `roStart` computation (`a2.Lread` instead of `a1.Lread` on negative strand)
  * macOS: spawn `readFilesCommand` via `posix_spawnp` instead of `vfork()`+`execlp()`+`exit()`, fixing "Failed spawning readFilesCommand" with gzipped input on macOS (upstream issue #2663). Avoids the undefined behavior of calling `exit()` in a `vfork` child. POSIX-only path; the Windows `system()`-based path is unchanged.
+ * Allow the WASP `vW:i` tag in SAM output, not just BAM (upstream PR #2617): `--waspOutputMode` no longer requires `--outSAMtype BAM`, and `vW` is emitted in the SAM/CRAM paths.

  ### Project Quality
  * C++17 standard (upgraded from C++11)