Releases: vgteam/vg
vg 1.75.0 - Spike
Known Issues
In this release, the single-command haplotype sampling mode of vg giraffe will include kmc k-mer counting logs in the alignment output files, corrupting them. This issue is fixed in #4938.
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.75.0
Buildable Source Tarball: vg-v1.75.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg build process needs.
This release includes:
- Put back the code to finalize giraffe's paired end distribution after trying enough reads.
- vg CI builds of ARM containers should no longer segfault when upgrading libc
- Alignment scoring and mapping quality computation have been broken out of GSSWAligner and moved to AlignmentScorer and MappingQualityCalculator.
- vg depth will now work on .gbz files.
- vg stats returns correct aggregate stats even when some values are negative
- vg filter --tsv-out has a softclip_total option for convenience (softclip_end + softclip_start)
- Speed up minimizer index construction.
- The vg giraffe --rec-penalty-chain parameter has been split into --rec-penalty (for chaining), --rec-consistency-bonus (a bonus for haplotype consistency used during chaining but not incorporated into the chain score), and --rec-penalty-aln (used to penalize alignment scores per recombination).
- Recombination-aware minimizer indexing is now always on when there are few enough haplotypes and the GBZ being indexed is not a path cover. Passing --rec-mode to vg minimizer now just makes it fail if recombination-aware minimizer indexing isn't on (because of too many haplotypes or the presence of synthetic path cover paths).
- Recombination-aware mapping is now the default in vg giraffe, if a recombination-aware minimizer index file is loaded and you are using the hifi or r10 presets. To turn it off, pass --no-rec-mode. There's no longer a distinction between .path minimizer and zipcodes files and normal ones.
- The hifi and r10 presets for vg giraffe have been updated with tuned recombination penalty settings.
- vg giraffe no longer produces alignments with nonempty path and negative or zero score. Potential alignments that would reach or go below a score of 0 (perhaps because of --rec-penalty-aln) will be removed, and if needed an unmapped alignment record will be emitted for the read.
- Significant time and memory optimizations to vg giraffe chaining/long-read mode
- --comments-as-tags is now under test with vg giraffe's chaining codepath
- Surject tests now test SAM tags in GAM with an actual vg surject command line
- vg surject now preserves unrecognized GAF tags as tags on output alignments (and GAF input in general retains tags)
- vg giraffe chaining mode now properly retains input tags on unmapped reads
- vg giraffe --track-provenance should no longer crash with complaints about the filters. (Fixes an unreleased regression.)
- Add option vg filter --tsv-out "is_aligned" to return whether a read has an alignment
- Add new vg giraffe filter for low-scoring MAPQ 0 R10 reads
- vg stats -a reports aggregate bp/alignment stats as per aligned reads, ignoring unmapped reads
- Remove --item-scale and --points-per-possible-match from vg giraffe as needless unused complexity.
- vg giraffe chaining mode allows negative affine-gap alignment scores to be log-gap rescored before tossing out negatively scoring alignments (minor accuracy improvement)
- vg now uses an old version of the multi-arch support container in its CI Docker builds to work around tonistiigi/binfmt#298
- vg find -Q/--paths-named is now deprecated due to its partial-Protobuf output
- vg find will now index its target paths but not other haplotype paths.
- vg should no longer position-index haplotype paths unnecessarily in commands using the PathPositionOverlayHelper.
- vg filter can accept GAMPs when it's told to expect them, and errors nicely with --input-mp-alns --tsv-out
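The rule for non-positive scores described in the list above can be pictured with a small sketch. This is an illustration in Python, not vg's implementation: the Candidate record, its fields, and the finalize_alignments function are hypothetical names. The only behavior taken from the notes is that alignments scoring 0 or less are dropped and that an unmapped record is emitted if nothing remains.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    """Hypothetical stand-in for an alignment record (not a vg type)."""
    read_name: str
    score: int
    path: List[int] = field(default_factory=list)  # node IDs; empty means unmapped


def finalize_alignments(read_name: str, candidates: List[Candidate]) -> List[Candidate]:
    """Drop candidates with score <= 0; fall back to one unmapped record."""
    kept = [c for c in candidates if c.score > 0]
    if kept:
        # Best-scoring surviving alignment first.
        return sorted(kept, key=lambda c: c.score, reverse=True)
    # Nothing survived, so emit a single unmapped record for the read.
    return [Candidate(read_name=read_name, score=0, path=[])]
```

With this shape, a read whose only candidate was pushed to 0 or below (for example by a per-recombination penalty) still yields exactly one record, with an empty path.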
Updated Submodules
The gbwtgraph, libbdsg, and libvgio submodules have been updated.
vg 1.74.1 - Petrie
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.74.1
Buildable Source Tarball: vg-v1.74.1.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg build process needs.
This release includes:
- Added a little test file of reads for new Quickstart page
- vg gbwt option --subgraph-of for marking a GBZ graph as a subgraph of another.
- Fixed a bug with minimizer indexing that impacted recombination-aware mapping with Giraffe
Updated Submodules
The gbwtgraph submodule has been updated.
vg 1.74.0 - Petrie
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.74.0
Buildable Source Tarball: vg-v1.74.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg build process needs.
This release includes:
- Added vg giraffe --haplotype-sampling to automatically count kmers and haplotype-index and haplotype-sample the graph. Make sure to have kmc installed. Providing either a --kff-name or --haplotype-name will now also trigger generation of the other. To do one-reference sampling, continue to use --set-reference. To do non-diploid sampling with a certain number of haplotypes, use --no-diploid-sampling and --num-haplotypes.
- vg giraffe will no longer claim to be guessing a GBZ file you definitely told it to use
- vg paths -u fixed to use the reference path to help root the integrated snarl finder.
- vg gbwt option --gbz-v1 for writing GBZ version 1 for compatibility with older tools
- Remove broken vg paths --extract-vg option which would extract a partial Protobuf graph file in a way so poorly explained as to be unusable.
- Giraffe no longer ignores the parts of seeds that extend outside their graph nodes to the left when scoring them. This initially reduced R10 read variant calling accuracy versus the previous release of vg, but that regression was fixed before release (see below).
- Giraffe hifi mapping preset has been re-tuned for new seed score distribution.
- Chain visualizations no longer need to be panned or zoomed to show changes to the traceback.
- Chain visualizations no longer accumulate more and more transition lines when mousing in and out of a selected node.
- Giraffe no longer tries to position-index all haplotypes when showing work. If you need all haplotypes position-indexed for debugging chaining against them, use --haplotype-positions.
- vg autoindex --gfa will error if the filename seems gzipped
- vg CI data is no longer hosted under a user public_html directory
- vg autoindex has a -w sampling workflow to make indexes for haplotype sampling
- Revised Giraffe chain and alignment scoring. Alignments generated from chains are now no longer scored with the affine gap model used for base-level dynamic programming, but instead are scored with a logged-gap-score, variable-mismatch-penalty model borrowed from minimap2.
  - Calling results from nanopore reads are now better than v1.73.0 again.
- Simplify an internal return value for align_sequence_between().
- vg giraffe will now stop with an error when the minimizers or zipcodes are older than the distance index they were supposedly generated from.
- Add compile-time option to check ziptree iterator for missing seed-to-seed transitions
- augref-related options in vg paths renamed to be gref-related
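The staleness check on minimizer and zipcode files mentioned above might look something like the following sketch. It assumes that "older" is judged by file modification time, which these notes do not state. The function name is hypothetical and the code is not taken from vg.

```python
import os
from typing import Iterable, List


def stale_indexes(distance_index: str, derived_files: Iterable[str]) -> List[str]:
    """Return the derived files whose mtime is older than the distance index.

    Assumption: "older" means an earlier modification time on disk.
    """
    dist_mtime = os.path.getmtime(distance_index)
    return [f for f in derived_files if os.path.getmtime(f) < dist_mtime]
```

A caller could stop with an error whenever the returned list is non-empty, naming the files that need to be rebuilt.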
Updated Submodules
The gbwtgraph and libvgio submodules have been updated.
vg 1.73.0 - Ducky
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.73.0
Buildable Source Tarball: vg-v1.73.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg build process needs.
This release includes:
- Off-reference cover logic moved from vg deconstruct to vg paths. deconstruct and call now have prototype logic to fully take advantage of it.
- Fix regression in vg clip, depth, simplify and potentially some uses of deconstruct and call, that results from a change that ignores haplotypes in .vg files (to be consistent with how .gbz files would have been treated).
- Better distance indexing in complex DAG snarls. Distance indexes should be re-made.
- vg wiki manpage links to subcommand sections now work
- Add option --exclude-sample to vg paths
- Stable GAF sorting is actually stable.
- vg surject -S no longer loses read names with GAM input
- Added the total number of recombinations in a chain; recombinant anchors are now marked in the chain dump file
- Very minor vg giraffe chaining mode speedup
- vg surject now takes --read-length short and --read-length long, and sets low-complexity pruning correctly.
- vg giraffe's built-in surjection now uses low-complexity pruning by default for long reads.
- vg giraffe now has --no-XXX and --XXX flag options in pairs.
- R plotting scripts no longer insist on installing all their dependencies
- Add #define compile-time option to print info about sampled haplotypes in vg haplotypes
- vg call -C can now be used with -a
- GBZ-to-GBZ chunking with vg chunk --gbz (can choose all components or components by contig name).
- vg convert options --gbwtgraph-algorithm and --drop-haplotypes work correctly together in GBZ to GFA conversion.
- vg describe works better with old obsolete files.
- Add vg sim --use-average-length option
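To illustrate what "stable" means for the GAF sorting fix above: a stable sort keeps records with equal keys in their input order. The sketch below uses Python's built-in sort, which is guaranteed to be stable. The (read name, key) pairs are made-up illustration data and do not reflect the key vg actually sorts on.

```python
from typing import List, Tuple


def stable_sort_by_key(records: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sort (read_name, key) pairs by key; ties keep their input order."""
    return sorted(records, key=lambda rec: rec[1])


records = [("read_c", 5), ("read_a", 2), ("read_b", 5), ("read_d", 2)]
print(stable_sort_by_key(records))
# prints [('read_a', 2), ('read_d', 2), ('read_c', 5), ('read_b', 5)]
```

Within each key, the reads come out in the order they went in, so re-sorting an already sorted file leaves it unchanged.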
Updated Submodules
The gbwt, gbwtgraph, and libbdsg submodules have been updated.
vg 1.72.0 - Littlefoot
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.72.0
Buildable Source Tarball: vg-v1.72.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg build process needs.
This release includes:
- Giraffe now just uses a single chaining pass, instead of a fragmenting pass and then a chaining pass
- Remove a useless check/error that can by definition never be raised
- vg paths should again work on GBZ files containing only haplotypes
- Giraffe/DeepVariant is now under CI test.
- Per-unit-test-set binaries (like bin/unittest/snarl_distance_index) work again
- Operations on GBZ graphs no longer hide haplotype paths from the PathHandleGraph iteration functions. vg attempts to request the appropriate path senses when haplotype paths should be ignored for a particular operation.
- vg giraffe chaining mode bugfix; minor accuracy improvement
- vg now requires C++17 on Linux
- vg giraffe help now mentions its start[:end[:step]] range specification syntax
- Help vg autoindex not error when indexing a graph with oversized snarls
- Update vg surject helptext to be clear that GAM is the default output format
- vg giraffe in non-chaining mode will no longer mis-index pair distances when rescue fails
- vg giraffe --supplementary
- test/build_graph executable should no longer be mistaken for malware
- Fixed some non-wrapping vg index helptext
- GBZ version 2 with better compression for sequences (existing files can still be used).
Updated Submodules
The gbwt, gbwtgraph, libbdsg, libhandlegraph, libvgio, and sdsl-lite submodules have been updated.
vg 1.71.0 - Cera
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.71.0
Buildable Source Tarball: vg-v1.71.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg build process needs.
This release includes:
- Running vg augment with no arguments will print helptext
- Recombination aware chaining fixes
- Explainer explanations for reads now get organized into explanation_<READ_NAME> directories.
- Explainer explanations for reads now explain all chains, not just the best one.
- Explainer explanations for reads now include coordinates on all haplotypes, not just references.
- New scripts/plot_chains.sh script to plot all the chains for explained reads against all the contigs
- GBZ graphs store stable graph names (pggname).
  - The information is copied to haplotype information files, minimizer indexes, and GFA/GAF headers.
  - Some subcommands (e.g. vg giraffe, vg haplotypes, vg pack) use the information to determine if the input files are compatible.
- Standalone GBWTGraph (.gg) files are no longer supported.
- New version of haplotype information (.hapl) files with tags. Old files can still be read.
- Haplotype sampling should work better with noisy kmer counts.
- Bugfix for vg giraffe chaining; improvements to accuracy and minor effect on runtime
- Add option vg haplotypes --ban-sample
- vg filter gives a clean error when passing files that don't look like GAMs
- vg paths prints a warning if path criteria select 0 paths
- vg gbwt and vg autoindex support GFA files with grammar-compressed walks.
- Random double space in a vg autoindex logging line is now a single space
- The random zip code tree test works with the --rng-seed option.
- -P option added to vg snarls and vg index to specify a reference backbone for orienting the snarl tree. This can be required to run vg haplotypes on some graphs from minigraph-cactus with newer vg versions. Can be thought of as a much higher-level version of the current -w interface which lets you manually upweight nodes.
- vg giraffe can compute supplementary alignments with the --supplementary option
Updated Submodules
- gbwt
- gbwtgraph
- libvgio
vg 1.70.0 - Zebedassi
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.70.0
Buildable Source Tarball: vg-v1.70.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg build process needs.
This release includes:
- Minor formatting improvements in README
- Fix bug in distance indexing where there weren't enough bits per int to represent all values
- Add more softclip statistics to vg stats
- vg inject now has the option --allow-missing-contig/-a which treats reads mapped to missing contigs as unmapped instead of erroring (Resolves #4613)
- Warn in vg snarls when -l, -o, or -a is used without --traversals
- Make helptext for vg index --snarl-limit match reality
- Minimizer index changes:
  - New version of the minimizer index. Existing indexes must be rebuilt.
  - Fixes for vg minimizer and autoindex after the --rec-mode changes (vg minimizer no longer fails to save oversized zipcode references).
  - The index now knows the type of the payload stored with each hit.
  - A --rec-mode index also knows the path name fields used to identify haplotypes.
- Fix bug causing vg pack -d -e crash.
- vg describe subcommand for identifying and describing files based on header information.
- Create utility functions for basic parsing/validity checks, and use them in subcommands + src/index_registry.cpp
  - At least attempt to enforce the use of some new standardized parsing/validity functions
- Create utility functions info(), warn(), and error() for pretty error/warning printing (also exposed by a Logger object) and use them in subcommands + src/index_registry.cpp
- Fix two broken tests in test/t/03_vg_view.t
- Changes to GAF output:
  - Header lines starting with @. All tools reading GAF files must be updated to handle headers.
  - Unaligned sequences are preserved as insertions aligned to an empty target path.
- libhandlegraph versions have been re-synced
- deconstruct -f option to write fasta file of off-reference sequence, as well as a tsv table describing its locations
- vg giraffe will again use path payloads from the minimizer index
- If vg index --snarl-limit has a threshold equal to a snarl's size, it no longer counts as an oversized snarl
- Hint to the user what value they might need to increase --snarl-limit to
- vg snarls -w option added to specify node weights (similar to index -w)
- vg call and vg deconstruct now use reference-guided snarl decomposition by default.
- vg clip, vg simplify and vg stats now use reference information when applicable/available during snarl computation.
- Empty string SAM tags can now be parsed when embedded in GAM records.
- vg surject -p now works on haplotype paths (with their #0, #1 etc. fragment numbers) in a GBZ.
- vg manpage generator now includes vg combine
- In vg giraffe chaining mode, don't bother calculating DP matrix size if a conservative/minimal size estimate would exceed the maximum threshold
- Put long read giraffe preprint link in README for citation
- Remove duct tape by reordering snarl ranks, which breaks previous distance indexes
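Since GAF output now starts with header lines, a reader needs to skip or collect lines beginning with @ before parsing records. A minimal sketch, assuming tab-separated records and treating any line that starts with @ as a header. The function name and any sample lines fed to it are made up for illustration.

```python
from typing import Iterable, List, Tuple


def split_gaf(lines: Iterable[str]) -> Tuple[List[str], List[List[str]]]:
    """Separate '@' header lines from tab-separated GAF records."""
    headers: List[str] = []
    records: List[List[str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("@"):
            headers.append(line)
        else:
            records.append(line.split("\t"))
    return headers, records
```

Tools that previously split every line on tabs can call something like this first and pass only the records on to their existing parsing code.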
Updated Submodules
- gbwtgraph
- libbdsg
- libhandlegraph
- libvgio
- sdsl-lite
- xg
vg 1.69.0 - Bologna
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.69.0
Buildable Source Tarball: vg-v1.69.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg build process needs.
Release Note! Compared to the previous v1.68.0 release, vg giraffe is faster on long reads, but may be less accurate for variant calling from HiFi reads, when using available trained DeepVariant models.
This release includes:
- vg inject now produces useful error messages when reads go out of range on paths
- vg autoindex now gives you hints about what files would help it, when it can't make the indexes it wants to make.
- vg chains subcommand for extracting top-level chains from a distance index or a snarls file for GBZ-base.
- vg inject will no longer spontaneously map SAM/BAM reads that have their mapping fields filled in but are flagged as unmapped.
- vg inject will now throw away scores for unmapped reads
- vg stats and vg inject can now understand reads that are asserted to be "mapped", but where the position/path is not provided, a thing the SAM spec does not appear to prohibit.
- Zip code trees for vg giraffe's chaining mode now have non-heuristic* distances in non-DAG snarls [*intra-chain reversals are still not handled at all]. As a practical matter, we get significant speedups on HiFi and R10 reads (especially for the slowest reads) and a tiny increase in read identity scores (though some increase and some decrease)
- vg mapping tools can now produce supplementary alignments for SAM/BAM output
- vg giraffe now implements a recombination aware chaining algorithm
- GBWTGraph can again be built for more than 64 paths
- vg find -G now includes regions of paths touched by the extracted graph
- vg haplotypes --include-reference now also includes reference paths that do not visit any snarls.
- Breaking changes to the haplotype information (.hapl) files used by vg haplotypes. Old files can no longer be used.
- Improve automatic manpage generation
- Fixed haplotypes supported by minimizers (for recombination-aware vg giraffe)
- Add tiebreak on identity for alignments with identical score (vg giraffe)
vg giraffe) - Heuristically detect & fix when snarl ranks are sorted backwards in zip code tree
Updated Submodules
- gbwtgraph
- sdsl-lite
vg 1.68.0 - Rimbocchi
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.68.0
Buildable Source Tarball: vg-v1.68.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg build process needs.
This release includes:
- vg index now accepts a -w option to up-weight nodes to push the top-level chain through them when finding snarls
- Added a warning that path selection options are not compatible with vg paths -g
- vg haplotypes exits with an error if the snarl decomposition contains a cyclical top-level chain.
- scripts/check_options.py now catches if something other than , is between shortform and longform options
- Add option vg autoindex --no-guessing to allow force-regenerating indices
- Lookup of regions within paths that are themselves subpaths (like Stella_v1p1#0#Chr4__Stella_v1p1[11578420-11580540]:0-100) should now work again.
- Add errors when using incompatible options in vg depth
- SAM-style tags are no longer lost on unmapped reads during surject
- vg's vcflib build will now use the default python3 instead of the latest installed Python (which might not have its headers)
- Add nodes as a vg filter --tsv-out field option; prints a comma-separated list of nodes traversed by the read's path
- vg giraffe now has a --softclip-penalty flag to reduce alignment scores per-base for softclips
- vg filter now has a -W/--overwite-score flag to save the scores from --rescore.
- vg filter now checks to make sure you aren't using --rescore or related options when they would do nothing.
- Internal changes in vg giraffe to allow multiple presets to potentially share settings.
- Bug fixes for chain transition distance measurement with the zip code tree in vg giraffe
- vg now supports Protobuf 30+ and its string view return types.
- vg mod now has an --invert-keep-paths option to save the complement of path names passed to --keep-paths
- vg giraffe -b hifi preset now uses a --max-min-chain-score of 100
- vg now has a libbdsg that can run is_regular_snarl() on a distance-less distance index.
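The per-base softclip penalty described above can be read as subtracting a fixed amount for every softclipped base. The sketch below follows that reading. The function and parameter names are hypothetical, and the exact formula vg giraffe applies is not given in these notes.

```python
def apply_softclip_penalty(score: int, softclip_start: int, softclip_end: int,
                           penalty_per_base: int) -> int:
    """Subtract penalty_per_base for each softclipped base at either end."""
    return score - penalty_per_base * (softclip_start + softclip_end)
```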
Updated Submodules
- gbwtgraph
- libbdsg
- libvgio
vg 1.67.0 - Vetria
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.67.0
Buildable Source Tarball: vg-v1.67.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg build process needs.
This release includes:
- GAF path end positions are calculated correctly in some edge cases.
- --keep-path can now be used multiple times in vg mod
- vg giraffe --track-correctness should no longer crash when read truth positions are on paths that exist in the graph, but are too short to reach where the read is.
- Bring vg cluster up-to-date: now accepts GBZ files, can do short-read or long-read giraffe, and allows --prefix for better compatibility with vg autoindex
- Add some options to vg cluster to help with chaining issue diagnosis: print out cyclic snarl sizes, seeds with high hit amounts
- Fix GFA haplotype sniffing for GFAs with P-lines
- Use graph metadata and not path name to determine reference/haplotype status for paths in vg call and vg deconstruct.
- Loading transcript files will now produce a human-readable error message when there are duplicate transcripts with the same ID on different paths.
- The GBWT built while sorting GAF with vg gamsort is now forward-only by default.
- vg sim now can output in FASTQ format via --fastq-out
- Make vg mod -t take an argument and stop -E from requiring one
- In vg chunk, fix the long names for -P, -c, -r, and -R, and make the latter two accept arguments.
- Register command line options correctly & put them under test (scripts/check_options.py). This involved a lot of minor bugfixes and helptext modifications, collected in a Google Doc.
- Manually wrap option helptext lines after 80 characters
- vg sim now works with sample name even when no GBWT is provided.
- CI now enforces the minimum required GCC version.
- vg now requires a minimum GCC version of 7, the oldest major version available in the Ubuntu releases we test on for CI.
- vg giraffe usage example now shows using a .zipcodes file and a .withzip.min file.
- vg can now be built with the mimalloc allocator (v3 beta)
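The 80-character helptext wrapping mentioned above is easy to verify mechanically. The following sketch is an illustration only and is not taken from vg's scripts. The function name is hypothetical.

```python
from typing import List, Tuple


def overlong_lines(helptext: str, limit: int = 80) -> List[Tuple[int, int]]:
    """Return (line_number, length) for each line longer than limit."""
    return [
        (number, len(line))
        for number, line in enumerate(helptext.splitlines(), start=1)
        if len(line) > limit
    ]
```

An empty result means every helptext line fits within the limit.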
Updated Submodules
- BBHash
- libbdsg
- libvgio
- sdsl-lite
- sparsepp
- vcflib
New Submodules
- mimalloc
Removed Submodules
- fastahack (now used via vcflib)
