
People want to add tags to SAM read groups, and think they can be included in --read-group, but they can't #4924

Description

@adamnovak

People think they can specify a read group for vg giraffe or vg surject with something like:

--read-group "ID:1 LB:lib1 SM:HG002 PL:illumina PU:unit1"

This even results in SAM that looks mostly right at a glance. But it does not actually make a read group with the ID 1, the platform illumina, and so on. It makes a read group whose ID tag value is that entire string, with no other tags on it.
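To make the difference concrete, here is a minimal Python sketch that parses two `@RG` header lines the way the SAM format defines them: tab-separated `TAG:VALUE` fields. The exact header line vg emits is an assumption here, reconstructed from the description above (the whole argument written verbatim as the ID value); `parse_rg_line` is a hypothetical helper, not part of vg or any library.

```python
def parse_rg_line(line):
    """Split a SAM @RG header line into a {tag: value} dict."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    # Each field after the record type is TAG:VALUE; split on the first colon only.
    return dict(field.split(":", 1) for field in fields[1:])


arg = "ID:1 LB:lib1 SM:HG002 PL:illumina PU:unit1"

# Assumption: the --read-group argument is written verbatim as the ID value.
actual = "@RG\tID:" + arg

# What the user meant: one tab-separated field per tag.
intended = "@RG\t" + "\t".join(arg.split(" "))

actual_tags = parse_rg_line(actual)
intended_tags = parse_rg_line(intended)

print(sorted(actual_tags))    # a single tag: ID
print(sorted(intended_tags))  # five tags: ID, LB, PL, PU, SM
```

Because the spaces in the first line are not field separators, a SAM parser sees only an ID tag, and the LB, SM, PL, and PU text is just part of the read group's name.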

I think we have had this anti-pattern in the official documentation. We still use it in the Long Read Giraffe paper scripts, and it's in the official vg WDL workflows (vgteam/vg_wdl#176). It looks almost exactly like the corresponding bwa-mem -R option. But it doesn't work, and it needs to be ripped out and replaced with a way of specifying read group metadata that does work (even if that's "include real tab characters on the command line to break out of the ID field and inject more tags", as ugly as that is).
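The tab-character idea can be sketched as follows. This is only an illustration under a stated assumption: that the `--read-group` value is written verbatim after `ID:` in the `@RG` line. It has not been checked against vg, and the variable names are hypothetical.

```python
# Tags we want on the read group, besides the ID itself.
tags = {"LB": "lib1", "SM": "HG002", "PL": "illumina", "PU": "unit1"}
read_group_id = "1"

# Build the value with real tab characters between the ID and each extra tag.
rg_value = "\t".join([read_group_id] + [f"{k}:{v}" for k, v in tags.items()])

# Assumption: the value lands verbatim after "ID:" in the header.
header_line = "@RG\tID:" + rg_value

# Parse the line as SAM defines it: tab-separated TAG:VALUE fields.
parsed = dict(field.split(":", 1) for field in header_line.split("\t")[1:])
print(parsed)
```

From a script, the value would need to reach the program as a single argument with the tabs intact, for example as one element of an argument list. In bash, `$'1\tLB:lib1'` is one way to write a string containing a real tab.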

Examples of people picking up the anti-pattern:
#4222 (comment)
#3937
#2302 (comment)
