People think they can specify a read group for vg giraffe or vg surject with something like:
--read-group "ID:1 LB:lib1 SM:HG002 PL:illumina PU:unit1"
This even produces SAM that looks mostly right at a glance. But it does not make a read group with the ID 1, the platform illumina, and so on; it makes a read group whose name is that entire string, stored in the ID tag value, with no other tags on it.
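To make the difference concrete, here is a minimal Python sketch that splits an @RG header line the way the SAM format defines it, with fields separated by tabs. The two header strings are illustrative assumptions reconstructed from the description above, not output captured from vg, and `parse_rg_line` is a hypothetical helper.

```python
def parse_rg_line(line):
    """Split a SAM @RG header line into a dict of tag -> value.

    SAM header fields are tab-separated, and each field is TAG:VALUE,
    so only the first colon in a field separates the tag from its value.
    """
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    tags = {}
    for field in fields[1:]:
        key, _, value = field.partition(":")
        tags[key] = value
    return tags


# Assumed shape of the header produced by the space-separated anti-pattern:
# the whole string lands inside the ID value.
broken = "@RG\tID:ID:1 LB:lib1 SM:HG002 PL:illumina PU:unit1"

# What the user intended: one tab-separated field per tag.
intended = "@RG\tID:1\tLB:lib1\tSM:HG002\tPL:illumina\tPU:unit1"

print(parse_rg_line(broken))
print(parse_rg_line(intended))
```

The first line parses to a single ID tag holding all of the text, while the second yields five separate tags. That is why the output can look right to a human but wrong to any tool that reads the header.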
I think we have had this anti-pattern in the official documentation; we still use it in the Long Read Giraffe paper scripts, and it's in the official vg WDL workflows (vgteam/vg_wdl#176). It looks almost exactly like the corresponding bwa-mem -R option, but it doesn't work. It needs to be ripped out and replaced with a way of specifying read group metadata that does work (even if that's "include real tab characters on the command line to break out of the ID field and inject more tags", as ugly as that is).
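As a rough illustration of the tab-character workaround, the Python sketch below builds a `--read-group` value with real tabs in it. It assumes that vg writes the value verbatim after `ID:` in the @RG line, which is inferred from the behavior described in this issue and has not been verified here. `read_group_argument` is a hypothetical helper, and nothing in the sketch actually runs vg.

```python
def read_group_argument(rg_id, extra_tags):
    """Build a --read-group value that smuggles extra tags in after the ID.

    ASSUMPTION: vg emits the value unchanged after "ID:" in the @RG header,
    so real tab characters end the ID field and start new tags.
    """
    parts = [rg_id]
    for key, value in extra_tags.items():
        if "\t" in key or "\t" in value:
            raise ValueError("tags and values must not contain tabs")
        parts.append(f"{key}:{value}")
    return "\t".join(parts)


value = read_group_argument(
    "1", {"LB": "lib1", "SM": "HG002", "PL": "illumina", "PU": "unit1"}
)

# Arguments to splice into a vg command line. Passing them as a list
# (for example to subprocess.run) avoids having to quote tabs for a shell.
argv = ["--read-group", value]

# The header line we would expect under the assumption above.
expected_header = "@RG\tID:" + value
print(repr(expected_header))
```

If the assumption holds, the resulting header splits on tabs into the five intended tags. It is still a hack, and a proper option for specifying read group metadata would be preferable.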
Examples of people picking up the anti-pattern:
#4222 (comment)
#3937
#2302 (comment)