Convert ‘one variant with N ALT alleles’ to ‘N variants with one ALT’.
Use bcftools norm instead.
This program is now part of the main jvarkit
tool. See the jvarkit documentation for compilation instructions.
Usage: java -jar dist/jvarkit.jar vcfmulti2oneallele [options] Files
Usage: vcfmulti2oneallele [options] Files
Options:
--bcf-output
If this program writes a VCF to a file, the format is first guessed from
the file suffix. Otherwise, force BCF output. The currently supported BCF
version is 2.1, which is not compatible with bcftools/htslib (last
checked 2019-11-15).
Default: false
-flag, --flag
Name of the INFO field that will be added to record the original alleles.
Default: VCF_MULTIALLELIC_SRC
--generate-vcf-md5
Generate MD5 checksum for VCF output.
Default: false
-h, --help
print help and exit
--helpFormat
What kind of help. One of [usage,markdown,xml].
--keepSpanningDeletions
Keep ALT spanning-deletion alleles (*).
Default: false
--most-frequent
Keep only the most frequent allele.
Default: false
-o, --out
Output file. Optional. Default: stdout
--print-no-alt
Print variants without an ALT allele.
Default: false
--version
print version and exit
The project is licensed under the MIT license.
Should you cite vcfmulti2oneallele? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
ExAC contains multi-ALT variants:
$ gunzip -c ExAC.r0.3.sites.vep.vcf.gz | grep rs3828049
1 889238 rs3828049 G A,C 8422863.10 PASS AC=6926,3;AC_AFR=220,0;AC_AMR=485,1;AC_Adj=6890,3;AC_EAS=746,0;AC_FIN=259,0;AC_Het=6442,3,0;AC_Hom=224,0;AC_NFE=3856,0;AC_OTH=41,0;AC_SAS=1283,2;AF=0.057,2.472e-05;AN=121358;AN_AFR=10148;AN_AMR=11522;AN_Adj=119272;AN_EAS=8582;AN_FIN=6358;AN_NFE=65282;AN_OTH=876;AN_SAS=16504;(...)
Processed with this tool:
$ java -jar dist/jvarkit.jar vcfmulti2oneallele ExAC.r0.3.sites.vep.vcf.gz | grep rs3828049
1 889238 rs3828049 G A 8422863.10 PASS AC=6926;AC_AFR=220;AC_AMR=485;AC_Adj=6890;AC_EAS=746;AC_FIN=259;AC_Het=6442;AC_Hom=224;AC_NFE=3856;AC_OTH=41;AC_SAS=1283;AF=0.057;AN=121358;AN_AFR=10148;AN_AMR=11522;AN_Adj=119272;AN_EAS=8582;AN_FIN=6358;AN_NFE=65282;AN_OTH=876;AN_SAS=16504;BaseQRankSum=-2.170e-01;VCF_MULTIALLELIC_SRC=A|C;(...)
1 889238 rs3828049 G C 8422863.10 PASS AC=3;AC_AFR=0;AC_AMR=1;AC_Adj=3;AC_EAS=0;AC_FIN=0;AC_Het=3;AC_Hom=0;AC_NFE=0;AC_OTH=0;AC_SAS=2;AF=2.472e-05;AN=121358;AN_AFR=10148;AN_AMR=11522;AN_Adj=119272;AN_EAS=8582;AN_FIN=6358;AN_NFE=65282;AN_OTH=876;AN_SAS=16504;VCF_MULTIALLELIC_SRC=A|C;(....)
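The split shown above can be illustrated with a minimal Python sketch. It is not the jvarkit implementation: the function name split_multiallelic is hypothetical, and the sketch assumes that any INFO field holding exactly one comma-separated value per ALT allele is a per-ALT field. Fields with any other number of values are copied unchanged, which may differ from what the tool does (for instance AC_Het in the output above). Genotype columns, if present, are copied as-is and not recoded.

```python
def split_multiallelic(line, flag="VCF_MULTIALLELIC_SRC"):
    """Split one tab-delimited VCF data line into one line per ALT allele.

    Illustrative sketch only; assumptions are described in the text above.
    """
    cols = line.rstrip("\n").split("\t")
    alts = cols[4].split(",")
    if len(alts) < 2:
        # Zero or one ALT allele: nothing to split.
        return ["\t".join(cols)]
    items = [] if cols[7] == "." else cols[7].split(";")
    out = []
    for i, alt in enumerate(alts):
        info = []
        for item in items:
            key, sep, value = item.partition("=")
            values = value.split(",")
            if sep and len(values) == len(alts):
                # Assumed per-ALT field: keep only the value for this allele.
                info.append(f"{key}={values[i]}")
            else:
                # Flags and fields of any other length are copied unchanged.
                info.append(item)
        # Record the original ALT alleles, joined with '|' as in the output above.
        info.append(flag + "=" + "|".join(alts))
        out.append("\t".join(cols[:4] + [alt] + cols[5:7] + [";".join(info)] + cols[8:]))
    return out


# Usage: a shortened version of the ExAC record shown above.
record = "1\t889238\trs3828049\tG\tA,C\t8422863.10\tPASS\tAC=6926,3;AF=0.057,2.472e-05;AN=121358"
for row in split_multiallelic(record):
    print(row)
```

Matching on the number of values is a shortcut chosen to keep the sketch short. A more careful version would read the Number attribute of each INFO definition in the VCF header to decide which fields are per-ALT.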