Convert a VCF with multiple samples to a VCF with one SAMPLE column, duplicating each variant once per sample and adding the sample name in the INFO column. Never used.
I don’t use this anymore. See vcfconcat
This program is now part of the main jvarkit tool. See jvarkit for compiling.
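The conversion described at the top can be sketched in a few lines of Python. This is only an illustration of the idea, not the jvarkit implementation: the function name `multi_to_one` and the sample names `S1`/`S2` are made up, the sketch works on plain tab-separated text, appends `SAMPLENAME` at the end of INFO, and does not add the matching `##INFO` header line.

```python
def multi_to_one(lines):
    """Yield VCF lines where every variant is repeated once per sample.

    Hypothetical helper: plain-text sketch, no header bookkeeping.
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta lines are passed through unchanged in this sketch.
            yield line
        elif line.startswith("#CHROM"):
            cols = line.split("\t")
            samples = cols[9:]
            # The output keeps a single genotype column named SAMPLE.
            yield "\t".join(cols[:9] + ["SAMPLE"])
        elif line:
            cols = line.split("\t")
            fixed, genotypes = cols[:9], cols[9:]
            for name, genotype in zip(samples, genotypes):
                tag = "SAMPLENAME=" + name
                info = tag if fixed[7] in (".", "") else fixed[7] + ";" + tag
                yield "\t".join(fixed[:7] + [info, fixed[8], genotype])


demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t10177\t.\tA\tAC\t100\tPASS\tAC=2\tGT\t1|0\t0|1",
]
for out_line in multi_to_one(demo):
    print(out_line)
```

One input record with two samples becomes two output records, each carrying its own `SAMPLENAME` value.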
Usage: java -jar dist/jvarkit.jar vcfmulti2one [options] Files
Usage: vcfmulti2one [options] Files
Options:
--bcf-output
If this program writes a VCF to a file, the format is first guessed from
the file suffix. Otherwise, force BCF output. The currently supported BCF
version is 2.1, which is not compatible with bcftools/htslib (last
checked 2019-11-15)
Default: false
--disable-vc-attribute-recalc
When genotypes are removed/changed, do not recalculate variant
attributes like DP, AF, AC, AN...
Default: false
-r, -hr, --discard_hom_ref
discard if variant is hom-ref
Default: false
-c, -nc, --discard_no_call
discard if variant is no-call
Default: false
-a, --discard_non_available
discard if variant is not available
Default: false
--generate-vcf-md5
Generate MD5 checksum for VCF output.
Default: false
-h, --help
print help and exit
--helpFormat
What kind of help. One of [usage,markdown,xml].
-o, --output
Output file. Optional. Default: stdout
--regions
Optional. A source of intervals. The following suffixes are recognized:
vcf, vcf.gz, bed, bed.gz, gtf, gff, gff.gz, gtf.gz. Otherwise it could be
an empty string (no interval) or a list of plain intervals separated by
'[ \t\n;,]'
--vc-attribute-recalc-ignore-filtered
When recalculating variant attributes like DP, AF, AC, AN, ignore
FILTERed **Genotypes**
Default: false
--vc-attribute-recalc-ignore-missing
Ignore missing VCF headers (DP, AF, AC, AN). Default behavior: adding
VCF header if they're missing
Default: false
--version
print version and exit
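The three discard options (-r, -c, -a) filter on the type of each sample's genotype. A rough way to picture this, assuming the usual VCF conventions (all alleles `0` is hom-ref, all alleles `.` is no-call, an empty genotype field is treated here as "not available"): the classification below is an assumption for illustration and is not taken from the jvarkit source, and the names `genotype_kind` and `keep` are hypothetical.

```python
import re


def genotype_kind(sample_field):
    """Classify a sample column by its GT value (assumed to be the first subfield)."""
    gt = (sample_field or "").split(":")[0]
    if not gt:
        return "unavailable"
    alleles = re.split(r"[/|]", gt)
    if all(a == "." for a in alleles):
        return "no_call"
    if all(a == "0" for a in alleles):
        return "hom_ref"
    return "other"


def keep(sample_field, discard_hom_ref=False, discard_no_call=False,
         discard_non_available=False):
    """Return False when the genotype falls in a category the caller discards."""
    kind = genotype_kind(sample_field)
    if kind == "hom_ref" and discard_hom_ref:
        return False
    if kind == "no_call" and discard_no_call:
        return False
    if kind == "unavailable" and discard_non_available:
        return False
    return True


for field in ("0|0", "./.", "", "1|0"):
    print(repr(field), genotype_kind(field), keep(field, True, True, True))
```

With all three flags set, only records whose genotype carries at least one called non-reference or mixed allele remain.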
20150312
The project is licensed under the MIT license.
Should you cite vcfmulti2one? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
I don’t use this software anymore.
If there is only one input with the ‘.list’ suffix, it is interpreted as a file containing the paths to the VCF files.
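The ‘.list’ rule can be pictured with a small sketch. The function name `expand_inputs` is hypothetical, and skipping blank lines is an assumption; how jvarkit treats blank or comment lines in such a file is not stated here.

```python
def expand_inputs(args):
    """Return the VCF paths to read for a list of command-line inputs.

    A single argument ending in '.list' is read as one path per line;
    anything else is returned as given.
    """
    if len(args) == 1 and args[0].endswith(".list"):
        with open(args[0]) as handle:
            # Assumption: blank lines are ignored.
            return [line.strip() for line in handle if line.strip()]
    return list(args)
```

Note that the rule only applies when the ‘.list’ file is the sole input; mixed with other arguments it is returned untouched in this sketch.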
A file with the suffixes ‘.zip’ or ‘.tar’ or ‘.tar.gz’ is interpreted as an archive and all the entries looking like a vcf are extracted.
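As an illustration of the archive rule, the following sketch lists the entries of a zip or tar archive that "look like" a VCF. Which suffixes count is an assumption here (`.vcf` and `.vcf.gz`), and `vcf_entries` is a hypothetical name; the sketch only lists entry names and does not extract anything.

```python
import tarfile
import zipfile

# Assumption: these suffixes are what "looking like a vcf" means.
VCF_SUFFIXES = (".vcf", ".vcf.gz")


def vcf_entries(archive_path):
    """Return the names of archive entries whose suffix looks like a VCF."""
    if archive_path.endswith(".zip"):
        with zipfile.ZipFile(archive_path) as archive:
            names = archive.namelist()
    elif archive_path.endswith((".tar", ".tar.gz")):
        # Default read mode detects gzip compression transparently.
        with tarfile.open(archive_path) as archive:
            names = [m.name for m in archive.getmembers() if m.isfile()]
    else:
        raise ValueError("not a recognized archive: " + archive_path)
    return [name for name in names if name.endswith(VCF_SUFFIXES)]
```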
24 Feb 2020: refactored; the input is no longer sorted. Use bcftools sort.
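For small files, the effect of a coordinate sort can be sketched as below. This is not a replacement for bcftools sort: it assumes records should follow the contig order declared in the `##contig` header lines and then the position, holds everything in memory, and `sort_vcf_lines` is a hypothetical name.

```python
def sort_vcf_lines(lines):
    """Return header lines followed by records sorted by contig rank and position."""
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln and not ln.startswith("#")]

    # Contig order as declared in the header, e.g. ##contig=<ID=RF01,length=3302>
    prefix = "##contig=<ID="
    contigs = [ln[len(prefix):].split(",")[0].rstrip(">")
               for ln in header if ln.startswith(prefix)]
    rank = {name: index for index, name in enumerate(contigs)}

    def key(record):
        chrom, pos = record.split("\t", 2)[:2]
        # Contigs missing from the header go last, ordered by name.
        return (rank.get(chrom, len(rank)), chrom, int(pos))

    # sorted() is stable, so per-sample copies of one variant keep their order.
    return header + sorted(records, key=key)
```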
with zip and tar
$ tar tvfz ~/jeter.tar.gz && unzip -l ~/jeter.zip && java -jar dist/jvarkit.jar vcfmulti2one ~/jeter.tar.gz ~/jeter.zip | bcftools view - | wc -l
-rw-r--r-- lindenb/lindenb 5805 2019-01-11 18:29 src/test/resources/rotavirus_rf.ann.vcf.gz
-rw-r--r-- lindenb/lindenb 27450 2019-01-11 18:29 src/test/resources/rotavirus_rf.freebayes.vcf.gz
-rw-r--r-- lindenb/lindenb 7366 2019-01-11 18:29 src/test/resources/rotavirus_rf.unifiedgenotyper.vcf.gz
Archive: /home/lindenb/jeter.zip
Length Date Time Name
--------- ---------- ----- ----
7366 2019-01-11 18:29 src/test/resources/rotavirus_rf.unifiedgenotyper.vcf.gz
5805 2019-01-11 18:29 src/test/resources/rotavirus_rf.ann.vcf.gz
3661 2019-01-11 18:29 src/test/resources/rotavirus_rf.vcf.gz
27450 2019-01-11 18:29 src/test/resources/rotavirus_rf.freebayes.vcf.gz
--------- -------
44282 4 files
4883
$ curl -s "http://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz" |\
gunzip -c |\
java -jar dist/jvarkit.jar vcfmulti2one -c -r -a |\
grep -v '##' |\
grep -E '(CHROM|SAMPLENAME)' | head | verticalize
>>> 2
$1 #CHROM : 1
$2 POS : 10177
$3 ID : .
$4 REF : A
$5 ALT : AC
$6 QUAL : 100
$7 FILTER : PASS
$8 INFO : AA=|||unknown(NO_COVERAGE);AC=2130;AF=0.425319;AFR_AF=0.4909;AMR_AF=0.3602;AN=5008;DP=103152;EAS_AF=0.3363;EUR_AF=0.4056;NS=2504;SAMPLENAME=HG00096;SAS_AF=0.4949
$9 FORMAT : GT
$10 SAMPLE : 1|0
<<< 2
>>> 3
$1 #CHROM : 1
$2 POS : 10177
$3 ID : .
$4 REF : A
$5 ALT : AC
$6 QUAL : 100
$7 FILTER : PASS
$8 INFO : AA=|||unknown(NO_COVERAGE);AC=2130;AF=0.425319;AFR_AF=0.4909;AMR_AF=0.3602;AN=5008;DP=103152;EAS_AF=0.3363;EUR_AF=0.4056;NS=2504;SAMPLENAME=HG00097;SAS_AF=0.4949
$9 FORMAT : GT
$10 SAMPLE : 0|1
<<< 3
>>> 4
$1 #CHROM : 1
$2 POS : 10177
$3 ID : .
$4 REF : A
$5 ALT : AC
$6 QUAL : 100
$7 FILTER : PASS
$8 INFO : AA=|||unknown(NO_COVERAGE);AC=2130;AF=0.425319;AFR_AF=0.4909;AMR_AF=0.3602;AN=5008;DP=103152;EAS_AF=0.3363;EUR_AF=0.4056;NS=2504;SAMPLENAME=HG00099;SAS_AF=0.4949
$9 FORMAT : GT
$10 SAMPLE : 0|1
<<< 4
>>> 5
$1 #CHROM : 1
$2 POS : 10177
$3 ID : .
$4 REF : A
$5 ALT : AC
$6 QUAL : 100
$7 FILTER : PASS
$8 INFO : AA=|||unknown(NO_COVERAGE);AC=2130;AF=0.425319;AFR_AF=0.4909;AMR_AF=0.3602;AN=5008;DP=103152;EAS_AF=0.3363;EUR_AF=0.4056;NS=2504;SAMPLENAME=HG00100;SAS_AF=0.4949
$9 FORMAT : GT
$10 SAMPLE : 1|0
<<< 5
>>> 6
$1 #CHROM : 1
$2 POS : 10177
$3 ID : .
$4 REF : A
$5 ALT : AC
$6 QUAL : 100
$7 FILTER : PASS
$8 INFO : AA=|||unknown(NO_COVERAGE);AC=2130;AF=0.425319;AFR_AF=0.4909;AMR_AF=0.3602;AN=5008;DP=103152;EAS_AF=0.3363;EUR_AF=0.4056;NS=2504;SAMPLENAME=HG00102;SAS_AF=0.4949
$9 FORMAT : GT
$10 SAMPLE : 1|0
<<< 6
>>> 7
$1 #CHROM : 1
$2 POS : 10177
$3 ID : .
$4 REF : A
$5 ALT : AC
$6 QUAL : 100
$7 FILTER : PASS
$8 INFO : AA=|||unknown(NO_COVERAGE);AC=2130;AF=0.425319;AFR_AF=0.4909;AMR_AF=0.3602;AN=5008;DP=103152;EAS_AF=0.3363;EUR_AF=0.4056;NS=2504;SAMPLENAME=HG00103;SAS_AF=0.4949
$9 FORMAT : GT
$10 SAMPLE : 1|0
<<< 7
>>> 8
$1 #CHROM : 1
$2 POS : 10177
$3 ID : .
$4 REF : A
$5 ALT : AC
$6 QUAL : 100
$7 FILTER : PASS
$8 INFO : AA=|||unknown(NO_COVERAGE);AC=2130;AF=0.425319;AFR_AF=0.4909;AMR_AF=0.3602;AN=5008;DP=103152;EAS_AF=0.3363;EUR_AF=0.4056;NS=2504;SAMPLENAME=HG00105;SAS_AF=0.4949
$9 FORMAT : GT
$10 SAMPLE : 1|0
<<< 8
>>> 9
$1 #CHROM : 1
$2 POS : 10177
$3 ID : .
$4 REF : A
$5 ALT : AC
$6 QUAL : 100
$7 FILTER : PASS
$8 INFO : AA=|||unknown(NO_COVERAGE);AC=2130;AF=0.425319;AFR_AF=0.4909;AMR_AF=0.3602;AN=5008;DP=103152;EAS_AF=0.3363;EUR_AF=0.4056;NS=2504;SAMPLENAME=HG00106;SAS_AF=0.4949
$9 FORMAT : GT
$10 SAMPLE : 1|0
<<< 9
>>> 10
$1 #CHROM : 1
$2 POS : 10177
$3 ID : .
$4 REF : A
$5 ALT : AC
$6 QUAL : 100
$7 FILTER : PASS
$8 INFO : AA=|||unknown(NO_COVERAGE);AC=2130;AF=0.425319;AFR_AF=0.4909;AMR_AF=0.3602;AN=5008;DP=103152;EAS_AF=0.3363;EUR_AF=0.4056;NS=2504;SAMPLENAME=HG00114;SAS_AF=0.4949
$9 FORMAT : GT
$10 SAMPLE : 0|1
<<< 10