Split a BED file into non-overlapping data sets.
This program is now part of the main jvarkit
tool. See jvarkit for compiling.
Usage: java -jar dist/jvarkit.jar bednonoverlappingset [options] Files
Usage: bednonoverlappingset [options] Files
Options:
--compress
Bgzip output bed files
Default: false
-x, --extend
Extend intervals by 'x' bases
Default: 0
-h, --help
print help and exit
--helpFormat
What kind of help. One of [usage,markdown,xml].
-m, --manifset
Manifest file containing the generated filenames/number of items.
* -o, --out
An existing directory or a filename ending with the '.zip' or '.tar' or
'.tar.gz' suffix.
-R, -r, --reference
Indexed fasta Reference file. This file must be indexed with samtools
faidx and with picard/gatk CreateSequenceDictionary or samtools dict. If
defined, will be used to sort the bed records on chrom/pos before writing
them.
--version
print version and exit
20180607
The project is licensed under the MIT license.
Should you cite bednonoverlappingset? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
GATK DepthOfCoverage merges overlapping segments (see https://gatkforums.broadinstitute.org/gatk/discussion/1865/ ). I want to get the coverage for a set of overlapping windows, so the windows must first be split into files that each contain no overlapping intervals.
$ awk '{printf("%s\t0\t%s\n",$1,$2);}' src/test/resources/rotavirus_rf.fa.fai |\
bedtools makewindows -w 500 -s 100 -b - |\
java -jar dist/jvarkit.jar bednonoverlappingset -o out.zip -m tmp.manifest
[INFO][BedNonOverlappingSet]saving tmp.00001.bed 44
[INFO][BedNonOverlappingSet]saving tmp.00002.bed 39
[INFO][BedNonOverlappingSet]saving tmp.00003.bed 37
[INFO][BedNonOverlappingSet]saving tmp.00004.bed 36
[INFO][BedNonOverlappingSet]saving tmp.00005.bed 33
$ head -n 2 tmp.000*.bed
==> tmp.00001.bed <==
RF01 0 500
RF01 500 1000
==> tmp.00002.bed <==
RF01 100 600
RF01 600 1100
==> tmp.00003.bed <==
RF01 200 700
RF01 700 1200
==> tmp.00004.bed <==
RF01 300 800
RF01 800 1300
==> tmp.00005.bed <==
RF01 400 900
RF01 900 1400
$ cat tmp.manifest
tmp.00001.bed 44
tmp.00002.bed 39
tmp.00003.bed 37
tmp.00004.bed 36
tmp.00005.bed 33
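The grouping seen above (abutting windows such as 0-500 and 500-1000 share a file, while 0-500 and 100-600 do not) is what a first-fit assignment would produce. The following is a minimal sketch of that idea, not the tool's actual implementation: `assign_sets` is a hypothetical helper, and it assumes half-open BED coordinates and that `-x` extends each interval on both sides.

```shell
# Hypothetical helper, NOT part of jvarkit: first-fit assignment of BED
# intervals to sets so that no two intervals in a set overlap.
# $1 = input BED file, $2 = extension in bases (assumed both sides, default 0).
# Prints: set_number <TAB> chrom <TAB> start <TAB> end
assign_sets() {
  sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n "$1" |
  awk -F '\t' -v OFS='\t' -v x="${2:-0}" '
    {
      start = $2 - x; if (start < 0) start = 0   # extended start, clamped at 0
      end = $3 + x                               # extended end
      # Input is sorted by start, so for each (set, chrom) pair only the
      # end of the last interval placed there has to be remembered.
      for (i = 1; i <= n; i++) {
        key = i SUBSEP $1
        if (!(key in last_end) || last_end[key] <= start) break
      }
      if (i > n) n = i                           # no set fits: open a new one
      last_end[i SUBSEP $1] = end
      print i, $1, $2, $3                        # original coordinates
    }'
}
```

Calling `assign_sets windows.bed` on a hypothetical `windows.bed` prints one line per interval with its set number in the first column. With an extension of 1 (`assign_sets windows.bed 1`), abutting windows no longer share a set, which is the reason one might pass `-x 1` before handing the files to a tool that merges adjacent intervals.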
(...)
java -jar dist/jvarkit.jar bednonoverlappingset -x 1 -R ref.fa -o "tmp.__SETID__.bed" -m tmp.manifest input.bed
# run DepthOfCoverage once per non-overlapping BED file listed in the manifest
cut -f 1 tmp.manifest | while read -r B
do
${java_exe} -Djava.io.tmpdir=. -jar GenomeAnalysisTK.jar \
-T DepthOfCoverage -R "ref.fa" \
-o "SAMPLE" -I input.bam -L "${B}" --omitDepthOutputAtEachBase --omitLocusTable --omitPerSampleStats
# keep columns 1 and 3 of the interval summary, skipping the header line
grep -v '^Target' "SAMPLE.sample_interval_summary" | awk -F '\t' '{printf("%s\t%s\n",$1,$3);}' >> tmp.tsv
rm "SAMPLE.sample_interval_summary" "SAMPLE.sample_interval_statistics"
done
LC_ALL=C sort -t "$(printf '\t')" -k1,1 tmp.tsv >> "SAMPLE.win.cov.tsv"
(...)
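Before running a per-file loop like the one above, it can be useful to confirm that each generated BED file really contains no overlapping intervals. The following is a minimal sketch of such a check; `bed_is_non_overlapping` is a hypothetical helper, not part of jvarkit or GATK, and it assumes tab-delimited, half-open BED coordinates.

```shell
# Hypothetical helper, NOT part of jvarkit: exit status 0 if no two intervals
# in the BED file $1 overlap on the same contig, 1 otherwise.
bed_is_non_overlapping() {
  sort -t "$(printf '\t')" -k1,1 -k2,2n "$1" |
  awk -F '\t' '
    BEGIN { bad = 0 }
    # same contig and start before the furthest end seen so far: overlap
    $1 == chrom && $2 < max_end { bad = 1; exit }
    # new contig: reset the running maximum
    $1 != chrom { chrom = $1; max_end = 0 }
    $3 > max_end { max_end = $3 }
    END { exit bad }'
}
```

Inside the loop, a guard such as `bed_is_non_overlapping "${B}" || echo "overlap in ${B}" >&2` would flag a problematic file. Abutting intervals (one ending where the next starts) are not reported as overlapping by this check.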