Scan structural variants for case/control data


Usage: scansv [options] Files
      Print all original variants from each file instead of printing just one.
      Default: false
      A source of intervals. The following suffixes are recognized: vcf, 
      vcf.gz, bed, bed.gz, gtf, gff, gff.gz, gtf.gz. Otherwise it could be an 
      empty string (no interval) or a list of plain intervals separated by '[ 
      Two BND variants are the same if their bounds are distant by less than 
      xxx bases. The distance is specified as a positive integer. Commas are 
      removed. The following suffixes are interpreted: b,bp,k,kb,m,mb
      Default: 100
      When comparing two BND variants, check that their mates (using the ALT 
      allele) are the same too.
      Default: false
    -c, --controls
      Indexed VCF files for the controls. A file ending with the suffix 
      '.list' is interpreted as a list of paths.
      Default: []
      When comparing two SV variants, their INFO/SVTYPE should be the same. 
      Default is to just use coordinates to compare non-BND variants.
      Default: false
    -h, --help
      print help and exit
      What kind of help. One of [usage,markdown,xml].
    -L, --large
      Large number of controls: by default, all VCF readers for controls are 
      opened and kept open. This is fast but requires a lot of resources. 
      This option opens and closes the controls as needed, which makes things 
      slower. The value is the number of VCF files that should be kept open, 
      so '0' = ignore/all re-open+close (slow)
      Default: 0
      Max frequency of variants found in controls. 0: no control should carry 
      the variant.
      Default: 0.0
    -o, --out
      Output file. Optional. Default: stdout
      When comparing two non-BND SV variants, use their ALT alleles to adjust 
      the interval. It solves the problem of  
      Default: false
      Two CNV/DEL/.. variants are the same if they share 'x' fraction of their 
      Default: 0.75
      Two non-BND variants are the same if they overlap and both have a 
      length <= 'x'. The distance is specified as a positive integer. Commas 
      are removed. The following suffixes are interpreted: b,bp,k,kb,m,mb
      Default: 10
      print version and exit
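
The distance syntax described for the two distance options above (a positive integer, commas removed, optional suffix b, bp, k, kb, m or mb) can be illustrated with a minimal Python sketch. This is not the parser scansv uses. It assumes that k/kb mean 1,000 bases and m/mb mean 1,000,000 bases, which the help text does not state, and the function name parse_distance is hypothetical.

```python
import re

# Assumed multipliers: the help text lists the suffixes but not their values.
_MULTIPLIERS = {
    "": 1, "b": 1, "bp": 1,
    "k": 1_000, "kb": 1_000,
    "m": 1_000_000, "mb": 1_000_000,
}


def parse_distance(text: str) -> int:
    """Convert a string such as '1,000', '10kb' or '2m' to a number of bases."""
    # Commas are removed before parsing, as the help text says.
    cleaned = text.replace(",", "").strip().lower()
    match = re.fullmatch(r"(\d+)\s*(bp|b|kb|k|mb|m)?", cleaned)
    if match is None:
        raise ValueError(f"not a valid distance: {text!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix or ""]


if __name__ == "__main__":
    for value in ("100", "1,000", "10kb", "2m"):
        print(value, "->", parse_distance(value))
```

Under these assumptions, a value written as 10kb or 10,000 would describe the same threshold.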



Download and Compile

$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ ./gradlew scansv

The Java jar file will be installed in the dist directory.

The project is licensed under the MIT license.


Should you cite scansv? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md

The current reference is:


Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030


find CONTROLS/ -name "*.vcf.gz" > controls.list

java -Xmx3g -Djava.io.tmpdir=. -jar scansv.jar --controls controls.list -d2 25 --fraction 0.6 cases1.vcf cases2.vcf > out.vcf
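
The "max frequency of variants found in controls" option from the usage section can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the tool's implementation: it assumes the frequency is the number of control files carrying a matching variant divided by the total number of control files, and that the threshold is inclusive. The function name passes_control_filter is hypothetical.

```python
def passes_control_filter(carriers: int, n_controls: int,
                          max_frequency: float = 0.0) -> bool:
    """Return True if a case variant is rare enough in controls to be kept.

    carriers      -- number of control files with a matching variant
    n_controls    -- total number of control files
    max_frequency -- assumed inclusive threshold; 0.0 means no control
                     may carry the variant
    """
    if n_controls <= 0:
        # Assumption: with no controls there is nothing to filter against.
        return True
    return carriers / n_controls <= max_frequency


if __name__ == "__main__":
    print(passes_control_filter(0, 10))        # no control carries the variant
    print(passes_control_filter(1, 10))        # one carrier, threshold 0.0
    print(passes_control_filter(1, 10, 0.1))   # one carrier, threshold 0.1
```

With the default threshold of 0.0, a single matching control is enough to discard the variant in this sketch.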