% DSIMULATOR(1) 1.0
%
% September 2015

# NAME

dsimulator - generate synthetic reads for a random genome

# SYNOPSIS

**dsimulator** *genlen:double* [**-c***double(20.)*] [**-b***double(.5)*]
    [**-r***int*] [**-m***int(10000)*] [**-s***int(2000)*]
    [**-x***int(4000)*] [**-e***double(.15)*]
    [**-M***file*]

# DESCRIPTION

**dsimulator** first generates a fake genome *genlen*`*1Mb` long with an AT-bias of **-b**.  It then
generates sample reads of mean length **-m** from a log-normal length distribution with
standard deviation **-s**, but ignores reads of length less than **-x**.  It collects enough
reads to cover the genome **-c** times and introduces errors into each read at rate **-e**,
where the ratio of insertions, deletions, and substitutions is set by the defined
constants `INS_RATE` (default 73%) and `DEL_RATE` (default 20%) within generate.c.  One
can also control the rate at which reads are picked from the forward and reverse
strands by setting the defined constant `FLIP_RATE` (default 50/50).  The **-r** option seeds
the random number generator for the generation of the genome so that one can
reproducibly generate the same underlying genome to sample from.  If this parameter is
missing, then the job id of the invocation seeds the random number generator.  The
output is written to the standard output, so it can be piped directly to another command, and is in
PacBio .fasta format suitable as input to **fasta2DB**(1).  Finally, the **-M** option requests
that the coordinates from which each read has been sampled are written to the indicated
file, one line per read, ASCII encoded.  This "map" file essentially tells one where
every read belongs in an assembly and is very useful for debugging and testing
purposes.  If the coordinate pair for a read is b,e, then `b < e` means the read was sampled from [b,e] in
the forward direction, and `b > e` means it was sampled from [e,b] in the reverse direction.
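
The sampling procedure described above can be illustrated with a small Python sketch. It is not the code in generate.c and makes several assumptions: that **-m** and **-s** are the mean and standard deviation of the read lengths themselves (so they must be converted to the parameters of the underlying normal distribution), that read start positions are uniform over the genome, and that a read longer than the genome is simply redrawn. The function names and the `seed` argument are hypothetical; the sketch omits genome generation and error insertion and only produces b,e coordinate pairs in the orientation convention described for the **-M** file.

```python
import math
import random


def lognormal_params(mean, sd):
    """Convert a desired mean and standard deviation of read lengths into
    the (mu, sigma) parameters of the underlying normal distribution."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def sample_read_intervals(genlen, coverage=20.0, mean=10000, sd=2000,
                          min_len=4000, flip_rate=0.5, seed=None):
    """Draw (b, e) pairs until their total length reaches coverage * genlen.

    Assumption: b < e marks a forward-strand read from [b, e] and
    b > e a reverse-strand read from [e, b].
    """
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, sd)
    target = coverage * genlen
    total = 0
    reads = []
    while total < target:
        length = int(rng.lognormvariate(mu, sigma))
        if length < min_len or length > genlen:
            continue  # below the -x cutoff, or longer than the genome
        start = rng.randrange(genlen - length + 1)
        b, e = start, start + length
        if rng.random() < flip_rate:
            b, e = e, b  # sampled from the reverse strand
        reads.append((b, e))
        total += length
    return reads


def interval_of(b, e):
    """Decode a (b, e) pair into (low, high, strand)."""
    return (b, e, "+") if b < e else (e, b, "-")
```

For example, `sample_read_intervals(100000, coverage=5.0, seed=1)` returns pairs whose lengths sum to at least 500000, and `interval_of(900, 100)` decodes a reverse-strand read covering [100,900].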

# SEE ALSO

**daligner**(1)
