batch_seqstructhmm
Trains multiple Hidden Markov Models for the sequence-structure binding preferences of a given set of RNA-binding proteins. The models are trained on sequences and structures in FASTA format located in a given data directory. During training, statistics about the models are printed to stdout. In every iteration, the current model and a visualization of it are stored in the batch directory. Each training process terminates when no significant progress has been made for three iterations.
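The stopping rule described above can be illustrated with a small Python sketch. This is only one possible reading and not the tool's actual code: the function name `should_stop`, the `window` parameter, and the choice to compare consecutive log-likelihood measurements are all assumptions made for illustration.

```python
def should_stop(loglikelihoods, threshold=10.0, window=3):
    """Hypothetical stopping rule: return True when none of the last
    `window` measurements improved on its predecessor by at least
    `threshold`."""
    # Not enough measurements yet to judge `window` consecutive steps.
    if len(loglikelihoods) < window + 1:
        return False
    recent = loglikelihoods[-(window + 1):]
    # Gain of each measurement over the one directly before it.
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < threshold for gain in gains)
```

Under this reading, a run whose last three gains are 5, 2 and 1 stops with the default threshold of 10, while a single gain of 15 within the window keeps it going.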
usage: batch_seqstructhmm [-h] [--cores CORES]
[--structure_type STRUCTURE_TYPE]
[--motif_length MOTIF_LENGTH] [--baum_welch]
[--flexibility FLEXIBILITY]
[--block_size BLOCK_SIZE] [--threshold THRESHOLD]
[--termination_interval TERMINATION_INTERVAL]
data_directory proteins batch_directory
Positional Arguments
data_directory | data directory; must contain the sequence files under fasta/<protein>/positive.fasta and structure files under <structure_type>/<protein>/positive.txt |
proteins | list of RNA-binding proteins to analyze (surrounded by quotation marks, separated by whitespace) |
batch_directory | directory for batch output |
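Before starting a long batch run, it can help to confirm that the data directory follows the layout required by the data_directory argument. The sketch below is a hypothetical helper and not part of batch_seqstructhmm; the function names and the protein names in the usage note are placeholders, and only the two path patterns come from the argument description.

```python
from pathlib import Path


def expected_inputs(data_directory, proteins, structure_type="shapes"):
    """List the input files expected for each protein.

    `proteins` is a whitespace-separated string, as on the command line.
    """
    root = Path(data_directory)
    paths = []
    for protein in proteins.split():
        paths.append(root / "fasta" / protein / "positive.fasta")
        paths.append(root / structure_type / protein / "positive.txt")
    return paths


def missing_inputs(data_directory, proteins, structure_type="shapes"):
    """Return the expected input files that do not exist on disk."""
    candidates = expected_inputs(data_directory, proteins, structure_type)
    return [path for path in candidates if not path.is_file()]
```

For example, `missing_inputs("data", "PUM2 QKI")` returns an empty list only if both files exist for each of the two (placeholder) proteins.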
Named Arguments
--cores, -c | number of cores to use (if not given, all cores are used) |
--structure_type, -s | structure type to use; must match the location of the structure files (see the data_directory argument above). Default: “shapes” |
--motif_length, -n | length of the motifs to be found. Default: 6 |
--baum_welch, -b | whether the models should be initialized with a Baum-Welch optimized sequence motif. Default: True |
--flexibility, -f | greediness of the Gibbs sampler: model parameters are sampled from among the top f configurations; set f to 0 to include all possible configurations. Default: 10 |
--block_size | number of sequences to be held out in each iteration. Default: 1 |
--threshold, -t | the iterative algorithm terminates if this reduction in sequence-structure log-likelihood is not reached in any of the last 3 measurements. Default: 10.0 |
--termination_interval, -i | produce output every <i> iterations. Default: 100 |
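When the tool is launched from a script, the arguments above can be assembled as a list so that the protein list stays a single quoted argument. The following is a minimal sketch: `build_command` is a hypothetical helper, and the directory names and protein names are placeholders. Only the executable name and the option names are taken from the usage line.

```python
import shlex


def build_command(data_directory, proteins, batch_directory, **options):
    """Assemble an argument list for batch_seqstructhmm (hypothetical helper).

    `options` maps long option names without the leading dashes
    (e.g. motif_length=6) to their values.
    """
    command = ["batch_seqstructhmm"]
    for name, value in options.items():
        flag = "--" + name
        if value is True:
            # Options shown without a value in the usage line are bare flags.
            command.append(flag)
        elif value is not False and value is not None:
            command.extend([flag, str(value)])
    # The proteins are passed as ONE whitespace-separated argument.
    command.extend([data_directory, " ".join(proteins), batch_directory])
    return command


cmd = build_command("data", ["PUM2", "QKI"], "batch_out",
                    cores=4, motif_length=6)
print(shlex.join(cmd))
# prints: batch_seqstructhmm --cores 4 --motif_length 6 data 'PUM2 QKI' batch_out
```

The list form can be handed to `subprocess.run(cmd, check=True)` directly; `shlex.join` (Python 3.8+) is used here only to show the equivalent shell command line.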