train_seqstructhmm
Trains a Hidden Markov Model for the sequence-structure binding preferences of an RNA-binding protein. The model is trained on sequences and structures from a CLIP-seq experiment, given in two FASTA-like files. During training, statistics about the model are printed to stdout. In every iteration, the current model and a visualization of the model can be stored in the output directory. Training terminates when no significant progress has been made for three iterations.
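The stopping rule described above can be sketched as follows. This is a minimal illustration, not the tool's actual code. It assumes that one log-likelihood value is recorded per measurement, that progress is the absolute change between consecutive measurements, and that training stops once none of the last three changes reaches the threshold (10.0 by default, see --threshold). The function name should_terminate is hypothetical.

```python
def should_terminate(loglikelihoods, threshold=10.0, window=3):
    """Return True if none of the last `window` changes reached `threshold`.

    Assumptions (not taken from the tool's source): `loglikelihoods` holds one
    value per measurement, oldest first, and progress is measured as the
    absolute change between consecutive measurements.
    """
    if len(loglikelihoods) < window + 1:
        return False  # not enough measurements to fill the window yet
    recent = loglikelihoods[-(window + 1):]
    changes = [abs(later - earlier) for earlier, later in zip(recent, recent[1:])]
    return all(change < threshold for change in changes)
```

With the default threshold, a run whose last three changes are all below 10 would stop, while a single change of 10 or more within the window keeps training going.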
usage: train_seqstructhmm [-h] [--motif_length MOTIF_LENGTH] [--random]
                          [--flexibility FLEXIBILITY]
                          [--block_size BLOCK_SIZE] [--threshold THRESHOLD]
                          [--job_name JOB_NAME]
                          [--output_directory OUTPUT_DIRECTORY]
                          [--termination_interval TERMINATION_INTERVAL]
                          [--no_model_state] [--only_best_shape]
                          training_sequences training_structures
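As a rough illustration of the usage line, the following sketch assembles an argument list from Python. It uses only flags shown in the usage above. The helper name build_command and the file names sequences.fa and structures.fa are hypothetical.

```python
import shlex


def build_command(sequences, structures, motif_length=6, flexibility=10,
                  block_size=1, threshold=10.0, job_name="job",
                  output_directory=".", random_init=False,
                  only_best_shape=False):
    """Assemble an argument list for train_seqstructhmm (hypothetical helper)."""
    cmd = [
        "train_seqstructhmm",
        "--motif_length", str(motif_length),
        "--flexibility", str(flexibility),
        "--block_size", str(block_size),
        "--threshold", str(threshold),
        "--job_name", job_name,
        "--output_directory", output_directory,
    ]
    if random_init:
        cmd.append("--random")  # random initialization instead of the default
    if only_best_shape:
        cmd.append("--only_best_shape")
    # The two positional arguments come last, sequences before structures.
    cmd += [sequences, structures]
    return cmd


# Placeholder input file names, for illustration only.
cmd = build_command("sequences.fa", "structures.fa",
                    motif_length=8, job_name="example")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the tool is installed and the input files exist.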
Positional Arguments
training_sequences | FASTA file with sequences for training
training_structures | FASTA file with RNA structures for training
Named Arguments
--motif_length, -n | Length of the motif to be found. Default: 6
--random, -r | Initialize the model randomly instead of with a Baum-Welch optimized sequence motif. Default: False
--flexibility, -f | Greediness of the Gibbs sampler: model parameters are sampled from among the top f configurations. Set f to 0 to include all possible configurations. Default: 10
--block_size, -s | Number of sequences to be held out in each iteration. Default: 1
--threshold, -t | The iterative algorithm terminates if this reduction in sequence-structure log-likelihood is not reached in any of the last 3 measurements. Default: 10.0
--job_name, -j | Name of the job. Default: “job”
--output_directory, -o | Directory to write output files to. Default: “.” (the current directory)
--termination_interval, -i | Produce output every <i> iterations. Default: 100
--no_model_state, -w | Do not write the model state every i iterations. Default: False
--only_best_shape | Train using only the best structure for each sequence instead of all structures. Default: False
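The --flexibility option restricts sampling to the f best candidate configurations. A rough sketch of that idea follows. It assumes that each candidate has a non-negative score, that at least one retained score is positive, and that sampling among the retained candidates is proportional to score. None of these details are stated above, and the function name sample_top_f is hypothetical.

```python
import random


def sample_top_f(scores, f=10, rng=random):
    """Sample the index of one candidate from the f highest-scoring ones.

    Assumptions (not taken from the tool's source): scores are non-negative
    weights, and f=0 means that every candidate stays eligible.
    """
    # Rank candidate indices from best to worst score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    candidates = ranked if f == 0 else ranked[:f]
    weights = [scores[i] for i in candidates]
    # Draw one index with probability proportional to its score.
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Under these assumptions, a small f makes the sampler greedier, f=1 always picks the single best candidate, and f=0 samples over all candidates.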