train_seqstructhmm

Trains a Hidden Markov Model of the sequence-structure binding preferences of an RNA-binding protein. The model is trained on sequences and structures from a CLIP-seq experiment, supplied as two FASTA-like files. During training, statistics about the model are printed to stdout. At regular intervals (see --termination_interval), the current model and a visualization of it can be written to the output directory. Training terminates when no significant progress (see --threshold) has been made over the last three measurements.
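The stopping rule described above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual code: the function name should_terminate is hypothetical, and it assumes that measurements are log-likelihood values where higher is better, so that progress is the gain between consecutive measurements. The tool's own sign convention may differ.

```python
def should_terminate(log_likelihoods, threshold=10.0, window=3):
    """Return True if none of the last `window` measurements improved on
    its predecessor by at least `threshold`.

    Assumption: higher values are better, so the progress between two
    measurements is `later - earlier`.
    """
    # window + 1 values are needed to form `window` consecutive gains.
    if len(log_likelihoods) < window + 1:
        return False
    recent = log_likelihoods[-(window + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < threshold for gain in gains)


# Made-up measurements: the last three gains are 5, 3 and 1.
history = [-5200.0, -5100.0, -5095.0, -5092.0, -5091.0]
print(should_terminate(history))
```

With the default threshold of 10, the made-up history stops training because each of the last three gains is below the threshold. One measurement earlier the gain of 100 is still inside the window, so training would continue.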

usage: train_seqstructhmm [-h] [--motif_length MOTIF_LENGTH] [--random]
                          [--flexibility FLEXIBILITY]
                          [--block_size BLOCK_SIZE] [--threshold THRESHOLD]
                          [--job_name JOB_NAME]
                          [--output_directory OUTPUT_DIRECTORY]
                          [--termination_interval TERMINATION_INTERVAL]
                          [--no_model_state] [--only_best_shape]
                          training_sequences training_structures
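
As an illustration of the synopsis above, the following Python sketch assembles an argument list for a call to train_seqstructhmm. The helper build_command and the file names sequences.fa and structures.fa are hypothetical; only flags listed in the usage line are used, and the command is printed rather than executed.

```python
import shlex


def build_command(sequences, structures, motif_length=6, job_name="job",
                  output_directory=".", only_best_shape=False):
    """Build an argument list for train_seqstructhmm (hypothetical helper)."""
    cmd = [
        "train_seqstructhmm",
        "--motif_length", str(motif_length),
        "--job_name", job_name,
        "--output_directory", output_directory,
    ]
    if only_best_shape:
        cmd.append("--only_best_shape")
    # The two positional arguments come last, as in the usage line.
    cmd += [sequences, structures]
    return cmd


# Placeholder file names, chosen for illustration only.
cmd = build_command("sequences.fa", "structures.fa",
                    motif_length=8, job_name="demo",
                    output_directory="results")
print(shlex.join(cmd))
```

If the tool is installed, the resulting list could be passed to subprocess.run(cmd, check=True). Building the command as a list avoids shell quoting problems with unusual file names.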

Positional Arguments

training_sequences
 FASTA file with sequences for training
training_structures
 FASTA file with RNA structures for training

Named Arguments

--motif_length, -n

length of the motif to be found (default: 6)

Default: 6

--random, -r

Initialize the model randomly (default: initialize with Baum-Welch optimized sequence motif)

Default: False

--flexibility, -f

greediness of the Gibbs sampler: model parameters are sampled from among the top f configurations (default: f=10); set f to 0 to include all possible configurations

Default: 10

--block_size, -s

number of sequences to be held out in each iteration (default: 1)

Default: 1

--threshold, -t

the iterative algorithm terminates if this reduction in sequence-structure log-likelihood is not reached in any of the last 3 measurements (default: 10)

Default: 10.0

--job_name, -j

name of the job (default: “job”)

Default: “job”

--output_directory, -o

directory to write output files to (default: current directory)

Default: “.”

--termination_interval, -i

produce output every <i> iterations (default: i=100)

Default: 100

--no_model_state, -w

do not write model state every i iterations

Default: False

--only_best_shape

train using only the best structure for each sequence (default: use all structures)

Default: False
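
To illustrate what the --flexibility option describes, the sketch below samples one configuration from the top f candidates, with f=0 meaning that all candidates are eligible. It is a minimal sketch under stated assumptions and not the tool's implementation: the function sample_configuration, the candidate names, and the choice to weight candidates by their score are all hypothetical.

```python
import random


def sample_configuration(scored_configs, flexibility=10, rng=random):
    """Sample one configuration from the top `flexibility` candidates.

    scored_configs: list of (configuration, weight) pairs with positive
    weights. Assumption: sampling is weighted by these values; the actual
    tool may weight candidates differently.
    """
    ranked = sorted(scored_configs, key=lambda pair: pair[1], reverse=True)
    # flexibility == 0 keeps every candidate, mirroring "set f to 0".
    candidates = ranked if flexibility == 0 else ranked[:flexibility]
    configs = [config for config, _ in candidates]
    weights = [weight for _, weight in candidates]
    return rng.choices(configs, weights=weights, k=1)[0]


# Made-up candidates and weights for illustration.
candidates = [("A", 0.50), ("B", 0.30), ("C", 0.15), ("D", 0.05)]
rng = random.Random(0)
print(sample_configuration(candidates, flexibility=2, rng=rng))
```

A small f makes the sampler greedier because only the highest-ranked candidates can be drawn; with f=1 the best candidate is always chosen.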