train_seqstructhmm

Trains a Hidden Markov Model of the sequence-structure binding preferences of an RNA-binding protein. The model is trained on the sequences and structures from a CLIP-seq experiment, which are given in two FASTA-like files. During training, statistics about the model are printed to stdout. Every i iterations (see --termination_interval), the current model and a visualization of it can be written to the output directory. Training terminates when the sequence-structure log-likelihood has not improved by at least the threshold (see --threshold) in any of the last three measurements.
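The stopping rule can be sketched as follows. This is a minimal illustration, not the tool's code: it assumes one log-likelihood measurement is taken per output interval, that "progress" means the difference between consecutive measurements, and that higher log-likelihood is better. The function name should_terminate is hypothetical.

```python
from typing import Sequence


def should_terminate(loglikelihoods: Sequence[float],
                     threshold: float = 10.0,
                     window: int = 3) -> bool:
    """Return True if none of the last `window` improvements reached `threshold`.

    Assumption: `loglikelihoods` holds one measurement per output interval,
    oldest first, and an improvement is the difference between two
    consecutive measurements.
    """
    # window + 1 measurements are needed to form `window` consecutive differences.
    if len(loglikelihoods) <= window:
        return False
    recent = list(loglikelihoods[-(window + 1):])
    improvements = [later - earlier for earlier, later in zip(recent, recent[1:])]
    # Stop only if every recent improvement fell short of the threshold.
    return all(delta < threshold for delta in improvements)
```

For example, with measurements -5000, -4200, -4195, -4190 the large first jump keeps training alive; appending -4188 makes the last three improvements 5, 5 and 2, all below the default threshold of 10, so the sketch would stop.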

usage: train_seqstructhmm [-h] [--motif_length MOTIF_LENGTH] [--random]
                          [--flexibility FLEXIBILITY]
                          [--block_size BLOCK_SIZE] [--threshold THRESHOLD]
                          [--job_name JOB_NAME]
                          [--output_directory OUTPUT_DIRECTORY]
                          [--termination_interval TERMINATION_INTERVAL]
                          [--no_model_state] [--only_best_shape]
                          training_sequences training_structures
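To make the option types and defaults concrete, here is a hypothetical argparse parser that accepts the same command line as the usage summary. It is a sketch reconstructed from this page, not the tool's actual source; the value types (int, or float for --threshold) are inferred from the documented defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical re-creation of the documented interface; types inferred from defaults.
    p = argparse.ArgumentParser(prog="train_seqstructhmm")
    p.add_argument("training_sequences", help="FASTA file with sequences for training")
    p.add_argument("training_structures", help="FASTA file with RNA structures for training")
    p.add_argument("--motif_length", "-n", type=int, default=6)
    p.add_argument("--random", "-r", action="store_true")
    p.add_argument("--flexibility", "-f", type=int, default=10)
    p.add_argument("--block_size", "-s", type=int, default=1)
    p.add_argument("--threshold", "-t", type=float, default=10.0)
    p.add_argument("--job_name", "-j", default="job")
    p.add_argument("--output_directory", "-o", default=".")
    p.add_argument("--termination_interval", "-i", type=int, default=100)
    p.add_argument("--no_model_state", "-w", action="store_true")
    p.add_argument("--only_best_shape", action="store_true")
    return p
```

Parsing ["seqs.fa", "structs.fa", "-n", "8"] with this sketch yields motif_length 8 while every other option keeps its documented default. The file names are placeholders.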

Positional Arguments

`training_sequences`
	FASTA file with sequences for training
`training_structures`
	FASTA file with RNA structures for training
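Both inputs are FASTA-like, so a generic reader is enough to inspect them. The sketch below assumes standard FASTA conventions (">" header lines, bodies possibly wrapped over several lines); it does not know how the structure file encodes structures or how several structures per sequence are laid out, so it makes no attempt to pair records between the two files. The name read_fasta is hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, body) pairs; wrapped bodies are joined into one string."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Because it accepts any iterable of strings, an open file object can be passed directly, e.g. read_fasta(open("seqs.fa")).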

Named Arguments

`--motif_length, -n`
	length of the motif to be found Default: 6
`--random, -r`
	initialize the model randomly (by default, the model is initialized with a Baum-Welch-optimized sequence motif) Default: False
`--flexibility, -f`
	greediness of the Gibbs sampler: model parameters are sampled from among the top f configurations; set f to 0 to include all possible configurations Default: 10
`--block_size, -s`
	number of sequences to hold out in each iteration Default: 1
`--threshold, -t`
	the iterative algorithm terminates if the sequence-structure log-likelihood has not improved by at least this amount in any of the last 3 measurements Default: 10.0
`--job_name, -j`
	name of the job Default: “job”
`--output_directory, -o`
	directory to write output files to (default: current directory) Default: “.”
`--termination_interval, -i`
	produce output every i iterations Default: 100
`--no_model_state, -w`
	do not write model state every i iterations Default: False
`--only_best_shape`
	train using only the best structure for each sequence (default: use all structures) Default: False
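The --flexibility option restricts each Gibbs sampling step to the f best-scoring configurations; a smaller f makes the sampler greedier. The sketch below illustrates that idea under two assumptions not stated on this page: candidates come with log-scores, and within the kept set a candidate is drawn with probability proportional to exp(log-score). The function sample_top_f is hypothetical.

```python
import math
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_top_f(candidates: Sequence[T], log_scores: Sequence[float],
                 f: int = 10, rng: Optional[random.Random] = None) -> T:
    """Draw one candidate from the f best-scoring ones (all of them if f == 0).

    Assumption: within the kept set, the draw is proportional to exp(log_score).
    """
    if not candidates or len(candidates) != len(log_scores):
        raise ValueError("need at least one candidate and one score per candidate")
    rng = rng or random.Random()
    # Indices sorted from best to worst score.
    order = sorted(range(len(candidates)), key=lambda k: log_scores[k], reverse=True)
    kept = order if f == 0 else order[:f]
    # Subtract the best score before exponentiating to avoid overflow/underflow.
    best = log_scores[kept[0]]
    weights = [math.exp(log_scores[k] - best) for k in kept]
    chosen = rng.choices(kept, weights=weights, k=1)[0]
    return candidates[chosen]
```

With f=1 the sketch always returns the single best configuration (fully greedy), while f=0 lets every configuration be drawn, matching the option description above.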