Bio::BioStudio::Basic is a Perl module that provides basic BioStudio functions.
FUNCTIONS
configure_BioStudio() This function loads the configuration file into a hash ref. You must pass it the path to the directory containing the configuration file; it will use Config::Auto to "magically" parse the file.
fetch_custom_features() Pass the config hashref, receive a hashref of the custom features defined in the BioStudio configuration directory. Each feature has four attributes: NAME, KIND, SOURCE, and SEQ.
fetch_custom_markers() Pass the config hashref, receive a hashref of the custom markers defined in the BioStudio configuration directory. Each marker is a GFF file that gets read into the attributes NAME, SEQ, DB (a Bio::DB::SeqFeature::Store), and COLOR (if a color is defined in the GFF file).
fetch_enzyme_lists() Pass the config hashref, receive an array that contains the names of the enzyme lists in the BioStudio configuration directory. Each list is a GeneDesign-compatible list of restriction enzyme recognition sites.
make_mask() Given a length, a reference to a list of Bio::SeqFeatures, and optionally an offset, returns a string of integers where each position corresponds to a base of sequence, and the integer represents the number of features that overlap that base. Because each base gets a single digit, a serious bug sets in once ten features overlap the same base :(
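The idea behind the mask can be sketched in plain Perl. This is an illustration only, not the module's implementation: sketch_make_mask is a hypothetical name, features are given as [start, end] pairs (1-based, inclusive) instead of Bio::SeqFeature objects, and the offset is assumed to be subtracted from each feature coordinate.

```perl
use strict;
use warnings;

# Hypothetical stand-in for make_mask(). Features are [start, end] pairs
# (1-based, inclusive), not Bio::SeqFeature objects.
sub sketch_make_mask {
    my ($length, $features, $offset) = @_;
    $offset = 0 unless defined $offset;    # assumption: offset shifts coordinates left
    my @counts = (0) x $length;
    foreach my $feat (@{$features}) {
        my ($start, $end) = @{$feat};
        foreach my $pos ($start - $offset .. $end - $offset) {
            next if $pos < 1 || $pos > $length;    # ignore bases outside the mask
            $counts[$pos - 1]++;
        }
    }
    # One character per base: a count of 10 or more would take two characters
    # and shift every later position, which is the limitation noted above.
    return join q{}, @counts;
}

my $mask = sketch_make_mask(16, [[4, 5], [9, 14], [9, 12], [9, 10]]);
print "$mask\n";    # 0001100033221100
```
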
mask_combine() Takes two string masks (see make_mask()) and adds them. Returns the merged mask.
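As a rough illustration of adding two masks position by position, the following sketch uses a hypothetical sketch_mask_combine function. It assumes both masks have the same length and that no summed position exceeds 9; how the real function treats other inputs is not documented here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for mask_combine(): digit-wise sum of two masks.
sub sketch_mask_combine {
    my ($mask_a, $mask_b) = @_;
    die "masks differ in length\n" if length($mask_a) != length($mask_b);
    my @a = split //, $mask_a;
    my @b = split //, $mask_b;
    # Add the counts at each position and rejoin into a single string
    return join q{}, map { $a[$_] + $b[$_] } 0 .. $#a;
}

print sketch_mask_combine('0011100', '0001110'), "\n";    # 0012210
```
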
mask_filter() Takes a string mask (see make_mask()) and returns a listref of break coordinates; that is, where does feature sequence end and interfeature sequence begin, and where does interfeature sequence end and feature sequence begin? For example, if the mask is "0001100033221100", the resulting list would be [0 3 5 8 14 16], meaning that features exist from 4 to 5 and 9 to 14. Interfeature sequence coordinates can thus be pulled out by hashing the array, %inter = @{mask_filter($mask)}, where each key + 1 is the left coordinate and the value is the right coordinate.
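The documented example can be reproduced with a small sketch. The function name sketch_mask_filter is hypothetical, and the sketch assumes the mask ends in interfeature sequence, as the example does, so that the list has an even number of entries when hashed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for mask_filter(): record a break wherever the mask
# switches between zero (interfeature) and nonzero (feature).
sub sketch_mask_filter {
    my ($mask) = @_;
    my @breaks     = (0);
    my $in_feature = 0;    # the walk starts in interfeature sequence
    my @chars      = split //, $mask;
    foreach my $i (0 .. $#chars) {
        my $covered = $chars[$i] ne '0' ? 1 : 0;
        if ($covered != $in_feature) {
            push @breaks, $i;
            $in_feature = $covered;
        }
    }
    push @breaks, length $mask;
    return \@breaks;
}

my $breaks = sketch_mask_filter('0001100033221100');
print "@{$breaks}\n";    # 0 3 5 8 14 16

# Hashing the list pairs each interfeature start (key + 1) with its end
my %inter = @{$breaks};
foreach my $key (sort { $a <=> $b } keys %inter) {
    printf "interfeature: %d..%d\n", $key + 1, $inter{$key};
}
```

For the example mask, the loop prints the interfeature ranges 1..3, 6..8, and 15..16.
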
get_src_path() Given a chromosome name and the config hashref, returns the absolute path to that chromosome in the BioStudio genome repository.
get_genome_list() Given the config hashref, returns a list of all chromosomes in the BioStudio genome repository.
gather_versions() Given a species, a target, and the config hashref, returns a hashref of all chromosomes in the species in the BioStudio genome repository that match the target. The target is an integer that represents a version. If target is set to 0, we will return every wildtype version. If target is set to -1, we will return every latest version. For any other target (1, 3, 5) we will return that particular version.
rollback() Given a chromosome name and the BioStudio config hashref, removes that chromosome from the BioStudio genome repository.
ORF_compile() Given a reference to an array full of Bio::SeqFeature gene objects, returns a reference to a hash with gene ids as keys and concatenated 5' to 3' coding sequences as values.
get_feature_sequence() For when you can't use the Bio::SeqFeature seq function. Given a Bio::SeqFeature compliant feature and a sequence, returns the sequence that the coordinates of the feature indicate.
check_new_sequence() Best when used as a confirmation that your edits went as expected. Given a Bio::SeqFeature compliant feature that has a "newseq" attribute, checks if the newseq and the actual sequence occupied by the feature are the same.
flatten_subfeats() Given a seqfeature, iterate through its subfeatures and add all their subs to one big array. Mainly needed when CDSes are hidden behind mRNAs in genes.
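The coordinate lookup and comparison described for get_feature_sequence() and check_new_sequence() can be sketched as follows. Everything here is an assumption made for illustration: the function names are hypothetical, the feature is a plain hash with start, end, and newseq keys rather than a Bio::SeqFeature object, coordinates are 1-based and inclusive, and strand is ignored.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_feature_sequence(): slice the sequence at the
# feature's coordinates (1-based, inclusive; strand is not considered).
sub sketch_feature_sequence {
    my ($feature, $sequence) = @_;
    my $length = $feature->{end} - $feature->{start} + 1;
    return substr $sequence, $feature->{start} - 1, $length;
}

# Hypothetical stand-in for check_new_sequence(): returns 1 if the stored
# newseq matches the sequence the feature actually occupies, 0 otherwise.
sub sketch_check_new_sequence {
    my ($feature, $sequence) = @_;
    return 0 unless defined $feature->{newseq};
    return sketch_feature_sequence($feature, $sequence) eq $feature->{newseq} ? 1 : 0;
}

my $chromosome = 'AAACCCGGGTTT';
my $feature    = { start => 4, end => 6, newseq => 'CCC' };
print sketch_feature_sequence($feature, $chromosome), "\n";      # CCC
print sketch_check_new_sequence($feature, $chromosome), "\n";    # 1
```
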
gene_names() Given a list of Bio::SeqFeature gene objects and the BioStudio config hashref, returns a hash where each gene id is the key to a display friendly string.
allowable_codon_changes() Given two codons (a from, and a to) and a GeneDesign codon table hashref, this function generates every possible peptide pair that could contain the from codon and checks to see if the peptide sequence can be maintained when the from codon is replaced by the to codon. This function is of particular use when codons are being changed in genes that overlap one another.
print_as_fasta() Takes a sequence as a string and a sequence id and returns an 80-column FASTA-formatted sequence block as an array reference.
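A minimal sketch of 80-column FASTA wrapping is shown below. The name sketch_as_fasta is hypothetical, and it is an assumption that each array element holds one newline-terminated line with the header first; the layout of the real return value may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for print_as_fasta(): header line followed by the
# sequence wrapped at 80 characters, returned as an array reference.
sub sketch_as_fasta {
    my ($sequence, $id) = @_;
    my @lines  = (">$id\n");
    my @chunks = $sequence =~ /(.{1,80})/g;    # split into pieces of at most 80 bases
    push @lines, "$_\n" foreach @chunks;
    return \@lines;
}

my $block = sketch_as_fasta('ACGT' x 50, 'chr01');    # 200 bases
print @{$block};
```

A 200-base sequence yields four elements: the header, two full 80-base lines, and a final 40-base line.
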
Requirements:
· Perl