SequenceForge-Lite

SequenceForge-Lite is a lightweight tool for working with biological sequence data. It filters FASTQ files, manipulates FASTA files, and parses BLAST output files.

Code & full README
Usage Guide

Features

FASTQ Filtering

  • Filter FASTQ files based on GC content, sequence length, and quality score.
  • Specify custom ranges for GC content and sequence length.
  • Set a minimum quality score threshold for sequences.
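
The three filters above can be sketched as follows. This is a minimal illustration, not the tool's own code: the names `filter_fastq`, `gc_bounds`, `length_bounds` and `quality_threshold` are hypothetical, and the sketch assumes GC content is a percentage, bounds are inclusive, and the quality score is the mean Phred+33 score of a read.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence (0 for an empty one)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


def mean_quality(quality: str) -> float:
    """Mean quality score of a read, assuming Phred+33 encoding."""
    if not quality:
        return 0.0
    return sum(ord(ch) - 33 for ch in quality) / len(quality)


def filter_fastq(
    records: Iterable[Record],
    gc_bounds: Tuple[float, float] = (0.0, 100.0),
    length_bounds: Tuple[int, int] = (0, 2**32),
    quality_threshold: float = 0.0,
) -> Iterator[Record]:
    """Yield only the reads that pass all three checks (bounds inclusive)."""
    for name, seq, quality in records:
        gc_ok = gc_bounds[0] <= gc_percent(seq) <= gc_bounds[1]
        length_ok = length_bounds[0] <= len(seq) <= length_bounds[1]
        quality_ok = mean_quality(quality) >= quality_threshold
        if gc_ok and length_ok and quality_ok:
            yield name, seq, quality


reads = [
    ("read1", "GCGCGC", "IIIIII"),  # high GC, high quality
    ("read2", "ATATAT", "IIIIII"),  # fails the GC range
    ("read3", "GCATGC", "!!!!!!"),  # fails the quality threshold
    ("read4", "GC", "II"),          # fails the length range
]
kept = list(
    filter_fastq(reads, gc_bounds=(40, 100), length_bounds=(4, 100), quality_threshold=20)
)
print([name for name, _, _ in kept])
```

Writing the filter as a generator keeps memory use flat on large files, since reads are checked one at a time.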

FASTA File Manipulation

  • Get quick info on each sequence in a FASTA file.
  • Convert multiline FASTA files to one-line format.
  • Shift the start position of one-line FASTA sequences by a specified amount.
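
A rough sketch of the last two operations is shown below. The function names `to_oneline` and `shift_start` are hypothetical, and the sketch assumes that shifting the start position means rotating the sequence as if it were circular; the tool itself may define the shift differently.

```python
from typing import Dict, Iterable, List


def to_oneline(lines: Iterable[str]) -> Dict[str, str]:
    """Join the wrapped lines of each FASTA record into one string."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {record: "".join(chunks) for record, chunks in parts.items()}


def shift_start(seq: str, shift: int) -> str:
    """Rotate a sequence so that position `shift` becomes the new start.

    Assumption: the sequence is treated as circular, so bases cut from
    the front are appended to the end.
    """
    if not seq:
        return seq
    shift %= len(seq)
    return seq[shift:] + seq[:shift]


fasta = [">seq1", "ACGT", "TTAA", ">seq2", "GG"]
records = to_oneline(fasta)
print(records["seq1"])                  # ACGTTTAA
print(shift_start(records["seq1"], 3))  # TTTAAACG
```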

BLAST Output Parsing

  • Extract the top hit for each query from BLAST output files.
  • Results are sorted alphabetically for easy analysis.
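
As an illustration of the idea, the sketch below extracts top hits from BLAST tabular output. It assumes the tab-separated format produced by `-outfmt 6`, where the first two columns are the query and subject IDs and the hits for each query are listed best first. The input format the tool actually reads may differ, and `top_hits` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def top_hits(lines: Iterable[str]) -> List[str]:
    """Return the first subject ID seen for each query, sorted alphabetically."""
    best: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a hit row
        query, subject = fields[0], fields[1]
        # Assumption: rows are ordered best hit first within each query,
        # so only the first row per query is kept.
        best.setdefault(query, subject)
    return sorted(best.values())


blast_rows = [
    "query2\tprotein_Z\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380",
    "query2\tprotein_B\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-50\t200",
    "query1\tprotein_M\t99.0\t150\t1\t0\t1\t150\t1\t150\t1e-90\t300",
]
print(top_hits(blast_rows))  # ['protein_M', 'protein_Z']
```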

DNA, RNA & amino acid classes

  • Calculates GC content in DNA and RNA sequences.
  • Prints complement sequence for DNA.
  • Transcribes DNA sequence to RNA.
  • Prints RNA sequence in codons.
  • Finds motifs in nucleic acid sequences.
  • Translates an RNA sequence to amino acids naively (“dumbly”), without regard to biological meaning.
  • Calculates molecular weight of amino acid sequence.
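
Several of these operations can be sketched in a small class hierarchy. The class and method names here are hypothetical rather than the package's actual API. The sketch assumes that transcription treats the DNA as the coding strand (T replaced by U) and that "dumb" translation means reading codons from the first base with the standard genetic code, with no search for a start codon and no stopping at stop codons. Molecular weight is left out.

```python
from itertools import product
from typing import List

# Standard genetic code, with codons enumerated in U, C, A, G order.
_BASES = "UCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


class NucleicAcid:
    def __init__(self, seq: str) -> None:
        self.seq = seq.upper()

    def gc_content(self) -> float:
        """GC content as a percentage of sequence length."""
        if not self.seq:
            return 0.0
        return 100.0 * (self.seq.count("G") + self.seq.count("C")) / len(self.seq)

    def find_motif(self, motif: str) -> List[int]:
        """0-based start positions of all matches, overlapping ones included."""
        motif = motif.upper()
        if not motif:
            return []
        last_start = len(self.seq) - len(motif)
        return [i for i in range(last_start + 1) if self.seq.startswith(motif, i)]


class DNA(NucleicAcid):
    _PAIRS = str.maketrans("ACGT", "TGCA")

    def complement(self) -> str:
        return self.seq.translate(self._PAIRS)

    def transcribe(self) -> "RNA":
        # Assumption: the sequence is the coding strand, so T becomes U.
        return RNA(self.seq.replace("T", "U"))


class RNA(NucleicAcid):
    def codons(self) -> List[str]:
        """Split into triplets from the first base, dropping a trailing partial codon."""
        usable = len(self.seq) - len(self.seq) % 3
        return [self.seq[i:i + 3] for i in range(0, usable, 3)]

    def translate(self) -> str:
        """Naive translation: every codon is mapped, stop codons become '*'."""
        return "".join(CODON_TABLE.get(codon, "X") for codon in self.codons())


dna = DNA("ATGGCCTAA")
rna = dna.transcribe()
print(dna.complement())                 # TACCGGATT
print(" ".join(rna.codons()))           # AUG GCC UAA
print(rna.translate())                  # MA*
print(DNA("ATATAT").find_motif("ATA"))  # [0, 2]
```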

Custom RandomForestClassifier

  • Self-written implementation of RandomForestClassifier.
  • Supports parallelisation (runs 2 times faster when 2 threads are specified).
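
The general shape of parallel forest training can be sketched with the standard library alone. This is not the project's implementation: to stay short it uses one-split trees (decision stumps) instead of full decision trees, and every name in it (`fit_stump`, `fit_forest`, `predict`, `n_trees`, `n_threads`) is hypothetical. Each tree is fitted on its own bootstrap sample with a randomly chosen feature, the fits are distributed over a thread pool, and prediction is a majority vote. In CPython, pure-Python work like this is generally limited by the global interpreter lock, so the sketch shows the structure only and should not be expected to reproduce the speedup mentioned above.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, int]  # (feature, threshold, label if <=, label if >)


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int], seed: int) -> Stump:
    """Fit a one-split tree on a bootstrap sample, using one random feature."""
    rng = random.Random(seed)
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
    feature = rng.randrange(len(X[0]))          # random feature for this tree
    majority = Counter(y[i] for i in idx).most_common(1)[0][0]
    # Fallback: a constant predictor, used when no split reduces the error.
    best: Stump = (feature, float("inf"), majority, majority)
    best_errors = sum(y[i] != majority for i in idx)
    for threshold in sorted({X[i][feature] for i in idx}):
        left = [y[i] for i in idx if X[i][feature] <= threshold]
        right = [y[i] for i in idx if X[i][feature] > threshold]
        if not right:
            continue  # threshold is the largest value, nothing to split off
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(v != left_label for v in left) + sum(v != right_label for v in right)
        if errors < best_errors:
            best, best_errors = (feature, threshold, left_label, right_label), errors
    return best


def fit_forest(X, y, n_trees: int = 25, n_threads: int = 2, seed: int = 0) -> List[Stump]:
    """Fit the trees independently, spreading the work over a thread pool."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda s: fit_stump(X, y, s), range(seed, seed + n_trees)))


def predict(forest: List[Stump], row: Sequence[float]) -> int:
    """Majority vote over all trees."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Toy data: two features, two well-separated classes.
X = [[0.0, 1.0], [1.0, 0.5], [2.0, 1.5], [8.0, 7.0], [9.0, 9.5], [10.0, 8.0]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y, n_trees=25, n_threads=2)
print(predict(forest, [-1.0, 0.0]), predict(forest, [11.0, 10.0]))
```

Because the trees are independent once their seeds are fixed, the fitting step parallelises without any shared state. Real speedups typically need processes or a base learner that releases the interpreter lock.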