Sequence Alignment of DNA and Proteins with BLOSUM62

A Python program for global and local alignment of a DNA sequence (translated to protein) against a protein sequence, scored with BLOSUM62 and a gap penalty.

Overview

This project aligns a DNA sequence, translated into protein, against a protein sequence, using either global or local alignment. The alignment mode and scoring parameters are set via command-line arguments.

The algorithm leverages:

  • Needleman-Wunsch (global) and Smith-Waterman (local) dynamic programming for alignment.
  • Translation of DNA to protein using BioPython.
  • The BLOSUM62 substitution matrix for scoring, replaceable by another matrix file.
  • A user-specified, length-independent gap penalty.
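As a rough illustration of the dynamic programming step listed above, the sketch below fills a single table for both modes and traces back one optimal alignment. It is not the project's code: the function name `align`, the toy `simple_score` function, and the reading of the gap penalty as a fixed cost per gap position are all assumptions made for this example.

```python
def simple_score(a, b):
    """Hypothetical stand-in for a substitution matrix lookup."""
    return 1 if a == b else -1


def align(seq_a, seq_b, score, gap=-5, local=False):
    """Return (best_score, aligned_a, aligned_b) using a per-position gap cost."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    dp = [[0] * cols for _ in range(rows)]

    # Global alignment charges for leading gaps; local alignment keeps zeros.
    if not local:
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap

    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            cell = max(
                dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),  # pair residues
                dp[i - 1][j] + gap,  # gap in seq_b
                dp[i][j - 1] + gap,  # gap in seq_a
            )
            if local:
                cell = max(cell, 0)  # Smith-Waterman: never go below zero
                if cell > best:
                    best, best_pos = cell, (i, j)
            dp[i][j] = cell

    # Global traceback starts in the corner, local at the highest-scoring cell.
    i, j = best_pos if local else (rows - 1, cols - 1)
    final_score = dp[i][j]
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if local and dp[i][j] == 0:
            break  # a local alignment ends where the score drops to zero
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return final_score, "".join(reversed(out_a)), "".join(reversed(out_b))


print(align("ACGT", "AGT", simple_score, gap=-1))  # -> (2, 'ACGT', 'A-GT')
print(align("TTACGTT", "GGACGGG", simple_score, gap=-5, local=True))  # -> (3, 'ACG', 'ACG')
```

With a real substitution matrix, `score` would be a lookup into that matrix instead of `simple_score`. When several alignments share the best score, this traceback returns only one of them.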

Features

  • Global and Local Alignment
    Switch between global (Needleman-Wunsch) and local (Smith-Waterman) alignment with a command-line flag.
  • Translation of DNA
    The DNA sequence is translated in its forward reading frames before alignment, so that it can be compared residue by residue with the protein sequence.
  • Customizable Gap Penalty
    The user sets the gap penalty (can be negative, e.g. -5).
  • Flexible Scoring
    Accepts any substitution matrix in standard format (default: BLOSUM62).
  • Comprehensive Output
    Reports:
    • Lengths of the translated DNA and the given protein.
    • The full DP (dynamic programming) table (optional, for debugging/analysis).
    • The optimal alignment score.
    • The final aligned sequences (in standard block format).
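The two input-preparation features above, translating the DNA and loading a substitution matrix, might look roughly like the following. This is a dependency-free sketch rather than the project's implementation: the project uses BioPython for translation, whereas this example builds the standard codon table by hand. The function names, the choice to render stop codons as `*`, and the assumption that "standard format" means the NCBI-style text layout (`#` comments, a header row of residue letters, then one labelled row per residue) are all illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_forward_frames(dna):
    """Translate the three forward reading frames; unknown codons become 'X'."""
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(codon, "X") for codon in codons))
    return frames


def parse_matrix(text):
    """Parse an NCBI-style matrix into a {(row, column): score} dictionary."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    columns = lines[0].split()
    scores = {}
    for line in lines[1:]:
        row, *values = line.split()
        for col, value in zip(columns, values):
            scores[row, col] = int(value)
    return scores


# A three-residue excerpt of BLOSUM62, used here only to keep the example short.
EXCERPT = """\
# Excerpt of BLOSUM62
   A  R  N
A  4 -1 -2
R -1  5  0
N -2  0  6
"""

print(translate_forward_frames("ATGGCCTAA"))  # -> ['MA*', 'WP', 'GL']
print(parse_matrix(EXCERPT)["A", "R"])  # -> -1
```

Keeping the matrix as a dictionary keyed by residue pairs makes the scoring step a single lookup, and any matrix file in the same layout can be loaded without code changes.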

Motivation

Sequence alignment is a core tool in bioinformatics: it underpins the comparison of DNA and protein sequences, annotation, and evolutionary studies. This project supports direct DNA-to-protein comparison with custom scoring and can be adapted to many use cases.

GitHub Repo

Link

How To Run

```bash
python3 sequencealignment.py filed.fasta filep.fasta 1 -5 blosum62.txt