Exercise 1 (10 points) A palindrome is a word, phrase, number or other sequence of units that
reads the same in either direction, e.g. the word level, the number XXXXXXXXXX, or the
phrase Step on no pets.
Write a Python program that reads a text file and searches for all palindromes in this file. The
program should write all palindromes found (except phrases), together with their multiplicity,
to an output file. Handle all strings case-insensitively, i.e. the word Level is also a palindrome.
The input and output files should be specified as command line arguments. Copy some arbitrary
text (e.g. from the internet) and apply your program to it.
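The counting step could be sketched as follows. This is only a minimal sketch under some assumptions that the exercise does not fix: the helper name count_palindromes is hypothetical, a "word" is taken to be any run of letters and digits, and tokens shorter than min_length (default 2) are skipped because single characters are trivially palindromic.

```python
import re
from collections import Counter


def count_palindromes(text, min_length=2):
    """Count case-insensitive palindromic words and numbers in text."""
    # [^\W_]+ matches runs of letters and digits (no underscore),
    # so punctuation and whitespace separate the tokens and phrases are ignored.
    tokens = re.findall(r"[^\W_]+", text.lower())
    # A token is a palindrome if it equals its own reverse.
    return Counter(t for t in tokens if len(t) >= min_length and t == t[::-1])
```

Reading the input file, writing the counts to the output file and handling the command line arguments (for example via sys.argv) still have to be added around this function.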
Exercise 2 (12 points) Write a Python program that finds restriction sites in a DNA sequence.
Restriction sites are positions where restriction enzymes cut the DNA. They are usually recognized
by a short, specific sequence motif.
Here are the recognition sequences for the restriction enzymes PpuMI, MspA1I, and MslI:
PpuMI    RG^GWCCY
MspA1I   CMG^CKG
MslI     CAYNN^NNRTG
Note: K stands for G or T, M for A or C, N for A, C, G or T, R for A or G, W for A or
T, and Y for C or T. The caret (^) indicates the cut site.
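One possible way to turn such a recognition sequence into a regular expression is to map each ambiguity code to a character class. The sketch below assumes the motif is written with an ASCII caret (^) as cut marker; the names IUPAC and motif_to_regex are hypothetical and not prescribed by the exercise.

```python
import re

# Ambiguity codes from the note above, each mapped to a regex character class.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
    "R": "[AG]", "W": "[AT]", "Y": "[CT]",
}


def motif_to_regex(motif):
    """Return (compiled pattern, cut offset) for a motif such as 'RG^GWCCY'."""
    # The index of the caret equals the number of bases before the cut.
    cut_offset = motif.index("^")
    bases = motif.replace("^", "")
    pattern = "".join(IUPAC[base] for base in bases)
    return re.compile(pattern), cut_offset
```

For 'RG^GWCCY' this yields the pattern [AG]GG[AT]CC[CT] and a cut offset of 2, i.e. the cut lies after the second base of a match.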
Given a file with DNA sequences, use regular expressions to look for all restriction sites of
the three enzymes listed above and print the position of the first base after each cut site to
an output file (e.g. the position of the second G for PpuMI). Make sure that the names of the
input and output files can be specified as command line arguments and that exactly two command
line arguments have been specified.
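The position lookup and the argument check might look roughly like this. It is a sketch, not a reference solution: the function names cut_positions and parse_args are hypothetical, the pattern is passed as a plain regex string, positions are reported 1-based, and a lookahead is used so that overlapping sites are not skipped (whether overlapping sites should be reported is an assumption).

```python
import re
import sys


def cut_positions(sequence, pattern, cut_offset):
    """Return 1-based positions of the first base after each cut site."""
    # Wrapping the pattern in a lookahead makes every match zero-width,
    # so finditer also reports sites that overlap each other.
    lookahead = re.compile("(?=" + pattern + ")", re.IGNORECASE)
    # m.start() is the 0-based start of the site; adding cut_offset gives the
    # 0-based index of the base after the cut, and + 1 converts it to 1-based.
    return [m.start() + cut_offset + 1 for m in lookahead.finditer(sequence)]


def parse_args(argv):
    """Return (input file, output file) or exit if the argument count is wrong."""
    # argv[0] is the script name, so two arguments mean len(argv) == 3.
    if len(argv) != 3:
        sys.exit("usage: python script.py <input file> <output file>")
    return argv[1], argv[2]
```

In a complete program, parse_args(sys.argv) would be called first; the sequence lines of the input file then need to be joined into one string before searching, otherwise sites that span a line break are missed.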
Use the UCSC Genome Browser (http://genome.ucsc.edu/) to download the DNA sequence of
chromosome 22 band q13.1 from the human reference genome version hg38 (chr22
bp XXXXXXXXXX). Remove the header line (manually, or skip it in your program) and
use the resulting file as input for your program.
Many restriction enzymes have so-called palindromic recognition sequences. Read up on what
palindromic means in the context of DNA sequences. Which of the three enzymes from above
has such a palindromic recognition sequence (justify your choice)? What is the advantage
of a palindromic recognition sequence?
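For DNA, palindromic is usually understood as "equal to its own reverse complement" rather than "equal to its own reverse". A small helper for checking this could look like the sketch below; the function names are hypothetical, and the example uses the EcoRI recognition sequence GAATTC, which is not one of the three enzymes above.

```python
# Complement of each base and of the ambiguity codes used on this sheet:
# A<->T, C<->G, R<->Y, K<->M; W and N are their own complements.
COMPLEMENT = str.maketrans("ACGTRYKMWN", "TGCAYRMKWN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def is_dna_palindrome(sequence):
    """True if the sequence equals its own reverse complement."""
    return sequence.upper() == reverse_complement(sequence)


print(is_dna_palindrome("GAATTC"))   # EcoRI site
print(is_dna_palindrome("GATTACA"))
```

Remove the caret from a recognition sequence before passing it to these functions, since the cut marker is not part of the sequence itself.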
Hint: Unlike most computer scientists, biologists count the first base of a sequence as
position 1, not 0.