Midterm 2
posted: 9 am, 05.07.2021
due: midnight, 09.07.2021
You are not allowed to work in groups for the midterm. You can look in books and online, but
you must not discuss the exam problems with anyone. If you don’t understand a question, contact
Travis or one of the TAs for an explanation (but no hints). All problems are weighted the same
and, for problems broken down into subproblems, all subproblems are weighted the same.
As with assignments, answers the markers consider hard to read will get 0!
1. Suppose your class is on a field trip to an island with n towns on it, connected by m townto-town buses (which run in both directions), run by c companies. Each company’s buses
are a different colour, and there can be buses from two or more companies running between
two towns. You have a map showing which companies run buses between which towns. The
drivers have a relaxed attitude to schedules and the buses run often, so there’s no telling
which buses will be arriving and leaving next.
Your classmate Ryan has wandered off and got lost and you’re (somewhat reluctantly) trying
to find him. You’d told him which buses the class was supposed to take during the day, and
given him tickets from the appropriate companies, the same colours as the buses and stapled
together in the right order. Ryan didn’t remember which towns the class was going to visit,
however, so he always took the first bus he saw of the colour of the next ticket, tearing off
that ticket and giving it to the driver.
Design a polynomial-time dynamic-programming algorithm that, given the map of the bus
routes, Ryan’s starting point and the colours of the buses he took, calculates the probability
Ryan is in each of the n towns.
For example, if the map is as shown below and Ryan started in town A and took a red bus,
a green bus, a blue bus and another green bus, then he could have gone
A-B-C-D-B,
A-B-D-A-C,
A-B-D-C-A,
A-B-D-C-B,
A-C-A-B-C,
A-C-A-B-D,
A-C-A-D-B,
A-C-B-A-C .
The probability Ryan went first from A to B is 0.5, and the probability he went first from A
to C is 0.5. Therefore, after one trip, the probability he was in B is 0.5 and the probability
he was in C is 0.5.
The probability Ryan’s second trip took him from B to C is 0.5 times the probability he was
in B, or 0.25. The probability it took him from B to D is also 0.5 times the probability he
was in B, or 0.25. The probability it took him from C to A is 0.5 times the probability he
was in C, or 0.25. The probability it took him from C to B is 0.5 times the probability he
1
was in C, or 0.25. Therefore, after two trips, the probability is 0.25 he was in any particular
town.
The probability Ryan’s third trip took him from A to B is 0.5 times the probability he was
in A, or 0.125. The probability it took him from A to D is 0.5 times the probability he was
in A, or 0.125. The probability it took him from B to A is the probability he was in A, or
0.25. The probability it took him from C to D is the probability he was in C, or 0.25. The
probability it took him from D to A is 0.5 times the probability he was in D, or 0.125. The
probability it took him from D to C is 0.5 times the probability he was in D, or 0.125.
Therefore, after three trips, the probabilities Ryan was in A, B, C, D are, respectively, 0.25 +
0.125 = 0.375 (the probability his third trip took him from B to A plus the probability it
took him from D to A), 0.125, 0.125 and 0.125 + 0.25 = 0.375 (the probability his third trip
took him from A to D plus the probability it took him from C to D).
You can compute the probability of Ryan being in a particular town after four trips and five
trips similarly.
Isn’t it lucky you’re on an island, so you can’t accidentally lose Ryan forever?
(Hint: first design an algorithm that computes the number of ways Ryan could have ended
up in a town, and then modify it to compute the probability.)
B
A
D
C
2. Your professor Travis told your TA Sarah that he was going to ask you to modify the code
in the lecture notes for computing edit distance, to compute an optimal alignment using only
one pass through the matrix.* (The question on Assignment 5 allows filling in the matrix
and then walking back from the bottom right corner to the top left corner to compute the
alignment.)
Travis claimed it was possible by keeping two more arrays, B[0..m, 0..n] and C[0..m, 0..n],
where B[i, j] is a pointer to a string of length O(m + n) containing the top line in an optimal
alignment of S[1..i] to T [1..j], and C[i, j] is a pointer to a string of length O(m+n) containing
the bottom line in that alignment. To compute B[i, j], we sprintf into an empty string either
B[i − 1, j] or B[i − 1, j − 1] or B[i, j − 1], followed by either S[i] or ‘-’. To compute C[i, j],
we sprintf into an empty string either C[i − 1, j] or C[i − 1, j − 1] or C[i, j − 1], followed by
either T [j] or ‘-’.
Sarah correctly pointed out that sprintfing a string of length O(m + n) takes O(m + n)
time, so Travis’s solution takes cubic time. Help Travis by figuring out how to modify his
solution to use quadratic time again.
*
Yes, this question is based on a true story.
2
(Hint: pointers are your friends!)
Bonus (worth 10% of the midterm): Can you reduce the space usage to O((m + n)k),
assuming you’re given the edit distance k between S and T ?
(Hint: banded dynamic programming and garbage collections are your friends!)
3. For the problem Longest Kind-Of Increasing Subsequence (LKOIS), we’re given a
sequence S[1..n] of integers and asked to find the longest subsequence S 0 of S such that
S 0 [i − 1] − 3 ≤ S 0 [i] for 1 < i ≤ |S 0 |. Give an O(n log n) algorithm for LKOIS.
4. For the problem Partition, we’re given a set S of positive integers that sum to 2t and asked
if there is a subset of S that sums to exactly t. Prove Partition is NP-complete by
showing Partition is in NP,
reducing one of the NP-complete problems we’ve seen in class to Partition.
5. Write a program that, given a list of the edges in a connected graph G on the vertices
1, . . . , n, in polynomial time outputs a Boolean formula F that is satisfiable if and only if G
has a Hamiltonian path. You can assume the list of edges looks something like
(1,
(1,
(4,
(6,
(5,
2)
3)
2)
5)
3)
with one pair per line, and your output should consist of a single line containing copies of
space, (, ), AND, OR, NOT and variables that look something like x1, x2, etc.
3
CSCI 3110: Design and Analysis of Algorithms
May 4th, 2020
Lecture 1a
Lecturer: Travis Gagie
Scribe: Travis Gagie
We start the course with 5 key ideas:
• computer scientists study computations, not necessarily computers;
• you can do computations and computer science without a computer;
• two problems can seem similar while one is easy and the other is hard;
• two problems can seem unrelated while being somehow equivalent;
• “basic” techniques — such as divide and conquer — can be applied to hard problems.
Computers perform computations and computations are performed by computers, but
when Turing wrote about computers in the 1930s, he meant people. These days people
use computers to do lots of things without knowing or caring what computations are being
carried out, so let us consider a computational problem that predates electronic computers.
1
Eulerian cycles
In the 1700s, the citizens of Königsberg in Prussia wrote to Leonhard Euler, asking him if
someone could walk across each of the bridges in the city exactly once and arrive back where
they started. In the process of showing it was impossible, Euler invented graph theory.
If we model the city as a graph, with the shores and islands as vertices and the bridges
as edges as shown in Figure 1, then such a walk corresponds to a cycle in the graph that
crosses each edge once. Obviously the graph must be connected, which it is in this example;
also, for every arrival at a vertex there must be a matching departure, and vice versa, so
the degree of every vertex must be even, which is not true here. It is easy to see that the
two conditions mentioned above are necessary, and Euler showed that, at least for finite
graphs, they are also sufficient.
Definition 1 An Eulerian cycle of a graph is a cycle that contains each edge exactly once.
A graph is Eulerian if it contains an Eulerian cycle.
Theorem 2 A finite graph is Eulerian if and only if it is connected and all its vertices’
degrees are even.
Proof Suppose a finite graph meets the conditions and we start at a vertex of the graph
and start walking, never crossing the same edge twice. Either we eventually arrive back
where we started, or we get stuck at some other vertex with no way to leave. To see that
the second case is impossible, suppose that whenever we cross an edge, we remove it from
the graph, reducing the degree of the vertex we just left by 1 and the degree of the vertex
we just arrived at by 1. Since all the degrees are initially even, after we leave the vertex
1a-1
Figure 1: A map of Königsberg with the bridges shown in green (left) and the corresponding graph (right). (Both images from https://en.wikipedia.org/wiki/Seven_Bridges_
of_Konigsberg.)
where we start and until we return to it, its degree is odd and the degree of the vertex we
are currently visiting is odd. Since 0 is even, it follows that we always have an edge to leave
across, at least until we return to where we started.
When we return to where we started the graph may be disconnected but, as long as there
are edges, we can again choose a starting point and walk until we return to it. Repeating
this until there are no edges left in the graph, we decompose the graph into edge-disjoint
cycles. We can take any of the cycles in our current collection, walk around it until we
reach a vertex it shares with another cycle, walk around that other cycle completely, and
then continue walking around the first cycle from the shared vertex. This gives us a single
cycle containing all the edges of those two cycles, so we can replace them with it. Repeating
this, eventually we obtain a single cycle, which crosses each edge in the graph exactly once.
Figure 2 shows an example.
By essentially the same arguments, Theorem 2 can be extended to directed cycles in
directed graphs:
Theorem 3 A finite directed graph is Eulerian if and only the underlying undirected graph
is Eulerian and each vertex’s in-degree and out-degree are equal.
Eulerian cycles are not just a mathematical curiousity: for example, bioinformaticians
build Eulerian cycles of de Bruijn for de novo assembly of genomes. Early sequencing and
assembly of the COVID-19 genome was essential in the development of tests for infection,
without which the world would be in an even worse situation right now.
To sequence a genome, a sample is put through a sequencing machine, which produces
fragments of DNA called reads. Most sequencing machines produce short reads, which
are about 150 or 200 characters long, while some now produce long reads, which can be
thousands of characters long but are much less accurate. In comparison, the COVID-19
genome is about 30 thousand characters long and the human genome is about 3 billion
characters long.
1a-2
To help you see how building Eulerian cycles can help bioinformaticians piece the reads
together to get the whole genome, your first homework problem is “Tanya and Password”
from Codeforces (https://codeforces.com/problemset/problem/508/D):
While dad was at work, a little girl Tanya decided to play with dad’s password to his secret database. Dad’s password is a string consisting of n + 2
characters. She has written all the possible n three-letter continuous substrings
of the password on pieces of paper, one for each piece of paper, and threw the
password out. Each three-letter substring was written the number of times it
occurred in the password. Thus, Tanya ended up with n pieces of paper.
Then Tanya realized that dad will be upset to learn about her game and
decided to restore the password or at least any string corresponding to the final
set of three-letter strings. You have to help her in this difficult task. We know
that dad’s password consisted of lowercase and uppercase letters of the Latin
alphabet and digits. Uppercase and lowercase letters of the Latin alphabet are
considered distinct.
Actually, because Dad’s password need not end with the same three-letter substring it
starts with, what we are looking for in this problem is an Eulerian path or Eulerian tour —
a walk that crosses each edge once but may end at a different vertex that where it started
— not necessarily and Eulerian cycle. In particular, we are looking for an Eulerian path
in a de Bruijn graph, which is a graph in which the vertices are tuples of characters and
there can be an edge from one vertex to another only if it is possible to delete the leftmost
character from the tuple for the first vertex, add a character on the right, and obtain the
tuple for the second vertex.
If a directed graph has an Eulerian path then we need add at most one edge such that
it contains an Eulerian cycle: either each vertex’s in-degree is equal to its out-degree, or
there is one vertex with 1 more in-edge than out-edge and one vertex with 1 more out-edge
than in-edge; we add an edge from the first vertex to the second one. Figure 3 shows the
graph for the first example instance on the Codeforces page, abacaba, together with the
edge we must add to make the graph Eulerian. Note that the vertices are pairs, because the
three-letter substrings in the problem correspond to edges that move us from pair to pair!
Once we find an Eulerian cycle in the graph with the extra edge inserted, we can delete
that edge and obtain an Eulerian path.
A reasonably efficient implementation of Euler’s construction will solve the Codeforces
problem in linear time, but if we change the problem just a little — asking that the cycle
visit each vertex exactly once instead of cross each edge exactly once — then no one knows
a polynomial-time algorithm for it, and we strongly suspect no such algorithm exists. Generally we consider an algorithm to be efficient if it takes time polynomial in the size of its
input (even if sometimes that definition is too loose to be useful in practice).
2
Hamiltonian cycles
A Hamiltonian cycle of a graph is a cycle that visits each vertex exactly once: for example,
see http://dm.compsciclub.ru/app/quiz-hamiltonian-cycle. The problem of finding
1a-3
b
c
a
d
Euler cycle:
g
abcdecgefbgfa
f
e
b
b
c
a
c
d
g
g
e
f
cdec
abgfa
e
f
bcgefb
abcgefbgfa
abcdecgefbgfa
Figure 2: A decomposition of a graph into edge-disjoint cycles, and how they can be
combined into an Euler cycle.
AB
BA
CA
AC
Figure 3: The graph for the first example from the Codeforces page, with the dashed line
at the top showing the edge we add to make the graph Eulerian. The Eulerian path visits
AB, BA, AC, CA, AB again, and BA again, and corresponds to the strings ABACABA.
1a-4
a Hamiltonian cycle is not a mathematical curiousity either, since it is a special case of the
Travelling Salesperson (TSP) problem, which is of practical interest to logisticians.
For TSP, we are given a map of some cities and roads, modelled as an edge-weighted
graph, and asked to plan a route for a salesperson to visit each city and return home
while travelling the minimum distance, by finding a cycle that visits each vertex exactly
once and has the minimum possible total edge-weight. If all the edge-weights are 1 then
finding a cycle with total edge-weight equal to the number of cities is equivalent to finding a
Hamiltonian cycle. Therefore, if we cannot find a Hamiltonian cycle, then we cannot solve
TSP optimally in general. Later in the course we will see that the reverse holds as well.
Just because we cannot find Hamiltonian cycles in general, does not mean there is no
reason to study the problem of doing so. Sometimes it is easy for us to tell in polynomial
if a Hamiltonian cycle does not does not exists — when the graph is a tree, for example
— and sometimes, even if we cannot find a polynomial-time algorithm, we can find one
that is better than just trying all the possible cycles. For example, we believe there is no
polynomial-time algorithm to determine if there is a Hamiltonian cycle in a graph even
when the graph is restricted to be planar — meaning it can be drawn in the plane with no
edges crossing each other — and the maximum degree
is 3.
m
The obvious algorithm is to try each of the n possible subsets of n of the m edges,
where n is the number of vertices, and see if it is a Hamiltonian cycle. Since each edge has
2 endpoints and there are at most 3n endpoints in the graph, 2m ≤ 3n and so the there
are at most 23n/2 ≤ 2.83n subsets of the edges in total. However, since the graph is planar,
there is a theorem that tells us that we can find a planar separator in quadratic time: that
√
is, a subset of at most 2 n vertices such that removing them breaks the graph into pieces
each of size at most 2n/3. If we take such a separator and consider all the subsets of edges
incident to its vertices which might not be in the Hamiltonian cycle, and recurse on the
pieces — not looking for Hamiltonian cycles of the pieces but for subsets of edges that visit
all their vertices exactly once and join up correctly with the edges we have chosen√incident
to the separator — then we get a divide-and-conquer algorithm that takes 2O( n) time,
which is asymptotically faster than an algorithm taking something like 2.83n time.
3
Summary
So far, we have discussed the first three of the key ideas we started with, and touched on
the fifth:
• computer scientists study computations, not necessarily computers;
• you can do computations and computer science without a computer;
• two problems can seem similar while one is easy and the other is hard;
• two problems can seem unrelated while being somehow equivalent;
• “basic” techniques — such as divide and conquer — can be applied to hard problems.
In the next half-lecture, we will talk more about the divide-and-conquer approach to
algorithm design, and see how to use it to solve the problem of finding the maximum-sum
1a-5
subarray in an array, which we can do efficiently (in polynomial time). We will talk about reductions between that problem and a variant of it, and them return to discussing hard problems, time time graph colouring: http://dm.compsciclub.ru/app/quiz-map-coloring.
To drive home the idea that figuring out what makes problems hard is itself a hard and
interesting problem, consider Figure 4. Actually, though, the difference between finding
an Eulerian cycle and finding a Hamiltonian cycle is somehow more subtle than what is
considered in that cartoon. Determining whether a photo is of a bird is a “messy” problem,
in that there may not be enough information to judge, people may differ on what they
accept as a bird, etc. Finding cycles in graphs is clean and well-defined, and the division
between easy and hard here is something deep and interesting.
Figure 4: We will consider more interesting separations between “easy” and “hard”. (Image from https://xkcd.com/1425/.)
1a-6
CSCI 3110: Design and Analysis of Algorithms
May 6th, 2020
Lecture 1b: The Maximum Sub Subarray Problem
Lecturer: Travis Gagie
Scribes: A. Ahmed, B. Zhao, F. Fei, K. McDonald-Landry, M. Rajendran, O. Dechant
1
Introduction to the Problem
Initially, the maximum subarray problem was surveyed by Ulf Grenander around 1980s in
digital images for estimation of some patterns. To make it simple, there is a one-dimension
array that may include either negative or positive numbers. This problem needs to find at
least one contiguous subarray or a singleton subarray which contains the maximum sum.
Hence, there are 3 conditions we may consider:
1. If the numbers of the array a...

Purchase answer to see full
attachment