View on GitHub

CS2040C

This is the lab related content for the module CS2040C (Data structures and Algorithms)

Lab 06

Feedback form (Please fill this form, to give feedback about this lab)

Here is a pdf version of the ppt I covered during the lab. And the codes that I (skimmed through) / (demonstrated live).

content.pdf (The pdf version of the ppt shown)

map.cpp (The map demo code)

IndexingCityNames.cpp (The code for the problem of indexing city names discussed in one of the slides)

KattisCompoundWords.cpp (The solution code for the problem Kattis Compound Words discussed in the lab which involves the use of set/map/set_intersection)

Challenge Questions -

Q) Try to the Kattis Compound Words question using O(N) memory. Assume that all the strings are of small length and therefore the length of the strings does not affect the memory analysis. Our original approach in the lab was putting O(N^2) strings in a set, which therefore requires O(N^2) memory. Now I want you to restrict the usage of at most O(N) memory and still print out all the N^2 strings. Do note that you would still be printing N^2 strings, which does not mean usage of N^2 memory.

Hint - Try to generate all the strings in a systematic manner such that you get them in sorted order. Use a heap / priority_queue.

Solution The solution idea is this -
Sort the strings in ascending order and label them s1 < s2 < ... < sn.
Observe that s1 + s2 < s1 + s3 < s1 + s4 < ... < s1 + sn. Let this be equation 1.
Similarly, s2 + s1 < s2 + s3 < s2 + s4 < ... < s2 + sn. Let this be equation 2.
And so on till - sn + s1 < sn + s2 < sn + s3 < ... < sn + s(n - 1). Let this be equation n.
Create a priority queue of strings which sorts them in ascending order, so techincally a min heap. Label it pq.
Push in {s1 + s2, s2 + s1, s3 + s1, s4 + s1, ..., sn + s1} inside the PQ, i.e all the first terms in equation 1, equation 2, equation 3, ... equation n.
Now pick the top element of pq, print it. Pop it. Then let us say this element which has printed just now is s(x) + s(y), then because you popped s(x) + s(y), you should push in s(x) + s(y + 1) inside the pq. Here I assume s(x) = x_th string.
Keep on doing this. Do note, that x ≠ y + 1, when pushing s(x) + s(y + 1).
Also note that the in pq each element should essentially be a tuple<string, int, int> where get<1> = a, get<2> = b, get<0> = s(a) + s(b).
The intutive idea of the algo was to first take the min term out of all the first terms in equation 1, equation 2, ... equation n. Then for that particular equation which gave the min term, move to the second term for that equation. And again take the minimum. So keep on taking the minimum term, and shifting the pointer for the equation which gives the mininum term by 1 position to the right.
Net time complexity is O(c * N * N * log(N)), where c is the average length of the strings. And the memory complexity is O(c * N) = O(N), when c is small enough to be ignored.