12.1 Nearest repetition

People do not like reading text in which a word is used multiple times in a short
paragraph. You are to write a function which helps identify such a problem.
Problem 12.1 : Let s be an array of strings. Write a function which finds a closest pair
of equal entries. For example, if s = [All, work, and, no, play, makes,
for, no, work, no, fun, and, no, results], then the second and third
occurrences of no are the closest pair.
pg. 268

ElementsOfProgrammingInterviews.com

13.1 Counting sort

Suppose you need to reorder the elements of a very large array so that equal elements
appear together. More formally, if A is an array, you are to permute the elements of
A so that after the permutation $\forall\, i < j < k:\ A[i] = A[k] \Rightarrow A[j] = A[i]$.
If the entries are integers, this can be done by sorting the array. If the number of
distinct integers is very small relative to the size of the array, an efficient approach to
sorting the array is to count the number of occurrences of each distinct integer and
write the appropriate number of each integer, in sorted order, to the array.
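The counting idea described above can be sketched as follows. The function name counting_sort_ints is an assumption made for illustration. The sketch uses an ordered std::map so that the distinct values come out in sorted order, which costs O(n log d) for d distinct values rather than strictly linear time.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper (not from the text): sorts `a` in place by counting
// the occurrences of each distinct value and then rewriting the array.
void counting_sort_ints(std::vector<int>& a) {
  // std::map iterates its keys in increasing order.
  std::map<int, int> value_to_count;
  for (int x : a) {
    ++value_to_count[x];
  }
  // Write each distinct value back as many times as it occurred.
  std::size_t idx = 0;
  for (const auto& entry : value_to_count) {
    for (int c = 0; c < entry.second; ++c) {
      a[idx++] = entry.first;
    }
  }
}
```

This works for plain integers because equal values are interchangeable. It does not carry over directly to objects that have other fields besides the key.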
Problem 13.1 : You are given an array of n Person objects. Each Person object has
a field key. Rearrange the elements of the array so that Person objects with equal
keys appear together. The order in which distinct keys appear is not important.
Your algorithm must run in O(n) time and O(k) additional space, where k is the number of
distinct keys. How would your solution change if keys have to appear in sorted order?
pg. 269

string candidate, buf;
int count = 0;
while (sin >> buf) {
  if (count == 0) {
    candidate = buf;
    count = 1;
  } else if (candidate == buf) {
    ++count;
  } else {
    --count;
  }
}
return candidate;


The code above assumes a majority word exists in the sequence. If no word has a
strict majority, it still returns a word from the stream, albeit without any meaningful
guarantees on how common that word is. We could check with a second pass whether
the returned word was a majority. Similar ideas can be used to identify words that
appear more than n/k times in the sequence, as discussed in Problem ?? on Page ??.
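The second-pass check mentioned above might look like the sketch below. It assumes the words are held in a vector rather than read from a stream, since a stream cannot simply be scanned twice. The name find_strict_majority and the output parameter are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true and stores the word in *result if some
// word occurs in more than half of `words`; returns false otherwise.
bool find_strict_majority(const std::vector<std::string>& words,
                          std::string* result) {
  // First pass: same candidate/count voting scheme as the code above.
  std::string candidate;
  int count = 0;
  for (const std::string& w : words) {
    if (count == 0) {
      candidate = w;
      count = 1;
    } else if (candidate == w) {
      ++count;
    } else {
      --count;
    }
  }
  // Second pass: verify that the candidate really is a strict majority.
  std::size_t occurrences = 0;
  for (const std::string& w : words) {
    if (w == candidate) {
      ++occurrences;
    }
  }
  if (2 * occurrences > words.size()) {
    *result = candidate;
    return true;
  }
  return false;
}
```

The verification pass keeps the time linear and adds only a counter to the space used.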
Problem 12.1, pg. 93 : Let s be an array of strings. Write a function which finds a closest pair
of equal entries. For example, if s = [All, work, and, no, play, makes, for,
no, work, no, fun, and, no, results], then the second and third occurrences
of no are the closest pair.
Solution 12.1: We make a scan through the array. For each i, we determine the
index j of the most recent occurrence of s[i]. If i - j is less than the difference of the
closest duplicate pair seen so far, we update that difference to i - j. The most recent
occurrence of s[i] is computed through a hash table lookup. The time complexity is
O(n), since we perform a constant amount of work per entry. The space complexity
is O(d), where d is the number of distinct strings in the array.
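#include <algorithm>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

// Returns the smallest distance between two equal entries of s, or
// numeric_limits<int>::max() if no entry repeats.
int find_nearest_repetition(const vector<string> &s) {
  // Maps each string to the index of its most recent occurrence.
  unordered_map<string, int> string_to_location;
  int closest_dis = numeric_limits<int>::max();
  for (int i = 0; i < static_cast<int>(s.size()); ++i) {
    auto it = string_to_location.find(s[i]);
    if (it != string_to_location.end()) {
      closest_dis = min(closest_dis, i - it->second);
    }
    string_to_location[s[i]] = i;
  }
  return closest_dis;
}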
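
The function above reports only the distance. Since the problem asks for the pair itself, one possible variant tracks the two indices as well. This is a sketch: the name find_nearest_repetition_pair and the (-1, -1) sentinel for "no repetition" are assumptions, not part of the original solution.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical variant: returns the 0-based indices (earlier, later) of a
// closest pair of equal entries, or (-1, -1) if no entry repeats.
std::pair<int, int> find_nearest_repetition_pair(
    const std::vector<std::string>& s) {
  // Maps each string to the index of its most recent occurrence.
  std::unordered_map<std::string, int> last_seen;
  std::pair<int, int> best(-1, -1);
  for (int i = 0; i < static_cast<int>(s.size()); ++i) {
    auto it = last_seen.find(s[i]);
    if (it != last_seen.end()) {
      // Keep the first pair found among those with the smallest distance.
      if (best.first == -1 || i - it->second < best.second - best.first) {
        best = std::make_pair(it->second, i);
      }
    }
    last_seen[s[i]] = i;
  }
  return best;
}
```

On the example array from the problem statement, this sketch returns indices 7 and 9, which are the second and third occurrences of no.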

Problem 13.1, pg. 95 : You are given an array of n Person objects. Each Person object
has a field key. Rearrange the elements of the array so that Person objects with equal keys
appear together. The order in which distinct keys appear is not important. Your algorithm
must run in O(n) time and O(k) additional space. How would your solution change if keys
have to appear in sorted order?
Solution 13.1: We use the approach described in the introduction to the problem.
However, we cannot apply it directly, since we need to write objects, not integers:
two objects may have the same key while their other fields differ.
We use a hash table C to count the number of occurrences of each key. We
iterate over each key k in C and keep a cumulative count s, which is the starting offset
in the array where elements with key k are to be placed. We put the key-value pair
(k, s) in a hash table M; basically, M partitions the array into the subarrays holding
objects with equal keys.
We then iteratively get a key k from M and swap the element e at k's current offset
(which we get from M) with the element at the location appropriate for e's key e.key
(which we also get from M). Since e is now in its correct location, we update M by
advancing the offset corresponding to e.key, taking care to remove e.key from M when
all elements with key equal to e.key are correctly placed.
The time complexity is O(n), since the first pass entails n hash table inserts, and the
second pass performs a constant amount of work to move one element to the right
location. (Selecting an arbitrary key from a hash table is a constant time operation.)
The additional space complexity is dictated by C and M, and is O(k), where k is the
number of distinct keys.
If the objects are additionally required to appear in sorted key order, we can store
M using a BST-based map instead of a hash table. The time complexity becomes
O(n + k log k), since BST insertion takes time O(log k). This should make sense, since
if k = n, we are doing a regular sort, which is known to be O(n log n) for sorting based
on comparisons.
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

// Person<KeyType> is assumed to expose its key as the member key_.
template <typename KeyType>
void counting_sort(vector<Person<KeyType>> &people) {
  // First pass: count the occurrences of each key.
  unordered_map<KeyType, int> key_to_count;
  for (const Person<KeyType> &p : people) {
    ++key_to_count[p.key_];
  }
  // Assign each key the starting offset of its subarray.
  unordered_map<KeyType, int> key_to_offset;
  int offset = 0;
  for (const auto &p : key_to_count) {
    key_to_offset[p.first] = offset;
    offset += p.second;
  }

  // Second pass: move one element into its key's subarray per iteration.
  while (key_to_offset.size()) {
    auto from = key_to_offset.begin();
    auto to = key_to_offset.find(people[from->second].key_);
    swap(people[from->second], people[to->second]);
    // Use key_to_count to see when we are finished with a particular key
    if (--key_to_count[to->first]) {
      ++to->second;
    } else {
      key_to_offset.erase(to);
    }
  }
}
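
For the sorted-order variant discussed in the solution, a minimal sketch is given below. PersonRecord and group_by_key_sorted are hypothetical names, and the record type is a stand-in for the book's Person class. The sketch keeps both the counts and the offsets in std::map, so offsets are assigned in increasing key order. Because every placement then does an O(log k) lookup, this version runs in O(n log k) time.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type, used only for illustration.
struct PersonRecord {
  int key;
  std::string name;
};

// Rearranges `people` so that equal keys are adjacent and keys appear in
// increasing order.
void group_by_key_sorted(std::vector<PersonRecord>& people) {
  // Ordered map: iteration visits keys in increasing order.
  std::map<int, int> key_to_count;
  for (const PersonRecord& p : people) {
    ++key_to_count[p.key];
  }
  // Starting offset of each key's subarray, assigned in sorted key order.
  std::map<int, int> key_to_offset;
  int offset = 0;
  for (const auto& kc : key_to_count) {
    key_to_offset[kc.first] = offset;
    offset += kc.second;
  }
  // Place one element per iteration, as in the hash-table version.
  while (!key_to_offset.empty()) {
    auto from = key_to_offset.begin();
    auto to = key_to_offset.find(people[from->second].key);
    std::swap(people[from->second], people[to->second]);
    if (--key_to_count[to->first] > 0) {
      ++to->second;  // next free slot for this key
    } else {
      key_to_offset.erase(to);  // all elements with this key are placed
    }
  }
}
```

For example, records with keys 3, 1, 3, 2, 1 end up in key order 1, 1, 2, 3, 3. The relative order of records that share a key is not preserved.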


Problem 14.1, pg. 96 : Write a function that takes as input the root of a binary tree whose
nodes have a key field, and returns true iff the tree satisfies the BST property.
Solution 14.1: Several solutions exist, which differ in terms of their space and time
complexity, and the effort needed to code them.
The simplest is to start with the root r, and compute the maximum key r.left.max
stored in the root's left subtree, and the minimum key r.right.min in the root's right
subtree. Then we check that the key at the root is greater than or equal to r.left.max
and less than or equal to r.right.min. If these checks pass, we continue checking the
root's left and right subtrees recursively.
Computing the minimum key in a binary tree is straightforward: we compare the
key stored at the root with the minimum key stored in its left subtree and with the
minimum key stored in its right subtree. The maximum key is computed similarly.
(Note that the minimum may be in either subtree, since the tree may not satisfy the
BST property.)
The problem with this approach is that it will repeatedly traverse subtrees. In a
worst case, when the tree is BST and each node's left child is empty, its complexity
is O(n^2), where n is the number of nodes. The complexity can be improved to O(n)
by caching the largest and smallest keys at each node; this requires O(n) additional
storage.
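A rough sketch of the simple, uncached approach is shown here to make the repeated traversals concrete. KeyNode, subtree_min, subtree_max and is_bst_naive are hypothetical names, and integer keys are assumed.

```cpp
#include <algorithm>
#include <memory>

// Hypothetical minimal node type, used only for illustration.
struct KeyNode {
  int key;
  std::unique_ptr<KeyNode> left, right;
  explicit KeyNode(int k) : key(k) {}
};

// Smallest key in the non-empty tree rooted at t. Both subtrees are searched,
// since the tree may not satisfy the BST property.
int subtree_min(const KeyNode* t) {
  int m = t->key;
  if (t->left) m = std::min(m, subtree_min(t->left.get()));
  if (t->right) m = std::min(m, subtree_min(t->right.get()));
  return m;
}

// Largest key in the non-empty tree rooted at t.
int subtree_max(const KeyNode* t) {
  int m = t->key;
  if (t->left) m = std::max(m, subtree_max(t->left.get()));
  if (t->right) m = std::max(m, subtree_max(t->right.get()));
  return m;
}

// Checks the root against its subtrees' extremes, then recurses. Each level
// re-traverses the subtrees below it, hence the quadratic worst case.
bool is_bst_naive(const KeyNode* r) {
  if (!r) return true;
  if (r->left && subtree_max(r->left.get()) > r->key) return false;
  if (r->right && subtree_min(r->right.get()) < r->key) return false;
  return is_bst_naive(r->left.get()) && is_bst_naive(r->right.get());
}
```

Storing the results of subtree_min and subtree_max at each node would remove the repeated work, at the cost of the extra O(n) storage mentioned above.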
We now present two approaches which have O(n) time complexity and O(h)
additional space complexity.
The first, more straightforward approach, is to check constraints on the values for
each subtree. The initial constraint comes from the root. Each node in the root's left
(right) subtree must have a value less than or equal (greater than or equal) to the value
at the root. This idea generalizes: if all nodes in a tree rooted at t must have values in
the range [l, u], and the value at t is $w \in [l, u]$, then all values in the left subtree of t
must be in the range [l, w], and all values stored in the right subtree of t must be in the
range [w, u]. The code below uses this approach.
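#include <memory>
using namespace std;

// Returns true iff every key in the tree rooted at r lies in [lower, upper]
// and the tree satisfies the BST property.
template <typename T>
bool is_BST_helper(const shared_ptr<BinaryTree<T>> &r, const T &lower,
                   const T &upper) {
  if (!r) {
    return true;
  } else if (r->data < lower || r->data > upper) {
    return false;
  }

  // The left subtree is bounded above by r->data, the right subtree below.
  return is_BST_helper(r->left, lower, r->data) &&
         is_BST_helper(r->right, r->data, upper);
}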
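
To experiment with the range-based check in isolation, a self-contained sketch is given below. IntNode, is_bst_in_range and is_bst are hypothetical names. Starting from the full range of int via std::numeric_limits is an assumption about how the check is seeded, not something taken from the text.

```cpp
#include <limits>
#include <memory>

// Hypothetical minimal node type, used only for illustration.
struct IntNode {
  int data;
  std::unique_ptr<IntNode> left, right;
  explicit IntNode(int d) : data(d) {}
};

// True iff every value in the tree lies in [lower, upper] and each subtree
// respects the range narrowed by its ancestors.
bool is_bst_in_range(const IntNode* node, int lower, int upper) {
  if (!node) return true;
  if (node->data < lower || node->data > upper) return false;
  return is_bst_in_range(node->left.get(), lower, node->data) &&
         is_bst_in_range(node->right.get(), node->data, upper);
}

// Assumed driver: starts with the widest possible range for int keys.
bool is_bst(const IntNode* root) {
  return is_bst_in_range(root, std::numeric_limits<int>::min(),
                         std::numeric_limits<int>::max());
}
```

Each node is visited once, and the recursion depth is bounded by the height of the tree, which matches the O(n) time and O(h) space stated earlier.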

template <typename T>
bool is_BST(const shared_ptr<BinaryTree<T>> &r) {
