Searching
Note: extra information on pg 150
[! QUESTION] Should you use binary search?
- Do you need to search over a sorted array? If so, binary search should be your go-to algorithm.
Relevant problems
- Find Median From Data Stream: Find and update the median as new values are added
- When a new value is obtained, use binary search (e.g.
bisect_left
) to know where to add it. - Take the median value given the known length of the array
- When a new value is obtained, use binary search (e.g.
Binary Search
A solid and intuitive way to implement binary search is with a two-index approach, one starts on the first element the other on the last, then you manipulate the search range on each bisection. Binary search has a time complexity of O(log n)
.
Example: Binary search function
# Option A: Manually implemented
def bsearch_manual(target, A):
if len(nums) == 1 and nums[0] == target:
return 0
left, right = 0, len(A) - 1
while left <= right: # The `<=` is CRUCIAL here or the loop may not continue correctly
mid = ((right - left) // 2) + left
if target > A[mid]:
left = mid + 1 # The +1 is CRUCIAL or the loop may never complete
elif target < A[mid]:
right = mid - 1 # The -1 is CRUCIAL of the loop may never complete
else:
return mid
return None
# Option B: Implement via `bisect_left`
from bisect import bisect_left
def bsearch_bisect(target, A):
if target > A[-1] or target < A[0]:
return None
idx = bisect_left(A, target)
return idx if idx < len(A) and A[idx] == target else None
# Option A: Manually implemented
def bsearch_manual(target, A):
if len(nums) == 1 and nums[0] == target:
return 0
left, right = 0, len(A) - 1
while left <= right: # The `<=` is CRUCIAL here or the loop may not continue correctly
mid = ((right - left) // 2) + left
if target > A[mid]:
left = mid + 1 # The +1 is CRUCIAL or the loop may never complete
elif target < A[mid]:
right = mid - 1 # The -1 is CRUCIAL of the loop may never complete
else:
return mid
return None
# Option B: Implement via `bisect_left`
from bisect import bisect_left
def bsearch_bisect(target, A):
if target > A[-1] or target < A[0]:
return None
idx = bisect_left(A, target)
return idx if idx < len(A) and A[idx] == target else None
[! SUCCESS] Working with search algorithms
- Know how to implement binary search using the two-index approach
- If the interviewer allows it use the
bisect_left
andbisect_right
functions from thebisect
module instead of implementing your own.
bisect_left(A, target)
returns the index to the left of targetbisect_right(A, target)
returns the index to the right of target- It is easy to make overflow or incorrect index mistakes on binary search problems.
Problems
11.1 Search a sorted array for first occurrence of k
Booyeah! Solved this one with bisect_left
in about 6 minutes (bisect*left implements a solution that is close but not perfect for this specific problem -> see my code solution), and with manual implementation in about 15 more minutes. The key to my solution was, once I found a middle value that matched, I looped backwards, using knowledge that the array is sorted until a non-match was found. In terms of the bisect_method, you had to ensure the found index was equal to (but not greater than) the target index to filter out targets that aren't in the array. According to the authors, my implementation is the naive approach because when searching backwards your time complexity it O(n), though to bisect_left
solution would be fine. This is a good example of why to use bisect*\*
functions if allowed because they reduce your room for error.
Lessons Learned
- While binary search has time complexity O(log n), searching backwards from there has worst case time complexity O(n) which makes the worst case time complexity of the solution O(n), nullifying the benefit of binary search in the first place. The better solution is to continue halving (or reducing by one) the search space, saving the element you know is a match.
11.3 Search a cyclically sorted array
Good solution in a good amount of time, but beware, you missed a few edge cases
Lessons Learned
- If there are any, use advantage of special properties of the array, in this case, cyclically sorted implies that there exists two fully sorted sub-arrays => bisection will create a sorted sub array and an unsorted subarray, or if two sorted subarrays are found via bisection, then the bisect point is the smallest entry (unless the array truly is sorted).
- Be cautious of your edge cases. Slow it down to think them through instead of using trial and error.
11.4 Compute the integer square root
Original solution didn't use binary search at all, simply used a square-root followed by a floor operation. Since this doesn't address the concept I am trying to learn I also implemented a binary search procedure. My implementation was timely and successful. The core idea here is to initialize an array on the interval [0, k] and then check whether the square of the value at the midpoint was larger than the input k
. Next, select the half depending on whether youre value was larger or smaller than that.
11.8 find k largest
Didn't do great on this one, recognized the true brute for solution but not the heap solution (which shouldve been obvious because heaps should be under consideration anytime I'm concerned only with maximums and minimums). The key to the best solution was to use a randomly selected "pivot" index, around which you swap entries. This was the solution I was looking for with the array thoughts, but I didn't recognize how exactly to implement it. redo this problem
11.9 Find the missing IP
uhhh, what? It's not always clear what the authors are asking. Plus developing the IO for the problem would take a lot of time to actually test and write so not going to waste my time doing that. Good review of generators though. Didn't learn much other than that to do this type of searching, you can start with one of the bits, assess whether it's present using counter (e.g. a full iteration through the file to count the number of IPs that start with bit=0 and then assess if there are less than 2^31 bits that start with leftmost bit=0 then there is at least one IP address that starts with leftmost bit=0 and isn't in the file).
LC 278 Is Bad Version
Solution: Use binary search to identify the transition index between good versions and bad versions. If the middle value is a good version, and the middle value +1 is a bad version, then return this entry. Otherwise update according to binary search rules.
Trick: One trick with this question is that you should set the right pointer to mid
not mid - 1
since in the latter case you could find yourself in "always false" territory. Draw out a simple example and this will become clear.