March 29, 2012

Generic solutions and specific solutions

From certain points of view, programming is fairly easy.  Most of the common problems have known solutions.  Often, programming is simply applying common solutions to specific problems.

I have a web app whose sole purpose is to create thumbnails of images whose URLs are given.  It took a few seconds for serving each request, so I applied a common solution to the “my program is too slow” problem: caching.  With two levels of caching, I brought down the average response time to around 150ms.  (Caching code is in boldface.)
if PICASA_URL_REGEX.matches(original_url):
  # Resizing Picasa URLs is child's play.
  # All we need is some URL tweaking.
  response.Redirect(resize_picasa(original_url))
  return

if FLICKR_URL_REGEX.matches(original_url):
  original_url = mid_size_flickr_url(original_url)
# local_cache is a simple hash map.
if original_url in local_cache:
  response.Redirect(local_cache[original_url])
  return

thumb_url = memcache.Get(original_url)
if not thumb_url:
  thumb_url = data_store.Get(original_url)
  if not thumb_url:
    image_bytes = http_client.Fetch(original_url)
    thumbnail_bytes = image_resizer.Resize(image_bytes)
    thumb_url = data_store.Store(original_url, thumbnail_bytes)
  memcache.Set(original_url, thumb_url)
local_cache[original_url] = thumb_url
response.Redirect(thumb_url)
Though 150ms isn’t too bad, I had been wanting to optimise this further for several months.  Only I couldn’t figure out how.  Yesterday, something dawned on me: I was using the caches all wrongly!  I was unnecessarily repeating a bunch of string operations and regular expression searches for every request before even touching the caches.  A better approach would be to hit the cache as early as possible.  I changed the algorithm to be like:
# response_cache is a a simple hash map.
if request.query_string in response_cache:
  response.Redirect(response_cache[request.query_string])
  return

if PICASA_URL_REGEX.matches(original_url):
  # Resizing Picasa URLs is child's play.
  # All we need is some URL tweaking.
  thumb_url = resize_picasa(original_url)
  response_cache[request.query_string] = thumb_url
  response.Redirect(thumb_url)
  return

if FLICKR_URL_REGEX.matches(original_url):
  original_url = mid_size_flickr_url(original_url)
thumb_url = memcache.Get(original_url)
if not thumb_url:
  thumb_url = data_store.Get(original_url)
  if not thumb_url:
    image_bytes = http_client.Fetch(original_url)
    thumbnail_bytes = image_resizer.Resize(image_bytes)
    thumb_url = data_store.Store(original_url, thumbnail_bytes)
  memcache.Set(original_url, thumb_url)
response_cache[request.query_string] = thumb_url
response.Redirect(thumb_url)
Immediately the average response time came down to around 40ms.  Plus, it’s not unusual now to find requests that get served in less than 5ms!

Algorithmic difference between the two code snippets is subtle, yet the performance difference is substantial.  What makes programming interesting is the fact that knowing the solution is not always sufficient.  Ability to adapt a generic solution to specific problems is one of the things that makes one a good programmer.

Appendix
Sample logs before code change:
Sample HTTP request log before code change
After code change:
Sample HTTP request log after code change

No comments:

Post a Comment