Yahoo Open Data Tables is probably the last free source of financial information available after Google shut down it’s finance API a few years back. In a recent project to analyze my stocks trades, I decided to cache 10 year historical prices form the Yahoo Open Data Tables.

Yahoo API queries are syntactically similar to SQL. For instance, to retrieve the stock prices for “YHOO” for the month of October 2016, I would structure my query as -

select * from where symbol = "YHOO" and startDate = "2016-10-01" and endDate = "2016-10-30"

However, an attempt to set startDate = "2006-01-01" and endDate = "2016-10-01" gets me -

  "query": {
    "count": 0,
    "created": "2016-10-21T05:09:05Z",
    "lang": "en-US",
    "results": null

additionally, the diagnostic information reveals that the error thrown is -

  "name": "",
  "verb": "select",
  "content": "Too many instructions executed: 50220753"

So, the way around was to use map the time range into several 365 day ranges and finally merge the results (reduce). To this end I wore a python function to fetch the data and a wrapper that would perform the map-reduce.

Here is my code -

import json
import urllib
import time
from datetime import datetime, timedelta, date

def yql_stock_history(symbol,start_date,end_date):
    endpoint = ""
    env = "store://"
    response_format = "json"
    query='select * from where symbol = "'+symbol+'" and startDate = "'+start_date+'" and endDate = "'+end_date+'"'
    params = {'format':response_format, 'env':env, 'q':query}
    req_data = urllib.parse.urlencode(params).encode('ascii')
        res = urllib.request.urlopen('',req_data)
        data ='utf-8')
    except urllib.error.URLError as e:
    json_parsed = json.loads(data)['query']['results']
    if (json_parsed != None):
        return json_parsed['quote']
        return None

def yql_stock_long_history(symbol, start_date, end_date):
    start_d=datetime.strptime(start_date, '%Y-%m-%d')
    end_d=datetime.strptime(end_date, '%Y-%m-%d')
    diff=end_d - start_d
    max_days = 365
    if (diff.days < 0):
        return None
    elif(diff.days > max_days):
        data_part = []
        s = start_d
        while((end_d-s).days >= max_days):
            price_history=yql_stock_history(symbol, str(s), str(s+timedelta(max_days)))
            s=s+timedelta(days = max_days+1)
            if (price_history == None):
        price_data = [b for sublist in data_part for b in sublist]
        price_history = yql_stock_history(symbol, str(s), str(s+timedelta(max_days)))
        price_data = [b for sublist in data_part for b in sublist]
        return price_data
    elif(diff.days == 1):
        return [yql_stock_history(symbol, start_date, end_date)]
        return yql_stock_history(symbol, start_date, end_date)

Here is the gist containing the code.

The wrapper yql_stock_long_history is the wrapper that performs the map reduce if the query is too large. Feel free to leave comments on the gist page.