Redash Python functions

November 18th, 2016


We’ve been using Redash at Sentry lately. I’d never heard of it before (Mode being the frontrunner in this category), but it’s damned good so far. And before you ask me about Metabase, it’s a totally different use case.

One of the best features is the Python query runner, allowing you to manipulate data from your SQL queries. You can then output to visualizations, CSV, etc. The global Python functions aren’t obvious (I had to look through their code to find them), so I’m documenting them here for others. I also make liberal use of print, which the re:dash folks have thoughtfully made output to the page.

result

At the end of the query, re:dash looks at the variable result. If it’s in the right shape, it’ll be shown in the table below. The result is a dictionary in the same format as the one returned by execute_query and get_query_result. The format should be like this:

result = {
    "rows": [
        {"user_id": 1, "anonymous_id": "s2f1"},
    ],
    "columns": [
        {
            "name": "user_id",
            "friendly_name": "user_id",
            "type": TYPE_INTEGER,
        },
        {
            "name": "anonymous_id",
            "friendly_name": "anonymous_id",
            "type": TYPE_STRING,
        }
    ]
}

The available value types are:

  • TYPE_INTEGER
  • TYPE_STRING
  • TYPE_FLOAT
  • TYPE_BOOLEAN
  • TYPE_DATE
  • TYPE_DATETIME

TYPE_DATE and TYPE_DATETIME are both actually strings, not Python datetime objects. Use strptime to convert them. The exact format will vary by database, but for Postgres, I made a helper function:

def convert_to_native_datetime(result):
    import datetime
    datetime_columns = [column['name'] for column in result['columns'] if column['type'] == TYPE_DATETIME]
    date_columns = [column['name'] for column in result['columns'] if column['type'] == TYPE_DATE]
    for row in result['rows']:
        for column in row:
            if column in datetime_columns:
                try:
                    row[column] = datetime.datetime.strptime(row[column], '%Y-%m-%dT%H:%M:%S')
                except ValueError:
                    # in case the format includes milliseconds or a timezone offset;
                    # no handling for the timezone itself yet (all our data is stored in the same timezone)
                    row[column] = datetime.datetime.strptime(row[column].split('+')[0].split('.')[0], '%Y-%m-%dT%H:%M:%S')

            elif column in date_columns:
                # datetime.date has no strptime, so parse with datetime and take the date
                row[column] = datetime.datetime.strptime(row[column], '%Y-%m-%d').date()
    return result
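For instance, a value from a TYPE_DATETIME column comes back as an ISO-ish string, which the helper above parses like so (the timestamp here is made up):

```python
import datetime

# A made-up example of the string a TYPE_DATETIME column comes back as,
# parsed the same way the helper above does it.
raw = "2016-11-18T09:30:00"
parsed = datetime.datetime.strptime(raw, '%Y-%m-%dT%H:%M:%S')
print(parsed)
```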

execute_query

Lets you…well, execute a query within Python.

result = execute_query("<data source name>", "your SQL query")
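The big win here is combining data from different sources. In Redash, the two dicts below would each be the return value of an execute_query call against a different data source; here they’re hardcoded stand-ins (with the columns key omitted for brevity) so the join logic itself is visible:

```python
# Stand-ins for two execute_query(...) results from different data sources
# ("columns" omitted for brevity; a real result carries it too).
users = {"rows": [{"user_id": 1, "email": "jane@example.com"}]}
events = {"rows": [{"user_id": 1, "event_count": 5}]}

# Join the two result sets on user_id with plain Python dict work.
emails = {row["user_id"]: row["email"] for row in users["rows"]}
joined_rows = [dict(row, email=emails.get(row["user_id"])) for row in events["rows"]]
```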

get_query_result

Grabs the results of a previously run query. I prefer this while debugging a Python script, since it means I’m not hitting the database over and over again.

get_query_result(<query_id>)
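A sketch of the pattern I use while debugging: in Redash the dict below would come from a get_query_result call; here it’s a hardcoded stand-in with the same shape, so the reshaping step runs on its own (the TYPE_* constants are plain strings under the hood, hence "integer" here):

```python
# Stand-in for a cached get_query_result(...) return value.
cached = {
    "columns": [{"name": "user_id", "friendly_name": "user_id", "type": "integer"}],
    "rows": [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}],
}

# Filter rows in Python instead of re-running the SQL against the database.
result = {
    "columns": cached["columns"],
    "rows": [row for row in cached["rows"] if row["user_id"] > 1],
}
```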

add_result_column

If you want to create a new result from scratch (and just append data to it), the base result dictionary is just {"columns": [], "rows": []}.

add_result_column(result, "<new column name>", "<new friendly name for column>", "<column type>")

add_result_row

values should be a dictionary with the keys being column names.

add_result_row(result, values)
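Roughly, those two helpers just append to the base dictionary. Here’s a plain-Python sketch of the equivalent, so it runs outside Redash (the column type is hardcoded as the string behind TYPE_INTEGER):

```python
# The empty base result, as above.
result = {"columns": [], "rows": []}

# Roughly what add_result_column(result, "user_id", "user_id", TYPE_INTEGER) does.
result["columns"].append({"name": "user_id", "friendly_name": "user_id", "type": "integer"})

# Roughly what add_result_row(result, {"user_id": 1}) does.
result["rows"].append({"user_id": 1})
```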

The current result format isn’t very friendly to data analysts. At some point, I’ll probably create a helper function that allows us to work in something closer to data frames (like R), or maybe figure out how re:dash and pandas work together.

Importing the standard library

TYPE_DATE and TYPE_DATETIME still return strings, so you’ll need to import datetime. If you’re doing this within a Python function, you’ll need to import it from within the function because of how RestrictedPython works. Top-of-query imports don’t work; it seems like every function has its own namespace (as if it were its own file).

Does NOT work

import datetime

def hello():
    print(datetime.datetime.now())

Fixed!

def hello():
    import datetime
    print(datetime.datetime.now())