Dec 2019 (Revised on Nov 2020)

Problem Description

With the rise of internet productivity tools such as email, text messaging and social media, companies are required to be nimble and act within a couple of hours.

Fortunately, advances in hardware now allow us to process information in real time, providing the insight needed to make these time-sensitive decisions.

In this portfolio project, we build a simple real-time tracker on AWS and visualize it with Highcharts.

A Simple Use Case

In this study, we compare the performance of two exchange-traded funds, VTI and VXUS.

They track the stock markets of the United States and the rest of the world, respectively.

Performance is defined as the daily return:

\[ \frac{\text{Current Price} - \text{Price 24 Hours Ago}}{\text{Price 24 Hours Ago}}\]

Returns are reported in basis points (1 basis point = 1/100th of a percent, so a return of 0.0001 is 1 basis point).
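As a worked example of the formula, the short sketch below converts two prices into a return in basis points. The function name and the sample prices are illustrative and are not taken from the project.

```python
def daily_return_bps(current_price: float, price_24h_ago: float) -> float:
    """Return the daily return in basis points (1 bp = 0.01% = 0.0001)."""
    if price_24h_ago <= 0:
        raise ValueError("price_24h_ago must be positive")
    fractional_return = (current_price - price_24h_ago) / price_24h_ago
    # Multiply the fraction by 10,000 to express it in basis points.
    return fractional_return * 10_000


# Illustrative prices: a move from 150.00 to 151.50 is a 1% gain, i.e. 100 bps.
print(round(daily_return_bps(151.50, 150.00), 2))
```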

The Real Time Tracker

Building The Data Pipeline

A Schematic Overview

[Figure: schematic overview of the data pipeline. The legend distinguishes the writing path from the reading path.]

Database & Structure

We chose Amazon DynamoDB as the database because its free tier provides sufficient storage, write capacity and read capacity.

Each record in the database has the following attributes:

Ticker: The ID of the index fund (e.g. VTI)
Time: Time of day (e.g. 09:30:30)
Last Price: Last recorded price (e.g. 151.00)
Last Returns: Last calculated return (e.g. 0.00005)
Last TS: Last modified timestamp (e.g. 2019-12-12 09:30:30)
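For illustration, one record in this format could look like the Python dictionary below. The attribute names and values mirror the list above. Treating Ticker as the partition key and Time as the sort key is our assumption, since the key schema is not stated. Numbers are written as Decimal because that is the type boto3 uses for DynamoDB number attributes.

```python
from decimal import Decimal

# Hypothetical example record; the values come from the examples listed above.
record = {
    "Ticker": "VTI",                     # assumed partition key
    "Time": "09:30:30",                  # assumed sort key (time of day)
    "Last Price": Decimal("151.00"),
    "Last Returns": Decimal("0.00005"),  # stored as a fraction, i.e. 0.5 basis points
    "Last TS": "2019-12-12 09:30:30",
}

# Under the assumed key schema, this pair identifies the record.
key = {"Ticker": record["Ticker"], "Time": record["Time"]}
print(key)
```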

Writing the Data

There are 3 main components involved in writing:

  • Stock API: This API provides real-time stock/fund prices. In this project, we use Polygon through the Alpaca API.
  • Database: Where all information is stored.
  • updateTable Script: This Python script extracts raw information from the other 2 components, transforms it and writes it back into the database.
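The transform-and-write step can be sketched roughly as follows. This is not the project's actual updateTable script: the function names are hypothetical, the call to the stock API is left out, and we assume the record stored under the same ticker and time of day holds the price from 24 hours ago. The `table` argument only needs to behave like a boto3 DynamoDB Table resource, which offers `get_item(Key=...)` and `put_item(Item=...)`.

```python
from datetime import datetime
from decimal import Decimal


def build_updated_item(ticker, now, new_price, previous_item):
    """Transform step: combine the new price with the stored one.

    previous_item is the record stored for the same ticker and time of day
    (assumed to hold the price from 24 hours ago), or None if there is none.
    """
    new_price = Decimal(str(new_price))
    if previous_item is None:
        returns = Decimal("0")  # no baseline yet, so no return can be computed
    else:
        old_price = previous_item["Last Price"]
        returns = (new_price - old_price) / old_price
    return {
        "Ticker": ticker,
        "Time": now.strftime("%H:%M:%S"),
        "Last Price": new_price,
        "Last Returns": returns,
        "Last TS": now.strftime("%Y-%m-%d %H:%M:%S"),
    }


def update_table(table, ticker, now, new_price):
    """Read the old record, compute the return, and overwrite the record."""
    key = {"Ticker": ticker, "Time": now.strftime("%H:%M:%S")}
    previous_item = table.get_item(Key=key).get("Item")
    item = build_updated_item(ticker, now, new_price, previous_item)
    table.put_item(Item=item)  # replaces the previous record with the same key
    return item
```

Keeping the transform in its own function means the return calculation can be checked without a database connection, and only `update_table` touches the table.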

The Writing Pipeline

Reading The Data

There are 4 main components in the reading process:

  • Tracker: The tracker polls the API every 2 seconds to retrieve the latest returns.
  • Amazon API Gateway: This API acts as the gateway into the back-end ecosystem. It receives the tracker's requests and returns the requested information.
  • Database: Where all the information is stored.
  • getReturns Script: This Python script queries the latest returns from the database and formats them in a digestible manner.
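A minimal sketch of the formatting step and the handler is shown below. It is an illustration under stated assumptions rather than the actual getReturns script. The names `format_returns` and `make_handler` are hypothetical, returns are assumed to be stored as Decimal fractions, and `fetch_items` stands in for whatever database call supplies the records (with boto3 this might wrap `table.scan()["Items"]`). The response uses the `statusCode` and `body` fields that API Gateway expects from a Lambda proxy integration.

```python
import json
from decimal import Decimal


def format_returns(items):
    """Keep the most recent record per ticker and express returns in basis points."""
    latest = {}
    for item in items:
        ticker = item["Ticker"]
        # "Last TS" is a zero-padded "YYYY-MM-DD HH:MM:SS" string, so plain
        # string comparison orders the records chronologically.
        if ticker not in latest or item["Last TS"] > latest[ticker]["Last TS"]:
            latest[ticker] = item
    return {
        ticker: {
            # Decimal is not JSON serializable, so convert to float here.
            "returns_bps": float(item["Last Returns"] * Decimal(10000)),
            "timestamp": item["Last TS"],
        }
        for ticker, item in latest.items()
    }


def make_handler(fetch_items):
    """Build a Lambda-style handler around a function that returns the stored records."""
    def lambda_handler(event, context):
        body = format_returns(fetch_items())
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(body),
        }
    return lambda_handler
```

Passing the data source in as a function keeps the handler testable with an in-memory list, while the deployed version would supply a function that reads from DynamoDB.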

The Reading Pipeline

Limitations

The Database

Under the current process, each fund's past prices are overwritten by new ones.

This approach was chosen so that DynamoDB storage always stays within the free tier. If cost were not an issue, we would keep the past prices for future use instead of overwriting them.

Alternatively, we could use S3 and EC2, since DynamoDB’s cost may increase significantly with scale.

The 10 Minute Lag

The tracker is not truly real-time, since the data lags by 10 minutes.

The lag keeps usage within the limits of AWS Lambda’s free tier.

By paying for more requests, we could potentially reduce the lag to 1 minute.