2023-07-26 – Online talks and posters
Historical data is invaluable in the financial industry, particularly for back-testing trading strategies. However, persisting a continuous real-time feed over long horizons quickly outgrows a single machine, even for a single ticker symbol. Building a distributed, scalable system to store real-time financial data therefore demands both time and a significant financial commitment; cloud computing services lower this barrier. This talk describes a real-time, cloud-based financial data streaming and storage system implemented in Julia.
We will introduce our process for selecting a cloud service provider and a data source, compare two backend implementations, and conclude with an evaluation of the system.
Reasons for choosing AWS
Initially, we deployed our program on the Amazon Web Services (AWS) cloud because it offers scalable data storage and can handle concurrent requests from many customers and agents. Among the leading cloud service providers (Microsoft Azure, Google Cloud), AWS currently leads the market, supplying roughly one-third of the world's cloud infrastructure services.
Data Source
We chose Polygon as our data source, specifically its WebSocket streaming feed. Polygon is a leading financial market data platform that supplies market data for stocks, options, futures, and cryptocurrencies. Its aggregate data is available at resolutions as fine as one second.
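As an illustration, a minimal Julia sketch of subscribing to Polygon's per-second aggregates over WebSocket might look as follows (using HTTP.jl and JSON3.jl). This is not our production code: the ticker is a placeholder, and the channel name and event fields follow Polygon's public documentation, so treat the details as assumptions.

```julia
using HTTP, JSON3

# Placeholder: read the Polygon API key from the environment.
const API_KEY = get(ENV, "POLYGON_API_KEY", "demo-key")

HTTP.WebSockets.open("wss://socket.polygon.io/stocks") do ws
    # Authenticate, then subscribe to second-resolution aggregates ("A.<ticker>").
    HTTP.WebSockets.send(ws, JSON3.write((action = "auth", params = API_KEY)))
    HTTP.WebSockets.send(ws, JSON3.write((action = "subscribe", params = "A.AAPL")))

    for msg in ws                       # each message is a JSON array of events
        for event in JSON3.read(msg)
            # Aggregate events ("A") carry open/high/low/close/volume fields;
            # "c" is the bar close and "e" the bar-end timestamp (epoch ms).
            event.ev == "A" && println("AAPL close: ", event.c, " @ ", event.e)
        end
    end
end
```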
Key implementation
The backend, the main subject of this talk, was implemented in Julia using existing packages. We explored two approaches. The first uses various AWS services via AWS.jl, LambdaMaker.jl, and similar packages, which provide the microservices our system is built from. The second sets up EC2 (Amazon Elastic Compute Cloud) instances and deploys the program to them. The first approach was the easiest to develop but depends heavily on AWS-managed solutions. For example, we used DynamoDB (an AWS NoSQL database service) to store real-time market data for selected tickers; if Amazon were to shut down DynamoDB or migrate it to another service, we would have to change our code accordingly. The second approach, by contrast, is not tied to particular Amazon services: it requires users to configure their own virtual machines, but it avoids vendor lock-in.
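For the first approach, persisting one aggregate bar to DynamoDB through AWS.jl looks roughly like the sketch below. The table name realtime_ticks and its key schema are hypothetical, the argument order of the generated put_item binding is our reading of AWS.jl's generated DynamoDB API, and AWS credentials are assumed to be configured in the environment.

```julia
using AWS: @service
@service Dynamodb

# Hedged sketch: persist one aggregate bar. The table "realtime_ticks"
# (partition key "ticker", sort key "ts") is hypothetical.
function store_bar(ticker::String, ts_ms::Int, close::Float64, volume::Int)
    item = Dict(
        "ticker" => Dict("S" => ticker),          # partition key
        "ts"     => Dict("N" => string(ts_ms)),   # sort key (epoch ms)
        "close"  => Dict("N" => string(close)),
        "volume" => Dict("N" => string(volume)),
    )
    # AWS.jl exposes DynamoDB's PutItem as a generated binding;
    # here we assume the form Dynamodb.put_item(item, table_name).
    Dynamodb.put_item(item, "realtime_ticks")
end

store_bar("AAPL", 1_690_000_000_000, 193.22, 4_210)
```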
The frontend displays real-time data from the data source and was built as a TypeScript React app.
The following diagram describes how the frontend client interacts with the backend AWS services.
Evaluations
We considered several key factors when evaluating the real-time data storage system: latency, scalability, accuracy, reliability, and throughput. Of these, scalability and throughput may be influenced by our choice of cloud service. We have a preliminary implementation of the first backend using AWS solutions, with the frontend deployed as a React application on AWS Amplify. The frontend has no hard limit on concurrency, but the free tier constrains total data transfer, request duration, and the total number of requests; once the frontend features are built, we will be able to determine which constraint binds first. Our backend's DynamoDB table does limit concurrency: the free tier provides 25 read capacity units per second. If each historical-data request costs one database read, the free tier therefore supports at most about 25 concurrent requests per second.
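The arithmetic behind that ceiling is a one-liner; a sketch, assuming exactly one read capacity unit per historical-data request:

```julia
free_tier_rcu     = 25   # DynamoDB free-tier read capacity units per second
reads_per_request = 1    # assumption: one database read per historical-data request

max_requests_per_second = free_tier_rcu ÷ reads_per_request
println("Free tier supports at most $max_requests_per_second concurrent requests/second")
```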
Latency can be measured in several ways, including end-to-end latency and processing latency. End-to-end latency measures the time from when a request is made to when a response is received; here, it covers data processing from start to finish for a given ticker. With the AWS-based backend, the average data processing time is now 100-200 ms, despite occasional spikes of 2,000-4,000 ms. Processing latency, on the other hand, measures the time the system takes to process a specific input or request; here, it covers the two backend processing steps: storing a ticker's data in the database and notifying the client in real time. Storing a ticker's data averages 30-100 ms, while notifying the client averages 70-120 ms. Once the second implementation is complete, we will also be able to compare the "real-time" performance of the two backend implementations.
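A sketch of how such per-step timings can be collected in Julia; store_bar and notify_clients below are hypothetical stand-ins for the real DynamoDB write and client-notification steps, with sleeps simulating their cost:

```julia
using Statistics

# Hypothetical stand-ins for the real backend steps.
store_bar(bar)      = sleep(0.03)   # simulated database write (~30 ms)
notify_clients(bar) = sleep(0.07)   # simulated client push (~70 ms)

store_ms, notify_ms = Float64[], Float64[]

function process_bar(bar)
    t0 = time_ns(); store_bar(bar)
    push!(store_ms, (time_ns() - t0) / 1e6)      # ns -> ms

    t1 = time_ns(); notify_clients(bar)
    push!(notify_ms, (time_ns() - t1) / 1e6)
end

foreach(process_bar, 1:10)
println("store: ",  round(mean(store_ms);  digits = 1), " ms avg, ",
        "notify: ", round(mean(notify_ms); digits = 1), " ms avg")
```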
To ensure the accuracy and reliability of our system, we must consistently serve correct, current data, even in the face of client changes or malicious attacks. The backend should handle these scenarios gracefully without degrading.