
The Python Stack Behind Our Trading Research

Engineering · November 1, 2024

Building machine learning infrastructure for quantitative trading research means picking tools that work well together. Here's what we use and why.

Data Manipulation: Pandas & NumPy

Everything starts with data. Pandas handles our OHLCV (open, high, low, close, volume) market data, making it easy to resample timeframes, calculate rolling statistics, and align multiple data sources.

import pandas as pd
 
# Resample minute data to daily
daily = df.resample('1D').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
})
 
# Calculate 20-day rolling volatility
df['volatility'] = df['close'].pct_change().rolling(20).std()

NumPy handles the heavy numerical work underneath—array operations, linear algebra, and fast statistical calculations.
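As a small illustration of the kind of vectorized work NumPy does underneath, here's a minimal sketch (with made-up prices) of computing log returns and an annualized volatility directly on arrays:

```python
import numpy as np

# Illustrative closing prices, not real market data
prices = np.array([100.0, 101.5, 99.8, 102.2, 103.0])

# Log returns via vectorized diff of log prices
log_returns = np.diff(np.log(prices))

# Annualize daily volatility with the usual sqrt(252) convention
annualized_vol = log_returns.std(ddof=1) * np.sqrt(252)
```

No Python loops anywhere; the same pattern scales to millions of rows.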

Technical Indicators: ta

The ta library provides ready-made technical indicators without reinventing the wheel. RSI, MACD, Bollinger Bands, ATR—all the classics are there.

from ta.momentum import RSIIndicator
from ta.volatility import BollingerBands
 
# Add RSI
df['rsi'] = RSIIndicator(df['close'], window=14).rsi()
 
# Add Bollinger Bands
bb = BollingerBands(df['close'], window=20, window_dev=2)
df['bb_upper'] = bb.bollinger_hband()
df['bb_lower'] = bb.bollinger_lband()

These indicators become features for ML models or signals for rule-based strategies.
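To make that concrete, here's a minimal sketch of turning indicator-style columns into a feature matrix and a next-bar label. It uses simple rolling features on synthetic prices as stand-ins for the ta indicators above; the column names (`mom_5`, `vol_20`, `target`) are illustrative, not from our actual pipeline:

```python
import numpy as np
import pandas as pd

# Synthetic close prices standing in for real OHLCV data
rng = np.random.default_rng(42)
df = pd.DataFrame({"close": 100 + rng.normal(0, 1, 120).cumsum()})

df["mom_5"] = df["close"].pct_change(5)                     # 5-bar momentum
df["vol_20"] = df["close"].pct_change().rolling(20).std()   # 20-bar volatility
df["target"] = (df["close"].shift(-1) > df["close"]).astype(int)  # next-bar direction

# Drop the last row (no next bar) and warm-up rows where rolling windows are undefined
dataset = df.iloc[:-1].dropna()
X = dataset[["mom_5", "vol_20"]]
y = dataset["target"]
```

`X` and `y` then feed straight into the ML step below.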

Machine Learning: scikit-learn & TensorFlow

For classical ML, scikit-learn provides a consistent API across algorithms. Train a model, make predictions, evaluate performance—the interface stays the same whether you're using a simple classifier or an ensemble method.

from sklearn.preprocessing import RobustScaler
from sklearn.model_selection import train_test_split
 
# Scale features (RobustScaler handles outliers well)
scaler = RobustScaler()
X_scaled = scaler.fit_transform(features)
 
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, labels, test_size=0.2, shuffle=False
)

For sequence modeling and deep learning, we use TensorFlow/Keras. It's particularly useful when temporal patterns matter—feeding the model a window of recent data rather than just the current bar.
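One way to build those windows, sketched here with NumPy's `sliding_window_view` on a toy series (the series and window size are illustrative; the resulting arrays would feed a Keras model):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy series standing in for a real feature column
series = np.arange(10.0)
window = 3

# Each row holds the previous `window` values; the label is the next bar
X = sliding_window_view(series, window)[:-1]
y = series[window:]

print(X[0], "->", y[0])  # [0. 1. 2.] -> 3.0
```

The model then sees a short history per sample instead of a single bar.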

Market Data & Execution: Alpaca

Alpaca provides both historical data and live trading through a clean Python API. We use it for stocks and crypto.

from datetime import datetime

from alpaca.data import StockHistoricalDataClient
from alpaca.data.requests import StockBarsRequest
from alpaca.data.timeframe import TimeFrame
 
client = StockHistoricalDataClient(api_key, secret_key)
 
request = StockBarsRequest(
    symbol_or_symbols="AAPL",
    timeframe=TimeFrame.Day,
    start=datetime(2023, 1, 1)
)
bars = client.get_stock_bars(request).df

The same library handles order submission when strategies go live—market orders, limit orders, position management.

Visualization: Plotly

Plotly creates interactive charts that make it easy to explore strategy performance. Zoom into specific periods, hover for exact values, toggle series on and off.

import plotly.graph_objects as go
 
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=df.index,
    y=df['equity'],
    name='Equity Curve'
))
fig.update_layout(title='Strategy Performance')
fig.show()

We use it for equity curves, drawdown analysis, and comparing multiple strategies.
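Drawdown analysis itself is a couple of lines of Pandas before anything gets plotted. A minimal sketch on a toy equity curve (the values are illustrative):

```python
import pandas as pd

# Toy equity curve standing in for real backtest output
equity = pd.Series([100.0, 105.0, 103.0, 110.0, 104.0, 112.0])

peak = equity.cummax()        # running high-water mark
drawdown = equity / peak - 1  # fraction below the peak
max_drawdown = drawdown.min()

print(f"max drawdown: {max_drawdown:.2%}")  # -5.45%
```

The resulting `drawdown` series drops straight into the same `go.Scatter` pattern shown above.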

Deployment: the0

Once a strategy is ready, the0 handles deployment. It's an open-source bot scheduler I built specifically for running trading strategies: define your bot configuration, and the0 takes care of scheduling, monitoring, and keeping things running. It currently supports both Docker Compose and Kubernetes deployments. The initial beta is out as v0.1.0, and a more stripped-down v1.0.0 release, focused on ease of use and quick deployments, is coming soon.

Putting It Together

The tools fit together naturally: Pandas loads and cleans data, ta adds technical features, scikit-learn or TensorFlow trains models, Alpaca supplies market data and executes trades, Plotly visualizes results, and the0 keeps everything running in production.

Each tool does one thing well, and they compose cleanly. That modularity makes it easy to swap components—try a different ML algorithm, switch brokers, or add new indicators—without rewriting everything.