Posted on February 18, 2018

CryptoCurrency Data Analysis & Primitive Trading

Dataset: https://www.kaggle.com/natehenderson/top-100-cryptocurrency-historical-data

This post is purely for fun and demonstration. Nobody should take investment advice from it or purchase cryptocurrency on the basis of primitive trading strategies and analysis.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import glob
import datetime as dt

import seaborn as sns

import warnings
warnings.filterwarnings('ignore')
path = r'D:\Downloads\Coding\Datasets\CryptoCurrencyHistoricalData'  # use your path
allFiles = glob.glob(path + "/*.csv")
list_ = []
for file_ in allFiles:
    # one semicolon-delimited CSV per currency
    df = pd.read_csv(file_, delimiter=";")
    list_.append(df)
frame = pd.concat(list_)

close_df = pd.merge(list_[0].iloc[:,[0,4]], list_[1].iloc[:,[0,4]], how = "outer", on="Date")

for i in range(2,len(list_)):
    close_df = pd.merge(close_df, list_[i].iloc[:,[0,4]], how = "outer", on="Date")

import os

# column names: the coin name is the CSV file name without its extension
names = [os.path.splitext(os.path.basename(f))[0] for f in allFiles]
names.insert(0, "Date")
close_df.columns = names
close_df = close_df.fillna(0)  

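# overwrite the Date value on these rows; .loc avoids chained assignment
close_df.loc[96, "Date"] = '18/08/2017'
close_df.loc[672, "Date"] = '18/08/2016'
close_df.loc[827, "Date"] = '18/08/2015'
close_df.loc[1192, "Date"] = '18/08/2014'
close_df.loc[1557, "Date"] = '18/08/2013'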

close_df.set_index("Date", inplace=True)
close_df.index = pd.to_datetime(close_df.index, format="%d/%m/%Y")
close_df.sort_index(ascending=True, inplace=True)
mcap_df = pd.merge(list_[0].iloc[:,[0,6]], list_[1].iloc[:,[0,6]], how = "outer", on="Date")

for i in range(2,len(list_)):
    mcap_df = pd.merge(mcap_df, list_[i].iloc[:,[0,6]], how = "outer", on="Date")

mcap_df.columns = names

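# overwrite the Date value on these rows; .loc avoids chained assignment
mcap_df.loc[96, "Date"] = '18/08/2017'
mcap_df.loc[672, "Date"] = '18/08/2016'
mcap_df.loc[827, "Date"] = '18/08/2015'
mcap_df.loc[1192, "Date"] = '18/08/2014'
mcap_df.loc[1557, "Date"] = '18/08/2013'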

mcap_df.set_index("Date", inplace=True)
# dates are day-first, so pass the format explicitly (as for close_df)
mcap_df.index = pd.to_datetime(mcap_df.index, format="%d/%m/%Y")

Not unexpectedly, the price profiles of most cryptocurrencies are quite similar: virtually nothing for a while, then explosive growth.

close_df.loc[:,np.max(close_df, axis=0) < 200].plot(figsize=(10,10))
plt.legend(loc="center right", bbox_to_anchor=[1.5, 0.5],
           ncol=2,title="CryptoCurrency")
plt.ylabel("Currency Price (/USD)")
plt.show()


close_df.loc[:,np.max(close_df, axis=0) > 100].plot(figsize=(10,10))
plt.legend(loc="center right", bbox_to_anchor=[1.2, 0.5],
           ncol=1,title="CryptoCurrency")
plt.ylabel("Currency Price (/USD)")
plt.show()


# bitcoin next to every other coin shifted by one day; axis=1 joins them as columns
temp_df = pd.concat([close_df['bitcoin'], close_df.loc[:, close_df.columns != 'bitcoin'].shift(1)], axis=1)

Also not unexpectedly, we observe very strong correlations across a large cross-section of the cryptocurrency market.

corr = temp_df.corr()

# np.bool was removed from NumPy; the built-in bool does the same job
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True

fig, ax = plt.subplots(figsize=(20,15))
cmap = sns.diverging_palette(220, 10, as_cmap=True)

sns.heatmap(corr, mask=mask, cmap=cmap, ax=ax)
plt.show()


Basic Trading Strategy

Here we implement a pretty basic trading strategy: we look at the difference between BTC price averages over two time horizons and either buy or sell depending on its direction.

The basic premise is:

  • Calculate the 10-day and 40-day moving averages of the BTC price.
  • Define Price_Diff = Price(10d) - Price(40d) and choose a target delta. Whenever Price_Diff > delta we buy, and when the gap closes again we sell.

btc = pd.DataFrame(close_df['bitcoin'], index=close_df.index)
btc.columns = ["bitcoin"]
btc['10d'] = np.round(btc["bitcoin"].rolling(window=10).mean(),2)
btc['40d'] = np.round(btc["bitcoin"].rolling(window=40).mean(),2)
btc[['bitcoin','10d','40d']].plot(grid=True,figsize=(8,5))
plt.show()


We compare our strategy against a simple buy-and-hold approach. Given bitcoin's volatility and astronomical growth, we'd expect buy and hold to be incredibly difficult to beat, especially once we take into account transaction costs (and potential FX costs if purchasing in a non-USD currency).

Given the huge shift in BTC prices, we would also prefer a dynamically adjusting delta, since a constant dollar threshold is likely to give poor results over even a 6-12 month period.
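One possible way to make the delta adjust with the price level is to express it as a fraction of the slower moving average instead of a fixed dollar amount. The following is only a sketch on synthetic prices: the helper name `relative_stance` and the 2% threshold are assumptions for illustration and are not part of the original analysis.

```python
import numpy as np
import pandas as pd

def relative_stance(prices, fast=10, slow=40, pct=0.02):
    """Hypothetical helper: +1/-1/0 stance from a relative moving-average gap.

    The gap between the fast and slow averages is divided by the slow
    average, so the threshold scales with the price level.
    """
    fast_ma = prices.rolling(window=fast).mean()
    slow_ma = prices.rolling(window=slow).mean()
    rel_gap = (fast_ma - slow_ma) / slow_ma
    stance = pd.Series(0, index=prices.index)
    stance[rel_gap > pct] = 1     # fast average well above slow: long
    stance[rel_gap < -pct] = -1   # fast average well below slow: short
    return stance

# Synthetic, steadily growing price series (illustrative only).
dates = pd.date_range("2017-01-01", periods=120, freq="D")
prices = pd.Series(1000 * 1.01 ** np.arange(120), index=dates)
stance = relative_stance(prices)
print(stance.value_counts())
```

Because the gap is relative, multiplying every price by a constant leaves the stance unchanged, which is the property a fixed dollar delta lacks.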

# pd.datetime was removed from pandas; use the datetime module imported as dt
btc = btc[btc.index > dt.datetime(2017,1,1)].copy()
btc['10-40'] = btc['10d'] - btc['40d']
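X = 50
btc['Stance'] = np.where(btc['10-40'] > X, 1, 0)
# note: the same threshold X is used for both legs, so the stance is short (-1) whenever the gap is below X
btc['Stance'] = np.where(btc['10-40'] < X, -1, btc['Stance'])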
btc['Stance'].value_counts()

btc['Stance'].plot(lw=1.5,ylim=[-1.1,1.1])
plt.show()

# daily log returns
btc['Market Returns'] = np.log(btc['bitcoin'] / btc['bitcoin'].shift(1))
btc['Strategy'] = btc['Market Returns'] * btc['Stance'].shift(1)

btc[['Market Returns','Strategy']].cumsum().plot(grid=True,figsize=(8,5))

plt.show()



Optimisation

As seen above, our strategy substantially underperforms simple buy and hold. Now, for fun, we can generalise the approach and do some basic optimisation and parameter mining.

def annualised_sharpe(returns, N=252):
    return np.sqrt(N) * (returns[~np.isnan(returns)].mean() / returns[~np.isnan(returns)].std())

def bitcoin_sim(btc, shift_time, price_tol, forecast_length):
    btc['diff'] = btc["bitcoin"] - btc["bitcoin"].shift(shift_time)

    btc['Stance'] = np.where(btc['diff'] < -3*price_tol, -1, 0)
    btc['Stance'] = np.where(btc['diff'] > price_tol, 1, btc['Stance'])

    btc['Market Returns'] = np.log(btc['bitcoin'] / btc['bitcoin'].shift(1))
    # apply the stance with a lag of forecast_length days
    btc['Strategy'] = btc['Market Returns'] * btc['Stance'].shift(forecast_length)

    # final cumulative log return and annualised Sharpe ratio of the strategy
    return (btc['Strategy'].cumsum().tail(1), annualised_sharpe(btc['Strategy']))

bitcoin_sim(btc, 1, 1, 1)

(Date
 2017-11-22    0.220979
 Name: Strategy, dtype: float64, 0.24081979662549807)

shift_time = np.linspace(1, 200, 10, dtype=int)
price_tol = np.linspace(-200, 300, 10, dtype=int)
#forecast_length = np.linspace(1, 30, 5, dtype=int)

results_pnl = np.zeros((len(shift_time), len(price_tol)))
results_sharpe = np.zeros((len(shift_time), len(price_tol)))

market_returns = btc['Market Returns'].cumsum().tail(1)
market_sharpe = annualised_sharpe(btc['Market Returns'])


for i, shift_t in enumerate(shift_time):
    for j, price_t in enumerate(price_tol):
        pandl, risk = bitcoin_sim(btc, shift_t, price_t, 1)
        # both values are one-element Series; take the scalars before storing
        results_pnl[i, j] = pandl.iloc[0] - market_returns.iloc[0]
        results_sharpe[i, j] = risk - market_sharpe

results_pnl.max()

0.16149115817834314

# density of excess P&L over all parameter combinations (fill replaces the deprecated shade)
sns.kdeplot(results_pnl.flatten(), fill=True)
plt.show()


# density of excess Sharpe ratio over all parameter combinations
sns.kdeplot(results_sharpe.flatten(), fill=True)
plt.show()


We can take the results from the search above and look at what the returns profile looks like now. Our strategy does quite well early on but ends up substantially lagging the market. Nonetheless, we have a decent framework for investigating very primitive trading strategies.
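To pull the best-performing combination out of a results grid shaped like `results_pnl`, `np.unravel_index` can turn the flat argmax back into row and column positions. The sketch below uses a small made-up grid with hypothetical names (`shift_grid`, `tol_grid`, `excess_pnl`), so the numbers are placeholders and not values from the run above.

```python
import numpy as np

# Hypothetical parameter grids and results (placeholder values for illustration).
shift_grid = np.array([1, 23, 45])
tol_grid = np.array([-200, 50, 300])
excess_pnl = np.array([
    [-0.40, -0.10, -0.25],
    [-0.30,  0.16, -0.05],
    [-0.60, -0.20, -0.35],
])

# argmax works on the flattened array; unravel_index maps it back to (row, col).
best_i, best_j = np.unravel_index(np.argmax(excess_pnl), excess_pnl.shape)
best_shift = shift_grid[best_i]
best_tol = tol_grid[best_j]
print(f"best shift_time={best_shift}, price_tol={best_tol}, "
      f"excess P&L={excess_pnl[best_i, best_j]:.2f}")
```

Keep in mind that the maximum of a grid evaluated on the same data it was fitted to tends to overstate how well those parameters would do on new data.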

price_diff_time = 5
price_tol = 50
forecast_horizon = 2

btc['diff'] = btc["bitcoin"] - btc["bitcoin"].shift(price_diff_time)

btc['Stance'] = np.where(btc['diff'] < -3*price_tol, -1, 0)
btc['Stance'] = np.where(btc['diff'] > price_tol, 1, btc['Stance'])
btc['Stance'].value_counts()

btc['Stance'].plot(lw=1.5,ylim=[-1.1,1.1])
plt.show()

btc['Market Returns'] = np.log(btc['bitcoin'] / btc['bitcoin'].shift(1))
btc['Strategy'] = btc['Market Returns'] * btc['Stance'].shift(forecast_horizon)

btc[['Market Returns','Strategy']].cumsum().plot(grid=True,figsize=(8,5))

plt.show()


A key feature of any trading strategy is its turnover and the flow-on effect on transaction costs. As expected, because we're using a small delta we are executing A LOT of buy/sell orders, so once costs are included we would expect our underperformance to be even worse.
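A rough way to quantify this is to count how often the position changes and subtract a cost for each unit traded. The sketch below assumes a flat proportional cost (`cost_per_trade`, a made-up 0.25%), subtracts it directly from the log returns as an approximation, and runs on a tiny hand-written series instead of the real data. The helper name `apply_costs` is hypothetical.

```python
import numpy as np
import pandas as pd

def apply_costs(market_returns, stance, cost_per_trade=0.0025):
    """Hypothetical helper: strategy log returns net of a flat cost per unit traded."""
    # yesterday's stance earns today's return
    position = stance.shift(1).fillna(0)
    gross = market_returns * position
    # units traded each day: absolute change in position (first day: opening position)
    traded = position.diff().abs().fillna(position.abs())
    net = gross - cost_per_trade * traded
    return net, traded

# Tiny illustrative inputs (not real BTC data).
returns = pd.Series([0.01, -0.02, 0.03, 0.01, -0.01])
stance = pd.Series([1, 1, -1, -1, 0])
net, traded = apply_costs(returns, stance)
turnover = traded.sum()
print("units traded:", turnover)
print("net cumulative log return:", net.sum())
```

Flipping from long to short counts as two units here, which is why a strategy that reverses often pays disproportionately more than one that only moves in and out of the market.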

So in conclusion, most people would likely have been better off just buying and holding BTC rather than trying to trade on any form of technical analysis.

plt.plot(btc.index, btc['bitcoin'], label="Close Price")
# mark long (green, up) and short (red, down) days; .loc replaces the removed .ix indexer
long_days = btc.loc[btc.Stance == 1, 'bitcoin']
short_days = btc.loc[btc.Stance == -1, 'bitcoin']
plt.plot(long_days.index, long_days, '^', markersize=10, color='g')
plt.plot(short_days.index, short_days, 'v', markersize=10, color='r')
plt.plot(btc.index, btc["diff"], label="pricediff")
plt.legend()
plt.show()


Extra data to explore

from ggplot import *
df = pd.read_csv(r"D:\Downloads\crypto-markets.csv")
df.date = pd.to_datetime(df.date)
df.head()

      slug  symbol     name        date  ranknow    open    high     low   close  volume      market  close_ratio  spread
0  bitcoin     BTC  Bitcoin  2013-04-28        1  135.30  135.98  132.10  134.21       0  1500520000       0.5438    3.88
1  bitcoin     BTC  Bitcoin  2013-04-29        1  134.44  147.49  134.00  144.54       0  1491160000       0.7813   13.49
2  bitcoin     BTC  Bitcoin  2013-04-30        1  144.00  146.93  134.05  139.00       0  1597780000       0.3843   12.88
3  bitcoin     BTC  Bitcoin  2013-05-01        1  139.00  139.89  107.72  116.99       0  1542820000       0.2882   32.17
4  bitcoin     BTC  Bitcoin  2013-05-02        1  116.38  125.60   92.28  105.21       0  1292190000       0.3881   33.32

ggplot(aes(x='date', y='close', color='name'), data=df[df.ranknow <= 10]) + geom_line() 

  • Shifted correlation, i.e. what happens to the prices of other currencies in the days after bitcoin has a large increase.
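A possible starting point for the shifted-correlation idea is to correlate bitcoin's daily return with every other coin's return a few days later. The function name `lagged_corr`, the column names, and the synthetic data below are all assumptions for illustration; nothing here has been run on the real dataset.

```python
import numpy as np
import pandas as pd

def lagged_corr(returns, leader="bitcoin", lag=1):
    """Correlation of the leader's return on day t with every other column on day t+lag."""
    others = returns.drop(columns=leader)
    # shift(-lag) pulls each follower's future return back onto the leader's date
    return others.shift(-lag).corrwith(returns[leader])

# Synthetic returns: 'follower' copies bitcoin with a one-day delay, 'noise' is unrelated.
rng = np.random.default_rng(0)
btc_ret = pd.Series(rng.normal(0, 0.03, 500))
returns = pd.DataFrame({
    "bitcoin": btc_ret,
    "follower": btc_ret.shift(1) + rng.normal(0, 0.005, 500),
    "noise": rng.normal(0, 0.03, 500),
})
result = lagged_corr(returns, lag=1)
print(result)
```

In this toy setup the delayed copy shows a correlation close to 1 at a lag of one day, while the unrelated series stays near 0. Restricting the rows to days where bitcoin's return exceeds some threshold would be one way to focus on large increases.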