Gearing up for returns forecasting with VAR
After the tweet mining described in my first article, gathering cryptocurrency returns wasn’t difficult at all. I decided to use pycoingecko, a Python wrapper around the CoinGecko API.
The two key things to note are:
- You need the id of the cryptocurrency in order to download its market data
- The granularity of the data is determined automatically by the number of days you request
Getting the id of a cryptocurrency
Go to the CoinGecko API page and execute the “/coins/list” endpoint.
Get the ids of the coins required. In my case, I have the list below:
gecko_list = [
    "bitcoin",
    "ethereum",
    "ripple",  # xrp
    "tether",
    "bitcoin-cash",
    "cardano",
    "bitcoin-cash-sv",
    "litecoin",
    "chainlink",
    "binancecoin",
    "eos",
    "tron",
]
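The id lookup can also be done in code rather than by browsing the API page. The following is a minimal sketch, not the method used in the notebook: `ids_for_symbols` is a hypothetical helper, and `sample_coins` is a hand-written stand-in that mimics the id/symbol/name records returned by /coins/list. With pycoingecko, the real list would come from `get_coins_list()`.

```python
from collections import defaultdict

# Hand-written stand-in for the /coins/list response (records with id, symbol, name).
# With pycoingecko, the real list would come from CoinGeckoAPI().get_coins_list().
sample_coins = [
    {"id": "bitcoin", "symbol": "btc", "name": "Bitcoin"},
    {"id": "ripple", "symbol": "xrp", "name": "XRP"},
    {"id": "binancecoin", "symbol": "bnb", "name": "BNB"},
    {"id": "bitcoin-cash", "symbol": "bch", "name": "Bitcoin Cash"},
]


def ids_for_symbols(coins, symbols):
    """Map each ticker symbol to every id that uses it (hypothetical helper)."""
    by_symbol = defaultdict(list)
    for coin in coins:
        by_symbol[coin["symbol"].lower()].append(coin["id"])
    # Several coins can share a symbol, so return all candidates per symbol.
    return {s: by_symbol.get(s.lower(), []) for s in symbols}


matches = ids_for_symbols(sample_coins, ["XRP", "bnb", "doge"])
print(matches)
```

Returning a list per symbol keeps ambiguous tickers visible, so you can pick the right id by hand before adding it to gecko_list.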
Downloading historical market data
I used get_coin_market_chart_by_id, the wrapper around /coins/{id}/market_chart, to get my historical market data. The granularity of the data returned depends on the number of days requested:
- minutely data for a duration within 1 day
- hourly data for a duration between 1 day and 90 days
- daily data for a duration above 90 days
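The rule above can be written down as a small function, which is handy as a sanity check before requesting data. This is only a sketch of the three cases as listed: `expected_granularity` is a hypothetical helper, and how the exact boundaries (1 and 90 days) are treated is my assumption, not something confirmed against the API.

```python
def expected_granularity(days):
    """Return the granularity listed above for a look-back window in days.

    Boundary handling (1 day counts as minutely, 90 days as hourly) is an assumption.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    if days <= 1:
        return "minutely"
    if days <= 90:
        return "hourly"
    return "daily"


for d in (1, 30, 300):
    print(d, expected_granularity(d))
```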
Since hourly data is not available for durations beyond 90 days, I have to stick to daily market data. The code snippet below returns 300 days’ worth of historical market data against USD. I only store the prices returned for each timestamp.
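import pandas as pd
from pycoingecko import CoinGeckoAPI

cg = CoinGeckoAPI()
timePeriod = 300  # more than 90 days, so daily data is returned
data = {}
for coin in gecko_list:
    try:
        # "prices" is a list of [timestamp, price] pairs
        nested_lists = cg.get_coin_market_chart_by_id(
            id=coin, vs_currency="usd", days=timePeriod
        )["prices"]
        data[coin] = {}
        data[coin]["timestamps"], data[coin]["values"] = zip(*nested_lists)
    except Exception as e:
        print(e)
        print("coin: " + coin)

# One single-column DataFrame per coin, indexed by timestamp
frame_list = [
    pd.DataFrame(data[coin]["values"], index=data[coin]["timestamps"], columns=[coin])
    for coin in gecko_list
    if coin in data
]
# Assumed step (not shown in the original snippet): combine the per-coin
# frames side by side into the df_cyptocurrency frame used below.
df_cyptocurrency = pd.concat(frame_list, axis=1)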
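The zip(*nested_lists) call in the loop is what splits the response into separate timestamp and price sequences. Here is a minimal illustration on made-up numbers, assuming the same list-of-[timestamp, price]-pairs shape:

```python
# Made-up sample in the [timestamp_ms, price] shape of the "prices" field.
nested_lists = [
    [1609459200000, 29000.0],
    [1609545600000, 29400.5],
    [1609632000000, 32100.25],
]

# zip(*...) transposes the list of pairs into two tuples.
timestamps, values = zip(*nested_lists)
print(timestamps)
print(values)
```

If the list were empty, the two-name unpacking would raise a ValueError, which the try/except in the loop would catch and report.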
Let’s convert the timestamps into a user-friendly format:
df_cyptocurrency["datetime"] = pd.to_datetime(df_cyptocurrency.index, unit="ms")
df_cyptocurrency["date"] = df_cyptocurrency["datetime"].dt.date
df_cyptocurrency["hour"] = df_cyptocurrency["datetime"].dt.hour
Let’s reshape the data into a more generic long format, with one row per timestamp and currency:
df_cyptocurrency = df_cyptocurrency.melt(
    id_vars=["datetime", "date", "hour"], var_name="currency_name", ignore_index=True
)
df_cyptocurrency.head(5)
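To see what the melt step does, here is a small self-contained sketch on made-up prices for two coins. The frame name `wide` and the numbers are illustrative only. Since no value_name is passed, the prices end up in pandas’ default "value" column.

```python
import pandas as pd

# Made-up prices for two coins, indexed by millisecond timestamps.
wide = pd.DataFrame(
    {"bitcoin": [29000.0, 29400.5], "ethereum": [730.0, 775.2]},
    index=[1609459200000, 1609545600000],
)
wide["datetime"] = pd.to_datetime(wide.index, unit="ms")

# One row per (datetime, coin) pair; prices go to the default "value" column.
long_df = wide.melt(id_vars=["datetime"], var_name="currency_name", ignore_index=True)
print(long_df)
```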
Now, that was not too bad, right? Gathering cryptocurrency returns was far simpler than mining tweets. The only confusing part of the data collection was the granularity of the data, because I had overlooked the API method definitions. My notebook is available in the Atoti notebook gallery for your reference.
Now, let’s take a breather before I start performing my time-series analysis.