The Block recently looked at the relationship between white paper length and dollar-amount raised via ICO. Here, contributor Michael Rosenberg and The Block’s own Mike McCaffrey begin to explore the relationship between the two variables more formally.
Within the cryptocurrency markets, there has been a recent uptick in white paper length. Between Q1 2016 and Q4 2018, the average word count increased from around 3,000 words per paper to 9,000 words. Many ICO projects slated to launch in 2019 look to be continuing this trend.
We are interested in seeing if this increase in length informs the amount of money raised by projects that held an ICO. In particular, does white paper length predict a higher dollar amount raised by close date?
Our intuition suggests that white paper length might, on one hand, indicate project complexity, while on the other hand, it may simply reflect additional graphics, stylistic differences, and general verbosity.
To analyze this question, we collected dollar-amount raised at close date per cryptocurrency via Coindesk’s ICO Tracker. We then manually looked up each cryptocurrency’s white paper and identified the page count on those papers. Due to the time-intensive nature of that manual process, we decided to start by analyzing ICOs between January 2018 and July 2018. We will discuss the implications of this data subsetting in our next steps.
Within our dataset, there are 439 ICOs between January and July of this year. This dataset size is relatively small, which suggests that we may not be powered to see statistically significant results with a large feature set. We may be able to do a more in-depth analysis when we consider earlier years in our future work.
For our analysis, we will be predicting the dollar-amount raised in ICO (in millions) per cryptocurrency using page count. Let’s take a look at our variables of interest.
Figure 1: Distribution of Amount Raised in ICO ($M). The regular amount raised is on the top, while the log-amount raised is on the bottom.
We see that the raw amount raised is very right-skewed (top). This is not uncommon for financial data; there are many projects whose ICOs have raised relatively little by their close date and a handful of ICOs that have raised a huge amount of money. For reference, the median amount raised is around $12.2M, while the max is around $4,200M. While this is perfectly reasonable as a financial process, it is often difficult for simple predictive models to fit right-skewed variables. Because the natural logarithm of dollar-amount raised (bottom chart) is much more normally distributed, and normally distributed targets tend to be easier to predict with simple regression methods, our methodology will aim to predict the log-transformed amount raised.
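As a quick sketch of this transformation (using synthetic, right-skewed amounts, not the actual dataset, which lives in our repo):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic right-skewed "amount raised" ($M): a lognormal mimics the shape
# in Figure 1 (top) -- many small raises plus a long right tail.
amount_raised = rng.lognormal(mean=np.log(12.2), sigma=1.2, size=439)

log_amount = np.log(amount_raised)  # target used in the regression

def skewness(x):
    """Sample skewness: third standardized moment."""
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

print(skewness(amount_raised))  # large and positive (right-skewed)
print(skewness(log_amount))     # near zero (roughly symmetric)
```

The raw variable's skewness is large and positive, while the log-transformed version's is near zero, which is the property that makes it friendlier to simple regression.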
Figure 2: Distribution of Page Count per ICO white paper.
Like amount raised, page count is also a right-skewed variable. On average, white papers tend to be 38 pages long, but the longest white paper in our dataset is 132 pages. Since most simple regression methods make no normality assumptions about explanatory variables, we are not too concerned about this. However, the sparsity of the page count distribution above 70 pages suggests that we may not currently be able to make statistically meaningful statements about very long white papers.
Figure 3: Log-Amount Raised ($M) on Page Count (teal). We have removed the page count outlier (Page Count = 132) from annotation within the plot. The blue line represents the linear trend for the core ICO set. The dashed black line represents the page count of 54, while the red dashed lines represent the mean log-Amount raised pre-cutoff and post-cutoff.
When plotting log-amount raised on page count, there is a very clear outlier at 132 pages. Given that the second largest page count is only 105 pages and the amount raised at the 132-page white paper is very high, we feel uncomfortable interpolating the page count effect within this gap. Thus, we are going to remove the 132-page white paper from our analysis.
By the linear trend (blue), we see a clear positive relationship between increased page counts and amount raised. However, our observations (teal) make it clear that the noise around the linear trend is non-constant. In particular, it looks like the variation of log-amount raised decreases after around 54 pages. This heteroscedasticity may violate some of the assumptions behind statistically testing the relationship between log-amount raised and page count. For now, we leave this robustness check outside the scope of this analysis and will review the implications in our next steps.
When analyzing this relationship, we also noticed a clear conditional lift in log-amount raised at around the 54-page cutoff (black, dashed). If we just analyze mean dollar-amount raised pre-cutoff and post-cutoff, we measure around a 52.7% lift in dollar-amount raised. While this measurement will likely be dampened when controlling for other sources of variation (see methodology), this lift seems substantial enough to be considered as an alternative predictive hypothesis to a linear trend (blue). While the 54 number is relatively arbitrary, it corresponds with the 84th percentile of the page count distribution. Thus, we will consider a model that represents the page count effect on log-amount raised as a lift for cryptocurrencies with white papers in the top 16% for page count.
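One way to sketch this pre-cutoff vs. post-cutoff comparison (on hypothetical stand-in data; the exact lift computation the article reports may differ):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 438
# Hypothetical page counts and log-amounts; the real data live in the repo.
pages = rng.integers(5, 106, size=n)
log_amount = 2.0 + 0.01 * pages + rng.normal(0, 1.1, size=n)

CUTOFF = 54  # the page-count cutoff discussed above (~84th percentile)
pre = log_amount[pages < CUTOFF].mean()
post = log_amount[pages >= CUTOFF].mean()

# On the log scale, a difference in means corresponds to a multiplicative
# lift in dollar terms: exp(difference) - 1.
lift = np.exp(post - pre) - 1
print(f"measured lift: {lift:+.1%}")
```

With real data, this simple mean comparison is what a cutoff model formalizes, before controlling for other sources of variation.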
For transparency, we would like to give a deep dive into our methodology for predicting dollar-amount raised in ICO. However, we realize that this description can be quite verbose, so we will give a quick summary here and provide the full deep dive at a separate link.
Due to right-skewness of dollar-amount raised in ICO, we plan to predict log-amount raised to lighten the prediction problem for simple regression methods. We consider both a linear model and a cutoff model (see Figure 3) that control for month-based seasonality in log-amount raised while estimating a page count effect. Since we are predicting log-amount raised, we interpret our effects on dollar-amount raised in ICO as multipliers rather than linear changes in money.
We select the model that minimizes cross-validated root mean squared error (CV-RMSE) under a 5-fold simulation. Root mean squared error (RMSE) is an error metric that measures, on average, how off our model predictions are from actual dollar-amount raised in ICO. The cross-validated version of this metric measures the performance of our model predictions on out-of-sample ICOs. In this regard, the model in our consideration set that minimizes CV-RMSE is expected to be best (in said consideration set) at generalizing predictions to new cryptocurrencies. For details on how we construct CV-RMSE via simulation, see our appendix.
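A generic sketch of a 5-fold CV-RMSE scored on the back-transformed dollar scale (again on synthetic data; the authors' exact simulation is described in their appendix):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)
n = 438
pages = rng.integers(5, 106, size=n).astype(float)
log_amount = 2.0 + 0.01 * pages + rng.normal(0, 1.1, size=n)
X = pages.reshape(-1, 1)

# Fit on log-amounts, but score the error on the back-transformed dollar
# scale, so the resulting metric reads in $M.
squared_errors = []
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train], log_amount[train])
    pred_dollars = np.exp(model.predict(X[test]))
    squared_errors.append((pred_dollars - np.exp(log_amount[test])) ** 2)

cv_rmse = np.sqrt(np.concatenate(squared_errors).mean())
print(f"CV-RMSE: ${cv_rmse:.2f}M")
```

Because every observation is predicted exactly once by a model that never saw it, this estimates how well each candidate model generalizes to new ICOs.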
We see that the cross-validated RMSEs for the linear and percentile-effect models are 111.84 and 111.90, respectively. While these RMSEs are very close, we will select the linear model (model 1 in the methodology section), since its cross-validated RMSE is slightly smaller than the percentile model's.
That being said, this RMSE is concerning from a fit perspective. The linear model implies that, on average, our model is off by around $112M for each cryptocurrency’s ICO. This is pretty severe underfitting of the fundraising process, and we think it is worthwhile to consider a more feature-dense model in our next steps.
| Variable | Coefficient | Std. Error | P-Value | Percent Change |
| --- | --- | --- | --- | --- |
| Month of Close = 2 (February) | -0.113 | 0.205 | 0.583 | -10.685% |
| Month of Close = 3 (March) | -0.564 | 0.203 | 0.006 | -43.107% |
| Month of Close = 4 (April) | -1.228 | 0.197 | 0.000 | -70.712% |
| Month of Close = 5 (May) | -0.607 | 0.202 | 0.003 | -45.502% |
| Month of Close = 6 (June) | -0.821 | 0.209 | 0.000 | -56.001% |
| Month of Close = 7 (July) | -1.140 | 0.198 | 0.000 | -68.018% |
Table 1: The coefficient table from our selected regression. “Percent Change” is the expected percent change in amount raised ($M) implied by the coefficient estimates.
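The "Percent Change" column follows directly from the coefficients: since the model predicts log-amount raised, a coefficient b implies a multiplicative effect of exp(b) on dollars raised, i.e. a percent change of exp(b) - 1. For example, using the February and April coefficients from Table 1:

```python
import math

def percent_change(coef):
    """Percent change in amount raised implied by a log-scale coefficient."""
    return (math.exp(coef) - 1) * 100

print(round(percent_change(-0.113), 3))  # February: -10.685
print(round(percent_change(-1.228), 3))  # April:    -70.712
```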
We see that when we control for seasonality, increasing the length of a white paper by one page is predicted to increase amount raised by around 1%. This effect is also very statistically significant, with a p-value below .01, meaning it is unlikely we would see a relationship this strong if page count had no effect on amount raised. That being said, there are still open questions about the narrative behind the effect. On one end, page count might simply be a form of obfuscation: there might not be major differences in the qualities of different cryptocurrencies, but white paper length might give an impression of complexity and due diligence that causes investors to provide more fundraising. On the other hand, there might be genuine content differences that inform both the length of white papers and the dollar-amount raised in ICO (e.g. new technological breakthroughs, ambitious designs). In this regard, it will be important to further analyze the language content of these white papers in our next steps.
While there is varying statistical significance of our month indicators, their negative coefficients make it clear that there is a general decline in dollar-amount raised via ICO post-February 2018. It may be the case that enthusiasm around cryptocurrency has declined over the year, which could be informing lower amounts raised via ICO post-February.
Figure 4: Page Count on Month of Close for ICOs in our modeling dataset (teal). The page count means per month of close are indicated by the blue line.
As a robustness check, we wanted to make sure there was little collinearity between page count and month of close. If there were, it would be difficult to interpret the page count effect on dollar-amount raised when controlling for seasonality. Thankfully, it looks like we will not have to worry substantially about this issue. Across the months in our dataset, mean page count hovers between 32 and 36 pages (blue line). Since the mean varies so little across months, we would argue that multicollinearity is not a concern when interpreting the effect of page count on amount raised.
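This kind of check is a one-liner with pandas. A sketch on hypothetical stand-in data (column names are our assumptions, not the repo's):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
# Hypothetical stand-in for the modeling dataset.
df = pd.DataFrame({
    "month": rng.integers(1, 8, size=438),  # month of close, Jan-Jul
    "pages": rng.integers(5, 106, size=438),
})

# If page count varied strongly with month of close, the seasonal controls
# would absorb part of the page-count effect. Compare per-month means:
monthly_means = df.groupby("month")["pages"].mean()
spread = monthly_means.max() - monthly_means.min()
print(monthly_means.round(1))
print(f"spread of monthly means: {spread:.1f} pages")
```

A small spread of monthly means relative to the overall page-count variation is what justifies interpreting the page-count coefficient alongside the month indicators.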
In this analysis, we identified a statistically significant relationship between a cryptocurrency’s white paper page count and amount raised in ICO. In particular, our model suggests that an additional page to a white paper is predicted to increase dollar-amount raised in ICO by around 1%. This relationship could potentially impact the way that analysts reflect on cryptocurrency white papers from a surface-level perspective. However, we have a few next steps in mind to improve the robustness of our current model and better understand the mechanisms of how white papers affect dollar-amount raised in ICO.
- CV-RMSE suggests that our current model is off on average by around $112M per ICO. This is severe underfitting, and it suggests that we should consider a more feature-dense approach to predicting dollar-amount raised. This will require us to think more deeply about the mechanisms that affect fundraising per ICO and collect features that will capture those mechanisms within our current modeling process.
- If we want to consider a more feature-rich regression model, we would statistically benefit from introducing ICOs from prior years into our dataset. Given that we only have 438 cryptocurrencies within our final modeling dataset, we will lose statistical power quickly if we overload the model with features on this 2018 dataset. We can probably offset the increased dimensionality of our model by introducing the large number of ICOs that occurred in 2017 into our modeling dataset. On a more secondary note, we will also be able to control for more seasonal variation once we introduce earlier time points.
- For a causal narrative, we are interested in spending more time mining the true mechanisms by which white paper length informs fundraising per ICO. In particular, we are interested in using natural language processing to see whether the language content of the white papers informs the dollar-amount raised in ICO to any degree. Since language content directly informs how long these white papers are, identifying this confound would present a more nuanced narrative on how communication about cryptocurrencies affects fundraising in ICO. If language itself does not present meaningful signal for amount raised, it could be the case that speculation on these cryptocurrencies is based more on perceived complexity (i.e. white paper length) than on communicated content.
To view our code and assets for this analysis, visit this git repo.
Michael Rosenberg is a CMU alum and is currently working at Wayfair as a Data Scientist. Michael is interested in problems related to data and quantitative social sciences. Feel free to connect with him on LinkedIn or Github.
The post Does white paper page count affect the amount of money raised via ICO? appeared first on The Block.
Written by Michael Rosenberg @ https://www.theblockcrypto.com/2018/12/17/does-white-paper-page-count-affect-the-amount-of-money-raised-via-ico/ December 17, 2018