I wish I’d had a little more panache in 2013. I’d call it something like “Crowdfunding Secrets: Revealed by Machine Learning!” or “Unlocking your Kickstarter’s Potential!”
Recovered: Anniversary and Crowdfunding Analysis
Crowdfunding Secrets: Exposed by Big Data!
I apparently missed my one-year anniversary, but the blog’s been slow recently. A lot has happened in the last half-year: I’ve graduated, I’ve moved, I’ve found employment. Unfortunately, I may not be gaming as much any more, but we’ll see how that plays out long-term.
Just because I wasn’t blogging doesn’t mean I haven’t been busy, and one of the things I worked on was an analysis of Kickstarter data. Because it was for a class, it assumes a certain vocabulary, has some weird stylistic artifacts, and has some persistent errors that weren’t severe enough to merit fixing at the time. Eventually I would like to revisit this more completely, but until then I may as well “publish” it:
2023: Summarizing
I’m not going to make you wade through my decade-old academic paper for the conclusions. Here’s a rough summary.
- There are three main ways projects go: failure, success, and “runaway” success.
- The strongest predictor of success under a creator’s control is a low goal. Obvious in retrospect, but nice to have the data.
- The strongest predictor of success to an outside observer is the number of backers.
- The strongest predictor of “runaway” success under a creator’s control is number of updates. Be careful here: this is still a correlation.
- The strongest predictor of “runaway” success to an outside observer is the number of comments. This is also a much stronger predictor than number of updates.
- “Percent funded” is useless as a metric (compared to, e.g. “dollars funded”). I speculate that this is because Kickstarter is used almost completely as a pre-order (ex post facto) platform and not a “funding” (ex ante) platform. Which is to say, people want stuff more than they want the project to succeed independently, so the target is largely irrelevant to their decision.
2023: The Data
The crowdfunding landscape has changed greatly over the last decade. Dan Misener’s The Kickback Machine broke sometime after my project, but I have a copy of the data he shared with me for it. I’m sharing it here, in case someone else might get some useful insight from it. It contains information about ~31,200 Kickstarter campaigns, from July 2012 to April 2013.
I’ll see if I can’t provide some notes about it:
- There are some wonky campaigns at the beginning and end with incomplete data. Probably best to exclude these.
- There may also have been some duplicate entries. Worth looking for before crunching.
- In `reward-level-list`, pledge levels with an “L” afterwards are “limited quantities.” For example, “only one pledge at this level.” The limits themselves are not recorded.
- I believe `kickstarter_fee` is a simple percentage.
- Where `currency` is blank, assume USD. I do not recall the calculation of `pledged_in_usd`, so maybe it’s best to look up historical conversion rates, for consistency.
- Many of these values are not linearly independent. `percent_raised` is a product of `goal` and `pledged`. `success` is also, but `project_state` is not.
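The notes above can be sketched as a few cleanup helpers in Python, using only the standard library. This is a rough illustration and not the process used in 2013: the function names are hypothetical, the comma-separated layout of `reward-level-list` is an assumption (the real file may delimit levels differently), and so is the idea that `percent_raised` is `pledged / goal` scaled to 100.

```python
def drop_exact_duplicates(rows):
    """Keep the first copy of each fully identical row (no ID column assumed)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_blank_currency(row, default="USD"):
    """Return a copy of the row with a blank currency treated as USD."""
    fixed = dict(row)
    if not (fixed.get("currency") or "").strip():
        fixed["currency"] = default
    return fixed


def parse_reward_levels(raw):
    """Split a reward-level string into (amount, is_limited) pairs.

    ASSUMPTION: levels are comma-separated, e.g. "1,25L,100".
    A trailing "L" marks a limited-quantity level; the limit is not recorded.
    """
    levels = []
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue
        limited = token.upper().endswith("L")
        amount = float(token[:-1] if limited else token)
        levels.append((amount, limited))
    return levels


def percent_raised(goal, pledged):
    """Recompute percent raised. ASSUMPTION: pledged / goal * 100."""
    return 100.0 * float(pledged) / float(goal)


# Made-up rows for illustration only; these are not from the data set.
rows = [
    {"goal": "1000", "pledged": "1500", "currency": "", "reward-level-list": "1,25L,100"},
    {"goal": "1000", "pledged": "1500", "currency": "", "reward-level-list": "1,25L,100"},
    {"goal": "500", "pledged": "100", "currency": "CAD", "reward-level-list": "5,10"},
]
cleaned = [fill_blank_currency(r) for r in drop_exact_duplicates(rows)]
```

Recomputing `percent_raised` from `goal` and `pledged` is also a cheap way to check which derived columns can be dropped before modeling, since they carry no information beyond the columns they come from.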
I believe at the time, I had been playing around with what is now OpenRefine to clean up the data, as well as the built-in tools of SIMCA-P+. SIMCA is fine, but it looks like the strongest parts of it are in R now.
This post was first shared August 7, 2013.