Introducing the CFBD Blog...

I'm not quite sure how to begin this initial post. Creating an outlet like this has been on my radar for some time now, for the past year and a half at the very least. Since starting CollegeFootballData.com, I've always envisioned a place to discuss various topics that are important to the community. And it wouldn't just be me leading the discussion; hopefully others would join in as well. There's a lot of untapped wisdom in this community and it would be great if we could spread some of it around.

Let me be clear: I'm no expert, nor am I a data scientist, and I have never claimed to be either. I am a software engineer by trade with a strong background in math (though over a decade removed). Some of my more analytics-focused work has seemingly gained some traction, so I must be doing something right. Hopefully my experiences (and those of others in the community) can help others out in some way, shape, or form.

There's truly some amazing work being done in the analytics community and it's a great time to be involved. I had been wanting to mess around with some CFB-related data and statistics for some time when I became frustrated at the lack of available resources. Well, there were certainly resources out there but most everything of value was behind a large paywall. I happened to stumble upon the r/CFBAnalysis subreddit around this time and noticed there were various people on there collecting and sharing disparate sets of data, so I decided to get involved.

My focus from the beginning was on making data more accessible, especially in a programmatic fashion. My first foray was prior to the start of the 2016 season with the creation of the cfb-data and ncaa-stats JavaScript packages. I soon discovered, however, that this is a very Python- and R-centric community. While the packages were not of much direct use to the greater community, you can really call their creation the foundation of CollegeFootballData.com. Not only do I still maintain them, but much of the infrastructure used to support the site is built off of the cfb-data package.

The package is super convenient for pulling live, raw data. For example, grabbing live scores is as simple as:

const cfb = require('cfb-data');
// run inside an async function, since await needs an async context
const result = await cfb.scoreboard.getScoreboard({ groups: 80 });
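As a rough sketch of what you might do with a result like that, here is one way to boil a scoreboard down to a line per game. To be clear about the assumptions: this supposes the payload looks like ESPN-style scoreboard JSON (an `events` array whose entries carry `competitions[0].competitors`, each with `homeAway`, `team.abbreviation`, and `score`). That shape, the `summarizeScoreboard` helper, and the sample data are all illustrative and not part of the package, so check the actual response before leaning on any of it.

```javascript
// Hypothetical helper: turns an ESPN-style scoreboard payload into one line per game.
// The field names used here are assumptions about the payload shape, not a documented contract.
function summarizeScoreboard(result) {
  const events = (result && result.events) || [];
  return events.map((event) => {
    const competition = (event.competitions || [])[0] || {};
    const competitors = competition.competitors || [];
    const away = competitors.find((c) => c.homeAway === 'away');
    const home = competitors.find((c) => c.homeAway === 'home');
    // Fall back gracefully when a game is missing either side.
    if (!away || !home) {
      return `${event.name || 'Unknown game'}: score unavailable`;
    }
    return `${away.team.abbreviation} ${away.score} @ ${home.team.abbreviation} ${home.score}`;
  });
}

// Made-up sample data with placeholder teams, only to show the assumed input shape.
const sample = {
  events: [
    {
      name: 'Game One',
      competitions: [
        {
          competitors: [
            { homeAway: 'home', team: { abbreviation: 'BBB' }, score: '17' },
            { homeAway: 'away', team: { abbreviation: 'AAA' }, score: '21' },
          ],
        },
      ],
    },
    { name: 'Game Two', competitions: [] },
  ],
};

console.log(summarizeScoreboard(sample).join('\n'));
// AAA 21 @ BBB 17
// Game Two: score unavailable
```

Keeping the helper separate from the fetch means it can be tried against a saved response without hitting the network.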

Anyway, the package wasn't of much use to the bulk of people, who were more familiar with Python and R, or to those working mainly in spreadsheets. The next season, I put a dump of 15 years' worth of play data on a Google Drive and shared it on r/CFBAnalysis. I also created a very rudimentary application for uploading this data in real time as games were completed. If the cfb-data package marked the foundation upon which CFBD was built, then this application marked the beginnings of the site.

Everything was a gradual process. I started sharing other data on the Google Drive, like recruiting data and raw season stats. The next year, I created a database and shared its backups on the subreddit. But I still wasn't satisfied with the accessibility of everything. A year later, prior to the start of the 2018 season, I released the first iteration of the CFBD API, built on top of the database I had been sharing. It only had a fraction of the endpoints it boasts barely a year later, but it was a huge step. Lastly, the main CFBD website went live that same bowl season as an easier means to access the API without having to do any coding.

So, this has basically turned into a short history lesson on the origins of the site. The point I'm trying to make is, this has all been a gradual process. While it's come a long way from where it started, it's still very much in its infancy. And there are huge plans ahead. I'm now at the point where I get all sorts of solicitations for my advice and insight on various things. Like I said above, I am far from an expert in this area but I have learned a lot through this whole journey.

What can you expect from this blog? Well, for one, I'm hoping to find other contributors. So if you are interested, then hit me up. As for me, I have a lot of ideas. If you follow me on Twitter, then you're used to seeing me post brief analysis on games, players, previews, and various other things:

- Pitt with a 73% postgame win expectancy
- Pitt w/ 60% success on standard downs to only 11% success on passing downs
- Mike Glass with 81% usage for EMU, would've been higher if not for a late ejection
- Both QBs had really good games https://t.co/4N56M85krf
— CollegeFootballData.com (@CFB_Data) December 27, 2019

Joe Burrow edges out Jalen Hurts in QB efficiency headed into the postseason. Hurts had a sizable gap for much of the year.

Trevor Lawrence and Justin Fields are in the second tier, with a huge gap between them and the other two. #CFBPlayoff https://t.co/T3g3w2ee1J pic.twitter.com/SDksCF84Su
— CollegeFootballData.com (@CFB_Data) December 9, 2019

I'm hoping to do more in-depth analysis on here. I'm also planning on writing about new enhancements to the site and other things I am working on. Perhaps most importantly, I'd like to get in the weeds technically, so you can expect to see some code at some point. The plan right now is for a series of posts detailing how to build models. I'm starting to dive into some Python and plan on writing articles on how to build models for things like EPA, win probability, etc. This series will be predominantly focused on machine learning and neural networks.

In short, there will be many focuses, some more technical than others. There should be a little something for everyone. If you follow me on Twitter, you can expect more in-depth discussion of what you see on there. Anyway, stay tuned!