<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[CFBD Blog]]></title><description><![CDATA[College football data, analytics, and musings]]></description><link>https://blog.collegefootballdata.com/</link><image><url>https://blog.collegefootballdata.com/favicon.png</url><title>CFBD Blog</title><link>https://blog.collegefootballdata.com/</link></image><generator>Ghost 5.58</generator><lastBuildDate>Mon, 06 Apr 2026 22:04:14 GMT</lastBuildDate><atom:link href="https://blog.collegefootballdata.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[🏈 Revamping Win Probability for 2025]]></title><description><![CDATA[Win probability on CollegeFootballData.com has been completely overhauled for 2025. The new models are smarter, clutch-aware, and better calibrated—plus there’s a new calculator to test game situations yourself.]]></description><link>https://blog.collegefootballdata.com/revamping-win-probability-2025/</link><guid isPermaLink="false">68abca00fbdcf500010a5d13</guid><category><![CDATA[college football analytics]]></category><category><![CDATA[analytics]]></category><category><![CDATA[win probability]]></category><category><![CDATA[college football]]></category><category><![CDATA[machine learning]]></category><category><![CDATA[2025 season]]></category><category><![CDATA[cfbd]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Thu, 18 Sep 2025 20:00:28 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1543286386-2e659306cd6c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDJ8fGxpbmUlMjBjaGFydHxlbnwwfHx8fDE3NTgyMjM0NDd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img 
src="https://images.unsplash.com/photo-1543286386-2e659306cd6c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDJ8fGxpbmUlMjBjaGFydHxlbnwwfHx8fDE3NTgyMjM0NDd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" alt="&#x1F3C8; Revamping Win Probability for 2025"><p>Imagine this: it&#x2019;s the fourth quarter, tie game, your team has the ball on the opponent&#x2019;s 1-yard line with one second left. What are the odds they actually win?</p><p>That&#x2019;s the kind of question a <strong>win probability model</strong> is built to answer and it&#x2019;s one I&#x2019;ve been calculating for years. But until now, the system powering those numbers was showing its age. For 2025, I&#x2019;ve completely overhauled win probability on CollegeFootballData.com, replacing outdated models with a modernized, better-calibrated engine that understands not just the flow of regulation, but also the unique dynamics of <strong>clutch time</strong> and <strong>overtime</strong>.</p><hr><h2 id="the-old-way-retired-models">The Old Way (Retired Models)</h2><p>The previous version of win probability was powered by two models: one for regulation and one for overtime. These were built years ago using a now-obsolete JavaScript library called <strong>SynapticJS</strong>. They&apos;ve generally worked well enough, but they had serious drawbacks:</p><ol><li>They were essentially <strong>black boxes</strong>, with no good way to measure calibration or error.</li><li>There was no mechanism for handling <strong>rare, high-leverage scenarios</strong>, like the one-second, goal-line situation above.</li><li>And practically, SynapticJS is no longer maintained, making the models brittle and hard to improve.</li><li>Lastly, how many people are training machine learning models on JavaScript? 
There&apos;s a reason JS is used primarily for web while most ML happens in the Python (and R) ecosystem.</li></ol><p>In short, they were due for replacement.</p><hr><h2 id="the-new-models-2025-revamp">The New Models (2025 Revamp)</h2><p>For the new season, I&#x2019;ve rebuilt the system from the ground up in <strong>Python using XGBoost</strong>, a modern machine learning library that&#x2019;s fast, well-supported, and ideal for structured sports data.</p><p>Instead of two opaque models, there are now <strong>three specialized models</strong>:</p><ul><li><strong>Regulation Model</strong> &#x2013; trained on all non-overtime plays, handles the bulk of game situations.</li><li><strong>Clutch Time Model</strong> &#x2013; trained specifically on close games in the final minutes, where every play can swing the outcome.</li><li><strong>Overtime Model</strong> &#x2013; trained only on overtime possessions, which are fundamentally different because of college football&#x2019;s unique rules.</li></ul><p>The regulation and clutch models are combined into a <strong>blended approach</strong>: the regulation model drives most of the game, while the clutch model gradually takes over in high-leverage late situations. This way, the system is both broadly calibrated and sharply tuned for the moments that matter most.</p><hr><h2 id="calibration-and-results">Calibration and Results</h2><p>A major advantage of the new models is that they can be tested and evaluated, which is something the old SynapticJS models couldn&#x2019;t do.</p><p>For each of the three models, I generated <strong>calibration curves</strong> that compare predicted probabilities to actual outcomes. The closer the line is to the diagonal, the better calibrated the model is. 
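</p><p>To make the idea concrete, here is a minimal sketch of how a calibration curve can be computed: group plays by predicted win probability, then compare each group&#x2019;s average prediction to how often that team actually won. This is a plain-Python illustration rather than the code behind the charts in this post; the function name, the bin count, and the toy data are all assumptions made for the example.</p><pre><code class="language-python">def calibration_curve(predicted, outcomes, n_bins=10):
    """Return (mean predicted, observed win rate, count) for each non-empty bin."""
    if len(predicted) != len(outcomes):
        raise ValueError("predicted and outcomes must have the same length")
    bins = [[] for _ in range(n_bins)]
    for p, won in zip(predicted, outcomes):
        # Clamp the index so a prediction of exactly 1.0 lands in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, won))
    curve = []
    for members in bins:
        if not members:
            continue
        mean_predicted = sum(p for p, _ in members) / len(members)
        win_rate = sum(won for _, won in members) / len(members)
        curve.append((mean_predicted, win_rate, len(members)))
    return curve


# Toy data, made up for illustration: four plays near 20% and four near 80%.
predicted = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
outcomes = [0, 0, 0, 1, 1, 1, 1, 0]
for mean_predicted, win_rate, count in calibration_curve(predicted, outcomes, n_bins=5):
    print(f"predicted {mean_predicted:.2f} vs observed {win_rate:.2f} over {count} plays")
</code></pre><p>A well-calibrated model produces pairs where the two numbers are close in every bin, which is the same thing as the plotted line hugging the diagonal.</p><p>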
The results show:</p><h4 id="regulation-model">Regulation Model</h4><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/08/image-4.png" class="kg-image" alt="&#x1F3C8; Revamping Win Probability for 2025" loading="lazy" width="790" height="590" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/08/image-4.png 600w, https://blog.collegefootballdata.com/content/images/2025/08/image-4.png 790w" sizes="(min-width: 720px) 720px"></figure><h4 id="clutch-model">Clutch Model</h4><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/08/image-2.png" class="kg-image" alt="&#x1F3C8; Revamping Win Probability for 2025" loading="lazy" width="790" height="590" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/08/image-2.png 600w, https://blog.collegefootballdata.com/content/images/2025/08/image-2.png 790w" sizes="(min-width: 720px) 720px"></figure><h4 id="overtime-model">Overtime Model</h4><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/08/image-3.png" class="kg-image" alt="&#x1F3C8; Revamping Win Probability for 2025" loading="lazy" width="690" height="590" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/08/image-3.png 600w, https://blog.collegefootballdata.com/content/images/2025/08/image-3.png 690w"></figure><p>The bottom line: the new system doesn&#x2019;t just look smarter; it actually measures smarter.</p><hr><h2 id="clutch-time-in-action">Clutch Time in Action</h2><p>One of the biggest improvements comes from how the new system handles endgame situations.</p><p>In the old model, a tie game with one second left and the ball on the opponent&#x2019;s 1-yard line might have been treated like a coin flip with ~50% win probability. 
That never felt right.</p><p>With the new blended approach, the clutch model takes over, recognizing this as a near-certain win for the offense. Scenarios that used to break the model now produce realistic, intuitive results.</p><p>This &#x201C;clutch awareness&#x201D; makes the new win probability charts much more believable, especially in the final minutes of close games.</p><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
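<p>The hand-off between the two models can be pictured as a weighted average whose weight shifts toward the clutch model as the clock runs down in a close game. The sketch below is only an illustration of that idea: the five-minute window, the one-score margin, and the linear ramp are hypothetical choices for this example, not the weights the site actually uses.</p><pre><code class="language-python">def clutch_weight(seconds_remaining, score_margin, clutch_window=300.0, close_margin=8):
    """Weight in [0, 1] given to the clutch model (an assumed linear ramp)."""
    if abs(score_margin) > close_margin or seconds_remaining >= clutch_window:
        return 0.0
    # Ramp linearly from 0 at the start of the window to 1 when the clock hits zero.
    return 1.0 - max(seconds_remaining, 0.0) / clutch_window


def blended_win_probability(regulation_wp, clutch_wp, seconds_remaining, score_margin):
    """Mix two model outputs; both inputs are win probabilities for the same team."""
    weight = clutch_weight(seconds_remaining, score_margin)
    return (1.0 - weight) * regulation_wp + weight * clutch_wp


# Made-up model outputs for a tie game with 30 seconds left.
print(blended_win_probability(0.50, 0.95, seconds_remaining=30, score_margin=0))
</code></pre><p>With these made-up settings the regulation model is used on its own for most of the game, and the clutch model&#x2019;s share grows steadily over the final five minutes of a one-score game.</p>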
<script>
</script><!--kg-card-end: html--><hr><h2 id="%F0%9F%9A%80-new-tools-on-the-site">&#x1F680; New Tools on the Site</h2><p>Along with the revamped models, I&#x2019;ve added a brand-new <strong><a href="https://collegefootballdata.com/win-probability?ref=blog.collegefootballdata.com">Win Probability Calculator</a></strong> to the site. This tool lets you plug in the game situation (score, time remaining, down, distance, and field position) and instantly see the home team&#x2019;s win probability. Behind the scenes, it uses the new <strong>regulation + clutch blended model</strong>, so the numbers reflect both the general flow of a game <em>and</em> the pressure of high-leverage moments.</p><p>Advanced box scores and data for all <strong>2025 matchups and beyond</strong> have been using this new blended model. And every <strong>win probability chart</strong> you see during the season will now run on the new models. You&#x2019;ll notice smoother, more realistic shifts, especially late in games where the old system struggled.</p><p>Finally, the <strong>Excitement Index</strong>, a measure of how thrilling a game is based on swings in win probability, has been using the updated engine during the 2025 season. Because the clutch model is sharper, excitement ratings will better capture the drama of close finishes.</p><hr><h2 id="overtime-model-1">Overtime Model</h2><p>Overtime in college football is a world of its own: possessions starting at the opponent&#x2019;s 25, alternating turns, and since 2021, two-point shootouts. That structure makes overtime play fundamentally different from regulation, which is why a dedicated model was necessary.</p><p>The new overtime model is trained only on overtime possessions and captures those dynamics directly. 
Its calibration curve shows a solid fit, giving confidence in the numbers when games head into extra frames.</p><hr><h2 id="takeaways-what%E2%80%99s-next">Takeaways &amp; What&#x2019;s Next</h2><p>The old SynapticJS models got us this far, but they were opaque and unmeasurable. The new system is:</p><p><strong>Transparent</strong> &#x2013; feature sets are clear, and the models can be tested.</p><p><strong>Calibrated</strong> &#x2013; probabilities better match reality across all game states.</p><p><strong>Clutch-aware</strong> &#x2013; no more 36% win probability in a one-yard, one-second tie.</p><p><strong>Specialized</strong> &#x2013; overtime handled with its own dedicated model.</p><p>This overhaul powers not just the charts you see on the site, but also the new calculator and updated Excitement Index.</p><p>Looking ahead, I hope to extend these improvements into other areas like live win probability updates during games, deeper situational models (e.g., 4th-down decisions), and expanded API access for developers.</p><hr><h2 id="closing">Closing</h2><p>The 2025 season marks a new era for win probability on CollegeFootballData.com. Whether you&#x2019;re following along live, exploring charts after the fact, or testing &#x201C;what if&#x201D; scenarios in the calculator, the numbers you see are powered by smarter, sharper, clutch-ready models.</p><p><strong>2025 is here, and win probability just got a whole lot smarter.</strong></p><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<script>
</script><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Submitting CFBD predictions with HTTP requests]]></title><description><![CDATA[<p>A week ago on the <a href="https://discord.gg/Eb3ex5a?ref=blog.collegefootballdata.com">College Football Data Discord</a>, some folks were discussing the difficulties of updating their predictions for the <a href="https://predictions.collegefootballdata.com/?ref=blog.collegefootballdata.com">CFBD Model Pick&apos;em Contest</a>:</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/09/image.png" class="kg-image" alt="A screenshot of a discord discussion thread. User &quot;Miles&quot; writes, &quot;For the Pick ems, why don&#x2019;t you list every game? Just curious&quot;. User &quot;Bill&quot; replies, &quot;I list every game for which there is a spread&quot;. User &quot;Miles&quot; replies, &quot;That means some games just don&#x2019;t have spreads on whatever you pull it from? Interesting&quot; User &quot;Danger Mouse&quot; replies, &quot;At least not yet. Sometimes you&apos;ll see games on there late, which is annoying if you try and get your picks in on Monday or Tuesday. So check frequently&quot;. User &quot;Miles&quot; replies, &quot;Yeah I&#x2019;ve been checking every day and I noticed some get added later. Not a big deal just keeping that in mind&quot;. User &quot;Stodge&quot; replies, &quot;Yeah, lots of spreads get added the day of the game&quot;. 
" loading="lazy" width="1216" height="849" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/09/image.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/09/image.png 1000w, https://blog.collegefootballdata.com/content/images/2025/09/image.png 1216w" sizes="(min-width: 720px) 720px"></figure><p>I see this complaint a fair amount&#x2013;it is difficult to track all of the games that are available to pick, not</p>]]></description><link>https://blog.collegefootballdata.com/submitting-predictions-with/</link><guid isPermaLink="false">68c09066cbf61e00019ce2e6</guid><dc:creator><![CDATA[John Edwards]]></dc:creator><pubDate>Sat, 13 Sep 2025 14:04:09 GMT</pubDate><content:encoded><![CDATA[<p>A week ago on the <a href="https://discord.gg/Eb3ex5a?ref=blog.collegefootballdata.com">College Football Data Discord</a>, some folks were discussing the difficulties of updating their predictions for the <a href="https://predictions.collegefootballdata.com/?ref=blog.collegefootballdata.com">CFBD Model Pick&apos;em Contest</a>:</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/09/image.png" class="kg-image" alt="A screenshot of a discord discussion thread. User &quot;Miles&quot; writes, &quot;For the Pick ems, why don&#x2019;t you list every game? Just curious&quot;. User &quot;Bill&quot; replies, &quot;I list every game for which there is a spread&quot;. User &quot;Miles&quot; replies, &quot;That means some games just don&#x2019;t have spreads on whatever you pull it from? Interesting&quot; User &quot;Danger Mouse&quot; replies, &quot;At least not yet. Sometimes you&apos;ll see games on there late, which is annoying if you try and get your picks in on Monday or Tuesday. So check frequently&quot;. User &quot;Miles&quot; replies, &quot;Yeah I&#x2019;ve been checking every day and I noticed some get added later. Not a big deal just keeping that in mind&quot;. 
User &quot;Stodge&quot; replies, &quot;Yeah, lots of spreads get added the day of the game&quot;. " loading="lazy" width="1216" height="849" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/09/image.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/09/image.png 1000w, https://blog.collegefootballdata.com/content/images/2025/09/image.png 1216w" sizes="(min-width: 720px) 720px"></figure><p>I see this complaint a fair amount&#x2013;it is difficult to track all of the games that are available to pick, not to mention significant changes to the outlook of different games (a team&apos;s starting QB being scratched with an injury, for instance). That&apos;s why one of my top tips for doing well in the CFBD Model Pick&apos;em is to <a href="https://blog.collegefootballdata.com/lessons-from-picking-the-2022-cfb-season/#automate-everything">automate everything!</a> That means automating not just how you make your predictions, but how you <em>submit</em> them as well.</p><p>Thanks to some new features implemented by Bill, I have moved beyond the <a href="https://www.selenium.dev/documentation/webdriver/?ref=blog.collegefootballdata.com">Selenium-based</a> pipeline I built a few years ago, and my entire CFBD Model Pick&apos;em pipeline now relies on a series of simple HTTP calls. In this post, I will demonstrate how to format and execute these calls using cURL, an open-source command-line tool for transferring data to and from servers.</p><p>It is extremely unlikely that you will write these pipelines exclusively in cURL&#x2013;rather, you will likely use an HTTP library in your language of choice. 
Fortunately, the fantastic free website <a href="https://curlconverter.com/?ref=blog.collegefootballdata.com">curlconverter.com</a> will allow you to copy and paste valid cURL commands and convert them to the language of your choice (R, Python, etc.)</p><h2 id="obtaining-your-token">Obtaining your token</h2><p>To begin, we will need to obtain a token for submitting to the game. We will first need to sign up for the predictions game if we have not done so already, then log in with our account. Visit <a href="https://predictions.collegefootballdata.com/?ref=blog.collegefootballdata.com">predictions.collegefootballdata.com</a> and sign in with one of the available options.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/09/image-1.png" class="kg-image" alt="A screenshot of predictions.collegefootballdata.com. The user is logged out, and the page is prompting them to log in with one of Twitter or Reddit." loading="lazy" width="2000" height="569" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/09/image-1.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/09/image-1.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/09/image-1.png 1600w, https://blog.collegefootballdata.com/content/images/2025/09/image-1.png 2219w" sizes="(min-width: 720px) 720px"></figure><p>Once logged in, go to <a href="https://predictions.collegefootballdata.com/api/auth/token?ref=blog.collegefootballdata.com">predictions.collegefootballdata.com/api/auth/token</a>. You will see a long string of characters&#x2013;this is your <strong>prediction token</strong>. This is a unique identifier that the CFBD Model Pick&apos;em API will use to check that it is genuinely you submitting your picks, and not someone else. 
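</p><p>Because the token works like a password, one option is to keep it out of your scripts and read it from an environment variable instead. The sketch below assumes Python and a hypothetical variable name, <code>CFBD_PREDICTION_TOKEN</code>; it only builds the bearer-style <code>authorization</code> header used for the requests in this post and does not contact the site.</p><pre><code class="language-python">import os


def load_prediction_token(env_var="CFBD_PREDICTION_TOKEN"):
    """Read the token from the environment (the variable name is an assumption)."""
    return os.environ.get(env_var, "").strip()


def build_auth_headers(token):
    """Return the header that identifies you to the Pick'em API."""
    if not token:
        raise ValueError("no prediction token found; set the environment variable first")
    return {"authorization": f"Bearer {token}"}


# A placeholder value stands in for a real token here.
print(build_auth_headers("example-token"))
</code></pre><p>In a real pipeline you would call <code>build_auth_headers(load_prediction_token())</code> and hand the resulting dictionary to whichever HTTP client you use as its request headers.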
</p><p>Two very important notes: </p><ol><li>This token <strong>is different from your basic CFBD API key</strong>, and the two cannot be used interchangeably! So do not swap them around&#x2013;you cannot use your car key to open your house and vice versa! </li><li><strong>Do not share this token with anyone!</strong> If you give this token to someone else, they can log into your account and access your predictions and information.</li></ol><p>The token is valid for one month, so set yourself a monthly reminder to replace it with a fresh one.</p><h2 id="getting-games-to-pick">Getting games to pick</h2><p>Now that we have our token, we can begin to make HTTP requests to the site using cURL.</p><p>The most basic HTTP request is a <code>GET</code> request&#x2013;when we make a <code>GET</code> request, we are asking the URL we are querying to <em>get</em> us data and return it to us in some format. We first need to specify the URL we are trying to query, which is the <code>picks</code> endpoint of the CFBD Model Pick&apos;em API. This endpoint returns the list of games for which we can submit picks.</p><pre><code class="language-bash">curl &apos;https://predictionsapi.collegefootballdata.com/api/picks&apos;</code></pre><pre><code class="language-bash">{&quot;error&quot;:&quot;Unauthorized&quot;}</code></pre><p>Bummer! We cannot see the games to pick unless we can prove we have a CFBD Model Pick&apos;em account. No matter, we will just need to give it our token. To do this, we will need to pass in our token as a header, which is specified with the <code>-H</code> flag. 
Note the backslashes (<code>\</code>) in our request&#x2013;they let us split the command across several lines, which makes our requests more readable.</p><p>Much like querying the CFBD API, we simply add the header <code>&apos;authorization: Bearer {your token here--no brackets!}&apos;</code> to our basic request.</p><pre><code class="language-bash">curl &apos;https://predictionsapi.collegefootballdata.com/api/picks&apos; \
  -H &apos;authorization: Bearer {your token here!}&apos;</code></pre><pre><code class="language-bash">[{&quot;id&quot;:401754531,&quot;season&quot;:2025,&quot;seasonType&quot;:&quot;regular&quot;,&quot;week&quot;:3,&quot;homeId&quot;:154,&quot;homeTeam&quot;:&quot;Wake Forest&quot;,&quot;awayId&quot;:152,&quot;awayTeam&quot;:&quot;NC State&quot;,&quot;spread&quot;:7.5,&quot;pickId&quot;:120172,&quot;pick&quot;:[REDACTED]},
...
{&quot;id&quot;:401752921,&quot;season&quot;:2025,&quot;seasonType&quot;:&quot;regular&quot;,&quot;week&quot;:14,&quot;homeId&quot;:130,&quot;homeTeam&quot;:&quot;Michigan&quot;,&quot;awayId&quot;:194,&quot;awayTeam&quot;:&quot;Ohio State&quot;,&quot;spread&quot;:5.5,&quot;pickId&quot;:104248,&quot;pick&quot;:[REDACTED]}]</code></pre><p>With this request, we have raw JSON data representing all of the games we have to pick! The <code>id</code> for each game returned by your request is identical to the <code>id</code> for games returned by requests to the CFBD API, so you can easily determine which games you need to predict for the contest.</p><h2 id="submitting-predictions">Submitting predictions</h2><p>Suppose we have our prediction for the Michigan/Ohio State game at the end of the season&#x2013;we predict Michigan will win by 3.5 points (Bill will not let me publish this blog post if I do not have Michigan winning). We want to submit our prediction to the site. How can we? We have three options:</p><ol><li>We can manually submit our prediction on the website.</li><li>We can use the CSV import button on the website to submit our prediction for the game and any other games we want to make predictions for.</li><li>We can use cURL to make another HTTP request and submit our predictions algorithmically!</li></ol><p>The third option is going to integrate most seamlessly into any prediction pipeline we build. To do this, we can craft another cURL command, this time making a <code>POST</code> request.</p><p>A <code>POST</code> request is kind of like sending a letter&#x2013;you put what you want to send in your envelope, address it, and then <code>POST</code> it in the mail. (cURL switches from <code>GET</code> to <code>POST</code> automatically when we attach data to a request, so we do not need to name the method explicitly.)</p><p>Just like before, we will need to include authorization for our request. Then, as a second header, we will need to tell the server what format the data we are sending is in&#x2013;in this case, we are sending it some JSON. 
Finally, we send it some formatted JSON to reflect the pick we are submitting:</p><pre><code class="language-bash">curl &apos;https://predictionsapi.collegefootballdata.com/api/picks&apos; \
  -H &apos;authorization: Bearer {your token here!}&apos; \
  -H &apos;content-type: application/json&apos; \
  --data-raw &apos;{&quot;gameId&quot;:401752921,&quot;pick&quot;:-3.5}&apos;</code></pre><p>Note that our pick of Michigan, the home team, by 3.5 points is entered as <code>-3.5</code>. This request prints nothing to the console, but if we check the website, we can see that our submission went through to the predictions page!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2025/09/image-3.png" class="kg-image" alt loading="lazy" width="1085" height="86" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/09/image-3.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/09/image-3.png 1000w, https://blog.collegefootballdata.com/content/images/2025/09/image-3.png 1085w" sizes="(min-width: 720px) 720px"><figcaption>Our final prediction</figcaption></figure><h2 id="wrapping-it-up">Wrapping it up</h2><p>Remember to use HTTP requests responsibly&#x2013;you do not want to spam a website with requests, as this can amount to an unintentional denial-of-service (&quot;DoS&quot;) attack or get your IP rate-limited (or even banned!). Leave enough time between requests for the website to process each one.</p><p>This should arm you with the tools to quickly pull in, predict, and submit your forecasts to the CFBD Model Pick&apos;em! If you have data structured with a prediction for each CFBD game ID, submitting your predictions becomes a cinch. And because so many languages can send HTTP requests, it should take very little work to submit predictions automatically from whatever language you use to generate them! Enjoy and best of luck in the prediction contest!</p>]]></content:encoded></item><item><title><![CDATA[10 Data-Driven Visualizations That Will Change the Way You Watch College Football]]></title><description><![CDATA[Discover 10 jaw-dropping visualizations powered by real college football data. 
These charts reveal hidden trends, game-changing stats, and new ways to watch the game you love.]]></description><link>https://blog.collegefootballdata.com/data-driven-college-football-visualizations/</link><guid isPermaLink="false">686b46a32a659a00015cb497</guid><category><![CDATA[college-football]]></category><category><![CDATA[data-visualization]]></category><category><![CDATA[advanced-stats]]></category><category><![CDATA[cfb-analysis]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Wed, 10 Sep 2025 19:30:23 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1488229297570-58520851e868?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDN8fGRhdGF8ZW58MHx8fHwxNzUxNzQ4MDM4fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1488229297570-58520851e868?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDN8fGRhdGF8ZW58MHx8fHwxNzUxNzQ4MDM4fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football"><p>We all love the scoreboard, but sometimes it doesn&apos;t tell the whole story. That&#x2019;s where data visualizations come in. They bring out the trends, the truths, and the surprises that raw box scores can&#x2019;t capture.</p><p>Here are ten of my favorite charts, built from opponent-adjusted metrics and team-level data, that offer a deeper look into how the game is really played from both sides of the ball.</p><hr><h2 id="1-success-rate-standard-downs-vs-passing-downs">1. Success Rate: Standard Downs vs. Passing Downs</h2><p>This pair of charts shows how teams perform in different game situations. Offensively, it&apos;s about staying efficient whether you&apos;re ahead of schedule or in a hole. 
Defensively, it&#x2019;s about getting stops when it matters most.</p><h3 id="offense">Offense</h3><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/AdjustedSuccessRates-1.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1500" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/AdjustedSuccessRates-1.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/AdjustedSuccessRates-1.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/AdjustedSuccessRates-1.png 1600w, https://blog.collegefootballdata.com/content/images/2025/07/AdjustedSuccessRates-1.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>Teams in the top right are effective on both standard and passing downs. The bottom left highlights units that struggle to stay on track or recover from setbacks.</p><h3 id="defense">Defense</h3><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/AdjustedSuccessRatesDefense-2.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1500" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/AdjustedSuccessRatesDefense-2.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/AdjustedSuccessRatesDefense-2.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/AdjustedSuccessRatesDefense-2.png 1600w, https://blog.collegefootballdata.com/content/images/2025/07/AdjustedSuccessRatesDefense-2.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>On the defensive side, top-left teams shut down early-down runs and force passing situations but then struggle. 
Those in the bottom right may clean up on 3rd-and-long but struggle to contain base plays.</p><hr><h2 id="2-line-yards-vs-epa-per-rush">2. Line Yards vs. EPA per Rush</h2><p>How much push does your line get, and what are your backs doing with it? And on defense, are you stonewalling rushers or getting gashed despite contact?</p><h3 id="offense-1">Offense</h3><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/AdjustedLineYardsVsEPA.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1500" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/AdjustedLineYardsVsEPA.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/AdjustedLineYardsVsEPA.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/AdjustedLineYardsVsEPA.png 1600w, https://blog.collegefootballdata.com/content/images/2025/07/AdjustedLineYardsVsEPA.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>This chart compares line yards (blocking effectiveness) to rushing EPA (actual value). Teams in the top-left are relying on their playmakers to bail out their lethargic run game. 
Teams in the bottom-right are getting consistent push but not enough to spring explosive plays.</p><h3 id="defense-1">Defense</h3><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/AdjustedLineYardsVsEPAAllowed.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1500" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/AdjustedLineYardsVsEPAAllowed.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/AdjustedLineYardsVsEPAAllowed.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/AdjustedLineYardsVsEPAAllowed.png 1600w, https://blog.collegefootballdata.com/content/images/2025/07/AdjustedLineYardsVsEPAAllowed.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>Defensively, it&#x2019;s about limiting both initial yardage and big-play potential. Teams in the top-right are stone walls, stuffing runs and denying explosive plays. Top-left teams are your classic bend-don&apos;t-break defenses.</p><hr><h2 id="3-3rd-down-success-vs-average-distance">3. 3rd Down Success vs. Average Distance</h2><p>Success on 3rd down isn&#x2019;t just about execution; it&#x2019;s also about setting yourself up with manageable situations. 
These charts break down how offenses and defenses handle the money down.</p><h3 id="offense-2">Offense</h3><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/3rdDownSuccess-1.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1200" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/3rdDownSuccess-1.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/3rdDownSuccess-1.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/3rdDownSuccess-1.png 1600w, https://blog.collegefootballdata.com/content/images/size/w2400/2025/07/3rdDownSuccess-1.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>Elite teams convert often and avoid long-yardage scenarios. High success, low distance is the sweet spot. Teams above the trendline are what you would call clutch. They convert more often than you would expect given the average distance to go.</p><h3 id="defense-2">Defense</h3><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/3rdDownSuccessAllowed.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1200" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/3rdDownSuccessAllowed.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/3rdDownSuccessAllowed.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/3rdDownSuccessAllowed.png 1600w, https://blog.collegefootballdata.com/content/images/size/w2400/2025/07/3rdDownSuccessAllowed.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>Strong defenses force longer 3rd downs and keep conversion rates low. 
Teams above the trendline hold firm on 3rd down more often than expected. These defensive coordinators are earning their paycheck.</p><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block; text-align:center;" data-ad-layout="in-article" data-ad-format="fluid" data-ad-client="ca-pub-3674605305984905" data-ad-slot="7107763740"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html--><hr><h2 id="4-rushing-style-line-yards-vs-highlight-yards">4. Rushing Style: Line Yards vs. Highlight Yards</h2><p>These charts reflect rushing identity. Offenses may grind out consistent gains or rely on splash plays. Defenses may force teams into low-efficiency runs or give up explosive gains.</p><h3 id="offense-3">Offense</h3><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/LineYardsVsHighlightYards.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1500" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/LineYardsVsHighlightYards.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/LineYardsVsHighlightYards.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/LineYardsVsHighlightYards.png 1600w, https://blog.collegefootballdata.com/content/images/2025/07/LineYardsVsHighlightYards.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>Teams in the top right have both push and explosiveness. Upper-left teams are high-risk, high-reward. 
Lower-right teams grind it out but lack big-play potential.</p><h3 id="defense-3">Defense</h3><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/LineYardsVsHighlightYardsAllowed-1.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1500" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/LineYardsVsHighlightYardsAllowed-1.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/LineYardsVsHighlightYardsAllowed-1.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/LineYardsVsHighlightYardsAllowed-1.png 1600w, https://blog.collegefootballdata.com/content/images/2025/07/LineYardsVsHighlightYardsAllowed-1.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>Great defenses show up in the top right, limiting both consistent gains and explosive plays. Penn State was fantastic last season against the run. Struggling units trend toward the bottom right.</p><hr><h2 id="5-dominating-the-trenches">5. Dominating the Trenches</h2><p>Winning up front still wins games. This chart shows which teams are physically controlling the line of scrimmage on both sides of the ball.</p><h3 id="offense-vs-defense">Offense vs. 
Defense</h3><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/DominationInTheTrenches.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1500" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/DominationInTheTrenches.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/DominationInTheTrenches.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/DominationInTheTrenches.png 1600w, https://blog.collegefootballdata.com/content/images/2025/07/DominationInTheTrenches.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>This combo plot shows offensive line yards gained vs. defensive line yards allowed. Top-right teams are trench kings who win both sides of the battle. Bottom-left teams are getting bullied around on both sides of the ball and may need to rethink their physical identity.</p><hr><h2 id="6-recruiting-vs-nfl-draft-output">6. Recruiting vs. NFL Draft Output</h2><p>Having top talent is great. Developing it into draft picks is even better. 
This chart doesn&#x2019;t break down any game statistics or metrics, but it tells a powerful story.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/draft_recruiting-1.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1000" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/draft_recruiting-1.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/draft_recruiting-1.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/draft_recruiting-1.png 1600w, https://blog.collegefootballdata.com/content/images/2025/07/draft_recruiting-1.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>Some programs overachieve and produce pros from modest classes. Others underdeliver despite recruiting success. Michigan and Georgia stand out as elite in both talent acquisition and development. Texas A&amp;M and Clemson stand out for quite different reasons.</p><hr><h2 id="7-net-success-rate-by-half">7. Net Success Rate by Half</h2><p>Who gets better as the game goes on? 
This chart captures how teams perform before and after halftime, showing coaching adjustments, depth, and late-game execution.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/NetSuccessByHalf-1.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1200" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/NetSuccessByHalf-1.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/NetSuccessByHalf-1.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/NetSuccessByHalf-1.png 1600w, https://blog.collegefootballdata.com/content/images/size/w2400/2025/07/NetSuccessByHalf-1.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>Top-right teams are consistently good all game. Upper-left teams improve throughout the game. Lower-right teams start strong but fade.</p><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block; text-align:center;" data-ad-layout="in-article" data-ad-format="fluid" data-ad-client="ca-pub-3674605305984905" data-ad-slot="7107763740"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html--><hr><h2 id="8-average-starting-field-position">8. Average Starting Field Position</h2><p>It&#x2019;s not just about scoring, it&#x2019;s about controlling the field. Field position tells the hidden story of efficiency and control. These charts map where teams tend to spend their time on both sides of the ball.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/AveragePlayYardline.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1200" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/AveragePlayYardline.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/AveragePlayYardline.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/AveragePlayYardline.png 1600w, https://blog.collegefootballdata.com/content/images/size/w2400/2025/07/AveragePlayYardline.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>Top-right teams spend the bulk of their time on the opponent&apos;s side of the field when they have the ball and far away from their own end zone when they don&apos;t. Teams in the bottom left are usually pinned up against their own goal line, whether they have the ball or not.</p><hr><h2 id="9-field-goal-expected-points">9. Field Goal Expected Points</h2><p>This chart shows the <strong>expected point value</strong> of a field goal attempt by distance, based on outcomes for a <strong>replacement-level kicker</strong>. 
Short kicks (under 30 yards) are nearly automatic, but value drops quickly beyond 40 yards and attempts beyond 50 often return less than 2 points on average.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/07/fg_expected_points-1.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="2000" height="1000" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/07/fg_expected_points-1.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/07/fg_expected_points-1.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2025/07/fg_expected_points-1.png 1600w, https://blog.collegefootballdata.com/content/images/2025/07/fg_expected_points-1.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>It&#x2019;s a powerful reminder that not all &#x201C;field goal range&#x201D; is created equal. Coaches must weigh field position and down-distance against the <em>real</em> expected return, not just the hope of three. Kicking talent matters as well, as the curve for an above-average kicker will be more elongated than this one. For a below-average kicker, the curve will drop off much sooner and harsher.</p><hr><h2 id="10-returning-production-usage-vs-epa">10. Returning Production: Usage vs. EPA</h2><p>This one stands on its own. 
While we don&#x2019;t have a defensive counterpart, it&#x2019;s still a powerful preseason predictor.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/09/image-2.png" class="kg-image" alt="10 Data-Driven Visualizations That Will Change the Way You Watch College Football" loading="lazy" width="1395" height="1317" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/09/image-2.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/09/image-2.png 1000w, https://blog.collegefootballdata.com/content/images/2025/09/image-2.png 1395w" sizes="(min-width: 720px) 720px"></figure><p>We chart returning usage (volume) and total EPA (impact) from last season. Teams high in both are not just experienced, they&#x2019;re returning proven performers.</p><hr><p>All of the charts above are based on data from last season (2024), using opponent-adjusted metrics to give a clearer picture of team performance. As the 2025 season unfolds, I&#x2019;ll be posting updated versions of many of these visuals, along with some others, each week.</p><p>You can follow along on <a href="https://twitter.com/CFB_Data?ref=blog.collegefootballdata.com">Twitter/X</a> and <a href="https://bsky.app/profile/collegefootballdata.com?ref=blog.collegefootballdata.com">Bluesky</a>, where I share fresh charts, insights, and data stories throughout the season.</p><hr><h2 id="dig-deeper">Dig Deeper</h2><p>These charts are just a sample of what&#x2019;s possible with the tools at <a href="https://collegefootballdata.com/?ref=blog.collegefootballdata.com">CollegeFootballData.com</a>. 
Whether you&apos;re building models, prepping picks, or just watching smarter, we&#x2019;ve got the data to give you an edge.</p><ul><li>Explore more visuals and tools</li><li>Try the free or paid tiers of the API</li><li>Join <a href="https://discord.gg/YOURINVITE?ref=blog.collegefootballdata.com">Discord</a> to share your own charts and nerd out</li><li><a href="https://patreon.com/YOURPATREON?ref=blog.collegefootballdata.com">Subscribe on Patreon</a> to unlock more API calls and features</li></ul><hr><p><em>Built with curiosity. Powered by data.</em></p><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block" data-ad-format="autorelaxed" data-ad-client="ca-pub-3674605305984905" data-ad-slot="3470056234"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[From Model Training Pack to Predictions: How to Use Your Model]]></title><description><![CDATA[Got the Model Training Pack but not sure how to use it? This step-by-step guide shows you how to load your saved model, get the right data from the CFBD API, and start making predictions, plus how Tier 3 weekly CSV drops can save you hours.]]></description><link>https://blog.collegefootballdata.com/model-training-pack-how-to-make-predictions/</link><guid isPermaLink="false">689a184bfbdcf500010a5b7a</guid><category><![CDATA[model training pack]]></category><category><![CDATA[predictive modeling]]></category><category><![CDATA[cfb-analysis]]></category><category><![CDATA[college football]]></category><category><![CDATA[college-football]]></category><category><![CDATA[college football analytics]]></category><category><![CDATA[data science]]></category><category><![CDATA[machine learning]]></category><category><![CDATA[python]]></category><category><![CDATA[sports analytics]]></category><category><![CDATA[sports data science]]></category><category><![CDATA[sports modeling]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Tue, 12 Aug 2025 01:07:50 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1722080826167-4ea87368cbc5?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDV8fGxhcHRvcCUyMGNvZGluZyUyMGRhcmt8ZW58MHx8fHwxNzU0OTYwNzQ0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1722080826167-4ea87368cbc5?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDV8fGxhcHRvcCUyMGNvZGluZyUyMGRhcmt8ZW58MHx8fHwxNzU0OTYwNzQ0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" alt="From Model Training Pack to Predictions: How to Use Your Model"><p>Since launching the <strong>Model Training Pack</strong>, one of the top questions 
I&apos;ve heard is:</p><blockquote>&#x201C;I&#x2019;ve got the trained model&#x2026; now what?&#x201D;</blockquote><p>If that&#x2019;s you, this guide is for you.<br>We&#x2019;ll walk through:</p><ul><li>What kind of data your model needs to make predictions.</li><li>Where to get that data from the CollegeFootballData API.</li><li>How to load different types of models from the pack and run predictions.</li><li>How to skip the data prep entirely with <strong>Tier 3 weekly CSV drops</strong>.</li></ul><hr><h2 id="1-what-your-model-needs">1. What Your Model Needs</h2><p>The models in the training pack were all built on <strong>feature-ready CSV files</strong>.</p><p>That means:</p><ul><li>The CSV has <strong>the exact same columns</strong> as the training data.</li><li>The columns are in the <strong>same order</strong>.</li><li>The numbers are calculated the same way (e.g., using stats from games <em>before</em> the game you&#x2019;re trying to predict).</li><li>If your CSV doesn&#x2019;t match, your model will throw errors or give bad predictions.</li></ul><hr><h2 id="2-two-ways-to-get-the-data">2. Two Ways to Get the Data</h2><h3 id="option-1-build-it-yourself">Option 1: Build it yourself</h3><p>You can pull comparable data from the CollegeFootballData API.<br>Here are the endpoints you&#x2019;d use, at a high level:</p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:left">Feature Group</th>
<th style="text-align:left">API Endpoint</th>
<th style="text-align:left">Key Fields</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left">Opponent-adjusted team metrics</td>
<td style="text-align:left"><code>/wepa/team/season</code></td>
<td style="text-align:left"><code>epa.*</code>, <code>epa_allowed.*</code>, <code>successRate.*</code>, <code>successRateAllowed.*</code></td>
</tr>
<tr>
<td style="text-align:left">Advanced team metrics (non-opponent-adjusted)</td>
<td style="text-align:left"><code>/stats/season/advanced</code></td>
<td style="text-align:left"><code>havoc</code>, <code>fieldPosition</code>, <code>pointsPerOpportunity</code></td>
</tr>
<tr>
<td style="text-align:left">Game metadata</td>
<td style="text-align:left"><code>/games</code></td>
<td style="text-align:left"><code>week</code>, <code>homeTeam</code>, <code>awayTeam</code>, <code>neutralSite</code></td>
</tr>
<tr>
<td style="text-align:left">Betting data</td>
<td style="text-align:left"><code>/lines</code></td>
<td style="text-align:left"><code>lines[*].spread</code></td>
</tr>
<tr>
<td style="text-align:left">Talent composite</td>
<td style="text-align:left"><code>/talent</code></td>
<td style="text-align:left"><code>talent</code></td>
</tr>
</tbody>
</table>
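<p>As a rough sketch of how the pieces in the table above might be stitched together, the example below averages per-game stats from games before the prediction week and joins them onto the schedule for both sides. The toy data, the <code>epa</code> column, and the <code>attach</code> helper are illustrative assumptions, not the API&#x2019;s actual response shape or the training pack&#x2019;s schema.</p>

```python
import pandas as pd

PREDICTION_WEEK = 5  # the week you want to predict

# Toy per-game team stats standing in for API pulls (column names are assumptions).
team_games = pd.DataFrame({
    "team": ["Team A", "Team A", "Team A", "Team B", "Team B"],
    "week": [3, 4, 5, 3, 4],
    "epa": [0.30, 0.10, 0.90, -0.10, 0.10],
})

# Toy schedule for the prediction week, using the /games key fields from the table.
games = pd.DataFrame({
    "week": [5],
    "homeTeam": ["Team A"],
    "awayTeam": ["Team B"],
    "neutralSite": [False],
})

# Keep only games played before the prediction week, then average per team.
prior = team_games[team_games["week"].lt(PREDICTION_WEEK)]
to_date = prior.groupby("team", as_index=False)["epa"].mean()


def attach(games_df, stats_df, side):
    """Join one side's season-to-date stats onto each game, prefixed by side."""
    prefixed = stats_df.add_prefix(f"{side}_")
    merged = games_df.merge(
        prefixed, left_on=f"{side}Team", right_on=f"{side}_team", how="left"
    )
    return merged.drop(columns=f"{side}_team")


features = attach(attach(games, to_date, "home"), to_date, "away")
print(features)
```

<p>Filtering on <code>week</code> before aggregating is what keeps the Week 5 result out of the Week 5 features. The same pattern should extend to any other stat you pull.</p>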
<!--kg-card-end: markdown--><blockquote><strong>Note:</strong> If you build your own CSV, you&#x2019;ll need to join these datasets together and make sure your stats only include games before the prediction week.</blockquote><hr><h3 id="option-2-skip-the-work">Option 2: Skip the work</h3><p>Starting <strong>Week 5</strong>, Tier 3 patrons will get a <strong>weekly CSV</strong> that already:</p><ul><li>Has all the right columns.</li><li>Is in the correct order.</li><li>Uses stats from games before that week.</li></ul><p>With that file, you can go straight to loading your model and running predictions.</p><hr><h2 id="3-loading-and-predicting">3. Loading and Predicting</h2><p>Once you have your CSV, here&#x2019;s how to use it with each type of model from the pack.<br>Replace <code>&quot;week5_features.csv&quot;</code> with your file and <code>&quot;path/to/model&quot;</code> with your model file.</p><hr><h3 id="random-forest-regression-scikit-learn">Random Forest / Regression (scikit-learn)</h3><!--kg-card-begin: markdown--><pre><code class="language-python">import pandas as pd, joblib

# Load your features
X_live = pd.read_csv(&quot;week5_features.csv&quot;)

# Load your model
model = joblib.load(&quot;models/sklearn_rf.pkl&quot;)

# Make predictions
preds = model.predict(X_live)
X_live[&apos;prediction&apos;] = preds
</code></pre>
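<p>Because the model expects the same columns in the same order as training, it can help to align the CSV explicitly before calling <code>predict</code>. The sketch below is one way to do that with pandas. <code>align_features</code> and <code>training_columns</code> are hypothetical names, and the helper assumes every feature is numeric.</p>

```python
import pandas as pd


def align_features(df, training_columns):
    """Return df restricted to the training columns, in training order."""
    missing = [col for col in training_columns if col not in df.columns]
    if missing:
        # Fail loudly instead of letting the model see a shifted feature matrix.
        raise ValueError(f"Missing columns: {missing}")
    aligned = df[training_columns].copy()
    # Coerce strings to numbers; unparseable values become NaN so they are easy to spot.
    return aligned.apply(pd.to_numeric, errors="coerce")


# Hypothetical live data: wrong order, one extra column, numbers stored as strings.
live = pd.DataFrame({"talent": ["890.1", "745.3"], "epa": [0.21, 0.05], "notes": ["x", "y"]})
X_aligned = align_features(live, ["epa", "talent"])
print(X_aligned.dtypes)
```

<p>scikit-learn estimators that were fit on a DataFrame expose the training column names as <code>feature_names_in_</code>, so <code>list(model.feature_names_in_)</code> is one possible source for <code>training_columns</code>, assuming your model from the pack was trained that way.</p>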
<!--kg-card-end: markdown--><hr><h3 id="xgboost">XGBoost</h3><!--kg-card-begin: markdown--><pre><code class="language-python">import pandas as pd, joblib, xgboost as xgb  # xgboost must be importable to unpickle the model

# Load your features
X_live = pd.read_csv(&quot;week5_features.csv&quot;)

# Load your model
model = joblib.load(&quot;models/xgb_model.pkl&quot;)

# Make predictions
preds = model.predict_proba(X_live)[:, 1]
X_live[&apos;prediction&apos;] = preds</code></pre>
<!--kg-card-end: markdown--><hr><h3 id="fastai-tabular">fastai (tabular)</h3><!--kg-card-begin: markdown--><pre><code class="language-python">import pandas as pd
from fastai.tabular.all import load_learner

# Load your features
cat_features = [...] # list out categorical features
cont_features = [...] # list out continuous features

X_live = pd.read_csv(&quot;week5_features.csv&quot;)
X_live = X_live[cat_features + cont_features]


# Load your model
learn = load_learner(&quot;models/fastai_model.pkl&quot;)
dls = learn.dls.test_dl(X_live)

# Make predictions
batch_preds = learn.get_preds(dl=dls)[0].numpy()
X_live[&apos;prediction&apos;] = batch_preds
</code></pre>
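<p>fastai&#x2019;s tabular models encode categorical columns using the values seen during training, so it may be worth checking the live CSV for unseen values before predicting. The sketch below is a pandas-only check. <code>unseen_categories</code>, the <code>known</code> mapping, and the <code>conference</code> column are hypothetical and would need to reflect your own training data.</p>

```python
import pandas as pd


def unseen_categories(df, known):
    """Map each categorical column to the live values that were not in training."""
    problems = {}
    for col, allowed in known.items():
        extra = set(df[col].dropna().unique()) - set(allowed)
        if extra:
            problems[col] = sorted(extra)
    return problems


# Hypothetical example: one conference label never appeared in training.
live = pd.DataFrame({"conference": ["SEC", "Big 12", "Pac-12"]})
known = {"conference": ["SEC", "Big 12", "Big Ten"]}
print(unseen_categories(live, known))
```

<p>An empty result means every categorical value in the live file was seen in training. Anything it reports is a candidate for cleanup or remapping before you call <code>get_preds</code>.</p>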
<!--kg-card-end: markdown--><hr><h2 id="4-common-gotchas">4. Common Gotchas</h2><p><strong>Wrong column order</strong> &#x2192; reorder to match your training data before predicting.</p><p><strong>Missing columns</strong> &#x2192; make sure your CSV includes everything from training.</p><p><strong>Wrong data types</strong> &#x2192; convert strings to numbers where needed.</p><p><strong>fastai category mismatch</strong> &#x2192; your categories must match what the model was trained on.</p><hr><h2 id="5-the-fast-lane">5. The Fast Lane</h2><p>If you want to:</p><ul><li>Avoid merging multiple datasets,</li><li>Skip figuring out lag logic, and</li><li>Be sure your columns match perfectly&#x2026;</li></ul><p>&#x2026;join <strong>Tier 3</strong> on Patreon.<br>Every week starting in Week 5, you&#x2019;ll get a CSV that&#x2019;s ready to feed directly into your model.</p><p><strong><a href="https://www.patreon.com/collegefootballdata?ref=blog.collegefootballdata.com">Join Tier 3 here &#x2192;</a></strong></p><hr><h2 id="6-your-next-steps">6. Your Next Steps</h2><ol><li>Pick one of your models from the pack.</li><li>Grab a CSV, either your own or from the pack, and run the code above to test.</li><li>Get your hands on current-season features (DIY or Tier 3) and start making real predictions.</li></ol><hr><hr><p><strong>Bottom line:</strong><br>If you can build the CSV yourself, great. You now know exactly what your model needs.<br>If you want to skip the grunt work and start predicting in minutes, <a href="https://patreon.com/collegefootballdata?ref=blog.collegefootballdata.com">Tier 3&#x2019;s weekly CSV drops</a> are your fastest path.</p>]]></content:encoded></item><item><title><![CDATA[🧠 10 Tips for Building a College Football Predictive Model Without the Pain]]></title><description><![CDATA[Want to build a college football predictive model without getting buried in data cleanup or modeling dead ends? 
Here are 10 practical tips to get started, plus how to shortcut the process using the CFBD Starter and Model Training Packs.]]></description><link>https://blog.collegefootballdata.com/college-football-modeling-tips/</link><guid isPermaLink="false">688189e8647cae0001e37c5d</guid><category><![CDATA[college football analytics]]></category><category><![CDATA[sports modeling]]></category><category><![CDATA[machine learning]]></category><category><![CDATA[predictive modeling]]></category><category><![CDATA[starter pack]]></category><category><![CDATA[model training pack]]></category><category><![CDATA[sports data science]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Mon, 04 Aug 2025 14:00:01 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1566577739112-5180d4bf9390?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDExfHxmb290YmFsbHxlbnwwfHx8fDE3NTQyNjUyMjh8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1566577739112-5180d4bf9390?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDExfHxmb290YmFsbHxlbnwwfHx8fDE3NTQyNjUyMjh8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" alt="&#x1F9E0; 10 Tips for Building a College Football Predictive Model Without the Pain"><p>So you want to build a college football predictive model. Maybe you&apos;re tired of guessing spreads, or you want to enter a pick&apos;em contest with actual math behind your picks. Great news: you&apos;re not alone and you&apos;re definitely not crazy.</p><p>But here&apos;s the catch.</p><p>Most beginners hit a wall not because they can&apos;t model, but because they can&apos;t get to the modeling stage at all. Data is messy. College football is chaotic. And feature selection? 
That&#x2019;s a minefield.</p><p>This post will walk you through 10 hard-earned tips for building your first (or better) college football model, faster, cleaner, and smarter. Whether you&apos;re a student learning sports analytics or a fan trying to sharpen your edge, these tips are for you.</p><p>Let&#x2019;s dive in.</p><hr><h2 id="1-start-with-clean-structured-data">1. Start With Clean, Structured Data</h2><p>College football data is notoriously inconsistent across sources. Team names vary, game records are incomplete, and drive data is messy. Cleaning this yourself can take hours or even days.</p><p><strong>Skip that headache.</strong></p><p>Start with a clean dataset like the <a href="https://collegefootballdata.gumroad.com/l/starter-pack?ref=blog.collegefootballdata.com">College Football Starter Pack</a>, which includes structured CSVs for games, drives, plays, advanced stats, and team metadata. It&apos;s all ready for analysis or modeling.</p><p>&#x1F4CC; <em>Bonus: No API calls or rate limits required.</em></p><hr><h2 id="2-wait-a-few-weeks-into-the-season">2. Wait a Few Weeks Into the Season</h2><p>Early-season games (especially Weeks 0&#x2013;4) are notoriously unpredictable. There&#x2019;s simply not enough data to go on and teams are still figuring things out. 
Sure, you <em>can</em> model these games, but doing it well usually requires a separate approach tailored for low-information scenarios.</p><p>For most use cases, it&#x2019;s better to wait.</p><p>Start your training set in <strong>Week 5</strong>, when team identities begin to solidify, metrics stabilize, and opponent strength becomes more meaningful.</p><p>That&#x2019;s the exact approach I use in the <a href="https://collegefootballdata.gumroad.com/l/model-training-pack?ref=blog.collegefootballdata.com">Model Training Pack</a>, which includes a full training dataset filtered for Week 5 and beyond.</p><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block; text-align:center;" data-ad-layout="in-article" data-ad-format="fluid" data-ad-client="ca-pub-3674605305984905" data-ad-slot="7107763740"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html--><hr><h2 id="3-opponent-adjustment-isn%E2%80%99t-optional">3. Opponent Adjustment Isn&#x2019;t Optional</h2><p>Raw stats lie.</p><p>Team A&#x2019;s EPA might look elite until you realize they played three bottom-20 defenses. If you&apos;re not adjusting for opponent strength, you&apos;re modeling schedule, not skill.</p><p>Use opponent-adjusted metrics like:</p><ul><li>Adjusted EPA per play metrics</li><li>Adjusted success rates</li><li>Adjusted rushing stats like adjusted line yards</li></ul><p>These are included and ready-to-use in the Model Training Pack. No need to build your own adjustment pipeline (unless you really want to).</p><hr><h2 id="4-margin-first-win-probability-second">4. Margin First, Win Probability Second</h2><p>A lot of beginners jump straight to win/loss prediction. That&#x2019;s fine&#x2014;but you lose granularity. Modeling final score margin gives you much more:</p><p>&#x2705; Win probability<br>&#x2705; Cover probability<br>&#x2705; Total predictions<br>&#x2705; Confidence rankings</p><p>Start by modeling score margin as a regression task, then derive win/loss from it. More signal, more flexibility.</p><hr><h2 id="5-use-features-that-actually-predict-outcomes">5. Use Features That Actually Predict Outcomes</h2><p>More features &#x2260; better model. You want features that have signal, not just noise.</p><p>Some high-value features:</p><ul><li>Opponent-adjusted efficiency stats</li><li>Team talent composite</li><li>Run/pass ratio</li><li>Havoc metrics</li><li>Explosive play rate</li></ul><p>Both the Starter Pack and Model Pack highlight the best ones and show how to use them in sample notebooks.</p><hr><h2 id="6-talent-isn%E2%80%99t-everything-but-it-matters">6. Talent Isn&#x2019;t Everything, But It Matters</h2><p>Talent composite rankings (from 247Sports or similar) are sticky over time. 
They don&#x2019;t predict game-to-game variance, but they help explain <em>why</em> certain teams outperform models built only on stats.</p><p>Include talent as a prior, especially early in the season.</p><p>We&#x2019;ve already merged talent data into the Model Training Pack so you don&#x2019;t have to track it down or clean it yourself.</p><hr><h2 id="7-don%E2%80%99t-skip-cross-validation">7. Don&#x2019;t Skip Cross-Validation</h2><p>It&#x2019;s tempting to train on one season and test on another, but that won&#x2019;t catch overfitting. Instead:</p><ul><li>Use <strong>k-fold cross-validation</strong></li><li>Shuffle by week or game ID</li><li>Be mindful of data leakage (especially with team-specific stats)</li></ul><p>Even basic models benefit from good validation hygiene.</p><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block; text-align:center;" data-ad-layout="in-article" data-ad-format="fluid" data-ad-client="ca-pub-3674605305984905" data-ad-slot="7107763740"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html--><hr><h2 id="8-build-a-baseline-before-you-get-fancy">8. Build a Baseline Before You Get Fancy</h2><p>Don&#x2019;t jump straight to neural nets or ensemble methods.</p><p>Start with:</p><ul><li>Linear regression for margin</li><li>Logistic regression for win probability</li><li>Decision trees for feature importance</li></ul><p>Once you&#x2019;ve got a strong baseline, experiment with:</p><ul><li>XGBoost</li><li>Random Forest</li><li>Tabular neural networks (like fastai)</li></ul><p>The Model Training Pack includes working examples of each so you can see how models evolve.</p><hr><h2 id="9-visualize-your-errors">9. Visualize Your Errors</h2><p>Don&#x2019;t just trust metrics like MAE or RMSE. Visualize:</p><ul><li>Predicted vs. actual margin</li><li>Residuals by team</li><li>Over/under predictions by spread</li></ul><p>You&#x2019;ll catch trends you&#x2019;d never spot in raw numbers (e.g., your model consistently underrates service academies or overweights garbage time stats).</p><p>All notebooks included in the Model Training Pack feature error visualization examples to help you troubleshoot fast.</p><hr><h2 id="10-use-prebuilt-tools-to-learn-faster">10. Use Prebuilt Tools to Learn Faster</h2><p>The biggest bottleneck in building a model isn&#x2019;t modeling. It&#x2019;s everything before that:</p><ul><li>Data cleaning</li><li>Feature selection</li><li>Normalization</li><li>Debugging</li></ul><p>The <a href="https://collegefootballdata.gumroad.com/l/starter-pack?ref=blog.collegefootballdata.com">Starter Pack</a> and <a href="https://collegefootballdata.gumroad.com/l/model-training-pack?ref=blog.collegefootballdata.com">Model Training Pack</a> are designed to eliminate those barriers so you can focus on building, testing, and improving your model.</p><p>No gatekeeping. No fluff. 
Just clean data and working code examples.</p><hr><h2 id="%F0%9F%9A%80-ready-to-get-started">&#x1F680; Ready to Get Started?</h2><p>Here&#x2019;s how to level up your college football modeling journey today:</p><p>&#x1F3AF; <a href="https://collegefootballdata.gumroad.com/l/starter-pack?ref=blog.collegefootballdata.com">Grab the Starter Pack</a> - Ideal for exploring and building your first dashboard or basic model.<br>&#x1F4CA; <a href="https://collegefootballdata.gumroad.com/l/model-training-pack?ref=blog.collegefootballdata.com">Grab the Model Training Pack</a> - Perfect for jumpstarting predictive modeling with ready-to-use training data and sample models.</p><p>Together, they give you everything you need, from structured data to proven code, so you can focus on what matters: building smarter models.</p><hr><h2 id="%F0%9F%93%AC-want-more-tips-like-this">&#x1F4EC; Want More Tips Like This?</h2><p>Follow <a href="https://twitter.com/CFB_Data?ref=blog.collegefootballdata.com">@CFB_Data</a> on Twitter, <a href="https://bsky.app/profile/collegefootballdata.com?ref=blog.collegefootballdata.com">@collegefootballdata.com</a> on Bluesky, and <a href="https://collegefootballdata.com/?ref=blog.collegefootballdata.com">CollegeFootballData.com</a> for more guides, tools, and insights all season long.</p><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block; text-align:center;" data-ad-layout="in-article" data-ad-format="fluid" data-ad-client="ca-pub-3674605305984905" data-ad-slot="7107763740"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[So You Got the Starter Pack. Now What?]]></title><description><![CDATA[Just downloaded the CFBD Starter Pack? Here's how to go beyond the CSVs using the official Python client to pull live and recent data from the CollegeFootballData API.]]></description><link>https://blog.collegefootballdata.com/starter-pack-next-steps/</link><guid isPermaLink="false">687809702a659a00015cb5fd</guid><category><![CDATA[cfbd]]></category><category><![CDATA[college football]]></category><category><![CDATA[python]]></category><category><![CDATA[sports analytics]]></category><category><![CDATA[data science]]></category><category><![CDATA[api]]></category><category><![CDATA[starter pack]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Thu, 17 Jul 2025 15:00:04 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1543000968-1fe3fd3b714e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDE2fHxyb2NrZXR8ZW58MHx8fHwxNzUyNjk3MjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1543000968-1fe3fd3b714e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDE2fHxyb2NrZXR8ZW58MHx8fHwxNzUyNjk3MjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" alt="So You Got the Starter Pack. Now What?"><p>First off, thanks for picking up the <strong>CFBD Starter Pack</strong>! It gives you cleaned, historical data across several seasons and is perfect for building models, dashboards, and analytics workflows.</p><p>&#x1F449; Don&#x2019;t have the Starter Pack yet? <a href="https://collegefootballdata.gumroad.com/l/starter-pack?ref=blog.collegefootballdata.com">Grab it now</a> and follow along.</p><p>But what if you want to pull in more recent or live data? 
That&#x2019;s where the <a href="https://api.collegefootballdata.com/?ref=blog.collegefootballdata.com">CollegeFootballData API</a> and <a href="https://github.com/CFBD/cfbd-python?ref=blog.collegefootballdata.com">official Python client</a> come in.</p><p>Let&#x2019;s walk through how to set it up and fetch new data.</p><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block" data-ad-format="autorelaxed" data-ad-client="ca-pub-3674605305984905" data-ad-slot="3470056234"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html--><hr><h2 id="%F0%9F%94%A7-step-1-install-the-python-client">&#x1F527; Step 1: Install the Python Client</h2><p>Install the package:</p><pre><code class="language-bash">pip install cfbd</code></pre><hr><h2 id="%F0%9F%94%90-step-2-set-your-api-key">&#x1F510; Step 2: Set Your API Key</h2><p>You&#x2019;ll need an API key (free or Patreon tier) from <a href="https://collegefootballdata.com/key?ref=blog.collegefootballdata.com">the CFBD website</a>. Once you have it, set it as an environment variable:</p><pre><code class="language-bash">export BEARER_TOKEN=&quot;your_api_key_here&quot;</code></pre><p>Then set up the configuration in your Python code:</p><pre><code class="language-python">import cfbd
import os

configuration = cfbd.Configuration(
    access_token=os.environ[&quot;BEARER_TOKEN&quot;]
)</code></pre><hr><h2 id="%F0%9F%9A%80-step-3-fetch-data-using-an-api-client">&#x1F680; Step 3: Fetch Data Using an API Client</h2><p>The Python client uses context managers to handle the API session. Here&apos;s how to fetch advanced game stats for a single team:</p><pre><code class="language-python">with cfbd.ApiClient(configuration) as api_client:
    api_instance = cfbd.StatsApi(api_client)

    # Example: get advanced game stats for Michigan in 2023
    response = api_instance.get_advanced_game_stats(
        year=2023,
        team=&quot;Michigan&quot;
    )

    print(response)
</code></pre><p>This same pattern works for all endpoints.</p><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block" data-ad-format="autorelaxed" data-ad-client="ca-pub-3674605305984905" data-ad-slot="3470056234"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html--><hr><h2 id="%F0%9F%93%98-examples-you-can-try">&#x1F4D8; Examples You Can Try</h2><p>Here are a few practical snippets to get started:</p><p><strong>Recent Games</strong></p><pre><code class="language-python">with cfbd.ApiClient(configuration) as api_client:
    games_api = cfbd.GamesApi(api_client)
    games = games_api.get_games(year=2024, week=13)
    for g in games:
        print(f&quot;{g.away_team} at {g.home_team}: {g.away_points}-{g.home_points}&quot;)
</code></pre><p><strong>Team Box Scores</strong></p><pre><code class="language-python">with cfbd.ApiClient(configuration) as api_client:
    games_api = cfbd.GamesApi(api_client)
    box = games_api.get_game_team_stats(year=2024, week=13)
    for game in box:
        print(game)
</code></pre><p><strong>Historical Betting Lines</strong></p><pre><code class="language-python">with cfbd.ApiClient(configuration) as api_client:
    betting_api = cfbd.BettingApi(api_client)
    games = betting_api.get_lines(year=2024, week=13)
    for game in games:
        for line in game.lines:
            print(f&quot;{game.away_team} @ {game.home_team}: {line.formatted_spread} ({line.provider})&quot;)
</code></pre><hr><h2 id="%F0%9F%A7%A0-combine-with-the-starter-pack">&#x1F9E0; Combine with the Starter Pack</h2><p>The Starter Pack has historical EPA, recruiting, and drive/play-level data. You can extend it by:</p><ul><li>Merging recent API data with your historical CSVs</li><li>Running your models on up-to-date weekly metrics</li><li>Building dashboards that combine multiple types of data</li></ul><hr><h2 id="%F0%9F%9B%91-watch-your-limits">&#x1F6D1; Watch Your Limits</h2><p>If you&#x2019;re using the Free Tier, you&#x2019;ll be capped at 1,000 calls/month. Consider bumping to a <a href="https://www.patreon.com/c/collegefootballdata?ref=blog.collegefootballdata.com">Patreon plan</a> for more access (and goodies like weather, advanced metrics, and the GraphQL API).</p><p>You can check your remaining calls at any time, either via the <code>X-CallLimit-Remaining</code> HTTP header returned with all responses or via the <code>info</code> endpoint (which does not count against your limit):</p><pre><code class="language-python">with cfbd.ApiClient(configuration) as api_client:
    api_instance = cfbd.InfoApi(api_client)
    api_response = api_instance.get_user_info()
    
    print(api_response)</code></pre><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block" data-ad-format="autorelaxed" data-ad-client="ca-pub-3674605305984905" data-ad-slot="3470056234"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html--><hr><h2 id="%F0%9F%92%AC-questions-or-feedback">&#x1F4AC; Questions or Feedback?</h2><p>Join the community on <a rel="noopener">Discord</a> or check out the <a href="https://api.collegefootballdata.com/?ref=blog.collegefootballdata.com">interactive API docs</a> to explore every endpoint.</p>]]></content:encoded></item><item><title><![CDATA[Talking Tech: Building a March Madness Model using XGBoost]]></title><description><![CDATA[In this edition of Talking Tech, we'll be building our first basketball model. Specifically, we'll use XGBoost to predict games for March Madness.]]></description><link>https://blog.collegefootballdata.com/talking-tech-march-madness-xgboost/</link><guid isPermaLink="false">64d4f2f800396c00013fe1ca</guid><category><![CDATA[Talking Tech]]></category><category><![CDATA[Programming]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Sat, 15 Mar 2025 01:59:48 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1612048405411-f71a1b9d3e8a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDY0fHxncmFkaWVudHxlbnwwfHx8fDE3NDE5OTgxMzd8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1612048405411-f71a1b9d3e8a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDY0fHxncmFkaWVudHxlbnwwfHx8fDE3NDE5OTgxMzd8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" alt="Talking Tech: Building a March Madness Model using XGBoost"><p>In one of the earliest iterations of Talking Tech, we built <a href="https://blog.collegefootballdata.com/talking-tech-predicting-play-calls-using-a-random-forest-classifier">a random forest classifier to predict play calls</a> for college football. 
In another edition, we used my personally preferred method of <a href="https://blog.collegefootballdata.com/talking-tech-building-an-artifical-neural-network-to/">building an artificial neural network</a> to predict college football games. In this edition, we&apos;re going to dive into another type of machine learning method. As in the earlier walkthrough using a random forest classifier, we&apos;ll look at a type of <a href="https://en.wikipedia.org/wiki/Ensemble_learning?ref=blog.collegefootballdata.com">ensemble method</a>. An ensemble method builds numerous disparate models and relies on strength through sheer numbers. In the random forest method, a multitude of decision trees is generated and their outputs are combined into the final prediction. In this post, we&apos;re going to use an ensemble method that&apos;s a little less, well, random.</p><!--kg-card-begin: markdown--><h1 id="gradientboosting">Gradient Boosting</h1>
<p><a href="https://en.wikipedia.org/wiki/Gradient_boosting?ref=blog.collegefootballdata.com">Gradient boosting</a> is similar in many ways to random forest methods. Both are ensemble models. Both typically make use of decision trees. Both can also be used for either classification or regression. So what sets them apart? If you remember, random forest methods typically generate a multitude of decision trees at random, counting on the erroneous trees to cancel each other out, more or less, while the stronger trees rise to the top. Gradient boosting, on the other hand, will start with one decision tree, evaluate it, and then use the resulting error to generate another decision tree that makes the combined prediction incrementally more accurate. Rinse and repeat.</p>
<p>Eventually, this results in a multitude of trees all chained together, each one using the insights from its predecessors to make itself more accurate. But you don&apos;t simply discard the older models. All generated trees make up the final model, which makes this another ensemble method. You can see how this method might perform much better than random forests. In fact, gradient boosted decision tree models are usually some of the top performing in Kaggle competitions and the like.</p>
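<p>To make that loop concrete, here is a minimal hand-rolled sketch of the idea, not a description of what XGBoost does internally. It assumes squared-error loss, so the &quot;error&quot; each new tree learns from is simply the residual, and it uses synthetic data with scikit-learn&apos;s <code>DecisionTreeRegressor</code>. The names <code>learning_rate</code>, <code>n_trees</code>, and <code>errors</code> are illustrative choices, not taken from any library.</p>
<pre class="line-numbers"><code class="lang-python">
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data purely for illustration (not basketball data)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1  # fraction of each new tree added to the ensemble
n_trees = 50

# Start from a constant prediction: the mean of the target
prediction = np.full(y.shape, y.mean())
trees = []
errors = []

for _ in range(n_trees):
    residuals = y - prediction  # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)  # the next tree learns the remaining error
    prediction = prediction + learning_rate * tree.predict(X)
    trees.append(tree)  # earlier trees are kept, not discarded
    errors.append(np.mean((y - prediction) ** 2))  # training MSE so far

print(errors[0], errors[-1])
</code></pre>
<p>Every tree stays in <code>trees</code>, and the final prediction is the starting mean plus the scaled sum of all of their outputs, which is why this still counts as an ensemble. Production libraries such as XGBoost layer regularization and other refinements on top of this basic idea.</p>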
<p>When it comes to gradient boosting in Python, there are two libraries with which I am familiar: <a href="https://github.com/dmlc/xgboost?ref=blog.collegefootballdata.com">XGBoost</a> and <a href="https://github.com/microsoft/LightGBM?ref=blog.collegefootballdata.com">LightGBM</a>. While both libraries are solid options, we&apos;re going to be using XGBoost in this post. However, I do recommend going back and giving LightGBM a look at some point.</p>
<!--kg-card-end: markdown--><hr><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block; text-align:center;" data-ad-layout="in-article" data-ad-format="fluid" data-ad-client="ca-pub-3674605305984905" data-ad-slot="7107763740"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html--><hr><h1 id="gathering-data">Gathering Data</h1><p>We will be using the CBBD Python library to pull data from the CollegeBasketballData.com REST API. In total, we will be using these packages: <code>cbbd</code>, <code>pandas</code>, <code>sklearn</code>, <code>xgboost</code>. Be sure to have those all installed via <code>pip</code> (note that <code>sklearn</code> is published on PyPI as <code>scikit-learn</code>) or however you manage your Python dependencies. We will start by importing everything we need up front. We will also set up our CBBD API key, so enter yours into the placeholder below. If you need a key, you can acquire one from the <a href="https://collegebasketballdata.com/keys?ref=blog.collegefootballdata.com">main CBBD site</a>.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
import cbbd
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

configuration = cbbd.Configuration(
    access_token = &apos;your_api_key_here&apos;
)
</code></pre><!--kg-card-end: html--><p>I should also note that we will be making a total of 22 API calls (one games request and one stats request for each of 11 seasons), well within the free tier of 1,000 monthly calls provided by CBBD and leaving room to rerun this model many times over.</p><p>Next, we will compile all NCAA tournament games from 2014 to 2024. You can go further back if you desire. Note that we are passing in a parameter of <code>tournament=&apos;NCAA&apos;</code>. This allows us to conveniently query all tournament games for a given year.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
games = []
with cbbd.ApiClient(configuration) as api_client:
    games_api = cbbd.GamesApi(api_client)
    for season in range(2024, 2013, -1):
        results = games_api.get_games(season=season, tournament=&apos;NCAA&apos;)
        games += results
len(games)
</code></pre><!--kg-card-end: html--><p>That returned 686 games. Let&apos;s see what data is included in a game record.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
games[0]
</code></pre><!--kg-card-end: html--><pre><code class="language-bash">GameInfo(id=12010, source_id=&apos;401638579&apos;, season_label=&apos;20232024&apos;, season=2024, season_type=&lt;SeasonType.POSTSEASON: &apos;postseason&apos;&gt;, start_date=datetime.datetime(2024, 3, 19, 18, 40, tzinfo=datetime.timezone.utc), start_time_tbd=False, neutral_site=True, conference_game=False, game_type=&apos;TRNMNT&apos;, tournament=&apos;NCAA&apos;, game_notes=&quot;Men&apos;s Basketball Championship - West Region - First Four&quot;, status=&lt;GameStatus.FINAL: &apos;final&apos;&gt;, attendance=0, home_team_id=114, home_team=&apos;Howard&apos;, home_conference_id=18, home_conference=&apos;MEAC&apos;, home_seed=16, home_points=68, home_period_points=[27, 41], home_winner=False, away_team_id=341, away_team=&apos;Wagner&apos;, away_conference_id=21, away_conference=&apos;NEC&apos;, away_seed=16, away_points=71, away_period_points=[38, 33], away_winner=True, excitement=4.7, venue_id=76, venue=&apos;UD Arena&apos;, city=&apos;Dayton&apos;, state=&apos;OH&apos;)</code></pre><p>Now we need to load up some stats to incorporate as features into our model. We will use the CBBD Stats API to query for team season stats for the same years for which we queried tournament game data. Note that we are passing in a <code>season_type=&apos;regular&apos;</code> parameter. THIS IS IMPORTANT. We want to grab ONLY statistics for the regular season, in other words, stats that were available prior to the start of the tournament in a given year. Failing to pass in the filter will result in a model that is not <em>predictive</em>, but <em>retrodictive</em>. This is a VERY common mistake: including data and statistics that were not available at the time of the games you are seeking to predict.</p><p>Anyway, run the code below to grab team season stats.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
stats = []
with cbbd.ApiClient(configuration) as api_client:
    stats_api = cbbd.StatsApi(api_client)
    for season in range(2024, 2013, -1):
        results = stats_api.get_team_season_stats(season=season, season_type=&apos;regular&apos;)
        stats += results
len(stats)
</code></pre><!--kg-card-end: html--><p>And we&apos;ll also check out the contents of the stats records.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
stats[0]
</code></pre><!--kg-card-end: html--><pre><code>TeamSeasonStats(season=2024, season_label=&apos;20232024&apos;, team_id=1, team=&apos;Abilene Christian&apos;, conference=&apos;WAC&apos;, games=32, wins=15, losses=17, total_minutes=1325, pace=61.1, team_stats=TeamSeasonUnitStats(field_goals=TeamSeasonUnitStatsFieldGoals(pct=43.2, attempted=1877, made=811), two_point_field_goals=TeamSeasonUnitStatsFieldGoals(pct=46.4, attempted=1393, made=646), three_point_field_goals=TeamSeasonUnitStatsFieldGoals(pct=34.1, attempted=484, made=165), free_throws=TeamSeasonUnitStatsFieldGoals(pct=73.1, attempted=729, made=533), rebounds=TeamSeasonUnitStatsRebounds(total=1070, defensive=756, offensive=314), turnovers=TeamSeasonUnitStatsTurnovers(team_total=12, total=404), fouls=TeamSeasonUnitStatsFouls(flagrant=0, technical=6, total=635), points=TeamSeasonUnitStatsPoints(fast_break=319, off_turnovers=466, in_paint=1138, total=2320), four_factors=TeamSeasonUnitStatsFourFactors(free_throw_rate=38.8, offensive_rebound_pct=29.3, turnover_ratio=0.2, effective_field_goal_pct=47.6), assists=405, blocks=65, steals=253, possessions=2028, rating=114.4, true_shooting=52.8), opponent_stats=TeamSeasonUnitStats(field_goals=TeamSeasonUnitStatsFieldGoals(pct=46.5, attempted=1792, made=833), two_point_field_goals=TeamSeasonUnitStatsFieldGoals(pct=52.6, attempted=1227, made=645), three_point_field_goals=TeamSeasonUnitStatsFieldGoals(pct=33.3, attempted=565, made=188), free_throws=TeamSeasonUnitStatsFieldGoals(pct=68.7, attempted=723, made=497), rebounds=TeamSeasonUnitStatsRebounds(total=1171, defensive=859, offensive=312), turnovers=TeamSeasonUnitStatsTurnovers(team_total=23, total=478), fouls=TeamSeasonUnitStatsFouls(flagrant=0, technical=6, total=619), points=TeamSeasonUnitStatsPoints(fast_break=316, off_turnovers=411, in_paint=1120, total=2351), four_factors=TeamSeasonUnitStatsFourFactors(free_throw_rate=40.3, offensive_rebound_pct=26.6, turnover_ratio=0.2, effective_field_goal_pct=51.7), assists=388, 
blocks=108, steals=206, possessions=2023, rating=116.2, true_shooting=55.7))</code></pre><p>That&apos;s a lot of stats! The final step here is to match the team statistics with each game record and put those into a data frame. We are going to create a list of <code>dict</code> objects to combine this data, which will be pretty easy to load up into <code>pandas</code>.</p><p>In the code below, we are converting each game object into a <code>dict</code>, looking up the season stats for the home and away teams, and then loading up data points from each stats object into the <code>dict</code>. You can completely change these up if you desire or add different stats. I am not trying to build the most comprehensive or accurate model in this exercise. I am merely trying to give you a good idea of how to combine the data and get it into the correct format.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
records = []
for game in games:
    record = game.to_dict()
    home_stats = [stat for stat in stats if stat.team_id == game.home_team_id and stat.season == game.season][0]
    away_stats = [stat for stat in stats if stat.team_id == game.away_team_id and stat.season == game.season][0]
    record[&apos;home_pace&apos;] = home_stats.pace
    record[&apos;home_o_rating&apos;] = home_stats.team_stats.rating
    record[&apos;home_d_rating&apos;] = home_stats.opponent_stats.rating
    record[&apos;home_free_throw_rate&apos;] = home_stats.team_stats.four_factors.free_throw_rate
    record[&apos;home_offensive_rebound_rate&apos;] = home_stats.team_stats.four_factors.offensive_rebound_pct
    record[&apos;home_turnover_ratio&apos;] = home_stats.team_stats.four_factors.turnover_ratio
    record[&apos;home_efg&apos;] = home_stats.team_stats.four_factors.effective_field_goal_pct
    record[&apos;home_free_throw_rate_allowed&apos;] = home_stats.opponent_stats.four_factors.free_throw_rate
    record[&apos;home_offensive_rebound_rate_allowed&apos;] = home_stats.opponent_stats.four_factors.offensive_rebound_pct
    record[&apos;home_turnover_ratio_forced&apos;] = home_stats.opponent_stats.four_factors.turnover_ratio
    record[&apos;home_efg_allowed&apos;] = home_stats.opponent_stats.four_factors.effective_field_goal_pct
    record[&apos;away_pace&apos;] = away_stats.pace
    record[&apos;away_o_rating&apos;] = away_stats.team_stats.rating
    record[&apos;away_d_rating&apos;] = away_stats.opponent_stats.rating
    record[&apos;away_free_throw_rate&apos;] = away_stats.team_stats.four_factors.free_throw_rate
    record[&apos;away_offensive_rebound_rate&apos;] = away_stats.team_stats.four_factors.offensive_rebound_pct
    record[&apos;away_turnover_ratio&apos;] = away_stats.team_stats.four_factors.turnover_ratio
    record[&apos;away_efg&apos;] = away_stats.team_stats.four_factors.effective_field_goal_pct
    record[&apos;away_free_throw_rate_allowed&apos;] = away_stats.opponent_stats.four_factors.free_throw_rate
    record[&apos;away_offensive_rebound_rate_allowed&apos;] = away_stats.opponent_stats.four_factors.offensive_rebound_pct
    record[&apos;away_turnover_ratio_forced&apos;] = away_stats.opponent_stats.four_factors.turnover_ratio
    record[&apos;away_efg_allowed&apos;] = away_stats.opponent_stats.four_factors.effective_field_goal_pct
    records.append(record)
len(records)
</code></pre><!--kg-card-end: html--><p>All that&apos;s left to do is load this into a data frame. Once loaded up, I am going to compute a new column for the final score margin based on the home and away score columns.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
df = pd.DataFrame(records)
df[&apos;margin&apos;] = df.homePoints - df.awayPoints
df.head()
</code></pre><!--kg-card-end: html--><!--kg-card-begin: html--><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>id</th>
      <th>sourceId</th>
      <th>seasonLabel</th>
      <th>season</th>
      <th>seasonType</th>
      <th>startDate</th>
      <th>startTimeTbd</th>
      <th>neutralSite</th>
      <th>conferenceGame</th>
      <th>gameType</th>
      <th>...</th>
      <th>away_d_rating</th>
      <th>away_free_throw_rate</th>
      <th>away_offensive_rebound_rate</th>
      <th>away_turnover_ratio</th>
      <th>away_efg</th>
      <th>away_free_throw_rate_allowed</th>
      <th>away_offensive_rebound_rate_allowed</th>
      <th>away_turnover_ratio_forced</th>
      <th>away_efg_allowed</th>
      <th>margin</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>12010</td>
      <td>401638579</td>
      <td>20232024</td>
      <td>2024</td>
      <td>SeasonType.POSTSEASON</td>
      <td>2024-03-19 18:40:00+00:00</td>
      <td>False</td>
      <td>True</td>
      <td>False</td>
      <td>TRNMNT</td>
      <td>...</td>
      <td>98.3</td>
      <td>26.2</td>
      <td>31.4</td>
      <td>0.2</td>
      <td>45.4</td>
      <td>29.1</td>
      <td>25.4</td>
      <td>0.2</td>
      <td>47.9</td>
      <td>-3</td>
    </tr>
    <tr>
      <th>1</th>
      <td>12009</td>
      <td>401638580</td>
      <td>20232024</td>
      <td>2024</td>
      <td>SeasonType.POSTSEASON</td>
      <td>2024-03-19 21:10:00+00:00</td>
      <td>False</td>
      <td>True</td>
      <td>False</td>
      <td>TRNMNT</td>
      <td>...</td>
      <td>102.0</td>
      <td>32.4</td>
      <td>23.5</td>
      <td>0.2</td>
      <td>55.4</td>
      <td>31.4</td>
      <td>28.4</td>
      <td>0.2</td>
      <td>48.8</td>
      <td>-25</td>
    </tr>
    <tr>
      <th>2</th>
      <td>12023</td>
      <td>401638581</td>
      <td>20232024</td>
      <td>2024</td>
      <td>SeasonType.POSTSEASON</td>
      <td>2024-03-20 18:40:00+00:00</td>
      <td>False</td>
      <td>True</td>
      <td>False</td>
      <td>TRNMNT</td>
      <td>...</td>
      <td>114.5</td>
      <td>39.1</td>
      <td>29.7</td>
      <td>0.2</td>
      <td>48.9</td>
      <td>32.6</td>
      <td>32.2</td>
      <td>0.2</td>
      <td>49.0</td>
      <td>-7</td>
    </tr>
    <tr>
      <th>3</th>
      <td>12022</td>
      <td>401638582</td>
      <td>20232024</td>
      <td>2024</td>
      <td>SeasonType.POSTSEASON</td>
      <td>2024-03-20 21:28:00+00:00</td>
      <td>False</td>
      <td>True</td>
      <td>False</td>
      <td>TRNMNT</td>
      <td>...</td>
      <td>102.7</td>
      <td>35.3</td>
      <td>27.0</td>
      <td>0.2</td>
      <td>55.3</td>
      <td>28.1</td>
      <td>29.1</td>
      <td>0.2</td>
      <td>49.3</td>
      <td>-7</td>
    </tr>
    <tr>
      <th>4</th>
      <td>12022</td>
      <td>401638582</td>
      <td>20232024</td>
      <td>2024</td>
      <td>SeasonType.POSTSEASON</td>
      <td>2024-03-20 21:28:00+00:00</td>
      <td>False</td>
      <td>True</td>
      <td>False</td>
      <td>TRNMNT</td>
      <td>...</td>
      <td>102.7</td>
      <td>35.3</td>
      <td>27.0</td>
      <td>0.2</td>
      <td>55.3</td>
      <td>28.1</td>
      <td>29.1</td>
      <td>0.2</td>
      <td>49.3</td>
      <td>-7</td>
    </tr>
  </tbody>
</table>
<p>5 rows &#xD7; 58 columns</p>
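<p>One thing worth checking before moving on: the preview above lists game <code>12022</code> twice (rows 3 and 4). Assuming that repeat is unintended rather than two distinct records, a minimal sketch of spotting and dropping repeated ids could look like the following. The <code>toy</code> frame is a hypothetical stand-in with only two columns copied from the preview; on the real data the same calls would be applied to <code>df</code>.</p>
<pre class="line-numbers"><code class="lang-python">
import pandas as pd

# Toy frame standing in for the games data; the repeated id mimics rows 3 and 4
toy = pd.DataFrame({
    'id': [12010, 12009, 12023, 12022, 12022],
    'margin': [-3, -25, -7, -7, -7],
})

# Count rows whose id has already appeared earlier in the frame
n_dupes = int(toy.duplicated(subset='id').sum())

# Keep only the first row for each game id
deduped = toy.drop_duplicates(subset='id').reset_index(drop=True)

print(n_dupes, len(deduped))
</code></pre>
<p>Repeated rows quietly give those games extra weight during training, so it is cheap insurance to check for them before fitting anything.</p>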
</div><!--kg-card-end: html--><hr><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block; text-align:center;" data-ad-layout="in-article" data-ad-format="fluid" data-ad-client="ca-pub-3674605305984905" data-ad-slot="7107763740"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html--><hr><h1 id="training-the-model">Training the Model</h1><p>The first step here is feature selection. Let&apos;s see what columns are currently included in the data frame.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
df.columns
</code></pre><!--kg-card-end: html--><pre><code>Index([&apos;id&apos;, &apos;sourceId&apos;, &apos;seasonLabel&apos;, &apos;season&apos;, &apos;seasonType&apos;, &apos;startDate&apos;,
       &apos;startTimeTbd&apos;, &apos;neutralSite&apos;, &apos;conferenceGame&apos;, &apos;gameType&apos;,
       &apos;tournament&apos;, &apos;gameNotes&apos;, &apos;status&apos;, &apos;attendance&apos;, &apos;homeTeamId&apos;,
       &apos;homeTeam&apos;, &apos;homeConferenceId&apos;, &apos;homeConference&apos;, &apos;homeSeed&apos;,
       &apos;homePoints&apos;, &apos;homePeriodPoints&apos;, &apos;homeWinner&apos;, &apos;awayTeamId&apos;,
       &apos;awayTeam&apos;, &apos;awayConferenceId&apos;, &apos;awayConference&apos;, &apos;awaySeed&apos;,
       &apos;awayPoints&apos;, &apos;awayPeriodPoints&apos;, &apos;awayWinner&apos;, &apos;excitement&apos;, &apos;venueId&apos;,
       &apos;venue&apos;, &apos;city&apos;, &apos;state&apos;, &apos;home_pace&apos;, &apos;home_o_rating&apos;, &apos;home_d_rating&apos;,
       &apos;home_free_throw_rate&apos;, &apos;home_offensive_rebound_rate&apos;,
       &apos;home_turnover_ratio&apos;, &apos;home_efg&apos;, &apos;home_free_throw_rate_allowed&apos;,
       &apos;home_offensive_rebound_rate_allowed&apos;, &apos;home_turnover_ratio_forced&apos;,
       &apos;home_efg_allowed&apos;, &apos;away_pace&apos;, &apos;away_o_rating&apos;, &apos;away_d_rating&apos;,
       &apos;away_free_throw_rate&apos;, &apos;away_offensive_rebound_rate&apos;,
       &apos;away_turnover_ratio&apos;, &apos;away_efg&apos;, &apos;away_free_throw_rate_allowed&apos;,
       &apos;away_offensive_rebound_rate_allowed&apos;, &apos;away_turnover_ratio_forced&apos;,
       &apos;away_efg_allowed&apos;, &apos;margin&apos;],
      dtype=&apos;object&apos;)</code></pre><p>We are going to pull out the columns we will be using, namely the features for training and the output we will be training against (margin).</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
features = [
    &apos;home_o_rating&apos;,
    &apos;home_d_rating&apos;,
    &apos;home_pace&apos;,
    &apos;home_free_throw_rate&apos;,
    &apos;home_offensive_rebound_rate&apos;,
    &apos;home_turnover_ratio&apos;,
    &apos;home_efg&apos;,
    &apos;home_free_throw_rate_allowed&apos;,
    &apos;home_offensive_rebound_rate_allowed&apos;,
    &apos;home_turnover_ratio_forced&apos;,
    &apos;home_efg_allowed&apos;,
    &apos;away_o_rating&apos;,
    &apos;away_d_rating&apos;,
    &apos;away_pace&apos;,
    &apos;away_free_throw_rate&apos;,
    &apos;away_offensive_rebound_rate&apos;,
    &apos;away_turnover_ratio&apos;,
    &apos;away_efg&apos;,
    &apos;away_free_throw_rate_allowed&apos;,
    &apos;away_offensive_rebound_rate_allowed&apos;,
    &apos;away_turnover_ratio_forced&apos;,
    &apos;away_efg_allowed&apos;,
    &apos;homeSeed&apos;,
    &apos;awaySeed&apos;
]

outputs = [&apos;margin&apos;]

df[features + outputs]
</code></pre><!--kg-card-end: html--><!--kg-card-begin: html--><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right">
      <th></th>
      <th>home_o_rating</th>
      <th>home_d_rating</th>
      <th>home_pace</th>
      <th>home_free_throw_rate</th>
      <th>home_offensive_rebound_rate</th>
      <th>home_turnover_ratio</th>
      <th>home_efg</th>
      <th>home_free_throw_rate_allowed</th>
      <th>home_offensive_rebound_rate_allowed</th>
      <th>home_turnover_ratio_forced</th>
      <th>...</th>
      <th>away_offensive_rebound_rate</th>
      <th>away_turnover_ratio</th>
      <th>away_efg</th>
      <th>away_free_throw_rate_allowed</th>
      <th>away_offensive_rebound_rate_allowed</th>
      <th>away_turnover_ratio_forced</th>
      <th>away_efg_allowed</th>
      <th>homeSeed</th>
      <th>awaySeed</th>
      <th>margin</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>107.8</td>
      <td>106.2</td>
      <td>67.4</td>
      <td>41.9</td>
      <td>31.0</td>
      <td>0.2</td>
      <td>52.4</td>
      <td>39.2</td>
      <td>33.5</td>
      <td>0.2</td>
      <td>...</td>
      <td>31.4</td>
      <td>0.2</td>
      <td>45.4</td>
      <td>29.1</td>
      <td>25.4</td>
      <td>0.2</td>
      <td>47.9</td>
      <td>16</td>
      <td>16</td>
      <td>-3</td>
    </tr>
    <tr>
      <th>1</th>
      <td>103.6</td>
      <td>96.8</td>
      <td>59.4</td>
      <td>25.1</td>
      <td>26.9</td>
      <td>0.1</td>
      <td>49.3</td>
      <td>25.7</td>
      <td>27.2</td>
      <td>0.2</td>
      <td>...</td>
      <td>23.5</td>
      <td>0.2</td>
      <td>55.4</td>
      <td>31.4</td>
      <td>28.4</td>
      <td>0.2</td>
      <td>48.8</td>
      <td>10</td>
      <td>10</td>
      <td>-25</td>
    </tr>
    <tr>
      <th>2</th>
      <td>111.7</td>
      <td>109.8</td>
      <td>65.2</td>
      <td>29.7</td>
      <td>22.2</td>
      <td>0.2</td>
      <td>54.5</td>
      <td>35.9</td>
      <td>26.5</td>
      <td>0.2</td>
      <td>...</td>
      <td>29.7</td>
      <td>0.2</td>
      <td>48.9</td>
      <td>32.6</td>
      <td>32.2</td>
      <td>0.2</td>
      <td>49.0</td>
      <td>16</td>
      <td>16</td>
      <td>-7</td>
    </tr>
    <tr>
      <th>3</th>
      <td>113.6</td>
      <td>101.3</td>
      <td>65.2</td>
      <td>36.8</td>
      <td>30.7</td>
      <td>0.2</td>
      <td>52.2</td>
      <td>31.9</td>
      <td>24.8</td>
      <td>0.2</td>
      <td>...</td>
      <td>27.0</td>
      <td>0.2</td>
      <td>55.3</td>
      <td>28.1</td>
      <td>29.1</td>
      <td>0.2</td>
      <td>49.3</td>
      <td>10</td>
      <td>10</td>
      <td>-7</td>
    </tr>
    <tr>
      <th>4</th>
      <td>113.6</td>
      <td>101.3</td>
      <td>65.2</td>
      <td>36.8</td>
      <td>30.7</td>
      <td>0.2</td>
      <td>52.2</td>
      <td>31.9</td>
      <td>24.8</td>
      <td>0.2</td>
      <td>...</td>
      <td>27.0</td>
      <td>0.2</td>
      <td>55.3</td>
      <td>28.1</td>
      <td>29.1</td>
      <td>0.2</td>
      <td>49.3</td>
      <td>10</td>
      <td>10</td>
      <td>-7</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>681</th>
      <td>118.4</td>
      <td>96.6</td>
      <td>59.2</td>
      <td>43.4</td>
      <td>32.5</td>
      <td>0.2</td>
      <td>52.7</td>
      <td>32.6</td>
      <td>31.7</td>
      <td>0.2</td>
      <td>...</td>
      <td>28.5</td>
      <td>0.2</td>
      <td>51.4</td>
      <td>35.5</td>
      <td>36.4</td>
      <td>0.2</td>
      <td>43.9</td>
      <td>1</td>
      <td>7</td>
      <td>-10</td>
    </tr>
    <tr>
      <th>682</th>
      <td>118.4</td>
      <td>96.6</td>
      <td>59.2</td>
      <td>43.4</td>
      <td>32.5</td>
      <td>0.2</td>
      <td>52.7</td>
      <td>32.6</td>
      <td>31.7</td>
      <td>0.2</td>
      <td>...</td>
      <td>28.5</td>
      <td>0.2</td>
      <td>51.4</td>
      <td>35.5</td>
      <td>36.4</td>
      <td>0.2</td>
      <td>43.9</td>
      <td>1</td>
      <td>7</td>
      <td>-10</td>
    </tr>
    <tr>
      <th>683</th>
      <td>120.4</td>
      <td>105.2</td>
      <td>61.2</td>
      <td>44.1</td>
      <td>26.5</td>
      <td>0.1</td>
      <td>53.1</td>
      <td>25.9</td>
      <td>29.1</td>
      <td>0.2</td>
      <td>...</td>
      <td>35.6</td>
      <td>0.2</td>
      <td>49.7</td>
      <td>37.8</td>
      <td>36.1</td>
      <td>0.2</td>
      <td>45.0</td>
      <td>2</td>
      <td>8</td>
      <td>-1</td>
    </tr>
    <tr>
      <th>684</th>
      <td>115.2</td>
      <td>101.1</td>
      <td>61.7</td>
      <td>38.6</td>
      <td>28.5</td>
      <td>0.2</td>
      <td>51.4</td>
      <td>35.5</td>
      <td>36.4</td>
      <td>0.2</td>
      <td>...</td>
      <td>35.6</td>
      <td>0.2</td>
      <td>49.7</td>
      <td>37.8</td>
      <td>36.1</td>
      <td>0.2</td>
      <td>45.0</td>
      <td>7</td>
      <td>8</td>
      <td>6</td>
    </tr>
    <tr>
      <th>685</th>
      <td>115.2</td>
      <td>101.1</td>
      <td>61.7</td>
      <td>38.6</td>
      <td>28.5</td>
      <td>0.2</td>
      <td>51.4</td>
      <td>35.5</td>
      <td>36.4</td>
      <td>0.2</td>
      <td>...</td>
      <td>35.6</td>
      <td>0.2</td>
      <td>49.7</td>
      <td>37.8</td>
      <td>36.1</td>
      <td>0.2</td>
      <td>45.0</td>
      <td>7</td>
      <td>8</td>
      <td>6</td>
    </tr>
  </tbody>
</table><!--kg-card-end: html--><p>686 rows &#xD7; 25 columns</p><p>Again, feel free to mix that up. If you added or changed any of the statistics in the prior section, this is where you will need to incorporate them.</p><p>We will now split our data set into training data and testing data. Training data will be used to train the model. Testing data is held back to evaluate the model once it&apos;s ready to go. In this example, I am holding out 2024 tournament games as my test set. If you are running through this to make predictions on future tournament games, you can hold out those games instead (assuming you pulled games and statistics for that season into the data set).</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
training = df.query(&quot;season != 2024&quot;).copy()
testing = df.query(&quot;season == 2024&quot;).copy()
</code></pre><!--kg-card-end: html--><p>We are going to further split the training data into training and validation sets. Both sets are used while building the model. The training set is what is actually fed into the model, whereas the validation set is held out so we can check whether the model is actually improving on games it has not seen. Checking against held-out data helps catch overfitting to the training data.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
X_train, X_valid, y_train, y_valid = train_test_split(training[features], training[outputs], train_size=0.8, test_size=0.2, random_state=0)
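
# Sketch (not part of the original post): roughly what an 80/20 random split does,
# written in plain Python. The helper name and toy rows are made up for illustration;
# train_test_split handles the shuffling and slicing for the real data above.
import random

def split_rows(rows, train_fraction=0.8, seed=0):
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split repeatable
    cutoff = int(len(shuffled) * train_fraction)
    return shuffled[:cutoff], shuffled[cutoff:]

toy_train, toy_valid = split_rows(range(10))  # 8 rows for training, 2 for validation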
</code></pre><!--kg-card-end: html--><p>Note that this splits the training features (X) out from the expected outputs (y). In the example above, we are randomly holding back 20% of the training data to be used for validation.</p><p>We are ready to train! We will be using <code>XGBRegressor</code>, XGBoost&apos;s gradient boosting model for regression, since we are predicting a numeric margin. If we were doing classification, we would use <code>XGBClassifier</code>.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
model = XGBRegressor(random_state=0)
model.fit(X_train, y_train)
</code></pre><!--kg-card-end: html--><pre><code>XGBRegressor(base_score=None, booster=None, callbacks=None,
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=None, device=None, early_stopping_rounds=None,
             enable_categorical=False, eval_metric=None, feature_types=None,
             gamma=None, grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=None, max_bin=None,
             max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=None, max_leaves=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             multi_strategy=None, n_estimators=None, n_jobs=None,
             num_parallel_tree=None, random_state=0, ...)</code></pre><p>And just like that, we have a trained model! We can make predictions against our validation set.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
predictions = model.predict(X_valid)
predictions
</code></pre><!--kg-card-end: html--><pre><code>array([-1.87790477e+00,  7.16752386e+00,  1.32060270e+01,  6.78795004e+00,
        1.44662819e+01, -2.85689831e+00, -8.69423985e-01,  8.75045967e+00,
        3.85790849e+00, -6.43919373e+00, -8.83276880e-01,  6.97011662e+00,
        4.38355398e+00,  8.06833267e+00, -8.77752018e+00,  5.22899723e+00,
        2.80364990e+00,  3.31810045e+00, -9.09639931e+00, -1.38665593e+00,
        4.66550255e+00,  3.16841202e+01,  9.18671894e+00, -2.34628081e+00,
        1.58264847e+01,  9.93082142e+00,  9.44772053e+00,  1.88728504e+01,
        2.87765160e+01,  3.31487012e+00,  1.30118427e+01, -1.30986392e-01,
        5.33917189e+00,  8.50678921e+00, -3.34483713e-01,  2.57094145e+00,
        1.66184235e+01,  5.99199915e+00, -2.74236417e+00,  1.33841276e+00,
       -5.50944662e+00, -8.56299973e+00,  9.36406422e+00,  1.27445345e+01,
       -5.79891968e+00,  9.32999039e+00,  4.99850559e+00,  1.41290035e+01,
        1.27072744e+01,  5.49775696e+00,  2.92133301e-01,  2.85389748e+01,
       -2.77683735e+00,  1.41666784e+01,  1.65023022e+01,  6.03557158e+00,
        2.24876385e+01, -5.69163513e+00,  5.78824818e-01,  2.18679352e+01,
        1.81881466e+01,  6.27820158e+00, -3.48073578e+00, -2.05786265e-02,
        2.38070393e+01,  7.80937290e+00,  2.68855405e+00,  1.00340958e+01,
        1.03051748e+01,  6.70673037e+00, -4.66818810e+00,  1.42929211e+01,
        5.93736887e+00,  2.18488560e+01, -3.96203065e+00, -6.01904249e+00,
        1.15123062e+01,  1.06525719e+00, -5.60221529e+00, -2.91650534e+00,
        8.13025475e+00, -2.16232657e+00, -7.38539994e-02, -7.47696776e-03,
        6.57202673e+00,  3.21248150e+00,  3.89195323e-01,  2.67519027e-01,
       -1.49262440e+00, -5.93076229e+00,  1.55619888e+01, -9.42352295e-01,
        6.86150503e+00,  2.09990826e+01, -2.62024927e+00, -3.10824728e+00,
        1.55272758e+00,  6.41326475e+00,  2.17659950e+00,  2.06855249e+00,
        1.48680840e+01,  3.38636231e+00,  1.16376562e+01, -1.75216424e+00,
        1.12170439e+01,  1.02640734e+01,  1.19243898e+01,  6.55053318e-01,
        1.79168587e+01,  1.12861748e+01,  1.15750656e+01, -1.21279058e+01,
       -6.30171585e+00,  2.97097254e+00,  5.94197321e+00, -1.26525140e+00,
        1.78847879e-01,  1.99955502e+01,  1.16229486e+01,  9.16914749e+00,
        1.56323729e+01,  2.16536427e+01,  4.01582432e+00,  2.84138560e-01],
      dtype=float32)</code></pre><p>If your validation set contains games that have already been played, we can use these predictions to calculate the mean absolute error (or any other metric) of our model.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
mae = mean_absolute_error(y_valid, predictions)
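
# Sketch (not part of the original post): mean absolute error is just the average
# absolute gap between actual and predicted margins. The helper name and the toy
# numbers below are made up for illustration.
def mean_abs_error_sketch(actual, predicted):
    gaps = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(gaps) / len(gaps)

toy_mae = mean_abs_error_sketch([-3, 10, 6], [4.0, 8.0, -1.0])  # gaps of 7, 2, and 7
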
mae
</code></pre><!--kg-card-end: html--><p>7.965800762176514</p><p>I got an MAE of ~7.97. I&apos;ll be honest, I have no idea how good that is since I&apos;m a bit newer to basketball modeling. Based on my reading, an MAE of around 6.5 is pretty good. So, this is perhaps not great, but it is a good starting point. My goal is not to have the best model but to walk you through the process. It will be up to you to make changes and get better predictions.</p><p>What might fine-tuning look like? For one, we can update the parameters on the model. The code snippet below runs through the same process as above but explicitly sets the number of estimators, the learning rate, and the number of jobs (parallel threads) for the model.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
model = XGBRegressor(n_estimators=100, learning_rate=0.05, n_jobs=4)
model.fit(X_train, y_train)
predictions = model.predict(X_valid)
mae = mean_absolute_error(y_valid, predictions)
mae
</code></pre><!--kg-card-end: html--><p>7.976924419403076</p><p>As you can see, my MAE is not any better, but you can play around with those parameters and see if you get anything different. The biggest improvements will likely come from tweaking the input features and adding more stats.</p><p>Let&apos;s go back to our testing set, generate predictions, and compare them to actual results from the 2024 NCAA Tournament.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
predictions = model.predict(testing[features])
testing[&apos;prediction&apos;] = predictions
testing[[&apos;homeSeed&apos;, &apos;homeTeam&apos;, &apos;awaySeed&apos;, &apos;awayTeam&apos;, &apos;margin&apos;, &apos;prediction&apos;]]
</code></pre><!--kg-card-end: html--><!--kg-card-begin: html--><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right">
      <th></th>
      <th>homeSeed</th>
      <th>homeTeam</th>
      <th>awaySeed</th>
      <th>awayTeam</th>
      <th>margin</th>
      <th>prediction</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>16</td>
      <td>Howard</td>
      <td>16</td>
      <td>Wagner</td>
      <td>-3</td>
      <td>4.429741</td>
    </tr>
    <tr>
      <th>1</th>
      <td>10</td>
      <td>Virginia</td>
      <td>10</td>
      <td>Colorado State</td>
      <td>-25</td>
      <td>0.494260</td>
    </tr>
    <tr>
      <th>2</th>
      <td>16</td>
      <td>Montana State</td>
      <td>16</td>
      <td>Grambling</td>
      <td>-7</td>
      <td>-0.163861</td>
    </tr>
    <tr>
      <th>3</th>
      <td>10</td>
      <td>Boise State</td>
      <td>10</td>
      <td>Colorado</td>
      <td>-7</td>
      <td>0.399193</td>
    </tr>
    <tr>
      <th>4</th>
      <td>10</td>
      <td>Boise State</td>
      <td>10</td>
      <td>Colorado</td>
      <td>-7</td>
      <td>0.399193</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>65</th>
      <td>1</td>
      <td>Purdue</td>
      <td>2</td>
      <td>Tennessee</td>
      <td>6</td>
      <td>-4.878470</td>
    </tr>
    <tr>
      <th>66</th>
      <td>4</td>
      <td>Duke</td>
      <td>11</td>
      <td>NC State</td>
      <td>-12</td>
      <td>0.975319</td>
    </tr>
    <tr>
      <th>67</th>
      <td>1</td>
      <td>Purdue</td>
      <td>11</td>
      <td>NC State</td>
      <td>13</td>
      <td>12.650157</td>
    </tr>
    <tr>
      <th>68</th>
      <td>1</td>
      <td>UConn</td>
      <td>4</td>
      <td>Alabama</td>
      <td>14</td>
      <td>6.204337</td>
    </tr>
    <tr>
      <th>69</th>
      <td>1</td>
      <td>UConn</td>
      <td>1</td>
      <td>Purdue</td>
      <td>15</td>
      <td>0.927093</td>
    </tr>
  </tbody>
</table><!--kg-card-end: html--><p>70 rows &#xD7; 6 columns</p><p>Let&apos;s calculate the actual percentage of games our model correctly picked straight up.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
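# Sketch (not part of the original post): the same straight-up check that the query
# at the end of this block performs, shown on toy numbers. A pick counts as correct
# only when the predicted and actual margins share a sign, so a margin or prediction
# of exactly 0 never counts as a hit. The helper name and values are made up.
def straight_up_accuracy(margins, predictions):
    hits = sum(1 for m, p in zip(margins, predictions) if m * p > 0)
    return hits / len(margins)

toy_accuracy = straight_up_accuracy([-3, 6, 13, -12], [4.4, -4.9, 12.7, -1.0])  # 2 of 4 correct
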
testing.query(&quot;(margin &lt; 0 and prediction &lt; 0) or (margin &gt; 0 and prediction &gt; 0)&quot;).shape[0] / testing.shape[0]
</code></pre><!--kg-card-end: html--><p>0.6428571428571429</p><p>My model correctly picked the winner in 64.3% of games in the 2024 Tournament. Let&apos;s look at just the first round. I&apos;m going to use the <code>gameNotes</code> property (which contains round information) to filter down to first round games.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
first_round = testing[testing[&apos;gameNotes&apos;].str.contains(&apos;1st&apos;)]
first_round.query(&quot;(margin &lt; 0 and prediction &lt; 0) or (margin &gt; 0 and prediction &gt; 0)&quot;).shape[0] / first_round.shape[0]
</code></pre><!--kg-card-end: html--><p>0.696969696969697</p><p>For the first round, I&apos;m at a slightly better 69.696969% clip (nice).</p><p>At this point, we should save our model so that we can load it up and use it at a later time.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
model.save_model(&apos;xgboostmodel&apos;)
</code></pre><!--kg-card-end: html--><p>This exports the model into a file. Replace <code>xgboostmodel</code> above with a filename of your choosing, especially if you want to train and save multiple models. If we want to use our model later on to make predictions, we can load it up as follows.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
model = XGBRegressor()
model.load_model(&apos;xgboostmodel&apos;)
</code></pre><!--kg-card-end: html--><p>Let&apos;s say I wanted to predict a hypothetical matchup that hasn&apos;t yet occurred and isn&apos;t even scheduled. This would be useful in, for example, filling out a bracket. Here is an example of how I might do that with a reusable function.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
stats = stats_api.get_team_season_stats(season=2025, season_type=&apos;regular&apos;)
    
def predict_game(model, stats, projected_home_seed, home_team, projected_away_seed, away_team):
    home_stats = [stat for stat in stats if stat.team == home_team][0]
    away_stats = [stat for stat in stats if stat.team == away_team][0]
    record = {
        &apos;home_o_rating&apos;: home_stats.team_stats.rating,
        &apos;home_d_rating&apos;: home_stats.opponent_stats.rating,
        &apos;home_pace&apos;: home_stats.pace,
        &apos;home_free_throw_rate&apos;: home_stats.team_stats.four_factors.free_throw_rate,
        &apos;home_offensive_rebound_rate&apos;: home_stats.team_stats.four_factors.offensive_rebound_pct,
        &apos;home_turnover_ratio&apos;: home_stats.team_stats.four_factors.turnover_ratio,
        &apos;home_efg&apos;: home_stats.team_stats.four_factors.effective_field_goal_pct,
        &apos;home_free_throw_rate_allowed&apos;: home_stats.opponent_stats.four_factors.free_throw_rate,
        &apos;home_offensive_rebound_rate_allowed&apos;: home_stats.opponent_stats.four_factors.offensive_rebound_pct,
        &apos;home_turnover_ratio_forced&apos;: home_stats.opponent_stats.four_factors.turnover_ratio,
        &apos;home_efg_allowed&apos;: home_stats.opponent_stats.four_factors.effective_field_goal_pct,
        &apos;away_o_rating&apos;: away_stats.team_stats.rating,
        &apos;away_d_rating&apos;: away_stats.opponent_stats.rating,
        &apos;away_pace&apos;: away_stats.pace,
        &apos;away_free_throw_rate&apos;: away_stats.team_stats.four_factors.free_throw_rate,
        &apos;away_offensive_rebound_rate&apos;: away_stats.team_stats.four_factors.offensive_rebound_pct,
        &apos;away_turnover_ratio&apos;: away_stats.team_stats.four_factors.turnover_ratio,
        &apos;away_efg&apos;: away_stats.team_stats.four_factors.effective_field_goal_pct,
        &apos;away_free_throw_rate_allowed&apos;: away_stats.opponent_stats.four_factors.free_throw_rate,
        &apos;away_offensive_rebound_rate_allowed&apos;: away_stats.opponent_stats.four_factors.offensive_rebound_pct,
        &apos;away_turnover_ratio_forced&apos;: away_stats.opponent_stats.four_factors.turnover_ratio,
        &apos;away_efg_allowed&apos;: away_stats.opponent_stats.four_factors.effective_field_goal_pct,
        &apos;homeSeed&apos;: projected_home_seed,
        &apos;awaySeed&apos;: projected_away_seed
    }
    return model.predict(pd.DataFrame([record]))[0]
    
predict_game(model, stats, 5, &apos;Michigan&apos;, 11, &apos;Dayton&apos;)
</code></pre><!--kg-card-end: html--><p>np.float32(6.149086)</p><p>In the above example, I loaded up data from the current season, created a method that constructs a data frame record using the required features, and then called that method to get a prediction, passing in a model, the stats collection, and each team&apos;s projected seed and name. This model predicts that Michigan as a 5 seed would beat Dayton as an 11 seed by 6.1 points. Voila!</p><hr><p>And this is where I leave you. As mentioned, there are many improvements that can be made to get this thing ready for prime time. There are many features returned by the Stats API that we aren&apos;t even using. And none of our stats are opponent-adjusted. And you aren&apos;t limited to the Stats API, either. Try incorporating other endpoints or even other data sources.</p><p>As always, let me know what you think on Twitter, Bluesky, Discord, etc. And good luck with your brackets!</p>]]></content:encoded></item><item><title><![CDATA[Talking Tech: Generating Shot Charts using the Basketball API]]></title><description><![CDATA[We are going to be plotting team shot charts on top of a standard NCAA men's court using Python and the CollegeBasketballData.com API along with a few common Python packages.]]></description><link>https://blog.collegefootballdata.com/talking-tech-generating-shot-charts-using-the-basketball-api/</link><guid isPermaLink="false">67b22c442a659a00015cb034</guid><category><![CDATA[Talking Tech]]></category><category><![CDATA[Programming]]></category><category><![CDATA[Basketball]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Wed, 05 Mar 2025 18:00:39 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1546519638-68e109498ffc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDF8fGJhc2tldGJhbGx8ZW58MHx8fHwxNzM5Njc1NzI0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1546519638-68e109498ffc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDF8fGJhc2tldGJhbGx8ZW58MHx8fHwxNzM5Njc1NzI0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" alt="Talking Tech: Generating Shot Charts using the Basketball API"><p>Welcome to the first ever basketball post on this here blog! 
As announced a few weeks back, CollegeBasketballData.com is now live. I&apos;ve often been asked about providing a service for college basketball and have always been hesitant. For one, the sheer volume of data is several times greater than for football due to nearly triple the number of teams and triple the number of games per team. I&apos;ve also been a big fan of both <a href="https://barttorvik.com/?ref=blog.collegefootballdata.com">Bart Torvik</a> and <a href="https://kenpom.com/?ref=blog.collegefootballdata.com">Ken Pomeroy</a> and wasn&apos;t sure there was much of a need for a CFBD-like service for CBB with the stats and analytics those guys provide.</p><p>That all said, users have asked consistently over the years, and the CFBD site and API refreshes have energized me to give CBB a go. I&apos;m excited to provide this service and if I&apos;ve been a part of your CFB analytics journey, I hope I can do the same for CBB.</p><p>Now let&apos;s dive into some charts!</p><hr><!--kg-card-begin: markdown--><h1 id="plotting-the-court">Plotting the Court</h1>
<!--kg-card-end: markdown--><p>We are going to be plotting team shot charts on top of a standard NCAA men&apos;s court using Python and the CollegeBasketballData.com API along with a few common Python packages. When all is said and done, we will have something that looks like this.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-28.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="1057" height="1056" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/03/image-28.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/03/image-28.png 1000w, https://blog.collegefootballdata.com/content/images/2025/03/image-28.png 1057w" sizes="(min-width: 720px) 720px"></figure><p>Before we do anything, we need to make sure we have all dependencies installed. We will need the <a href="https://github.com/CFBD/cbbd-python?ref=blog.collegefootballdata.com">CBBD Python package</a> and a few others. Run the following command in a terminal.</p><pre><code class="language-bash">pip install cbbd pandas numpy matplotlib seaborn</code></pre><p>Now we need to focus on plotting a basketball court. We will be using <code>matplotlib</code> to achieve this. Go ahead and run the following block to import all of the dependencies we just installed.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
import cbbd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib.patches import Circle, Rectangle, Arc
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
import seaborn as sns

plt.style.use(&apos;seaborn-v0_8-dark-palette&apos;)
</code></pre><!--kg-card-end: html--><p>As we dig into plotting the court, I first need to give a huge shout out to <a href="https://github.com/RobMulla?ref=blog.collegefootballdata.com">Rob Mulla</a>, who wrote a series of <a href="https://www.kaggle.com/code/robikscube/ncaa-basketball-court-plot-helper-functions?ref=blog.collegefootballdata.com">helper functions for plotting NCAA courts on Kaggle</a>. His Kaggle article goes more in-depth and even includes a plot for a full size court. We&apos;ll just be using a half court and copy/pasting a function from that article.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
def create_ncaa_half_court(ax=None, three_line=&apos;mens&apos;, court_color=&apos;#dfbb85&apos;,
                           lw=3, lines_color=&apos;black&apos;, lines_alpha=0.5,
                           paint_fill=&apos;blue&apos;, paint_alpha=0.4,
                          inner_arc=False):
    &quot;&quot;&quot;
    Version 2020.2.19

    Creates NCAA Basketball Half Court
    Dimensions are in feet (Court is 94x50 ft)
    Created by: Rob Mulla / https://github.com/RobMulla

    * Note that this function uses &quot;feet&quot; as the unit of measure.
    * NCAA Data is provided on a x range: 0, 100 and y-range 0 to 100
    * To plot X/Y positions first convert to feet like this:
    ```
    Events[&apos;X_&apos;] = (Events[&apos;X&apos;] * (94/100))
    Events[&apos;Y_&apos;] = (Events[&apos;Y&apos;] * (50/100))
    ```
    ax: matplotlib axes if None gets current axes using `plt.gca`
    
    three_line: &apos;mens&apos;, &apos;womens&apos; or &apos;both&apos; defines 3 point line plotted
    court_color : (hex) Color of the court
    lw : line width
    lines_color : Color of the lines
    lines_alpha : transparency of lines
    paint_fill : Color inside the paint
    paint_alpha : transparency of the &quot;paint&quot;
    inner_arc : paint the dotted inner arc
    &quot;&quot;&quot;
    if ax is None:
        ax = plt.gca()

    # Create Patches for Court Lines
    center_circle = Circle((50/2, 94/2), 6,
                           linewidth=lw, color=lines_color, lw=lw,
                           fill=False, alpha=lines_alpha)
    hoop = Circle((50/2, 5.25), 1.5 / 2,
                       linewidth=lw, color=lines_color, lw=lw,
                       fill=False, alpha=lines_alpha)

    # Paint - 18 Feet 10 inches which converts to 18.833333 feet - gross!
    paint = Rectangle(((50/2)-6, 0), 12, 18.833333,
                           fill=paint_fill, alpha=paint_alpha,
                           lw=lw, edgecolor=None)
    
    paint_boarder = Rectangle(((50/2)-6, 0), 12, 18.833333,
                           fill=False, alpha=lines_alpha,
                           lw=lw, edgecolor=lines_color)
    
    arc = Arc((50/2, 18.833333), 12, 12, theta1=0,
                   theta2=180, color=lines_color, lw=lw,
                   alpha=lines_alpha)
    
    block1 = Rectangle(((50/2)-6-0.666, 7), 0.666, 1, 
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    block2 = Rectangle(((50/2)+6, 7), 0.666, 1, 
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(block1)
    ax.add_patch(block2)
    
    l1 = Rectangle(((50/2)-6-0.666, 11), 0.666, 0.166,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    l2 = Rectangle(((50/2)-6-0.666, 14), 0.666, 0.166,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    l3 = Rectangle(((50/2)-6-0.666, 17), 0.666, 0.166,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(l1)
    ax.add_patch(l2)
    ax.add_patch(l3)
    l4 = Rectangle(((50/2)+6, 11), 0.666, 0.166,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    l5 = Rectangle(((50/2)+6, 14), 0.666, 0.166,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    l6 = Rectangle(((50/2)+6, 17), 0.666, 0.166,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(l4)
    ax.add_patch(l5)
    ax.add_patch(l6)
    
    # 3 Point Line
    if (three_line == &apos;mens&apos;) | (three_line == &apos;both&apos;):
        # 22&apos; 1.75&quot; distance to center of hoop
        three_pt = Arc((50/2, 6.25), 44.291, 44.291, theta1=12,
                            theta2=168, color=lines_color, lw=lw,
                            alpha=lines_alpha)

        # 4.25 feet max to sideline for mens
        ax.plot((3.34, 3.34), (0, 11.20),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.plot((50-3.34, 50-3.34), (0, 11.20),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.add_patch(three_pt)

    if (three_line == &apos;womens&apos;) | (three_line == &apos;both&apos;):
        # womens 3
        three_pt_w = Arc((50/2, 6.25), 20.75 * 2, 20.75 * 2, theta1=5,
                              theta2=175, color=lines_color, lw=lw, alpha=lines_alpha)
        # 4.25 feet to sideline for womens
        ax.plot( (4.25, 4.25), (0, 8), color=lines_color,
                lw=lw, alpha=lines_alpha)
        ax.plot((50-4.25, 50-4.25), (0, 8.1),
                color=lines_color, lw=lw, alpha=lines_alpha)

        ax.add_patch(three_pt_w)

    # Add Patches
    ax.add_patch(paint)
    ax.add_patch(paint_boarder)
    ax.add_patch(center_circle)
    ax.add_patch(hoop)
    ax.add_patch(arc)
    
    if inner_arc:
        inner_arc = Arc((50/2, 18.833333), 12, 12, theta1=180,
                             theta2=0, color=lines_color, lw=lw,
                       alpha=lines_alpha, ls=&apos;--&apos;)
        ax.add_patch(inner_arc)

    # Restricted Area Marker
    restricted_area = Arc((50/2, 6.25), 8, 8, theta1=0,
                        theta2=180, color=lines_color, lw=lw,
                        alpha=lines_alpha)
    ax.add_patch(restricted_area)
    
    # Backboard
    ax.plot(((50/2) - 3, (50/2) + 3), (4, 4),
            color=lines_color, lw=lw*1.5, alpha=lines_alpha)
    ax.plot( (50/2, 50/2), (4.3, 4), color=lines_color,
            lw=lw, alpha=lines_alpha)

    # Half Court Line
    ax.axhline(94/2, color=lines_color, lw=lw, alpha=lines_alpha)

    
    # Plot Limit
    ax.set_xlim(0, 50)
    ax.set_ylim(0, 94/2 + 2)
    ax.set_facecolor(court_color)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_xlabel(&apos;&apos;)
    return ax
</code></pre><!--kg-card-end: html--><p>You&apos;ll note that the code has several formatting options and you can even switch between men&apos;s and women&apos;s courts. CBBD does not currently offer NCAA women&apos;s data, but that is still a very nice feature to have.</p><p>Go ahead and run the function without any options specified.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
create_ncaa_half_court()
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="535" height="403"></figure><p>Pretty basic and it just works! We can add some formatting options.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
create_ncaa_half_court(three_line=&apos;mens&apos;, court_color=&apos;black&apos;, lines_color=&apos;white&apos;, paint_alpha=0, inner_arc=True)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-1.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="543" height="403"></figure><p>Feel free to mess around more with different court and style combinations.</p><hr><!--kg-card-begin: markdown--><h1 id="importing-shot-location-data">Importing Shot Location Data</h1>
<!--kg-card-end: markdown--><p>We will grab shot location data from the CollegeBasketballData.com (CBBD) API. Specifically, we&apos;ll be working with the <code>cbbd</code> Python package (imported above). First, configure your API key, replacing the placeholder below with your own API key. If you need an API key, you can register for a free key via the <a href="https://collegebasketballdata.com/key?ref=blog.collegefootballdata.com">CBBD main website</a>.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
configuration = cbbd.Configuration(
    access_token = &apos;your_api_key_here&apos;
)
</code></pre><!--kg-card-end: html--><p>Shot location data is included in play by play data. We can use the CBBD Plays API to grab all shooting plays for a specific team or player. In this example, we will grab team-level data. We will specify <code>season</code> and <code>team</code> parameters. We will also pass in a <code>shooting_plays_only</code> flag to only return shooting plays (i.e. filtering out things like timeouts, rebounds, fouls, etc). The code block below will grab shooting plays associated with Dayton in the 2025 season. Feel free to switch up the team or season.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
with cbbd.ApiClient(configuration) as api_client:
    plays_api = cbbd.PlaysApi(api_client)
    plays = plays_api.get_plays_by_team(season=2025, team=&apos;Dayton&apos;, shooting_plays_only=True)
plays[0]
</code></pre><!--kg-card-end: html--><p>Example output of a shooting play:</p><pre><code class="language-bash">PlayInfo(id=118229, source_id=&apos;401715398101806301&apos;, game_id=426, game_source_id=&apos;401715398&apos;, game_start_date=datetime.datetime(2024, 11, 9, 19, 30, tzinfo=datetime.timezone.utc), season=2025, season_type=&lt;SeasonType.REGULAR: &apos;regular&apos;&gt;, game_type=&apos;STD&apos;, play_type=&apos;LayUpShot&apos;, is_home_team=False, team_id=212, team=&apos;Northwestern&apos;, conference=&apos;Big Ten&apos;, opponent_id=64, opponent=&apos;Dayton&apos;, opponent_conference=&apos;A-10&apos;, period=1, clock=&apos;19:36&apos;, seconds_remaining=1176, home_score=0, away_score=0, home_win_probability=0.635, scoring_play=False, shooting_play=True, score_value=2, wallclock=None, play_text=&apos;Ty Berry missed Layup.&apos;, participants=[PlayInfoParticipantsInner(name=&apos;Ty Berry&apos;, id=5452)], shot_info=ShotInfo(shooter=ShotInfoShooter(name=&apos;Ty Berry&apos;, id=5452), made=False, range=&apos;rim&apos;, assisted=False, assisted_by=ShotInfoShooter(name=None, id=None), location=ShotInfoLocation(y=270, x=864.8)))</code></pre><p>We can easily load this up into a pandas DataFrame. The <code>x</code> and <code>y</code> coordinates are currently scaled at 10 units per foot. Dividing by 10 as we import into a DataFrame converts them into feet, which will make it easier to work with the half court plot we ran through above. We will also filter out any shooting plays that may be missing location data for whatever reason.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
df = pd.DataFrame.from_records([
    dict(
        x=p.shot_info.location.x / 10,
        y=p.shot_info.location.y / 10,
    )
    for p in plays
    if p.shot_info is not None
        and p.shot_info.location is not None
        and p.shot_info.location.x is not None
        and p.shot_info.location.y is not None
])

df.head()
</code></pre><!--kg-card-end: html--><!--kg-card-begin: html--><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right">
      <th></th>
      <th>x</th>
      <th>y</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>76.14</td>
      <td>29.5</td>
    </tr>
    <tr>
      <th>1</th>
      <td>22.56</td>
      <td>41.0</td>
    </tr>
    <tr>
      <th>2</th>
      <td>26.32</td>
      <td>8.5</td>
    </tr>
    <tr>
      <th>3</th>
      <td>81.78</td>
      <td>31.5</td>
    </tr>
    <tr>
      <th>4</th>
      <td>69.56</td>
      <td>9.5</td>
    </tr>
  </tbody>
</table><!--kg-card-end: html--><p>We have one last step to take to get our data into a usable state. We are currently working with half court plots, but these shot locations correspond to a full court. We will convert the shot locations to half court coordinates by translating locations from the missing half over to the visible half of the court.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
df[&apos;x_half&apos;] = df[&apos;x&apos;]
df.loc[df[&apos;x&apos;] &gt; 47, &apos;x_half&apos;] = (94 - df[&apos;x&apos;].loc[df[&apos;x&apos;] &gt; 47])
df[&apos;y_half&apos;] = df[&apos;y&apos;]
df.loc[df[&apos;x&apos;] &gt; 47, &apos;y_half&apos;] = (50 - df[&apos;y&apos;].loc[df[&apos;x&apos;] &gt; 47])

# cast these to float to avoid typing issues later
df[&apos;x_half&apos;] = df[&apos;x_half&apos;].astype(float)
df[&apos;y_half&apos;] = df[&apos;y_half&apos;].astype(float)
</code></pre><!--kg-card-end: html--><!--kg-card-begin: markdown--><h1 id="plotting-the-data">Plotting the Data</h1>
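<p>Before plotting, it can be worth sanity-checking the half court conversion from the previous step on a few rows by hand. The sketch below is a plain-Python illustration and not part of the original walkthrough: <code>to_half_court</code> is a hypothetical helper, and it assumes the same 94 x 50 ft court and the same mirroring rule used to build <code>x_half</code> and <code>y_half</code>. The sample points are the first three rows of the DataFrame shown earlier.</p>

```python
COURT_LENGTH_FT = 94.0  # baseline to baseline
COURT_WIDTH_FT = 50.0   # sideline to sideline


def to_half_court(x, y):
    """Fold a full-court shot location (in feet) onto the near half court.

    Shots past midcourt are mirrored through the center of the court, so
    both coordinates flip; shots already on the near half are unchanged.
    """
    if x > COURT_LENGTH_FT / 2:
        return COURT_LENGTH_FT - x, COURT_WIDTH_FT - y
    return x, y


# First three rows of the example DataFrame (already converted to feet)
sample_shots = [(76.14, 29.5), (22.56, 41.0), (26.32, 8.5)]
folded = [to_half_court(x, y) for x, y in sample_shots]

for (x, y), (xh, yh) in zip(sample_shots, folded):
    print(f"({x:.2f}, {y:.2f}) -> ({xh:.2f}, {yh:.2f})")
```

<p>Only the first point lies beyond midcourt, so it is the only one that moves. Every folded <code>x</code> value should land between 0 and 47, which is a quick check you can also run against the real DataFrame.</p>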
<!--kg-card-end: markdown--><p>We can easily plot this data using <code>matplotlib</code>. For example, we can put it into a scatter plot.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
plt.scatter(df[&apos;y_half&apos;], df[&apos;x_half&apos;])
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-17.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="543" height="413"></figure><p>Not very pretty, but you can clearly make out the shape of a half court, including the general outline of the 3-point line.</p><p>We can improve upon this by making a <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hexbin.html?ref=blog.collegefootballdata.com">hexbin</a> chart, which will bucket shots into hexagonal areas of the court to create a sort of heatmap. The code below will create a hexbin plot using the <code>inferno</code> color map.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
plt.hexbin(df[&apos;y_half&apos;], df[&apos;x_half&apos;], gridsize=20, cmap=&apos;inferno&apos;)
</code></pre><!--kg-card-end: html--><p>You can <a href="https://matplotlib.org/stable/users/explain/colors/colormaps.html?ref=blog.collegefootballdata.com">view more colormaps here</a> and play around with different color schemes. Just replace <code>inferno</code> in the above snippet with the colormap of your choice. You can also type in <code>plt.cm.</code> and use autocomplete to conveniently see what is available.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-18.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="543" height="413"></figure><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-4.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="588" height="294"></figure><p>I&apos;m partial to <code>gist_heat_r</code>, so let&apos;s check that one out. We&apos;ll just rerun the code from above, replacing the colormap with that one.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
plt.hexbin(df[&apos;y_half&apos;], df[&apos;x_half&apos;], gridsize=20, cmap=plt.cm.gist_heat_r)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-19.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="543" height="413"></figure><p>You can also mess around with the <code>gridsize</code> parameter for lower or higher resolution. Here I will increase the value from 20 to 40.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
plt.hexbin(df[&apos;y_half&apos;], df[&apos;x_half&apos;], gridsize=40, cmap=plt.cm.gist_heat_r)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-20.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="543" height="413"></figure><hr><!--kg-card-begin: markdown--><h1 id="bringing-it-all-together">Bringing it all together</h1>
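<p>When the hexbin layer is drawn onto the court, a rough sense of scale helps when choosing <code>gridsize</code>. matplotlib treats an integer <code>gridsize</code> as the number of hexagons in the x-direction, and in these charts the horizontal axis is the 50 ft court width. The arithmetic below is only a back-of-the-envelope sketch: <code>approx_hex_width_ft</code> is a hypothetical helper, and it assumes the shots span the full width of the court, since the real bin size depends on the extent of the data.</p>

```python
COURT_WIDTH_FT = 50.0  # court width, plotted on the horizontal axis


def approx_hex_width_ft(gridsize, span_ft=COURT_WIDTH_FT):
    """Approximate horizontal width of one hexbin cell, in feet.

    Assumes an integer gridsize (hexagons across the x-direction) and
    data covering roughly `span_ft` feet along that axis.
    """
    return span_ft / gridsize


for gridsize in (20, 40):
    width = approx_hex_width_ft(gridsize)
    print(f"gridsize={gridsize}: about {width:.2f} ft per hexagon")
```

<p>Under those assumptions, a <code>gridsize</code> of 20 gives bins roughly 2.5 ft wide and 40 gives bins roughly 1.25 ft wide, which is why the higher setting looks noticeably finer.</p>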
<!--kg-card-end: markdown--><p>We&apos;ve plotted an empty half court. We&apos;ve plotted actual shot location data points. It&apos;s time to bring that all together. Run the below snippet and then we&apos;ll break it down line by line.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
fig, ax = plt.subplots(figsize=(13.8, 14))
ax.hexbin(x=&apos;y_half&apos;, y=&apos;x_half&apos;, cmap=plt.cm.gist_heat_r, gridsize=40, data=df)
create_ncaa_half_court(ax, court_color=&apos;white&apos;,
                       lines_color=&apos;black&apos;, paint_alpha=0,
                       inner_arc=True)
plt.show()
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-21.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="1089" height="1098" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/03/image-21.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/03/image-21.png 1000w, https://blog.collegefootballdata.com/content/images/2025/03/image-21.png 1089w" sizes="(min-width: 720px) 720px"></figure><p>Pretty nice, huh? Let&apos;s walk through it.</p><ul><li>On line 1, we are setting the size of the plot and returning the plot <code>fig</code> and <code>ax</code> objects.</li><li>On line 2, we are using the <code>ax</code> object to create a hexbin plot, almost identical to above.</li><li>On line 3, we are calling the <code>create_ncaa_half_court</code> function with our desired styling options. The colormap used here works best with a white background.</li><li>Lastly, we show the court with the plotted hex bins.</li></ul><p>Let&apos;s make this even cooler. We&apos;re going to use a library called <a href="https://seaborn.pydata.org/?ref=blog.collegefootballdata.com">seaborn</a>, which is built upon <code>matplotlib</code>. It contains many of the base plots found within <code>matplotlib</code>, but with its own tweaks and improvements. It also offers several additional, more advanced types of plots. You can view <a href="https://seaborn.pydata.org/examples/index.html?ref=blog.collegefootballdata.com">the gallery here</a>. We are going to be working with a <a href="https://seaborn.pydata.org/examples/hexbin_marginals.html?ref=blog.collegefootballdata.com">jointplot</a>, which will combine the hexbin chart we created with histograms along each axis.</p><p>It&apos;s pretty simple. 
Just run the snippet below to see what it looks like.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
sns.jointplot(data=df, x=&apos;y_half&apos;, y=&apos;x_half&apos;,
                                    kind=&apos;hex&apos;, space=0, color=plt.cm.gist_heat_r(.2), cmap=plt.cm.gist_heat_r)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-22.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="585" height="590"></figure><p>Now put it all together and let&apos;s plot the jointplot on top of our half court plot.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
cmap = plt.cm.gist_heat_r
joint_shot_chart = sns.jointplot(data=df, x=&apos;y_half&apos;, y=&apos;x_half&apos;,
                                kind=&apos;hex&apos;, space=0, color=cmap(.2), cmap=cmap)

joint_shot_chart.figure.set_size_inches(12,11)

# A joint plot has 3 Axes, the first one called ax_joint 
# is the one we want to draw our court onto 
ax = joint_shot_chart.ax_joint
create_ncaa_half_court(ax=ax,
                            three_line=&apos;mens&apos;,
                            court_color=&apos;white&apos;,
                            lines_color=&apos;black&apos;,
                            paint_alpha=0,
                            inner_arc=True)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-23.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="1076" height="985" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/03/image-23.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/03/image-23.png 1000w, https://blog.collegefootballdata.com/content/images/2025/03/image-23.png 1076w" sizes="(min-width: 720px) 720px"></figure><p>One last thing: let&apos;s remove the axis labels and add a title.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
cmap = plt.cm.gist_heat_r
joint_shot_chart = sns.jointplot(data=df, x=&apos;y_half&apos;, y=&apos;x_half&apos;,
                                kind=&apos;hex&apos;, space=0, color=cmap(.2), cmap=cmap)

joint_shot_chart.figure.set_size_inches(12,11)

# A joint plot has 3 Axes, the first one called ax_joint 
# is the one we want to draw our court onto 
ax = joint_shot_chart.ax_joint
create_ncaa_half_court(ax=ax,
                            three_line=&apos;mens&apos;,
                            court_color=&apos;white&apos;,
                            lines_color=&apos;black&apos;,
                            paint_alpha=0,
                            inner_arc=True)

# Get rid of axis labels and tick marks
ax.set_xlabel(&apos;&apos;)
ax.set_ylabel(&apos;&apos;)
ax.tick_params(labelbottom=False, labelleft=False)
ax.set_title(&quot;Dayton Shot Attempts\n(2024-2025)&quot;, y=1.22, fontsize=18)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-24.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="1057" height="1056" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/03/image-24.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/03/image-24.png 1000w, https://blog.collegefootballdata.com/content/images/2025/03/image-24.png 1057w" sizes="(min-width: 720px) 720px"></figure><p>There are other styles of joint plots you can make by changing the <code>kind</code> parameter on line 3 above. For example, changing the <code>kind</code> from <code>hex</code> to <code>scatter</code> results in this.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-25.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="1057" height="1056" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/03/image-25.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/03/image-25.png 1000w, https://blog.collegefootballdata.com/content/images/2025/03/image-25.png 1057w" sizes="(min-width: 720px) 720px"></figure><p>Here is what happens when we change it to <code>kde</code>.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-26.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="1057" height="1056" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/03/image-26.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/03/image-26.png 1000w, https://blog.collegefootballdata.com/content/images/2025/03/image-26.png 1057w" sizes="(min-width: 720px) 
720px"></figure><p>It doesn&apos;t look so great, does it? We can mess around a bit with the styling to make that look a little better. I&apos;m going to change the colormap to <code>inferno</code>, add <code>fill</code> and <code>thresh</code> parameters, and change the half court styling a little bit.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
cmap = plt.cm.inferno
joint_shot_chart = sns.jointplot(data=df, x=&apos;y_half&apos;, y=&apos;x_half&apos;,
                                kind=&apos;kde&apos;, space=0, fill=True, thresh=0, color=cmap(.2), cmap=cmap)

joint_shot_chart.figure.set_size_inches(12,11)

# A joint plot has 3 Axes, the first one called ax_joint 
# is the one we want to draw our court onto 
ax = joint_shot_chart.ax_joint
create_ncaa_half_court(ax=ax,
                            three_line=&apos;mens&apos;,
                            court_color=&apos;black&apos;,
                            lines_color=&apos;white&apos;,
                            paint_alpha=0,
                            inner_arc=True)

# Get rid of axis labels and tick marks
ax.set_xlabel(&apos;&apos;)
ax.set_ylabel(&apos;&apos;)
ax.tick_params(labelbottom=False, labelleft=False)
ax.set_title(f&quot;Dayton Shot Attempts\n(2024-2025)&quot;, y=1.22, fontsize=18)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/03/image-27.png" class="kg-image" alt="Talking Tech: Generating Shot Charts using the Basketball API" loading="lazy" width="1057" height="1056" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/03/image-27.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2025/03/image-27.png 1000w, https://blog.collegefootballdata.com/content/images/2025/03/image-27.png 1057w" sizes="(min-width: 720px) 720px"></figure><p>That&apos;s better. See the <a href="https://seaborn.pydata.org/generated/seaborn.jointplot.html?ref=blog.collegefootballdata.com">jointplot docs</a> for more styles, examples, and customizations.</p><h1 id="conclusion-and-further-reading">Conclusion and Further Reading</h1><p>You should now be able to create shot location charts against an actual court using <code>matplotlib</code> and <code>seaborn</code> with the CBBD Python library. There are many ways to take this further:</p><ul><li>Plot multiple teams using subplots</li><li>Plot made shots and missed shots side-by-side for the same team using subplots</li><li>Apply the same code to plotting shot charts for specific players</li><li>Find new styling and customizations</li></ul><p>Lastly, I already cited Rob Mulla and his <a href="https://www.kaggle.com/code/robikscube/ncaa-basketball-court-plot-helper-functions?ref=blog.collegefootballdata.com">excellent Kaggle article</a> and helper functions for plotting NCAA basketball courts. 
I&apos;d be remiss if I didn&apos;t also shout out Savvas Tjortjoglou, as I drew a lot of inspiration from his <a href="http://savvastjortjoglou.com/nba-shot-sharts.html?ref=blog.collegefootballdata.com">article on plotting NBA shot charts</a>.</p><p>As always, let me know what you think and happy coding!</p>]]></content:encoded></item><item><title><![CDATA[Talking Tech: Build an environment for data analysis in 2025]]></title><description><![CDATA[An updated guide on building an environment for college football and college basketball analysis, using VS Code, Jupyter, and GitHub Copilot.]]></description><link>https://blog.collegefootballdata.com/talking-tech-build-an-environment/</link><guid isPermaLink="false">67a62c772a659a00015caed1</guid><category><![CDATA[Programming]]></category><category><![CDATA[Talking Tech]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Sat, 08 Feb 2025 01:45:56 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1522832712787-3fbd36c9fe2d?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDgwfHxidWlsZHxlbnwwfHx8fDE3Mzg5NTgzMjF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1522832712787-3fbd36c9fe2d?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDgwfHxidWlsZHxlbnwwfHx8fDE3Mzg5NTgzMjF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" alt="Talking Tech: Build an environment for data analysis in 2025"><p>If you follow this blog, chances are that you&apos;ve seen and perhaps even walked through <a href="https://blog.collegefootballdata.com/talking-tech-building-an-environment-for-predictive-analysis/">my guide on building an environment for analysis</a>. That article is from 5 years ago and I still get questions and feedback on it to this day. 
To be clear, I still think it&apos;s a perfectly valid way to build an environment and I still primarily use the Docker setup outlined in the guide. However, I find myself starting to gravitate more and more towards a non-Docker environment.</p><p>Docker is great and I still use it for many things, but lately I&apos;ve found that it eats up a lot of resources on my local machine so I don&apos;t always have it running. The <a href="https://github.com/BlueSCar/docker-jupyter/pkgs/container/docker-jupyter%2Fdocker-jupyter?ref=blog.collegefootballdata.com">base Docker image</a> I shared in the previous article is still published and available for anyone to use, but it has been increasingly challenging to maintain and keep up-to-date via automation. You can still use that image and it still works great in my experience, but I have recently gained an appreciation for a more lightweight approach.</p><p>These are the tools used in this approach:</p><!--kg-card-begin: markdown--><ul>
<li><a href="https://code.visualstudio.com/?ref=blog.collegefootballdata.com">VS Code</a></li>
<li>Jupyter</li>
<li>Python (with virtual environments)</li>
<li>The CFBD and CBBD Python packages</li>
</ul>
<!--kg-card-end: markdown--><p>If you&apos;ve never used VS Code before as an IDE, you should check it out. It&apos;s long been my IDE of choice for everything else and it provides a fantastic experience for working with Jupyter notebooks. What has put it over the top for me and caused me to use it more and more for data analytics tasks is <a href="https://code.visualstudio.com/docs/copilot/overview?ref=blog.collegefootballdata.com">GitHub Copilot</a>. GitHub Copilot has become something that I am no longer able to live without. You may be familiar with my recent rewrite of the CFBD API, website, and most associated infrastructure. You may also be familiar with my recent foray into basketball with CollegeBasketballData.com. I wouldn&apos;t have been able to do any of this without Copilot. It&apos;s probably at least halved my development time on the above. And it works seamlessly with Jupyter notebooks in VS Code.</p><p>Just as with the previous guide, this guide should work whether you are on Windows, Mac, or Linux. I am a Windows user and still highly recommend setting up <a href="https://docs.microsoft.com/en-us/windows/wsl/install-win10?ref=blog.collegefootballdata.com">Windows Subsystem for Linux (WSL)</a> with your favorite Linux flavor (I use Ubuntu) if you are also on Windows. I do all my development (personal and professional) exclusively in WSL.</p><hr><!--kg-card-begin: markdown--><h1 id="getting-started">Getting Started</h1>
<!--kg-card-end: markdown--><p>Prerequisites are that you have the following installed:</p><!--kg-card-begin: markdown--><ul>
<li>VS Code</li>
<li>Python</li>
</ul>
<!--kg-card-end: markdown--><p>You will also need some VS Code extensions, at the very least the Python and Jupyter extensions. Here is the list of extensions I am running for this tutorial:</p><!--kg-card-begin: markdown--><ul>
<li><a href="https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter&amp;ref=blog.collegefootballdata.com">Jupyter</a></li>
<li><a href="https://marketplace.visualstudio.com/items?itemName=ms-python.python&amp;ref=blog.collegefootballdata.com">Python</a></li>
<li><a href="https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter-renderers&amp;ref=blog.collegefootballdata.com">Jupyter Notebook Renderers</a></li>
</ul>
<!--kg-card-end: markdown--><p>Open up a terminal window. Let&apos;s create a directory called <code>jupyter</code> and move into that directory.</p><!--kg-card-begin: html--><pre class="command-line" data-prompt="user@desktop ~"><code class="lang-bash">
mkdir jupyter
cd jupyter
</code></pre><!--kg-card-end: html--><p>Next, we&apos;re going to create a <a href="https://docs.python.org/3/library/venv.html?ref=blog.collegefootballdata.com">Python virtual environment</a>. This is always a good practice as it allows you to work with different Python versions and package versions across different folders/repos.</p><!--kg-card-begin: markdown--><pre class="command-line" data-prompt="user@desktop ~/jupyter"><code class="lang-bash">
python -m venv ./venv
</code></pre><!--kg-card-end: markdown--><p>This should have created a <code>venv</code> folder with the Python binaries and some scripts. We are going to activate the virtual environment we just created by running:</p><!--kg-card-begin: html--><pre class="command-line" data-prompt="user@desktop ~/jupyter"><code class="lang-bash">
source ./venv/bin/activate
</code></pre><!--kg-card-end: html--><p>Note that this command differs on non-WSL Windows (for example, <code>venv\Scripts\activate</code> in Command Prompt), while macOS and Linux shells such as bash and zsh use the command as shown. Refer to the documentation linked above for instructions specific to your OS and shell.</p><p>Next, we will install a list of commonly used Python packages. Feel free to add any others you may need. We will also write the installed packages into a <code>requirements.txt</code> file so the environment is easy to recreate.</p><!--kg-card-begin: html--><pre class="command-line" data-prompt="venv user@desktop ~/jupyter"><code class="lang-bash">
pip install cbbd cfbd ipykernel matplotlib numpy pandas scikit-learn xgboost
pip freeze &gt; requirements.txt
</code></pre><!--kg-card-end: html--><p>Let&apos;s create an empty Jupyter notebook and open this directory in VS Code.</p><!--kg-card-begin: html--><pre class="command-line" data-prompt="venv user@desktop ~/jupyter"><code class="lang-bash">
touch test.ipynb
code .
</code></pre><!--kg-card-end: html--><p>Inside VS Code, open the <code>test.ipynb</code> file from the left sidebar. Then, click on &quot;Select Kernel&quot; in the top-right and then &quot;Python Environments...&quot; from the dropdown list that appears.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/02/image.png" class="kg-image" alt="Talking Tech: Build an environment for data analysis in 2025" loading="lazy" width="839" height="214" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/02/image.png 600w, https://blog.collegefootballdata.com/content/images/2025/02/image.png 839w" sizes="(min-width: 720px) 720px"></figure><p>Select the environment labeled <code>venv</code>. There should be a star next to it.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/02/image-1.png" class="kg-image" alt="Talking Tech: Build an environment for data analysis in 2025" loading="lazy" width="625" height="352" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/02/image-1.png 600w, https://blog.collegefootballdata.com/content/images/2025/02/image-1.png 625w"></figure><p>Now we can begin working in the Jupyter notebook. Let&apos;s start by importing the <code>cfbd</code> and <code>pandas</code> packages and running the code block.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
import cfbd
import pandas as pd
</code></pre><!--kg-card-end: html--><p>If you didn&apos;t install the <code>ipykernel</code> package with the list of packages above, you may be greeted with the below prompt. Just click &apos;Install&apos; and wait.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/02/image-2.png" class="kg-image" alt="Talking Tech: Build an environment for data analysis in 2025" loading="lazy" width="468" height="138"></figure><p>Next, let&apos;s configure the CFBD package with our CFBD API key. If you do not have a key, <a href="https://collegefootballdata.com/key?ref=blog.collegefootballdata.com">you can acquire one from the website</a>. Replace the text below with your personal key.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
configuration = cfbd.Configuration(
    access_token = &apos;your_key_here&apos;
)
</code></pre><!--kg-card-end: html--><p>We can now call the API to grab a list of games:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
with cfbd.ApiClient(configuration) as api_client:
    games_api = cfbd.GamesApi(api_client)
    
    games = games_api.get_games(year=2024, classification=&apos;fbs&apos;)

len(games)
</code></pre><!--kg-card-end: html--><p>In my example, there were 920 games returned. It&apos;s pretty easy to load those into a Pandas DataFrame.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
df = pd.DataFrame.from_records([g.to_dict() for g in games])
df.head()
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/02/image-3.png" class="kg-image" alt="Talking Tech: Build an environment for data analysis in 2025" loading="lazy" width="705" height="451" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/02/image-3.png 600w, https://blog.collegefootballdata.com/content/images/2025/02/image-3.png 705w"></figure><p>One neat trick using the Python library is that every method has a special version that will also include the HTTP response metadata. Simply attach <code>_with_http_info</code> to the end of the method. You can use this to keep track of how many monthly calls you have remaining.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
with cfbd.ApiClient(configuration) as api_client:
    games_api = cfbd.GamesApi(api_client)
    
    response = games_api.get_games_with_http_info(year=2024, classification=&apos;fbs&apos;)
    
response.headers[&apos;X-CallLimit-Remaining&apos;]
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/02/image-4.png" class="kg-image" alt="Talking Tech: Build an environment for data analysis in 2025" loading="lazy" width="753" height="197" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/02/image-4.png 600w, https://blog.collegefootballdata.com/content/images/2025/02/image-4.png 753w" sizes="(min-width: 720px) 720px"></figure><p>You can then access the same data as before via the <code>response.data</code> field.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
games = response.data
df = pd.DataFrame.from_records([g.to_dict() for g in games])
df.head()
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2025/02/image-5.png" class="kg-image" alt="Talking Tech: Build an environment for data analysis in 2025" loading="lazy" width="854" height="514" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2025/02/image-5.png 600w, https://blog.collegefootballdata.com/content/images/2025/02/image-5.png 854w" sizes="(min-width: 720px) 720px"></figure><p>And that is all there is to it!</p><hr><!--kg-card-begin: markdown--><h1 id="conclusion">Conclusion</h1>
<!--kg-card-end: markdown--><p>I do still love Docker for many things and think it is still perfectly adequate to use for a data analytics environment. However, you can see how this approach is much more lightweight and allows you to leverage the full capabilities of VS Code. We didn&apos;t really dig into the GitHub Copilot extension. If you haven&apos;t installed it yet, I cannot recommend it enough, as it is a game changer.</p><p>Some other tweaks that people make include swapping out <code>pip</code> for <a href="https://docs.conda.io/en/latest/?ref=blog.collegefootballdata.com">conda</a>. However, I have found the above setup to be more than adequate. Anyway, happy coding!</p>]]></content:encoded></item><item><title><![CDATA[REST API v2 is now in general availability!]]></title><description><![CDATA[<p>The CFBD API v2 is now publicly available! The free tier has been set at 1000 monthly calls (tiering and call limits subject to change). Documentation is available at apinext.collegefootballdata.com. As previously announced, API v1 will be shut down prior to the start of the 2025 season. 
In</p>]]></description><link>https://blog.collegefootballdata.com/api-v2-is-now-in-general-availability/</link><guid isPermaLink="false">6778b35e0a66cc0001ae111b</guid><category><![CDATA[Announcement]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Sat, 04 Jan 2025 14:00:00 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1587357327780-c9d724b0bff8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDY2fHxncm93dGh8ZW58MHx8fHwxNzQwNzQ4NzczfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1587357327780-c9d724b0bff8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDY2fHxncm93dGh8ZW58MHx8fHwxNzQwNzQ4NzczfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" alt="REST API v2 is now in general availability!"><p>The CFBD API v2 is now publicly available! The free tier has been set at 1000 monthly calls (tiering and call limits subject to change). Documentation is available at apinext.collegefootballdata.com. As previously announced, API v1 will be shut down prior to the start of the 2025 season. In May 2025, both api.collegefootballdata.com and apinext.collegefootballdata.com will point to v2.</p><p>To reiterate what has already been announced, there WILL be breaking changes, so it is recommended to check out the docs and update your code at your earliest possible convenience. Current API limits are as follows:</p><!--kg-card-begin: markdown--><ul>
<li>Free tier - 1000 monthly calls</li>
<li>Patreon Tier 1 ($1/mo) - 5000 monthly calls</li>
<li>Patreon Tier 2 ($5/mo) - 30,000 monthly calls</li>
<li>Patreon Tier 3 ($10/mo) - 75,000 monthly calls (+ access to the GraphQL API with realtime data subscriptions)</li>
</ul>
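<p>To get a feel for how far a tier stretches, here is a minimal sketch (not part of any CFBD package) that spreads the calls remaining in a month evenly over the days left. The tier keys and the <code>daily_budget</code> helper are names made up for illustration, and the limits are copied from the list above, so they would need updating if the tiers change.</p><pre class="line-numbers"><code class="lang-python">
# Monthly call limits copied from the tier list above (subject to change).
# The dictionary keys are hypothetical labels, not official tier identifiers.
MONTHLY_LIMITS = {
    'free': 1000,
    'tier1': 5000,
    'tier2': 30000,
    'tier3': 75000,
}


def daily_budget(tier, calls_used, days_left):
    """Spread the calls remaining this month evenly over the days left."""
    remaining = max(MONTHLY_LIMITS[tier] - calls_used, 0)
    return remaining // max(days_left, 1)


print(daily_budget('free', calls_used=100, days_left=30))
</code></pre><p>With 100 of the 1000 free-tier calls used and 30 days to go, that works out to 30 calls per day.</p>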
<!--kg-card-end: markdown--><p>These tiers and limits are subject to change prior to the 2025 season. More tiers will be added as needed. If you need more than 75k monthly calls, reach out to me and I will add more tiers.</p><p>Unlike REST API v1, there is no request throttling in REST API v2. Throttling was dropped in favor of monthly limits to make things more transparent and easier to communicate and implement. However, note that Cloudflare limits are still in place and if you make a large number of simultaneous requests, you may be blocked by Cloudflare for a short period of time (~10 minutes).</p><p>There are multiple ways to access REST API v2:</p><!--kg-card-begin: markdown--><ul>
<li>Read the API docs at apinext.collegefootballdata.com</li>
<li>Install the <a href="https://github.com/CFBD/cfbd-python?ref=blog.collegefootballdata.com">revamped Python package</a>.</li>
<li>Install the <a href="https://github.com/CFBD/cfbd-typescript?ref=blog.collegefootballdata.com">new TypeScript package</a>.</li>
<li>Install the <a href="https://github.com/CFBD/cfbd-net?ref=blog.collegefootballdata.com">new C# package</a>.</li>
</ul>
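<p>If you would rather not install a package, a REST call can also be sketched with nothing but the Python standard library. This is an illustration under stated assumptions rather than official usage: it assumes a <code>/games</code> route that takes <code>year</code> and <code>classification</code> query parameters and a bearer-token <code>Authorization</code> header, and <code>build_games_request</code> is a made-up helper name. Check the API docs for the exact routes and parameters.</p><pre class="line-numbers"><code class="lang-python">
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL: the host this post names for v2.
BASE_URL = 'https://apinext.collegefootballdata.com'


def build_games_request(api_key, year, classification='fbs'):
    """Build (but do not send) a GET request for one season of games."""
    # Assumed route and parameter names; verify them against the API docs.
    query = urlencode({'year': year, 'classification': classification})
    return Request(
        f'{BASE_URL}/games?{query}',
        headers={'Authorization': f'Bearer {api_key}'},
    )


request = build_games_request('YOUR_API_KEY', 2024)
print(request.full_url)
</code></pre><p>Passing the request to <code>urllib.request.urlopen</code> would send it and count against the monthly call limit.</p>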
<!--kg-card-end: markdown--><p>I am exploring adding support for additional languages. If there are specific languages you would like to see, please let me know.</p><p>Subscribers in Patreon Tier 3 receive access to the <a href="https://blog.collegefootballdata.com/building-dynamic-queries-and-data-subscriptions-with-the-new-cfbd-graphql-api/">new GraphQL API</a> with <a href="https://blog.collegefootballdata.com/subscribing-to-data-events-with-the-cfbd-graphql-api/">realtime data subscriptions</a>.</p><p>REST API v2 and the new GraphQL API should still be considered to be in beta. Please do not hesitate to reach out if you run into any potential bugs or issues.</p>]]></content:encoded></item><item><title><![CDATA[Subscribing to Data Events with the CFBD GraphQL API]]></title><description><![CDATA[<p>Over the weekend, I announced the <a href="https://blog.collegefootballdata.com/building-dynamic-queries-and-data-subscriptions-with-the-new-cfbd-graphql-api/">new and experimental CFBD GraphQL API</a>. I already broke down most of the benefits of using GraphQL, which include more dynamic querying and granular control over the data. 
One benefit is so big that it merits its own post: <a href="https://graphql.org/blog/2015-10-16-subscriptions/?ref=blog.collegefootballdata.com">GraphQL Subscriptions</a>.</p><p>Subscriptions do exactly</p>]]></description><link>https://blog.collegefootballdata.com/subscribing-to-data-events-with-the-cfbd-graphql-api/</link><guid isPermaLink="false">66d74138671b8500010a8de8</guid><category><![CDATA[Enhancements]]></category><category><![CDATA[Programming]]></category><category><![CDATA[Talking Tech]]></category><category><![CDATA[Announcement]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Tue, 03 Sep 2024 19:00:01 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1676906242774-a047ff5cfebd?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDE0fHxtYWlsfGVufDB8fHx8MTcyNTM4MzA0N3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1676906242774-a047ff5cfebd?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDE0fHxtYWlsfGVufDB8fHx8MTcyNTM4MzA0N3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" alt="Subscribing to Data Events with the CFBD GraphQL API"><p>Over the weekend, I announced the <a href="https://blog.collegefootballdata.com/building-dynamic-queries-and-data-subscriptions-with-the-new-cfbd-graphql-api/">new and experimental CFBD GraphQL API</a>. I already broke down most of the benefits of using GraphQL, which include more dynamic querying and granular control over the data. One benefit is so big that it merits its own post: <a href="https://graphql.org/blog/2015-10-16-subscriptions/?ref=blog.collegefootballdata.com">GraphQL Subscriptions</a>.</p><p>Subscriptions do exactly what they say. They allow you to subscribe to data updates. If you&apos;re a Patreon subscriber, you may already be familiar with the live endpoints in the CFBD REST API (e.g. /scoreboard). 
While these endpoints present live data, they also require you, the user, to implement some sort of polling mechanism to re-trigger the endpoint on a cycle. And what&apos;s more, the data returned by the endpoint may or may not have changed. It&apos;s up to the user to figure out if it has.</p><p>In GraphQL, however, subscriptions are event-based. You specify a GraphQL query as a subscription and, instead of polling the data source repeatedly, the query auto-triggers each time that data has actually updated. Instead of making a bunch of calls, you specify one operation and then the data is pushed directly to your code whenever it changes in the CFBD database.</p><p>Subscriptions are pretty simple. Let&apos;s take a regular GraphQL query, one that queries betting lines from a specific sportsbook for all future games:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-graphql">
query bettingQuery {
	game(
		where: {
			status: { _eq: &quot;scheduled&quot; }
			lines: { provider: { name: { _eq: &quot;Bovada&quot; } } }
			_or: [
				{ homeClassification: { _eq: &quot;fbs&quot; } }
				{ awayClassification: { _eq: &quot;fbs&quot; } }
			]
		}
	) {
		homeTeam
		awayTeam
		lines(where: { provider: { name: { _eq: &quot;Bovada&quot; } } }) {
			spread
			overUnder
			provider {
				name
			}
		}
	}
}

</code></pre><!--kg-card-end: html--><p>Pretty standard query, right? If we wanted, we could call this query regularly, parsing the response to see if any of the data has changed. Much simpler would be turning it into a subscription:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-graphql">
subscription bettingSubscription {
	game(
		where: {
			status: { _eq: &quot;scheduled&quot; }
			lines: { provider: { name: { _eq: &quot;Bovada&quot; } } }
			_or: [
				{ homeClassification: { _eq: &quot;fbs&quot; } }
				{ awayClassification: { _eq: &quot;fbs&quot; } }
			]
		}
	) {
		homeTeam
		awayTeam
		lines(where: { provider: { name: { _eq: &quot;Bovada&quot; } } }) {
			spread
			overUnder
			provider {
				name
			}
		}
	}
}

</code></pre><!--kg-card-end: html--><p>That was simple! The only change I made was changing the <strong>query</strong> operation to a <strong>subscription</strong> operation (I also changed the arbitrary operation name to <strong>bettingSubscription</strong>). Now, whenever the data returned by this query changes in CFBD, I will get an update pushed directly to me. No more polling over and over again. No more trying to figure out if anything has actually changed.</p><p>If you want to get pushed an update whenever a game&apos;s status changes to &quot;completed&quot; so you know that it&apos;s time to pull play or box score data, you can do that. If you want to be alerted as above when a sportsbook spread has changed, you can do that. Want to be pushed an update when recruiting data changes? You can now do that, too.</p><h2 id="creating-a-subscription-in-python">Creating a Subscription in Python</h2><p>One important thing to note: Insomnia does not support GraphQL subscriptions. However, I still recommend designing all of your GraphQL operations in Insomnia since you can take advantage of its autocomplete and interactive GraphQL docs. You would just build the subscription as a query and then change it to a subscription when putting it into your Python code.</p><p>We&apos;re going to be working with three Python packages: <code>gql</code>, <code>asyncio</code>, and <code>backoff</code>. <code>asyncio</code> ships with the Python standard library, so only <code>gql</code> (with its WebSocket dependencies, for example via the <code>gql[websockets]</code> extra) and <code>backoff</code> need to be installed from PyPI.</p><p>We&apos;re going to walk through two different examples. Here is the first example and it&apos;s pretty simple:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
from gql import Client, gql
from gql.transport.websockets import WebsocketsTransport

transport = WebsocketsTransport(
    url=&quot;wss://graphql.collegefootballdata.com/v1/graphql&quot;,
    headers={ &quot;Authorization&quot;: &quot;Bearer YOUR_API_KEY&quot;}
)

client = Client(
    transport=transport,
    fetch_schema_from_transport=True,
)

query = gql(&apos;&apos;&apos;
    subscription bettingSubscription {
        game(
            where: {
                status: { _eq: &quot;scheduled&quot; }
                lines: { provider: { name: { _eq: &quot;Bovada&quot; } } }
                _or: [
                    { homeClassification: { _eq: &quot;fbs&quot; } }
                    { awayClassification: { _eq: &quot;fbs&quot; } }
                ]
            }
        ) {
            homeTeam
            awayTeam
            lines(where: { provider: { name: { _eq: &quot;Bovada&quot; } } }) {
                spread
                overUnder
                provider {
                    name
                }
            }
        }
    }
&apos;&apos;&apos;)

for result in client.subscribe(query):
    # put your logic here
    print(result)
</code></pre><!--kg-card-end: html--><p>Let&apos;s walk through what this code is doing. On line 4, we are creating a <code>WebsocketsTransport</code>. You&apos;ll note this is different than what we did in the previous post for making GraphQL queries. If you remember, queries and mutations are just HTTP POST requests. If you look at line 5, we are instead using the <code>wss://</code> scheme. Instead of making an HTTP request, we are working over <a href="https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API?ref=blog.collegefootballdata.com">a WebSocket</a>. Unlike the HTTP protocol, WebSockets establish a persistent connection that allows for two-way communication. This is how GraphQL subscriptions are possible. A persistent connection is opened over a WebSocket. The client submits the subscription to the GraphQL server and then the GraphQL server pushes a message out to the client whenever there is an update relevant to that subscription.</p><p>On line 6, be sure to replace <code>YOUR_API_KEY</code> with the same API key you use to access the CFBD REST API.</p><p>Starting at line 14, we build out a GraphQL operation that will be submitted to the GraphQL server as a subscription. This is the same subscription we outlined at the start of this post, which subscribes to updates to the spreads and totals from a specific sportsbook (Bovada) for upcoming games.</p><p>On line 39, we begin looping through subscription updates. The GraphQL server will return an initial data set pertaining to the subscription query. Whenever there are updates to the data set, more results will appear in the loop and our code will act upon them. In the example above, we are merely printing the results to the console, but this is where you would put the logic that you want to be executed whenever there is a data update, such as pushing the updated data to your own data store.</p><p>I mentioned that we would be walking through two different examples. 
There is one potential issue with the example above: WebSocket connections, while incredibly useful, can be very brittle. The persistent connection can be interrupted for any number of reasons: network outage on your end, network outage on the GraphQL server&apos;s end, the GraphQL server going down temporarily for maintenance, etc.</p><p>Luckily, there are ways to address this. This is where we will be using the <code>asyncio</code> and <code>backoff</code> packages. Let&apos;s start with some imports:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
import asyncio
import backoff

from gql import Client, gql
from gql.transport.websockets import WebsocketsTransport
</code></pre><!--kg-card-end: html--><p>Next, we are going to extract the GraphQL operation into its own <code>async</code> function. The function takes a <code>session</code> parameter, the WebSocket session we will create later, and subscribes through it. This is basically a copy and paste from the previous example:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
async def subscribe(session):
    query = gql(&apos;&apos;&apos;
        subscription bettingSubscription {
            game(
                where: {
                    status: { _eq: &quot;scheduled&quot; }
                    lines: { provider: { name: { _eq: &quot;Bovada&quot; } } }
                    _or: [
                        { homeClassification: { _eq: &quot;fbs&quot; } }
                        { awayClassification: { _eq: &quot;fbs&quot; } }
                    ]
                }
            ) {
                homeTeam
                awayTeam
                lines {
                    spread
                    overUnder
                    provider {
                        name
                    }
                }
            }
        }
    &apos;&apos;&apos;)

    async for result in session.subscribe(query):
        # put your logic here
        print(result)
</code></pre><!--kg-card-end: html--><p>We will now create another function for managing the WebSocket connection and calling our subscription function:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
@backoff.on_exception(backoff.expo, Exception, max_time=60)
async def graphql_connection():
    transport = WebsocketsTransport(
        url=&quot;wss://graphql.collegefootballdata.com/v1/graphql&quot;,
        headers={ &quot;Authorization&quot;: &quot;Bearer YOUR_API_KEY&quot;}
    )

    client = Client(
        transport=transport,
        fetch_schema_from_transport=True,
    )
    
    async with client as session:
        task = asyncio.create_task(subscribe(session))
        
        await asyncio.gather(task)
</code></pre><!--kg-card-end: html--><p>The <code>backoff</code> module is used on line 1. This establishes some retry logic with an exponential backoff. In other words, if the WebSocket connection gets interrupted for any reason, the function is retried with an exponentially increasing wait between attempts until the 60-second limit set by <code>max_time=60</code> is reached, at which point the exception is re-raised.</p><p>Starting on line 3, we have some more code copied and pasted from the previous example. Be sure to enter your CFBD API key on line 5.</p><p>The last four lines deal with calling the subscription function using the WebSocket session that was established on the previous lines. What&apos;s interesting is that we are calling the <code>subscribe</code> function inside of a task. We could take advantage of this to run several subscriptions at once. This would enable them all to share the same WebSocket connection. The modified code would look similar to this:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
async def subscribe1(session):
    # GraphQL subscription here
    ...

async def subscribe2(session):
    # GraphQL subscription here
    ...

async def subscribe3(session):
    # GraphQL subscription here
    ...

async def subscribe4(session):
    # GraphQL subscription here
    ...

@backoff.on_exception(backoff.expo, Exception, max_time=60)
async def graphql_connection():
    transport = WebsocketsTransport(
        url=&quot;wss://graphql.collegefootballdata.com/v1/graphql&quot;,
        headers={ &quot;Authorization&quot;: &quot;Bearer YOUR_API_KEY&quot;}
    )

    client = Client(
        transport=transport,
        fetch_schema_from_transport=True,
    )
    
    async with client as session:
        task1 = asyncio.create_task(subscribe1(session))
        task2 = asyncio.create_task(subscribe2(session))
        task3 = asyncio.create_task(subscribe3(session))
        task4 = asyncio.create_task(subscribe4(session))
        
        await asyncio.gather(task1, task2, task3, task4)
</code></pre><!--kg-card-end: html--><p>This modification has four different subscriptions to track, each encapsulated by its own function.</p><p>The last thing we need to do is call the <code>graphql_connection</code> function and this is where the <code>asyncio</code> package comes into play:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
asyncio.run(graphql_connection())
</code></pre><!--kg-card-end: html--><p>Putting everything together, your final code should look similar to this:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
import asyncio
import backoff

from gql import Client, gql
from gql.transport.websockets import WebsocketsTransport

async def subscribe(session):
    query = gql(&apos;&apos;&apos;
        subscription bettingSubscription {
            game(
                where: {
                    status: { _eq: &quot;scheduled&quot; }
                    lines: { provider: { name: { _eq: &quot;Bovada&quot; } } }
                    _or: [
                        { homeClassification: { _eq: &quot;fbs&quot; } }
                        { awayClassification: { _eq: &quot;fbs&quot; } }
                    ]
                }
            ) {
                homeTeam
                awayTeam
                lines {
                    spread
                    overUnder
                    provider {
                        name
                    }
                }
            }
        }
    &apos;&apos;&apos;)

    async for result in session.subscribe(query):
        # put your logic here
        print(result)
        
@backoff.on_exception(backoff.expo, Exception, max_time=60)
async def graphql_connection():
    transport = WebsocketsTransport(
        url=&quot;wss://graphql.collegefootballdata.com/v1/graphql&quot;,
        headers={ &quot;Authorization&quot;: &quot;Bearer YOUR_API_KEY&quot;}
    )

    client = Client(
        transport=transport,
        fetch_schema_from_transport=True,
    )
    
    async with client as session:
        task = asyncio.create_task(subscribe(session))
        
        await asyncio.gather(task)
        
asyncio.run(graphql_connection())
</code></pre><!--kg-card-end: html--><h2 id="conclusion">Conclusion</h2><p>GraphQL subscriptions are an efficient mechanism for receiving data updates. Whether you are looking to cut back on your API calls, simplify your code, or know the moment data changes, they are a great option. The experimental CFBD GraphQL API is available to <a href="https://www.patreon.com/collegefootballdata?ref=blog.collegefootballdata.com">Patreon subscribers at Tier 3</a>. Join today if you would like to check it out. Also, check out <a href="https://blog.collegefootballdata.com/building-dynamic-queries-and-data-subscriptions-with-the-new-cfbd-graphql-api/">my previous post</a> to see more examples of what the GraphQL API can do for you. As always, let me know what you think!</p>]]></content:encoded></item><item><title><![CDATA[Building Dynamic Queries with the CFBD GraphQL API]]></title><description><![CDATA[The experimental CFBD GraphQL API is now available and can be used to query college football data in all sorts of dynamic ways. 
In this post, we'll walk through how to use the new GraphQL API.]]></description><link>https://blog.collegefootballdata.com/building-dynamic-queries-and-data-subscriptions-with-the-new-cfbd-graphql-api/</link><guid isPermaLink="false">66d1d672671b8500010a8ad8</guid><category><![CDATA[Programming]]></category><category><![CDATA[Talking Tech]]></category><category><![CDATA[Enhancements]]></category><category><![CDATA[Announcement]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Sat, 31 Aug 2024 01:08:25 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1495592822108-9e6261896da8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDEzOHx8ZGF0YSUyMHdlYnxlbnwwfHx8fDE3MjUwMjk0MDB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1495592822108-9e6261896da8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDEzOHx8ZGF0YSUyMHdlYnxlbnwwfHx8fDE3MjUwMjk0MDB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" alt="Building Dynamic Queries with the CFBD GraphQL API"><p>Have you ever wanted more granular control over how you query data from CFBD? By more granular control, I mean dynamic filtering and sorting, querying related pieces of data in one query, and even the ability to specify which specific fields you want to be queried.</p><p>What about better real-time data support in the form of subscriptions? The REST API offers a few live endpoints that require constant polling, but I&apos;m talking about being able to create a specific data query, subscribe to that query, and have your own code notified in real time when the data in that query changes. And this goes far beyond the few live REST endpoints offered today. 
Imagine being able to subscribe to betting line updates, for example.</p><!--kg-card-begin: markdown--><p>The experimental CFBD GraphQL API can enable you to do all of this and it is available to Patreon Tier 3 subscribers starting today. I put emphasis on the word experimental. It does not yet have full access to the entire CFBD data catalog, but it does incorporate a decent amount as of right now:</p>
<ul>
<li>Team information</li>
<li>Conference information</li>
<li>Historical team/conference associations</li>
<li>Historical and live game data (scores, Elo ratings, excitement index, weather, media information)</li>
<li>Historical and live betting data</li>
<li>Recruiting data</li>
<li>Transfer data</li>
<li>NFL Draft history</li>
</ul>
<p>Things that are <strong>not</strong> currently included but will be added over time:</p>
<ul>
<li>Drive and play data</li>
<li>Basic game, player, and season stats</li>
<li>Advanced game, player, and season stats</li>
</ul>
<p>Neither of these lists is exhaustive.</p>
<!--kg-card-end: markdown--><p>If you would like to learn more and see some examples, read on.</p><!--kg-card-begin: markdown--><h2 id="what-is-graphql">What is GraphQL?</h2>
<!--kg-card-end: markdown--><p><a href="https://graphql.org/?ref=blog.collegefootballdata.com">GraphQL</a> is a query language for APIs. Its central premise is that it defines a data model as a &quot;graph&quot; of attributes and relationships. When interfacing with such an API, you specify exactly which data you need, how it should be filtered, and how it should be sorted, and you can page through the results in batches. This is very different from a traditional REST API, where you are given a concrete set of REST endpoints with discrete query parameters and a rigid data model response.</p><p>So how does working with it differ from working with REST endpoints? The funny thing is, it basically is a REST endpoint. Unlike traditional REST APIs where you would likely have many different endpoints scattered across multiple different HTTP operations (e.g. GET, POST, PUT, etc.), GraphQL exposes a single POST endpoint, usually named just <code>graphql</code>. You submit a POST request to that endpoint and the request body contains all the information about what you are trying to do and what data you want to receive back, all in GraphQL syntax.</p><p>Here is a simple GraphQL query using the new CFBD GraphQL endpoint:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-graphql">
query gamesQuery {
	game(where: { season: { _eq: 2024 } }, orderBy: { startDate: ASC }) {
		id
		season
		seasonType
		week
		startDate
		homeTeam
		homeClassification
		homeConferece
		homePoints
		awayTeam
		awayClassification
		awayConferece
		awayPoints
		lines {
			provider {
				name
			}
			spread
		}
	}
}
</code></pre><!--kg-card-end: html--><p>GraphQL offers three types of operations: <strong>queries</strong> for querying data, <strong>mutations</strong> for changing data, and <strong>subscriptions</strong> for subscribing to data updates. The above example is a <strong>query</strong> named <code>gamesQuery</code>. The <code>query</code> part is important since it tells the API that we are querying for data, but the <code>gamesQuery</code> part is completely arbitrary. In fact, we could have completely left off <code>query gamesQuery</code> and the API would implicitly know we are trying to query data.</p><p>The interesting stuff starts on line 2. There is a <code>game</code> object that is made available in the graph and we are telling the API that we want to query these objects. We are also including some filtering and sorting on this line. We are telling the API to return games from the 2024 season and to sort by the <code>startDate</code> property.</p><p>Let&apos;s look at the filter a little more closely: <code>where: { season: { _eq: 2024 } }</code>. We are using the equality operator (<code>_eq</code>) to filter on the 2024 season, but there are many more operators. For example, we could use <code>_gt</code> if we wanted to query on seasons after a specific year. We can also combine filters. Let&apos;s say we wanted to query games from the 2024 season, but only in weeks 1, 3, and 5. We could do something like this: <code>where: { season: { _eq: 2024 }, week: { _in: [1, 3, 5] } }</code>. We&apos;ll look at some more complex scenarios later on.</p><p>We also have an ordering statement: <code>orderBy: { startDate: ASC }</code>. This tells the API to sort the results by the <code>startDate</code> field in ascending order. Similar to filters, we can combine these if we want to sort by multiple fields. 
And we can specify whether we want to sort in ascending or descending order on each field.</p><p>As we continue past line 2, you can see that we are also able to specify which game object fields we would like returned in the query. On line 16, we introduce another object in the graph via the <code>lines</code> property. We have a whole <code>gameLines</code> object that we could write a separate query on. However, we also have a relationship between games and game lines via the <code>lines</code> property. Because of this, we can tell the API to return any game lines associated with each game object. We can also specify which properties we want to be returned in these nested relationships. Notably, you&apos;ll see that we have another relationship nested within a relationship, as the <code>provider</code> object has a relationship with the <code>lines</code> object. <code>provider</code> describes the sportsbook that supplied the game line.</p><p>We&apos;ve gotten this far, so we should probably look at the data that gets returned by this query.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-json">
...
{
	&quot;id&quot;: 401635525,
	&quot;season&quot;: 2024,
	&quot;seasonType&quot;: &quot;regular&quot;,
	&quot;week&quot;: 1,
	&quot;startDate&quot;: &quot;2024-08-24T16:00:00&quot;,
	&quot;homeTeam&quot;: &quot;Georgia Tech&quot;,
	&quot;homeClassification&quot;: &quot;fbs&quot;,
	&quot;homeConferece&quot;: &quot;ACC&quot;,
	&quot;homePoints&quot;: 24,
	&quot;awayTeam&quot;: &quot;Florida State&quot;,
	&quot;awayClassification&quot;: &quot;fbs&quot;,
	&quot;awayConferece&quot;: &quot;ACC&quot;,
	&quot;awayPoints&quot;: 21,
	&quot;lines&quot;: [
		{
			&quot;provider&quot;: {
				&quot;name&quot;: &quot;ESPN Bet&quot;
			},
			&quot;spread&quot;: 10.5
		},
		{
			&quot;provider&quot;: {
				&quot;name&quot;: &quot;DraftKings&quot;
			},
			&quot;spread&quot;: 11.5
		},
		{
			&quot;provider&quot;: {
				&quot;name&quot;: &quot;Bovada&quot;
			},
			&quot;spread&quot;: 10.0
		}
	]
},
...
</code></pre><!--kg-card-end: html--><p>As you can see, it matches the format and fields that we specified in the query. Let&apos;s write another query with a little bit more complexity. I want to query the most exciting games of the past 10 seasons as measured by the CFBD Excitement Index metric. My query would look like this:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-graphql">
query excitementQuery {
	game(
		where: { season: { _gte: 2014 }, excitement: { _isNull: false } }
		orderBy: { excitement: DESC }
		limit: 100
	) {
		id
		season
		seasonType
		week
		startDate
		homeTeam
		homeClassification
		homeConferece
		homePoints
		awayTeam
		awayClassification
		awayConferece
		awayPoints
		excitement
	}
}

</code></pre><!--kg-card-end: html--><p>I&apos;m writing this article right at the start of the 2024 season, so I&apos;ve set my filter, <code>where: { season: { _gte: 2014 }, excitement: { _isNull: false } }</code>, to query all games starting with the 2014 season where the <code>excitement</code> field is not null. I also included a sort clause, <code>orderBy: { excitement: DESC }</code>, because I want to sort by <code>excitement</code> in descending order so that the most exciting games are returned at the top. Lastly, I specified a limit of 100 results (<code>limit: 100</code>) because I only want the top 100 most exciting games.</p><p>Here are the partial results of that query:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-json">
{
	&quot;data&quot;: {
		&quot;game&quot;: [
			{
				&quot;id&quot;: 401282177,
				&quot;season&quot;: 2021,
				&quot;seasonType&quot;: &quot;regular&quot;,
				&quot;week&quot;: 1,
				&quot;startDate&quot;: &quot;2021-09-05T00:00:00&quot;,
				&quot;homeTeam&quot;: &quot;South Alabama&quot;,
				&quot;homeClassification&quot;: &quot;fbs&quot;,
				&quot;homeConferece&quot;: &quot;SBC&quot;,
				&quot;homePoints&quot;: 31,
				&quot;awayTeam&quot;: &quot;Southern Mississippi&quot;,
				&quot;awayClassification&quot;: &quot;fbs&quot;,
				&quot;awayConferece&quot;: &quot;CUSA&quot;,
				&quot;awayPoints&quot;: 7,
				&quot;excitement&quot;: 21.5355699358
			},
			{
				&quot;id&quot;: 401418780,
				&quot;season&quot;: 2022,
				&quot;seasonType&quot;: &quot;regular&quot;,
				&quot;week&quot;: 9,
				&quot;startDate&quot;: &quot;2022-10-29T21:00:00&quot;,
				&quot;homeTeam&quot;: &quot;Central Arkansas&quot;,
				&quot;homeClassification&quot;: &quot;fcs&quot;,
				&quot;homeConferece&quot;: &quot;ASUN&quot;,
				&quot;homePoints&quot;: 64,
				&quot;awayTeam&quot;: &quot;North Alabama&quot;,
				&quot;awayClassification&quot;: &quot;fcs&quot;,
				&quot;awayConferece&quot;: &quot;ASUN&quot;,
				&quot;awayPoints&quot;: 29,
				&quot;excitement&quot;: 16.5218277643
			},
			{
				&quot;id&quot;: 401416599,
				&quot;season&quot;: 2022,
				&quot;seasonType&quot;: &quot;regular&quot;,
				&quot;week&quot;: 2,
				&quot;startDate&quot;: &quot;2022-09-10T22:00:00&quot;,
				&quot;homeTeam&quot;: &quot;Miami (OH)&quot;,
				&quot;homeClassification&quot;: &quot;fbs&quot;,
				&quot;homeConferece&quot;: &quot;MAC&quot;,
				&quot;homePoints&quot;: 31,
				&quot;awayTeam&quot;: &quot;Robert Morris&quot;,
				&quot;awayClassification&quot;: &quot;fcs&quot;,
				&quot;awayConferece&quot;: null,
				&quot;awayPoints&quot;: 14,
				&quot;excitement&quot;: 15.5860040950
			},
            ...
		]
	}
}
</code></pre><!--kg-card-end: html--><p>In the next few sections, we&apos;ll dive into how to query from the CFBD GraphQL API using Insomnia and Python.</p><h2 id="using-the-cfbd-graphql-api-with-insomnia">Using the CFBD GraphQL API with Insomnia</h2><p>If you haven&apos;t seen my post on <a href="https://blog.collegefootballdata.com/talking-tech-navigating-the-cfbd-api-with-insomnia/">using Insomnia with the CFBD API</a>, then be sure to check it out. <a href="https://insomnia.rest/?ref=blog.collegefootballdata.com">Insomnia</a> is by far the best tool for experimenting with different APIs. Not only is it fantastic for experimenting with traditional REST calls, but it also has really great GraphQL support. This section of the guide assumes you are familiar with Insomnia and have it set up.</p><p>So let&apos;s go ahead and open up Insomnia. You are going to create a new request just like you normally would, but this time select &quot;GraphQL Request&quot; from the dropdown.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="517" height="522"></figure><p>The new request should look really similar to a POST request and even be labeled as such. Before we fill in the URL, we&apos;re going to add our Auth details. Select &quot;Bearer Token&quot; from the Auth dropdown.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-1.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="581" height="700"></figure><p>In the Token field, fill in your API key. It will be the same API key you use on the CFBD API. There is no need to add a Bearer prefix or anything else. 
Just paste in your key.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-2.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="564" height="243"></figure><p>Now go ahead and fill out the URL: <code>https://graphql.collegefootballdata.com/v1/graphql</code>. After pasting that in, click on &quot;schema&quot; and select &quot;Refresh Schema&quot;. Also, make sure that &quot;Automatic Fetch&quot; is enabled.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-3.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="508" height="384"></figure><p>Clicking &quot;Show Documentation&quot; in the same dropdown will open up a documentation side panel on the right. From the side panel, click on <code>query_root</code> to see which queries are available.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-4.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="460" height="974"></figure><p>These docs are interactive, so feel free to click around to learn about the different queries and types. You don&apos;t need them to get going, but I did want to point them out because they are still a very nice feature.</p><p>Go ahead and click on the GraphQL tab, click inside of the code body, and then hit Ctrl+Space. 
The code editor has full autocomplete capabilities.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-5.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="519" height="430"></figure><p>As you type out queries, you can use this functionality to guide you without even needing to really know or reference the documentation.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-6.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="559" height="298"></figure><p>Let&apos;s query some recruiting data. I want to query every #1 overall high school recruit since the 2014 cycle. Additionally, I want to order by overall composite rating, with the highest ratings at the top. My query would look like this:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-graphql">
query myQuery {
	recruit(
		where: {
			year: { _gte: 2014 }
			overallRank: { _eq: 1 }
			recruitType: { _eq: &quot;HighSchool&quot; }
		}
		orderBy: { rating: DESC }
	) {
		rating
		name
		position {
			position
			positionGroup
		}
		college {
			school
			conference
		}
		recruitSchool {
			name
		}
	}
}
</code></pre><!--kg-card-end: html--><p>Feel free to mess around with the query. Pick whatever fields you want to return and tweak the filters and sorts as you like. Once you&apos;re satisfied, go ahead and submit. This is what my query returned:</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-7.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="1200" height="738" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2024/08/image-7.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2024/08/image-7.png 1000w, https://blog.collegefootballdata.com/content/images/2024/08/image-7.png 1200w" sizes="(min-width: 720px) 720px"></figure><p>I&apos;m actually curious about my hometown. I come from a really tiny town in northern Ohio called Huron. I would like to know if there have been any legitimate recruits in the recruiting service era to hail from there. When I played (early aughts), the recruiting services were just becoming a thing and we didn&apos;t really have any FBS-level players. We had a really great TE named Jim Fisher who played at Michigan and would have fit the bill, but he was a year or two before my time and before Rivals and Scout got big.</p><p>Anyway, here&apos;s the query I drew up.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-graphql">
query myQuery {
	recruit(
		where: {
			recruitType: { _eq: &quot;HighSchool&quot; }
			hometown: { city: { _eq: &quot;Huron&quot; }, state: { _eq: &quot;OH&quot; } }
		}
		orderBy: { rating: DESC }
	) {
		stars
		ranking
		positionRank
		rating
		name
		position {
			position
			positionGroup
		}
		college {
			school
			conference
		}
		recruitSchool {
			name
		}
		hometown {
			city
			state
		}
	}
}
</code></pre><!--kg-card-end: html--><p>And here are the results:</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-8.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="438" height="639"></figure><p>We&apos;ve had one lone 2* WR who ended up at Toledo. Way to go, Cody!</p><p>I can slightly modify this query if I want to filter historical recruits by any geographic region. For example, I could query all-time recruits from the state of Alaska:</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-9.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="1285" height="718" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2024/08/image-9.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2024/08/image-9.png 1000w, https://blog.collegefootballdata.com/content/images/2024/08/image-9.png 1285w" sizes="(min-width: 720px) 720px"></figure><p>We can even do aggregates. For example, if I wanted to find the mean stars and ratings, along with their standard deviations, for all Michigan recruits since 2016, I could run something like this:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-graphql">
query myQuery {
	recruitAggregate(
		where: {
			college: { school: { _eq: &quot;Michigan&quot; } }
			year: { _gte: 2016 }
			recruitType: { _eq: &quot;HighSchool&quot; }
		}
	) {
		aggregate {
			count
			avg {
				rating
				stars
			}
			stddev {
				rating
				stars
			}
		}
	}
}
</code></pre><!--kg-card-end: html--><p>Here are the results:</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-10.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="1237" height="575" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2024/08/image-10.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2024/08/image-10.png 1000w, https://blog.collegefootballdata.com/content/images/2024/08/image-10.png 1237w" sizes="(min-width: 720px) 720px"></figure><!--kg-card-begin: markdown--><h2 id="using-the-cfbd-graphql-api-with-python">Using the CFBD GraphQL API with Python</h2>
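<p>Before bringing in a library, it helps to see how little is involved in a raw GraphQL call. As described in the &quot;What is GraphQL?&quot; section, a request is an ordinary HTTP POST whose JSON body carries the query text. Here is a minimal sketch of that idea using only the Python standard library. The helper name <code>build_graphql_request</code> and the sample query are illustrative assumptions, not part of any CFBD package; the URL and the Bearer header are the ones used elsewhere in this post. The request is built but not sent:</p>

```python
import json
import urllib.request

# URL used throughout this post for the CFBD GraphQL API.
GRAPHQL_URL = "https://graphql.collegefootballdata.com/v1/graphql"


def build_graphql_request(query, api_key, variables=None):
    """Hypothetical helper: wrap a GraphQL query in a plain HTTP POST request."""
    # GraphQL servers conventionally accept a JSON body with a "query" key
    # and an optional "variables" key.
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables

    return urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Build (but do not send) a request for a single 2024 game.
request = build_graphql_request(
    "query { game(where: { season: { _eq: 2024 } }, limit: 1) { homeTeam awayTeam } }",
    "YOUR_API_KEY",
)
print(request.get_method(), request.full_url)
```

<p>To actually send it, you would pass the request to <code>urllib.request.urlopen</code> and decode the JSON response. A dedicated GraphQL client handles that plumbing, plus schema handling and subscriptions, for you.</p>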
<!--kg-card-end: markdown--><p>I will preface this section by stating that you can interface with GraphQL APIs using just about any programming language. It all amounts to a basic HTTP POST request after all. If you can make an HTTP request, you can make a GraphQL request. That said, some tools and libraries make things much easier. If I&apos;m being honest, TypeScript/JavaScript is the best ecosystem for working with GraphQL. Much like Python is largely unparalleled when it comes to libraries available for data science and machine learning, the TypeScript/JavaScript ecosystem is unparalleled when it comes to libraries and utilities for GraphQL.</p><p>However, I recognize that a large majority of CFBD users are working in Python. And frankly, Python is probably still the correct choice for you if you are working in data and analytics. Luckily, Python does have its own set of libraries for working with GraphQL.</p><p><a href="https://gql.readthedocs.io/en/latest/intro.html?ref=blog.collegefootballdata.com">GQL</a> is one of the more popular packages for interfacing with GraphQL APIs in Python. We can install it from PyPI:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-bash">
pip install &quot;gql[all]&quot;
</code></pre><!--kg-card-end: html--><p>Or if you&apos;re using Conda:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-bash">
conda install gql-with-all
</code></pre><!--kg-card-end: html--><p>For the duration of this section, I will be running my Python code out of a Jupyter notebook. However, you should be able to run this same code even if you aren&apos;t running in Jupyter.</p><p>We&apos;ll start off by importing packages from GQL:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
from gql import Client, gql
from gql.transport.aiohttp import AIOHTTPTransport
</code></pre><!--kg-card-end: html--><p>Next, we will create a transport around the CFBD GraphQL URL and a GraphQL client around this transport.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
transport = AIOHTTPTransport(
    url=&quot;https://graphql.collegefootballdata.com/v1/graphql&quot;,
    headers={ &quot;Authorization&quot;: &quot;Bearer YOUR_API_KEY_HERE&quot;}
)

client = Client(transport=transport, fetch_schema_from_transport=True)
</code></pre><!--kg-card-end: html--><p>Note that this is also where you need to configure your API key. Replace <code>YOUR_API_KEY_HERE</code> in the above snippet with the API key you use for the CFBD API. Notice that we do need to supply a &quot;Bearer &quot; prefix here.</p><p>I&apos;m going to mirror the previous section on using Insomnia. If you skipped it, I highly recommend checking it out. I find it&apos;s usually easier to design GraphQL queries in Insomnia prior to putting them into Python code.</p><p>Executing the same query, which grabs all #1 overall high school recruits since 2014 and sorts them in descending order of composite rating, looks like this:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
query = gql(
    &quot;&quot;&quot;
    query myQuery {
        recruit(
            where: {
                year: { _gte: 2014 }
                overallRank: { _eq: 1 }
                recruitType: { _eq: &quot;HighSchool&quot; }
            }
            orderBy: { rating: DESC }
        ) {
            rating
            name
            position {
                position
                positionGroup
            }
            college {
                school
                conference
            }
            recruitSchool {
                name
            }
        }
    }
&quot;&quot;&quot;
)

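# Top-level await works in Jupyter/IPython; in a plain script, run this
# inside an async function via asyncio.run().
result = await client.execute_async(query)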
result
</code></pre><!--kg-card-end: html--><p>This is what the output looks like in my Jupyter notebook.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-12.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="805" height="745" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2024/08/image-12.png 600w, https://blog.collegefootballdata.com/content/images/2024/08/image-12.png 805w" sizes="(min-width: 720px) 720px"></figure><p>We can run <code>type(result)</code> to see that <code>result</code> is a <code>dict</code>. It should be relatively easy to loop through this result and format it to our liking.</p><p>We can flatten all of the dicts to make them easier to put into a DataFrame:</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
formatted = [dict(rating=r[&apos;rating&apos;], name=r[&apos;name&apos;], college=r[&apos;college&apos;][&apos;school&apos;], position=r[&apos;position&apos;][&apos;position&apos;]) for r in result[&apos;recruit&apos;]]
formatted
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-13.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="778" height="693" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2024/08/image-13.png 600w, https://blog.collegefootballdata.com/content/images/2024/08/image-13.png 778w" sizes="(min-width: 720px) 720px"></figure><p>We can now easily get this into a pandas DataFrame.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
import pandas as pd
df = pd.DataFrame(formatted)
df.head()
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-14.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="466" height="218"></figure><p>Let&apos;s run another query. This time I am going to query Michigan&apos;s historical entries in the AP poll, sorted with the most recent appearances first.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
query = gql(
    &quot;&quot;&quot;
    query myQuery {
        pollRank(
            where: {
                team: { school: { _eq: &quot;Michigan&quot; } }
                poll: { pollType: { name: { _eq: &quot;AP Top 25&quot; } } }
            }
            orderBy: [
                { poll: { season: DESC } }
                { poll: { seasonType: DESC } }
                { poll: { week: DESC } }
            ]
        ) {
            rank
            points
            firstPlaceVotes
            poll {
                season
                seasonType
                week
                pollType {
                    name
                }
            }
        }
    }

&quot;&quot;&quot;
)

result = await client.execute_async(query)
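
# One possible way to flatten each nested pollRank row into a flat dict that
# pd.DataFrame can load directly. The helper name flatten_poll_rank is
# hypothetical (not part of the original notebook), and it assumes the rows
# come back under result['pollRank'] with the fields requested in the query.
def flatten_poll_rank(row):
    poll = row['poll']
    return dict(
        season=poll['season'],
        seasonType=poll['seasonType'],
        week=poll['week'],
        poll=poll['pollType']['name'],
        rank=row['rank'],
        points=row['points'],
        firstPlaceVotes=row['firstPlaceVotes'],
    )

# Usage: formatted = [flatten_poll_rank(r) for r in result['pollRank']]
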
result
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2024/08/image-15.png" class="kg-image" alt="Building Dynamic Queries with the CFBD GraphQL API" loading="lazy" width="514" height="754"></figure><p>We can again flatten this and load it into a DataFrame if we desire, but I&apos;ll leave that up to you.</p><h2 id="conclusion">Conclusion</h2><p>I hope that illustrates the power of GraphQL and what it can do for you. It allows for much more flexibility and fewer restrictions. I get requests all the time for querying the data in different ways or different formats or allowing different types of query parameters. This can be very difficult to keep up with and maintain in a traditional REST API, but is easy work when working with GraphQL.</p><p>Again, this is available to you if you are a <strong>Patreon Tier 3</strong> subscriber. Go to <a href="https://www.patreon.com/collegefootballdata?ref=blog.collegefootballdata.com">Patreon</a> if you are interested in checking it out. I will reiterate that this is very experimental right now. If there are pieces of data available in the REST API that you would like to see here, I am in the process of adding more and more data. Another huge benefit is real-time GraphQL subscriptions, but I&apos;ll save that for a future post. If you end up checking it out, let me know what you think!</p>]]></content:encoded></item><item><title><![CDATA[Talking Tech: Creating Charts with matplotlib]]></title><description><![CDATA[<p>In one of my earlier blog posts, I wrote a <a href="https://blog.collegefootballdata.com/making-charts-with-plotly-and-the-cfbd-python-library/">guide on creating charts</a> using the (at the time) nascent CFBD Python library and a charting library/platform called <a href="https://plotly.com/python/?ref=blog.collegefootballdata.com">Plotly</a>. 
I was still relatively new to Python myself and was trying to sort out the ecosystem of Python charting libraries.</p>]]></description><link>https://blog.collegefootballdata.com/talking-tech-matplotlib/</link><guid isPermaLink="false">64d4f2f800396c00013fe1d9</guid><category><![CDATA[Talking Tech]]></category><category><![CDATA[Programming]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Thu, 12 Oct 2023 19:03:18 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1591696205602-2f950c417cb9?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDR8fGNoYXJ0fGVufDB8fHx8MTY2MDc1OTg0MQ&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1591696205602-2f950c417cb9?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDR8fGNoYXJ0fGVufDB8fHx8MTY2MDc1OTg0MQ&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" alt="Talking Tech: Creating Charts with matplotlib"><p>In one of my earlier blog posts, I wrote a <a href="https://blog.collegefootballdata.com/making-charts-with-plotly-and-the-cfbd-python-library/">guide on creating charts</a> using the (at the time) nascent CFBD Python library and a charting library/platform called <a href="https://plotly.com/python/?ref=blog.collegefootballdata.com">Plotly</a>. I was still relatively new to Python myself and was trying to sort out the ecosystem of Python charting libraries. Indeed in that very post, I noted that there was a wide array of different options. Ultimately, I settled on Plotly due to its ease of use, large feature set, and fantastic documentation. I still think that Plotly is a fantastic library for those very reasons. It offers a lot out of the box with a relatively minimal level of fiddling. 
In recent years, however, I have gravitated towards a different charting library that has since usurped Plotly as my charting library of choice: <a href="https://matplotlib.org/stable/index.html?ref=blog.collegefootballdata.com">matplotlib</a>.</p><p>The primary reason I&apos;ve grown to love matplotlib is that it&apos;s very customizable. I&apos;ve found that I&apos;ve been able to do just about anything I&apos;ve been able to draw up in my own imagination. Due to its versatility, things are not always as straightforward as they are with Plotly but I&apos;ve found I&apos;ve been able to do much, much more. Before we dive in deeper, check out some of the charts I&apos;ve been able to generate with matplotlib.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2022/08/elo_risers.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="1440" height="720" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2022/08/elo_risers.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2022/08/elo_risers.png 1000w, https://blog.collegefootballdata.com/content/images/2022/08/elo_risers.png 1440w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2022/08/RosterMap.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="1484" height="954" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2022/08/RosterMap.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2022/08/RosterMap.png 1000w, https://blog.collegefootballdata.com/content/images/2022/08/RosterMap.png 1484w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2022/08/draft_recruiting.png" class="kg-image" alt="Talking Tech: Creating Charts 
with matplotlib" loading="lazy" width="2000" height="1000" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2022/08/draft_recruiting.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2022/08/draft_recruiting.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2022/08/draft_recruiting.png 1600w, https://blog.collegefootballdata.com/content/images/2022/08/draft_recruiting.png 2160w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2022/08/CadeMcNamara2021.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="1440" height="720" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2022/08/CadeMcNamara2021.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2022/08/CadeMcNamara2021.png 1000w, https://blog.collegefootballdata.com/content/images/2022/08/CadeMcNamara2021.png 1440w" sizes="(min-width: 720px) 720px"></figure><p>I initially grew frustrated with Plotly when I was trying to create plots that had logos, which apparently Plotly can&apos;t really do. This is when I really started using matplotlib and discovered how to do all kinds of advanced stuff like you see above. If you want to learn how to get started doing some of this, keep on reading!</p><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block; text-align:center;" data-ad-layout="in-article" data-ad-format="fluid" data-ad-client="ca-pub-3674605305984905" data-ad-slot="7107763740"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html--><hr><!--kg-card-begin: markdown--><h2 id="lets-get-charting">Let&apos;s get charting</h2>
<!--kg-card-end: markdown--><p><strong>Edit: </strong>The Jupyter notebook used in this guide has been <a href="https://github.com/BlueSCar/jupyter-notebooks/blob/master/Talking%20Tech/matplotlib.ipynb?ref=blog.collegefootballdata.com">uploaded to GitHub</a> if you would like to use it to follow along.</p><p>First off, we&apos;ll assume you have a Python environment set up, preferably using Jupyter notebooks. We&apos;ll begin by importing the libraries that we need, starting with the standard ones: cfbd, pandas, and numpy. I don&apos;t always end up using numpy but I usually import it anyway because you never know. We&apos;ll also import matplotlib.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
import cfbd
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
</code></pre><!--kg-card-end: html--><p>matplotlib is pretty standard in any Jupyter or data science environment, so you should have it. If not and you get an error above, then open up a terminal and install the matplotlib package and then run the import statement again.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-bash">
pip install matplotlib
</code></pre><!--kg-card-end: html--><p>Next up, we&apos;ll configure the cfbd Python library so we can make some calls. Be sure to replace the placeholder below with your personal API key.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
config = cfbd.Configuration(
    access_token = &apos;YOUR_API_KEY&apos;
)
client = cfbd.ApiClient(config)
</code></pre><!--kg-card-end: html--><p>Now let&apos;s grab some data that we can turn into a chart. We&apos;ll grab team Elo and SP+ ratings from the end of the 2022 season and put these into a scatterplot. Run the code below to get the data.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
ratings_api = cfbd.RatingsApi(client)
elo_ratings = ratings_api.get_elo(year=2022)
sp_ratings = ratings_api.get_sp(year=2022)
</code></pre><!--kg-card-end: html--><p>Let&apos;s take a look at the format of the data that was returned from the API.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="772" height="779" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image.png 600w, https://blog.collegefootballdata.com/content/images/2023/10/image.png 772w" sizes="(min-width: 720px) 720px"></figure><p>The Elo rating object is pretty simple. It&apos;s just a flat object consisting of team, year, conference, and the team&apos;s final Elo rating. The SP+ object is a bit more complex with some nesting. We really only care about the top-level properties for team and overall rating. We also want to combine these lists, but first let&apos;s convert them into DataFrames that can be merged.</p><p>Here&apos;s the code for converting the list of Elo ratings.</p><!--kg-card-begin: html--><pre class="line_numbers"><code class="lang-python">
elo_df = pd.DataFrame.from_records([e.to_dict() for e in elo_ratings])
elo_df.head()
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-1.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="709" height="307" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-1.png 600w, https://blog.collegefootballdata.com/content/images/2023/10/image-1.png 709w"></figure><p>When converting the SP+ ratings to a DataFrame, we&apos;re only going to grab the properties we care about (team and rating).</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
sp_df = pd.DataFrame.from_records([dict(team=s.team, rating=s.rating) for s in sp_ratings])
sp_df.head()
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-2.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="902" height="302" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-2.png 600w, https://blog.collegefootballdata.com/content/images/2023/10/image-2.png 902w" sizes="(min-width: 720px) 720px"></figure><p>Now we can merge these together into a single DataFrame. I&apos;m also going to rename the <code>rating</code> column to <code>sp</code> to make things clearer in the data.</p><!--kg-card-begin: html--><pre class="line_numbers"><code class="lang-python">
df = elo_df.merge(sp_df, on=&apos;team&apos;)
df.rename(columns={&apos;rating&apos;: &apos;sp&apos;}, inplace=True)
df.head()
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-3.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="616" height="342" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-3.png 600w, https://blog.collegefootballdata.com/content/images/2023/10/image-3.png 616w"></figure><p>We can now generate a scatterplot. We&apos;ll plot Elo ratings on the x-axis and SP+ ratings on the y-axis. This is super easy.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
plt.scatter(df[&apos;elo&apos;], df[&apos;sp&apos;])
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-5.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="798" height="619" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-5.png 600w, https://blog.collegefootballdata.com/content/images/2023/10/image-5.png 798w" sizes="(min-width: 720px) 720px"></figure><p>Good charts should always have a title and labels, so let&apos;s add some of those and regenerate the chart.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
plt.scatter(df[&apos;elo&apos;], df[&apos;sp&apos;])

plt.xlabel(&apos;Elo rating&apos;)
plt.ylabel(&apos;SP+ rating&apos;)
plt.title(&apos;Elo and SP+ ratings (2022 season)&apos;)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-6.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="796" height="748" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-6.png 600w, https://blog.collegefootballdata.com/content/images/2023/10/image-6.png 796w" sizes="(min-width: 720px) 720px"></figure><p>Pretty easy, huh?</p><hr><!--kg-card-begin: markdown--><h2 id="jazzing-things-up">Jazzing Things Up</h2>
<!--kg-card-end: markdown--><p>These charts look a little... bland? Don&apos;t you think? Let&apos;s look at jazzing things up a bit.</p><p>I mentioned that matplotlib is highly customizable. As a result, it can be heavily themed using style sheets. Luckily, it ships with several built-in themes. I recommend <a href="https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html?ref=blog.collegefootballdata.com">checking them all out</a>.</p><p>A popular option is the ggplot theme, inspired by the famous R charting library. Let&apos;s check that one out.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
plt.style.use(&apos;ggplot&apos;)
</code></pre><!--kg-card-end: html--><p>And then just rerun our chart code.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-8.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="797" height="804" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-8.png 600w, https://blog.collegefootballdata.com/content/images/2023/10/image-8.png 797w" sizes="(min-width: 720px) 720px"></figure><p>Personally, I&apos;m partial to the <code>fivethirtyeight</code> theme, inspired by the charts from FiveThirtyEight.com.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
plt.style.use(&apos;fivethirtyeight&apos;)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-10.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="902" height="845" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-10.png 600w, https://blog.collegefootballdata.com/content/images/2023/10/image-10.png 902w" sizes="(min-width: 720px) 720px"></figure><p>We can also easily manipulate the size and dimensions of charts. For example,</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
plt.rcParams[&quot;figure.figsize&quot;] = [20,10]
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-9.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="1696" height="841" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-9.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/10/image-9.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2023/10/image-9.png 1600w, https://blog.collegefootballdata.com/content/images/2023/10/image-9.png 1696w" sizes="(min-width: 720px) 720px"></figure><p>We can also easily export charts to an image file format, such as PNG. Just add a call to <code>savefig()</code> with the name of the file you want to save to.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
plt.scatter(df[&apos;elo&apos;], df[&apos;sp&apos;])

plt.xlabel(&apos;Elo rating&apos;)
plt.ylabel(&apos;SP+ rating&apos;)
plt.title(&apos;Elo and SP+ ratings (2022 season)&apos;)

plt.savefig(&quot;test.png&quot;)
</code></pre><!--kg-card-end: html--><hr><!--kg-card-begin: markdown--><h2 id="adding-team-logos">Adding Team Logos</h2>
<!--kg-card-end: markdown--><p>I mentioned the ability to plot team logos as being the initial impetus for my looking at matplotlib and moving away from Plotly. So this post would be no good if I didn&apos;t show you how to do that. First off, we need one more line of imports.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
</code></pre><!--kg-card-end: html--><p>Secondly, we need some logo files. I have a collection of logos on a Google Drive that you can <a href="https://cfbd.nyc3.digitaloceanspaces.com/logos.zip?ref=blog.collegefootballdata.com">download here</a>. These logos (after you download and unzip them) should be placed in the same directory as your Jupyter notebook or Python script in a folder called <code>logos</code>.</p><p>Next up, we are going to define a function for retrieving a logo based on a team name and creating an image object from it.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
def getImage(team):
    return OffsetImage(plt.imread(f&apos;./logos/{team}.png&apos;))
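
# Optional companion helper (hypothetical, not part of the original notebook):
# check that a logo file exists before plotting, since reading a missing file
# fails. It assumes the same ./logos/{team}.png layout used by getImage.
from pathlib import Path

def hasLogo(team, logo_dir='./logos'):
    return (Path(logo_dir) / f'{team}.png').is_file()

# Usage: keep only the teams that have a logo, e.g.
# df = df[df['team'].apply(hasLogo)]
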
</code></pre><!--kg-card-end: html--><p>We need to modify our scatterplot code above to utilize this function to plot team logos in place of points on the scatterplot. Go ahead and run this code. We&apos;ll break it all down in a second.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
fig, ax = plt.subplots()
ax.scatter(df[&apos;elo&apos;], df[&apos;sp&apos;], alpha=0)

for index, r in df.iterrows():
    ab = AnnotationBbox(getImage(r.team), (r.elo, r.sp), frameon=False)
    ax.add_artist(ab)
    
plt.xlabel(&apos;Elo rating&apos;)
plt.ylabel(&apos;SP+ rating&apos;)
plt.title(&apos;Elo and SP+ ratings (2022 season)&apos;)
</code></pre><!--kg-card-end: html--><p>If you added the logo directory properly, this is what should have been rendered:</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-11.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="1253" height="626" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-11.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/10/image-11.png 1000w, https://blog.collegefootballdata.com/content/images/2023/10/image-11.png 1253w" sizes="(min-width: 720px) 720px"></figure><p>Okay, let&apos;s break down the changes to our scatterplot code.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
fig, ax = plt.subplots()
</code></pre><!--kg-card-end: html--><p>Instead of working directly off of the <code>plt</code> object, we called the <code>subplots</code> function, which allows multiple plots to be plotted in the same figure. We don&apos;t need subplot functionality here, but what&apos;s important is that this returned a Figure object (<code>fig</code>) and an Axes object (<code>ax</code>), which can both be used for various customizations. This is usually how you&apos;ll generate a chart instead of using <code>plt</code> directly.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
ax.scatter(df[&apos;elo&apos;], df[&apos;sp&apos;], alpha=0)
</code></pre><!--kg-card-end: html--><p>There are two deviations here. First, we are calling <code>scatter</code> on the <code>ax</code> object instead of on <code>plt</code>. Secondly, we are setting an <code>alpha</code> property to 0. This is effectively making the plotted points invisible. We do not need the normal points to display because we will be adding team logos in their place.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
for index, r in df.iterrows():
    ab = AnnotationBbox(getImage(r.team), (r.elo, r.sp), frameon=False)
    ax.add_artist(ab)
</code></pre><!--kg-card-end: html--><p>This block contains the meat of the changes. We are iterating through all of the rows in the DataFrame and creating an annotation box that consists of the team logo. In constructing the annotation box (<code>AnnotationBbox</code>), we are passing in the logo image (created by our <code>getImage</code> function from the team name), the coordinates where the logo should display (Elo rating as the x-coordinate and SP+ rating as the y-coordinate), and setting a property to turn off the image frame (which would otherwise draw an ugly border around each logo).</p><hr><h2 id="other-types-of-charts">Other types of charts</h2><p>You can use matplotlib to create just about any type of chart: line charts, pie charts, bar charts, and more. We won&apos;t go into every one of these but let&apos;s check out a line chart.</p><p>We&apos;ve already used the Elo ratings API endpoint, so let&apos;s use that to get historical data for a single team and put that into a line chart. I&apos;m a Michigan guy so that&apos;s the team I&apos;ll be using, but feel free to substitute in your favorite team.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
elos = ratings_api.get_elo(team=&apos;Michigan&apos;)
df = pd.DataFrame.from_records([e.to_dict() for e in elos])
df.head()
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-12.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="622" height="344" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-12.png 600w, https://blog.collegefootballdata.com/content/images/2023/10/image-12.png 622w"></figure><p>And let&apos;s go ahead and create a line chart.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
fig, ax = plt.subplots()

ax.plot(df[&apos;year&apos;], df[&apos;elo&apos;], color=&apos;#00274c&apos;)

plt.xlabel(&apos;Year&apos;)
plt.ylabel(&apos;Elo rating&apos;)
plt.title(&apos;Historical Elo Rating (Michigan)&apos;)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-13.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="1247" height="632" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-13.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/10/image-13.png 1000w, https://blog.collegefootballdata.com/content/images/2023/10/image-13.png 1247w" sizes="(min-width: 720px) 720px"></figure><p>There are only two minor changes from our previous code here. First, we&apos;re calling the <code>plot</code> function to generate a line chart, whereas previously we were calling <code>scatter</code> for scatterplots. Second, notice that we passed in a <code>color</code> parameter to style the line in the team&apos;s primary color.</p><p>How about we add the team logo as a sort of watermark somewhere on the chart?</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
fig, ax = plt.subplots()

ax.plot(df[&apos;year&apos;], df[&apos;elo&apos;], color=&apos;#00274c&apos;)

logo = OffsetImage(plt.imread(&apos;./logos/Michigan.png&apos;), zoom=1.5)
ab = AnnotationBbox(logo, (2020, 2600), frameon=False)
ax.add_artist(ab)

plt.xlabel(&apos;Year&apos;)
plt.ylabel(&apos;Elo rating&apos;)
plt.title(&apos;Historical Elo Rating (Michigan)&apos;)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-14.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="1249" height="625" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-14.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/10/image-14.png 1000w, https://blog.collegefootballdata.com/content/images/2023/10/image-14.png 1249w" sizes="(min-width: 720px) 720px"></figure><p>Note that lines 5-7 are almost identical to the code we used in the <code>getImage</code> function and to plot team logos as scatterplot points. In this example, I am plotting the logo in the upper right corner of the graph. I just had to pass in the actual graph coordinates where I wanted the image to go, <code>(2020, 2600)</code> in this example.</p><p>Let&apos;s say I wanted to highlight a particular range of years, in this case the tenure of a significant coach in the program&apos;s history. </p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
fig, ax = plt.subplots()

ax.plot(df[&apos;year&apos;], df[&apos;elo&apos;], color=&apos;#00274c&apos;)

logo = OffsetImage(plt.imread(&apos;./logos/Michigan.png&apos;), zoom=1.5)
ab = AnnotationBbox(logo, (2020, 2600), frameon=False)
ax.add_artist(ab)

ax.axvspan(1969, 1989, alpha=0.5, color=&quot;#FFCB05&quot;)
ax.text(1974, 1400, &apos;    1969-1989\nBo Schembechler&apos;, va=&apos;center&apos;, fontstyle=&apos;italic&apos;, fontsize=&apos;small&apos;)

plt.xlabel(&apos;Year&apos;)
plt.ylabel(&apos;Elo rating&apos;)
plt.title(&apos;Historical Elo Rating (Michigan)&apos;)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-15.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="1247" height="623" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-15.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/10/image-15.png 1000w, https://blog.collegefootballdata.com/content/images/2023/10/image-15.png 1247w" sizes="(min-width: 720px) 720px"></figure><p>Lines 9-10 are the only additions here. On line 9, I added a vertical span across the x-values 1969 to 1989, filled it in with the team&apos;s secondary color, and added some transparency. On line 10, I added a small italic text label inside the span.</p><p>Now suppose there&apos;s a specific point on the chart I want to call out, maybe with some text and an arrow annotation. This is how I&apos;d do that.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-python">
fig, ax = plt.subplots()

ax.plot(df[&apos;year&apos;], df[&apos;elo&apos;], color=&apos;#00274c&apos;)

logo = OffsetImage(plt.imread(&apos;./logos/Michigan.png&apos;), zoom=1.5)
ab = AnnotationBbox(logo, (2020, 2600), frameon=False)
ax.add_artist(ab)

ax.axvspan(1969, 1989, alpha=0.5, color=&quot;#FFCB05&quot;)
ax.text(1974, 1400, &apos;    1969-1989\nBo Schembechler&apos;, va=&apos;center&apos;, fontstyle=&apos;italic&apos;, fontsize=&apos;small&apos;)

ax.annotate(&quot;Fielding Yost\nPoint-a-Minute teams&quot;,
            xy=(1903, 2700), xycoords=&apos;data&apos;,
            xytext=(1940, 2600), textcoords=&apos;data&apos;,
            arrowprops=dict(facecolor=&apos;#FFCB05&apos;),
            horizontalalignment=&apos;center&apos;, verticalalignment=&apos;top&apos;)

plt.xlabel(&apos;Year&apos;)
plt.ylabel(&apos;Elo rating&apos;)
plt.title(&apos;Historical Elo Rating (Michigan)&apos;)
</code></pre><!--kg-card-end: html--><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/10/image-16.png" class="kg-image" alt="Talking Tech: Creating Charts with matplotlib" loading="lazy" width="1247" height="623" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/10/image-16.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/10/image-16.png 1000w, https://blog.collegefootballdata.com/content/images/2023/10/image-16.png 1247w" sizes="(min-width: 720px) 720px"></figure><p>Lines 12-16 here are the additions. This is the basic format for adding an arrow annotation. Using the <code>annotate</code> function, I specified the text, the point the arrow should point to (<code>xy</code>), where the text should be placed (<code>xytext</code>), some styling for the arrow color (using the team&apos;s secondary color again), and some alignment properties. Notice how for <code>xycoords</code> and <code>textcoords</code> we specified the <code>data</code> option. This tells matplotlib how to interpret these coordinates. In this case, we are just going by the chart&apos;s data coordinate system. There are several other options for specifying these locations, but those are outside of the scope of the article. I highly recommend looking into them on your own.</p><hr><h2 id="further-steps">Further Steps</h2><p>We&apos;ve covered the basics of matplotlib. Hopefully it&apos;s given good insight into its versatility and power. While this post should give you some good building blocks to get started creating your own charts, we&apos;ve really only scratched the surface. We really only hit on scatter and line charts and there&apos;s a plethora of other chart types you can create. You can also create animated charts! Maybe that will be a blog post down the road. Here are some more resources which should help you expand upon what we&apos;ve gone through here.</p><!--kg-card-begin: markdown--><ul>
<li><a href="https://matplotlib.org/stable/plot_types/index.html?ref=blog.collegefootballdata.com">matplotlib Plot Types</a></li>
<li><a href="https://matplotlib.org/stable/gallery/index.html?ref=blog.collegefootballdata.com">matplotlib Examples Gallery</a></li>
<li><a href="https://matplotlib.org/stable/tutorials/index.html?ref=blog.collegefootballdata.com">matplotlib Tutorials</a></li>
</ul>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Measuring Field Goal Kicker Efficiency]]></title><description><![CDATA[<p>I was recently in a conundrum about the best way to go about measuring field goal kicker efficiency. This has been a topic of discussion in much of the college football content I follow, which is largely centered around the Michigan football team. For the past three years, Michigan has</p>]]></description><link>https://blog.collegefootballdata.com/measuring-field-goal-kicker-efficiency/</link><guid isPermaLink="false">6511e1eeae7f620001a283d1</guid><category><![CDATA[Analysis]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Mon, 25 Sep 2023 23:00:06 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1634306320557-4c4e4660692f?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDN8fGZpZWxkJTIwZ29hbHxlbnwwfHx8fDE2OTU2NzA3NzV8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1634306320557-4c4e4660692f?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDN8fGZpZWxkJTIwZ29hbHxlbnwwfHx8fDE2OTU2NzA3NzV8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" alt="Measuring Field Goal Kicker Efficiency"><p>I was recently in a conundrum about the best way to go about measuring field goal kicker efficiency. This has been a topic of discussion in much of the college football content I follow, which is largely centered around the Michigan football team. For the past three years, Michigan has had the benefit of having perhaps the best special teams tandem in school history. What MGoBlog has dubbed the <a href="https://mgoblog.com/content/preview-2023-special-teams?ref=blog.collegefootballdata.com">&quot;Pax Specialistica&quot;</a> consisted of Groza-winning kicker Jake &quot;Money&quot; Moody and punter Brad Robbins, both taken in this year&apos;s NFL Draft. 
In the wake of Moody&apos;s departure, Michigan added Louisville transfer James Turner, who has had a pretty solid career. He&apos;s not quite Jake Moody but then again it would be unrealistic to expect him to be so.</p><p>This has raised the question around the value of field goal kicking. Namely, what is the difference between an average college kicker and one who is at the upper echelon of college kickers? Expected Points Added (EPA), such as this site&apos;s own Predicted Points Added model, seems like a good starting point with how ubiquitous EPA metrics have become in the world of CFB analytics. As it turns out, however, existing EPA models are almost entirely unsuitable for providing field goal kicker metrics. We&apos;ll break some of the reasons for that down.</p><p>If you are reading this, I am going to presume you have some familiarity with EPA. If not, we&apos;ll do just a really quick breakdown of EPA since it&apos;s central to the discussion around the unsuitability of existing EPA models in evaluating this sort of thing. The basic premise of EPA is that each yard line on the football field is assigned an Expected Points (EP) value which varies based on down and distance. Because of this, you have an EP value at the start of each play predicated on the starting down, distance, and yard line. Each play results in a new EP value, either from scoring points or from the resulting down, distance, and yard line. 
Take the difference between the play&apos;s ending EP and the play&apos;s starting EP and you get Expected Points Added, or EPA.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-21.png" class="kg-image" alt="Measuring Field Goal Kicker Efficiency" loading="lazy" width="1685" height="944" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-21.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/09/image-21.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2023/09/image-21.png 1600w, https://blog.collegefootballdata.com/content/images/2023/09/image-21.png 1685w" sizes="(min-width: 720px) 720px"><figcaption>Visualization of this site&apos;s EP model for 1st and 10</figcaption></figure><p>It might seem logical to apply this principle to field goal kicking and in certain contexts it certainly is. However, evaluating kickers is not one of them. Think about the factors that affect the difficulty of making a FG kick. Here are a few:</p><ul><li>Distance from the goalposts</li><li>Wind velocity</li><li>Wind direction</li><li>Kick angle (e.g. from one of the hashes versus dead center)</li></ul><p>Ideally, we would consider each of these factors when evaluating kickers. Unfortunately, we only have data for the first factor listed here, distance from the goalposts. Also note that down and yards to go are not listed here. Whether it is 1st down or 4th down, there is no material impact on the difficulty of the kick. It also doesn&apos;t really matter if it is 4th and 10 or 4th and 1. Since these are central features in traditional EPA models, the ability of these models to measure kickers is very limited.</p><p>There&apos;s also another component to this. If a kicker misses a 50 yard field goal, the result of the play is a turnover on downs and great field position for the opponent. 
If a kicker misses a 20 yard field goal, there&apos;s still a turnover on downs but the opponent&apos;s field position is going to be pretty poor. As a result, the resulting negative EPA from missing a 50 yard kick will be much more extreme compared to the negative EPA resulting from missing a chip shot. This resulting EPA is still a valuable metric in certain contexts, evaluating a coach&apos;s decision to attempt a FG vs going for it vs punting, for example. But it doesn&apos;t really make much sense to punish a kicker disproportionately more for missing a 50 yard kick than for missing a chip shot, does it?</p><p>A more sensible approach would be to take the one metric we have data for, field goal distance, and see how it correlates to field goal success. We could then use this information to spit out an Expected Points model for field goals based on kick distance. In fact, this is exactly what I did.</p><hr><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block; text-align:center;" data-ad-layout="in-article" data-ad-format="fluid" data-ad-client="ca-pub-3674605305984905" data-ad-slot="7107763740"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html--><hr><h2 id="methodology">Methodology</h2><p>For this exercise, I decided to query every field goal attempt dating back to the 2016 season. As we are currently a few weeks into the 2023 season, this is just over 7 seasons&apos; worth of field goal data. I then assigned each kick a points value of 0 (for a missed kick) or 3 (for a successful kick). While there have been several instances of field goals being returned for a TD by the defense, I decided not to count these as -6 point outcomes. For one, this category of plays is a minuscule sample. Additionally, the factors that lead to a Kick Six type of play are massively outside of the control of the kicker.</p><p>I should also note that I only included FBS attempts in the dataset. This means that the resulting metrics won&apos;t necessarily be applicable at other levels, such as the NFL or FCS. After aggregating this data and assigning a points value to each attempt, I then calculated the average points scored on field goal attempts based on kick distance.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-22.png" class="kg-image" alt="Measuring Field Goal Kicker Efficiency" loading="lazy" width="2000" height="1000" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-22.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/09/image-22.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2023/09/image-22.png 1600w, https://blog.collegefootballdata.com/content/images/2023/09/image-22.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>As you can see in the figure above, this gave a nice little trend that was easily fitted to a curve. The closest field goal attempts average out just short of 3 points per attempt. At a certain point, the expected value is functionally 0 points. 
I am using this curve to define expected points at the FBS level for a field goal at a given distance.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-23.png" class="kg-image" alt="Measuring Field Goal Kicker Efficiency" loading="lazy" width="249" height="351"><figcaption>FGA Expected Points at selected distances</figcaption></figure><p>I am also using this curve to define &quot;replacement level&quot; for field goal kickers. If a specific kick distance has an expected points value of 1.5, you would expect a replacement-level kicker to make that field goal about 50% of the time. Similarly, if the expected points value is 2.0, then a replacement-level kicker would be expected to make that kick 2 out of 3 times.</p><p>Using this concept, I&apos;ve devised a metric called Points Added Above Replacement, or PAAR. To calculate PAAR, we look at each of a kicker&apos;s FG attempts and find the difference between the actual points scored by the kicker and the expected points based on FG distance. We then add these values up for each of a kicker&apos;s attempts. 
For example, here were the top 25 kickers in PAAR for the 2022 season.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-24.png" class="kg-image" alt="Measuring Field Goal Kicker Efficiency" loading="lazy" width="1919" height="2565" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-24.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/09/image-24.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2023/09/image-24.png 1600w, https://blog.collegefootballdata.com/content/images/2023/09/image-24.png 1919w" sizes="(min-width: 720px) 720px"><figcaption>2022 PAAR leaders</figcaption></figure><p>Looking at this chart, Jake Moody placed third in this metric behind Stanford&apos;s Joshua Karty and NC State&apos;s Christopher Dunn. Moody&apos;s PAAR value was +15.6. We define a replacement-level kicker as one who measures out at +0.0 PAAR, neither above nor below expectations. This means that, across all of his FG attempts for the 2022 season, Jake Moody scored 15.6 more points than a replacement-level kicker given the same attempts. That is more than two TDs over the course of the season. Or put another way, he provided ~1.1 points per game over what would be expected for a replacement-level kicker.</p><p>Conversely, you can also have negative PAAR values. 
I hate to single out kickers since it&apos;s one of the toughest and highest pressure jobs on the football field, but here&apos;s the flip side of the above chart.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-25.png" class="kg-image" alt="Measuring Field Goal Kicker Efficiency" loading="lazy" width="1919" height="2565" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-25.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/09/image-25.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2023/09/image-25.png 1600w, https://blog.collegefootballdata.com/content/images/2023/09/image-25.png 1919w" sizes="(min-width: 720px) 720px"><figcaption>2022 PAAR bottom 25</figcaption></figure><p>Kansas&apos;s Jacob Borcila netted -14.2 PAAR for the season. Accounting for each of his FG attempts, he scored 14.2 fewer points than would be expected for a replacement-level kicker. Compare with Stanford&apos;s Joshua Karty at the top of the previous table with +19.9 PAAR. The difference between the top kicker and the bottom kicker from last season was a whopping 34.1 points, or just under five TDs! Kicking is important.</p><h2 id="other-applications">Other Applications</h2><p>I argued at the start of this that traditional EPA models aren&apos;t suitable for measuring kicker performance, but that doesn&apos;t mean they are altogether useless. 
We can combine this new FG expected points model with our traditional EPA model to visualize when it might make sense to go for a 1st down or TD versus attempting a field goal.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-26.png" class="kg-image" alt="Measuring Field Goal Kicker Efficiency" loading="lazy" width="2000" height="1000" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-26.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/09/image-26.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2023/09/image-26.png 1600w, https://blog.collegefootballdata.com/content/images/2023/09/image-26.png 2000w" sizes="(min-width: 720px) 720px"><figcaption>EP differential with a replacement level kicker</figcaption></figure><p>This heatmap illustrates the expected point differential between kicking a FG with a replacement-level kicker and the current expected points based on the distance to go and the yard line. Situations where there is more value in attempting a FG are shaded green whereas the redder areas are where points are being left on the table in deciding on a FG attempt. 
We can contrast this with the chart for an elite kicker.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-27.png" class="kg-image" alt="Measuring Field Goal Kicker Efficiency" loading="lazy" width="2000" height="1000" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-27.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/09/image-27.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2023/09/image-27.png 1600w, https://blog.collegefootballdata.com/content/images/2023/09/image-27.png 2000w" sizes="(min-width: 720px) 720px"><figcaption>EP differential with an elite kicker</figcaption></figure><p>See how much greener this chart is than the previous one? Having an elite kicker makes the decision to attempt a FG versus going for the TD or 1st down an easier one since you have greater confidence in actually getting points with a FG attempt. Now, let&apos;s check out the chart for a kicker who is far below replacement level.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-28.png" class="kg-image" alt="Measuring Field Goal Kicker Efficiency" loading="lazy" width="2000" height="1000" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-28.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/09/image-28.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2023/09/image-28.png 1600w, https://blog.collegefootballdata.com/content/images/2023/09/image-28.png 2000w" sizes="(min-width: 720px) 720px"><figcaption>EP differential with a far below replacement level kicker</figcaption></figure><p>A lot more red there. You&apos;re probably much better off taking your chances and going for it rather than attempt a FG. 
This is somewhat of a simplification since this isn&apos;t always a binary choice. The option to punt exists and should ideally be included in the calculus here. That said, I think the point is pretty clear on the value of a good kicker and how much an elite kicker can open up options and make decisions easier.</p><hr><h2 id="conclusion">Conclusion</h2><p>Hopefully, this gives a clear idea of how we can go about evaluating kickers. I&apos;m very excited to share the PAAR metric and start utilizing it. I plan on posting updates throughout the course of the season. And in case you missed it, I&apos;ve backfilled and am now including FG kickers in player-play statistic data so you can now track individual kickers at the play level and devise some metrics of your own. Next steps for me are making some of this stuff, like PAAR and FG EP, available on the site and API. Then, it&apos;s onto punters!</p><p>As always, feel free to reach out and let me know what you think on Twitter, Discord, or Reddit.</p>]]></content:encoded></item><item><title><![CDATA[Talking Tech: Navigating the CFBD API with Insomnia]]></title><description><![CDATA[<p>There are a lot of good tools for working with APIs. Historically, <a href="https://www.postman.com/?ref=blog.collegefootballdata.com">Postman</a> has been ubiquitous in this area. While Postman is still a great tool, I ditched it a few years back for a competing tool called <a href="https://insomnia.rest/?ref=blog.collegefootballdata.com">Insomnia</a>. 
If you&apos;ve never used either of these tools, you</p>]]></description><link>https://blog.collegefootballdata.com/talking-tech-navigating-the-cfbd-api-with-insomnia/</link><guid isPermaLink="false">650a23840291ef00017e34d0</guid><category><![CDATA[Talking Tech]]></category><category><![CDATA[Programming]]></category><dc:creator><![CDATA[Bill Radjewski]]></dc:creator><pubDate>Fri, 22 Sep 2023 14:00:17 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1610986602538-431d65df4385?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDJ8fGpzb258ZW58MHx8fHwxNjk1MTYzMzE4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1610986602538-431d65df4385?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDJ8fGpzb258ZW58MHx8fHwxNjk1MTYzMzE4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" alt="Talking Tech: Navigating the CFBD API with Insomnia"><p>There are a lot of good tools for working with APIs. Historically, <a href="https://www.postman.com/?ref=blog.collegefootballdata.com">Postman</a> has been ubiquitous in this area. While Postman is still a great tool, I ditched it a few years back for a competing tool called <a href="https://insomnia.rest/?ref=blog.collegefootballdata.com">Insomnia</a>. If you&apos;ve never used either of these tools, you may be wondering what they do. Mainly, they provide a convenient user interface for interacting with API endpoints. You can add an endpoint, and configure its URL, query parameters, request body, and request headers. Then you can call that endpoint and explore its output. You can do all of this in the UI without having to write a line of code. 
The benefit is you can quickly get to experimenting and testing out APIs right out of the box.</p><p>Here is an example of what this looks like querying the <code>/games</code> endpoint of the CFBD API:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="1917" height="1147" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/09/image.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2023/09/image.png 1600w, https://blog.collegefootballdata.com/content/images/2023/09/image.png 1917w" sizes="(min-width: 720px) 720px"><figcaption>Calling an endpoint with Insomnia&#xA0;</figcaption></figure><p>You can see all of the configuration needed to call the endpoint laid out in the middle panel, including the URL, setting query parameters, and any other additional properties. Here we configured the request to query all 2022 games. After sending the request, the formatted payload appears in the right panel.</p><p>If you look at the left panel, you can see that I have all endpoints in the CFBD API available to me and searchable.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-1.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="357" height="1010"><figcaption>List of available endpoints</figcaption></figure><p>Looking at the middle panel more closely, I already have all available query parameters prepopulated for each endpoint. I can fill these in and enable/disable them at my leisure. Normally, you would have to manually add each endpoint and its available query parameters. 
Luckily, Insomnia has a very nice feature where you can auto-import an API collection from a Swagger or OpenAPI specification. We&apos;ll detail some steps for doing this further down.</p><p>For now, let&apos;s check out some more features of Insomnia that are worth noting. One of my favorite features is the ability to autogenerate code from any API call. All you need to do is to click on the little arrow to the right of a given endpoint and select &apos;Generate Code&apos;:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-2.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="452" height="407"><figcaption>Endpoint menu</figcaption></figure><p>From there, you can select from one of many popular programming languages:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-3.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="459" height="872"><figcaption>Selecting a programming language</figcaption></figure><p>And the code will auto-generate:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-4.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="950" height="1038" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-4.png 600w, https://blog.collegefootballdata.com/content/images/2023/09/image-4.png 950w" sizes="(min-width: 720px) 720px"><figcaption>Autogenerated code</figcaption></figure><p>Another great feature is <a href="https://support.smartbear.com/alertsite/docs/monitors/api/endpoint/jsonpath.html?ref=blog.collegefootballdata.com">JSONPath</a> response filtering. 
Oftentimes, the response payload will be quite large and perhaps you&apos;d like to filter it down because you&apos;re looking for a specific item. This is where JSONPath comes in. Using the <code>/games</code> response above, let&apos;s say I wanted to see ids for all games in the response body with an excitement index greater than 15. The JSONPath value would be <code>$[?(@.excitement_index &gt; 15)].id</code>, which filters and transforms the output as seen here:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-5.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="783" height="888" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-5.png 600w, https://blog.collegefootballdata.com/content/images/2023/09/image-5.png 783w" sizes="(min-width: 720px) 720px"><figcaption>List of game ids</figcaption></figure><p>If you&apos;re interested in using Insomnia to work with the CFBD API, then read on. We&apos;ll walk through some steps to get things configured.</p><hr><!--kg-card-begin: html--><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3674605305984905" crossorigin="anonymous"></script>
<ins class="adsbygoogle" style="display:block; text-align:center;" data-ad-layout="in-article" data-ad-format="fluid" data-ad-client="ca-pub-3674605305984905" data-ad-slot="7107763740"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script><!--kg-card-end: html--><hr><h2 id="configuring-insomnia-for-the-cfbd-api">Configuring Insomnia for the CFBD API</h2><p>Here are some simple steps for getting Insomnia up and running with the CFBD API. First off, we presume you already have Insomnia downloaded and installed. If not, <a href="https://insomnia.rest/?ref=blog.collegefootballdata.com">then please do that</a> before proceeding.</p><p>Now, let&apos;s go ahead and visit CollegeFootballData.com. On the right sidebar, you should see a convenient &quot;Run in Insomnia!&quot; button. Click it.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-6.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="403" height="406"></figure><p>A new browser tab will open. Click on the &quot;RUN CFBD&quot; button.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-7.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="660" height="626" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-7.png 600w, https://blog.collegefootballdata.com/content/images/2023/09/image-7.png 660w"></figure><p>If an alert box pops up, click on the &quot;Open Insomnia&quot; button.</p><p>Insomnia will open and present you with an import dialog. 
Click on &quot;Scan&quot;.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-8.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="954" height="440" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-8.png 600w, https://blog.collegefootballdata.com/content/images/2023/09/image-8.png 954w" sizes="(min-width: 720px) 720px"></figure><p>And then &quot;Import&quot;.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-9.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="955" height="471" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-9.png 600w, https://blog.collegefootballdata.com/content/images/2023/09/image-9.png 955w" sizes="(min-width: 720px) 720px"></figure><p>This will create a new Document. Go ahead and click to open the Document.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-10.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="262" height="259"></figure><p>Click on &quot;SPEC&quot; at the top of the window. This will show the Swagger documentation. 
Then, click on &quot;Generate Request Collection&quot;.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-11.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="1920" height="1029" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-11.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/09/image-11.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2023/09/image-11.png 1600w, https://blog.collegefootballdata.com/content/images/2023/09/image-11.png 1920w" sizes="(min-width: 720px) 720px"></figure><p>You will now see all of the CFBD API endpoints available to you and ready to call. However, there are a few more small steps needed before we can do that.</p><p>We need to configure our Insomnia environment. Select &quot;Swagger env&quot; from the environment icon at the top left. Once selected, click on the gear icon just to the right.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-12.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="367" height="280"></figure><p>In the dialog that opens, add a key called <code>base_url</code> and give it a value of <code>https://api.collegefootballdata.com</code>. Afterwards, your environment config should look like this:</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-13.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="842" height="385" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-13.png 600w, https://blog.collegefootballdata.com/content/images/2023/09/image-13.png 842w" sizes="(min-width: 720px) 720px"></figure><p>And with that, you are largely all set! 
You should now be able to configure and call any of the endpoints. Although, there is one pesky little detail we haven&apos;t looked at: authentication. Select any of the endpoints and click on the Auth tab. Click on the arrow to the left of the &quot;Auth&quot; text and select &quot;Bearer Token&quot;.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-14.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="318" height="638"></figure><p>Paste your API key into the &quot;TOKEN&quot; field. Note: you do NOT need to prefix your key with &quot;Bearer&quot;. You don&apos;t need to add &quot;Bearer&quot; anywhere. You can leave the &quot;PREFIX&quot; field blank.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-15.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="787" height="307" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-15.png 600w, https://blog.collegefootballdata.com/content/images/2023/09/image-15.png 787w" sizes="(min-width: 720px) 720px"></figure><p>And that is about it. One minor annoyance, you will need to configure auth for each and every endpoint. If you want an easier way, there is a plugin you can install that will automatically configure auth for you. I highly recommend doing this. This is the <a href="https://insomnia.rest/plugins/insomnia-plugin-global-headers?ref=blog.collegefootballdata.com">Global Headers</a> plugin. <a href="https://insomnia.rest/plugins/insomnia-plugin-global-headers?ref=blog.collegefootballdata.com">Click here</a> to go to the plugin page and click &quot;Install Plugin&quot; to install it into Insomnia.</p><p>Assuming you now have the plugin installed, you can now set the Authorization header as a global environment variable. 
Go ahead and click again on the gear to the right of &quot;Swagger env&quot; at the top left.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-16.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="359" height="156"></figure><p>Modify the environment configuration to look like below, replacing <code>&lt;token&gt;</code> with your API key. Note that the value IS prefixed with &quot;Bearer&quot; here. Be sure to keep that part in.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-17.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="809" height="373" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-17.png 600w, https://blog.collegefootballdata.com/content/images/2023/09/image-17.png 809w" sizes="(min-width: 720px) 720px"></figure><p>Or you can just copy the below content, paste, and modify it in your instance.</p><!--kg-card-begin: html--><pre class="line-numbers"><code class="lang-json">
{
	&quot;base_path&quot;: &quot;/&quot;,
	&quot;scheme&quot;: &quot;https&quot;,
	&quot;host&quot;: &quot;api.collegefootballdata.com&quot;,
	&quot;base_url&quot;: &quot;https://api.collegefootballdata.com&quot;,
	&quot;GLOBAL_HEADERS&quot;: {
		&quot;Authorization&quot;: &quot;Bearer &lt;token&gt;&quot;
	}
}
</code></pre><!--kg-card-end: html--><p>You should now be able to call any of the endpoints without needing to add auth details.</p><figure class="kg-card kg-image-card"><img src="https://blog.collegefootballdata.com/content/images/2023/09/image-18.png" class="kg-image" alt="Talking Tech: Navigating the CFBD API with Insomnia" loading="lazy" width="1915" height="1025" srcset="https://blog.collegefootballdata.com/content/images/size/w600/2023/09/image-18.png 600w, https://blog.collegefootballdata.com/content/images/size/w1000/2023/09/image-18.png 1000w, https://blog.collegefootballdata.com/content/images/size/w1600/2023/09/image-18.png 1600w, https://blog.collegefootballdata.com/content/images/2023/09/image-18.png 1915w" sizes="(min-width: 720px) 720px"></figure><p>Okay. Now we are really done. There are a few caveats to be aware of. First off, this will import Patreon-exclusive endpoints. You will get an HTTP error if you try to call any of these without the appropriate Patreon subscription level. Secondly, this will not automatically update when the API updates, for example when new endpoints are added or the query parameters of existing endpoints are modified. In this scenario, you will need to redo all of these steps. You&apos;ll notice that Insomnia imported the Document with the CFBD API version number. CFBD API versions follow standard versioning in the format of <code>&lt;major&gt;.&lt;minor&gt;.&lt;patch&gt;</code>. You will typically only need to reimport the configuration when the major or minor version changes. And even then, you may not necessarily <em>need</em> to, depending on what has changed.</p><p>I hope you found these steps helpful. More importantly, I hope you find Insomnia to be a useful tool. As I said, it is my go-to for quickly testing any API, including the CFBD API. If I want to debug something, it&apos;s the first program I open. 
And we&apos;ve really only scratched the surface of its functionality.</p><p>Cheers!</p>]]></content:encoded></item></channel></rss>