Monday, July 23, 2018

Replicating the approach in code


The first real step in the analysis is to replicate his system and compare my results with the sample he sent me. I’m calling the system “Team Regression”. His sample included the recent Cardinals-Cubs game, which requires looking at each team’s past 30 games. Unfortunately, my files all cover past seasons, so I first had to create a file with 2018 MLB results. I downloaded the data and created a simple file like the one I described in earlier posts and shared with anyone who wanted it. The 2017 simple file can be found at http://bit.ly/2uHuxZB

The program uses an array for each team that contains the last 30 games encountered. As each game is read, the code checks whether both teams have 30 past games. If not, the data for the current game is simply inserted in the arrays. The data saved is a game id (used for testing purposes), the money line, and the result of the game (1 or 0). As new data is inserted in an array, the previous data is pushed down and the 30th game is pushed out. Note that the money line saved is the modified one described in the previous post (LV line -100).
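The 30-game arrays can be sketched in Python roughly like this. This is only an illustration, not my actual program: the names `make_histories`, `record_game` and `has_full_history` are made up for the example, and I am assuming a home/road flag is stored with each game so the location filter used later has something to work with.

```python
from collections import defaultdict, deque

HISTORY_SIZE = 30  # number of past games kept per team


def make_histories():
    # One fixed-length deque per team, created the first time a team is seen.
    return defaultdict(lambda: deque(maxlen=HISTORY_SIZE))


def record_game(histories, team, game_id, modified_line, result, is_home):
    # Newest game goes in at index 0; with maxlen set, the deque
    # silently drops the oldest game from the other end.
    # is_home is an assumed extra field, not listed in the post.
    histories[team].appendleft((game_id, modified_line, result, is_home))


def has_full_history(histories, team):
    # True once the team has 30 past games on file.
    return len(histories[team]) == HISTORY_SIZE
```

A deque with `maxlen` does the "push down and push out" step automatically, so no manual shifting of array elements is needed.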

If both teams have 30 games, then each team’s array is scanned to pull out the games matching today’s game location: road games for the visiting team and home games for the home team. I then pass this data to a linear regression routine that I wrote in the past. I use the resulting coefficients and the modified LV line for today’s game to compute each team’s initial probability of winning. I then adjust these probabilities so they total 1. These and the real LV line for today are used to compute expected returns. A line is added to a bet file for each team with some additional data. Finally, I insert this game’s data in the team arrays and push the 30th game out.
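Here is a rough Python sketch of the calculation for one game. It is not my regression routine, just plain least squares on one variable, and the function names are made up for the example. I am assuming the regression is result (1 or 0) against the modified line, and that expected return is figured per 1 unit bet from a standard American money line; the details in the real program may differ.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All lines identical: fall back to the plain win rate.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def initial_probability(lines, results, todays_modified_line):
    # lines/results come from the team's location-matched past games.
    intercept, slope = fit_line(lines, results)
    return intercept + slope * todays_modified_line


def normalize(p_visitor, p_home):
    # Rescale the two initial probabilities so they total 1.
    total = p_visitor + p_home
    if total <= 0:
        raise ValueError("probabilities must have a positive total")
    return p_visitor / total, p_home / total


def expected_return(prob, money_line):
    # Profit per 1 unit staked on an American money line:
    # +150 pays 1.5 on a win, -150 pays 100/150 on a win.
    payout = money_line / 100 if money_line > 0 else 100 / abs(money_line)
    return prob * payout - (1 - prob)
```

Note that a straight-line fit on 1/0 results can produce values below 0 or above 1 for extreme lines, so a real run would need to decide how to handle those cases.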

When I ran the 2018 data through the process, I was able to compare the results with the sample data he gave me on the Cardinals-Cubs game. Unfortunately, they didn’t match. He had the final probabilities as 36.51% and 63.50%. I had 33.6% and 66.4%. I found the basic problem was with the 30-game history arrays. He used a different source than I did and hence had slightly different lines. But the results were close enough that I felt the program replicated his process reasonably well.

Now I am ready to back test using lots of data. That will be in my next post.
