The first real step in the analysis is to replicate his
system and compare my results with the sample he sent me. I’m calling
the system “Team Regression”. His sample included the recent Cardinal-Cub game,
which requires looking at each team’s past 30 games. Unfortunately, my files
all cover past seasons, so I first had to create a file with 2018 MLB
results. I downloaded the data and built a simple file like the one I described
in earlier posts and shared with anyone who wanted it. The 2017 simple file can
be found at http://bit.ly/2uHuxZB
The program uses an array for each team that contains the
last 30 games encountered. As each game is read, the code checks whether both
teams have 30 past games. If not, the data for the current game is simply
inserted in the arrays. The data saved is a game id (used for testing purposes),
the money line, and the result of the game (1 or 0). As new data is inserted in
an array, the previous data is pushed down and the 30th game is pushed out.
Note that the money line saved is the modified one described in the previous
post (LV line -100).
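A minimal sketch of this kind of rolling 30-game history in Python, not the actual program. The names `history`, `record_game` and `has_full_history` are made up for illustration, and the `is_home` flag is an assumption added so games can later be filtered by location. A `deque` with `maxlen=30` drops the oldest entry automatically when a new one is pushed on.

```python
from collections import defaultdict, deque

HISTORY_LEN = 30  # number of past games kept per team

# Hypothetical structure: team name -> most recent games, newest first.
history = defaultdict(lambda: deque(maxlen=HISTORY_LEN))


def record_game(team, game_id, modified_line, won, is_home):
    """Push a game onto the front of the team's history.

    When the deque is already full, appendleft discards the entry at the
    other end, so the 30th (oldest) game is pushed out.
    is_home is an assumed extra field, not listed in the post.
    """
    history[team].appendleft((game_id, modified_line, 1 if won else 0, is_home))


def has_full_history(team):
    """True once the team has 30 past games on record."""
    return len(history[team]) == HISTORY_LEN
```

A game would only be evaluated when `has_full_history` is true for both teams; either way it is recorded for both teams afterwards.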
If both teams have 30 games, then each team’s array is
scanned to pull out the games matching today’s game location: road games for
the visiting team and home games for the home team. I then pass this data to a
linear regression routine that I wrote in the past. I use the resulting
coefficients and the modified LV line for today’s game to compute each team’s
initial probability of winning. I then adjust these probabilities so they
total 1. These and the real LV line for today are used to compute expected
returns. A line is added to a bet file for each team with some additional data.
Finally, I insert this game’s data in the team arrays and push the 30th
oldest game out.
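The calculation for one game can be sketched roughly as follows. This is an illustration under stated assumptions, not the routine used here: `fit_line` is a plain one-variable least-squares fit of result (1 or 0) on the modified line, the function names are hypothetical, and `expected_return` assumes the standard American money line payout for a one-unit bet, which may differ from the formula actually used.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All lines identical: fall back to the average result.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def initial_probability(lines, results, todays_modified_line):
    """Fit on a team's location-matched games, then evaluate at today's line."""
    intercept, slope = fit_line(lines, results)
    return intercept + slope * todays_modified_line


def normalize(p_away, p_home):
    """Rescale the two initial probabilities so they total 1.

    Assumes the two values sum to something positive.
    """
    total = p_away + p_home
    return p_away / total, p_home / total


def payout_per_unit(american_line):
    """Profit on a winning one-unit bet at an American money line (assumed convention)."""
    if american_line > 0:
        return american_line / 100
    return 100 / abs(american_line)


def expected_return(p_win, american_line):
    """Expected profit per unit staked: win the payout or lose the stake."""
    return p_win * payout_per_unit(american_line) - (1 - p_win)
```

Note that a linear fit on 1/0 results can give values below 0 or above 1 before the adjustment; the sketch does not clamp them.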
When I ran the 2018 data through the process, I was able to
compare the results with the sample data he gave me on the Cardinal-Cub game.
Unfortunately, they didn’t match. He had the final probabilities as 36.51% and
63.50%. I had 33.6% and 66.4%. I found the basic problem was with the 30-game
history arrays. He used a different source than I did, and hence had slightly
different lines. But the results were close enough that I felt the program did
replicate the process he used reasonably well.
Now I am ready to back test using lots of data. That will be
in my next post.