Ole Bill's Sports Analysis to Support Wagering Decisions: July 2018

Saturday, July 28, 2018

Team regression initial results

I’ve completed the first test of the replicated code for the Team Regression project, and the results don’t look particularly good. Here is what I got for the first part of 2018, grouped by expected return.

2018 Expected Ret/$	Number	Amt Bet	Net	Actual Ret/$
1.00 to 1.05	149	$18,874	($1,900)	$0.899
1.05 to 1.10	139	$17,800	$206	$1.012
1.10 to 1.15	110	$15,013	($137)	$0.991
1.15 to 1.20	112	$14,552	($575)	$0.960
1.20 to 1.25	80	$9,701	$1,287	$1.133
1.25 above	348	$44,007	($1,819)	$0.959
Total	938	$119,947	($2,938)	$0.976
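
For anyone checking the arithmetic, the Actual Ret/$ column is the amount bet plus the net, divided by the amount bet. Here is a minimal Python sketch that recomputes it from the rows of the table; the function name `return_per_dollar` is just an illustrative choice, not something from the original program.

```python
# Rows copied from the 2018 table: (expected-return range, bets, amount bet, net)
rows = [
    ("1.00 to 1.05", 149, 18874, -1900),
    ("1.05 to 1.10", 139, 17800, 206),
    ("1.10 to 1.15", 110, 15013, -137),
    ("1.15 to 1.20", 112, 14552, -575),
    ("1.20 to 1.25", 80, 9701, 1287),
    ("1.25 above", 348, 44007, -1819),
]


def return_per_dollar(amount_bet, net):
    """Dollars returned for each dollar wagered (1.0 means break-even)."""
    return (amount_bet + net) / amount_bet


for label, count, amount_bet, net in rows:
    print(f"{label:<14}{count:>5}{return_per_dollar(amount_bet, net):>8.3f}")

# Totals are computed from the rows rather than typed in
total_count = sum(r[1] for r in rows)
total_bet = sum(r[2] for r in rows)
total_net = sum(r[3] for r in rows)
print(f"{'Total':<14}{total_count:>5}{return_per_dollar(total_bet, total_net):>8.3f}")
```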

These figures are based on $100 bets, with more wagered on favorites. The return isn’t much better than what you’d get throwing darts. Those with expected returns over $1.25 are really bad. Those are probably big outliers. I believe the creator of this approach is now using Kelly to size his bets, which won’t help any. There just isn’t enough consistency across the different ranges to make one believe this would help.
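
To show why Kelly sizing depends entirely on the probabilities being right, here is a small Python sketch of the standard Kelly fraction for a money line bet. The function names are hypothetical and this is not the creator's code; it only illustrates that the stake grows with the edge the model claims, so an edge that isn't real just puts more money on the same bets.

```python
def net_odds(moneyline):
    """Profit per $1 staked on a winning bet at an American money line."""
    if moneyline > 0:
        return moneyline / 100.0
    return 100.0 / abs(moneyline)


def kelly_fraction(p_win, moneyline):
    """Standard Kelly fraction of bankroll: (b*p - q) / b, floored at zero."""
    b = net_odds(moneyline)
    q = 1.0 - p_win
    fraction = (b * p_win - q) / b
    return max(fraction, 0.0)


# A model that says 55% on an even-money line asks for a tenth of the bankroll
print(kelly_fraction(0.55, 100))
# No estimated edge at -110 means no bet
print(kelly_fraction(0.50, -110))
```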

I also ran the code over the last 8 seasons with the following results.

Season	Number	Amt Bet	Net	Actual Ret/$
2010	1,895	$241,320	-$5,848	$0.976
2011	1,900	$236,653	-$3,628	$0.985
2012	1,901	$234,241	$329	$1.001
2013	1,899	$240,687	-$6,102	$0.975
2014	1,873	$228,414	-$4,331	$0.981
2015	1,899	$238,874	-$7,042	$0.971
2016	1,870	$240,407	-$3,824	$0.984
2017	1,883	$244,158	-$6,994	$0.971
Total	15,120	$1,904,754	-$37,440	$0.980

There was a small profit in one season, which demonstrates a problem with any system: it might prove profitable over a short run, but that can be due to random variance and is not necessarily a good indicator.
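
To put a rough number on that variance, here is a back-of-the-envelope Python sketch. It assumes every bet is an independent, even-money wager of the same size that wins half the time, which is a simplification and not how the actual bets were sized. Under those assumptions the standard error of the return per dollar is 1 divided by the square root of the number of bets: about 0.023 for a season of roughly 1,900 bets and about 0.008 for all 15,120 bets. The 2012 figure of $1.001 is about one standard error above the eight-year average of $0.980, so it looks like ordinary noise.

```python
import math


def approx_std_error(num_bets):
    """Rough standard error of return per dollar.

    Assumes independent even-money bets of equal size won half the time,
    so each bet returns $2 or $0 per dollar (standard deviation of 1).
    """
    return 1.0 / math.sqrt(num_bets)


season_se = approx_std_error(1900)    # about one season of bets
total_se = approx_std_error(15120)    # all eight seasons combined

print(f"one season:    +/- {season_se:.3f}")
print(f"eight seasons: +/- {total_se:.3f}")
```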

I’m a bit concerned that my code did not quite reproduce the results in the sample he sent. While I attributed this to slightly different historical results for the teams involved, maybe there is a bigger problem. I will contact the creator of this approach to see if I can get some more samples.

I’m not yet ready to give up on this project. Follow me on Twitter, @ole44bill, to know when I make the next post.

Monday, July 23, 2018

Replicating the approach in code

The first real step in the analysis is to replicate his system and compare the results I get with the sample he sent me. I’m calling the system “Team Regression”. His sample included the recent Cardinal-Cub game, which first requires looking at each team’s past 30 games. Unfortunately, my files are all from past seasons, so the first task was to create a file with 2018 MLB results. I downloaded data and created a simple file like the one I described in earlier posts and shared with anyone who wanted it. The 2017 simple file can be found at http://bit.ly/2uHuxZB

The program keeps an array for each team that contains the last 30 games encountered. As each game is read, the code checks whether there are 30 past games for both teams. If not, the data for the current game is simply inserted in the arrays. The data saved is a game id (used for testing purposes), the money line, and the result of the game (1 or 0). As new data is inserted in the arrays, the previous data is pushed down and the 30th game is pushed out. Note that the money line saved is the modified one described in the previous post (LV line -100).
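
The 30-game arrays described above could be sketched in Python along these lines. This is not the original program, and all the names are made up for illustration. It also stores a home/road flag with each game, which is an assumption: the description lists only the game id, money line and result, but the location is needed later to pick out matching games.

```python
from collections import defaultdict, deque

HISTORY_LEN = 30

# One fixed-length queue per team; the newest game sits at index 0
histories = defaultdict(lambda: deque(maxlen=HISTORY_LEN))


def record_game(team, game_id, modified_line, won, is_home):
    """Insert a game at the front; once full, the oldest game drops off the end."""
    # is_home is an assumed extra field, kept so games can be filtered by location
    histories[team].appendleft((game_id, modified_line, 1 if won else 0, is_home))


def ready(home, visitor):
    """True only when both teams have a full 30-game history."""
    return (len(histories[home]) == HISTORY_LEN
            and len(histories[visitor]) == HISTORY_LEN)


def games_at(team, is_home):
    """Games from a team's history played at the given location."""
    return [g for g in histories[team] if g[3] == is_home]
```

A deque with `maxlen` does the "push down and push out" step automatically, so no manual shifting of array elements is needed.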

If both teams have 30 games, then each team’s array is scanned, pulling out the games that match today’s game location: road games for the visitors and home games for the home team. I then pass this data to a linear regression routine that I wrote in the past. I use the resulting coefficients and the modified LV line for today’s game to compute each team’s initial probability of winning. I then adjust these probabilities so they total 1. These and the real LV line for today are used to compute expected returns. A line is added to a bet file for each team with some additional data. Finally, I insert this game’s data in the team arrays and push the 30th game out.
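
The steps above can be sketched in Python as follows. This is a minimal illustration, not the routine I actually use: the function names are hypothetical, the regression is an ordinary least-squares fit of the result (1 or 0) on the modified line, and the expected return is assumed to be the win probability times the total payout per dollar at the real money line. A straight-line fit can produce values below 0 or above 1; how the original handles that is not covered here, so the sketch leaves them untouched.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Every line identical: fall back to the plain win rate
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def initial_probability(lines, results, todays_modified_line):
    """Unadjusted win probability from a team's location-matched games."""
    intercept, slope = fit_line(lines, results)
    return intercept + slope * todays_modified_line


def normalize(p_visitor, p_home):
    """Scale the two probabilities so they total 1."""
    total = p_visitor + p_home
    return p_visitor / total, p_home / total


def payout_per_dollar(moneyline):
    """Stake plus profit returned per $1 on a winning bet."""
    if moneyline > 0:
        return 1.0 + moneyline / 100.0
    return 1.0 + 100.0 / abs(moneyline)


def expected_return(p_win, moneyline):
    """Assumed definition: expected dollars back per dollar bet."""
    return p_win * payout_per_dollar(moneyline)
```

For example, `normalize(0.4, 0.8)` scales the pair to one third and two thirds, and a team given a 60% chance at a line of -150 has an expected return of exactly $1.00 per dollar, which is break-even.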

When I ran the 2018 data through the process, I was able to compare the results with the sample data he gave me on the Cardinal-Cub game. Unfortunately, they didn’t match. He had the final probabilities as 36.51% and 63.50%. I had 33.6% and 66.4%. I found the basic problem was with the 30-game history arrays. He used a different source than I did and hence had slightly different lines. But the results were close enough that I felt the program did replicate the process he used reasonably well.

Now I am ready to back test using lots of data. That will be in my next post.
