
April 19th 2022

Explaining the algorithm

By Francisco Pascual Romero (Coordinator of the University Master's Degree in Computer Engineering)

This week the news reported that there were, unfortunately, 26 deaths on Spanish roads. That is better than the forecast of 36 made by Big Data, or rather by the artificial intelligence algorithms that make use of Big Data.

This prediction raises several questions. First of all, a forecast does not normally consist of a single absolute figure. The results of a prediction are usually somewhat more complex: for example, the figure comes with a degree of confidence, or with the interval within which it is most reliable. A single number makes a good headline for the audience, but it is not really useful.
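One way to see what an interval adds to a single figure is the sketch below. It assumes, purely for illustration, that the weekly death count follows a Poisson distribution whose mean is the forecast of 36. The article does not say which model the forecasters actually used, and the function names here are hypothetical.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events under a Poisson model (computed in log space)."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def poisson_interval(mean: float, coverage: float = 0.95) -> tuple[int, int]:
    """Central interval taken from the lower and upper quantiles of the distribution."""
    tail = (1.0 - coverage) / 2.0
    cdf, k, low = 0.0, 0, None
    while True:
        cdf += poisson_pmf(k, mean)
        if low is None and cdf >= tail:
            low = k  # first count whose cumulative probability reaches the lower tail
        if cdf >= 1.0 - tail:
            return low, k  # first count whose cumulative probability reaches the upper tail
        k += 1

FORECAST = 36  # point forecast quoted in the article
OBSERVED = 26  # deaths actually reported

low, high = poisson_interval(FORECAST)  # Poisson assumption is ours, not the forecasters'
print(f"Point forecast: {FORECAST}, assumed 95% interval: {low} to {high}")
print(f"Observed value {OBSERVED} inside interval: {low <= OBSERVED <= high}")
```

Under this assumed model the interval is roughly a dozen deaths wide on either side of 36, which is the kind of information a single figure hides.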

On the other hand, the process of calculating the number is often more valuable than the number itself. Presumably, variables such as the historical time series, weather conditions, etc. were taken into account to obtain that "36". Knowing the factors that influence the prediction is one of the most valuable aids to decision making.

Finally, there is the distortion introduced by the prediction itself. When you predict something about a phenomenon, you already affect it. This is something I learned from Professor Jose Angel Olivas more than 20 years ago, and it still holds. For example, the DGT uses the figure to influence our behavior through an advertising campaign, trying to make us more careful at the wheel. That is, the "big data" is used so that the "big data" gets it wrong, since the important thing is not to get the forecast right but to reduce the number of deaths on the road.

Another example: Nadal at the Australian Open.

Everyone saw the graphic showing Nadal's 4% chance of victory against Medvedev in the final of the last Australian Open. Subsequent headlines ran along the lines of “Nadal Beats Big Data”. A few weeks later, Marco Asensio scored a goal that, according to the model shown on television, had a 10% chance of going in, and identical headlines followed. Perhaps what happens to us is that, when someone points to the moon, we look at the finger and do not realize what is behind those numbers.

Probability vs. possibility. A 4% probability indicates that Nadal's victory was possible, all the more so given the tennis scoring system. It is a rare event, but not an impossible one. Let's stop for a moment: every December 21st we think that the next day we are going to win the lottery jackpot, and even with a probability of 0.001% we always consider it possible.
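The difference between "rare" and "impossible" can be made concrete with a little arithmetic. The sketch below assumes a set of independent predictions that each give the underdog a 4% chance, and asks how likely it is that at least one of them ends in an "upset". The independence assumption and the function name are ours, for illustration only.

```python
def at_least_once(p: float, n: int) -> float:
    """Probability that an event of probability p occurs at least once in n independent tries."""
    return 1.0 - (1.0 - p) ** n

# How often should we expect a "4% event" somewhere among n such predictions?
for n in (1, 10, 25, 100):
    print(f"{n:>3} predictions at 4%: P(at least one upset) = {at_least_once(0.04, n):.2f}")
```

With 25 such predictions, seeing at least one upset is already more likely than not, so a single 4% event coming true does not by itself mean the model was beaten.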

How does the model calculate that number? First, it starts from an a priori probability, under which Nadal had a 36% chance of winning the match. Normally these a priori models are based on the rankings (Medvedev's was higher), past streaks, playing surface, recent matches, etc. Then there is the course of the match. Taking a simple model that analyzes similar historical situations, in only 4 of 100 five-set matches had a player come back from a score like that one. If we focus on Nadal, he had come back in only 2 of 20 matches in that situation. So the number is justified, but does it give us enough information?
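The historical counts quoted above (4 comebacks in 100 matches, 2 in 20 for Nadal) are small samples, so the percentages they produce carry considerable uncertainty. As a minimal sketch, the Wilson score interval is one standard way to put a range around such a frequency. This is our choice for illustration, not necessarily what the broadcast model does.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half

# Counts taken from the article: comebacks out of comparable five-set matches.
for label, wins, matches in (("All players", 4, 100), ("Nadal only", 2, 20)):
    lo, hi = wilson_interval(wins, matches)
    print(f"{label}: {wins}/{matches} = {wins / matches:.0%}, interval {lo:.1%} to {hi:.1%}")
```

The interval runs from about 1.6% to 9.8% for 4 of 100, and from about 2.8% to 30% for 2 of 20, which shows how little a headline "4%" pins down on its own.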

How do you get to that number? A figure is an instant, and by judging an instant we lose part of the story. Let's remember Heisenberg's Uncertainty Principle: if we measure one variable very precisely, we lose sight of the rest. Moreover, reaching that 4% from 1% is not the same as having dropped to it from 32%. And it is not enough to evaluate that number alone: we also need to see how it evolved later in the match, and how events such as saving a break point or going a break up in the third set can greatly change the probability of victory. Identifying and analyzing these events is the key to this kind of analysis.
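To illustrate how a win probability moves with the score, here is a deliberately simple sketch. It assumes that each set is won independently with a fixed probability `p_set`, which is a hypothetical parameter. Real in-match models work at the level of points and games and use far more information, so this is only a toy version of the idea.

```python
from functools import lru_cache

SETS_TO_WIN = 3  # best-of-five match

@lru_cache(maxsize=None)
def match_win_prob(sets_won: int, sets_lost: int, p_set: float) -> float:
    """Probability of winning the match from the given set score,
    assuming independent sets each won with probability p_set."""
    if sets_won == SETS_TO_WIN:
        return 1.0
    if sets_lost == SETS_TO_WIN:
        return 0.0
    return (p_set * match_win_prob(sets_won + 1, sets_lost, p_set)
            + (1.0 - p_set) * match_win_prob(sets_won, sets_lost + 1, p_set))

P_SET = 0.45  # hypothetical per-set probability, chosen only for illustration

# Trajectory of a comeback from two sets down.
for score in ((0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 2)):
    print(f"sets {score[0]}-{score[1]}: P(win) = {match_win_prob(*score, P_SET):.3f}")
```

With this hypothetical `p_set`, the chance from two sets down is 0.45 cubed, about 9%, and it climbs back to 45% at two sets all. The same final number can therefore sit on very different trajectories, which is the point made above.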

How can I change that probability? This is where the number becomes useful. If you tell a player at a given moment that he has a 4% chance of victory, you are not telling him anything new; he knows that by looking at the scoreboard. What matters is to tell him what is happening in his game that is leading him to defeat and how he can change that dynamic. That is, the useful thing would be to say: “Look, Rafa, he is winning all the points you play on your second serve, and you have no double faults, keep that in mind” or “you are losing 90% of the points on the serves he hits to your backhand, and in rallies of fewer than 3 shots”. This is what can be useful to the player and the coach; that is the true usefulness of applying these algorithms and models.

In conclusion, isolated numbers are just that, numbers. They need context and detailed analysis, not a merely “numerical” interpretation. Algorithms and models can provide much more information and relevant knowledge, helping us understand behaviors and how to obtain better results.
