
\documentclass[10pt]{article}


\usepackage{amsmath}
\usepackage{amsfonts,amssymb}
\usepackage{amsthm}
\usepackage[active]{srcltx}
\usepackage[utf8]{inputenc}
\usepackage[russian]{babel}

\usepackage[final]{graphicx}

\newenvironment{ltrtr}{
\vspace{0.5\baselineskip} \noindent {\footnotesize{REFERENCES}} \vspace{-0.5\baselineskip}
\begin{enumerate}
\partopsep=0pt\topsep=0pt\itemsep=1pt\parsep=0pt\parskip=0pt}{\end{enumerate}}

\textwidth          13cm
\textheight         18cm
\topmargin          0mm
\oddsidemargin      5mm

\begin{document}

\setcounter{figure}{0} \setcounter{equation}{0}
\setcounter{table}{0} \setcounter{footnote}{0}

\begin{center}
{\bf CONSEQUENCES OF SUBSTITUTING PROBABILITIES WITH FREQUENCIES IN ENTROPY ESTIMATES}

{Zorkaltsev A.V., Davydova A.V.}

{\it Novosibirsk State University, Novosibirsk}

{\it avazork@mail.ru}

\end{center}

When statistical indicators are computed from event probabilities, the unknown true probabilities are often replaced by the frequencies of events observed in an available limited sample. The report describes the methodology, algorithms, and results of an analysis of the consequences of such a substitution for the calculation of the Shannon entropy index.
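For reference, the entropy index in question is the standard Shannon entropy, and the substitution discussed amounts to the usual plug-in estimator (the notation $K$, $n$, $n_i$ below is ours, not fixed in the abstract):
\begin{equation*}
H = -\sum_{i=1}^{K} p_i \ln p_i, \qquad
\hat{H} = -\sum_{i=1}^{K} \frac{n_i}{n} \ln \frac{n_i}{n},
\end{equation*}
where $p_i$ is the probability of event $i$, $n_i$ is the number of its occurrences in a sample of size $n$, and terms with $n_i = 0$ are taken to be zero.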

The study of individual situations is based on repeated Monte Carlo simulation of samples of a given size of discrete random events. For each sample, the frequency of occurrence of each of the considered events is calculated. These frequencies serve as estimates of the event probabilities and are used to compute the entropy index. The result is a set of computed values of the entropy index. From this set, the reliability characteristics of the entropy estimates obtained by replacing the true event probabilities with event frequencies in samples of a given size are calculated. These characteristics include: (1) the mathematical expectation of the entropy estimates and its bias relative to the true value; (2) the standard deviation of the entropy estimates from the true value and from their mathematical expectation; (3) the probability that an entropy estimate falls within a given interval (for example, 5\% or 10\%) around the true value or around the mathematical expectation; (4) the entropy of the entropy estimates.
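The described procedure can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the choice of a uniform four-event distribution, the sample size of 50, and the number of trials are all our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def plug_in_entropy(freqs):
    """Shannon entropy of a frequency vector; zero entries contribute zero."""
    p = freqs[freqs > 0]
    return -np.sum(p * np.log(p))

def simulate_entropy_estimates(probs, sample_size, n_trials=10000):
    """Monte Carlo: repeatedly draw a sample of the given size and
    compute the plug-in entropy estimate from the observed frequencies."""
    probs = np.asarray(probs, dtype=float)
    counts = rng.multinomial(sample_size, probs, size=n_trials)
    return np.array([plug_in_entropy(c / sample_size) for c in counts])

# Illustrative case: 4 equiprobable events, true entropy = ln 4.
probs = [0.25, 0.25, 0.25, 0.25]
true_H = plug_in_entropy(np.asarray(probs))
est = simulate_entropy_estimates(probs, sample_size=50)

bias = est.mean() - true_H                                    # (1) bias of the expectation
rmse = np.sqrt(np.mean((est - true_H) ** 2))                  # (2) deviation from the true value
within_5pct = np.mean(np.abs(est - true_H) <= 0.05 * true_H)  # (3) interval probability
```

In this setup the plug-in estimator is known to underestimate entropy on average (a well-known negative bias of frequency-based entropy estimates), which the simulated `bias` reflects.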

The report presents results on how the sample size, the number of events, and the degree of uniformity of the event probability distribution affect these indicators. One of the goals of the study was to determine the minimum sample size required for reliable frequency-based entropy estimates. The resulting estimates of the expectation bias can be used to correct the entropy estimate.

The report develops the research presented in \cite{1}, where the components of the described methodology, the computational algorithms, possible applications, and some results are set out in more detail.

\begin{ltrtr}

\bibitem{1}
{\it Zorkaltsev V.I., Zorkaltsev A.V.} Errors in entropy estimates due to the replacement of event probabilities with frequencies in samples // System Analysis \& Mathematical Modeling. 2024. Vol. 6, No. 3 (in print).

\bibitem{2}
{\it Sobol I.M.} The Monte Carlo Method (Popular Lectures on Mathematics). Moscow: Nauka, 1968. 64 p.

\bibitem{3}
{\it Shannon C.} Works on Information Theory and Cybernetics. Moscow: Foreign Literature Publishing House, 1973. 830 p.

\end{ltrtr}

\end{document}

