Statistics for Data Science
上QQ阅读APP看书,第一时间看更新

Statistical inference

What developer at some point in his or her career, had to create a sample or test data? For example, I've often created a simple script to generate a random number (based upon the number of possible options or choices) and then used that number as the selected option (in my test recordset). This might work well for data development, but with statistics and data science, this is not sufficient.

To create sample data (or a sample population), the data scientist will use a process called statistical inference, which is the process of deducing options of an underlying distribution through analysis of the data you have or are trying to generate for. The process is sometimes called inferential statistical analysis and includes testing various hypotheses and deriving estimates.

When the data scientist determines that a recordset (or population) should be larger than it actually is, it is assumed that the recordset is a sample from a larger population, and the data scientist will then utilize statistical inference to make up the difference.

The data or recordset in use is referred to by the data scientist as the observed data. Inferential statistics can be contrasted with descriptive statistics, which is only concerned with the properties of the observed data and does not assume that the recordset came from a larger population.