Collecting data
This should be somewhat obvious: without at least some data, we cannot perform any of the subsequent steps. One might argue that inference is an exception, but that would be inappropriate. There is no magic in data science. We, as data scientists, don't make something from nothing. Inference (which we'll define later in this chapter) requires at least some data to begin with.
Some aspects of collecting data may be new to a data developer. Data can be collected from many sources, and the number and types of data sources continue to grow daily. In addition, how data is collected might require a new perspective: data for data science isn't always sourced from a relational database; it may instead come from machine-generated log files, online surveys, performance statistics, and so on. Again, the list is ever evolving.
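To make the idea of a non-relational source concrete, here is a minimal sketch of turning machine-generated log lines into structured records. The log format, the sample lines, and the names LOG_PATTERN and parse_log are all assumptions made up for illustration; real log files vary widely and would need their own pattern.

```python
import re

# Hypothetical log format, invented for this example:
# "<date> <time> <LEVEL> <message>"
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)"
)


def parse_log(lines):
    """Turn raw log lines into dict records, skipping lines that do not match."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


# Made-up sample input; the last line is deliberately malformed.
sample_lines = [
    "2024-01-15 08:30:01 INFO user login succeeded",
    "2024-01-15 08:30:07 ERROR payment service timeout",
    "not a log line",
]

records = parse_log(sample_lines)
for record in records:
    print(record["level"], record["message"])
```

Each record comes out as a dictionary of named fields, which is roughly the shape a data developer would expect from a database row, even though the source was free-form text.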
Another point to ponder: collecting data also involves supplementation. For example, a data scientist might determine that he or she needs to add demographic data to a particular pool of application data that was previously collected, processed, and reviewed.
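Supplementation of this kind can be sketched as a simple keyed lookup. The example below assumes the application data and the demographic data share an applicant_id field; the records, field names, and the supplement function are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical application data, already collected and reviewed.
applications = [
    {"applicant_id": 101, "product": "loan"},
    {"applicant_id": 102, "product": "card"},
    {"applicant_id": 103, "product": "loan"},
]

# Hypothetical demographic data from a second source, keyed by applicant_id.
# Applicant 102 is deliberately absent.
demographics = {
    101: {"age_band": "25-34", "region": "north"},
    103: {"age_band": "45-54", "region": "south"},
}


def supplement(records, extra, key):
    """Return new records with fields from `extra` added.

    Records with no entry in `extra` get None for every added field,
    so all output records share the same set of keys.
    """
    extra_fields = sorted({field for row in extra.values() for field in row})
    enriched = []
    for record in records:
        match = extra.get(record[key], {})
        new_record = dict(record)  # copy, so the original data is untouched
        for field in extra_fields:
            new_record[field] = match.get(field)
        enriched.append(new_record)
    return enriched


enriched = supplement(applications, demographics, "applicant_id")
for row in enriched:
    print(row)
```

Keeping unmatched applicants with empty values, rather than dropping them, is one possible design choice; it makes the gaps in the supplementary source visible instead of silently shrinking the data.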