Hands-On Python Deep Learning for the Web

Data preparation

After the data collection phase, we prepare the data so that it can be fed to the ML system; this is known as data preparation. It is often the most time-consuming part of an ML workflow/pipeline. Data preparation consists of the following steps:

  • Exploratory data analysis
  • Data processing and wrangling
  • Feature engineering and extraction
  • Feature scaling and selection
When we take a broader look at the process, we find that data identification and collection can also be important aspects, because, as mentioned previously, the data might not always be available in the correct format.
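To make the steps listed above more concrete, here is a minimal sketch of a data preparation pass using only the Python standard library. The dataset, the column names (`height_cm`, `weight_kg`), the derived `bmi` feature, and the helper functions are all hypothetical and chosen purely for illustration. A real project would more likely rely on libraries such as pandas and scikit-learn, and feature selection is left out here for brevity.

```python
from statistics import mean, median, pstdev

# Hypothetical toy records; None marks a missing value.
raw_rows = [
    {"height_cm": 170.0, "weight_kg": 65.0},
    {"height_cm": 180.0, "weight_kg": None},
    {"height_cm": 160.0, "weight_kg": 50.0},
    {"height_cm": 175.0, "weight_kg": 80.0},
]


def explore(rows):
    """Exploratory data analysis: count missing values per column."""
    return {c: sum(r[c] is None for r in rows) for c in rows[0].keys()}


def wrangle(rows):
    """Data processing and wrangling: fill gaps with the column median."""
    cleaned = [dict(r) for r in rows]
    for c in rows[0].keys():
        fill = median(r[c] for r in rows if r[c] is not None)
        for r in cleaned:
            if r[c] is None:
                r[c] = fill
    return cleaned


def engineer(rows):
    """Feature engineering: derive a body-mass index column (assumed feature)."""
    return [
        {**r, "bmi": r["weight_kg"] / (r["height_cm"] / 100) ** 2}
        for r in rows
    ]


def scale(rows):
    """Feature scaling: standardize each column to zero mean, unit variance."""
    scaled = [dict(r) for r in rows]
    for c in rows[0].keys():
        values = [r[c] for r in rows]
        mu, sigma = mean(values), pstdev(values)
        for r in scaled:
            # Guard against constant columns, whose deviation is zero.
            r[c] = (r[c] - mu) / sigma if sigma else 0.0
    return scaled


missing = explore(raw_rows)
prepared = scale(engineer(wrangle(raw_rows)))
```

Each function maps to one step, so the order of the calls in the last line mirrors the order of the list: the data is inspected first, then cleaned, then enriched with a new feature, and only then scaled.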