Data Wrangling with Python: Simplify your ETL processes with these hands-on data sanitation tips, tricks and best practices

by Tirthajyoti Sarkar
English | 2019 | ISBN: 1789800111 | 460 Pages | EPUB | 72 MB

Data is the new oil, but like oil it arrives crude. Before you can do anything meaningful with it (modeling, visualization, or machine learning for predictive analysis), you first need to wrangle it. This book teaches the essentials of data wrangling using Python.
To practice high-quality science with data, you first need to make sure the data is properly sourced, cleaned, formatted, and pre-processed. This book covers the fundamentals of that invaluable stage of the data science pipeline.
What you will learn

  • Manipulate simple and complex data structures using Python and its built-in functions
  • Use fundamental and advanced features of pandas DataFrames and NumPy arrays
  • Manipulate DataFrames and arrays at run time
  • Extract and format data from various textual formats: plain text files, SQL, CSV, Excel, JSON, and XML
  • Perform web scraping using Python libraries such as BeautifulSoup4 and html5lib
  • Perform advanced string search and manipulation using Python and regular expressions
  • Handle outliers, apply advanced programming tricks, and perform data imputation using pandas
  • Apply basic descriptive statistics and plotting techniques in Python to examine data quickly
  • Practice data wrangling and modeling using random data generation techniques
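
To give a flavor of the topics listed above, here is a minimal sketch that is not taken from the book. It uses only the Python standard library rather than pandas, and the sample data, the function names (`clean_rows`, `impute_median`, `flag_outliers`), and the 0 to 120 age range are all assumptions made for illustration. It reads CSV text, tidies strings with a regular expression, fills a missing value with the median, and flags an out-of-range value.

```python
import csv
import io
import re
import statistics

# Hypothetical sample data: messy names, one missing age, one implausible age.
RAW = """name,age
 Alice  ,34
BOB,
carol,29
Dave,210
"""


def clean_rows(text):
    """Parse CSV text, normalise names, and convert ages to int or None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Collapse runs of whitespace, trim the ends, then title-case the name.
        name = re.sub(r"\s+", " ", record["name"]).strip().title()
        age_text = record["age"].strip()
        # Accept only strings made entirely of digits; anything else is missing.
        age = int(age_text) if re.fullmatch(r"\d+", age_text) else None
        rows.append({"name": name, "age": age})
    return rows


def impute_median(rows, key):
    """Replace missing values of `key` with the median of the known values."""
    known = [row[key] for row in rows if row[key] is not None]
    median = statistics.median(known)
    return [{**row, key: median if row[key] is None else row[key]} for row in rows]


def flag_outliers(rows, key, low, high):
    """Add an 'outlier' field that is True when `key` falls outside [low, high]."""
    return [{**row, "outlier": not (low <= row[key] <= high)} for row in rows]


# The 0 to 120 range is an assumed plausibility bound for ages.
rows = flag_outliers(impute_median(clean_rows(RAW), "age"), "age", 0, 120)
for row in rows:
    print(row)
```

The median is used for imputation here because it is less sensitive than the mean to extreme values such as the 210 in the sample; with pandas, the same steps would typically be expressed through DataFrame operations instead of list comprehensions.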