Here is the general workflow I used to complete a recent data merge and cleanup project. Tasks: merge data from 5 different data structures (approximately 4 GB in 148 files); extract businesses related to commercial painting; format the data; create a consolidated master file; create separate files divided by the state where each business is located. Tools: Python’s pandas library… Continue reading “Business Data Merge and Cleanup”
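A minimal sketch of the tasks listed above, not the project's actual code. The column names (`name`, `category`, `state`), the `"paint"` keyword, and the helper `merge_and_split` are all hypothetical, and small in-memory frames stand in for the 148 source files.

```python
import pandas as pd


def merge_and_split(frames, keyword="paint", category_col="category", state_col="state"):
    """Concatenate frames, keep rows matching the keyword, and group them by state.

    All column names and the keyword are illustrative assumptions.
    """
    # Merge: stack every source frame into one table.
    master = pd.concat(frames, ignore_index=True)

    # Format: normalize whitespace and casing so equal values compare equal.
    master[category_col] = master[category_col].astype(str).str.strip()
    master[state_col] = master[state_col].astype(str).str.strip().str.upper()

    # Extract: keep only rows whose category mentions the keyword.
    mask = master[category_col].str.contains(keyword, case=False, na=False)
    master = master[mask].drop_duplicates().reset_index(drop=True)

    # Split: one frame per state.
    by_state = {
        state: group.reset_index(drop=True)
        for state, group in master.groupby(state_col)
    }
    return master, by_state


# Made-up sample data standing in for two of the source files.
a = pd.DataFrame({
    "name": ["Acme Painting", "Bob's Bakery"],
    "category": ["Commercial Painting", "Bakery"],
    "state": ["tx", "TX"],
})
b = pd.DataFrame({
    "name": ["Brush Bros", "Acme Painting"],
    "category": ["commercial painting ", "Commercial Painting"],
    "state": ["CA", "tx"],
})

master, by_state = merge_and_split([a, b])
```

Each frame in `by_state` could then be written out with `DataFrame.to_csv`, one file per state, alongside the consolidated `master` frame.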
Category Archives: Data
Data Cleanup in Python
This notebook shows the initial data cleanup workflow for a capstone project in fulfilment of Springboard’s Data Science Track Program. The data was retrieved from the National Transportation Safety Board website. The original data resides in a 20-table MS Access database. The pertinent information was exported to Comma-Separated Value (CSV) files utilizing Access’ query… Continue reading “Data Cleanup in Python”
Merging EXCEL Data
Data Integration
This notebook highlights the process of integrating data from four different data sources (Excel files) into a master file. For efficiency, the data manipulation and cleanup are done with Python. After the processing is done, a master Excel file is exported. Raw data files and table definitions can be found at https://www.kaggle.com/anikannal/solar-power-generation-data Continue reading “Merging EXCEL Data”
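As a rough illustration of that kind of integration, here is a sketch under stated assumptions rather than the notebook's own code. It assumes the four sources split into two measurement tables and two lookup tables sharing key columns. The column names (`timestamp`, `plant_id`, `output`, `irradiation`) and the helper `build_master` are hypothetical. In-memory frames stand in for the files, which would normally be loaded with `pd.read_excel`.

```python
import pandas as pd


def build_master(generation_frames, weather_frames, keys=("timestamp", "plant_id")):
    """Stack like-shaped sources, then join the two groups on shared keys.

    The key columns are illustrative assumptions, not the dataset's schema.
    """
    generation = pd.concat(generation_frames, ignore_index=True)
    weather = pd.concat(weather_frames, ignore_index=True)
    # A left join keeps every generation row; validate raises if the weather
    # table has duplicate keys, which would silently multiply rows.
    return generation.merge(weather, on=list(keys), how="left", validate="many_to_one")


# Made-up sample data standing in for the four Excel files.
gen_1 = pd.DataFrame({"timestamp": ["t1", "t2"], "plant_id": [1, 1], "output": [10.0, 12.0]})
gen_2 = pd.DataFrame({"timestamp": ["t1", "t2"], "plant_id": [2, 2], "output": [7.0, 9.0]})
weather_1 = pd.DataFrame({"timestamp": ["t1", "t2"], "plant_id": [1, 1], "irradiation": [0.3, 0.5]})
weather_2 = pd.DataFrame({"timestamp": ["t1"], "plant_id": [2], "irradiation": [0.2]})

master = build_master([gen_1, gen_2], [weather_1, weather_2])
```

The resulting `master` frame could then be exported with `DataFrame.to_excel`, which requires an Excel writer engine such as openpyxl to be installed.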