Introduction to Pandas

Pandas is the Python package I enjoy using the most, and probably the one I use most frequently.  Coming from a business background where I used Excel/VBA for data analysis, Pandas was a game changer for me.  Of course there are other packages out there, and other ways to manipulate data in Excel such as Power Query, but here are my top three reasons for using Pandas:

  • Efficient for large sets of data.  The practical limit is how much RAM your computer has, whereas an Excel worksheet is capped at 1,048,576 rows.
  • Can perform calculations much faster than Excel, which tends to slow down once you are doing calculations on tens of thousands of rows.
  • Can be used to connect multiple data sources, including Oracle/SQL databases, text files, and Excel data.
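
To make the last point concrete, here is a minimal sketch of pulling data from two different kinds of sources and combining them.  Everything in it is made up for illustration: the CSV text stands in for a text file on disk, an in-memory SQLite database stands in for an Oracle/SQL server, and the table and column names (orders, customers, customer_id, amount) are hypothetical.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV text standing in for a text file on disk.
csv_text = (
    "order_id,customer_id,amount\n"
    "1,10,250.0\n"
    "2,11,100.0\n"
    "3,10,50.0\n"
)
orders = pd.read_csv(io.StringIO(csv_text))

# Hypothetical in-memory SQLite database standing in for an Oracle/SQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(10, "Acme"), (11, "Globex")],
)
conn.commit()
customers = pd.read_sql_query("SELECT customer_id, name FROM customers", conn)
conn.close()

# Join the two sources on their shared key, much like a VLOOKUP in Excel.
combined = orders.merge(customers, on="customer_id", how="left")

# Total order amount per customer name.
totals = combined.groupby("name")["amount"].sum()
print(totals)
```

With a real database you would swap the SQLite connection for one to your own server, and with a real file you would pass its path to read_csv instead of the in-memory text.  The merge and groupby steps would stay the same.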