Install Import Openpyxl For Python In

In this tutorial, we will learn about the following topics:

Openpyxl is a Python module for working with Excel files without involving the Microsoft Excel application. It is used for operations ranging from data copying to data mining and data analysis, by computer operators, data analysts, and data scientists alike. It is one of the most widely used Python modules for handling Excel files.

When you install Python, the Openpyxl library is not installed by default, so it has to be installed separately. Open the command prompt (changing to the folder where Python is installed if Python is not on your PATH) and run the following command:

pip install openpyxl

If you use VS Code, open the terminal of the currently selected environment (shortcut keys: Ctrl+Shift+`), for example a virtual environment, and install the openpyxl module there. The installed package then appears under the '.venv' folder of the virtual environment.

Python Openpyxl Introduction

Python's Openpyxl module is used to work with Excel files without involving the Microsoft Excel application. With it, we can control Excel files without opening the application. It supports tasks such as reading data from an Excel file, writing data to an Excel file, drawing charts, accessing and renaming sheets, adding and deleting sheets, and formatting and styling cells.

Data scientists often use Openpyxl for operations ranging from data copying to data mining and data analysis.

Openpyxl Working Process

The Openpyxl library is used to read and write data in Excel files, among many other tasks. The Excel file we operate on is called a Workbook; it contains at least one Sheet and may contain many.

  • Sheets consist of Rows (horizontal series) starting from 1 and Columns (vertical series) starting from A.
  • The intersection of a row and a column is a cell, which may store data. The data can be of various types, such as a number or a string.
  • Openpyxl provides flexibility to read data from the individual cell or write data to it.

Installation of Openpyxl

In the above section, we briefly discussed openpyxl and its working process. This tutorial assumes that Python 3.7 and openpyxl 2.6.2 are installed on the system. Let's start by installing openpyxl with the following command:

pip install openpyxl

The .xlsx extension denotes the XML-based Excel spreadsheet format. An .xlsx file cannot store macros; macro-enabled workbooks use the .xlsm extension instead. Let's look at a basic operation on an Excel file. Consider the following code:
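As a minimal sketch of this basic operation, the following creates a workbook, writes five values into cells A1 to A5, and saves the result. The values and the temporary folder are assumptions chosen for illustration; only the file name sample_file.xlsx comes from the text.

```python
import os
import tempfile

from openpyxl import Workbook

wb = Workbook()      # a new workbook always starts with one sheet
sheet = wb.active    # reference to that sheet

# Illustrative values of different types (assumed, not from the original)
sheet["A1"] = 87
sheet["A2"] = "Devansh"
sheet["A3"] = 41.80
sheet["A4"] = 10
sheet["A5"] = "Openpyxl"

# Save into a temporary folder so the sketch does not touch your own files
path = os.path.join(tempfile.mkdtemp(), "sample_file.xlsx")
wb.save(path)
```

In your own script you would normally pass a plain file name such as "sample_file.xlsx" to save().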

Output:

In the above code, we have written data into five cells: A1, A2, A3, A4, and A5. These cells hold values of different types. We imported the Workbook class from the openpyxl module. A Workbook object is the container for all parts of the document.

Here we have defined a new workbook. A workbook is always created with at least one sheet.

We get a reference to the active sheet.

We have saved all data to the sample_file.xlsx file using the save() method.

Openpyxl Write Data to Cell

We can add data to an existing Excel file using the following Python code. First, we import the load_workbook function from the openpyxl module, then call it with the file path as an argument to get a workbook object. Consider the following code:
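A minimal sketch of this flow is shown below. Because no existing file can be assumed, the sketch first creates an empty one; the name demo.xlsx and the written values are hypothetical.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Create a small file first so that there is something to load (assumed setup)
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
Workbook().save(path)

wb = load_workbook(path)                 # open the existing file
sheet = wb.active
sheet["A1"] = "Hello"                    # write by cell name
sheet.cell(row=2, column=2).value = 7    # write by row and column number
wb.save(path)                            # changes are only persisted on save

# Reload the file to confirm that the values were written
reloaded = load_workbook(path).active
print(reloaded["A1"].value, reloaded["B2"].value)
```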


Output:

Openpyxl Append values

Openpyxl provides an append() method, which appends a group of values as a new row at the bottom of the current worksheet. The values can be of any supported type. Consider the following code:
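The following sketch illustrates append() on a fresh worksheet. The names and marks are made-up sample data.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active

# Sample rows (assumed data); each tuple becomes one worksheet row
rows = [
    ("Name", "Maths", "Science"),
    ("Asha", 82, 91),
    ("Ravi", 75, 68),
]
for row in rows:
    sheet.append(row)   # each call adds one row below the last used row

first_column = [cell.value for cell in sheet["A"]]
print(first_column)
```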

Output:

Openpyxl Read Data from cell

We can read the data that we have previously written to a cell. There are two ways to read a cell: we can access it by its cell name, or we can access it with the cell() method. For example, we read the data from the sample_file.xlsx file.
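Both ways of reading a cell can be sketched as follows. To keep the sketch self-contained, it fills a new in-memory workbook with two assumed values instead of loading sample_file.xlsx; with a real file you would obtain the workbook from load_workbook().

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet["A1"] = 87          # assumed sample values
sheet["A2"] = "Devansh"

by_name = sheet["A1"].value                     # access by cell name
by_index = sheet.cell(row=2, column=1).value    # access by row and column number
print(by_name, by_index)
```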

Output:

Openpyxl Read multiple cells

We can read the values of multiple cells at once. In the following example, we have an Excel file named marks.xlsx and we read each of its cells using a cell range. Let's have a look at the following program:

Output:

Openpyxl Iterate by rows

Openpyxl provides the iter_rows() method, which is used to read data row by row. Consider the following example:
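A small sketch of iter_rows() with explicit bounds follows; the 3 x 3 grid of numbers is assumed sample data.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
for row in [(1, 2, 3), (4, 5, 6), (7, 8, 9)]:   # assumed sample grid
    sheet.append(row)

# Visit rows 1-2 and columns A-B only; each row arrives as a tuple of cells
selected = []
for row in sheet.iter_rows(min_row=1, max_row=2, min_col=1, max_col=2):
    selected.append([cell.value for cell in row])

print(selected)
```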

Output:

Openpyxl Iterate by Column

Openpyxl provides the iter_cols() method, which returns cells from the worksheet column by column. Consider the following example:
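The column-wise counterpart can be sketched in the same way, again on an assumed 3 x 3 grid.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
for row in [(1, 2, 3), (4, 5, 6), (7, 8, 9)]:   # assumed sample grid
    sheet.append(row)

# Visit columns A-B over rows 1-3; each column arrives as a tuple of cells
columns = []
for col in sheet.iter_cols(min_row=1, max_row=3, min_col=1, max_col=2):
    columns.append([cell.value for cell in col])

print(columns)
```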

Output:

Openpyxl Sheets

As we know, each workbook can have multiple sheets. First, we create more than one sheet in a single workbook; then we can access those sheets using Python. In the following example, we have created a workbook with three sheets:
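One way to build and access a three-sheet workbook is sketched below. The sheet titles are hypothetical.

```python
from openpyxl import Workbook

wb = Workbook()
wb.active.title = "First"      # rename the default sheet
wb.create_sheet("Second")      # new sheets are added at the end by default
wb.create_sheet("Third")

names = wb.sheetnames          # list of sheet titles in workbook order
second = wb["Second"]          # access a sheet by its title
second["A1"] = "data on the second sheet"
print(names)
```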

Output:

It will look like the following image.

Openpyxl filter and sort data

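The auto_filter attribute is used to set filtering and sorting conditions. Note that openpyxl only records these settings in the file; it does not itself filter or sort the rows. Consider the following code: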
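The sketch below sets a filter range, a filter on the first column, and a sort condition. The fruit data is assumed for illustration.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
rows = [                        # assumed sample data
    ("Fruit", "Quantity"),
    ("Kiwi", 3),
    ("Grape", 15),
    ("Apple", 3),
    ("Peach", 3),
]
for row in rows:
    sheet.append(row)

sheet.auto_filter.ref = "A1:B5"                              # range the filter covers
sheet.auto_filter.add_filter_column(0, ["Kiwi", "Apple"])    # keep these values in column A
sheet.auto_filter.add_sort_condition("B2:B5")                # sort by the Quantity column
```

The row order in the saved file is unchanged; the conditions take effect when they are applied in Excel.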

Output:

Openpyxl Merging cell

We can merge cells using the merge_cells() method. When cells are merged, all cells except the top-left one are removed from the worksheet. Openpyxl also provides the unmerge_cells() method to unmerge them. Consider the following code:
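A brief sketch of merging and unmerging follows; the range A1:C1 and the heading text are assumptions.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet["A1"] = "Merged heading"   # the value lives in the top-left cell

sheet.merge_cells("A1:C1")
merged_after_merge = [r.coord for r in sheet.merged_cells.ranges]

sheet.unmerge_cells("A1:C1")
merged_after_unmerge = [r.coord for r in sheet.merged_cells.ranges]

print(merged_after_merge, merged_after_unmerge)
```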

Output:

Freezing panes keeps an area of the worksheet visible while scrolling to other parts of the worksheet. It is a useful feature for keeping the top row or the leftmost column on the screen. We do this by assigning a cell name to the sheet's freeze_panes attribute; the rows above that cell and the columns to its left are frozen. To unfreeze all panes, set freeze_panes to None. Consider the following code:
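A minimal sketch that freezes the header row and then unfreezes it again; the header values are assumed.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.append(("Name", "Score"))   # assumed header row

sheet.freeze_panes = "A2"         # everything above row 2 stays visible
frozen = sheet.freeze_panes

sheet.freeze_panes = None         # unfreeze all panes
unfrozen = sheet.freeze_panes

print(frozen, unfrozen)
```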

Output:

Run the above code and scroll the worksheet.

Openpyxl formulas

We can write a formula into a cell as a string that begins with "=". These formulas are used to perform operations in the Excel file. Openpyxl stores the formula but does not evaluate it; the result is calculated when the workbook is opened in Excel. Consider the following example:
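The following sketch writes a SUM formula under three assumed numbers and shows that the cell holds the formula text rather than a computed result.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
for value in (10, 20, 30):       # assumed sample numbers in A1..A3
    sheet.append((value,))

sheet["A4"] = "=SUM(A1:A3)"      # stored as a formula, not as the number 60

stored = sheet["A4"].value       # the formula text
kind = sheet["A4"].data_type     # 'f' marks a formula cell
print(stored, kind)
```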

Output:

Openpyxl Cell Inverter

The openpyxl cell inverter swaps the rows and columns of the cells in a spreadsheet. For example, the value at row 3, column 5 moves to row 5, column 3, and vice versa. You can see this in the following images:


This program is written with the help of a nested for loop. First, the data is stored in sheetData[x][y] for the cell at column x and row y; the new spreadsheet is then written with the indices swapped, as spreadData[y][x] for the cell at column y and row x.
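The idea can be sketched as below. This is a simplified variant, not the original program: it writes each value straight to a second sheet with row and column swapped instead of going through an intermediate data structure. The sheet title "Inverted" and the numbers are assumptions.

```python
from openpyxl import Workbook

wb = Workbook()
source = wb.active
for row in [(1, 2, 3), (4, 5, 6)]:       # assumed 2 x 3 sample data
    source.append(row)

target = wb.create_sheet("Inverted")     # hypothetical sheet name

# Nested loop: the cell at (row, column) is written to (column, row)
for row in source.iter_rows():
    for cell in row:
        target.cell(row=cell.column, column=cell.row, value=cell.value)

print(target.max_row, target.max_column)
```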

Adding Chart to Excel File

Charts are an effective way to represent data. A chart visualizes the data so that it can be easily understood. There are various types of chart: pie chart, line chart, bar chart, and so on. We can draw a chart on a spreadsheet using the openpyxl module.

To build any chart on the spreadsheet, we need to choose the chart type, such as BarChart or LineChart. We also import the Reference class, which represents the data used for the chart. It is important to define what data we want to represent on the chart. Let's understand by the following example:
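A bar chart over a small assumed data set can be sketched like this. The month and sales figures, the chart title, and the anchor cell D2 are all illustrative choices.

```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
sheet = wb.active
for row in [("Month", "Sales"), ("Jan", 30), ("Feb", 45), ("Mar", 25)]:
    sheet.append(row)                     # assumed sample data

# Values come from column B (header included so it can name the series)
data = Reference(sheet, min_col=2, min_row=1, max_row=4)
# Category labels come from column A, without the header
categories = Reference(sheet, min_col=1, min_row=2, max_row=4)

chart = BarChart()
chart.title = "Monthly sales"
chart.add_data(data, titles_from_data=True)
chart.set_categories(categories)

sheet.add_chart(chart, "D2")              # top-left corner of the chart
```

Calling wb.save() with a file name afterwards writes the workbook, including the chart, to disk.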

Output:

In the above code, we have created the sample data and drawn the bar chart corresponding to sample data.

Now we will create the line chart. Consider the following code:
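A line chart that reads its series row by row might look like the following sketch. The product names and numbers are assumed; from_rows=True tells add_data() to treat each row as one series.

```python
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

wb = Workbook()
sheet = wb.active
sheet.append(("Product A", 10, 14, 9))    # assumed sample data, one series per row
sheet.append(("Product B", 7, 12, 15))

data = Reference(sheet, min_col=1, max_col=4, min_row=1, max_row=2)

chart = LineChart()
# The first cell of each row is used as the series title
chart.add_data(data, from_rows=True, titles_from_data=True)

sheet.add_chart(chart, "F2")
```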

Output:

In the above code, we used from_rows=True as a parameter; it tells the chart to plot the data row by row instead of column by column.

Adding Image

Images are not commonly used in spreadsheets, but they can be added when required, for example for branding purposes or to make the spreadsheet more personal and attractive. To load an image into a spreadsheet, we need to install an additional module called pillow with the following command:

pip install pillow

In the following program, we are importing the image into the excel file.

In this tutorial, we have covered the basic and advanced concepts of openpyxl.


By Lenin Mishra

Microsoft Excel is probably one of the most widely used data storage applications. A large proportion of small and medium-sized businesses meet their analytics requirements using Excel.

However, analyzing large amounts of data in Excel can become tedious and time-consuming. You could build customized data processing and analytics applications using Visual Basic for Applications (VBA), the language that powers Excel macros, but learning VBA can be difficult and perhaps not worth it.

If you have a little knowledge of Python, though, you can build professional business intelligence on top of Excel data without the need for a database. Using Python with Excel could be a game changer for your business.

Sections Covered

  1. Appending data to Excel with Openpyxl

Basic Information about Excel

Before beginning this Openpyxl tutorial, you need to keep the following details in mind.

  1. Excel files are called Workbooks.
  2. Each Workbook can contain multiple sheets.
  3. Every sheet consists of rows starting from 1 and columns starting from A.
  4. Rows and columns together make up a cell.
  5. Any type of data can be stored.

What is Openpyxl and how to install it?

The Openpyxl module in Python is used to handle Excel files without involving the Microsoft Excel application. Openpyxl is arguably the best Python Excel library; it allows you to perform various Excel operations and automate Excel reports using Python. You can perform all kinds of tasks using Openpyxl, such as:

  1. Reading data
  2. Writing data
  3. Editing Excel files
  4. Drawing graphs and charts
  5. Working with multiple sheets
  6. Sheet Styling etc.

You can install the Openpyxl module by using the following command:

pip install openpyxl

Reading data from Excel using Openpyxl

Let’s import an Excel file named wb1.xlsx in Python using the Openpyxl module. It has the following data as shown in the image below.

  1. Import the load_workbook function from Openpyxl.
  2. Provide the file location for the Excel file you want to open in Python.

In this example, the Excel file is present in the same directory as the Python file. Hence, there is no need to provide the entire file location.

  3. Choose the active sheet of the workbook using the wb.active attribute.

The above points are a standard way of accessing Excel sheets using Python. You will see them being used multiple times throughout this article.

Let’s read all the data present in Row 1(header row).

Method 1 - Reading data through cell name in Excel

Code

Output

Method 2 - Reading data from Excel using cell() method

Code

Output

Reading Multiple Cells from Excel

You can also read multiple cells from an Excel workbook. Let’s understand this through various examples. Refer to the image of the wb1.xlsx file above for clarity.

Method 1 - Reading a range of cells in Excel using cell names

To read the data from a specific range of cells in your Excel sheet, you need to slice your sheet object from the first cell of the range to the last.

Code

Output

You can see that slicing the sheet from A1 to D11 returns a tuple of rows, each of which is itself a tuple of cells. In order to read the value of every cell returned, you can iterate over each row and use .value.
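The slicing and the value extraction can be sketched as follows. Since wb1.xlsx is not available here, the sketch uses a small in-memory sheet with assumed data and a correspondingly smaller range.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
for row in [("Product", "Qty"), ("Pen", 10), ("Book", 3)]:   # assumed sample data
    sheet.append(row)

cell_range = sheet["A1:B3"]    # tuple of rows, each row a tuple of cells

# Iterate over each row and read .value from every cell
values = [[cell.value for cell in row] for row in cell_range]
print(values)
```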

Code

Output

Method 2 - Reading a single row in Excel using cell name

To read a single row in your Excel sheet, index the sheet object with the row number.

Code

Output

Method 3 - Reading all rows in Excel using rows attribute

To read all the rows, use sheet.rows to iterate over rows with Openpyxl. You receive a tuple element per row by using the sheet.rows attribute.

Code

Output

Method 4 - Reading a single column in Excel using cell name

Similar to reading a single row, you can read the data in a single column of your Excel sheet by indexing the sheet object with the column letter.

Code

Output

Method 5 - Reading all the columns in Excel using columns attribute

To read all the data as a tuple of the columns in your Excel sheet, use sheet.columns attribute to iterate over all columns with Openpyxl.

Code

Output


Method 6 - Reading all the data in Excel

To read all the data present in your Excel sheet, you don’t need to index the sheet object. You can just iterate over it.

Code

Output

Find the max row and column number with Openpyxl

To find the max row and column number in your Excel sheet, use sheet.max_row and sheet.max_column attributes.

Code

Output

Note - If you write a value to a cell outside the current data range, the sheet.max_row and sheet.max_column values also change, even though you haven’t saved your changes.

Code
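A minimal sketch of both points above, using assumed sample data in place of wb1.xlsx: the attributes are read once, a distant cell is written, and the attributes are read again without saving.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
for row in [("Product", "Qty"), ("Pen", 10), ("Book", 3)]:   # assumed sample data
    sheet.append(row)

before = (sheet.max_row, sheet.max_column)

sheet["D10"] = "new value"     # write outside the current data range, no save

after = (sheet.max_row, sheet.max_column)
print(before, after)
```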

Output

iter_rows() and iter_cols() in Openpyxl

Openpyxl offers two commonly used methods to iterate over rows and columns.

  1. iter_rows() - Returns one tuple element per row selected.
  2. iter_cols() - Returns one tuple element per column selected.

Both the above mentioned methods can receive the following arguments for setting boundaries for iteration:

  • min_row
  • max_row
  • min_col
  • max_col

Example 1 - iter_rows()

Code

Output

As you can see, only the first 3 columns of the first 2 rows are returned. The tuples are row based.

You can also omit some or all of the arguments to the iter_rows() method.

Code - Not passing min_col and max_col

Output

All the columns from the first 2 rows are being printed.

What happens if you don’t pass in any arguments? Comment below.

Example 2 - iter_cols()

Code

Output

With the iter_cols() method, the tuples returned are column based.

You can also omit some or all of the arguments to the iter_cols() method.

Code - Not passing any argument

Output

Create a new Excel file with Openpyxl

To create a new Excel file with Openpyxl, you need to import the Workbook class.

Code

This should create a new Excel workbook called pylenin.xlsx with the provided data.

Writing data to cells in Excel with Openpyxl

There are multiple ways to write data to an Excel file using Openpyxl.

Method 1 - Writing data to Excel using cell names

Code

Output

Method 2 - Writing data to Excel using the cell() method

Code

Output

Method 3 - Writing data to Excel by iterating over rows

Code - Example 1

Output

You can also use methods like iter_rows() and iter_cols() to write data to Excel.

Code - Example 2 - using iter_rows() method

Output

Code - Example 3 - using iter_cols() method


Output

Appending data to Excel with Openpyxl

Openpyxl also provides an append() method. This is used to append values to an existing Excel sheet.

Code

Output

Manipulating Sheets in Openpyxl

Each Excel workbook can contain multiple sheets. To get a list of all the sheet names in an Excel workbook, you can use the wb.sheetnames attribute.

Code

Output

As you can see, pylenin.xlsx has only one sheet.

To create a new sheet, use the create_sheet() method provided by Openpyxl.

Code

Output

You can also create sheets at different positions in the Excel Workbook.

Code

Output

If your Excel workbook contains multiple sheets and you want to work with a particular sheet, you can index your workbook object with the title of that sheet.

Code

Output

Practical usage example of Openpyxl

Let’s perform some data analysis with wb1.xlsx file as shown in the first image.

Objective

  1. Add a new column showing Total Price per Product.
  2. Calculate the Total Cost of all the items bought.

The resulting Excel sheet should look like the below image.

Step 1 - Find the max row and max column of the Excel sheet

As mentioned before, you can use the sheet.max_row and sheet.max_column attributes to find the max row and max column for any Excel sheet with Openpyxl.

Code

Output

Step 2 - Add an extra column in Excel with Openpyxl

To add an extra column in the active Excel sheet, with calculations, you need to first create a new column header in the first empty cell and then iterate over all rows to multiply Quantity with Cost per Unit.

Code

Output

Now that an extra column header has been created, the sheet.max_column value will change to 5.

Now you can calculate the Total Price per Product using iter_rows() method.


Code

Output

Step 3 - Calculate sum of a column in Excel with Openpyxl

The last step is to calculate the Total Cost of the last column in the Excel file.

  1. Access the last column and add up all the cost.

    You can read the last column by accessing the sheet.columns attribute. Since it returns a generator, you first convert it to a Python list and then access the last column.

  2. Create a new row 2 places down from the max_row and fill in Total Cost.

  3. Import Font class from openpyxl.styles to make the last row Bold.

Final Code looks like the below.

Code
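The three steps can be sketched end to end as below. The contents and column order of wb1.xlsx are not shown in the text, so the sketch builds an assumed four-column sheet (ID, Product, Quantity, Cost per Unit) in memory; with the real file you would start from load_workbook("wb1.xlsx") and finish with wb.save().

```python
from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
sheet = wb.active
sample = [                                    # assumed stand-in for wb1.xlsx
    ("ID", "Product", "Quantity", "Cost per Unit"),
    (1, "Pen", 10, 1.5),
    (2, "Notebook", 4, 3.0),
    (3, "Bag", 1, 20.0),
]
for row in sample:
    sheet.append(row)

# Step 2: add a header in the first empty column, then fill it row by row
total_col = sheet.max_column + 1
sheet.cell(row=1, column=total_col, value="Total Price")
for row in sheet.iter_rows(min_row=2, max_row=sheet.max_row):
    quantity, unit_cost = row[2].value, row[3].value
    row[total_col - 1].value = quantity * unit_cost

# Step 3: sum the last column, skipping its header cell
last_column = list(sheet.columns)[-1]
total_cost = sum(cell.value for cell in last_column[1:])

# Write the total two rows below the data and make that row bold
total_row = sheet.max_row + 2
label = sheet.cell(row=total_row, column=total_col - 1, value="Total Cost")
amount = sheet.cell(row=total_row, column=total_col, value=total_cost)
label.font = Font(bold=True)
amount.font = Font(bold=True)

print(total_cost)
```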

When you run the above code, you should see all the relevant updates to your Excel sheet.

In this Openpyxl tutorial, we learnt about various ways to handle Excel files in Python. If you have any doubts, post your doubts in the comment section below.
