In this tutorial, we will learn about the following topics:
Openpyxl is a Python module for working with Excel files without involving the MS Excel application. It is used for tasks ranging from data copying to data mining and data analysis, by everyone from computer operators to data analysts and data scientists, and it is one of the most widely used Python modules for handling Excel files. Openpyxl is not installed with Python by default; it has to be installed separately with pip from the command prompt. If you work in a virtual environment, for example in VS Code, open the terminal of the currently selected environment (shortcut keys: Ctrl+Shift+`) and install openpyxl there; the installed package then appears under the '.venv' folder of that environment.
Python Openpyxl Introduction
The Openpyxl module lets Python work with Excel files without involving third-party Microsoft application software. With this module, we can control Excel files without opening the application. It is used to perform Excel tasks such as reading data from a file, writing data to a file, drawing charts, accessing and renaming sheets, adding and deleting sheet content, and formatting and styling sheets.
Data scientists often use Openpyxl for operations ranging from data copying to data mining and data analysis.
Openpyxl Working Process
The Openpyxl library is used to read or write the data in an Excel file, among many other tasks. The Excel file that we operate on is called a Workbook; it contains at least one Sheet and can contain many more.
- Sheets consist of Rows (horizontal series) starting from 1 and Columns (vertical series) starting from A.
- The intersection of a row and a column forms a cell that may store data of various types, such as numbers or strings.
- Openpyxl provides flexibility to read data from the individual cell or write data to it.
Installation of Openpyxl
The section above briefly introduced openpyxl and how it works. This tutorial assumes Python 3.7 and openpyxl 2.6.2 are installed on the system. Let's start by installing openpyxl with pip, by running pip install openpyxl at the command prompt.
The .xlsx extension denotes an XML-based spreadsheet file. A plain .xlsx file cannot store macros; macro-enabled workbooks use the .xlsm extension instead. Let's understand the basic operations related to the Excel file. Consider the following code:
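A minimal sketch of such a script is shown here. The cell values and the temporary output directory are illustrative assumptions; only the Workbook class, the active sheet, and the save() method come from the description in this section.

```python
import datetime
import os
import tempfile

from openpyxl import Workbook

# Assumption: save into a fresh temporary directory so the sketch runs anywhere.
out_path = os.path.join(tempfile.mkdtemp(), "sample_file.xlsx")

wb = Workbook()      # a new workbook always contains one sheet
sheet = wb.active    # reference to the active sheet

# Illustrative values of different types in cells A1 to A5
sheet["A1"] = 87
sheet["A2"] = "Hello"
sheet["A3"] = 41.80
sheet["A4"] = True
sheet["A5"] = datetime.date(2021, 6, 28)

wb.save(out_path)    # write the workbook to disk
```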
Output:
In the above code, we have written data into the five cells A1, A2, A3, A4, and A5. These cells hold different types of values. We have imported the Workbook class from the openpyxl module. The Workbook class is a container for all parts of the document.
Here we have defined a new workbook. A workbook is always created with at least one sheet.
We get a reference to the active sheet.
We have saved all data to the sample_file.xlsx file using the save() method.
Openpyxl Write Data to Cell
We can add data to an existing Excel file using the following Python code. First, we import the load_workbook function from the openpyxl module, then call it with the file path as an argument to get a workbook object. Consider the following code:
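One way this could look is sketched here. The file is created first so the example is self-contained; the path, cell positions, and values are assumptions for illustration.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Assumption: create a small file first so that there is something to load.
file_path = os.path.join(tempfile.mkdtemp(), "sample_file.xlsx")
Workbook().save(file_path)

wb = load_workbook(file_path)   # open the existing workbook
sheet = wb.active

sheet["A1"] = "Name"                      # write by cell name
sheet.cell(row=2, column=1, value="Ada")  # write by row and column number

wb.save(file_path)                        # changes are lost unless saved

# Load the file again to confirm that the values were written.
reloaded = load_workbook(file_path).active
print(reloaded["A1"].value, reloaded["A2"].value)
```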
Output:
Openpyxl Append values
Openpyxl provides an append() method, which appends a group of values as a new row. The values can be of any supported type. They are added at the bottom of the current worksheet. Consider the following code:
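A short sketch of append(), using made-up numbers: each call adds one row below the last row that contains data.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active

# Illustrative data; each tuple becomes one worksheet row.
rows = (
    (88, 46, 57),
    (89, 38, 12),
    (23, 59, 78),
)
for row in rows:
    sheet.append(row)

print(sheet.max_row, sheet["C3"].value)
```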
Output:
Openpyxl Read Data from cell
We can read the data that we have previously written to a cell. There are two ways to read a cell: by its cell name, or with the cell() method. For example, we are reading the data from the sample_file.xlsx file.
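Both ways of reading can be sketched as follows. To keep the example self-contained, it fills an in-memory workbook instead of opening sample_file.xlsx; the values are assumptions.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet["A1"] = "Hello"   # illustrative data
sheet["B2"] = 42

by_name = sheet["A1"].value                      # access by cell name
by_position = sheet.cell(row=2, column=2).value  # access by row and column number

print(by_name, by_position)
```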
Output:
Openpyxl Read multiple cells
We can read the values from multiple cells at once. In the following example, we have an Excel file named marks.xlsx and we read each cell in a range using the range operator. Let's have a look at the following program:
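A minimal sketch of reading a range, assuming a small in-memory sheet with invented marks in place of marks.xlsx:

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
# Hypothetical stand-in for the contents of marks.xlsx
for row in (("Ada", 91), ("Linus", 84), ("Grace", 97)):
    sheet.append(row)

cells = sheet["A1":"B3"]   # tuple of rows, each row is a tuple of cells
marks = {}
for name_cell, mark_cell in cells:
    marks[name_cell.value] = mark_cell.value

print(marks)
```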
Output:
Openpyxl Iterate by rows
Openpyxl provides the iter_rows() method, which is used to read data row by row. Consider the following example:
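A small sketch of iter_rows() with boundaries, using invented numbers. The min_row, max_row, min_col, and max_col arguments limit the area that is visited.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
for row in ((88, 46, 57), (89, 38, 12), (23, 59, 78)):   # illustrative data
    sheet.append(row)

# Visit only the first two rows; each row arrives as a tuple of cells.
first_two_rows = [
    [cell.value for cell in row]
    for row in sheet.iter_rows(min_row=1, max_row=2, min_col=1, max_col=3)
]
print(first_two_rows)
```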
Output:
Openpyxl Iterate by Column
Openpyxl provides the iter_cols() method, which returns cells from the worksheet as columns. Consider the following example:
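The column-wise counterpart can be sketched the same way, again with invented numbers:

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
for row in ((88, 46, 57), (89, 38, 12), (23, 59, 78)):   # illustrative data
    sheet.append(row)

# Visit the first two columns; each column arrives as a tuple of cells.
first_two_columns = [
    [cell.value for cell in column]
    for column in sheet.iter_cols(min_row=1, max_row=3, min_col=1, max_col=2)
]
print(first_two_columns)
```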
Output:
Openpyxl Sheets
As we know, each workbook can have multiple sheets. First, we need to create more than one sheet in a single workbook; then we can access those sheets using Python. In the following example, we have created a workbook with three sheets:
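A sketch of creating and accessing sheets. The sheet titles are assumptions; "Sheet" is the title openpyxl gives to the default first sheet.

```python
from openpyxl import Workbook

wb = Workbook()                    # starts with one sheet titled "Sheet"
wb.create_sheet("Sheet_one")       # added at the end
wb.create_sheet("Sheet_two", 0)    # inserted at the first position

sheet_names = wb.sheetnames
selected = wb["Sheet_one"]         # access a sheet by its title
selected["A1"] = "written to Sheet_one"

print(sheet_names)
```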
Output:
It will look like the following image.
Openpyxl filter and sort data
The auto_filter attribute is used to set filtering and sorting conditions. Consider the following code:
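A minimal sketch with invented fruit data follows. Note that openpyxl only records the filter and sort settings in the file; the rows are actually filtered and sorted once the settings are applied in an application such as Excel.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
rows = [("Fruit", "Quantity"), ("Kiwi", 3), ("Grape", 15), ("Apple", 3), ("Peach", 3)]
for row in rows:                     # illustrative data
    sheet.append(row)

sheet.auto_filter.ref = "A1:B5"                              # range covered by the filter
sheet.auto_filter.add_filter_column(0, ["Kiwi", "Apple"])    # keep these values of column A
sheet.auto_filter.add_sort_condition("B2:B5")                # sort by column B
```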
Output:
Openpyxl Merging cell
We can merge cells using the merge_cells() method. When cells are merged, every cell except the top-left one is removed from the worksheet, and the top-left cell holds the value of the merged area. Openpyxl also provides the unmerge_cells() method to unmerge the cells. Consider the following code:
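Merging and unmerging could be sketched like this; the range and the text are assumptions.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active

sheet["A1"] = "Merged title"    # the top-left cell keeps the value
sheet.merge_cells("A1:B2")
merged_before = [rng.coord for rng in sheet.merged_cells.ranges]

sheet.unmerge_cells("A1:B2")    # split the area into single cells again
merged_after = [rng.coord for rng in sheet.merged_cells.ranges]

print(merged_before, merged_after)
```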
Output:
Freezing panes means keeping an area of the worksheet visible while scrolling to other parts of it. This is useful for keeping the top row or leftmost column on the screen. We do this by assigning a cell name to the worksheet's freeze_panes attribute; the rows above that cell and the columns to its left are frozen. To unfreeze all panes, set freeze_panes to None. Consider the following code:
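A short sketch, assuming a sheet with a header row and a name column that should stay visible:

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.append(["Name", "Math", "Science"])   # illustrative header row
sheet.append(["Ada", 91, 88])

# Freeze everything above row 2 and to the left of column B,
# so row 1 and column A stay visible while scrolling.
sheet.freeze_panes = "B2"
frozen_at = sheet.freeze_panes

# Setting sheet.freeze_panes = None would remove the frozen panes again.
```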
Output:
Run the above code and scroll the worksheet.
Openpyxl formulas
We can write a formula into a cell as a string that starts with an equals sign. These formulas perform operations in the Excel file. Openpyxl stores the formula but does not evaluate it; the result is calculated when the workbook is opened in Excel. Consider the following example:
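A minimal sketch with invented numbers. The formula is stored as text in the cell, which is why reading it back returns the formula string rather than a computed sum.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
for number in (10, 20, 30):      # illustrative values in A1 to A3
    sheet.append([number])

sheet["A4"] = "=SUM(A1:A3)"      # stored as a formula, evaluated later by Excel

print(sheet["A4"].value, sheet["A4"].data_type)
```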
Output:
Openpyxl Cell Inverter
The openpyxl cell inverter swaps the rows and columns of the cells in a spreadsheet. For example, the value at row 3, column 5 moves to row 5, column 3, and vice versa. You can see this in the following images:
This program is written with a nested for loop. It first reads the sheet into a data structure in which sheetData[x][y] holds the cell at column x and row y, and then writes the newly created spreadsheet so that spreadData[y][x], the cell at column y and row x, receives that value.
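The idea can be sketched without the intermediate data structure by writing each value straight into a second sheet with row and column swapped. The sheet titles and the numbers are assumptions, and this is a simplified variant rather than the original program.

```python
from openpyxl import Workbook

wb = Workbook()
source = wb.active
source.title = "Original"
for row in ([1, 2, 3], [4, 5, 6]):      # illustrative 2 x 3 grid
    source.append(row)

target = wb.create_sheet("Inverted")

# Nested loop: every cell is copied to the position with row and column swapped.
for row in source.iter_rows():
    for cell in row:
        target.cell(row=cell.column, column=cell.row, value=cell.value)

print([[c.value for c in r] for r in target.iter_rows()])
```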
Adding Chart to Excel File
Charts are an effective way to represent data. A chart visualizes the data so that it can be understood easily. There are various types of charts: pie chart, line chart, bar chart, and so on. We can draw a chart on a spreadsheet using the openpyxl module.
To build a chart on the spreadsheet, we need to choose the chart type, such as BarChart or LineChart. We also import Reference, which represents the data used for the chart. It is important to define what data we want to represent on the chart. Let's understand this with the following example:
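A minimal bar chart sketch follows. The sales figures, chart title, anchor cell, and output path are assumptions made for illustration.

```python
import os
import tempfile

from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
sheet = wb.active
for row in [("Item", "Sales"), ("A", 10), ("B", 25), ("C", 15)]:   # illustrative data
    sheet.append(row)

# Values include the header cell so it can be used as the series title.
values = Reference(sheet, min_col=2, min_row=1, max_row=4)
categories = Reference(sheet, min_col=1, min_row=2, max_row=4)

chart = BarChart()
chart.title = "Sales per item"
chart.add_data(values, titles_from_data=True)
chart.set_categories(categories)
sheet.add_chart(chart, "D2")      # top-left corner of the chart

chart_path = os.path.join(tempfile.mkdtemp(), "bar_chart.xlsx")
wb.save(chart_path)
```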
Output:
In the above code, we have created the sample data and drawn the bar chart corresponding to sample data.
Now we will create the line chart. Consider the following code:
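A line chart could be sketched like this, with each worksheet row treated as one series. The weekly numbers and labels are invented.

```python
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

wb = Workbook()
sheet = wb.active
sheet.append(("Week 1", 5, 7, 9, 12))    # illustrative data, one series per row
sheet.append(("Week 2", 6, 8, 7, 10))

data = Reference(sheet, min_col=1, max_col=5, min_row=1, max_row=2)

chart = LineChart()
chart.title = "Weekly values"
# from_rows=True plots row by row; the first cell of each row becomes the series title.
chart.add_data(data, from_rows=True, titles_from_data=True)
sheet.add_chart(chart, "A5")
```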
Output:
In the above code, we have passed from_rows=True as a parameter to add_data(); it tells the chart to plot the data row by row instead of column by column.
Adding Image
Images are not commonly used in spreadsheets, but we can add them when required, for example for branding purposes or to make the spreadsheet more personal and attractive. To load an image into a spreadsheet, we need to install an additional module called Pillow by running pip install pillow.
In the following program, we are importing the image into the excel file.
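A sketch of adding an image, assuming Pillow is installed. So that the example needs no external file, it first generates a small placeholder picture; in practice the path would point to an existing logo. File names and sizes are assumptions.

```python
import os
import tempfile

from PIL import Image as PILImage
from openpyxl import Workbook
from openpyxl.drawing.image import Image

work_dir = tempfile.mkdtemp()

# Assumption: generate a plain placeholder image instead of using a real logo.
logo_path = os.path.join(work_dir, "logo.png")
PILImage.new("RGB", (120, 60), color=(70, 130, 180)).save(logo_path)

wb = Workbook()
sheet = wb.active

logo = Image(logo_path)          # openpyxl image object built from the file
sheet.add_image(logo, "A1")      # anchor the image at cell A1

image_book_path = os.path.join(work_dir, "with_image.xlsx")
wb.save(image_book_path)
```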
In this tutorial, we have covered the basic and advanced concepts of openpyxl.
By Lenin Mishra
Microsoft Excel is probably one of the most widely used data storage applications. A huge proportion of small to medium size businesses fulfill their analytics requirements using Excel.
However, analyzing a huge amount of data in Excel can become highly tedious and time-consuming. You could build customized data processing and analytics applications using Visual Basic for Applications (VBA), the language that powers Excel sheets, but learning VBA can be difficult and perhaps not worth it.
If you have a little knowledge of Python, you can build highly professional Business Intelligence on Excel data without the need for a database. Using Python with Excel could be a game changer for your business.
Sections Covered
- Appending data to Excel with Openpyxl
Basic Information about Excel
Before beginning this Openpyxl tutorial, you need to keep the following details in mind.
- Excel files are called Workbooks.
- Each Workbook can contain multiple sheets.
- Every sheet consists of rows starting from 1 and columns starting from A.
- Rows and columns together make up a cell.
- Any type of data can be stored.
What is Openpyxl and how to install it?
The Openpyxl module in Python is used to handle Excel files without involving third-party Microsoft application software. Openpyxl is arguably the best Python Excel library; it allows you to perform various Excel operations and automate Excel reports using Python. You can perform all kinds of tasks using Openpyxl, such as:
- Reading data
- Writing data
- Editing Excel files
- Drawing graphs and charts
- Working with multiple sheets
- Sheet Styling etc.
You can install the Openpyxl module by running pip install openpyxl.
Reading data from Excel using Openpyxl
Let’s import an Excel file named wb1.xlsx in Python using the Openpyxl module. It has the following data as shown in the image below.
- Import the load_workbook function from Openpyxl.
- Provide the file location for the Excel file you want to open in Python.
In this example, the Excel file is present in the same directory as the Python file. Hence, there is no need to provide the entire file location.
- Choose the first active sheet present in the workbook using the wb.active attribute.
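The steps above can be sketched as follows. Since wb1.xlsx itself is not available here, the sketch first writes a hypothetical stand-in; its column names and row are assumptions, not the original data. It then reads two header cells, once by cell name and once with cell().

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical stand-in for wb1.xlsx; column names and values are assumptions.
file_path = os.path.join(tempfile.mkdtemp(), "wb1.xlsx")
seed = Workbook()
seed.active.append(["ProductId", "Product Name", "Cost per Unit", "Quantity"])
seed.active.append([1, "Pencil", 0.5, 10])
seed.save(file_path)

wb = load_workbook(file_path)   # import the function and give it the file location
sheet = wb.active               # pick the active sheet

first_header = sheet["A1"].value                    # read by cell name
second_header = sheet.cell(row=1, column=2).value   # read with cell()
print(first_header, second_header)
```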
The above points are a standard way of accessing Excel sheets using Python. You will see them being used multiple times throughout this article.
Let’s read all the data present in Row 1 (header row).
Method 1 - Reading data through cell name in Excel
Code
Output
Method 2 - Reading data from Excel using cell() method
Code
Output
Reading Multiple Cells from Excel
You can also read multiple cells from an Excel workbook. Let’s understand this through various examples. Refer to the image of the wb1.xlsx file above for clarity.
Method 1 - Reading a range of cells in Excel using cell names
To read the data from a specific range of cells in your Excel sheet, slice your sheet object from the first cell of the range to the last.
Code
Output
You can see that slicing the sheet from A1:D11 returns tuples of row data inside a tuple. In order to read the value of every cell returned, you can iterate over each row and use .value.
Code
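A minimal sketch of slicing a range and reading every value. It uses a small in-memory sheet with hypothetical columns, so the range is A1:D3 rather than A1:D11.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
# Hypothetical sample data standing in for wb1.xlsx
sheet.append(["ProductId", "Product Name", "Cost per Unit", "Quantity"])
sheet.append([1, "Pencil", 0.5, 10])
sheet.append([2, "Notebook", 2.0, 3])

cell_range = sheet["A1":"D3"]   # tuple of rows, each row a tuple of cells
values = [[cell.value for cell in row] for row in cell_range]
for row_values in values:
    print(row_values)
```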
Output
Method 2 - Reading a single row in Excel using cell name
To read a single row in your Excel sheet, just access the single row number from your sheet object.
Code
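Reading one row by its number could look like this; the header names are assumptions.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.append(["ProductId", "Product Name", "Cost per Unit", "Quantity"])  # hypothetical header
sheet.append([1, "Pencil", 0.5, 10])

header_row = sheet[1]                              # tuple of the cells in row 1
header_values = [cell.value for cell in header_row]
print(header_values)
```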
Output
Method 3 - Reading all rows in Excel using rows attribute
To read all the rows, use sheet.rows to iterate over rows with Openpyxl. You receive one tuple per row by using the sheet.rows attribute.
Code
Output
Method 4 - Reading a single column in Excel using cell name
Similar to reading a single row, you can read the data in a single column of your Excel sheet by its column letter.
Code
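A short sketch of reading one column by its letter, with hypothetical data:

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.append(["ProductId", "Product Name"])   # hypothetical sample data
sheet.append([1, "Pencil"])
sheet.append([2, "Notebook"])

name_column = sheet["B"]                       # tuple of the cells in column B
name_values = [cell.value for cell in name_column]
print(name_values)
```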
Output
Method 5 - Reading all the columns in Excel using columns attribute
To read all the data as a tuple of the columns in your Excel sheet, use the sheet.columns attribute to iterate over all columns with Openpyxl.
Code
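A sketch of iterating over sheet.columns with hypothetical data. The attribute yields one tuple of cells per column and can be turned into a list when indexing is needed.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.append(["ProductId", "Quantity"])   # hypothetical sample data
sheet.append([1, 10])
sheet.append([2, 3])

columns = list(sheet.columns)             # one tuple of cells per column
column_values = [[cell.value for cell in column] for column in columns]
print(column_values)
```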
Output
Method 6 - Reading all the data in Excel
To read all the data present in your Excel sheet, you don’t need to index the sheet object. You can just iterate over it.
Code
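Iterating over the sheet object directly yields its rows, as in this sketch with hypothetical data:

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.append(["ProductId", "Quantity"])   # hypothetical sample data
sheet.append([1, 10])

all_values = []
for row in sheet:                          # each row is a tuple of cells
    all_values.append([cell.value for cell in row])
print(all_values)
```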
Output
Find the max row and column number with Openpyxl
To find the max row and column number in your Excel sheet, use the sheet.max_row and sheet.max_column attributes.
Code
Output
Note - If you update a cell beyond the current data range with a value, the sheet.max_row and sheet.max_column values also change, even though you haven’t saved your changes.
Code
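The effect described in the note can be sketched with a small in-memory sheet; the data and the cell D5 are arbitrary choices.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.append(["ProductId", "Quantity"])   # hypothetical sample data
sheet.append([1, 10])

before = (sheet.max_row, sheet.max_column)

sheet["D5"] = "new value"                 # write outside the current data range, no save
after = (sheet.max_row, sheet.max_column)

print(before, after)
```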
Output
iter_rows() and iter_cols() in Openpyxl
Openpyxl offers two commonly used methods to iterate over rows and columns.
- iter_rows() - Returns one tuple element per row selected.
- iter_cols() - Returns one tuple element per column selected.
Both the above mentioned methods can receive the following arguments for setting boundaries for iteration:
- min_row
- max_row
- min_col
- max_col
Example 1 - iter_rows()
Code
Output
As you can see, only the first 3 columns of the first 2 rows are returned. The tuples are row based.
You can also choose to not pass in some or any arguments in the iter_rows() method.
Code - Not passing min_col and max_col
Output
All the columns from the first 2 rows are being printed.
What happens if you don’t pass in any arguments? Comment below.
Example 2 - iter_cols()
Code
Output
The tuples returned are column based when using the iter_cols() method.
You can also choose to not pass in some or any arguments in the iter_cols() method.
Code - Not passing any argument
Output
Create a new Excel file with Openpyxl
To create a new Excel file with Openpyxl, you need to import the Workbook class.
Code
This should create a new Excel workbook called pylenin.xlsx with the provided data.
Writing data to cells in Excel with Openpyxl
There are multiple ways to write data to an Excel file using Openpyxl.
Method 1 - Writing data to Excel using cell names
Code
Output
Method 2 - Writing data to Excel using the cell() method
Code
Output
Method 3 - Writing data to Excel by iterating over rows
Code - Example 1
Output
You can also use methods like iter_rows() and iter_cols() to write data to Excel.
Code - Example 2 - using iter_rows() method
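Writing through iter_rows() could be sketched as follows. In a normal (not read-only) workbook, iterating over a bounded range creates the cells, so values can be assigned as they are visited. The counter values are arbitrary.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active

counter = 1
# Visit A1:B3 row by row and fill each cell with a running number.
for row in sheet.iter_rows(min_row=1, max_row=3, min_col=1, max_col=2):
    for cell in row:
        cell.value = counter
        counter += 1

print([[cell.value for cell in row] for row in sheet.iter_rows()])
```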
Output
Code - Example 3 - using iter_cols() method
Output
Appending data to Excel with Openpyxl
Openpyxl also provides an append() method. This is used to append values to an existing Excel sheet.
Code
Output
Manipulating Sheets in Openpyxl
Each Excel workbook can contain multiple sheets. To get a list of all the sheet names in an Excel workbook, you can use the wb.sheetnames attribute.
Code
Output
As you can see, pylenin.xlsx has only one sheet.
To create a new sheet, use the create_sheet() method provided by Openpyxl.
Code
Output
You can also create sheets at different positions in the Excel Workbook.
Code
Output
If your Excel workbook contains multiple sheets and you want to work with a particular sheet, you can look up that sheet by its title in your workbook object.
Code
Output
Practical usage example of Openpyxl
Let’s perform some data analysis with the wb1.xlsx file as shown in the first image.
Objective
- Add a new column showing Total Price per Product.
- Calculate the Total Cost of all the items bought.
The resulting Excel sheet should look like the below image.
Step 1 - Find the max row and max column of the Excel sheet
As mentioned before, you can use the sheet.max_row and sheet.max_column attributes to find the max row and max column for any Excel sheet with Openpyxl.
Code
Output
Step 2 - Add an extra column in Excel with Openpyxl
To add an extra column with calculations to the active Excel sheet, you first need to create a new column header in the first empty cell of the header row and then iterate over all rows to multiply Quantity by Cost per Unit.
Code
Output
Now that an extra column header has been created, the sheet.max_column value will change to 5.
Now you can calculate the Total Price per Product using the iter_rows() method.
Code
Output
Step 3 - Calculate sum of a column in Excel with Openpyxl
The last step is to calculate the Total Cost from the last column in the Excel file.
- Access the last column and add up all the costs. You can read the last column by accessing the sheet.columns attribute. Since it returns a generator, you first convert it to a Python list and then access the last column.
- Create a new row 2 places down from the max_row and fill in Total Cost.
- Import the Font class from openpyxl.styles to make the last row bold.
Final Code looks like the below.
Code
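The three steps could be combined roughly as in this sketch. It works on a small in-memory sheet with hypothetical products instead of wb1.xlsx, and the position of the Total Cost label (one column left of the sum) is an assumption.

```python
from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
sheet = wb.active
# Hypothetical stand-in for wb1.xlsx
sheet.append(["ProductId", "Product Name", "Cost per Unit", "Quantity"])
sheet.append([1, "Pencil", 0.5, 10])
sheet.append([2, "Notebook", 2.0, 3])

# Step 2: new header in the first empty column, then Quantity * Cost per Unit
total_col = sheet.max_column + 1
sheet.cell(row=1, column=total_col, value="Total Price per Product")
for row in sheet.iter_rows(min_row=2, max_row=sheet.max_row):
    cost, quantity = row[2].value, row[3].value
    sheet.cell(row=row[0].row, column=total_col, value=cost * quantity)

# Step 3: sum the last column, skipping its header cell
last_column = list(sheet.columns)[-1]
total_cost = sum(cell.value for cell in last_column[1:])

# Write the result two rows below the data, in bold
total_row = sheet.max_row + 2
sheet.cell(row=total_row, column=total_col - 1, value="Total Cost").font = Font(bold=True)
sheet.cell(row=total_row, column=total_col, value=total_cost).font = Font(bold=True)

print(total_cost)
```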
When you run the above code, you should see all the relevant updates to your Excel sheet.
In this Openpyxl tutorial, we learnt about various ways to handle Excel files in Python. If you have any doubts, post your doubts in the comment section below.