Automate Excel Data: Batch Convert XLSX to Fixed Width

Manual data entry drains company resources. Converting multiple Excel files into fixed-width text by hand often leads to costly formatting errors. Automation removes this bottleneck by applying the same column layout to every row of every file.

This guide provides a Python script that automates the conversion of a whole folder of spreadsheets.

Why Use Fixed-Width Formats?

Many legacy systems, financial networks, and government databases require flat files with strict character limits per column. Unlike CSVs, fixed-width files do not use separators like commas. Instead, each field occupies a fixed number of characters and shorter values are padded, so every column starts at the same position on every line.

The Python Automation Blueprint

This script uses the pandas library to process data and the built-in pathlib module to handle file paths. It reads the first worksheet of every .xlsx file in a source folder, applies the configured column widths, right-pads each value with spaces, and writes one text file per workbook.

Step 1: Install Required Libraries

Run this command in your terminal to install pandas and openpyxl, the engine pandas uses to read .xlsx files:

    pip install pandas openpyxl

Step 2: The Batch Conversion Script

Save the following code as batch_converter.py.

    from pathlib import Path

    import pandas as pd

    # Define column specs: (Column Name, Total Character Width)
    COLUMN_LAYOUT = [
        ("AccountID", 10),
        ("CustomerName", 30),
        ("TransactionDate", 8),
        ("Amount", 12),
    ]


    def convert_xlsx_to_fixed_width(file_path, output_dir, layout):
        """Converts a single Excel file to a fixed-width text file."""
        try:
            # Read the Excel file, keeping every cell as text
            df = pd.read_excel(file_path, dtype=str)
            df = df.fillna("")

            fixed_width_lines = []

            # Process each row
            for _, row in df.iterrows():
                line = ""
                for col_name, width in layout:
                    # Extract value or use empty string if column is missing
                    val = str(row.get(col_name, ""))
                    # Truncate if value exceeds width, otherwise pad with spaces
                    formatted_val = val[:width].ljust(width)
                    line += formatted_val
                fixed_width_lines.append(line)

            # Save output file, one record per line
            output_filename = file_path.stem + ".txt"
            output_path = output_dir / output_filename
            with open(output_path, "w", encoding="utf-8") as f:
                f.write("\n".join(fixed_width_lines))

            print(f"Successfully converted: {file_path.name}")
        except Exception as e:
            print(f"Error processing {file_path.name}: {e}")


    def batch_process(source_folder, target_folder, layout):
        """Scans folder and processes all XLSX files."""
        src_dir = Path(source_folder)
        tgt_dir = Path(target_folder)
        tgt_dir.mkdir(parents=True, exist_ok=True)

        xlsx_files = list(src_dir.glob("*.xlsx"))
        if not xlsx_files:
            print("No .xlsx files found in the source directory.")
            return

        print(f"Found {len(xlsx_files)} files. Starting batch conversion...")
        for file_path in xlsx_files:
            convert_xlsx_to_fixed_width(file_path, tgt_dir, layout)


    if __name__ == "__main__":
        # Configure your directories here
        SOURCE_DIRECTORY = "./excel_inputs"
        OUTPUT_DIRECTORY = "./text_outputs"
        batch_process(SOURCE_DIRECTORY, OUTPUT_DIRECTORY, COLUMN_LAYOUT)

Key Mechanics of the Script
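
The pad-and-truncate step at the heart of the script can be tried on its own, without any Excel files. The sketch below is only an illustration: format_field and build_line are hypothetical helper names that do not appear in batch_converter.py, and the sample record is made up.

```python
# Hypothetical helpers illustrating the pad-and-truncate step in isolation.
LAYOUT = [
    ("AccountID", 10),
    ("CustomerName", 30),
    ("TransactionDate", 8),
    ("Amount", 12),
]


def format_field(value, width):
    """Cut the value to at most `width` characters, then right-pad with spaces."""
    return str(value)[:width].ljust(width)


def build_line(record, layout):
    """Join the formatted fields of one record into a single fixed-width line."""
    return "".join(
        format_field(record.get(name, ""), width) for name, width in layout
    )


# Made-up sample record; the leading zeros survive because the value is a string.
sample = {
    "AccountID": "0004217",
    "CustomerName": "Jane Doe",
    "TransactionDate": "20240131",
    "Amount": "150.00",
}
line = build_line(sample, LAYOUT)
print(repr(line))
print(len(line))  # 60, the sum of the column widths
```

Every line built this way has the same length, which is what lets a downstream system read each field by character position alone.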

String Preservation: Loading data with dtype=str stops pandas from converting values to numbers, so leading zeros in account numbers or ZIP codes survive.

Missing Data Handling: Blank spreadsheet cells are read as NaN even with dtype=str. The .fillna("") call replaces them with empty strings so that the text "nan" never appears in the output.

String Manipulation: The .ljust(width) method appends spaces to the right side of a value until it matches the required column width exactly.

Truncation: The val[:width] slice cuts off any value that exceeds its column width, so an overlong value cannot shift the columns that follow it. The extra characters are dropped silently.

Production Implementation Tips

To maximize the efficiency of your new data pipeline, implement these operational guardrails:

Automate Orchestration: Use Windows Task Scheduler or a Linux cron job to run the script automatically every night.

Add Data Validation: Check value lengths before exporting and log every value that would be truncated.

Archive Source Files: Program the script to move processed Excel files into an archive folder to avoid double-processing.
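
The validation and archiving tips can be sketched as two small helpers. This is a minimal illustration under stated assumptions: find_truncations and archive_file are hypothetical names that are not part of the script above, the rows are assumed to be plain dictionaries of strings, and the sample data is made up.

```python
import shutil
from pathlib import Path


def find_truncations(rows, layout):
    """Return (row_number, column, value) for every value longer than its width."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for col_name, width in layout:
            value = str(row.get(col_name, ""))
            if len(value) > width:
                problems.append((row_number, col_name, value))
    return problems


def archive_file(file_path, archive_dir):
    """Move a processed file into archive_dir and return its new path."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    destination = archive_dir / Path(file_path).name
    shutil.move(str(file_path), str(destination))
    return destination


# Made-up sample rows: the second AccountID is longer than its 10-character column.
layout = [("AccountID", 10), ("CustomerName", 30)]
rows = [
    {"AccountID": "0004217", "CustomerName": "Jane Doe"},
    {"AccountID": "123456789012", "CustomerName": "John Roe"},
]
for row_number, col_name, value in find_truncations(rows, layout):
    print(f"Row {row_number}: {col_name} value {value!r} exceeds its width")
```

In the batch script, the rows could come from df.to_dict("records") after the fillna step, and archive_file could be called once a file converts without error. The sketch does not handle a same-named file already sitting in the archive folder; that case would need its own rule, such as adding a timestamp to the file name.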
