How to save a new sheet in an existing Excel file, using Pandas?

Question 1

I want to use Excel files to store data processed with Python. My problem is that I can't add sheets to an existing Excel file. Here is some sample code that illustrates the problem:

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)

x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)

writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df1.to_excel(writer, sheet_name = 'x1')
df2.to_excel(writer, sheet_name = 'x2')
writer.close()  # saves and closes the file; writer.save() was removed in pandas 2.0

This code saves two DataFrames in two sheets, named "x1" and "x2" respectively. If I create two new DataFrames and try to use the same code to add two new sheets, "x3" and "x4", the original data is lost.

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)

x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)

writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df3.to_excel(writer, sheet_name = 'x3')
df4.to_excel(writer, sheet_name = 'x4')
writer.close()  # saves and closes the file; writer.save() was removed in pandas 2.0
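The data loss happens because pd.ExcelWriter opens the file with mode='w' by default, which creates the workbook from scratch. A minimal, self-contained reproduction, assuming a temporary file in place of the path above and the openpyxl engine (the overwrite does not depend on the engine):

```python
import os
import tempfile

import numpy as np
import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "PhD_data.xlsx")  # temporary stand-in for the real path

    # First run: writes sheets x1 and x2
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x1")
        pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x2")

    # Second run with the same (default) mode='w': the file is recreated
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x3")
        pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x4")

    # sheet_name=None reads every sheet into a dict keyed by sheet name
    sheet_names = list(pd.read_excel(path, sheet_name=None))

print(sheet_names)  # ['x3', 'x4']
```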

I want one Excel file with four sheets: "x1", "x2", "x3", "x4". I know that "xlsxwriter" is not the only "engine"; there is also "openpyxl". I have also seen that other people have already written about this problem, but I still can't figure out how to do it.

Here is some code taken from this link:

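import pandas
from openpyxl import load_workbook

# Note: assigning writer.book / writer.sheets was deprecated in pandas 1.5 and removed in 2.0
book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

# data_filtered is the DataFrame to write; the old cols= argument is now columns=
data_filtered.to_excel(writer, sheet_name="Main", columns=['Diff1', 'Diff2'])

writer.close()  # saves and closes the file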

People say it works, but it's hard to understand how. I don't understand what "ws.title", "ws" and "dict" are in this context.

What is the best way to save "x1" and "x2", then close the file, open it again, and add "x3" and "x4"?

Question 2

Thanks. I think a complete example could be useful for anyone else with the same problem:

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)

x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)

writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df1.to_excel(writer, sheet_name = 'x1')
df2.to_excel(writer, sheet_name = 'x2')
writer.close()  # saves and closes the file; writer.save() was removed in pandas 2.0

Here I generate an Excel file; as far as I understand, it doesn't matter whether it is generated with the "xlsxwriter" or the "openpyxl" engine.

When I want to write without losing the original data, I do this:

import pandas as pd
import numpy as np
from openpyxl import load_workbook

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

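book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
# Assigning writer.book was deprecated in pandas 1.5 and removed in 2.0;
# on current pandas use pd.ExcelWriter(path, engine='openpyxl', mode='a') instead
writer.book = book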

x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)

x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)

df3.to_excel(writer, sheet_name = 'x3')
df4.to_excel(writer, sheet_name = 'x4')
writer.close()  # saves and closes the file

This code does the job!

Question 3

In the example you shared, you are loading the existing file into book and setting the writer.book value to book. In the line writer.sheets = dict((ws.title, ws) for ws in book.worksheets) you are accessing each sheet of the workbook as ws. The sheet's title is in ws.title, so the generator produces (sheet_title, sheet) pairs, and dict() turns those pairs into a {sheet_title: sheet} dictionary. This dictionary is then assigned to writer.sheets. Essentially, these steps just load the existing data from 'Masterfile.xlsx' and populate your writer with it.
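To see what that line produces, here is a small sketch that builds a workbook in memory with openpyxl (standing in for load_workbook('Masterfile.xlsx')) and runs the same expression on it; the sheet names "x1" and "x2" are just for illustration:

```python
from openpyxl import Workbook

# In-memory workbook standing in for load_workbook('Masterfile.xlsx')
book = Workbook()
book.active.title = "x1"  # rename the default sheet
book.create_sheet("x2")

# The generator yields (title, worksheet) pairs; dict() turns them into a mapping
sheets = dict((ws.title, ws) for ws in book.worksheets)

# The same thing written as a dict comprehension
sheets_alt = {ws.title: ws for ws in book.worksheets}

print(list(sheets))                 # ['x1', 'x2']
print(type(sheets["x1"]).__name__)  # Worksheet
```

The values are the worksheet objects themselves, so writer.sheets["x1"] is the same object as book["x1"].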

Now let's say you already have a file with x1 and x2 as sheets. You can use the example code to load the file, and then you could do something like this to add x3 and x4:

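import pandas as pd

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"
# mode='a' opens the existing workbook instead of overwriting it (openpyxl engine only);
# df3 and df4 are the new DataFrames
with pd.ExcelWriter(path, engine='openpyxl', mode='a') as writer:
    df3.to_excel(writer, sheet_name='x3', index=False)
    df4.to_excel(writer, sheet_name='x4', index=False)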

That should do what you're looking for.

Question 4

A simple example of writing several DataFrames to Excel at once, and also of adding data on another sheet to an Excel file that has already been written (and closed).

The first time you write to an Excel file (writing "df1" and "df2" to "1st_sheet" and "2nd_sheet"):

import pandas as pd 
from openpyxl import load_workbook

df1 = pd.DataFrame([[1],[1]], columns=['a'])
df2 = pd.DataFrame([[2],[2]], columns=['b'])
df3 = pd.DataFrame([[3],[3]], columns=['c'])

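excel_dir = "my/excel/dir"  # placeholder: must be the full path of an .xlsx file, not a folder

with pd.ExcelWriter(excel_dir, engine='xlsxwriter') as writer:
    df1.to_excel(writer, sheet_name='1st_sheet')
    df2.to_excel(writer, sheet_name='2nd_sheet')
    # no writer.save() needed: the with block saves and closes the file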

After the Excel file has been closed, suppose you want to "append" data to the same Excel file but on another sheet, say "df3" to a sheet named "3rd_sheet":

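book = load_workbook(excel_dir)
with pd.ExcelWriter(excel_dir, engine='openpyxl') as writer:
    # Setting writer.book / writer.sheets only works before pandas 2.0 (deprecated in 1.5);
    # with newer pandas use pd.ExcelWriter(excel_dir, engine='openpyxl', mode='a') instead
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

    ## Your dataframe to append.
    df3.to_excel(writer, sheet_name='3rd_sheet')
    # the with block saves the workbook on exit, so writer.save() is not needed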

Note that the Excel file must not be in the old .xls format; use .xlsx instead (openpyxl cannot read .xls files).

Question 5

I strongly recommend working directly with openpyxl, since it now supports Pandas DataFrames.

This lets you focus on the relevant Excel and Pandas code.
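As a rough sketch of what that looks like for the original question, assuming a temporary file and openpyxl's dataframe_to_rows helper (from openpyxl.utils.dataframe): the first step saves "x1" and "x2", the second reopens the saved file with load_workbook and adds "x3" and "x4" without touching the existing sheets. The helper add_frame_as_sheet is a hypothetical name used only here.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def add_frame_as_sheet(book, title, df):
    """Hypothetical helper: copy a DataFrame (header + values) into a new sheet."""
    ws = book.create_sheet(title)
    for row in dataframe_to_rows(df, index=False, header=True):
        ws.append(row)


def random_frame():
    return pd.DataFrame(np.random.randn(100, 2), columns=["a", "b"])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "PhD_data.xlsx")

    # Step 1: create the file with x1 and x2
    book = Workbook()
    book.remove(book.active)  # drop the default empty "Sheet"
    add_frame_as_sheet(book, "x1", random_frame())
    add_frame_as_sheet(book, "x2", random_frame())
    book.save(path)

    # Step 2: reopen the saved file and add x3 and x4; x1 and x2 are kept
    book = load_workbook(path)
    add_frame_as_sheet(book, "x3", random_frame())
    add_frame_as_sheet(book, "x4", random_frame())
    book.save(path)

    sheet_names = load_workbook(path).sheetnames

print(sheet_names)  # ['x1', 'x2', 'x3', 'x4']
```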

Question 6

To create a new file:

import numpy as np
import pandas as pd

x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)
with pd.ExcelWriter('sample.xlsx') as writer:  
    df1.to_excel(writer, sheet_name='x1')

To append to the file, use the argument mode='a' in pd.ExcelWriter.

x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)
with pd.ExcelWriter('sample.xlsx', engine='openpyxl', mode='a') as writer:  
    df2.to_excel(writer, sheet_name='x2')

The default is mode='w'. See the documentation.
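A self-contained sketch of the same two steps, assuming a temporary file and a reasonably recent pandas (the if_sheet_exists argument was added in 1.3). Append mode only works with the openpyxl engine, since xlsxwriter can only create new files. By default, appending a sheet whose name already exists raises a ValueError; if_sheet_exists='replace' overwrites that sheet instead.

```python
import os
import tempfile

import numpy as np
import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xlsx")

    # Create the file with x1 and x2 (default mode='w')
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x1")
        pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x2")

    # Reopen in append mode and add x3 and x4; x1 and x2 are preserved
    with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
        pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x3")
        pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x4")

    # Writing to an existing name again: replace that sheet instead of failing
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        pd.DataFrame({"a": [1, 2, 3]}).to_excel(writer, sheet_name="x4", index=False)

    sheets = pd.read_excel(path, sheet_name=None)  # dict: sheet name -> DataFrame

print(list(sheets))                # ['x1', 'x2', 'x3', 'x4']
print(sheets["x4"]["a"].tolist())  # [1, 2, 3]
```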

Question 7

You can do this without ExcelWriter, using the tools in openpyxl. This can make it much easier to add fonts to the new sheet using openpyxl.styles.

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

#Location of original excel sheet
fileLocation =r'C:\workspace\data.xlsx'

#Location of new file which can be the same as original file
writeLocation=r'C:\workspace\dataNew.xlsx'

data = {'Name':['Tom','Paul','Jeremy'],'Age':[32,43,34],'Salary':[20000,34000,32000]}

#The dataframe you want to add
df = pd.DataFrame(data)

#Load existing sheet as it is
book = load_workbook(fileLocation)
#create a new sheet
sheet = book.create_sheet("Sheet Name")

#Load dataframe into new sheet
for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

#Save the modified excel at desired location    
book.save(writeLocation)
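Since the point of going through openpyxl is styling, here is a small sketch, assuming an in-memory workbook saved to a temporary file, that appends the rows as above and then makes the header row bold with openpyxl.styles.Font. The sheet title "Employees" is only illustrative.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({'Name': ['Tom', 'Paul', 'Jeremy'],
                   'Age': [32, 43, 34],
                   'Salary': [20000, 34000, 32000]})

book = Workbook()  # stands in for load_workbook(fileLocation)
sheet = book.create_sheet("Employees")

for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

# The header is the first appended row; make each of its cells bold
for cell in sheet[1]:
    cell.font = Font(bold=True)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataNew.xlsx")
    book.save(path)
    reloaded = load_workbook(path)["Employees"]
    header = [c.value for c in reloaded[1]]
    header_bold = all(c.font.bold for c in reloaded[1])

print(header)       # ['Name', 'Age', 'Salary']
print(header_bold)  # True
```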

Question 8

You can read the existing sheets you are interested in, e.g. "x1" and "x2", into memory and "write them back" before adding further new sheets (keep in mind that the sheets in a file and the sheets in memory are two different things: if you don't read them, they will be lost). For writing, this approach uses only "xlsxwriter", without openpyxl; note, however, that pd.read_excel still needs a reader engine for .xlsx files, which in recent pandas versions is openpyxl.

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

# begin <== read selected sheets and write them back
df1 = pd.read_excel(path, sheet_name='x1', index_col=0) # or sheet_name=0
df2 = pd.read_excel(path, sheet_name='x2', index_col=0) # or sheet_name=1
writer = pd.ExcelWriter(path, engine='xlsxwriter')
df1.to_excel(writer, sheet_name='x1')
df2.to_excel(writer, sheet_name='x2')
# end ==>

# now create more new sheets
x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)

x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)

df3.to_excel(writer, sheet_name='x3')
df4.to_excel(writer, sheet_name='x4')
writer.close()  # saves and closes the file; writer.save() was removed in pandas 2.0

If you want to keep all existing sheets, you can replace the code above between begin and end with:

# read all existing sheets and write them back
writer = pd.ExcelWriter(path, engine='xlsxwriter')
with pd.ExcelFile(path) as xlsx:  # closes the source file once all sheets are read
    for sheet in xlsx.sheet_names:
        df = xlsx.parse(sheet_name=sheet, index_col=0)
        df.to_excel(writer, sheet_name=sheet)
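A runnable sketch of the "read everything, then rewrite" approach, assuming a temporary file and leaving the engine to pandas' default writer for .xlsx (xlsxwriter when it is installed, otherwise openpyxl). The key point is that all existing sheets are read into memory before the new writer recreates the file.

```python
import os
import tempfile

import numpy as np
import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "PhD_data.xlsx")

    # Initial file with x1 and x2
    with pd.ExcelWriter(path) as writer:
        pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x1")
        pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x2")

    # Read every existing sheet into memory first (dict: sheet name -> DataFrame)
    existing = pd.read_excel(path, sheet_name=None, index_col=0)

    # Recreate the file: write the old sheets back, then the new ones
    with pd.ExcelWriter(path) as writer:
        for name, df in existing.items():
            df.to_excel(writer, sheet_name=name)
        pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x3")
        pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x4")

    sheet_names = list(pd.read_excel(path, sheet_name=None))

print(sheet_names)  # ['x1', 'x2', 'x3', 'x4']
```

Only cell values survive this round trip; formatting and charts in the original sheets are not carried over.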

Question 9

#This program reads an Excel workbook, extracts only the URL domain names and writes them to the same workbook in a different sheet.
#Developer - Nilesh K
import pandas as pd
from openpyxl import load_workbook #for writing to the existing workbook

df = pd.read_excel("urlsearch_test.xlsx")

#You can also pass a full path instead of the relative file name, e.g.
# r"C:\Users\xyz\Desktop\Python\

l = [] #list that collects the domain names found in the loop

#begin
#loop over every row of the sheet and pull the "http(s)://domain" part out of the TEXT column. You can have your own logic here.
for index, row in df.iterrows():
    try:
        text = row['TEXT'] #string to search (named text to avoid shadowing the built-in str)
        str_pos = text.index('http') #index where the URL starts
        str_pos1 = text.index('/', text.index('//', str_pos) + 2) #first / after the "//" of the URL, i.e. the end of the domain
        str_op = text[str_pos:str_pos1] #substring with the domain name
        l.append(str_op) #append the domain name to the list

    #Error handling to skip rows without a URL (ValueError) or with an empty cell (AttributeError) and continue.
    except (ValueError, AttributeError):
        print('Skipping row', index)
print(l)
l = list(dict.fromkeys(l)) #Keep distinct values in order; comment this line out to keep all the values
df1 = pd.DataFrame(l, columns=['URL']) #Create the DataFrame from the list
#end

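#Write using openpyxl so it can be written to the same workbook
#(assigning writer.book only works before pandas 2.0; with newer pandas use
# pd.ExcelWriter('urlsearch_test.xlsx', engine='openpyxl', mode='a') and drop load_workbook)
book = load_workbook('urlsearch_test.xlsx')
writer = pd.ExcelWriter('urlsearch_test.xlsx', engine='openpyxl')
writer.book = book
df1.to_excel(writer, sheet_name='Sheet3')
writer.close() #saves and closes the file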

#The line below writes to a different (new) workbook instead, without loading the existing one
#df1.to_excel(r"C:\Users\xyz\Desktop\Python\urlsearch1_test.xlsx", index=False, sheet_name='sheet1')
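For comparison, a sketch of the extraction step using a regular expression and urllib.parse.urlsplit instead of index arithmetic. This is a swapped-in technique, not the author's code, and extract_domains is a hypothetical name. Like the loop above, it keeps the scheme://domain part and drops duplicates while preserving order.

```python
import re
from urllib.parse import urlsplit

# Anything starting with http:// or https:// up to the next whitespace
URL_RE = re.compile(r"https?://\S+")


def extract_domains(texts):
    """Return the distinct 'scheme://domain' prefixes found in texts, in order."""
    seen = {}  # dict keys keep insertion order, like dict.fromkeys in the original
    for text in texts:
        if not isinstance(text, str):  # skip empty cells (NaN) and non-text values
            continue
        for url in URL_RE.findall(text):
            parts = urlsplit(url)
            seen.setdefault(f"{parts.scheme}://{parts.netloc}", None)
    return list(seen)


domains = extract_domains([
    "see https://example.com/page/1",
    "http://foo.org",
    None,
    "no link here",
    "again https://example.com/other",
])
print(domains)  # ['https://example.com', 'http://foo.org']
```

With the DataFrame from the program above, the call would be something like df1 = pd.DataFrame(extract_domains(df['TEXT']), columns=['URL']). Unlike the index-based loop, a URL with no path after the domain (such as http://foo.org) is still picked up.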

Question 10

Another fairly simple way to do it is to create a method like this:

import logging

import pandas as pd
from openpyxl import load_workbook


def _write_frame_to_new_sheet(path_to_file=None, sheet_name='sheet', data_frame=None):
    book = None
    try:
        book = load_workbook(path_to_file)
    except FileNotFoundError:
        logging.debug('Creating new workbook at %s', path_to_file)
    with pd.ExcelWriter(path_to_file, engine='openpyxl') as writer:
        if book is not None:
            # only works before pandas 2.0; newer versions need mode='a' instead
            writer.book = book
        data_frame.to_excel(writer, sheet_name=sheet_name, index=False)

The idea here is to load the workbook at path_to_file if it exists and then add data_frame as a new sheet named sheet_name. If the workbook does not exist, it is created. It seems that neither openpyxl nor xlsxwriter can append in place, so, as in @Stefano's example above, you really have to load the file and then rewrite it in order to add to it.
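On current pandas the same idea can be sketched without touching writer.book, assuming pandas 1.3 or later: choose mode='a' when the file exists and mode='w' otherwise. write_frame_to_new_sheet is a hypothetical name, and the demo uses a temporary file.

```python
import os
import tempfile

import pandas as pd


def write_frame_to_new_sheet(path_to_file, sheet_name, data_frame):
    """Hypothetical helper: add data_frame as a new sheet, creating the file if needed."""
    # Append when the workbook exists, otherwise create it from scratch
    mode = "a" if os.path.exists(path_to_file) else "w"
    with pd.ExcelWriter(path_to_file, engine="openpyxl", mode=mode) as writer:
        # With mode='a', an existing sheet_name raises ValueError by default
        data_frame.to_excel(writer, sheet_name=sheet_name, index=False)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.xlsx")
    write_frame_to_new_sheet(path, "x1", pd.DataFrame({"a": [1, 2]}))  # creates the file
    write_frame_to_new_sheet(path, "x2", pd.DataFrame({"b": [3, 4]}))  # appends a sheet
    sheets = pd.read_excel(path, sheet_name=None)

print(list(sheets))  # ['x1', 'x2']
```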