I want to convert a PDF file into Excel and save it locally using Python. I have converted the PDF to Excel format, but how do I save it locally?
My code:
df = ("./Downloads/folder/myfile.pdf")
tabula.convert_into(df, "test.csv", output_format="csv", stream=True)
Best Answer
You can specify the full output path instead of only the file name:
df = ("./Downloads/folder/myfile.pdf")
output = "./Downloads/folder/test.csv"
tabula.convert_into(df, output, output_format="csv", stream=True)
Hope this answers your question!!!
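If you don't want to type the output path by hand, here is a minimal sketch that derives it from the PDF path. The helper names csv_path_for and convert_pdf_to_csv are made up for this example and are not part of tabula-py. The tabula import sits inside the function because tabula-py needs Java to run.

```python
from pathlib import Path


def csv_path_for(pdf_path, out_dir=None):
    """Return an absolute .csv path for pdf_path, in out_dir or next to the PDF."""
    pdf = Path(pdf_path).expanduser().resolve()
    folder = Path(out_dir).expanduser().resolve() if out_dir else pdf.parent
    return folder / (pdf.stem + ".csv")


def convert_pdf_to_csv(pdf_path, out_dir=None):
    """Convert the tables in pdf_path to CSV and return where the file was written."""
    import tabula  # imported here so the path helper works without tabula-py/Java

    output = csv_path_for(pdf_path, out_dir)
    output.parent.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    tabula.convert_into(str(pdf_path), str(output), output_format="csv", stream=True)
    return output
```

Calling convert_pdf_to_csv("./Downloads/folder/myfile.pdf") should then write myfile.csv into the same folder as the PDF and return the absolute path, so you know exactly where to look.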
In my case, the script below worked:
import tabula
df = tabula.read_pdf(r'C:\Users\user\Downloads\folder\3.pdf', pages='all')
tabula.convert_into(r'C:\Users\user\Downloads\folder\3.pdf', r'C:\Users\user\Downloads\folder\test.csv', output_format="csv", pages='all', stream=True)
I use Google Colab.
Install the required packages:
!pip install tabula-py
!pip install pandas
Import the required modules:
import tabula
import pandas as pd
Read a PDF file:
data = tabula.read_pdf("example.pdf", pages='1')[0]  # use "all" for every page, or set pages to a page number
Convert the PDF into CSV:
tabula.convert_into("example.pdf", "example.csv", output_format="csv", pages='1')  # use "all" for every page, or set pages to a page number
print(data)
To convert to an Excel file:
data1 = pd.read_csv("example.csv")
data1.dtypes
Now save to xlsx:
data.to_excel('example.xlsx')
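If the PDF has several tables, the steps above can also be sketched without the intermediate CSV, writing each table to its own sheet. This is only a sketch: sheet_names and pdf_tables_to_xlsx are hypothetical helper names, and it assumes tabula.read_pdf returns a list of DataFrames (its default) and that openpyxl is installed for writing .xlsx files.

```python
def sheet_names(count, prefix="Table"):
    """Return one sheet name per table: Table_1, Table_2, ..."""
    return [f"{prefix}_{i}" for i in range(1, count + 1)]


def pdf_tables_to_xlsx(pdf_path, xlsx_path, pages="all"):
    """Write every table found in the PDF to its own sheet; return the table count."""
    # Imported lazily: tabula-py needs Java, and to_excel needs openpyxl for .xlsx.
    import pandas as pd
    import tabula

    tables = tabula.read_pdf(pdf_path, pages=pages)  # list of DataFrames
    if not tables:
        raise ValueError("no tables found in " + str(pdf_path))

    with pd.ExcelWriter(xlsx_path) as writer:
        for name, table in zip(sheet_names(len(tables)), tables):
            table.to_excel(writer, sheet_name=name, index=False)
    return len(tables)
```

For example, pdf_tables_to_xlsx("example.pdf", "example.xlsx") would create one sheet per detected table. In Colab the file lands in the notebook's working directory unless you pass a full path.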
Documentation says that:
Output file will be saved into output_path
output_path is your second parameter, "test.csv". I guess it works fine, but you are looking for it in the wrong folder. It will be located next to your script (strictly speaking, in the current working directory), since you didn't specify a full path.
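To see where a relative path like "test.csv" actually ends up, you can print the current working directory. A quick check using only the standard library (the variable name relative_output is just for illustration):

```python
from pathlib import Path

# A relative path is resolved against the current working directory,
# which is not necessarily the folder that contains the PDF or the script.
relative_output = Path("test.csv")

print("Current working directory:", Path.cwd())
print("Relative path resolves to:", relative_output.resolve())
```

Run this from the same place you run the conversion script and the second line shows the folder to look in.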
PDF to .xlsx file:
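# Assumes df is the list of DataFrames returned by tabula.read_pdf(...)
list1 = []
for item in df:
    list1.append(item)
df = pd.concat(list1)  # combine all tables into one DataFrame
df.to_excel('outputfile.xlsx', sheet_name='Sheet1', index=True)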
You can also use camelot in combination with pandas:
import camelot
import pandas
tables = camelot.read_pdf(path_to_pdf, flavor='stream', pages='all')
df = pandas.concat([table.df for table in tables])
df.to_csv(path_to_csv)