I'd like to use Python to tell if a PDF was created by Google Docs. Is there any sort of metadata I can gather with PyPDF2 to determine this?


When I call pdf.getDocumentInfo() on a PDF exported from Google Docs, it returns {'/Producer': u'Skia/PDF m83'}. I tested this on a few Google Docs exports and it held each time. That makes sense: Skia is Google's graphics library, so it is presumably what Google Docs uses on its backend to generate PDFs.
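The "m83" part of that string is a Skia milestone (version) number rather than a fixed marker, so it will likely differ between files exported at different times. A minimal sketch, assuming the Producer value always has the form `Skia/PDF m<number>`, that pulls the milestone out with a regular expression (`parse_skia_milestone` is a hypothetical helper, not part of PyPDF2):

```python
import re
from typing import Optional

# Assumed format: "Skia/PDF m<milestone>", e.g. "Skia/PDF m83".
_SKIA_PRODUCER = re.compile(r"^Skia/PDF m(\d+)")


def parse_skia_milestone(producer: Optional[str]) -> Optional[int]:
    """Return the Skia milestone from a /Producer string, or None if it doesn't look like Skia."""
    if not producer:
        return None
    match = _SKIA_PRODUCER.match(producer.strip())
    return int(match.group(1)) if match else None


print(parse_skia_milestone("Skia/PDF m83"))       # 83
print(parse_skia_milestone("Adobe PDF Library"))  # None
```

If the milestone is all you need, matching on the `Skia/PDF` prefix alone is enough; the number is only useful if you care which renderer version produced the file.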

So you can simply do:

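import PyPDF2

# The "m83" suffix is a version number, so only the prefix is compared.
GOOGLE_DOCS_PRODUCER_PREFIX = "Skia/PDF"

def file_is_google_doc(pdf_file_path):
    # PdfReader/.metadata replace PdfFileReader/getDocumentInfo(), which PyPDF2 3.x no longer accepts
    reader = PyPDF2.PdfReader(pdf_file_path)
    metadata = reader.metadata  # None when the PDF has no info dictionary
    producer = (metadata.producer if metadata else None) or ""
    return producer.startswith(GOOGLE_DOCS_PRODUCER_PREFIX)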
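Comparing the whole info dictionary with `==` only succeeds when the PDF has exactly one metadata entry with exactly that version string, so a file that also carries a /Title, or that came from a newer Skia build, would be rejected. A minimal sketch of a looser check that works on the info mapping alone (no PDF library needed), assuming a Producer starting with `Skia/PDF` is the signal worth trusting; `looks_like_google_docs_export` is a hypothetical name and the sample dictionaries are made up for illustration. It is still only a heuristic: other Skia-based exporters, such as Chrome's "Save as PDF", typically report the same Producer.

```python
from typing import Any, Mapping

# Assumption: the version suffix after "Skia/PDF" varies, so only the prefix is checked.
GOOGLE_DOCS_PRODUCER_PREFIX = "Skia/PDF"


def looks_like_google_docs_export(info: Mapping[str, Any]) -> bool:
    """Heuristic: True if the /Producer entry is a string starting with 'Skia/PDF'."""
    producer = info.get("/Producer")
    return isinstance(producer, str) and producer.startswith(GOOGLE_DOCS_PRODUCER_PREFIX)


# Made-up sample info dictionaries.
exact = {"/Producer": "Skia/PDF m83"}
newer_with_title = {"/Producer": "Skia/PDF m120", "/Title": "Notes"}
other_tool = {"/Producer": "Adobe PDF Library"}

print(newer_with_title == exact)                        # False: exact-dict match fails
print(looks_like_google_docs_export(newer_with_title))  # True
print(looks_like_google_docs_export(other_tool))        # False
```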