Paper Banana Tutorial
Last updated: April 2026
What you'll achieve
After this tutorial, you'll be able to use Paper Banana to extract structured data from messy PDFs. You'll learn to create a custom extraction template, process a batch of documents, and export the results to a clean spreadsheet. I'll walk you through my exact workflow for turning a stack of unstructured invoices or research papers into an organized dataset in under 15 minutes, a job that used to take me hours of manual copying and pasting. By the end, you'll understand the core interface and be ready to automate your own document review tasks.
Prerequisites
- A free Paper Banana account (sign up at paperbanana.ai)
- A web browser (Chrome, Firefox, or Edge) with a stable internet connection
- A sample PDF document to test with (e.g., an invoice, receipt, or research paper abstract)
Step-by-Step Guide
Step 1: Sign Up and Set Up Your Account
First, head to paperbanana.ai and click the 'Start Free' button. In my experience, always use your work email for sign-up; it makes team sharing easier later. You'll be prompted for a name and to verify your email. Once verified, you land on the onboarding flow. I recommend skipping the 'Quick Tour' for now—it's generic. Instead, go straight to your Workspace. The first thing I do is check my account limits under 'Settings' > 'Usage'. The free tier gives you 50 pages per month, so be mindful. Upload a test PDF (like a utility bill) just to see the raw, untrained output—it's a great baseline. What surprised me was how accurate the auto-detection is even before training.
If you're only testing and not yet sure about committing, a personal email works fine; you can switch to a work address once you need team sharing.
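The 50-page monthly limit is easy to sanity-check with a little arithmetic before you upload anything. Here is a minimal Python sketch, assuming you already know each file's page count. The function names and the "used" figure are my own illustration, not part of Paper Banana; the real number comes from 'Settings' > 'Usage'.

```python
FREE_TIER_PAGES = 50  # monthly free-tier limit quoted in this tutorial


def pages_remaining(used: int, limit: int = FREE_TIER_PAGES) -> int:
    """Pages still available this month, never below zero."""
    return max(limit - used, 0)


def fits_in_budget(page_counts: list[int], used: int, limit: int = FREE_TIER_PAGES) -> bool:
    """True if uploading every file keeps you within the monthly limit."""
    return sum(page_counts) <= pages_remaining(used, limit)


# Example: 12 pages already used, three test PDFs of 2, 5, and 30 pages.
print(pages_remaining(12))                  # 38
print(fits_in_budget([2, 5, 30], used=12))  # True
```

Running a check like this before a big upload helps you avoid spending the month's allowance on a template you haven't verified yet.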
Step 2: Navigate the Dashboard
The dashboard can feel cluttered, but focus on three key areas. On the left is the main nav. 'Projects' is your home base—think of each project as a folder for a specific type of document, like 'Q3 Invoices' or 'Clinical Trial PDFs'. The central 'Recent Files' panel shows processed documents. Click any file to see its extraction history. The right sidebar has quick actions: 'New Extraction' and 'Template Library'. I live in the 'Templates' section. This is where you define what data to pull (e.g., 'Invoice Number', 'Total Amount'). Ignore the 'Analytics' tab for now; it's powerful but overwhelming on day one. The search bar at the top is your best friend for finding old projects.
Bookmark your main Project page for quick access.
Step 3: Create Your First Extraction Template
This is the core of Paper Banana. Don't just upload a file and hope. Click 'Templates' > 'Create New'. Name it clearly, e.g., 'Simple Invoice Extractor'. You'll see a blank canvas. Now, upload a sample PDF. The AI will overlay a grid. Click and drag to draw a 'field' around a data point, like the invoice total. A box pops up—name the field (use underscores, no spaces: 'invoice_total'). Select the data type (Number, Date, Text). Repeat for 3-4 key fields. Then, hit 'Train & Test'. Upload a second, similar PDF. The tool will try to find the same data points. Correct any mistakes by dragging the fields. I tested this on 50+ invoice formats, and after 3-5 training documents, the accuracy skyrocketed to near-perfect for structured forms.
Start with just 2-3 fields (Date, Total, Vendor) to avoid frustration.
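If you like to plan your fields before opening the template editor, the naming rule (underscores, no spaces) and the three data types can be written down as a small schema. This is only a planning sketch in Python: the Field class and to_field_name helper are hypothetical and do not talk to Paper Banana in any way.

```python
import re
from dataclasses import dataclass

ALLOWED_TYPES = {"Number", "Date", "Text"}  # data types named in Step 3
FIELD_NAME = re.compile(r"^[a-z][a-z0-9_]*$")  # lowercase, underscores, no spaces


def to_field_name(label: str) -> str:
    """Turn a human label like 'Invoice Total' into 'invoice_total'."""
    cleaned = re.sub(r"[^a-z0-9]+", "_", label.strip().lower())
    return cleaned.strip("_")


@dataclass(frozen=True)
class Field:
    name: str
    data_type: str

    def __post_init__(self) -> None:
        # Reject names with spaces or other characters, and unknown types.
        if not FIELD_NAME.match(self.name):
            raise ValueError(f"bad field name: {self.name!r}")
        if self.data_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown data type: {self.data_type!r}")


# A starter template with the three fields suggested above.
starter = [
    Field(to_field_name("Invoice Date"), "Date"),
    Field(to_field_name("Invoice Total"), "Number"),
    Field(to_field_name("Vendor"), "Text"),
]
print([f.name for f in starter])  # ['invoice_date', 'invoice_total', 'vendor']
```

Writing the list down first keeps field names consistent across templates, which pays off later when you merge exports from different projects.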
Step 4: Process a Batch of Documents
Now for the magic. Go to 'Projects', select your project, and click 'Add Files'. You can drag-and-drop multiple PDFs. Here's my crucial advice: DO NOT hit 'Process' yet. First, click the dropdown next to the button and select your custom template (e.g., 'Simple Invoice Extractor'). If you skip this, it uses the generic AI, which is less accurate. Then click 'Process'. You'll see a queue. Processing is fast—about 2-3 seconds per page. Once done, you're taken to the 'Results' table. This view shocked me the first time: all my messy PDF data was in a clean, sortable table. Scan for highlighted cells (these are low-confidence extracts). You can click into any cell to see the source PDF snippet and manually correct it. This review step is non-negotiable for quality.
Batch process 5-10 files first to verify accuracy before doing 100+.
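To make the "small batch first" habit concrete, here is a rough Python sketch that splits a file list into a pilot batch and the remainder, and estimates processing time from the 2-3 seconds per page I observed. The file names are made up, and the estimate assumes one page per file; adjust both for your own documents.

```python
def estimate_seconds(total_pages: int, low: float = 2.0, high: float = 3.0) -> tuple[float, float]:
    """Rough processing-time range using the 2-3 s/page figure from Step 4."""
    return total_pages * low, total_pages * high


def split_pilot(files: list[str], pilot_size: int = 5) -> tuple[list[str], list[str]]:
    """Return (pilot batch, remaining files) so accuracy can be checked first."""
    return files[:pilot_size], files[pilot_size:]


files = [f"invoice_{i:03d}.pdf" for i in range(1, 121)]  # hypothetical names
pilot, rest = split_pilot(files)
print(len(pilot), len(rest))   # 5 115
print(estimate_seconds(120))   # (240.0, 360.0), assuming one page per file
```

Upload the pilot list first, review the Results table, and only then add the rest.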
Step 5: Save, Export, and Share
Your data is useless trapped in Paper Banana. To export, in the Results table, click the 'Export' button. I always choose 'CSV for Excel'. The JSON option is for developers. Before exporting, use the column selector (the eye icon) to hide unnecessary system fields. Once downloaded, open the CSV. I immediately convert it to an Excel Table (Ctrl+T) for filtering. To share, you have two good options. First, you can invite a teammate to the project via 'Settings' > 'Members'—they'll see everything. Second, you can generate a 'Shareable Report' from the Export menu. This creates a read-only web link with the data table and charts. I use this for clients who don't need edit access. Never share the raw PDFs and the extracted data separately; it creates confusion.
Rename your CSV file immediately after download to include the date.
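If you export often, the renaming step is worth scripting. The sketch below uses only the Python standard library; the helper names and the example file name are my own, and the date format (ISO, YYYY-MM-DD) is just the one I find sorts best in a folder.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def dated_name(filename: str, day: date) -> str:
    """Insert an ISO date before the extension: export.csv -> export_2026-04-15.csv."""
    p = Path(filename)
    return f"{p.stem}_{day.isoformat()}{p.suffix}"


def rename_with_date(path: str, day: Optional[date] = None) -> Path:
    """Rename the file on disk (same folder) and return the new path."""
    p = Path(path)
    target = p.with_name(dated_name(p.name, day or date.today()))
    return p.rename(target)


print(dated_name("q3_invoices.csv", date(2026, 4, 15)))  # q3_invoices_2026-04-15.csv
```

Call rename_with_date on the downloaded file's path right after the export finishes; with no date argument it stamps today's date.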
Step 6: Explore Advanced Features
Once you're comfortable, dive deeper. The 'Template Library' has pre-built extractors for common docs like bank statements or resumes—these can save hours. The 'Visual Analytics' tab can generate pie charts and trends from your extracted numbers; it's basic but handy for quick insights. For power users, the 'API' section (under Settings) is gold. I use it to push extracted data directly into Google Sheets or Airtable. Another hidden gem is the 'Validation Rules' in the template editor. You can set rules like 'Total Amount must be a positive number' to flag errors automatically. If you handle complex, multi-page contracts, explore the 'Summarization' feature in the individual document view. It gives a decent TL;DR, though I still prefer to extract specific clauses manually.
The pre-built 'Receipt' template is surprisingly good for expense reports.
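A rule like 'Total Amount must be a positive number' is simple enough to double-check on your exported data as well. This Python sketch is not Paper Banana's rule engine; it is a stand-in I would run on the CSV rows after export, and the column name total_amount and the sample rows are assumptions made up for illustration.

```python
import math
from typing import Optional


def parse_amount(raw: str) -> Optional[float]:
    """Parse '1,234.50' or '$99' into a float; return None if it is not a finite number."""
    cleaned = raw.replace(",", "").replace("$", "").strip()
    try:
        value = float(cleaned)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def flag_bad_totals(rows: list[dict], column: str = "total_amount") -> list[int]:
    """Return the indexes of rows whose total is missing, non-numeric, or not positive."""
    bad = []
    for i, row in enumerate(rows):
        amount = parse_amount(row.get(column) or "")
        if amount is None or amount <= 0:
            bad.append(i)
    return bad


rows = [
    {"vendor": "Acme", "total_amount": "1,250.00"},
    {"vendor": "Globex", "total_amount": "-40"},
    {"vendor": "Initech", "total_amount": "n/a"},
]
print(flag_bad_totals(rows))  # [1, 2]
```

Anything this flags is worth tracing back to the source PDF snippet in the Results table.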
Common Mistakes to Avoid
- Not selecting a custom template before batch processing, leading to generic, low-accuracy results. Always assign a template.
- Drawing extraction fields too tightly around text. Give the AI a buffer of white space to account for document shifts.
- Ignoring the Confidence Score column. Always sort by it to review and correct low-confidence extractions first.
- Forgetting to check the 'Usage' page and accidentally burning through your free page limit on unverified templates.
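The "review low confidence first" habit also works outside the app. The Python sketch below sorts exported rows so the shakiest extractions come first. It assumes the export includes a numeric column named confidence; that column name and the sample rows are hypothetical, so check your own CSV header before relying on it.

```python
import csv
import io


def lowest_confidence_first(csv_text: str, column: str = "confidence") -> list[dict]:
    """Parse exported CSV text and return rows sorted from lowest to highest confidence."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: float(row[column]))


# Made-up sample standing in for an exported file.
sample = """invoice_number,invoice_total,confidence
INV-001,1250.00,0.98
INV-002,87.10,0.61
INV-003,430.00,0.83
"""
for row in lowest_confidence_first(sample):
    print(row["invoice_number"], row["confidence"])
# INV-002 0.61
# INV-003 0.83
# INV-001 0.98
```

For a real file, read its contents with Path("your_export.csv").read_text() and pass the text in.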