How to Extract Text from PDF Files

March 23, 2026

A teacher at my brother’s college shares his lectures as PDFs. Long ones — 40-50 pages. The problem is they’re locked. Not password locked, but the text can’t be selected or copied. You can read the PDF but if you try to highlight anything and copy it, nothing happens.

My brother needed to copy some definitions and formulas into his own notes. He was literally typing them out word by word from the PDF. I watched him do this for 10 minutes before I said “bro stop, there’s a faster way.”

Wait — this depends on what kind of PDF you have

Okay so here’s the thing nobody tells you. There are two kinds of PDFs and they look exactly the same but they’re completely different inside.

The first kind is when someone typed something in Word or Google Docs and exported it as PDF. The text is real. It’s actual letters sitting in the file. You can usually highlight and copy it. Extracting text from these is instant — the tool just grabs what’s already there.

The second kind is when someone scanned a paper or took a phone photo and saved it as PDF. This looks like text to your eyes but the computer just sees a picture. There are no actual letters in the file — it’s a photograph. Trying to extract text from this is like trying to copy text from a JPEG. Doesn’t work.

How do you tell which one you have? Open the PDF and try to highlight a word with your mouse. If individual words highlight, even when copying them is blocked, it’s real text. If the whole page selects as one block or nothing highlights at all, it’s probably a scan.

Most documents from banks, offices, universities — those are real text PDFs. Those are easy. Scanned stuff needs OCR which is a whole different process and honestly hit-or-miss depending on the scan quality.

How to extract text

Go to convertkr.com/pdf-to-text. Upload your PDF. The tool pulls out all the text from every page and shows it to you. Copy it, paste it wherever you need.

For my brother’s locked lecture PDF — the text was actually there in the file, the professor had just disabled selection. The extraction tool ignores those restrictions and pulls the text out anyway. My brother got all his definitions in 5 seconds instead of typing for an hour.

What people use this for

Copying from locked PDFs. My brother’s situation. Some PDFs have restrictions — you can view but can’t select, copy, or print. Banks do this with statements sometimes. Government documents too. The text is in the file, you’re just not “allowed” to copy it through normal means. The extractor doesn’t care about those restrictions.

Getting text from old documents. My uncle had a bunch of Urdu poetry PDFs from the early 2000s. He wanted to share specific poems on WhatsApp but couldn’t copy the text. The PDFs were text-based (not scanned) but made with some old software that made text selection buggy. Extraction worked and he got clean text he could paste into WhatsApp.

Converting PDF content to Word. A neighbor works at a small office. They received a contract as a PDF and needed to make changes. No Word version available. She copied the extracted text into a new Word document and edited from there. Not perfect — formatting doesn’t carry over — but the text was all there. Better than retyping 15 pages.

Research and quoting. When you’re writing something and need to quote from a PDF — a research paper, a report, a book chapter. Instead of retyping the quote, just extract and copy. I do this all the time when referencing stuff for blog posts.

Data from PDF tables. This one’s tricky. If a PDF has a table — like a price list or a schedule — the extracted text comes out as plain text, not as a table. The rows and columns aren’t preserved. But at least you get the text, and rearranging it into a spreadsheet is faster than typing every cell.
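If the extracted rows still have visible gaps between the cells, a few lines of Python can do the rearranging for you. This is only a sketch: it assumes cells are separated by two or more spaces (or a tab), which won't be true for every PDF, and the function name `rows_to_csv` is something I made up for this example.

```python
import csv
import io
import re

def rows_to_csv(extracted: str) -> str:
    """Turn lines of extracted table text into CSV text.

    Assumption: cells are separated by two or more spaces, or by a tab.
    A single space is treated as part of the cell ("Blue pen").
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in extracted.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        cells = re.split(r"\s{2,}|\t", line)
        writer.writerow(cells)
    return out.getvalue()

sample = """Item        Price   Stock
Blue pen    1.50    120
Notebook    3.25    40"""

print(rows_to_csv(sample))
```

Save the result as a .csv file and it opens in Excel or Google Sheets. If the cells came out separated by single spaces only, this won't work and you're back to fixing it by hand.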

What it can’t do

I’m going to be honest here because I don’t want anyone getting frustrated.

Scanned documents. If the PDF is a scan — someone took a photo of a paper and made it a PDF — there’s no text to extract. The tool will either return nothing or return garbage. You need OCR for scans. ConvertKr has an OCR tool that handles scanned documents, but that’s a different process and the results depend heavily on the scan quality.

Handwritten content. If someone handwrote notes and scanned them, no text extractor will help. Even OCR struggles with handwriting unless it’s very neat.

Formatting. You get plain text. No bold, no italics, no headings, no colors. Just the raw words. If the PDF has complex layouts — multiple columns, text boxes, sidebars — the extracted text might be jumbled because the tool reads left to right, top to bottom, and columns confuse that order.
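Here's a tiny illustration of why columns get jumbled. It is not the tool's actual code, just a sketch with made-up coordinates for a two-column page. The function names and the `split_x` value are assumptions for the example.

```python
# Each fragment is (x, y, text). y grows downward, like on a page.
# Coordinates are invented to mimic a two-column layout.
fragments = [
    (50, 100, "Left column, line 1."),
    (320, 100, "Right column, line 1."),
    (50, 120, "Left column, line 2."),
    (320, 120, "Right column, line 2."),
]

def naive_order(frags):
    """Top to bottom, then left to right across the whole page."""
    return [text for x, y, text in sorted(frags, key=lambda f: (f[1], f[0]))]

def column_order(frags, split_x):
    """Read everything left of split_x first, then everything to the right."""
    left = sorted((f for f in frags if f[0] < split_x), key=lambda f: f[1])
    right = sorted((f for f in frags if f[0] >= split_x), key=lambda f: f[1])
    return [text for x, y, text in left + right]

print(naive_order(fragments))        # left 1, right 1, left 2, right 2
print(column_order(fragments, 200))  # left 1, left 2, right 1, right 2
```

The naive order jumps between the columns on every line, which is the jumbled output described above. Reading by column needs to know where the columns are, and a plain text extractor usually doesn't.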

Images in PDFs. Text inside images — like a photo of a whiteboard, or text baked into a graphic — won’t be extracted. Only actual text characters in the PDF structure come out.

For most regular documents (letters, reports, statements, contracts, articles) it works well. The text comes out complete, and the worst you usually deal with is some cleanup of line breaks.

Tips

Check if it’s a scan first. Open the PDF and try to select text with your mouse. If you can highlight individual words (even if you can’t copy them), it’s a text-based PDF and extraction will work. If clicking and dragging selects the whole page as one image, it’s a scan and you need OCR instead.
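If you'd rather check with a script than with your mouse, here's a rough sketch. It assumes you already have the extracted text as one string per page (how you get that depends on your extractor and isn't shown). The `min_chars` threshold and the function name are made up for illustration.

```python
def looks_like_scan(pages, min_chars=20):
    """Guess whether a PDF is a scan from its per-page extracted text.

    pages: list of strings, one per page.
    min_chars: hypothetical cutoff; a page with fewer letters/digits
               than this counts as "empty".
    Returns True when more than half the pages come back empty.
    """
    if not pages:
        return True  # nothing extracted at all
    empty = sum(
        1 for text in pages
        if sum(ch.isalnum() for ch in text) < min_chars
    )
    return empty / len(pages) > 0.5

print(looks_like_scan(["", "  \n", ""]))  # True
print(looks_like_scan(["A function is a relation that maps each input to one output."]))  # False
```

It's a heuristic, not a guarantee. A mostly blank text PDF would also trip it, so treat a True as "try OCR", not as proof.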

For large documents, be patient. A 200-page PDF might take a few seconds to process. The tool is reading through every page and pulling text. On a slower device it might feel sluggish with really big files.

Clean up the output. Extracted text often has line breaks in the middle of sentences, because each line on the PDF page becomes its own line in the output. If you’re pasting into Word or Google Docs, do a find-and-replace to clean up extra spaces and line breaks. I usually paste into Notepad first, fix the obvious issues, then paste into the final destination.
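If you do this cleanup a lot, it can be scripted. Below is a minimal sketch in Python, assuming blank lines mark real paragraph breaks and single line breaks are just the PDF wrapping a sentence. The function name `clean_extracted` is mine, not part of any tool.

```python
import re

def clean_extracted(text: str) -> str:
    """Rejoin wrapped lines and collapse extra spaces in extracted text."""
    # Normalize Windows line endings.
    text = text.replace("\r\n", "\n")
    # Rejoin words split by a hyphen at the end of a line: "in-\nput" -> "input".
    # Caveat: this also glues genuinely hyphenated words ("well-\nknown").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Blank lines separate paragraphs; keep those.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for para in paragraphs:
        # Inside a paragraph, turn line breaks and runs of spaces into one space.
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)
    return "\n\n".join(cleaned)

raw = "A function maps each in-\nput to exactly\none  output.\n\nSecond paragraph\nhere."
print(clean_extracted(raw))
```

For the sample above this prints two tidy paragraphs: "A function maps each input to exactly one output." and "Second paragraph here." with a blank line between them.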

Headers and footers come out too. If the PDF has “Company Name” at the top of every page and “Page 1 of 15” at the bottom, those show up in the extracted text. For every page. You’ll need to clean those out manually if they bother you.
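If cleaning them out by hand gets old, here's one way to sketch it in Python. It assumes you have the text split into one string per page, which not every extractor gives you. The 80% cutoff (`min_share`) and the "Page N of M" pattern are my own guesses at what counts as a header or footer.

```python
import re
from collections import Counter

# Matches lines like "Page 3" or "Page 3 of 15" (case-insensitive).
PAGE_NUMBER = re.compile(r"^\s*page\s+\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE)

def strip_repeated_lines(pages, min_share=0.8):
    """Drop lines that repeat on most pages, plus page-number lines.

    pages: list of strings, one per page.
    min_share: hypothetical cutoff; a line on at least this share of
               pages (and on at least 2 pages) is treated as a header/footer.
    """
    counts = Counter()
    for page in pages:
        # Count each distinct line once per page.
        counts.update({ln.strip() for ln in page.splitlines() if ln.strip()})
    threshold = max(2, min_share * len(pages))
    repeated = {line for line, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        kept = [
            ln for ln in page.splitlines()
            if ln.strip() not in repeated and not PAGE_NUMBER.match(ln)
        ]
        cleaned.append("\n".join(kept))
    return cleaned

pages = [
    "Company Name\nFirst page text.\nPage 1 of 3",
    "Company Name\nSecond page text.\nPage 2 of 3",
    "Company Name\nThird page text.\nPage 3 of 3",
]
print(strip_repeated_lines(pages))
# ['First page text.', 'Second page text.', 'Third page text.']
```

One thing to watch: a line of real content that happens to repeat on most pages gets removed too, so skim the result before trusting it.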

The OCR question

Somebody always asks “does this do OCR?” so let me address it.

This text extraction tool does NOT do OCR. It extracts existing text from the PDF structure. If there’s no text in the PDF (because it’s a scan), there’s nothing to extract.

For scanned documents, use the OCR tool instead. It actually looks at the image, recognizes the characters, and gives you text. It supports multiple languages and works on most printed text scans. Handwriting is still iffy.

Think of it this way: text extraction is opening a book and reading the words that are already printed. OCR is showing a photo of a book to someone and asking them to type out what they see. Very different processes.

FAQ

Can I extract text from a specific page only?
The tool extracts from all pages. If you only need text from certain pages, extract everything and then copy just the section you need. Or split the PDF first to isolate the pages you want.

The extracted text has weird characters. What happened?
Some PDFs use custom fonts whose internal character codes don’t match standard letters. The PDF looks fine visually but the underlying text data is garbled. This is common with very old PDFs or PDFs created by unusual software. OCR might work better in those cases since it reads what’s visible, not what’s in the text data.
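If you want a quick way to spot this kind of damage before reading through pages of output, here's a rough Python check. It only catches one common symptom (replacement characters, private-use characters, and stray control codes). Text that comes out as the wrong but otherwise normal letters will slip past it. The function name is made up for the example.

```python
import unicodedata

def suspicious_share(text: str) -> float:
    """Share of non-whitespace characters that look like font-mapping damage."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0

    def is_suspicious(ch):
        # U+FFFD is the Unicode replacement character.
        # Co = private use, Cn = unassigned, Cc = control characters.
        return ch == "\ufffd" or unicodedata.category(ch) in ("Co", "Cn", "Cc")

    return sum(is_suspicious(ch) for ch in chars) / len(chars)

print(suspicious_share("Hello world"))           # 0.0
print(suspicious_share("\ue001\ue002\ue003 ok"))  # more than half is suspicious
```

If a big share of the characters is flagged, that's a hint the text layer is broken and OCR is worth a try.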

Is this legal?
Extracting text from a PDF you own or have legitimate access to is fine. If the PDF has copy restrictions, that’s usually a soft lock: a permission setting that PDF readers are asked to honor, not a password you need to open the file. The document creator chose to disable copy/paste but the text is still in the file. Whether you should respect that restriction depends on your situation. For personal study notes? Nobody’s going to come after you.

Need to get text out of a PDF? Open the text extractor — upload, copy, done.