
The DEVONtechnologies Blog

How to Deal With PDF Searchability

April 18, 2023 — Jim Neumann
Screenshot showing the concordance inspector with a word cloud in DEVONthink.

When it comes to PDF documents, what you see and what is actually there can be two different things. Just because you can read words in the document doesn’t mean it is searchable. Or perhaps you have a searchable PDF, but a search in DEVONthink still doesn’t find it. Here are a few ways to deal with such PDFs.

Is the PDF searchable?

First, make sure your PDF is searchable at all. If your PDF doesn’t contain a text layer, you won’t be able to search for text in it and will have to run OCR first. You can check this at several points in DEVONthink:

  • In the item list, look at the Kind column. If the document is reported as PDF document, it likely has no text layer, or an unsearchable one. A searchable PDF is reported as PDF+Text.
  • There, you can also select View > List Columns > Word Count to show a column reporting the number of words in your documents. If there’s no number shown, there are no words in this document.
  • Above the view/edit pane is a thin pane called the Navigation bar. Toward the right side you will see statistics, e.g., 1000 w. For a PDF, this is the number of words in the document. If nothing is reported here, there is no text layer to search.
  • If you’re running the Pro or Server edition, check the Tools > Inspectors > Concordance inspector. If there are no words in the List or Cloud views, the document has no text layer.
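
If you want to double-check a file outside DEVONthink, the same idea (no words means no text layer) can be sketched in a few lines of Python. This is only an illustration, not how DEVONthink determines the Kind or the word count. It assumes Poppler's pdftotext command-line tool is installed, and the function names count_words, extract_text, and has_text_layer are made up for this example.

```python
import re
import subprocess


def count_words(text: str) -> int:
    """Count word-like tokens (runs of letters, digits, underscores)."""
    return len(re.findall(r"\w+", text))


def extract_text(pdf_path: str) -> str:
    """Return the text layer of a PDF via Poppler's pdftotext.

    Assumption: pdftotext is installed and on the PATH. The "-" argument
    sends the extracted text to standard output instead of a file.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def has_text_layer(text: str, min_words: int = 1) -> bool:
    """Treat a PDF as searchable if its extracted text contains words."""
    return count_words(text) >= min_words
```

Calling has_text_layer(extract_text("scan.pdf")) on an image-only scan (the file name is a placeholder) should return False, since the extracted text then contains at most whitespace and page breaks.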

Check the text layer

If your document is a PDF+Text but you’re not finding it in a search, the text layer may not have the actual words you’re looking for. Check the Tools > Inspectors > Concordance word list or cloud and look for the word you’re trying to find. If it’s not there, the document can’t be found in a search for this particular word.

For those running any edition of DEVONthink, you can export the text via Data > Convert > To Plain Text. This exports the text layer as a separate text document. Open it and check whether what you’re searching for is actually present. If the OCR misread a word, a search for the correct spelling won’t find it.
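
Scanning a long exported text by eye is tedious, so here is a minimal Python sketch of the same check. It builds a simple word list from the exported text, reports how often your search term occurs, and suggests similarly spelled words that may be OCR misreadings. The function names concordance and lookup and the similarity cutoff of 0.8 are assumptions for this example, and this is not the algorithm behind DEVONthink's Concordance inspector.

```python
import re
from collections import Counter
from difflib import get_close_matches


def concordance(text: str) -> Counter:
    """Map each lowercased word in the text to how often it occurs."""
    return Counter(re.findall(r"\w+", text.lower()))


def lookup(word: str, text: str, cutoff: float = 0.8) -> tuple:
    """Return (exact count, similarly spelled words) for a search term.

    The similar words are candidates for OCR misreadings. The cutoff
    (0.0 to 1.0) is an arbitrary choice; lower it to see more candidates.
    """
    words = concordance(text)
    term = word.lower()
    near = get_close_matches(term, list(words), n=5, cutoff=cutoff)
    # Drop the term itself so only differently spelled words remain.
    return words[term], [w for w in near if w != term]


if __name__ == "__main__":
    # Hypothetical OCR output in which "invoice" was misread.
    exported = "Tbe invoioe is due"
    print(lookup("invoice", exported))
```

An exact count of 0 together with a close match such as "invoioe" suggests that the word is on the page but was misrecognized, so running OCR again or searching for the misread spelling is worth a try.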

A note about macOS’ Live Text

With the inclusion of the Live Text feature in macOS, it may seem that you have a searchable PDF because you can select, or even copy and paste, text in it. However, Live Text does not create a real text layer, so the document still has no searchable content. To create a proper, searchable text layer, run OCR on the document.

If you don’t want to use the Live Text feature, it can be disabled in System Settings > General > Language & Region > Live Text.

On a side note, the issue with Live Text does not affect PDFs in DEVONthink To Go.