In PDF 1.0 (1993), a PDF file consisted of a mix of ASCII characters for the PDF syntax and binary data for objects such as images. A page content stream would contain human-readable PDF operators and operands, for instance:
56.7 748.5 m
136.2 748.5 l
S
This code tells you that a line has to be drawn (S) between the coordinate (x = 56.7; y = 748.5), because that's where the current point is moved to with the m operator, and the coordinate (x = 136.2; y = 748.5), because a path was constructed using the l operator, which appends a straight line segment.
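To make the m/l/S semantics concrete, here is a toy interpreter in Java (the language iText is written in) that handles only those three operators and reports which segments get stroked. The class and method names are hypothetical; a real content stream parser also has to deal with strings, arrays, inline images, the graphics state and many more operators.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class LineOperators {

    /** A stroked straight segment from (x1, y1) to (x2, y2). */
    public record Segment(double x1, double y1, double x2, double y2) { }

    /**
     * Interprets only the m, l and S operators of an uncompressed content
     * stream and returns the segments that get stroked. Any other operator
     * is skipped: this is a toy, not a content stream parser.
     */
    public static List<Segment> strokedSegments(String contentStream) {
        Deque<Double> operands = new ArrayDeque<>();
        List<Segment> path = new ArrayList<>();    // segments of the current path
        List<Segment> stroked = new ArrayList<>(); // segments painted by S
        double curX = 0, curY = 0;                 // the current point
        for (String token : contentStream.trim().split("\\s+")) {
            switch (token) {
                case "m" -> {   // x y m: move the current point to (x, y)
                    curY = operands.pop();
                    curX = operands.pop();
                }
                case "l" -> {   // x y l: append a line from the current point to (x, y)
                    double y = operands.pop();
                    double x = operands.pop();
                    path.add(new Segment(curX, curY, x, y));
                    curX = x;
                    curY = y;
                }
                case "S" -> {   // S: stroke the current path, then start a new one
                    stroked.addAll(path);
                    path.clear();
                }
                default -> {
                    try {
                        operands.push(Double.parseDouble(token));
                    } catch (NumberFormatException e) {
                        operands.clear(); // unsupported operator: drop its operands
                    }
                }
            }
        }
        return stroked;
    }

    public static void main(String[] args) {
        String stream = "56.7 748.5 m\n136.2 748.5 l\nS";
        System.out.println(strokedSegments(stream));
    }
}
```

Note that without the final S nothing is painted: the path is constructed but never stroked.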
Starting with PDF 1.2 (1996), one could start applying filters to such content streams (page content streams, form XObjects). In most cases, you'll discover a /Filter entry with the value /FlateDecode in the stream dictionary. You'll hardly find any "modern" PDF whose content streams aren't compressed.
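/FlateDecode uses the zlib/deflate format, so Java's standard java.util.zip.Deflater and Inflater classes can illustrate what happens to the example stream above. This is a minimal sketch with hypothetical class and method names; it ignores /DecodeParms (for instance PNG predictors), which a real PDF reader has to honor.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateDemo {

    /** Compresses data into the zlib format used by /FlateDecode. */
    public static byte[] flateEncode(byte[] data) {
        Deflater deflater = new Deflater(); // default constructor writes a zlib header
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Decompresses /FlateDecode data (no /DecodeParms predictor is applied). */
    public static byte[] flateDecode(byte[] data) throws DataFormatException {
        Inflater inflater = new Inflater(); // expects the zlib header Deflater writes
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported stream");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] plain = "56.7 748.5 m\n136.2 748.5 l\nS\n".getBytes(StandardCharsets.US_ASCII);
        byte[] compressed = flateEncode(plain);
        // The compressed bytes start with a zlib header, not with readable operators.
        System.out.printf("first byte: 0x%02x%n", compressed[0]);
        System.out.println(new String(flateDecode(compressed), StandardCharsets.US_ASCII));
    }
}
```

Opening such a stream in a text editor shows only the binary output of flateEncode, which is why you no longer see operators like m, l and S in most PDFs.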
Up until PDF 1.5 (2003), all indirect objects in a PDF document, as well as the cross-reference table, were stored in ASCII in a PDF file. Starting with PDF 1.5, specific types of objects can be stored in an object stream, and the cross-reference table can be replaced by a compressed cross-reference stream. iText's PdfReader has an isNewXrefType() method to check if this is the case. Maybe that's what you're looking for. Maybe you have PDFs that need to be read by software that isn't able to read PDFs of this type, but... you're not telling us.
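If you want to check this without iText, a rough heuristic is to follow the last startxref offset and look at what's there: the keyword xref means a classic table, while an indirect object header (n g obj) means a cross-reference stream. The sketch below is hypothetical and is not how PdfReader.isNewXrefType() is implemented; it ignores hybrid-reference files (a classic table with an /XRefStm entry), damaged files, and the chance that "startxref" also occurs inside binary stream data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XrefKind {

    private static final Pattern OFFSET = Pattern.compile("\\s*(\\d+)");
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj");

    /**
     * Heuristic: true if the cross-reference section that the last startxref
     * points to is a cross-reference stream, false if it's a classic xref table.
     */
    public static boolean usesXrefStream(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so char index == byte offset.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            throw new IllegalArgumentException("no startxref keyword found");
        }
        Matcher number = OFFSET.matcher(text);
        number.region(keyword + "startxref".length(), text.length());
        if (!number.lookingAt()) {
            throw new IllegalArgumentException("no offset after startxref");
        }
        int offset = Integer.parseInt(number.group(1));
        if (offset >= text.length()) {
            throw new IllegalArgumentException("startxref offset beyond end of file");
        }
        if (text.startsWith("xref", offset)) {
            return false; // classic cross-reference table
        }
        Matcher header = OBJ_HEADER.matcher(text);
        header.region(offset, text.length());
        if (header.lookingAt()) {
            return true; // an indirect object: the cross-reference stream (PDF 1.5+)
        }
        throw new IllegalArgumentException("unrecognized data at startxref offset");
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(Path.of(args[0]));
        System.out.println(usesXrefStream(pdf) ? "cross-reference stream" : "cross-reference table");
    }
}
```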
Maybe we're completely misinterpreting the question. Maybe you want to know whether you're receiving an actual PDF or a ZIP file containing a PDF. Or maybe you want to data-mine the different filters used inside the PDF. In short: your question isn't very clear, and I hope this answer explains why you should clarify it.
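Both of those alternatives can be approached with a quick byte-level sniff: a PDF normally starts with %PDF- and a ZIP archive with the local file header signature PK\x03\x04, while stream dictionaries can be scanned for /Filter names. This is a hypothetical heuristic sketch: some readers tolerate junk before the %PDF- header, and a /Filter value given as an indirect reference (for example /Filter 5 0 R) or hidden by encryption will be missed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfSniffer {

    // /Filter followed by either a single name or an array of names.
    private static final Pattern FILTER =
            Pattern.compile("/Filter\\s*(\\[[^\\]]*\\]|/[A-Za-z0-9]+)");
    private static final Pattern NAME = Pattern.compile("/([A-Za-z0-9]+)");

    /** Classifies data by its first bytes: "pdf", "zip" or "unknown". */
    public static String kind(byte[] data) {
        if (startsWith(data, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
            return "pdf";
        }
        if (startsWith(data, new byte[] {'P', 'K', 3, 4})) { // ZIP local file header
            return "zip";
        }
        return "unknown";
    }

    /** Counts how often each filter name appears in direct /Filter entries. */
    public static Map<String, Integer> filterCounts(byte[] pdf) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<String, Integer> counts = new TreeMap<>();
        Matcher entry = FILTER.matcher(text);
        while (entry.find()) {
            Matcher name = NAME.matcher(entry.group(1));
            while (name.find()) {
                counts.merge(name.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Path.of(args[0]));
        System.out.println(kind(data));
        if (kind(data).equals("pdf")) {
            System.out.println(filterCounts(data));
        }
    }
}
```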