Can I generate a PDF from a URL instead of from a file on disk?
You can generate a PDF from any HTML InputStream. In most of the examples, we used a FileInputStream, but in chapter 4, we created reports that existed only in memory as a byte[]. In that case, we used a ByteArrayInputStream. We can also use an InputStream that was created from a URL object.
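The point that the source of the InputStream doesn't matter can be sketched with the Java standard library alone. The HtmlSources class and its method names below are hypothetical helpers, not part of the tutorial's examples, and the sketch uses a file: URL pointing at a temporary file so that it runs without network access. It only reads the bytes. In a real conversion, any of these streams would be handed to the converter instead.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class HtmlSources {

    // HTML stored in a file on disk.
    static InputStream fromFile(Path path) throws IOException {
        return new FileInputStream(path.toFile());
    }

    // HTML that exists only in memory as a byte[].
    static InputStream fromMemory(byte[] html) {
        return new ByteArrayInputStream(html);
    }

    // HTML served from a URL (http:, https:, file:, ...).
    static InputStream fromUrl(URL url) throws IOException {
        return url.openStream();
    }

    // Reads the stream to the end and closes it.
    static byte[] drain(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return stream.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<html><body><p>Hello</p></body></html>"
                .getBytes(StandardCharsets.UTF_8);
        Path file = Files.createTempFile("sample", ".html");
        try {
            Files.write(file, html);
            // A file: URL stands in for a web address to keep the sketch offline.
            URL url = file.toUri().toURL();

            boolean same = Arrays.equals(html, drain(fromFile(file)))
                    && Arrays.equals(html, drain(fromMemory(html)))
                    && Arrays.equals(html, drain(fromUrl(url)));
            System.out.println("All three sources deliver the same HTML: " + same);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

All three factory methods return a plain InputStream, so code that consumes the HTML doesn't need to know where it came from.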
Suppose that we use this URL:
public static final String ADDRESS = "https://stackoverflow.com/help/on-topic";
If we open this URL in a browser, we see the following page:
In the C07E04_CreateFromURL (Java/.NET) example, we use ADDRESS to create a Java URL object:
new C07E04_CreateFromURL().createPdf(new URL(ADDRESS), DEST);
We use the following createPdf() method:
public void createPdf(URL url, String dest) throws IOException {
    HtmlConverter.convertToPdf(url.openStream(), new FileOutputStream(dest));
}
The openStream() method gives us an InputStream that iText uses to read the HTML. Obviously, this only works on a machine that has access to the internet.
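Because the HTML now comes over the network, it can be worth guarding against a slow or unreachable server. The following is a minimal sketch that uses only the Java standard library: it opens the connection with explicit timeouts, buffers the HTML in memory, and closes the network stream before any conversion starts. The BufferedHtmlFetcher class, the fetch() method, and the timeout values are assumptions made for illustration. They are not part of the tutorial's examples or of iText. The demo reads from a file: URL so that it also runs offline.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, for illustration only.
class BufferedHtmlFetcher {

    // Downloads the complete response body, failing if the server is too slow.
    static byte[] fetch(URL url, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(connectTimeoutMs); // time allowed to connect
        connection.setReadTimeout(readTimeoutMs);       // time allowed per read
        // try-with-resources closes the network stream even if reading fails.
        try (InputStream in = connection.getInputStream()) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("page", ".html");
        try {
            Files.write(file, "<html><body>Offline demo</body></html>"
                    .getBytes(StandardCharsets.UTF_8));
            // Assumed timeouts: 5 seconds to connect, 10 seconds per read.
            byte[] html = fetch(file.toUri().toURL(), 5_000, 10_000);

            // The buffered HTML can be wrapped in an InputStream again and
            // passed on wherever url.openStream() would have been used.
            try (InputStream in = new ByteArrayInputStream(html)) {
                System.out.println("Buffered " + in.available() + " bytes of HTML");
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Buffering keeps the whole page in memory, which is the same trade-off as the byte[] reports in chapter 4. For very large pages, streaming directly from the connection may be preferable.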
For pages with lots of pictures, it can take a while for iText to download all the resources, but this FAQ page from Stack Overflow should load quickly, and the result will look like this:
Maybe an A4 page isn't the ideal page size for a web page, because the complete sidebar is missing. Let's adapt the example, and introduce a media query.
The createPdf() method of the C07E05_CreateFromURL2 (Java/.NET) example looks like this:
public void createPdf(URL url, String dest) throws IOException {
    PdfWriter writer = new PdfWriter(dest);
    PdfDocument pdf = new PdfDocument(writer);
    PageSize pageSize = new PageSize(850, 1700);
    pdf.setDefaultPageSize(pageSize);
    ConverterProperties properties = new ConverterProperties();
    MediaDeviceDescription mediaDeviceDescription =
        new MediaDeviceDescription(MediaType.SCREEN);
    mediaDeviceDescription.setWidth(pageSize.getWidth());
    properties.setMediaDeviceDescription(mediaDeviceDescription);
    HtmlConverter.convertToPdf(url.openStream(), pdf, properties);
}
We use a custom page size of 850 by 1700 user units, and we use the SCREEN media type, as we did in chapter 2. Now the content fits the page, and we get a much better result:
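To get a feeling for how large such a page is, the user units can be converted to physical dimensions. The sketch below assumes the default PDF user unit of 1/72 inch and compares the custom page with A4, taken here as 595 by 842 user units. The PageSizeUnits class and its method names are hypothetical and only serve this illustration.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class PageSizeUnits {

    // Assumption: the default PDF user unit, 1/72 of an inch.
    static final double UNITS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    static double toInches(double userUnits) {
        return userUnits / UNITS_PER_INCH;
    }

    static double toMillimeters(double userUnits) {
        return toInches(userUnits) * MM_PER_INCH;
    }

    // Formats a page size in user units as millimeters, rounded to whole numbers.
    static String describe(String name, double width, double height) {
        return String.format(Locale.ROOT, "%s: %.0f x %.0f mm",
                name, toMillimeters(width), toMillimeters(height));
    }

    public static void main(String[] args) {
        System.out.println(describe("A4", 595, 842));
        System.out.println(describe("Custom", 850, 1700));
    }
}
```

Under that assumption, the custom page is roughly 300 mm wide, against about 210 mm for A4. The extra width is what gives the sidebar room to appear.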
Sure, there are still some imperfections. For instance, the items in the header bar are shown as a list instead of as items in a menu bar, but we plan to solve these issues in future versions of pdfHTML.
We could also have used the media type PRINT instead of SCREEN. See the C07E06_CreateFromURL3 (Java/.NET) example:
public void createPdf(URL url, String dest) throws IOException {
    ConverterProperties properties = new ConverterProperties();
    MediaDeviceDescription mediaDeviceDescription =
        new MediaDeviceDescription(MediaType.PRINT);
    properties.setMediaDeviceDescription(mediaDeviceDescription);
    HtmlConverter.convertToPdf(url.openStream(), new FileOutputStream(dest), properties);
}
Because of the print.css used by Stack Overflow, we now have a couple of bare-bones pages in which the sidebar is deliberately omitted. Maybe that's exactly what we want:
Important: pdfHTML is a work in progress. If you have ever printed a web page to paper from a browser, you will have noticed that the results aren't always as good as you'd want them to be. The same is true when using pdfHTML as a URL2PDF tool. Most HTML pages aren't meant to be printed, but we're making a continuous effort to improve pdfHTML's conversion process.