I want to save an HTML file generated by ASP.NET as a PDF.
I was pointed to the iTextSharp open source project.
I found a few links discussing how to do it:
http://www.velocityreviews.com/forums/t72716-using-itextsharp-to-generate-pdf-from-aspnet.html
iTextSharp Tutorial Chapter 7: XML and (X)HTML
iTextSharp Demo (asp.net 2.0): http://rubypdf.com/itextsharp/tutorial01/ap07Chap0707.cs.html introduces HtmlParser.Parse (see the source code here).
We tried to use it.
HtmlParser.Parse does NOT throw any error, but the PDF file it generates can be blank/empty.
The debug output shows messages from the parser if the HTML file has an invalid structure.
This is a big problem: HtmlParser.Parse is very strict, and any minor mistake in the HTML causes either an exception or the almost silent creation of an empty PDF file.
The post Creating pdf in .NET from html has a lot of interesting comments, including a suggestion to use HTML Agility Pack.
We are going to test how tolerant HtmlParser.Parse is of HTML regenerated by HTML Agility Pack.
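Here is a rough sketch of what that experiment could look like. It assumes references to both HTML Agility Pack and an iTextSharp version that still ships HtmlParser; the class name, method name and file paths are made up for illustration. It follows the pattern of the tutorial sample linked above, where the parser itself opens and closes the document. Whether HtmlParser.Parse actually accepts the regenerated markup is exactly what still has to be tested.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class HtmlToPdfSketch
{
    // Hypothetical helper: htmlPath and pdfPath are placeholders.
    public static void Convert(string htmlPath, string pdfPath)
    {
        // Let HTML Agility Pack load the (possibly sloppy) HTML
        // and write it back out as XML.
        HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
        html.OptionOutputAsXml = true;
        html.Load(htmlPath);

        string cleanedPath = Path.ChangeExtension(pdfPath, ".cleaned.html");
        html.Save(cleanedPath);

        // Feed the regenerated file to iTextSharp's HTML parser.
        Document pdf = new Document();
        using (FileStream output = new FileStream(pdfPath, FileMode.Create))
        {
            PdfWriter.GetInstance(pdf, output);
            iTextSharp.text.html.HtmlParser.Parse(pdf, cleanedPath);
        }
    }
}
```

Keeping the intermediate ".cleaned.html" file around makes it easier to compare what the parser received with the original page when the PDF still comes out empty.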
The thread [ 1819614 ] Error parsing images in HTML files describes a fix for that error.
Another option is to always use XML-compliant HTML, verified by http://validator.w3.org/#validate_by_input, but it could take some time to tidy up the HTML generated by ASP.NET.
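Before handing a page to the parser, a quick local check can tell whether the markup is at least well-formed XML. The sketch below uses only System.Xml; the class and method names are made up, and it assumes .NET 4 or later for DtdProcessing. It only checks well-formedness, not XHTML validity, so it is not a replacement for the W3C validator. Because the DOCTYPE is ignored, I would expect named entities such as &nbsp; to be reported as errors too.

```csharp
using System.IO;
using System.Xml;

public static class XhtmlCheck
{
    // Returns null when the markup is well-formed XML,
    // otherwise the XML parser's error message.
    public static string FindXmlError(string markup)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Ignore; // skip the DOCTYPE instead of rejecting it
        settings.XmlResolver = null;                   // never fetch external resources

        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(markup), settings))
            {
                // Reading to the end forces the whole document to be parsed.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            // The message includes the line and position of the problem.
            return ex.Message;
        }
    }
}
```

Calling XhtmlCheck.FindXmlError on the rendered page and logging a non-null result gives an early warning in place of a silently empty PDF.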
http://www.google.com.au/search?source=ig&hl=en&rlz=&q=HtmlParser.Parse&meta=
Links to other products:
Generate PDF from ASP.NET gives a few references to different products, including iTextSharp.
Dynamically Generating PDFs in .NET : http://www.developerfusion.co.uk/show/6623/
Another option is to try (and possibly buy) the commercial product abcpdf.
I saw a suggestion to use the command-line version of HTMLDoc (http://www.htmldoc.org/) to convert HTML to PDF, but it is not well suited to programmatic access.