Converting PDF to HTML Format

Export PDF files to HTML using Acrobat XI. Learn how to convert PDF to HTML so that the editable HTML file retains images, tables, hyperlinks, and the table of contents.

Easily export or convert one or more PDFs to different file formats, including Microsoft Word, Excel, and PowerPoint. The available formats include both text and image formats. (For a full list of conversion options, see File format options.)

Note:

This document provides instructions for Acrobat DC and Acrobat 2017. If you're using Acrobat XI, see Acrobat XI Help.

Convert PDFs to Word, RTF, spreadsheets, PowerPoint, or other formats

Note:

You cannot export PDF Portfolios, or PDFs within them, to other file formats.

  1. Open the PDF in Acrobat, and then choose Tools > Export PDF.

    The various formats to which you can export the PDF file are displayed.

    Note:

    You can also choose File > Export To > [File Type] to export the PDF file to a desired format.

  2. Select the file format to which you want to export the PDF file and a version (or format), if available. For example, if you choose to export the PDF file to Word format, you will get an option to export the PDF into Word Document (.docx) or Word 97-2003 Document (.doc) version.

    Note:

    You can configure the conversion settings by clicking the gear icon adjacent to the selected file format. Conversion settings can also be edited by selecting the Convert From PDF category in the Preferences dialog box.

  3. Export your PDF document to a local folder or Adobe Document Cloud.

  4. In the Export dialog box, select a location where you want to save the file.

  5. Click Save to export the PDF to the selected file format.

    By default, the source filename is used with the new extension, and the exported file is saved in the same folder as the source file.

    Note:

    When you save a PDF in an image format, each page is saved as a separate file, and each filename is appended with the page number.
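The default naming described above can be illustrated with a minimal sketch. The function names are hypothetical, and the `_Page_N` suffix is an assumption made only for illustration; the text says only that the page number is appended to each filename, not how it is formatted.

```python
from __future__ import annotations

from pathlib import Path


def default_export_path(source: str, extension: str) -> Path:
    """Return the source path with its extension swapped for the new one.

    The folder and base name are unchanged, matching the default behavior
    described in the text.
    """
    return Path(source).with_suffix("." + extension.lstrip("."))


def page_image_paths(source: str, extension: str, page_count: int) -> list[Path]:
    """Return one path per page, with the page number appended to the name.

    The "_Page_N" pattern is an assumed format, not taken from the source.
    """
    src = Path(source)
    ext = "." + extension.lstrip(".")
    return [
        src.with_name(f"{src.stem}_Page_{n}{ext}")
        for n in range(1, page_count + 1)
    ]


print(default_export_path("reports/summary.pdf", "html"))
for path in page_image_paths("reports/summary.pdf", "png", 3):
    print(path)
```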

You can configure conversion options before you save the file. By default, the conversion options specified under Preferences are used.

  1. Choose Edit > Preferences > Convert From PDF.
  2. Choose a format from the Converting From PDF list, and then click Edit Settings.
  3. Choose the conversion settings, and then click OK.

In addition to saving every page (all text, images, and vector objects on a page) to an image format using the File > Export To > Image > [Image Type] command, you can export each image in a PDF to a separate image file.

Note:

You can export raster images, but not vector objects.

  1. Open the PDF in Acrobat, and then choose Tools > Export PDF.

    The various formats to which you can export the PDF file are displayed.

  2. Click Image and then choose the image file format that you want to save the images in.

  3. To configure the conversion settings for the selected file format, click the gear icon.

  4. In the Export All Images As [selected file format] Settings dialog box, specify the File Settings, Color Management, Conversion, and Extraction settings for the file type.

  5. In the Extraction settings, for Exclude Images Smaller Than, select the smallest size of image to be extracted. Select No Limit to extract all images.

  6. Click OK to return to the Export Your PDF To Any Format screen.

  7. Select the Export All Images option to extract and save only the images from the PDF file.

    Note:

    If you do not select the Export All Images option, all pages within the PDF are saved in the selected image file format.

  8. In the Export dialog box, select a location where you want to save the file.

  9. Click Save to save only the images from the PDF to the selected file format.

If you need just a part of the PDF file in another format, you don’t need to convert the entire file and then extract the relevant content. You can select text in a PDF file and save it in one of the supported formats: DOCX, DOC, XLSX, RTF, XML, HTML, or CSV.

  1. Right-click the selected text and choose Export Selection As.

  2. Select a format from the Save As Type list and click Save.

Is there a .dll I can use which takes a PDF file as input and produces an HTML file as output? I want to convert from PDF to HTML. My colleague says that it's very difficult to go step by step, getting text/font/image/margins/links etc. from the PDF and then creating a new HTML file with the same content. He says it's nearly impossible. So I was thinking: is there some dll which I can reference to do that?

wonea
petko_stankoski

closed as not constructive by Richard, L.B, Dustin Davis, Enigma State, ChrisF Nov 14 '11 at 17:01

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. If this question can be reworded to fit the rules in the help center, please edit the question.

3 Answers

Writing a program to do it is definitely not trivial. If you don't find any .NET library to do this (I couldn't, at least not a free one), I would just download PDFToHtml and invoke it programmatically to get my HTML.

If you have the time to spare and/or PDFToHtml does not produce acceptable output for you, you could use iText to write the program yourself. It's a very mature free pdf library. I've used it in the past to manipulate PDFs (merge, create, etc).

UPDATE

As noted in the comment by Quandary, the PDFSharp library offers a more relaxed license (MIT) compared to the Commercial or AGPL license offered by iText. Keep this in mind when choosing your library. I have not used the PDFSharp library myself and I don't know how they compare in terms of functionality.

Icarus

You can download this free tool: PDFToHTML

Then in your program just fork a new process and run the executable passing the PDF file. I just tested it now and it seems to work ok.
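The approach above, running the converter as a child process, can be sketched as follows. This is only an illustration under stated assumptions: the executable name `pdftohtml`, the argument order (input PDF, then output name), and the helper names `build_command` and `convert` are not given in the answer, so check the tool's own documentation before relying on them. The sketch is in Python; in a .NET program the same pattern would go through `System.Diagnostics.Process`.

```python
import shutil
import subprocess


def build_command(pdf_path: str, output_base: str, exe: str = "pdftohtml") -> list:
    """Build the argument list for the external converter.

    Assumed layout: executable, input PDF, output name. Verify against
    the documentation of the tool you actually install.
    """
    return [exe, pdf_path, output_base]


def convert(pdf_path: str, output_base: str, exe: str = "pdftohtml") -> None:
    """Run the external converter as a child process and wait for it to finish."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"converter executable not found: {exe}")
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    subprocess.run(build_command(pdf_path, output_base, exe), check=True)


# Example call (requires the converter to be installed and on PATH):
# convert("input.pdf", "output")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file paths contain spaces.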

Tudor

If you don't mind paying, Aspose offers a very good solution. This is what we use at my company.

Calum
