Converting PDF to Word #1379 24 Oct 07 03:58 PM
Steven Shatz (OP, Member; Joined: Jun 2001; Posts: 713)
Does anyone know of a Linux program that can be invoked from the command line to convert a PDF file to MS Word format? The PDF files are almost entirely text. The resulting files must be editable using MS Word.

Re: Converting PDF to Word #1380 24 Oct 07 10:50 PM
Anonymous (Unregistered)
pdftoword is not free, but it is an option. There's also pdftotext (it may already be installed; it's part of the poppler-utils and xpdf-utils Debian packages that Ubuntu uses), which generates plain-text files (UNIX style: LF line endings, no carriage return) that can be opened by MS Word.
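
For illustration only, here is a minimal Python sketch of the pdftotext route. It assumes pdftotext is on the PATH; the function names and file names are hypothetical. It extracts the text and then rewrites the line endings as CRLF, in case the Windows side prefers them.

```python
import subprocess
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CRLF without doubling existing CRLF pairs."""
    # Normalize to LF first, then expand every LF to CRLF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def pdf_to_crlf_text(pdf_path: str, txt_path: str) -> None:
    """Run pdftotext on pdf_path, then rewrite txt_path with CRLF line endings."""
    # -layout asks pdftotext to keep the physical page layout as far as it can.
    subprocess.run(["pdftotext", "-layout", pdf_path, txt_path], check=True)
    out = Path(txt_path)
    # Work on raw bytes so the text encoding chosen by pdftotext is left alone.
    out.write_bytes(to_crlf(out.read_bytes()))
```

A call would look like pdf_to_crlf_text("letter.pdf", "letter.txt"). Only the text survives this route; logos and signatures are lost.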

Re: Converting PDF to Word #1381 25 Oct 07 11:59 AM
Steven Shatz (OP, Member)
Mike, thanks for the info, but I'm confused. When I Google "pdftoword" I find a Windows program at www.verypdf.com. Is there a Linux version of this, too? Could you send me a link to the Linux download?

The other program, "pdftotext", is probably not suitable for our needs, since we want the resulting Word doc to look as much like the original PDF file as possible, including a graphic logo and signatures.

Re: Converting PDF to Word #1382 01 Nov 07 09:59 AM
Anonymous (Unregistered)
Sorry, Steven, I assumed you just wanted the text. There's not much available for Word under Linux but then think of the mentality: converting a proprietary format to another proprietary format just isn't kosher or The GNU/Linux Way or something (is this religious or political?)

Anyway, I couldn't find anything under Linux except for a lot of other people looking to do the same thing. There may be a workaround, though: there is a PDF to HTML converter available at http://pdftohtml.sourceforge.net/ that will get you a document editable by Word or other (Linux) word processors or HTML editors. That page shows an example of a PDF document converted to HTML.
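
A rough sketch of driving that converter from a script, assuming a pdftohtml binary is on the PATH and accepts the -noframes and -c options (check your installed version's help output; the function names and paths here are made up for the example):

```python
import subprocess


def pdftohtml_command(pdf_path: str, html_base: str, complex_layout: bool = True) -> list[str]:
    """Build the pdftohtml command line for one PDF file."""
    # -noframes writes a single HTML document instead of a frameset.
    cmd = ["pdftohtml", "-noframes"]
    if complex_layout:
        # -c selects the "complex" output mode, which tries to keep positioning.
        cmd.append("-c")
    cmd += [pdf_path, html_base]
    return cmd


def convert_pdf_to_html(pdf_path: str, html_base: str) -> None:
    """Run pdftohtml and raise CalledProcessError if it reports failure."""
    subprocess.run(pdftohtml_command(pdf_path, html_base), check=True)
```

For example, convert_pdf_to_html("letter.pdf", "letter") should leave HTML output that Word can open, though how closely it matches the original layout will vary by document.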

I'm looking at html2doc, which is actually a PHP script, to get the file all the way to Word format. Oddly enough, it still requires Windows, but there seems to have been a Linux conversion that I'm tracking down now. However, things just got real busy here and I might not finish before I leave for the weekend.

Re: Converting PDF to Word #1383 01 Nov 07 12:00 PM
Steven Shatz (OP, Member)
Thank you for doing all that research, Mike. I've concluded that there is no good way to use Linux to convert a PDF to Word format. Instead, I will have to use one of the many Windows-based programs and interface with it.

Essentially, one A-Shell program will place PDF files on a Samba share, where a Perl program will detect them and submit them to the converter, while another A-Shell program polls a results location to retrieve and process the DOC files. (Can an A-Shell program do what the Perl program would do? That is, invoke a Windows program running on a network server?)
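
To make the detect-and-submit step concrete, here is a small Python stand-in for the Perl watcher described above. It is only a sketch: the directory layout, the .doc naming rule, and converter_cmd are all assumptions, and the real converter's command line depends on whichever Windows product is chosen.

```python
import subprocess
import time
from pathlib import Path


def pending_pdfs(inbox: Path, outbox: Path) -> list[Path]:
    """Return PDFs in inbox that do not yet have a matching .doc in outbox."""
    return sorted(
        p for p in inbox.glob("*.pdf")
        if not (outbox / (p.stem + ".doc")).exists()
    )


def run_once(inbox: Path, outbox: Path, converter_cmd: list[str]) -> int:
    """Submit each pending PDF to the converter; return how many were submitted."""
    count = 0
    for pdf in pending_pdfs(inbox, outbox):
        target = outbox / (pdf.stem + ".doc")
        # converter_cmd is a placeholder for the real converter invocation;
        # here it is assumed to take the input and output paths as arguments.
        subprocess.run(converter_cmd + [str(pdf), str(target)], check=True)
        count += 1
    return count


def watch(inbox: Path, outbox: Path, converter_cmd: list[str], interval: float = 5.0) -> None:
    """Poll the inbox forever, converting new PDFs as they appear."""
    while True:
        run_once(inbox, outbox, converter_cmd)
        time.sleep(interval)
```

The program that retrieves the results can poll the outbox the same way. A real version would also need to avoid picking up a PDF that is still being written to the share, for example by writing under a temporary name and renaming when done.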

Re: Converting PDF to Word #1384 01 Nov 07 12:23 PM
Jack McGregor (Member; Joined: Jun 2001; Posts: 11,645)
It depends on the mechanism/interface required. You said that the Perl program will "submit them to the converter". If the converter is an executable running on the local machine, no problem, just HOSTEX it. If the submit is itself a file-based operation, again, no problem. If it is a TCP or FIFO interface, you can use TCPX.SBR or FIFO.SBR. But if it is an RPC operation, A-Shell doesn't support that directly (although you could probably create the equivalent mechanism by running an A-Shell program on the network server that your A-Shell/Linux program talks to).

You can also execute Perl scripts just by using HOSTEX (under UNIX) or MX_SHELLEX (under Windows) on them, provided Perl is installed. I've given some thought to supporting embedded Perl statements inside of Basic, but I'm not sure it's worth the additional dependencies/baggage (involving dynamic or static links between A-Shell and Perl libraries) that it would require.

Re: Converting PDF to Word #1385 01 Nov 07 12:34 PM
Anonymous (Unregistered)
Check out the socket programming chapter in the A-Shell development guide. I've done socket programming from a Softworks Basic client to a Windows server using Micro Sabio's TCPX subroutine. IIRC I had to send a close connection to get the service (written in Visual Basic) to send its response. I always felt that was a problem with .NET, VB itself, or the VB programmer's code, not TCPX.
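
For anyone hitting the same symptom, one common way to signal "request finished" is a TCP half-close. The Python sketch below shows the general idea only; it is not TCPX, the function name is invented, and whether the VB service was actually waiting for end-of-stream is a guess.

```python
import socket


def request_with_half_close(host: str, port: int, payload: bytes, timeout: float = 10.0) -> bytes:
    """Send payload, signal end-of-request, then read the reply until EOF."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        # Half-close: tell the server no more data is coming, while keeping
        # the receive side open so the reply can still be read.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                # An empty read means the server closed its side.
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

A server that reads until end-of-stream before answering will only respond once the client's sending side is shut down, which would look like "had to send a close to get the response".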


Moderated by  Jack McGregor, Ty Griffin 