AbcLinuxu portal, 10 May 2025 06:04
    txt2pdbdoc -d George\ Orwell\ -\ 1984.pdb George_Orwell-1984.txt

Then I check the encoding of the resulting file:

    file -i George_Orwell-1984.txt
    George_Orwell-1984.txt: text/plain; charset=unknown-8bit

Since I think the encoding is cp1250 (i.e. the classic Windows one), I convert it to UTF-8:

    iconv -f cp1250 -t utf-8 George_Orwell-1984.txt > George_Orwell-1984_utf8.txt

The resulting file reads fine on my computer, and file reports this encoding:

    file -i George_Orwell-1984_utf8.txt
    George_Orwell-1984_utf8.txt: text/plain; charset=utf-8

However, when I try to convert it to ISO-8859-2, iconv returns an error:

    iconv -f utf-8 -t iso-8859-2 George_Orwell-1984_utf8.txt > George_Orwell-1984_iso.txt
    iconv: illegal input sequence at position 3552

At this position there is a low quotation mark with the following codes:

    U+201A SINGLE LOW-9 QUOTATION MARK = low single comma quotation mark
    binary: 10000000011010
    UTF-8: 0xe2 0x80 0x9a

So I try transliteration, which works:

    iconv -f utf-8 -t iso-8859-2//TRANSLIT George_Orwell-1984_utf8.txt > George_Orwell-1984_iso.txt

The catch is that file reports the result as ISO-8859-1:

    file -i George_Orwell-1984_iso.txt
    George_Orwell-1984_iso.txt: text/plain; charset=iso-8859-1

My phone can't handle this file, and it can't handle any of the intermediate ones either. I also tried converting to cp1250, with a similar result. The same goes for UTF-8.
Software/hardware used: Linux, Debian Squeeze
Locale:
    LANG=en_US.utf8
    LC_CTYPE="en_US.utf8"
    LC_NUMERIC="en_US.utf8"
    LC_TIME="en_US.utf8"
    LC_COLLATE="en_US.utf8"
    LC_MONETARY="en_US.utf8"
    LC_MESSAGES="en_US.utf8"
    LC_PAPER="en_US.utf8"
    LC_NAME="en_US.utf8"
    LC_ADDRESS="en_US.utf8"
    LC_TELEPHONE="en_US.utf8"
    LC_MEASUREMENT="en_US.utf8"
    LC_IDENTIFICATION="en_US.utf8"
    LC_ALL=
Phone: Nokia E52, MobiReader

Does anyone have an idea what to do about this? I'd like to read a few books on my phone, but with broken encoding it isn't exactly comfortable. I'm attaching the source pdb file.
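The conversion chain described in the post can be replayed in Python to see exactly where it breaks. This is a minimal sketch that makes several assumptions:

- The extracted text is taken to be cp1250.
- The sample sentence is made up.
- The TYPOGRAPHIC_TO_ASCII table and the helpers unencodable(), to_latin2() and has_c1_bytes() are hypothetical. They are not a reproduction of iconv's //TRANSLIT rules.

The sketch shows that the Czech typographic quotes (U+201E, U+201C, U+201A, U+2018) have no slot in ISO-8859-2, which is why the strict conversion stops. One difference in reporting: as far as I know, iconv's "position" is a byte offset into its input, whereas the sketch reports character indices.

```python
# Sketch of the cp1250 -> UTF-8 -> ISO-8859-2 chain using Python's built-in codecs.
# Assumption: the text extracted by txt2pdbdoc really is cp1250.

# Hypothetical replacement table (an illustrative choice, not iconv's //TRANSLIT):
# typographic characters that cp1250 has but ISO-8859-2 does not.
TYPOGRAPHIC_TO_ASCII = {
    "\u201a": ",",    # SINGLE LOW-9 QUOTATION MARK (the character iconv choked on)
    "\u201e": '"',    # DOUBLE LOW-9 QUOTATION MARK (Czech opening quote)
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK (Czech closing quote)
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u2013": "-",    # EN DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
}


def unencodable(text: str, encoding: str) -> list[tuple[int, str]]:
    """List (character index, character) pairs that `encoding` cannot represent."""
    bad = []
    for i, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((i, ch))
    return bad


def to_latin2(text: str) -> bytes:
    """Replace the known typographic characters, then encode strictly as ISO-8859-2."""
    return text.translate(str.maketrans(TYPOGRAPHIC_TO_ASCII)).encode("iso-8859-2")


def has_c1_bytes(data: bytes) -> bool:
    """True if any byte is in 0x80-0x9F, the range ISO-8859-x leaves to control codes."""
    return any(0x80 <= b <= 0x9F for b in data)


# Made-up sample, stored the way a cp1250 file would store it.
raw = "\u201eVálka je mír\u201c, řekl \u201aon\u2018.".encode("cp1250")

utf8 = raw.decode("cp1250").encode("utf-8")  # like: iconv -f cp1250 -t utf-8
text = utf8.decode("utf-8")

print(unencodable(text, "iso-8859-2"))  # the characters that stop a strict conversion
latin2 = to_latin2(text)                # succeeds once the quotes are replaced
print(has_c1_bytes(raw), has_c1_bytes(latin2))  # True False
```

The has_c1_bytes() check may explain the labels that file printed:

- cp1250 stores its typographic quotes in bytes 0x80-0x9F, a range the ISO-8859 family leaves to control codes. That is presumably why the original file came out as unknown-8bit.
- Once those bytes are gone, file (as far as I understand its heuristics) reports any remaining 8-bit text as iso-8859-1. It cannot tell ISO-8859-2 from ISO-8859-1 by content alone.

So the iso-8859-1 label on its own does not prove that the transliterated file is wrong.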
ISSN 1214-1267, (c) 1999-2007 Stickfish s.r.o.