AbcLinuxu portal, 12 May 2025, 12:29
Greetings to the insomniacs :) For quite a while now I have been struggling with processing input data of this form:
#attr2:attr3:attr6:attr5
value2r1:value3r1:value6r1:value5r1
#attr1:attr5:attr3:attr2:attr6
value1r2:value5r2:value3r2:value2r2:value6r2
value1r3:value5r3:value3r3:value2r3:value6r3
#attr8:attr2:attr3:attr4:attr5:attr7:attr1
#attr1:attr2:attr3:attr5
value1r4:value2r4:value3r4:value5r4
Rules for the format of the input file:
My goal is to get the data into this form:
#attr1:attr2:attr3:attr4:attr5:attr6:attr7:attr8
:value2r1:value3r1::value5r1:value6r1
value1r2:value2r2:value3r2::value5r2:value6r2
value1r3:value2r3:value3r3::value5r3:value6r3
value1r4:value2r4:value3r4::value5r4
or alternatively:
#attr1:attr2:attr3:attr4:attr5:attr6:attr7:attr8
:value2-1:value3-1::value5-1:value6-1::
value1-2:value2-2:value3-2::value5-2:value6-2::
value1-3:value2-3:value3-3::value5-3:value6-3::
value1-4:value2-4:value3-4::value5-4:::
In other words, I want to align the values vertically and sort them by attrX; the attribute names are printed on the first line of the output file as a header. The goal is to prepare the file for further processing in a spreadsheet. Multiple colons at the end of a line, as shown above, do no harm, but they are not needed either. (I assume that allowing them would simplify the code, and the spreadsheet will ignore them anyway.)
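The difference between the two target forms above comes down to whether the empty fields at the end of a row are kept. A minimal sketch in Python, assuming a row is already a list of values in header order; `join_row` and its `keep_trailing` parameter are hypothetical names used only for illustration:

```python
def join_row(values, sep=":", keep_trailing=False):
    """Join one output row; optionally drop the empty fields at the end."""
    line = sep.join(values)
    # Values cannot contain the separator, so stripping it from the right
    # only removes empty trailing fields.
    return line if keep_trailing else line.rstrip(sep)


# Fourth record from the example, laid out for attr1..attr8.
row = ["value1r4", "value2r4", "value3r4", "", "value5r4", "", "", ""]
print(join_row(row, keep_trailing=True))
print(join_row(row))
```

Keeping the trailing separators means every row has the same number of fields, which is the simpler variant to generate.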
So far I have arrived at the following:
awk '
BEGIN { OFS=FS=":"; record=0 }

# start of a block: store the attribute names in attr[i]
{
    if ($0 ~ /^#/) {
        columns=split(substr($0,2),attr);
        #print"\n"; for(i in attr) print "attr["i"] = "attr[i];   #debug1
        next;
    }

    # value rows: store the values in r[record, attr[i]]
    record++;
    numvalues=split($0,value);
    for(i=1;i<=numvalues;i++) {
        r[record SUBSEP attr[i]]= value[i];
        print "r ["record", "attr[i]"] = "r[record SUBSEP attr[i]];   #debug2
    }
}

END{
    print "-----"   #debug3
    for (combined in r) {
        #print combined;   #debug4
        num=split(combined, separate, SUBSEP);
        #print separate[1], separate[2], r[separate[1] SUBSEP separate[2]];   #debug5
        # and what next?
    }
}'
The output of debug2 is:

r [1, attr2] = value2r1
r [1, attr3] = value3r1
r [1, attr6] = value6r1
r [1, attr5] = value5r1
r [2, attr1] = value1r2
r [2, attr5] = value5r2
r [2, attr3] = value3r2
r [2, attr2] = value2r2
r [2, attr6] = value6r2
r [3, attr1] = value1r3
r [3, attr5] = value5r3
r [3, attr3] = value3r3
r [3, attr2] = value2r3
r [3, attr6] = value6r3
r [4, attr1] = value1r4
r [4, attr2] = value2r4
r [4, attr3] = value3r4
r [4, attr5] = value5r4

so I believe the data is parsed and stored in the associative array correctly.
My problem is how to assemble the final output from this. I need to iterate over this array in some sensible way, but the only thing that comes to mind is the classic construct of two nested for loops, which will not work here because I do not have two usable indexes.
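One way around the missing indexes is to keep them separately: the record counter already gives the row index, and a sorted collection of every attribute name seen gives the column index. The sketch below illustrates that idea in Python rather than awk, with a dictionary keyed by `(record, attribute)` standing in for `r[record, attr[i]]`. The names `parse`, `render` and `SAMPLE` are made up for this example, and the sample input is shortened.

```python
SAMPLE = """\
#attr2:attr3
value2r1:value3r1
#attr1:attr3
value1r2:value3r2
"""


def parse(lines):
    """Return (record_count, sorted attribute names, cells)."""
    cells = {}       # (record number, attribute name) -> value
    names = set()    # every attribute name seen in any header
    header = []      # attribute order of the current block
    record = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            header = line[1:].split(":")
            names.update(header)
            continue
        record += 1
        for name, value in zip(header, line.split(":")):
            cells[(record, name)] = value
    return record, sorted(names), cells


def render(record_count, names, cells):
    """Build the output lines with two nested loops over the kept indexes."""
    out = ["#" + ":".join(names)]
    for rec in range(1, record_count + 1):      # outer loop: record number
        fields = []
        for name in names:                      # inner loop: attribute name
            fields.append(cells.get((rec, name), ""))
        out.append(":".join(fields))
    return out


count, names, cells = parse(SAMPLE.splitlines())
print("\n".join(render(count, names, cells)))
```

The point of the sketch is that the loops run over the record numbers and the attribute names, not over the keys of the combined array, so missing cells simply come out as empty fields.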
Any ideas? :)
Solution to the question:
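import sys

attrs = set()
data = []
currentattrs = []  # attribute order of the current block
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    if line[0] == '#':
        # header line: remember its attribute order and collect the names
        currentattrs = line[1:].split(':')
        attrs.update(currentattrs)
    else:
        # value line: pair each value with the attribute at the same position
        data.append(dict(zip(currentattrs, line.split(':'))))

keys = sorted(attrs)
print('#' + ':'.join(keys))
for row in data:
    print(':'.join(row.get(k, '') for k in keys))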
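A caveat on sorting the attribute names: plain string sorting is lexicographic, so a name such as attr10 would land before attr2. The sample data only goes up to attr8, so this does not show up there. If larger numbers can occur, a natural-order sort key is one option; the sketch below assumes names of the form text plus digits, and `natural_key` is a hypothetical helper, not part of any answer in this thread.

```python
import re


def natural_key(name):
    """Split a name into text and number parts so digit runs compare as integers."""
    # re.split with a capturing group keeps the digit runs in the result,
    # e.g. "attr10" -> ["attr", "10", ""].
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


names = ["attr10", "attr2", "attr1"]
print(sorted(names))                   # lexicographic order
print(sorted(names, key=natural_key))  # numeric-aware order
```

Passing `key=natural_key` to the `sorted` call that produces the header would be the only change needed.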
BEGIN { OFS=FS=":"; record=0 }

# start of a block: store the attribute names in attr[i]
{
    if ($0 ~ /^#/) {
        columns = split(substr($0, 2), attr);
        for (c in attr) {
            # remember every attribute seen so far
            all[attr[c]] = attr[c];
        }
        next;
    }

    # value rows: store the values in r[record, attr[i]]
    record++;
    numvalues = split($0, value);
    for (i = 1; i <= numvalues; i++) {
        r[record, attr[i]] = value[i];
    }
}

END {
    nc = asort(all, sort);   # asort() is a gawk extension

    # print the header
    ORS = ""; print("#");
    ORS = ":";
    for (c = 1; c < nc; c++) {
        print(sort[c]);
    }
    ORS = "\n"; print(sort[c]);

    # print the contents
    for (l = 1; l <= record; l++) {
        ORS = ":";
        for (c = 1; c < nc; c++) {
            print(r[l, sort[c]]);
        }
        ORS = "\n"; print(r[l, sort[c]]);
    }
}

Result:
$ awk -f s.awk < in.txt
#attr1:attr2:attr3:attr4:attr5:attr6:attr7:attr8
:value2r1:value3r1::value5r1:value6r1::
value1r2:value2r2:value3r2::value5r2:value6r2::
value1r3:value2r3:value3r3::value5r3:value6r3::
value1r4:value2r4:value3r4::value5r4:::
ISSN 1214-1267, (c) 1999-2007 Stickfish s.r.o.