-
Notifications
You must be signed in to change notification settings - Fork 17
/
Copy pathSpreadsheet-Read.html
1280 lines (886 loc) · 53.1 KB
/
Spreadsheet-Read.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Spreadsheet::Read - Read the data from a spreadsheet</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:[email protected]" />
</head>
<body>
<ul id="index">
<li><a href="#NAME">NAME</a></li>
<li><a href="#SYNOPSIS">SYNOPSIS</a></li>
<li><a href="#DESCRIPTION">DESCRIPTION</a>
<ul>
<li><a href="#Supported-spreadsheets">Supported spreadsheets</a>
<ul>
<li><a href="#OpenOffice-and-LibreOffice-ODS-and-SXC">OpenOffice and LibreOffice (ODS and SXC)</a></li>
<li><a href="#Microsoft-Excel-XLSX-and-XLS">Microsoft Excel (XLSX and XLS)</a></li>
<li><a href="#CSV-CSV">CSV (CSV)</a></li>
<li><a href="#Gnumeric-gnumeric">Gnumeric (gnumeric)</a></li>
<li><a href="#SquirrelCalc-sc">SquirrelCalc (sc)</a></li>
</ul>
</li>
<li><a href="#Data-structure">Data structure</a></li>
<li><a href="#Formatted-vs-Unformatted">Formatted vs Unformatted</a></li>
<li><a href="#Functions-and-methods">Functions and methods</a>
<ul>
<li><a href="#new">new</a></li>
<li><a href="#ReadData">ReadData</a></li>
<li><a href="#col2label">col2label</a></li>
<li><a href="#cr2cell">cr2cell</a></li>
<li><a href="#cell2cr">cell2cr</a></li>
<li><a href="#row">row</a></li>
<li><a href="#cellrow">cellrow</a></li>
<li><a href="#rows">rows</a></li>
<li><a href="#parses">parses</a></li>
<li><a href="#parsers">parsers</a></li>
<li><a href="#Version">Version</a></li>
<li><a href="#sheets">sheets</a></li>
<li><a href="#sheet">sheet</a></li>
<li><a href="#add">add</a></li>
</ul>
</li>
<li><a href="#Methods-on-sheets">Methods on sheets</a>
<ul>
<li><a href="#maxcol">maxcol</a></li>
<li><a href="#maxrow">maxrow</a></li>
<li><a href="#cell">cell</a></li>
<li><a href="#attr1">attr</a></li>
<li><a href="#col2label1">col2label</a></li>
<li><a href="#cr2cell1">cr2cell</a></li>
<li><a href="#cell2cr1">cell2cr</a></li>
<li><a href="#col">col</a></li>
<li><a href="#cellcolumn">cellcolumn</a></li>
<li><a href="#row1">row</a></li>
<li><a href="#cellrow1">cellrow</a></li>
<li><a href="#cellrange">cellrange</a></li>
<li><a href="#range">range</a></li>
<li><a href="#rows1">rows</a></li>
<li><a href="#merged_from">merged_from</a></li>
<li><a href="#label">label</a></li>
<li><a href="#active">active</a></li>
<li><a href="#hidden">hidden</a></li>
</ul>
</li>
<li><a href="#Using-CSV">Using CSV</a></li>
<li><a href="#Cell-Attributes">Cell Attributes </a>
<ul>
<li><a href="#Merged-cells">Merged cells </a></li>
</ul>
</li>
<li><a href="#Streams-from-web-resources">Streams from web-resources</a></li>
</ul>
</li>
<li><a href="#TOOLS">TOOLS</a>
<ul>
<li><a href="#xlscat">xlscat</a></li>
<li><a href="#xlsgrep">xlsgrep</a></li>
<li><a href="#xlsx2csv">xlsx2csv</a></li>
<li><a href="#xls2csv">xls2csv</a></li>
<li><a href="#ss2tk">ss2tk</a></li>
<li><a href="#ssdiff">ssdiff</a></li>
</ul>
</li>
<li><a href="#Vulnerabilities">Vulnerabilities</a></li>
<li><a href="#TODO">TODO</a></li>
<li><a href="#SEE-ALSO">SEE ALSO</a></li>
<li><a href="#AUTHOR">AUTHOR</a></li>
<li><a href="#COPYRIGHT-AND-LICENSE">COPYRIGHT AND LICENSE</a></li>
</ul>
<h1 id="NAME">NAME</h1>
<p>Spreadsheet::Read - Read the data from a spreadsheet</p>
<h1 id="SYNOPSIS">SYNOPSIS</h1>
<pre><code>use Spreadsheet::Read;
my $book = ReadData ("test.csv", sep => ";");
my $book = ReadData ("test.sxc");
my $book = ReadData ("test.ods");
my $book = ReadData ("test.xls");
my $book = ReadData ("test.xlsx");
my $book = ReadData ("test.xlsm");
my $book = ReadData ("test.gnumeric");
my $book = ReadData ($fh, parser => "xls");
Spreadsheet::Read::add ($book, "sheet.csv");
my $sheet = $book->[1]; # first datasheet
my $cell = $book->[1]{A3}; # content of field A3 of sheet 1
my $cell = $book->[1]{cell}[1][3]; # same, unformatted
# OO API
my $book = Spreadsheet::Read->new ("file.csv");
my $sheet = $book->sheet (1);
my $cell = $sheet->cell ("A3");
my $cell = $sheet->cell (1, 3);
$book->add ("test.xls");</code></pre>
<h1 id="DESCRIPTION">DESCRIPTION</h1>
<p>Spreadsheet::Read tries to transparently read *any* spreadsheet and return its content in a universal manner independent of the parsing module that does the actual spreadsheet scanning.</p>
<p>The parser has to be available and is not provided by this module.</p>
<h2 id="Supported-spreadsheets">Supported spreadsheets</h2>
<h3 id="OpenOffice-and-LibreOffice-ODS-and-SXC">OpenOffice and LibreOffice (<code>ODS</code> and <code>SXC</code>)</h3>
<p>For OpenOffice and/or LibreOffice this module uses <a href="https://metacpan.org/pod/Spreadsheet::ParseODS">Spreadsheet::ParseODS</a> or <a href="https://metacpan.org/release/Spreadsheet-ReadSXC">Spreadsheet::ReadSXC</a></p>
<h3 id="Microsoft-Excel-XLSX-and-XLS">Microsoft Excel (<code>XLSX</code> and <code>XLS</code>)</h3>
<p>For Microsoft Excel this module uses <a href="https://metacpan.org/release/Spreadsheet-ParseExcel">Spreadsheet::ParseExcel</a>, <a href="https://metacpan.org/release/Spreadsheet-ParseXLSX">Spreadsheet::ParseXLSX</a>, <a href="https://metacpan.org/release/Excel-ValueReader-XLSX">Excel::ValueReader::XLSX</a>, or <a href="https://metacpan.org/release/Spreadsheet-XLSX">Spreadsheet::XLSX</a> (strongly discouraged).</p>
<h3 id="CSV-CSV">CSV (<code>CSV</code>)</h3>
<p>For CSV this module uses <a href="https://metacpan.org/release/Text-CSV_XS">Text::CSV_XS</a> or <a href="https://metacpan.org/release/Text-CSV">Text::CSV_PP</a>.</p>
<h3 id="Gnumeric-gnumeric">Gnumeric (<code>gnumeric</code>)</h3>
<p>For Gnumeric this module uses <a href="https://metacpan.org/release/Spreadsheet-ReadGnumeric">Spreadsheet::ReadGnumeric</a>.</p>
<h3 id="SquirrelCalc-sc">SquirrelCalc (<code>sc</code>)</h3>
<p>For SquirrelCalc there is a very simplistic built-in parser</p>
<h2 id="Data-structure">Data structure</h2>
<p>The data is returned as an array reference:</p>
<pre><code>$book = [
# Entry 0 is the overall control hash
{ sheets => 2,
sheet => {
"Sheet 1" => 1,
"Sheet 2" => 2,
},
parsers => [ {
type => "xls",
parser => "Spreadsheet::ParseExcel",
version => 0.59,
}],
error => undef,
},
# Entry 1 is the first sheet
{ parser => 0,
label => "Sheet 1",
maxrow => 2,
maxcol => 4,
cell => [ undef,
[ undef, 1 ],
[ undef, undef, undef, undef, undef, "Nugget" ],
],
attr => [],
merged => [],
active => 1,
hidden => 0,
A1 => 1,
B5 => "Nugget",
},
# Entry 2 is the second sheet
{ parser => 0,
label => "Sheet 2",
:
:</code></pre>
<p>To keep as close contact to spreadsheet users, row and column 1 have index 1 too in the <code>cell</code> element of the sheet hash, so cell "A1" is the same as <code>cell</code> [1, 1] (column first). To switch between the two, there are helper functions available: <code>cell2cr ()</code>, <code>cr2cell ()</code>, and <code>col2label ()</code>.</p>
<p>The <code>cell</code> hash entry contains unformatted data, while the hash entries with the traditional labels contain the formatted values (if applicable).</p>
<p>The control hash (the first entry in the returned array ref), contains some spreadsheet meta-data. The entry <code>sheet</code> is there to be able to find the sheets when accessing them by name:</p>
<pre><code>my %sheet2 = %{$book->[$book->[0]{sheet}{"Sheet 2"}]};</code></pre>
<h2 id="Formatted-vs-Unformatted">Formatted vs Unformatted</h2>
<p>The difference between formatted and unformatted cells is that the (optional) format is applied to the cell or not. This part is <b>completely</b> implemented on the parser side. Spreadsheet::Read just makes both available if these are supported. Options provide means to disable either. If the parser does not provide formatted cells - like CSV - both values are equal.</p>
<p>To show what this implies:</p>
<pre><code>use Spreadsheet::Read;
my $file = "files/example.xlsx";
my $workbook = Spreadsheet::Read->new ($file);
my $info = $workbook->[0];
say "Parsed $file with $info->{parser}-$info->{version}";
my $sheet = $workbook->sheet (1);
say join "\t" => "Formatted:", $sheet->row (1);
say join "\t" => "Unformatted:", $sheet->cellrow (1);</code></pre>
<p>Might return very different results depending one the underlying parser (and its version):</p>
<pre><code>Parsed files/example.xlsx with Spreadsheet::ParseXLSX-0.27
Formatted: 8-Aug Foo & Barr < Quux
Unformatted: 39668 Foo & Barr < Quux
Parsed files/example.xlsx with Spreadsheet::XLSX-0.15
Formatted: 39668 Foo &amp; Barr &lt; Quux
Unformatted: 39668 Foo &amp; Barr &lt; Quux</code></pre>
<h2 id="Functions-and-methods">Functions and methods</h2>
<h3 id="new">new</h3>
<pre><code>my $book = Spreadsheet::Read->new (...) or die $@;</code></pre>
<p>All options accepted by ReadData are accepted by new.</p>
<p>With no arguments at all, $book will be an object where sheets can be added using <code>add</code></p>
<pre><code>my $book = Spreadsheet::Read->new ();
$book->add ("file.csv");
$book->add ("file.cslx");</code></pre>
<h3 id="ReadData">ReadData</h3>
<pre><code>my $book = ReadData ($source [, option => value [, ... ]]);
my $book = ReadData ("file.csv", sep => ',', quote => '"');
my $book = ReadData ("file.xls", dtfmt => "yyyy-mm-dd");
my $book = ReadData ("file.ods");
my $book = ReadData ("file.sxc");
my $book = ReadData ("content.xml");
my $book = ReadData ($content);
my $book = ReadData ($content, parser => "xlsx");
my $book = ReadData ($fh, parser => "xlsx");
my $book = ReadData (\$content, parser => "xlsx");</code></pre>
<p>Tries to convert the given file, string, or stream to the data structure described above.</p>
<p>Processing Excel data from a stream or content is supported through a <a href="https://metacpan.org/release/File-Temp">File::Temp</a> temporary file or <a href="https://metacpan.org/release/IO-stringy">IO::Scalar</a> when available.</p>
<p><a href="https://metacpan.org/release/Spreadsheet-ReadSXC">Spreadsheet::ReadSXC</a> does preserve sheet order as of version 0.20.</p>
<p>Choosing between <code>$content</code> and <code>\\$content</code> (with or without passing the desired <code>parser</code> option) may be depending on trial and terror. <code>ReadData</code> does try to determine parser type on content if needed, but not all combinations are checked, and not all signatures are builtin.</p>
<p>Currently supported options are:</p>
<dl>
<dt id="parser">parser </dt>
<dd>
<p>Force the data to be parsed by a specific format. Possible values are <code>csv</code>, <code>prl</code> (or <code>perl</code>), <code>sc</code> (or <code>squirrelcalc</code>), <code>sxc</code> (or <code>oo</code>, <code>ods</code>, <code>openoffice</code>, <code>libreoffice</code>) <code>xls</code> (or <code>excel</code>), and <code>xlsx</code> (or <code>excel2007</code>).</p>
<p>When parsing streams, instead of files, it is highly recommended to pass this option.</p>
<p>Spreadsheet::Read supports several underlying parsers per spreadsheet type. It will try those from most favored to least favored. When you have a good reason to prefer a different parser, you can set that in environment variables. The other options then will not be tested for:</p>
<pre><code>env SPREADSHEET_READ_CSV=Text::CSV_PP ...</code></pre>
<p>You can also directly pass the required backend, forcing the matching type, but this excludes version checking.</p>
<pre><code># Checks for minimal version
BEGIN { $ENV{SPREADSHEET_READ_CSV} = "Text::CSV_PP" }
my $book = ReadData ("test.csv", parser => "csv");
vs
# NO check for minimal version
my $book = ReadData ("test.csv", parser => "Text::CSV_PP");</code></pre>
</dd>
<dt id="cells">cells </dt>
<dd>
<p>Control the generation of named cells ("<code>A1</code>" etc). Default is true.</p>
</dd>
<dt id="rc">rc</dt>
<dd>
<p>Control the generation of the {cell}[c][r] entries. Default is true.</p>
</dd>
<dt id="attr">attr</dt>
<dd>
<p>Control the generation of the {attr}[c][r] entries. Default is false. See <a href="#Cell-Attributes">"Cell Attributes"</a> below.</p>
</dd>
<dt id="clip">clip</dt>
<dd>
<p>If set, <a href="#ReadData"><code>ReadData</code></a> will remove all trailing rows and columns per sheet that have no data, where no data means only undefined or empty cells (after optional stripping). If a sheet has no data at all, the sheet will be skipped entirely when this attribute is true.</p>
</dd>
<dt id="trim">trim</dt>
<dd>
</dd>
<dt id="strip">strip</dt>
<dd>
<p>If set, <a href="#ReadData"><code>ReadData</code></a> will remove trailing- and/or leading-whitespace from every field.</p>
<pre><code>strip leading strailing
----- ------- ---------
0 n/a n/a
1 strip n/a
2 n/a strip
3 strip strip</code></pre>
<p><code>trim</code> and <code>strip</code> are aliases. If passed both, <code>trim</code> is ignored because of backward compatibility.</p>
</dd>
<dt id="transpose">transpose</dt>
<dd>
</dd>
<dt id="pivot">pivot</dt>
<dd>
<p>Swap all rows and columns.</p>
<p>When a sheet contains data like</p>
<pre><code>A1 B1 C1 E1
A2 C2 D2
A3 B3 C3 D3 E3</code></pre>
<p>using <code>transpose</code> or <code>pivot</code> will return the sheet data as</p>
<pre><code>A1 A2 A3
B1 B3
C1 C2 C3
D2 D3
E1 E3</code></pre>
<p><code>transpose</code> and <code>pivot</code> are aliases. If passed both, <code>transpose</code> is ignored because of backward compatibility.</p>
</dd>
<dt id="sep">sep</dt>
<dd>
<p>Set separator for CSV. Default is comma <code>,</code>.</p>
</dd>
<dt id="quote">quote</dt>
<dd>
<p>Set quote character for CSV. Default is <code>"</code>.</p>
</dd>
<dt id="dtfmt">dtfmt</dt>
<dd>
<p>Set the format for MS-Excel date fields that are set to use the default date format. The default format in Excel is "<code>m-d-yy</code>", which is both not year 2000 safe, nor very useful. The default is now "<code>yyyy-mm-dd</code>", which is more ISO-like.</p>
<p>Note that date formatting in MS-Excel is not reliable at all, as it will store/replace/change the date field separator in already stored formats if you change your locale settings. So the above mentioned default can be either "<code>m-d-yy</code>" OR "<code>m/d/yy</code>" depending on what that specific character happened to be at the time the user saved the file.</p>
</dd>
<dt id="merge">merge</dt>
<dd>
<p>Copy content to all cells in merged areas.</p>
<p>If supported, this will copy formatted and unformatted values from the top-left cell of a merged area to all other cells in the area.</p>
</dd>
<dt id="debug">debug</dt>
<dd>
<p>Enable some diagnostic messages to STDERR.</p>
<p>The value determines how much diagnostics are dumped (using <a href="https://metacpan.org/release/Data-Peek">Data::Peek</a>). A value of <code>9</code> and higher will dump the entire structure from the back-end parser.</p>
</dd>
<dt id="passwd">passwd</dt>
<dd>
<p>Use this password to decrypt password protected spreadsheet.</p>
<p>Currently only supports Excel.</p>
</dd>
</dl>
<p>All other attributes/options will be passed to the underlying parser if that parser supports attributes.</p>
<h3 id="col2label">col2label</h3>
<pre><code>my $col_id = col2label (col);
my $col_id = $book->col2label (col); # OO</code></pre>
<p><code>col2label ()</code> converts a <code>(column)</code> (1 based) to the letters used in the traditional cell notation:</p>
<pre><code>my $id = col2label ( 4); # $id now "D"
my $id = col2label (28); # $id now "AB"</code></pre>
<h3 id="cr2cell">cr2cell</h3>
<pre><code>my $cell = cr2cell (col, row);
my $cell = $book->cr2cell (col, row); # OO</code></pre>
<p><code>cr2cell ()</code> converts a <code>(column, row)</code> pair (1 based) to the traditional cell notation:</p>
<pre><code>my $cell = cr2cell ( 4, 14); # $cell now "D14"
my $cell = cr2cell (28, 4); # $cell now "AB4"</code></pre>
<h3 id="cell2cr">cell2cr</h3>
<pre><code>my ($col, $row) = cell2cr ($cell);
my ($col, $row) = $book->cell2cr ($cell); # OO</code></pre>
<p><code>cell2cr ()</code> converts traditional cell notation to a <code>(column, row)</code> pair (1 based):</p>
<pre><code>my ($col, $row) = cell2cr ("D14"); # returns ( 4, 14)
my ($col, $row) = cell2cr ("AB4"); # returns (28, 4)</code></pre>
<h3 id="row">row</h3>
<pre><code>my @row = row ($sheet, $row)
my @row = Spreadsheet::Read::row ($book->[1], 3);
my @row = $book->row ($sheet, $row); # OO</code></pre>
<p>Get full row of formatted values (like <code>$sheet->{A3} .. $sheet->{G3}</code>)</p>
<p>Note that the indexes in the returned list are 0-based.</p>
<p><code>row ()</code> is not imported by default, so either specify it in the use argument list, or call it fully qualified.</p>
<p>See also the <code>row ()</code> method on sheets.</p>
<h3 id="cellrow">cellrow</h3>
<pre><code>my @row = cellrow ($sheet, $row);
my @row = Spreadsheet::Read::cellrow ($book->[1], 3);
my @row = $book->cellrow ($sheet, $row); # OO</code></pre>
<p>Get full row of unformatted values (like <code>$sheet->{cell}[1][3] .. $sheet->{cell}[7][3]</code>)</p>
<p>Note that the indexes in the returned list are 0-based.</p>
<p><code>cellrow ()</code> is not imported by default, so either specify it in the use argument list, or call it fully qualified or as method call.</p>
<p>See also the <code>cellrow ()</code> method on sheets.</p>
<h3 id="rows">rows</h3>
<pre><code>my @rows = rows ($sheet);
my @rows = Spreadsheet::Read::rows ($book->[1]);
my @rows = $book->rows (1); # OO</code></pre>
<p>Convert <code>{cell}</code>'s <code>[column][row]</code> to a <code>[row][column]</code> list.</p>
<p>Note that the indexes in the returned list are 0-based, where the index in the <code>{cell}</code> entry is 1-based.</p>
<p><code>rows ()</code> is not imported by default, so either specify it in the use argument list, or call it fully qualified.</p>
<h3 id="parses">parses</h3>
<pre><code>parses ($format);
Spreadsheet::Read::parses ("CSV");
$book->parses ("CSV"); # OO</code></pre>
<p><code>parses ()</code> returns Spreadsheet::Read's capability to parse the required format or <code>0</code> if it does not. <a href="#ReadData"><code>ReadData</code></a> will pick its preferred parser for that format unless overruled. See <a href="#parser"><code>parser</code></a>.</p>
<p><code>parses ()</code> is not imported by default, so either specify it in the use argument list, or call it fully qualified.</p>
<p>If <code>$format</code> is false (<code>undef</code>, <code>""</code>, or <code>0</code>), <code>parses ()</code> will return a sorted list of supported types.</p>
<pre><code>@my types = parses (""); # e.g: csv, ods, sc, sxc, xls, xlsx</code></pre>
<h3 id="parsers">parsers</h3>
<pre><code>my @p = parsers ();</code></pre>
<p><code>parsers ()</code> returns a list of hashrefs with information about supported parsers, each giving information about the parser, its versions and if it will be used as default parser for the given type, like:</p>
<pre><code>{ ext => "csv", # extension or type
mod => "Text::CSV_XS", # parser module
min => "0.71", # module required version
vsn => "1.45", # module installed version
def => "*", # is default for ext
}</code></pre>
<p>As the modules are actually loaded to get their version, do only use this to analyse prerequisites.</p>
<h3 id="Version">Version</h3>
<pre><code>my $v = Version ()
my $v = Spreadsheet::Read::Version ()
my $v = Spreadsheet::Read->VERSION;
my $v = $book->Version (); # OO</code></pre>
<p>Returns the current version of Spreadsheet::Read.</p>
<p><code>Version ()</code> is not imported by default, so either specify it in the use argument list, or call it fully qualified.</p>
<p>This function returns exactly the same as <code>Spreadsheet::Read->VERSION</code> returns and is only kept for backward compatibility reasons.</p>
<h3 id="sheets">sheets</h3>
<pre><code>my $sheets = $book->sheets; # OO
my @sheets = $book->sheets; # OO</code></pre>
<p>In scalar context return the number of sheets in the book.</p>
<p>In list context return the labels of the sheets in the book. This list only returns known unique labels in sorted order. Sheets could have no label or there can be more sheets with the same label (depends on the spreadsheet format and the parser used).</p>
<h3 id="sheet">sheet</h3>
<pre><code>my $sheet = $book->sheet (1); # OO
my $sheet = $book->sheet ("Foo"); # OO</code></pre>
<p>Return the numbered or named sheet out of the book. Will return <code>undef</code> if there is no match. Will not work for sheets <i>named</i> with a number between 1 and the number of sheets in the book.</p>
<p>With named sheets will first try to use the list of sheet-labels as stored in the control structure. If no match is found, it will scan the actual labels of the sheets. In that case, it will return the first matching sheet.</p>
<p>If defined, the returned sheet will be of class <code>Spreadsheet::Read::Sheet</code>.</p>
<h3 id="add">add</h3>
<pre><code>my $book = ReadData ("file.csv");
Spreadsheet::Read::add ($book, "file.xlsx");
my $book = Spreadsheet::Read->new ("file.csv");
$book->add ("file.xlsx"); # OO</code></pre>
<h2 id="Methods-on-sheets">Methods on sheets</h2>
<h3 id="maxcol">maxcol</h3>
<pre><code>my $col = $sheet->maxcol;</code></pre>
<p>Return the index of the last in-use column in the sheet. This index is 1-based.</p>
<h3 id="maxrow">maxrow</h3>
<pre><code>my $row = $sheet->maxrow;</code></pre>
<p>Return the index of the last in-use row in the sheet. This index is 1-based.</p>
<h3 id="cell">cell</h3>
<pre><code>my $cell = $sheet->cell ("A3");
my $cell = $sheet->cell (1, 3);</code></pre>
<p>Return the value for a cell. Using tags will return the formatted value, using column and row will return unformatted value.</p>
<h3 id="attr1">attr</h3>
<pre><code>my $cell = $sheet->attr ("A3");
my $cell = $sheet->attr (1, 3);</code></pre>
<p>Return the attributes of a cell. Only valid if attributes are enabled through option <code>attr</code>.</p>
<h3 id="col2label1">col2label</h3>
<pre><code>my $col_id = $sheet->col2label (col);</code></pre>
<p><code>col2label ()</code> converts a <code>(column)</code> (1 based) to the letters used in the traditional cell notation:</p>
<pre><code>my $id = $sheet->col2label ( 4); # $id now "D"
my $id = $sheet->col2label (28); # $id now "AB"</code></pre>
<h3 id="cr2cell1">cr2cell</h3>
<pre><code>my $cell = $sheet->cr2cell (col, row);</code></pre>
<p><code>cr2cell ()</code> converts a <code>(column, row)</code> pair (1 based) to the traditional cell notation:</p>
<pre><code>my $cell = $sheet->cr2cell ( 4, 14); # $cell now "D14"
my $cell = $sheet->cr2cell (28, 4); # $cell now "AB4"</code></pre>
<h3 id="cell2cr1">cell2cr</h3>
<pre><code>my ($col, $row) = $sheet->cell2cr ($cell);</code></pre>
<p><code>cell2cr ()</code> converts traditional cell notation to a <code>(column, row)</code> pair (1 based):</p>
<pre><code>my ($col, $row) = $sheet->cell2cr ("D14"); # returns ( 4, 14)
my ($col, $row) = $sheet->cell2cr ("AB4"); # returns (28, 4)</code></pre>
<h3 id="col">col</h3>
<pre><code>my @col = $sheet->column ($col);</code></pre>
<p>Get full column of formatted values (like <code>$sheet->{C1} .. $sheet->{C9}</code>)</p>
<p>Note that the indexes in the returned list are 0-based.</p>
<h3 id="cellcolumn">cellcolumn</h3>
<pre><code>my @col = $sheet->cellcolumn ($col);</code></pre>
<p>Get full column of unformatted values (like <code>$sheet->{cell}[3][1] .. $sheet->{cell}[3][9]</code>)</p>
<p>Note that the indexes in the returned list are 0-based.</p>
<h3 id="row1">row</h3>
<pre><code>my @row = $sheet->row ($row);</code></pre>
<p>Get full row of formatted values (like <code>$sheet->{A3} .. $sheet->{G3}</code>)</p>
<p>Note that the indexes in the returned list are 0-based.</p>
<h3 id="cellrow1">cellrow</h3>
<pre><code>my @row = $sheet->cellrow ($row);</code></pre>
<p>Get full row of unformatted values (like <code>$sheet->{cell}[1][3] .. $sheet->{cell}[7][3]</code>)</p>
<p>Note that the indexes in the returned list are 0-based.</p>
<h3 id="cellrange">cellrange</h3>
<pre><code>my $arrayref = $sheet->cellrange ("B3:D5");
my $arrayref = $sheet->cellrange (2, 3, 4, 5);
my $arrayref = $sheet->cellrange (-4, -5, -1, -1);</code></pre>
<p>Return an arrayref with the selected cells from <code>$sheet->{cell}</code>. When the range is given as (top-left, bottom-right) numeric CR pairs, negative values are allowed (count from rigth/bottom) and automatically clipped to be inside the existing data set.</p>
<h3 id="range">range</h3>
<pre><code>my $hashref = $sheet->range ("B3:D5");
my $hashref = $sheet->range (2, 3, 4, 5);
my $hashref = $sheet->range (-4, -5, -1, -1);</code></pre>
<p>Return a hashref with all the fields in the given range. When the range is given as (top-left, bottom-right) numeric CR pairs, negative values are allowed (count from rigth/bottom) and automatically clipped to be inside the existing data set.</p>
<h3 id="rows1">rows</h3>
<pre><code>my @rows = $sheet->rows ();</code></pre>
<p>Convert <code>{cell}</code>'s <code>[column][row]</code> to a <code>[row][column]</code> list.</p>
<p>Note that the indexes in the returned list are 0-based, where the index in the <code>{cell}</code> entry is 1-based.</p>
<h3 id="merged_from">merged_from</h3>
<pre><code>my $top_left = $sheet->merged_from ("C2");
my $top_left = $sheet->merged_from (3, 2);</code></pre>
<p>If the parser supports merged areas, this method will return the label of the top-left cell in the merged area the requested cell is part of.</p>
<p>If the requested ID is valid and withing the sheet cell range, but not part of a merged area, it will return <code>""</code>.</p>
<p>If the ID is not valid or out of range, it returns <code>undef</code>.</p>
<p>See <a href="#merged">Merged cells</a> for more details.</p>
<h3 id="label">label</h3>
<pre><code>my $label = $sheet->label;
$sheet->label ("New sheet label");</code></pre>
<p>Set a new label to a sheet. Note that the index in the control structure will <i>NOT</i> be updated.</p>
<h3 id="active">active</h3>
<pre><code>my $sheet_is_active = $sheet->active;</code></pre>
<p>Returns 1 if the selected sheet is active, otherwise returns 0.</p>
<p>Currently only works on XLS (as of Spreadsheed::ParseExcel-0.61). CSV is always active.</p>
<h3 id="hidden">hidden</h3>
<pre><code>my $sheet_is_hidden = $sheet->hidden;</code></pre>
<p>Returns 1 if the selected sheet is hidden, otherwise returns 0.</p>
<p>Fully depends on the backend supporting this. CSV and SC are never hidden.</p>
<h2 id="Using-CSV">Using CSV</h2>
<p>In case of CSV parsing, <a href="#ReadData"><code>ReadData</code></a> will use the first line of the file to auto-detect the separation character if the first argument is a file and both <code>sep</code> and <code>quote</code> are not passed as attributes. <a href="https://metacpan.org/release/Text-CSV_XS">Text::CSV_XS</a> (or <a href="https://metacpan.org/release/Text-CSV">Text::CSV_PP</a>) is able to automatically detect and use <code>\r</code> line endings.</p>
<p>CSV can parse streams too, but be sure to pass <code>sep</code> and/or <code>quote</code> if these do not match the default <code>,</code> and <code>"</code>.</p>
<p>When an error is found in the CSV, it is automatically reported (to STDERR). The structure will store the error in <code>$ss->[0]{error}</code> as anonymous list returned by <a href="https://metacpan.org/pod/Text::CSV_XS#error_diag"><code>$csv->error_diag</code></a>. See <a href="https://metacpan.org/pod/Text::CSV_XS">Text::CSV_XS</a> for documentation.</p>
<pre><code>my $ss = ReadData ("bad.csv");
$ss->[0]{error} and say $ss->[0]{error}[1];</code></pre>
<p>As CSV has no sheet labels, the default label for a CSV sheet is its filename. For CSV, this can be overruled using the <i>label</i> attribute:</p>
<pre><code>my $ss = Spreadsheet::Read->new ("/some/place/test.csv", label => "Test");</code></pre>
<h2 id="Cell-Attributes">Cell Attributes </h2>
<p>If the constructor was called with <code>attr</code> having a true value,</p>
<pre><code>my $book = ReadData ("book.xls", attr => 1);
my $book = Spreadsheet::Read->new ("book.xlsx", attr => 1);</code></pre>
<p>effort is made to analyze and store field attributes like this:</p>
<pre><code>{ label => "Sheet 1",
maxrow => 5,
maxcol => 2,
cell => [ undef,
[ undef, 1 ],
[ undef, undef, undef, undef, undef, "Nugget" ],
],
attr => [ undef,
[ undef, {
type => "numeric",
fgcolor => "#ff0000",
bgcolor => undef,
font => "Arial",
size => undef,
format => "## ##0.00",
halign => "right",
valign => "top",
uline => 0,
bold => 0,
italic => 0,
wrap => 0,
merged => 0,
hidden => 0,
locked => 0,
enc => "utf-8",
}, ],
[ undef, undef, undef, undef, undef, {
type => "text",
fgcolor => "#e2e2e2",
bgcolor => undef,
font => "Letter Gothic",
size => 15,
format => undef,
halign => "left",
valign => "top",
uline => 0,
bold => 0,
italic => 0,
wrap => 0,
merged => 0,
hidden => 0,
locked => 0,
enc => "iso8859-1",
}, ],
],
merged => [],
A1 => 1,
B5 => "Nugget",
},</code></pre>
<p>The entries <code>maxrow</code> and <code>maxcol</code> are 1-based.</p>
<p>This has now been partially implemented, mainly for Excel, as the other parsers do not (yet) support all of that. YMMV.</p>
<p>If a cell itself is not hidden, but the parser holds the information that either the row or the column (or both) the field is in is hidden, the flag is inherited into the cell attributes.</p>
<p>You can get the attributes of a cell (as a hash-ref) like this:</p>
<pre><code>my $attr = $book[1]{attr}[1][3]; # Direct structure
my $attr = $book->sheet (1)->attr (1, 3); # Same using OO
my $attr = $book->sheet (1)->attr ("A3"); # Same using OO</code></pre>
<p>To get to the <code>font</code> attribute, use any of these:</p>
<pre><code>my $font = $book[1]{attr}[1][3]{font};
my $font = $book->sheet (1)->attr (1, 3)->{font};
my $font = $book->sheet (1)->attr ("A3")->font;</code></pre>
<h3 id="Merged-cells">Merged cells </h3>
<p>Note that only <a href="https://metacpan.org/release/Spreadsheet-ReadSXC">Spreadsheet::ReadSXC</a> documents the use of merged cells, and not in a way useful for the spreadsheet consumer.</p>
<p>CSV does not support merged cells (though future implementations of CSV for the web might).</p>
<p>The documentation of merged areas in <a href="https://metacpan.org/release/Spreadsheet-ParseExcel">Spreadsheet::ParseExcel</a> and <a href="https://metacpan.org/release/Spreadsheet-ParseXLSX">Spreadsheet::ParseXLSX</a> can be found in <a href="https://metacpan.org/pod/Spreadsheet::ParseExcel::Worksheet">Spreadsheet::ParseExcel::Worksheet</a> and <a href="https://metacpan.org/pod/Spreadsheet::ParseExcel::Cell">Spreadsheet::ParseExcel::Cell</a>.</p>
<p>None of basic <a href="https://metacpan.org/release/Spreadsheet-XLSX">Spreadsheet::XLSX</a>, <a href="https://metacpan.org/release/Spreadsheet-ParseExcel">Spreadsheet::ParseExcel</a>, and <a href="https://metacpan.org/release/Spreadsheet-ParseXLSX">Spreadsheet::ParseXLSX</a> manual pages mention merged cells at all.</p>
<p>This module just tries to return the information in a generic way.</p>
<p>Given this spreadsheet as an example</p>
<pre><code>merged.xlsx:
A B C
+-----+-----------+
1| | foo |
+-----+ +
2| bar | |
| +-----+-----+
3| | urg | orc |
+-----+-----+-----+</code></pre>
<p>the information extracted from that undocumented information is returned in the <code>merged</code> entry of the sheet's hash as a list of top-left, bottom-right coordinate pars (col, row, col, row). For given example, that would be:</p>
<pre><code>$ss->{merged} = [
[ 1, 2, 1, 3 ], # A2-A3
[ 2, 1, 3, 2 ], # B1-C2
];</code></pre>
<p>To find the label of the top-left cell in a merged area, use the <a href="#merged_from"><code>merged_from</code></a> method.</p>
<pre><code>$ss->merged_from ("C2"); # will return "B1"</code></pre>
<p>When the attributes are also enabled, there is some merge information copied directly from the cell information, but again, that stems from code analysis and not from documentation:</p>
<pre><code>my $ss = ReadData ("merged.xlsx", attr => 1)->[1];
foreach my $row (1 .. $ss->{maxrow}) {
foreach my $col (1 .. $ss->{maxcol}) {
my $cell = cr2cell ($col, $row);
printf "%s %-3s %s ", $cell, $ss->{$cell},
$ss->{attr}[$col][$row]{merged};
}
print "\n";
}
A1 0 B1 foo 1 C1 1
A2 bar 1 B2 1 C2 1
A3 1 B3 urg 0 C3 orc 0</code></pre>
<p>In this example, there is no way to see if <code>B2</code> is merged to <code>A2</code> or to <code>B1</code> without analyzing all surrounding cells. This could as well mean <code>A2:A3</code>, <code>B1:C1</code>, <code>B2:C2</code>, as <code>A2:A3</code>, <code>B1:B2</code>, <code>C1:C2</code>, as <code>A2:A3</code>, <code>B1:C2</code>.</p>
<p>Use the <a href="#merged"><code>merged</code></a> entry described above to find out what fields are merged to what other fields or use <code>merge</code>:</p>
<pre><code>my $ss = ReadData ("merged.xlsx", attr => 1, merge => 1)->[1];
foreach my $row (1 .. $ss->{maxrow}) {
foreach my $col (1 .. $ss->{maxcol}) {
my $cell = cr2cell ($col, $row);
printf "%s %-3s %s ", $cell, $ss->{$cell},
$ss->{attr}[$col][$row]{merged};
}
print "\n";
}
A1 0 B1 foo B1 C1 foo B1
A2 bar A2 B2 foo B1 C2 foo B1
A3 bar A2 B3 urg 0 C3 orc 0</code></pre>
<h2 id="Streams-from-web-resources">Streams from web-resources</h2>
<p>If you want to stream a web-resource, and the underlying parser supports it, you could use a helper function like this (thanks Corion):</p>
<pre><code>use HTTP::Tiny;
use Spreadsheet::Read;
# Fetch data and return a filehandle to that data
sub fh_from_url {
my $url = shift;
my $ua = HTTP::Tiny->new;
my $res = $ua->get ($url);
open my $fh, "<", \$res->{content};
return $fh
} # fh_from_url
my $fh = fh_from_url ("http://example.com/example.csv");
my $sheet = Spreadsheet::Read->new ($fh, parser => "csv");</code></pre>
<h1 id="TOOLS">TOOLS</h1>
<p>This modules comes with a few tools that perform tasks from the FAQ, like "How do I select only column D through F from sheet 2 into a CSV file?"</p>
<p>If the module was installed without the tools, you can find them here: https://github.com/Tux/Spreadsheet-Read/tree/master/scripts</p>
<h2 id="xlscat"><code>xlscat</code></h2>
<p>Show (parts of) a spreadsheet in plain text, CSV, or HTML</p>
<pre><code>usage: xlscat [-s <sep>] [-L] [-n] [-A] [-u] [Selection] file.xls
[-c | -m] [-u] [Selection] file.xls
-i [-S sheets] file.xls
Generic options:
-v[#] Set verbose level (xlscat/xlsgrep)
-d[#] Set debug level (Spreadsheet::Read)
--list Show supported spreadsheet formats and exit
-u Use unformatted values
--strip[=#] Strip leading and/or traing spaces of all cells
# & 01 = leading, # & 02 = trailing, 3 = default
--clip=# Clip cells to max length #
--noclip Do not strip empty sheets and
trailing empty rows and columns
--no-empty Skip empty rows
--no-nl[=R] Replace all newlines in cells with R (default space)
-e <enc> Set encoding for input and output
-b <enc> Set encoding for input
-a <enc> Set encoding for output
-U Set encoding for output to utf-8 (short for -a utf-8)
Input CSV:
--in-sep=c Set input sep_char for CSV (c can be 'TAB')
Input XLS:
--dtfmt=fmt Specify the default date format to replace 'm-d-yy'
the default replacement is 'yyyy-mm-dd'
--passwd=pw Specify the password for workbook
if pw = -, read password from keyboard
--formulas Show the formula instead of the value
Output Text (default):
-s <sep> Use separator <sep>. Default '|', \n allowed
Overrules ',' when used with --csv
-L Line up the columns
-B --box Like -L but also add outer frame
-n [skip] Number lines (prefix with column number)
optionally skip <skip> (header) lines
-A Show field attributes in ANSI escapes
-h[#] Show # header lines
-D Dump each record with Data::Peek or Data::Dumper
--hash Like -D but as hash with first row as keys
Output CSV:
-c Output CSV, separator = ','
-m Output CSV, separator = ';'
Output Index only:
-i Show sheet names and size only
Output HTML:
-H Output HTML
Selection:
-S <sheets> Only print sheets <sheets>. 'all' is a valid set
Default only prints the first sheet
-R <rows> Only print rows <rows>. Default is 'all'
Ranges and lists supported as 2,4-7,8-
Trailing - is to end of data
Negative rows count from tail -8--2 is allowed
--head[=n] Alias for -R1..n where n defaults to 10
--tail[=n] Alias for -R-n- where n defaults to 10
-C <cols> Only print columns <cols>. Default is 'all'
-F <flds> Only fields <flds> e.g. -FA3,B16
Ordering (column numbers in result set *after* selection):
--sort=spec Sort output (e.g. --sort=3,2r,5n,1rn+2)
+# - first # lines do not sort (header)
# - order on column # lexical ascending
#n - order on column # numeric ascending
#r - order on column # lexical descending
#rn - order on column # numeric descending
Examples:
xlscat -i foo.xls
xlscat --in-sep=: --sort=3n -L /etc/passwd
xlsgrep pattern file.ods</code></pre>
<h2 id="xlsgrep"><code>xlsgrep</code></h2>
<p>Show (parts of) a spreadsheet that match a pattern in plain text, CSV, or HTML</p>
<pre><code>usage: xlsgrep [-s <sep>] [-L] [-n] [-A] [-u] [Selection] pattern file.xls
[-c | -m] [-u] [Selection] pattern file.xls
-i [-S sheets] pattern file.xls
Generic options:
-v[#] Set verbose level (xlscat/xlsgrep)
-d[#] Set debug level (Spreadsheet::Read)
--list Show supported spreadsheet formats and exit
-u Use unformatted values
--strip[=#] Strip leading and/or traing spaces of all cells
# & 01 = leading, # & 02 = trailing, 3 = default
--clip=# Clip cells to max length #
--noclip Do not strip empty sheets and
trailing empty rows and columns
--no-empty Skip empty rows
--no-nl[=R] Replace all newlines in cells with R (default space)
-e <enc> Set encoding for input and output
-b <enc> Set encoding for input