add indexed-read doc
osiegmar committed Aug 29, 2024
1 parent 6d21441 commit 76fa060
Showing 2 changed files with 64 additions and 27 deletions.
47 changes: 47 additions & 0 deletions docs/src/content/docs/guides/Examples/indexed-read.mdx
@@ -0,0 +1,47 @@
---
title: Indexed reading
---

import {Code} from '@astrojs/starlight/components';
import sourceCode from '../../../../../../example/src/main/java/example/ExampleIndexedCsvReader.java?raw';

CSV files are most commonly read sequentially from start to end. In some situations, such as in graphical user interfaces,
you might need to read specific rows directly without reading the entire file first. FastCSV supports indexed reading
of CSV files, which lets you navigate back and forth in the file in a random-access manner.

As CSV files do not include an index, FastCSV creates an index while reading the file. This index is stored in memory
and allows you to access rows directly. The index can optionally be stored in a file to avoid creating it every time
you read the CSV file.
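
To illustrate the idea of an in-memory index, here is a minimal conceptual sketch that uses only the Java standard library. It is not FastCSV's implementation: the class `OffsetIndexSketch` and its methods `buildIndex` and `readPage` are hypothetical names, and the sketch assumes `\n` line endings and no line breaks inside quoted fields, which a real CSV parser has to handle.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only (hypothetical class, not part of FastCSV).
public class OffsetIndexSketch {

    // Scan the file once and remember the byte offset at which each line starts.
    // Assumption: '\n' line endings and no line breaks inside quoted fields.
    static List<Long> buildIndex(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        long position = 0;
        long lineStart = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            int b;
            while ((b = in.read()) != -1) {
                position++;
                if (b == '\n') {
                    offsets.add(lineStart);
                    lineStart = position;
                }
            }
        }
        if (lineStart < position) {
            offsets.add(lineStart); // last line without a trailing newline
        }
        return offsets;
    }

    // Read one page of lines by seeking directly to the first line of that page.
    static List<String> readPage(Path file, List<Long> offsets, int page, int pageSize)
        throws IOException {

        int from = page * pageSize;
        int to = Math.min(from + pageSize, offsets.size());
        if (from >= to) {
            return List.of();
        }
        long start = offsets.get(from);
        long end = to < offsets.size() ? offsets.get(to) : Files.size(file);
        byte[] buf = new byte[(int) (end - start)];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);
            raf.readFully(buf);
        }
        return List.of(new String(buf, StandardCharsets.UTF_8).split("\n"));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sketch", ".csv");
        Files.writeString(file, "a,1\nb,2\nc,3\nd,4\ne,5\n");

        List<Long> index = buildIndex(file);
        int pageSize = 2;
        int lastPage = (index.size() + pageSize - 1) / pageSize - 1;

        System.out.println(readPage(file, index, lastPage, pageSize)); // prints [e,5]
        Files.delete(file);
    }
}
```

Because the sketch stores only one offset per line, jumping to any page costs a single seek plus a read of that page, no matter how large the file is.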

:::note
The indexing process runs in the background while reading the CSV file. It is non-blocking, allowing you to start
reading the file while the index is still being created.
A status monitor is available to track indexing progress, making it ideal for graphical user interfaces.
:::
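
The non-blocking behavior described in the note can be pictured with the following conceptual sketch, again using only the Java standard library. It is an assumption-laden illustration, not FastCSV's API: `BackgroundIndexSketch`, `Progress`, and `indexInBackground` are hypothetical names, and counting `\n` bytes stands in for real CSV record detection.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Conceptual sketch only (hypothetical class, not part of FastCSV).
public class BackgroundIndexSketch {

    // Hypothetical progress holder; it only mimics the role of a status listener.
    static final class Progress {
        final AtomicLong bytesRead = new AtomicLong();
        final AtomicLong recordCount = new AtomicLong();
    }

    // Scan the file on a background thread and publish progress via atomic counters.
    // Assumption: every '\n' byte ends one record.
    static CompletableFuture<Long> indexInBackground(Path file, Progress progress) {
        return CompletableFuture.supplyAsync(() -> {
            try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
                int b;
                while ((b = in.read()) != -1) {
                    progress.bytesRead.incrementAndGet();
                    if (b == '\n') {
                        progress.recordCount.incrementAndGet();
                    }
                }
                return progress.recordCount.get();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("sketch", ".csv");
        Files.writeString(file, "a,1\n".repeat(100_000));
        long total = Files.size(file);

        Progress progress = new Progress();
        CompletableFuture<Long> done = indexInBackground(file, progress);

        // The caller stays responsive and can poll the progress while indexing runs.
        while (!done.isDone()) {
            System.out.printf("Indexed %,d of %,d bytes%n", progress.bytesRead.get(), total);
            Thread.sleep(10);
        }
        System.out.printf("Finished: %,d records%n", done.get());
        Files.delete(file);
    }
}
```

A graphical application would typically feed such counters into a progress bar instead of printing them.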

The following snippet shows the main API classes and methods for indexed reading:

```java
// Index the CSV file with up to 5 records per page
IndexedCsvReader<CsvRecord> csv = IndexedCsvReader.builder()
    .pageSize(5)
    .ofCsvRecord(file);

try (csv) {
    // Find the last page in the index
    int lastPage = csv.getIndex().getPageCount() - 1;

    // Output the last page
    List<CsvRecord> lastPageRecords = csv.readPage(lastPage);
    lastPageRecords.forEach(System.out::println);
}
```

## Example

The following example demonstrates how to read a CSV file using FastCSV's indexed reader.

<Code code={sourceCode} title="ExampleIndexedCsvReader.java" lang="java" />

You can also find the source code of this example in the
[FastCSV GitHub repository](https://github.com/osiegmar/FastCSV/blob/main/example/src/main/java/example/ExampleIndexedCsvReader.java).
44 changes: 17 additions & 27 deletions example/src/main/java/example/ExampleIndexedCsvReader.java
@@ -9,7 +9,6 @@
 import java.util.concurrent.TimeUnit;

 import de.siegmar.fastcsv.reader.CollectingStatusListener;
-import de.siegmar.fastcsv.reader.CommentStrategy;
 import de.siegmar.fastcsv.reader.CsvIndex;
 import de.siegmar.fastcsv.reader.CsvRecord;
 import de.siegmar.fastcsv.reader.IndexedCsvReader;
@@ -26,22 +25,26 @@ public static void main(final String[] args) throws Exception {
         simple(tmpFile);
         reuseIndex(tmpFile);
         statusMonitor(tmpFile);
-        advancedConfiguration(tmpFile);
     }

-    private static Path prepareTestFile(final Duration timeToWrite) throws IOException {
+    private static Path prepareTestFile(final Duration timeToWrite)
+        throws IOException {
+
         final Path tmpFile = createTmpFile();

         int record = 1;
-        final long writeUntil = System.currentTimeMillis() + timeToWrite.toMillis();
+        final long writeUntil = System.currentTimeMillis()
+            + timeToWrite.toMillis();

         try (CsvWriter csv = CsvWriter.builder().build(tmpFile)) {
             for (; System.currentTimeMillis() < writeUntil; record++) {
-                csv.writeRecord("record " + record, "containing standard ASCII, unicode letters öäü and emojis 😎");
+                csv.writeRecord("record " + record,
+                    "containing ASCII, some umlauts öäü and an emoji 😎");
             }
         }

-        System.out.format("Temporary test file with %,d records and %,d bytes successfully prepared%n%n",
+        System.out.format("Temporary test file with %,d records and "
+                + "%,d bytes successfully prepared%n%n",
             record - 1, Files.size(tmpFile));

         return tmpFile;
@@ -102,12 +105,15 @@ private static void reuseIndex(final Path file) throws IOException {
     }

     private static void statusMonitor(final Path file) throws IOException {
-        System.out.printf("# Read file with a total of %,d bytes%n", Files.size(file));
+        System.out.printf("# Read file with %,d bytes%n", Files.size(file));

         final var statusListener = new CollectingStatusListener();

-        // Using the StatusListener, we can monitor the indexing process in the background
-        final var executor = Executors.newSingleThreadScheduledExecutor();
+        // Using the StatusListener, we can monitor the
+        // indexing process in the background
+        final var executor = Executors
+            .newSingleThreadScheduledExecutor();
+
         executor.scheduleAtFixedRate(
             () -> {
                 if (statusListener.isCompleted()) {
@@ -123,27 +129,11 @@ private static void statusMonitor(final Path file) throws IOException {
             .ofCsvRecord(file);

         try (csv) {
-            System.out.printf("Indexed %,d records%n", csv.getIndex().getRecordCount());
+            System.out.printf("Indexed %,d records%n",
+                csv.getIndex().getRecordCount());
         }

         System.out.println();
     }
-
-    private static void advancedConfiguration(final Path file) throws IOException {
-        final IndexedCsvReader<CsvRecord> csv = IndexedCsvReader.builder()
-            .fieldSeparator(',')
-            .quoteCharacter('"')
-            .commentStrategy(CommentStrategy.NONE)
-            .commentCharacter('#')
-            .pageSize(5)
-            .ofCsvRecord(file);
-
-        try (csv) {
-            final List<CsvRecord> csvRecords = csv.readPage(2);
-
-            System.out.println("Parsed via advanced config:");
-            csvRecords.forEach(System.out::println);
-        }
-    }

 }