Fix typos (#46)
* Fix typos

* Fix broken link, use https instead of http
deining authored Mar 11, 2024
1 parent b3d793b commit e79b304
Showing 6 changed files with 13 additions and 13 deletions.
2 changes: 1 addition & 1 deletion content/en/docs/Contribution Guidelines/contributing.md
@@ -23,7 +23,7 @@ If you’d like to report a bug but don’t have time to fix it, you can still p
Committers
----------

-Merging a pull request requires being a comitter on the project.
+Merging a pull request requires being a committer on the project.

How to merge a Pull request (have an apache and github-apache remote setup):

4 changes: 2 additions & 2 deletions content/en/docs/Contribution Guidelines/releasing.md
@@ -19,7 +19,7 @@ If you have problems, read the [publishing Maven artifacts documentation](https:

### Release process

-Parquet uses the maven-release-plugin to tag a release and push binary artifacts to staging in Nexus. Once maven completes the release, the offical source tarball is built from the tag.
+Parquet uses the maven-release-plugin to tag a release and push binary artifacts to staging in Nexus. Once maven completes the release, the official source tarball is built from the tag.

Before you start the release process:

@@ -153,7 +153,7 @@ Then add and commit the release artifacts:

#### 4\. Update parquet.apache.org

-Update the downloads page on parquet.apache.org. Instructions for updating the site are on the [contribution page](http://parquet.apache.org/docs/contribution-guidelines/contributing/).
+Update the downloads page on parquet.apache.org. Instructions for updating the site are on the [contribution page](https://parquet.apache.org/docs/contribution-guidelines/contributing/).

#### 5\. Send an ANNOUNCE e-mail to [[email protected]](mailto:[email protected]) and the dev list

6 changes: 3 additions & 3 deletions content/en/docs/File Format/Data Pages/compression.md
@@ -47,7 +47,7 @@ that writers refrain from creating such pages by default for better interoperabi
### LZO

A codec based on or interoperable with the
-[LZO compression library](http://www.oberhumer.com/opensource/lzo/).
+[LZO compression library](https://www.oberhumer.com/opensource/lzo/).

### BROTLI

@@ -73,11 +73,11 @@ switch to the newer, interoperable `LZ4_RAW` codec.
A codec based on the Zstandard format defined by
[RFC 8478](https://tools.ietf.org/html/rfc8478). If any ambiguity arises
when implementing this format, the implementation provided by the
-[ZStandard compression library](https://facebook.github.io/zstd/)
+[Zstandard compression library](https://facebook.github.io/zstd/)
is authoritative.

### LZ4_RAW

A codec based on the [LZ4 block format](https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md).
If any ambiguity arises when implementing this format, the implementation
-provided by the [LZ4 compression library](http://www.lz4.org/) is authoritative.
+provided by the [LZ4 compression library](https://www.lz4.org/) is authoritative.
4 changes: 2 additions & 2 deletions content/en/docs/File Format/Data Pages/encodings.md
@@ -158,7 +158,7 @@ repetition and definition levels.
Supported Types: INT32, INT64

This encoding is adapted from the Binary packing described in
-["Decoding billions of integers per second through vectorization"](http://arxiv.org/pdf/1209.2137v5.pdf)
+["Decoding billions of integers per second through vectorization"](https://arxiv.org/pdf/1209.2137v5.pdf)
by D. Lemire and L. Boytsov.

In delta encoding we make use of variable length integers for storing various
@@ -189,7 +189,7 @@ Each block contains
positive integers for bit packing)
* the bitwidth of each block is stored as a byte
* each miniblock is a list of bit packed ints according to the bit width
-stored at the begining of the block
+stored at the beginning of the block

To encode a block, we will:

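The delta step that the encodings hunk above describes (deltas per block, a bit width per miniblock of packed values) can be sketched as follows. This is a simplified illustration, not the spec's reference code: the function name is hypothetical, and the real DELTA_BINARY_PACKED format additionally covers the block header layout, zigzag encoding of the minimum delta, and padding of partial miniblocks.

```python
def miniblock_bit_widths(values, miniblock_size=32):
    """Compute per-miniblock bit widths for delta encoding (simplified sketch)."""
    # Deltas between consecutive values; these may be negative.
    deltas = [b - a for a, b in zip(values, values[1:])]
    # Subtracting the block's minimum delta makes every delta non-negative,
    # so the values can be bit packed.
    min_delta = min(deltas)
    adjusted = [d - min_delta for d in deltas]
    # One bit width per miniblock: enough bits for the largest adjusted delta.
    widths = []
    for i in range(0, len(adjusted), miniblock_size):
        chunk = adjusted[i:i + miniblock_size]
        widths.append(max(chunk).bit_length())
    return min_delta, widths
```

With `[7, 5, 3, 1, 2, 3, 4, 5]` and a miniblock size of 4, the deltas are `[-2, -2, -2, 1, 1, 1, 1]`, the minimum delta is `-2`, and each miniblock of adjusted deltas fits in 2 bits.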
2 changes: 1 addition & 1 deletion content/en/docs/File Format/Data Pages/encryption.md
@@ -189,7 +189,7 @@ data set (table). This string is optionally passed by a writer upon file creatio
the AAD prefix is stored in an `aad_prefix` field in the file, and is made available to the readers.
This field is not encrypted. If a user is concerned about keeping the file identity inside the file,
the writer code can explicitly request Parquet not to store the AAD prefix. Then the aad_prefix field
-will be empty; AAD prefixes must be fully managed by the caller code and supplied explictly to Parquet
+will be empty; AAD prefixes must be fully managed by the caller code and supplied explicitly to Parquet
readers for each file.

The protection against swapping full files is optional. It is not enabled by default because
8 changes: 4 additions & 4 deletions content/en/docs/File Format/bloomfilter.md
@@ -154,7 +154,7 @@ unsigned int32 i = (h_top_bits * z_as_64_bit) >> 32;
```

The first line extracts the most significant 32 bits from `h` and
-assignes them to a 64-bit unsigned integer. The second line is
+assigns them to a 64-bit unsigned integer. The second line is
simpler: it just sets an unsigned 64-bit value to the same value as
the 32-bit unsigned value `z`. The purpose of having both `h_top_bits`
and `z_as_64_bit` be 64-bit values is so that their product is a
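The multiply-shift block selection that this hunk's context explains can be sketched in Python, where arbitrary-precision integers make the widening explicit in comments rather than in types. This is an illustrative sketch; the function and argument names are assumptions, not the spec's reference code.

```python
def block_index(h, num_blocks):
    """Map a 64-bit hash h to a block index in [0, num_blocks) (sketch)."""
    # Take the most significant 32 bits of the 64-bit hash.
    h_top_bits = (h >> 32) & 0xFFFFFFFF
    # The product of two 32-bit values fits in 64 bits; keeping its high
    # 32 bits spreads the hash over [0, num_blocks) without a modulo.
    return (h_top_bits * num_blocks) >> 32
```

For example, with 1024 blocks a hash whose top 32 bits are zero maps to block 0, and one whose top 32 bits are all ones maps to block 1023.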
@@ -205,7 +205,7 @@ boolean filter_check(SBBF filter, unsigned int64 x) {

The use of blocks is from Putze et al.'s [Cache-, Hash- and
Space-Efficient Bloom
-filters](http://algo2.iti.kit.edu/documents/cacheefficientbloomfilters-jea.pdf)
+filters](https://www.cs.amherst.edu/~ccmcgeoch/cs34/papers/cacheefficientbloomfilters-jea.pdf)

To use an SBBF for values of arbitrary Parquet types, we apply a hash
function to that value - at the time of writing,
@@ -217,14 +217,14 @@ with a seed of 0 and [following the specification version

The `check` operation in SBBFs can return `true` for an argument that
was never inserted into the SBBF. These are called "false
-positives". The "false positive probabilty" is the probability that
+positives". The "false positive probability" is the probability that
any given hash value that was never `insert`ed into the SBBF will
cause `check` to return `true` (a false positive). There is not a
simple closed-form calculation of this probability, but here is an
example:

A filter that uses 1024 blocks and has had 26,214 hash values
-`insert`ed will have a false positive probabilty of around 1.26%. Each
+`insert`ed will have a false positive probability of around 1.26%. Each
of those 1024 blocks occupies 256 bits of space, so the total space
usage is 262,144. That means that the ratio of bits of space to hash
values is 10-to-1. Adding more hash values increases the denominator
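The space accounting in the worked example above can be checked directly with the numbers quoted in the text (the false positive probability itself has no simple closed form, so only the space figures are verified here):

```python
# Figures from the example: 1024 blocks, 26,214 inserted hash values.
num_blocks = 1024
bits_per_block = 256          # one split block = 8 words x 32 bits
inserted_values = 26_214

total_bits = num_blocks * bits_per_block
bits_per_value = total_bits / inserted_values

print(total_bits)             # total filter space in bits
print(round(bits_per_value))  # space-to-hash-value ratio, roughly 10-to-1
```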
