feat: add support for parsing general-format numbers as decimals #133

Open. Wants to merge 1 commit into base: master

21 changes: 21 additions & 0 deletions README.md
@@ -107,6 +107,27 @@ By default, Creek will map cell names with letter and number(A1, B3 and etc). To
creek = Creek::Book.new file.path, with_headers: true
```

## Handling decimals with General format

When an Excel cell uses the *General* number format, Creek returns the raw
string from the file. Numeric values such as `0.001` may therefore appear as
`"1E-3"`. To work with the numeric value, cast it manually or enable automatic
parsing with the `parse_general_as_number` option:

```ruby
require 'bigdecimal'

creek = Creek::Book.new 'spec/fixtures/sample.xlsx', parse_general_as_number: true
sheet = creek.sheets[0]

sheet.rows.each do |row|
  value = row['M2']
  next if value.nil? # only the row that contains cell M2 has this key

  decimal = BigDecimal(value)
  puts decimal.to_f # => 0.001
end
```

Without this option, cast `value` manually, for example with `BigDecimal(value)` or `Float(value)`.
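
As a minimal sketch of the manual cast mentioned above, the helper below converts a raw General-format string to a `BigDecimal` and leaves anything else untouched. The name `cast_general_cell` is hypothetical and not part of Creek; it assumes a Ruby version where `BigDecimal()` raises `ArgumentError` on non-numeric strings.

```ruby
require 'bigdecimal'

# Hypothetical helper (not part of Creek): cast a raw General-format cell
# string to BigDecimal, returning the input unchanged when it cannot be parsed.
def cast_general_cell(raw)
  return raw unless raw.is_a?(String)

  BigDecimal(raw)
rescue ArgumentError
  # Non-numeric strings such as "hello" are returned as-is.
  raw
end

puts cast_general_cell('1E-3').to_f
puts cast_general_cell('hello')
```

Rescuing `ArgumentError` keeps the helper permissive; a stricter variant could check the string against a pattern before converting.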

## Contributing

5 changes: 4 additions & 1 deletion lib/creek/book.rb
@@ -9,7 +9,8 @@ module Creek
class Creek::Book
attr_reader :files,
:shared_strings,
:with_headers
:with_headers,
:parse_general_as_number

DATE_1900 = Date.new(1899, 12, 30).freeze
DATE_1904 = Date.new(1904, 1, 1).freeze
@@ -24,6 +25,7 @@ def initialize(path, options = {})
@files = Zip::File.open(path)
@shared_strings = SharedStrings.new(self)
@with_headers = options.fetch(:with_headers, false)
@parse_general_as_number = options.fetch(:parse_general_as_number, false)
end

def sheets
@@ -51,6 +53,7 @@ def sheets
sheetfile
)
sheet.with_headers = with_headers
sheet.parse_general_as_number = parse_general_as_number
sheet
end
end
6 changes: 4 additions & 2 deletions lib/creek/sheet.rb
@@ -10,7 +10,8 @@ class Creek::Sheet
HEADERS_ROW_NUMBER = '1'
SPREADSHEETML_URI = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'

attr_accessor :with_headers
attr_accessor :with_headers,
:parse_general_as_number
attr_reader :book,
:name,
:sheetid,
@@ -161,7 +162,8 @@ def convert(value, type, style_idx)
def converter_options
@converter_options ||= {
shared_strings: @book.shared_strings.dictionary,
base_date: @book.base_date
base_date: @book.base_date,
parse_general_as_number: @book.parse_general_as_number
}
end

30 changes: 25 additions & 5 deletions lib/creek/styles/converter.rb
@@ -1,6 +1,7 @@
# frozen_string_literal: true

require 'set'
require 'bigdecimal'

module Creek
class Styles
@@ -30,6 +31,7 @@ class Converter
# - base_date: from what date to begin, see method #base_date

DATE_TYPES = %i[date time date_time].to_set
NUMERIC_REGEXP = /\A-?(?:\d+(?:\.\d+)?|\d*\.\d+)(?:[eE][+-]?\d+)?\z/.freeze
def self.call(value, type, style, options = {})
return nil if value.nil? || value.empty?

@@ -59,9 +61,13 @@ def self.call(value, type, style, options = {})
##

when :string
value
if options[:parse_general_as_number] && numeric_string?(value)
convert_general_number(value)
else
value
end
when :unsupported
convert_unknown(value)
convert_unknown(value, options)
when :fixnum
value.to_i
when :float, :percentage
@@ -75,13 +81,15 @@ def self.call(value, type, style, options = {})

## Nothing matched
else
convert_unknown(value)
convert_unknown(value, options)
end
end

def self.convert_unknown(value)
if value.nil? or value.empty?
def self.convert_unknown(value, options = {})
if value.nil? || value.empty?
value
elsif options[:parse_general_as_number] && numeric_string?(value)
convert_general_number(value)
elsif value.to_i.to_s == value.to_s
value.to_i
elsif value.to_f.to_s == value.to_s
@@ -129,6 +137,18 @@ def self.round_datetime(datetime_string)

::Time.new(yyyy.to_i, mm.to_i, dd.to_i, hh.to_i, mi.to_i, ss.to_r).round(0)
end

def self.numeric_string?(value)
value.match?(NUMERIC_REGEXP)
end

def self.convert_general_number(value)
if defined?(BigDecimal)
BigDecimal(value)
else
value.to_f
end
end
end
end
end
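
To illustrate which strings the `NUMERIC_REGEXP` added in this file accepts, here is a standalone sketch that copies the pattern and mirrors `numeric_string?` and `convert_general_number` as top-level methods. The method `parse_general` and the sample inputs are hypothetical and only serve the illustration.

```ruby
require 'bigdecimal'

# Pattern copied verbatim from the converter diff.
NUMERIC_REGEXP = /\A-?(?:\d+(?:\.\d+)?|\d*\.\d+)(?:[eE][+-]?\d+)?\z/.freeze

# Top-level stand-in for Converter.numeric_string?.
def numeric_string?(value)
  value.match?(NUMERIC_REGEXP)
end

# Hypothetical wrapper: convert only when the pattern matches.
def parse_general(value)
  numeric_string?(value) ? BigDecimal(value) : value
end

# Plain, negative, fractional, and scientific notation are accepted.
%w[1E-3 -42 0.5 783294732].each do |s|
  puts "#{s.inspect} accepted: #{numeric_string?(s)}"
end

# A trailing dot, a leading plus sign, thousands separators, and text are rejected.
['1.', '+7', '1,000', 'abc'].each do |s|
  puts "#{s.inspect} accepted: #{numeric_string?(s)}"
end
```

Because the pattern is anchored with `\A` and `\z`, surrounding whitespace or a trailing newline also prevents a match, so such values stay strings.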
19 changes: 19 additions & 0 deletions spec/test_spec.rb
@@ -76,6 +76,25 @@
end
end

describe 'Creek parsing a file with large numbers with automatic conversion' do
before(:all) do
@creek = Creek::Book.new 'spec/fixtures/large_numbers.xlsx', parse_general_as_number: true
@expected_simple_rows = [{ 'A' => 783_294_732.0, 'B' => '783294732', 'C' => 783_294_732.0 }]
end

after(:all) do
@creek.close
end

it 'casts general numbers to floats' do
rows = []
@creek.sheets[0].simple_rows.each do |row|
rows << row
end
expect(rows[0]).to eq(@expected_simple_rows[0])
end
end
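
The expected row in this spec uses Float literals, while `convert_general_number` in the converter diff returns a `BigDecimal` when `BigDecimal` is loaded. A minimal standalone sketch, assuming the converted cell is a `BigDecimal`, of why an `==`-based comparison can still pass:

```ruby
require 'bigdecimal'

# Assumed result of convert_general_number for this input.
converted = BigDecimal('783294732')
expected  = 783_294_732.0 # Float literal, as in the spec's expected row

puts converted.class                                # => BigDecimal
puts converted == expected                          # => true
puts({ 'A' => converted } == { 'A' => expected })   # => true
```

`BigDecimal#==` compares by numeric value and `Hash#==` compares values with `==`, so the classes can differ while the rows still compare equal.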

describe 'Creek parsing a sample XLSX file' do
before(:all) do
@creek = Creek::Book.new 'spec/fixtures/sample.xlsx'