Plan for a new binary data format #3

Closed
hungtcs opened this issue Dec 16, 2020 · 1 comment
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@hungtcs
Owner

hungtcs commented Dec 16, 2020

To shrink the data files further, I am considering adding a binary data format.
Tentative design: every 5 bytes (40 bits) represent one day.

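| Field (most significant bit first) | Bits | Notes |
| --- | --- | --- |
| Gregorian year | 8 | Stored as year - 1900, e.g. 2020 is stored as 2020 - 1900 = 120 (0x78) |
| Gregorian month | 4 | December is 0xC |
| Gregorian day | 5 | Takes 5 bits |
| Heavenly stem (天干) | 4 | Index value, e.g. 0b0 |
| Earthly branch (地支) | 4 | Index value, e.g. 0b0 |
| Lunar month | 4 | The 11th month (十一月) is 0b1011 |
| Lunar day | 5 | Takes 5 bits |
| Leap month flag | 1 | 6th bit of the last (least significant) byte, counting from 1 at the lowest bit; 0b1 for a leap month |
| Solar term (二十四节气) | 5 | Low 5 bits of the last (least significant) byte; 0b00000 when the day has no solar term |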

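The weekday and zodiac fields are dropped: the weekday can be computed from the Gregorian date, and the zodiac animal corresponds one-to-one to the earthly branch.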
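
A minimal sketch of how a reader could recover the two dropped fields. The names `ZODIAC`, `weekdayOf`, and `zodiacOf` are hypothetical and not part of the project, and the sketch assumes the earthly-branch index is 0-based starting at 子:

```typescript
// Zodiac animals in earthly-branch order (子, 丑, 寅, ...). English names are for illustration only.
const ZODIAC: string[] = [
  'Rat', 'Ox', 'Tiger', 'Rabbit', 'Dragon', 'Snake',
  'Horse', 'Goat', 'Monkey', 'Rooster', 'Dog', 'Pig',
];

// Returns 0 for Sunday through 6 for Saturday.
// The date is built in UTC so the local time zone cannot shift the day.
function weekdayOf(year: number, month: number, day: number): number {
  return new Date(Date.UTC(year, month - 1, day)).getUTCDay();
}

// Maps a 0-based earthly-branch index (assumption: 0 = 子) to its zodiac animal.
function zodiacOf(zhiIndex: number): string {
  return ZODIAC[zhiIndex];
}
```

For the 2020-12-15 example in this issue, `weekdayOf(2020, 12, 15)` gives 2 (Tuesday) and `zodiacOf(0)` gives `'Rat'`.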

Example: 2020-12-15, 庚子 year, 11th lunar month (十一月), 1st day (初一), not a leap month, no solar term

0b 0111 1000 1100 0111 1011 0000 0101 1000 0100 0000
0x 7    8    C    7    B    0    5    8    4    0

With commas separating the bits by field:
0b 0111 1000, 1100, 0111 1,011 0,000 0,101 1,000 01,0,0 0000
0x 7    8     C     7    B     0     5     8     4      0

As a Uint8Array:
new Uint8Array([0x78, 0xC7, 0xB0, 0x58, 0x40])
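
The layout can be checked with a generic bit packer. This is only a sketch under the field widths listed in the table; `DayFields` and `packDay` are hypothetical names, and the stem and branch values are assumed to be 0-based indexes (庚 = 6, 子 = 0):

```typescript
interface DayFields {
  year: number;       // Gregorian year, e.g. 2020
  month: number;      // Gregorian month, 1-12
  day: number;        // Gregorian day, 1-31
  gan: number;        // heavenly stem index (assumed 0-based)
  zhi: number;        // earthly branch index (assumed 0-based)
  lunarMonth: number; // lunar month number
  lunarDay: number;   // lunar day number
  leap: boolean;      // true for a leap month
  solarTerm: number;  // 0 when the day has no solar term
}

function packDay(f: DayFields): Uint8Array {
  // [value, width in bits], ordered from the most significant field to the least.
  const fields: Array<[number, number]> = [
    [f.year - 1900, 8], [f.month, 4], [f.day, 5], [f.gan, 4], [f.zhi, 4],
    [f.lunarMonth, 4], [f.lunarDay, 5], [f.leap ? 1 : 0, 1], [f.solarTerm, 5],
  ];
  const out = new Uint8Array(5);
  let bitPos = 0; // bits written so far, counted from the top bit of byte 0
  for (const [value, width] of fields) {
    for (let i = width - 1; i >= 0; i--) {
      const bit = (value >>> i) & 1;
      out[bitPos >> 3] |= bit << (7 - (bitPos & 7));
      bitPos++;
    }
  }
  return out;
}

// The 2020-12-15 example from this issue.
const packed = packDay({
  year: 2020, month: 12, day: 15, gan: 6, zhi: 0,
  lunarMonth: 11, lunarDay: 1, leap: false, solarTerm: 0,
});
```

Here `packed` holds the same five bytes as the array above: 0x78, 0xC7, 0xB0, 0x58, 0x40.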

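Pack every day as 5 bytes, in date order, into a single binary file covering 1901-01-01 through 2100-12-31. The two dates are 73048 days apart, so the file holds 73049 records when both endpoints are included.

(new Date('2100-12-31') - new Date('1901-01-01')) / 1000 / 60 / 60 / 24
// 73048

At 5 bytes per day the total is 73049 * 5 = 365245 bytes, roughly 365 KB. That is a very clear improvement over the CSV file (3.5 MB) and the JSON file (11 MB).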
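
Because every record has the same size, a date can be mapped straight to a byte offset. The sketch below assumes the file has no header, starts at 1901-01-01, and contains one record per calendar day with no gaps; `recordOffset` and the constants are hypothetical names:

```typescript
const BYTES_PER_DAY = 5;
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const FIRST_DAY = Date.UTC(1901, 0, 1); // assumed date of the first record

// Byte offset of the record for the given Gregorian date (month is 1-12).
// UTC timestamps are used so daylight saving time cannot distort the day count.
function recordOffset(year: number, month: number, day: number): number {
  const days = Math.round((Date.UTC(year, month - 1, day) - FIRST_DAY) / MS_PER_DAY);
  return days * BYTES_PER_DAY;
}

// Offset of the last record, plus its own 5 bytes, gives the expected file size.
const expectedFileSize = recordOffset(2100, 12, 31) + BYTES_PER_DAY;
```

A reader could compare `expectedFileSize` with the actual file length as a cheap sanity check before decoding.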

@hungtcs hungtcs pinned this issue Dec 16, 2020
@hungtcs hungtcs changed the title "Reorganize the data structure" to "Plan for a new binary data format" Dec 16, 2020
@hungtcs hungtcs self-assigned this Dec 16, 2020
@hungtcs
Owner Author

hungtcs commented Dec 16, 2020

Generating the binary file

Compute the 5 uint8 bytes for each day from the existing data:

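const { gregorian, lunar, solarTerm } = item;
const [ gan, zhi ] = lunar.year;
// Strip the leading '閏' (leap) marker so the month name can be looked up in MONTHS.
const lunarMonth = lunar.month.startsWith('閏') ? lunar.month.slice(1) : lunar.month;
const ganIndex = TIAN_GAN.indexOf(gan);                  // 4 bits
const zhiIndex = DI_ZHI.indexOf(zhi);                    // 4 bits
const lunarMonthNumber = MONTHS.indexOf(lunarMonth) + 1; // 4 bits, index + 1
const lunarDateNumber = DATES.indexOf(lunar.date) + 1;   // 5 bits, index + 1
// Mask each value to 8 bits so newData holds valid bytes even before it becomes a Uint8Array.
newData.push(
  (gregorian.year - 1900) & 0xFF,
  ((gregorian.month << 4) | (gregorian.date >>> 1)) & 0xFF,
  ((gregorian.date << 7) | (ganIndex << 3) | (zhiIndex >>> 1)) & 0xFF,
  ((zhiIndex << 7) | (lunarMonthNumber << 3) | (lunarDateNumber >>> 2)) & 0xFF,
  ((lunarDateNumber << 6) | ((lunar.leapMonth ? 1 : 0) << 5) | (JIE_QI.indexOf(solarTerm) + 1)) & 0xFF,
);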

Reading the binary file

private slice(offset: number) {
  // View the 5-byte record that starts at `offset`.
  const dataView = new DataView(this.arrayBuffer, offset, 5);
  const [byte0, byte1, byte2, byte3, byte4] = [0, 1, 2, 3, 4].map((i) => dataView.getUint8(i));
  return new CompoundDate(
    byte0 + 1900,                           // Gregorian year
    byte1 >>> 4,                            // Gregorian month
    ((byte1 & 0x0F) << 1) + (byte2 >>> 7),  // Gregorian day
    ((byte2 >>> 3) & 0x0F),                 // heavenly stem index
    ((byte2 << 1) & 0x0F) + (byte3 >>> 7),  // earthly branch index
    (byte3 & 0x7F) >>> 3,                   // lunar month
    ((byte3 & 0x07) << 2) + (byte4 >>> 6),  // lunar day
    ((byte4 >>> 5) & 0x01) === 0x01,        // leap month flag
    byte4 & 0x1F,                           // solar term, 0 when none
  );
}
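
For comparison, the same record can be decoded with a generic bit reader that follows the field widths from the layout table. This is a sketch, not the project's reader: `readBits` and `unpackDay` are hypothetical names, and the result is a plain object rather than a `CompoundDate`:

```typescript
// Reads `width` bits starting at absolute bit position `bitStart`,
// where bit 0 is the most significant bit of bytes[0].
function readBits(bytes: Uint8Array, bitStart: number, width: number): number {
  let value = 0;
  for (let i = 0; i < width; i++) {
    const pos = bitStart + i;
    const bit = (bytes[pos >> 3] >> (7 - (pos & 7))) & 1;
    value = (value << 1) | bit;
  }
  return value;
}

// Decodes the 5-byte record that starts at byte `offset`.
function unpackDay(bytes: Uint8Array, offset: number = 0) {
  const base = offset * 8;
  return {
    year: readBits(bytes, base, 8) + 1900,
    month: readBits(bytes, base + 8, 4),
    day: readBits(bytes, base + 12, 5),
    gan: readBits(bytes, base + 17, 4),
    zhi: readBits(bytes, base + 21, 4),
    lunarMonth: readBits(bytes, base + 25, 4),
    lunarDay: readBits(bytes, base + 29, 5),
    leap: readBits(bytes, base + 34, 1) === 1,
    solarTerm: readBits(bytes, base + 35, 5),
  };
}

// The example record from the first comment of this issue.
const decoded = unpackDay(new Uint8Array([0x78, 0xC7, 0xB0, 0x58, 0x40]));
```

`decoded` comes out as 2020-12-15 with stem index 6, branch index 0, lunar month 11, lunar day 1, no leap month, and no solar term, matching the worked example.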

hungtcs added a commit that referenced this issue Dec 16, 2020
- add src/convert-to-binary.ts
- add examples/examples2.html

issues: #3
@hungtcs hungtcs added good first issue Good for newcomers enhancement New feature or request labels Dec 17, 2020