|
2 | 2 | datatools
|
3 | 3 | =========
|
4 | 4 |
|
5 |
| -_datatools_ provides a variety of command line programs for working with |
6 |
| -data in different formats as well as to ease Posix shell scripting |
7 |
| -(e.g. writing scripts that run under Bash). The tools are group as data, |
8 |
| -strings and scripting. |
9 |
| - |
10 |
| -For data |
11 |
| --------- |
12 |
| - |
13 |
| -Command line utilities for simplifying work with CSV, JSON, TOML, YAML, |
14 |
| -Excel Workbooks and plain text files or content. |
15 |
| - |
16 |
| -+ [csv2json](docs/csv2json/) - a tool to take a CSV file and convert it into a JSON array or a list of JSON blobs one per line |
17 |
| -+ [csv2mdtable](docs/csv2mdtable/) - a tool to render CSV as a Github Flavored Markdown table |
18 |
| -+ [csv2tab](docs/csv2tab/) - a tool to take a CSV file and convert to tab separated values |
19 |
| -+ [csv2xlsx](docs/csv2xlsx/) - a tool to take a CSV file and add it as a sheet to a Excel Workbook |
20 |
| -+ [csvcleaner](docs/csvcleaner/) - normalize a CSV file by column and row including trimming spaces and removing comments |
21 |
| -+ [csvcols](docs/csvcols/) - a tool for formatting command line arguments into CSV row of columns or filtering CSV rows for specific columns |
22 |
| -+ [csvfind](docs/csvfind/) - a tool for filtering a CSV file rows by column |
23 |
| -+ [csvjoin](docs/csvjoin/) - a tool to join two CSV files on common values in designated columns, writes combined CSV rows |
24 |
| -+ [csvrows](docs/csvrows/) - a tool for formatting command line arguments into CSV columns of rows or filtering CSV for specific rows |
25 |
| -+ [json2toml](docs/json2toml/) - a tool for converting JSON to TOML |
26 |
| -+ [json2yaml](docs/json2yaml/) - a tool for converting JSON to YAML |
27 |
| -+ [jsoncols](docs/jsoncols/) - a tool for exploring and extracting JSON values into columns |
28 |
| -+ [jsonjoin](docs/jsonjoin/) - a tool for joining JSON object documents |
29 |
| -+ [jsonmunge](docs/jsonmunge/) - a tool to transform JSON documents into something else |
30 |
| -+ [jsonrange](docs/jsonrange/) - a tool for iterating over JSON objects and arrays (return keys or values) |
31 |
| -+ [tab2csv](docs/tab2csv/) - a tool to convert from tab separated values to comma separated values |
32 |
| -+ [toml2json](docs/toml2json/) - a tool for converting TOML to JSON |
33 |
| -+ [xlsx2csv](docs/xlsx2csv/) - a tool for converting Excel Workbooks sheets to CSV files |
34 |
| -+ [xlsx2json](docs/xlsx2json/) - a tool for converting Excel Workbooks to JSON files |
35 |
| -+ [yaml2json](docs/yaml2json/) - a tool for converting YAML files to JSON |
36 |
| -+ [codemeta2cff](codemeta2cff.1.html) - a tool to convert a codemeta.json file into a CITATION.cff file. |
37 |
| -+ [sql2csv](sql2csv.1.html) - a tool to execute a SQL query in MySQL or SQLIte3 and render the results in CSV encoding |
38 |
| - |
39 |
| - |
40 |
| -Compiled versions are provided for Linux (amd64), Mac OS X (amd64), |
41 |
| -Windows 10 (amd64) and Raspbian (ARM7). See https://github.com/caltechlibrary/datatools/releases. |
| 5 | +_datatools_ is a rich collection of command line programs targeting |
| 6 | +data conversion, cleanup and analysis directly from your favorite |
| 7 | +POSIX shell. It has proven useful for data collaborations where |
| 8 | +individual members of a project may prefer different toolsets in their |
| 9 | +analysis (e.g. Julia, R, Python) but want to work from a common baseline. |
| 10 | +It has also been used intensively for internal reporting from various |
| 11 | +Caltech Library metadata sources. |
| 12 | + |
| 13 | +The tools fall into three broad categories: |
| 14 | + |
| 15 | +- data transformation and conversion |
| 16 | +- shell scripting helpers |
| 17 | +- "string", a tool providing the common string operations missing from shell |
| 18 | + |
| 19 | +See the [user manual](user-manual.md) for a complete list of the command line |
| 20 | +programs. The data transformation tools support formats such as |
| 21 | +Excel XML, CSV, tab delimited files, JSON, YAML and TOML. |
| 22 | + |
| 23 | +Compiled versions of the datatools collection are provided for Linux |
| 24 | +(amd64), Mac OS X (amd64), Windows 10 (amd64) and Raspbian (ARM7). |
| 25 | +See https://github.com/caltechlibrary/datatools/releases. |
42 | 26 |
|
43 | 27 | Use the "-help" option for a full list of options for each utility (e.g. `csv2json -help`).
|
44 | 28 |
|
| 29 | +Data transformation |
| 30 | +------------------- |
| 31 | + |
| 32 | +The transformation tools include data converters that work |
| 33 | +with CSV, tab delimited, JSON, TOML, YAML |
| 34 | +and Excel XML files. |
| 35 | + |
| 36 | +There is also tooling to change data shapes using JSON as the |
| 37 | +intermediate data format. |
| 38 | + |
| 39 | +For the shell |
| 40 | +------------- |
| 41 | + |
| 42 | +Various utilities for simplifying work on the command line. |
| 43 | + |
| 44 | ++ [findfile](docs/findfile/) - find files based on prefix, suffix or contained string |
| 45 | ++ [finddir](docs/finddir/) - find directories based on prefix, suffix or contained string |
| 46 | ++ [mergepath](docs/mergepath/) - prefix, append, clip path variables |
| 47 | ++ [range](docs/range/) - emit a range of integers (useful for numbered loops in Bash) |
| 48 | ++ [reldate](docs/reldate/) - display a relative date in YYYY-MM-DD format |
| 49 | ++ [reltime](docs/reltime/) - display a relative time in 24 hour notation, HH:MM:SS format |
| 50 | ++ [timefmt](docs/timefmt/) - format a time value based on Golang's time format language |
| 51 | ++ [urlparse](docs/urlparse/) - split a URL into parts |
| 52 | + |
45 | 53 | For strings
|
46 | 54 | -----------
|
47 | 55 |
|
@@ -71,26 +79,6 @@ Some of the features included
|
71 | 79 |
|
72 | 80 | See [string](docs/string/) for full details
|
73 | 81 |
|
74 |
| -For scripting |
75 |
| -------------- |
76 |
| - |
77 |
| -Various utilities for simplifying work on the command line. |
78 |
| - |
79 |
| -+ [findfile](docs/findfile/) - find files based on prefix, suffix or contained string |
80 |
| -+ [finddir](docs/finddir/) - find directories based on prefix, suffix or contained string |
81 |
| -+ [mergepath](docs/mergepath/) - prefix, append, clip path variables |
82 |
| -+ [range](docs/range/) - emit a range of integers (useful for numbered loops in Bash) |
83 |
| -+ [reldate](docs/reldate/) - display a relative date in YYYY-MM-DD format |
84 |
| -+ [reltime](docs/reltime/) - display a relative time in 24 hour notation, HH:MM:SS format |
85 |
| -+ [timefmt](docs/timefmt/) - format a time value based on Golang's time format language |
86 |
| -+ [urlparse](docs/urlparse/) - split a URL into parts |
87 |
| - |
88 |
| -Compiled versions are provided for Linux (amd64), Mac OS X (amd64), |
89 |
| -Windows 10 (amd64) and Raspbian (ARM7). See https://github.com/caltechlibrary/datatools/releases. |
90 |
| - |
91 |
| -Use the utilities try "-help" option for a full list of options. |
92 |
| - |
93 |
| - |
94 | 82 | Installation
|
95 | 83 | ------------
|
96 | 84 |
|
|