@@ -10,7 +10,17 @@ Altinity-datasets requires Python 3.5 or greater. The `clickhouse-client`
executable must be in the path to load data.

Before starting you must install the altinity-datasets package using
- pip3. Following example shows install into a Python virtual environment.
+ pip3. The following example shows a system-wide installation.
+ The first command is only required if you don't have clickhouse-client already
+ installed on the host.
+
+ ```
+ sudo apt install clickhouse-client
+ sudo pip3 install altinity-datasets
+ ```
+
+ Many users will prefer to install within a Python3 virtual environment,
+ for example:

```
python3 -m venv my-env
@@ -184,24 +194,65 @@ python3 setup.py sdist
twine upload --repository-url https://upload.pypi.org/legacy/ dist/*
```

- Code conventions are kind of lax for now. Please keep the Python files
- need and properly documented.
+ Code conventions are enforced using yapf and flake8. Run the
+ dev-format-code.sh script to check formatting.

Run tests as follows with virtual environment set. You will need a
ClickHouse server with a null password on the default user.
+
```
cd tests
python3 -m unittest -v
```

+ ## Errors
+
+ ### Out-of-date pip3 causes installation failure
+
+ If pip3 fails with the message `error: invalid command 'bdist_wheel'` you
+ may need to upgrade pip. Run `pip3 install --upgrade pip` to correct the
+ problem.
+
+ ### Materialized views cannot be dumped
+
+ ad-cli will fail with an error if you try to dump a database that has
+ materialized views. The workaround is to omit them from the dump operation
+ using a table regex as shown in the following example:
+
+ ```
+ ad-cli dataset dump nyc_taxi_rides --repo-path=. --compress --parallel=6 \
+ --tables='^(tripdata|taxi_zones|central_park_weather_observations)$'
+ ```
+
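The anchored alternation passed to `--tables` can also be generated from a list of table names rather than typed by hand. The following is a minimal sketch and is not part of ad-cli: the helper name `build_tables_regex` is hypothetical, and it assumes the option accepts an ordinary regular expression, as the example above suggests.

```python
import re


def build_tables_regex(tables):
    """Return an anchored regex that matches only the given table names.

    Hypothetical helper for illustration; it is not part of ad-cli.
    """
    if not tables:
        raise ValueError("at least one table name is required")
    # Escape each name so characters such as '.' are matched literally.
    return "^(" + "|".join(re.escape(t) for t in tables) + ")$"


pattern = build_tables_regex(
    ["tripdata", "taxi_zones", "central_park_weather_observations"])
print(pattern)
```

Anchoring with `^` and `$` means a materialized view whose name merely starts with a listed table name (for example a hypothetical `tripdata_mv`) is still excluded.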
+ ### --no-verify option fails on self-signed certs
+
+ When using the ad-cli --secure and --no-verify options together, you also
+ need to configure clickhouse-client to skip certificate verification.
+ This only applies when the certificate is self-signed. You must
+ change /etc/clickhouse-client/config.xml as follows to skip certificate
+ validation:
+
+ ```
+ <config>
+   <openSSL>
+     <client> <!-- Used for connection to server's secure tcp port -->
+       ...
+       <invalidCertificateHandler>
+         <name>AcceptCertificateHandler</name>
+       </invalidCertificateHandler>
+     </client>
+   </openSSL>
+   ...
+ </config>
+
+ ```
+
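Before retrying a failed `--secure --no-verify` run, it can help to confirm that the handler setting is actually present in the client configuration. The sketch below is only an illustration under stated assumptions: the function name `accepts_any_certificate` is hypothetical, the root element is assumed to be `<config>` as in the snippet above, and the code inspects XML text only, without contacting a server.

```python
import xml.etree.ElementTree as ET


def accepts_any_certificate(config_xml):
    """Return True if the client section names AcceptCertificateHandler.

    Hypothetical helper; assumes the layout shown in the snippet above.
    """
    root = ET.fromstring(config_xml)
    # Path is relative to the root <config> element.
    name = root.find("openSSL/client/invalidCertificateHandler/name")
    if name is None:
        return False
    return (name.text or "").strip() == "AcceptCertificateHandler"


sample = """<config>
  <openSSL>
    <client>
      <invalidCertificateHandler>
        <name>AcceptCertificateHandler</name>
      </invalidCertificateHandler>
    </client>
  </openSSL>
</config>
"""
print(accepts_any_certificate(sample))
```

To check a real installation, read the contents of /etc/clickhouse-client/config.xml and pass them to the function in place of `sample`.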
## Limitations

The most important are:

* Error handling is spotty. If clickhouse-client is not in the path
things may fail mysteriously.
- * There is no automatic way to populate large dataset like airline/ontime.
- You can add the extra data files yourself.
* Datasets have to be on the local file system. In the future we will
use cloud object storage such as S3.
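
Until error handling improves, the first limitation can be sidestepped with a pre-flight check that clickhouse-client is on the path before any data is loaded. This is a minimal sketch, not something altinity-datasets is known to do itself; the function name `find_executable` is hypothetical.

```python
import shutil


def find_executable(name="clickhouse-client"):
    """Return the full path of `name` on PATH, or raise a clear error.

    Hypothetical helper for illustration; not part of altinity-datasets.
    """
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            "{} was not found on PATH; install it or adjust PATH".format(name))
    return path


try:
    print(find_executable())
except RuntimeError as err:
    print(err)
```

Failing early with an explicit message replaces the mysterious failure with one that names the missing executable.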