Commit 9a48ef5 (parent 73b488c)

Cleaned up README for 0.1.2 release

File tree

1 file changed: README.md (+56 −5 lines)
@@ -10,7 +10,17 @@ Altinity-datasets requires Python 3.5 or greater. The `clickhouse-client`
 executable must be in the path to load data.
 
 Before starting you must install the altinity-datasets package using
-pip3. Following example shows install into a Python virtual environment.
+pip3. Following example shows install into a Python virtual environment.
+First command is only required if you don't have clickhouse-client already
+installed on the host.
+
+```
+sudo apt install clickhouse-client
+sudo pip3 install altinity-datasets
+```
+
+Many users will prefer to install within a Python3 virtual environment,
+for example:
 
 ```
 python3 -m venv my-env
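The hunk header above quotes the README's requirement of Python 3.5 or greater; a setup script can assert that requirement up front. A minimal sketch (this guard is illustrative, not part of altinity-datasets):

```python
import sys

# Illustrative guard for the README's stated requirement (Python >= 3.5);
# altinity-datasets itself may enforce this differently.
if sys.version_info < (3, 5):
    raise RuntimeError("altinity-datasets requires Python 3.5 or greater")
print("Python version OK:", ".".join(map(str, sys.version_info[:3])))
```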
@@ -184,24 +194,65 @@ python3 setup.py sdist
 twine upload --repository-url https://upload.pypi.org/legacy/ dist/*
 ```
 
-Code conventions are kind of lax for now. Please keep the Python files
-need and properly documented.
+Code conventions are enforced using yapf and flake8. Run the
+dev-format-code.sh script to check formatting.
 
 Run tests as follows with virtual environment set. You will need a
 ClickHouse server with a null password on the default user.
+
 ```
 cd tests
 python3 -m unittest -v
 ```
 
+## Errors
+
+### Out-of-date pip3 causes installation failure
+
+If pip3 installs with the message `error: invalid command 'bdist_wheel'` you
+may need to upgrade pip. Run `pip3 install --upgrade pip` to correct the
+problem.
+
+### Materialized views cannot be dumped
+
+ad-cli will fail with an error if you try to dump a database that has
+materialized views. The workaround is to omit them from the dump operation
+using a table regex as shown in the following example:
+
+```
+ad-cli dataset dump nyc_taxi_rides --repo-path=. --compress --parallel=6 \
+--tables='^(tripdata|taxi_zones|central_park_weather_observations)$'
+```
+
+### --no-verify option fails on self-signed certs
+
+When using ad-cli --secure together with --no-verify options you need
+to also configure clickhouse-client to skip certificate verification.
+This only applies when the certificate is self-signed. You must
+change /etc/clickhouse-client/config.xml as follows to skip certificate
+validation:
+
+```
+<config>
+    <openSSL>
+        <client> <!-- Used for connection to server's secure tcp port -->
+            ...
+            <invalidCertificateHandler>
+                <name>AcceptCertificateHandler</name>
+            </invalidCertificateHandler>
+        </client>
+    </openSSL>
+    ...
+</config>
+
+```
+
 ## Limitations
 
 The most important are:
 
 * Error handling is spotty. If clickhouse-client is not in the path
 things may fail mysteriously.
-* There is no automatic way to populate large dataset like airline/ontime.
-You can add the extra data files yourself.
 * Datasets have to be on the local file system. In the future we will
 use cloud object storage such as S3.
 
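An aside on the `--tables` regex in the second hunk: it is anchored at both ends, so only exact table-name matches are dumped, which is what lets it exclude materialized views. A quick sketch of that behavior (the table list and filtering loop here are illustrative, not ad-cli's actual code; `tripdata_mv` is a made-up name standing in for a materialized view):

```python
import re

# The regex from the README example: anchored with ^ and $, so only
# exact table-name matches pass.
pattern = re.compile(r"^(tripdata|taxi_zones|central_park_weather_observations)$")

# Illustrative table list; "tripdata_mv" stands in for a materialized
# view that the dump should skip.
tables = ["tripdata", "taxi_zones", "tripdata_mv",
          "central_park_weather_observations"]

selected = [t for t in tables if pattern.match(t)]
print(selected)  # "tripdata_mv" is excluded by the trailing $ anchor
```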
