【clickhouse系列】1、CK从入门到放弃
【编者的话】公司目前的数据存储,有用到clickhouse这一块,本人也有些研究,简单写一篇ck的入门文章(基于docker容器化搭建ck示例),权当抛转,欢迎一起讨论,沟通。
What is ClickHouse
– ClickHouse 诞生于俄罗斯最大的搜索公司Yandex,在clickhouse的配置文件中我们也会看到yandex的影子。
Tips: Yandex is one of the largest internet companies in Europe; operating Russia’s most popular search engine.
– ClickHouse 是面向联机分析处理(OLAP, On-Line Analytical Processing) 的分布式列式数据库管理系统(DBMS),在数据分析中能提供高性能的查询服务。
– ClickHouse 是一个完全面向列式的分布式数据库。数据通过列存储,在查询过程中,数据通过数组来处理(向量或者列Chunk)。
当进行查询时,操作被转发到数组上,而不是在特定的值上。
因此被称为”向量化查询执行”,相对于实际的数据处理成本,向量化处理具有更低的转发成本。
– ClickHouse主要使用向量化查询执行和有限的运行时代码生成支持。
– ClickHouse 官方简称“CK”,艺名“喀秋莎数据库”。战斗民族还有一个神器叫“AK”(如果非要加个代号的话就是AK 47),可以说是枪类中的绝对明星了。脑补了一下(图片见文末):喀秋莎拿着一把AK47,在一个叫clickhouse的屋子里大杀四方~~~快哉!快哉!
– 其他自行查阅吧:
– – ClickHouse官方文档
– – ClickHouse入门指北
– – ClickHouse源码
ClickHouse容器化安装
1、docker 安装方式介绍(docker没玩过的同学,请自行百度!)
– 写dockerfile文件(dockerfile文件没写过的同学,请自行百度!)
写dockerfile,文件名就是dockerfile或Dockerfile
[root@wangjie clickhouse]# vim dockerfile
FROM ubuntu:18.04
ARG repository=”deb http://repo.yandex.ru/clickhouse/deb/stable/ main/”
ARG version=19.1.10
ARG gosu_ver=1.10
RUN apt-get update \
&& apt-get install –yes –no-install-recommends \
apt-transport-https \
dirmngr \
gnupg \
&& mkdir -p /etc/apt/sources.list.d \
&& apt-key adv –keyserver keyserver.ubuntu.com –recv E0C56BD4 \
&& echo $repository > /etc/apt/sources.list.d/clickhouse.list \
&& apt-get update \
&& env DEBIAN_FRONTEND=noninteractive \
apt-get install –allow-unauthenticated –yes –no-install-recommends \
clickhouse-common-static=$version \
clickhouse-client=$version \
clickhouse-server=$version \
libgcc-7-dev \
locales \
tzdata \
wget \
&& rm -rf \
/var/lib/apt/lists/* \
/var/cache/debconf \
/tmp/* \
&& apt-get clean
ADD https://github.com/tianon/gosu/releases/download/1.10/gosu-amd64 /bin/gosu
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
RUN mkdir /docker-entrypoint-initdb.d
COPY docker_related_config.xml /etc/clickhouse-server/config.d/
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x \
/entrypoint.sh \
/bin/gosu
EXPOSE 9000 8123 9009
VOLUME /var/lib/clickhouse
ENV CLICKHOUSE_CONFIG /etc/clickhouse-server/config.xml
ENTRYPOINT [“/entrypoint.sh”]
– 构建clickhouse镜像(时间可能会久一点,如果出错,可能会更久–因为需要反复构建多次–不过这一切都是值得的,当你有了自己的镜像,你就可以any zuo,no die!)
[root@wangjie clickhouse]# docker build -t clickhouse-server-demo:1.0 . Sending build context to Docker daemon 3.072kB Step 1/18 : FROM ubuntu:18.04 18.04: Pulling from library/ubuntu 6cf436f81810: Pull complete 987088a85b96: Pull complete b4624b3efe06: Pull complete d42beb8ded59: Pull complete ....(通常能连外网就行).... The following additional packages will be installed: gnupg-l10n gnupg-utils gpg gpg-agent gpg-wks-client gpg-wks-server gpgconf gpgsm libasn1-8-heimdal libassuan0 libgssapi3-heimdal libhcrypto4-heimdal libheimbase1-heimdal libheimntlm0-heimdal libhx509-5-heimdal libkrb5-26-heimdal libksba8 libldap-2.4-2 libldap-common libnpth0 libreadline7 libroken18-heimdal libsasl2-2 libsasl2-modules-db libsqlite3-0 libwind0-heimdal pinentry-curses readline-common Suggested packages: dbus-user-session libpam-systemd pinentry-gnome3 tor parcimonie xloadimage scdaemon pinentry-doc readline-doc Recommended packages: libsasl2-modules The following NEW packages will be installed: apt-transport-https dirmngr gnupg gnupg-l10n gnupg-utils gpg gpg-agent gpg-wks-client gpg-wks-server gpgconf gpgsm libasn1-8-heimdal libassuan0 libgssapi3-heimdal libhcrypto4-heimdal libheimbase1-heimdal libheimntlm0-heimdal libhx509-5-heimdal libkrb5-26-heimdal libksba8 libldap-2.4-2 libldap-common libnpth0 libreadline7 libroken18-heimdal libsasl2-2 libsasl2-modules-db libsqlite3-0 libwind0-heimdal pinentry-curses readline-common 0 upgraded, 31 newly installed, 0 to remove and 3 not upgraded. ....(通常能连外网就行).... The following additional packages will be installed: gcc-7-base libasan4 libatomic1 libcilkrts5 libgomp1 libitm1 liblsan0 libmpx2 libpsl5 libquadmath0 libssl1.1 libtsan0 libubsan0 Recommended packages: libcap2-bin libc6-dev publicsuffix ca-certificates The following NEW packages will be installed: clickhouse-client clickhouse-common-static clickhouse-server gcc-7-base libasan4 libatomic1 libcilkrts5 libgcc-7-dev libgomp1 libitm1 liblsan0 libmpx2 libpsl5 libquadmath0 libssl1.1 libtsan0 libubsan0 locales tzdata wget 0 upgraded, 20 newly installed, 0 to remove and 3 not upgraded. ....(通常能连外网就行).... urrent default time zone: 'Etc/UTC' Local time is now: Wed Mar 6 08:51:38 UTC 2019. Universal Time is now: Wed Mar 6 08:51:38 UTC 2019. ....(如果有报错百度之).... Warning: apt-key output should not be parsed (stdout is not a terminal) Executing: /tmp/apt-key-gpghome.QG893kLkNP/gpg.1.sh --keyserver keyserver.ubuntu.com --recv E0C56BD4 gpg: keyserver receive failed: Server indicated a failure The command '/bin/sh -c apt-get update && apt-get install --yes --no-install-recommends apt-transport-https dirmngr gnupg && mkdir -p /etc/apt/sources.list.d && apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4 && echo $repository > /etc/apt/sources.list.d/clickhouse.list && apt-get update && env DEBIAN_FRONTEND=noninteractive apt-get install --allow-unauthenticated --yes --no-install-recommends clickhouse-common-static=$version clickhouse-client=$version clickhouse-server=$version libgcc-7-dev locales tzdata wget && rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/* && apt-get clean' returned a non-zero code: 2 您在 /var/spool/mail/root 中有邮件
– 比如1:如上报错—怎么解决?
[root@wangjie clickhouse]# docker images REPOSITORY TAG IMAGE ID CREATED SIZE5b07a3f2033d 17 minutes ago 88.1MB ubuntu 18.04 47b19964fb50 4 weeks ago 88.1MB
可以看到镜像都是办成品,不过可以用这个半成品镜像5b07a3f2033d来调试上面的报错
[root@wangjie clickhouse]# docker run -it 5b07a3f2033d /bin/bash
root@186ew89a3lk8:/# 这里可以从出错的command(详见上面dockerfile文件命令菜单)开始执行check报错的原因并解决之
解决上述问题后,删除上面的镜像和容器,再次执行build命令,,等待直到成功或报错!
[root@wangjie clickhouse]# docker ps -a 查出刚刚的容器id
[root@wangjie clickhouse]# docker rm 容器id
[root@wangjie clickhouse]# docker rmi 5b07a3f2033d 删除半成品镜像
….(如果有报错百度之)….
Step 12/18 : COPY docker_related_config.xml /etc/clickhouse-server/config.d/
COPY failed: stat /var/lib/docker/tmp/docker-builder800178589/docker_related_config.xml: no such file or directory
– 比如2:如上报错—怎么解决?
[root@wangjie clickhouse]# docker images REPOSITORY TAG IMAGE ID CREATED SIZE300535bbbf56 4 minutes ago 474MB ubuntu 18.04 47b19964fb50 4 weeks ago 88.1MB
可以看到这次的半成品,474MB距离成功只差8步了。同样可以用这个半成品镜像300535bbbf56来调试上面的报错
[root@wangjie clickhouse]# docker run -it 300535bbbf56 /bin/bash
root@271df44d0fa9:/# 这里可以从出错的command(详见上面dockerfile文件命令菜单)开始执行check报错的原因并解决之
即COPY docker_related_config.xml /etc/clickhouse-server/config.d/
发现其实根源是没有docker_related_config.xml文件,所以要在本地新增这个文件,但是内容是啥呢?(用docker pull 镜像–后面有提到pull镜像的方式,然后run出容器cat相应文件即可)
比如:
[root@wangjie clickhouse]# docker run -it 76f15457b167 /bin/bash 任意new一个容器并进入
root@b1d85472110e:/# cat /etc/clickhouse-server/config.d/docker_related_config.xml 进入容器后查看文件内容后,复制出来即可
[root@wangjie clickhouse]# vim docker_related_config.xml
[root@wangjie clickhouse]# ll
总用量 8
-rw-r–r– 1 root root 1432 3月 6 15:10 dockerfile
-rw-r–r– 1 root root 315 3月 6 16:37 docker_related_config.xml
可以看到上面的dockerfile文件命令菜单中有两个COPY,所以entrypoint.sh文件也没有的,再次build会有如下报错
Step 13/18 : COPY entrypoint.sh /entrypoint.sh
COPY failed: stat /var/lib/docker/tmp/docker-builder219246264/entrypoint.sh: no such file or directory
– 可以看到还是报错,所以同上新增之即可,但是entrypoint.sh内容是啥呢?
比如: [root@wangjie clickhouse]# docker run -it 76f15457b167 /bin/bash 任意new一个容器并进入 root@b1d85472110e:/# cat /entrypoint.sh 进入容器后查看文件内容后,复制出来即可 COPY entrypoint.sh /entrypoint.sh [root@wangjie clickhouse]# vim entrypoint.sh
!/bin/bash
set some vars
CLICKHOUSE_CONFIG=”${CLICKHOUSE_CONFIG:-/etc/clickhouse-server/config.xml}”
USER=”$(id -u clickhouse)”
GROUP=”$(id -g clickhouse)”
port is needed to check if clickhouse-server is ready for connections
HTTP_PORT=”$(clickhouse extract-from-config –config-file $CLICKHOUSE_CONFIG –key=http_port)”
get CH directories locations
DATA_DIR=”$(clickhouse extract-from-config –config-file $CLICKHOUSE_CONFIG –key=path || true)”
TMP_DIR=”$(clickhouse extract-from-config –config-file $CLICKHOUSE_CONFIG –key=tmp_path || true)”
USER_PATH=”$(clickhouse extract-from-config –config-file $CLICKHOUSE_CONFIG –key=user_files_path || true)”
LOG_PATH=”$(clickhouse extract-from-config –config-file $CLICKHOUSE_CONFIG –key=logger.log || true)”
LOG_DIR=”$(dirname $LOG_PATH || true)”
ERROR_LOG_PATH=”$(clickhouse extract-from-config –config-file $CLICKHOUSE_CONFIG –key=logger.errorlog || true)”
ERROR_LOG_DIR=”$(dirname $ERROR_LOG_PATH || true)”
FORMAT_SCHEMA_PATH=”$(clickhouse extract-from-config –config-file $CLICKHOUSE_CONFIG –key=format_schema_path || true)”
ensure directories exist
mkdir -p \
“$DATA_DIR” \
“$ERROR_LOG_DIR” \
“$LOG_DIR” \
“$TMP_DIR” \
“$USER_PATH” \
“$FORMAT_SCHEMA_PATH”
if [ “$CLICKHOUSE_DO_NOT_CHOWN” != “1” ]; then
# ensure proper directories permissions
chown -R $USER:$GROUP \
“$DATA_DIR” \
“$ERROR_LOG_DIR” \
“$LOG_DIR” \
“$TMP_DIR” \
“$USER_PATH” \
“$FORMAT_SCHEMA_PATH”
fi
if [ -n “$(ls /docker-entrypoint-initdb.d/)” ]; then
gosu clickhouse /usr/bin/clickhouse-server –config-file=$CLICKHOUSE_CONFIG &
pid=”$!”
# check if clickhouse is ready to accept connections
# will try to send ping clickhouse via http_port (max 12 retries, with 1 sec delay)
if ! wget –spider –quiet –tries=12 –waitretry=1 –retry-connrefused “http://localhost:$HTTP_PORT/ping” ; then
echo >&2 ‘ClickHouse init process failed.’
exit 1
fi
clickhouseclient=( clickhouse-client –multiquery )
echo
for f in /docker-entrypoint-initdb.d/*; do
case “$f” in
*.sh)
if [ -x “$f” ]; then
echo “$0: running $f”
“$f”
else
echo “$0: sourcing $f”
. “$f”
fi
;;
*.sql) echo “$0: running $f”; cat “$f” | “${clickhouseclient[@]}” ; echo ;;
*.sql.gz) echo “$0: running $f”; gunzip -c “$f” | “${clickhouseclient[@]}”; echo ;;
*) echo “$0: ignoring $f” ;;
esac
echo
done
if ! kill -s TERM “$pid” || ! wait “$pid”; then
echo >&2 ‘Finishing of ClickHouse init process failed.’
exit 1
fi
fi
if no args passed to docker run
or first argument start with --
, then the user is passing clickhouse-server arguments
if [[ $# -lt 1 ]] || [[ “$1” == “–“* ]]; then
exec gosu clickhouse /usr/bin/clickhouse-server –config-file=$CLICKHOUSE_CONFIG “$@”
fi
Otherwise, we assume the user want to run his own process, for example a bash
shell to explore this image
exec “$@”
[root@wangjie clickhouse]# ll
总用量 12
-rw-r–r– 1 root root 1432 3月 6 15:10 dockerfile
-rw-r–r– 1 root root 315 3月 6 16:37 docker_related_config.xml
-rw-r–r– 1 root root 3189 3月 6 17:03 entrypoint.sh
然后删除半成品镜像
[root@wangjie clickhouse]# docker rmi 半成品镜像id
再次build即可
[root@wangjie clickhouse]# docker build -t clickhouse-server-demo:1.0 .
…(循环上面的操作, 直到不报错为止)…
Successfully built 7e0487ee8225
Successfully tagged clickhouse-server-demo:1.0
您在 /var/spool/mail/root 中有邮件
看到这里说明你成功了
其实,简单的构建镜像的方法是docker pull 镜像即可。如何获取自己需要的镜像?看 这里
– 先来pull clickhouse-server
[root@wangjie clickhouse]# docker pull yandex/clickhouse-server ...(等待,pull镜像的过程中也可能报错的,看网络了)...
运气好的话,成功之后,可以看到镜像了。
[root@wangjie clickhouse]# docker images |grep click
yandex/clickhouse-server latest 76f15457b167 2 days ago 475MB
– 再来pull clickhouse-client
[root@wangjie clickhouse]# docker pull yandex/clickhouse-client
运气好的话,成功之后,可以看到镜像了。
[root@wangjie clickhouse]# docker images |grep click
[root@wangjie clickhouse]# docker images | grep click
yandex/clickhouse-server latest 76f15457b167 2 days ago 475MB
yandex/clickhouse-client latest 52a6b316725a 13 days ago 450MB
启动一个实例即clickhouse容器
其次启动clickhouse客户端来连接服务端
[root@wangjie clickhouse]# docker run -it –rm –link demo-clickhouse-server:clickhouse-server_wj yandex/clickhouse-client –host clickhouse-server_wj (运行完就直接进入到clickhouse的交互式了)
上面的命令简单说明一下,–link 即将被连接的服务端容器名软链为clickhouse-server_wj(自定义即可),后面–host 用的就是这个软链名字
ClickHouse client version 19.1.9.
Connecting to clickhouse-server:9000.
Connected to ClickHouse server version 19.1.10 revision 54413.
38f0f69b5941 🙂
38f0f69b5941 🙂 exit
Bye.
如果退出了容器,如何继续玩耍?
重新执行docker run -it –rm –link demo-clickhouse-server:clickhouse-server_wj yandex/clickhouse-client –host clickhouse-server_wj 这个命令即可。
操作之
接上面38f0f69b5941 🙂 show databases;
SHOW DATABASES
┌─name────┐
│ default │
│ system │
└─────────┘
2 rows in set. Elapsed: 0.013 sec.
38f0f69b5941 🙂 use system
USE system
Ok.
0 rows in set. Elapsed: 0.006 sec.
38f0f69b5941 🙂 show tables;
SHOW TABLES
┌─name───────────────────────────┐
│ aggregate_function_combinators │
│ asynchronous_metrics │
│ build_options │
│ clusters │
│ collations │
│ columns │
│ contributors │
│ data_type_families │
│ databases │
│ dictionaries │
│ events │
│ formats │
│ functions │
│ graphite_retentions │
│ macros │
│ merge_tree_settings │
│ merges │
│ metrics │
│ models │
│ mutations │
│ numbers │
│ numbers_mt │
│ one │
│ parts │
│ parts_columns │
│ processes │
│ replicas │
│ replication_queue │
│ settings │
│ table_engines │
│ table_functions │
│ tables │
└────────────────────────────────┘
32 rows in set. Elapsed: 0.009 sec.
38f0f69b5941 🙂
综上,虽然可以玩了,但是配置都是默认的,可以看到客户端连接时没有指定用户名和密码的。
业务中是不可能让clickhouse裸奔的,那么配置文件怎么配置呢?配置文件怎么从本地映射到容器中呢?(在【2、docker-compose安装方式介绍】中会有类似说明)
2、docker-compose 安装方式介绍
– 通过1中的操作,你基本可以开始学习clickhouse的使用了,而且你也简单了解了clickhouse-server端和 clickhouse-client端(类似于mysql)
– 下面介绍稍微高级点的docker-compose 安装方式,同时把1中没说的clickhouse配置文件说明一下。
– 我们重新建一个目录来玩compose
– – 新建一个目录和相关文件
[root@wangjie clickhouse_compose]# cd clickhouse_server/ [root@wangjie clickhouse_server]# touch config.xml [root@wangjie clickhouse_server]# mkdir data [root@wangjie clickhouse_server]# mkdir log [root@wangjie clickhouse_server]# touch users.xml [root@wangjie clickhouse_server]# mv user.xml users.xml [root@wangjie clickhouse_server]# touch log/clickhouse-server.log [root@wangjie clickhouse_server]# touch log/clickhouse-server.err.log [root@wangjie clickhouse_server]# touch docker-compose.yml
– – 最终的结构类似这种
[root@wangjie clickhouse_server]# tree . . ├── config.xml ├── data ├── docker-compose.yml ├── log │ ├── clickhouse-server.err.log │ └── clickhouse-server.log └── users.xml
– – 然后填充相关文件的内容,先写config.xml文件
[root@wangjie clickhouse_server]# vim config.xmltrace /var/log/clickhouse-server/clickhouse-server.log /var/log/clickhouse-server/clickhouse-server.err.log 1000M 10 8123 9000 /etc/clickhouse-server/server.crt /etc/clickhouse-server/server.key /etc/clickhouse-server/dhparam.pem none true true sslv2,sslv3 true true true sslv2,sslv3 true RejectCertificateHandler 9009 # 这个被注释了,那么就无法远程连接了,如果想要远程连接,那么取消注释即可4096 3 100 8589934592 5368709120 /var/lib/clickhouse/ /var/lib/clickhouse/tmp/ users.xml default default localhost 9000 3600 3600 60 system query_log
7500 *_dictionary.xml /clickhouse/task_queue/ddl click_cost any 0 3600 86400 60 max 0 60 3600 300 86400 3600 /var/lib/clickhouse/format_schemas/
– – 写users.xml
[root@wangjie clickhouse_server]# vim users.xml10000000000 0 random 1 ::1 127.0.0.1 default default 8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92 ::/0 readonly default 8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92 ::/0 default default 3600 0 0 0 0 0
– – 写docker-compose.yml
[root@wangjie clickhouse_server]# vim docker-compose.yml version: '3' services: clickhouse-server: image: yandex/clickhouse-server container_name: clickhouse-server_wj hostname: clickhouse-server_wj ports: - 8123:8123 expose: - 9000 - 9009 volumes: - ./config.xml:/etc/clickhouse-server/config.xml - ./users.xml:/etc/clickhouse-server/users.xml - ./data:/var/lib/clickhouse - ./log/clickhouse-server.log:/var/log/clickhouse-server/clickhouse-server.log - ./log/clickhouse-server.err.log:/var/log/clickhouse-server/clickhouse-server.err.log
– – 在当前docker-compose.yml文件所在的目录执行 docker-compose up -d
[root@wangjie clickhouse_server]# docker-compose up -d Creating network "clickhouse_server_default" with the default driver Creating clickhouse-server_wj ... done
check一下服务是不是起来了,如下:
[root@wangjie clickhouse_server]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
dc9b85c22b6e yandex/clickhouse-server “/entrypoint.sh” 28 seconds ago Up 27 seconds 9000/tcp, 0.0.0.0:8123->8123/tcp, 9009/tcp clickhouse-server_wj
– – 然后通过客户端连接之
[root@wangjie clickhouse_server]# docker run -it --rm --link clickhouse-server_wj:clickhouse-server --net clickhouse_server_default yandex/clickhouse-client --host clickhouse-server --user seluser --password 8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92 ClickHouse client version 19.1.9. Connecting to clickhouse-server:9000 as user seluser. Connected to ClickHouse server version 19.1.10 revision 54413. clickhouse-server_wj :) clickhouse-server_wj :) show databases; SHOW DATABASES ┌─name────┐ │ default │ │ system │ └─────────┘ 2 rows in set. Elapsed: 0.017 sec.
– – 上面命令中的其他参数不解释,值得一提的是–net clickhouse_server_default 这个net参数的名字从哪来的,可以查看运行的clickhouse-server_wj服务端容器的详细信息来拿到
[root@wangjie clickhouse_server]# docker inspect dc9b85c22b6e 重点看Networks的部分,如下:映射的就是clickhouse_server_default ...(此处省略n行)... "Networks": { "clickhouse_server_default": { "IPAMConfig": null, "Links": null, "Aliases": [ "clickhouse-server", "dc9b85c22b6e" ], "NetworkID": "ba7cf08769082f9afbe245f14ab2d7f9d7fc1f011e9d522ed2e75e9f6ebcee3f", "EndpointID": "5abb64ebb7f4fd8641e6e1598e888b44cd26b0b2bba8ef6d28ba62b638de6c56", "Gateway": "172.19.0.1", "IPAddress": "172.19.0.2", "IPPrefixLen": 16, "IPv6Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "MacAddress": "02:42:ac:13:00:02", "DriverOpts": null } }
– – 至此,就可以愉快的玩耍了,使用同1中类似操作即可。
3、用docker-clickhouse-client连线上clickhouse
略
ClickHouse实例
– todo
ClickHouse源码阅读
– todo