
memory leak when more requests are received #4

Open
jiucaiProductions opened this issue Dec 18, 2019 · 13 comments

@jiucaiProductions

jiucaiProductions commented Dec 18, 2019

I found a memory leak issue when I used this module in our production environment.

We use an nginx server with 4 cores and 4 GB of memory.
When I used nginx without this module in production, nginx used 20% of memory.
When I merely added serverlist_service at the http level, with no calls to it, nginx used 50% of memory.

serverlist_service url=http://anther-server/upstream/ conf_dump_dir=/data/temp/dump interval=5s timeout=2s;

When serverlist is called in an upstream, nginx memory usage keeps increasing up to almost 100%:

    serverlist apiServerList;
    server fake_server down;
    include /data/temp/apiServerList.conf*;
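
For reference, the fragment above presumably sits inside an upstream block; a sketch of the complete block, using only the names and paths already shown in this thread, might look like:

```nginx
# Sketch of the full upstream block the fragment above belongs to.
upstream apiServerList {
    serverlist apiServerList;                  # bind this upstream to the polled list
    server fake_server down;                   # placeholder so the block is never empty
    include /data/temp/apiServerList.conf*;    # files dumped via conf_dump_dir
}
```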

A screenshot of the server is attached (image: mem-leak).

nginx -V output with nginx-upstream-serverlist module:

nginx version: openresty/1.15.8.2
built by gcc 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 
built with OpenSSL 1.1.1d  10 Sep 2019
TLS SNI support enabled
configure arguments: --prefix=/data/server/openresty/nginx --with-cc-opt=-O2 --add-module=../ngx_devel_kit-0.3.1rc1 --add-module=../echo-nginx-module-0.61 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.32 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.08 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.15 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.15 --add-module=../rds-csv-nginx-module-0.09 --add-module=../ngx_stream_lua-0.0.7 --with-ld-opt=-Wl,-rpath,/data/server/openresty/luajit/lib --pid-path=/data/server/openresty/nginx/var/nginx.pid --http-client-body-temp-path=/data/server/openresty/nginx/temp/client_body_temp --http-proxy-temp-path=/data/server/openresty/nginx/temp/proxy_temp --http-fastcgi-temp-path=/data/server/openresty/nginx/temp/fastcgi_temp --http-uwsgi-temp-path=/data/server/openresty/nginx/temp/uwsgi_temp --http-scgi-temp-path=/data/server/openresty/nginx/temp/scgi_temp --with-poll_module --with-threads --with-file-aio --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module --with-http_xslt_module=dynamic --with-http_image_filter_module --with-http_image_filter_module=dynamic --with-http_geoip_module --with-http_geoip_module=dynamic --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_auth_request_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_slice_module --with-http_stub_status_module --with-mail=dynamic --with-mail_ssl_module --with-stream --with-stream=dynamic 
--with-stream_ssl_module --with-stream_realip_module --with-stream_geoip_module --with-stream_geoip_module=dynamic --with-stream_ssl_preread_module --with-pcre=/data/src/pcre-8.43 --with-zlib=/data/src/zlib-1.2.11 --with-openssl=/data/src/openssl-1.1.1d --add-module=/data/src/nginx-module-vts --add-module=/data/src/nginx-upstream-serverlist --with-stream --with-stream_ssl_preread_module --with-http_ssl_module

@jiucaiProductions
Author

With QPS > 4000 to nginx it will definitely happen; you can test it yourself.
The memory increase is slow: it took about 5 hours to go from 50% to 90% on our servers.

@abadcafe
Owner

abadcafe commented Dec 18, 2019

This module has only one known memory leak, and it happens only when the backend serverlist is refreshed frequently; even then the leaked amount should be small and essentially stable. Otherwise it has been stable in our company's production, including during the Double Eleven shopping festival, when our per-machine request rate was above 8000.

Could you paste the content of your serverlist?

@jiucaiProductions
Author

jiucaiProductions commented Dec 19, 2019

The serverlist just returns many lines of configuration like the following:

server server001:8000 weight=10 max_fails=5 fail_timeout=1s;

About 30 of them in total, and the memory leak is very obvious.

@abadcafe
Owner

Could you attach the whole file?

@jiucaiProductions
Author

curl http://192.168.0.10:8080/upstream/apiServerList
server sapi-001:8081 weight=10 max_fails=5 fail_timeout=1s;
server sapi-002:8081 weight=55 max_fails=3 fail_timeout=1s;
server sapi-003:8081 weight=55 max_fails=3 fail_timeout=1s;
server sapi-004:8081 weight=55 max_fails=3 fail_timeout=1s;
server sapi-005:8081 weight=55 max_fails=3 fail_timeout=1s;
server sapi-006:8081 weight=55 max_fails=3 fail_timeout=1s;
server sapi-007:8081 weight=55 max_fails=3 fail_timeout=1s;
server sapi-008:8081 weight=55 max_fails=3 fail_timeout=1s;
server sapi-009:8081 weight=55 max_fails=3 fail_timeout=1s;
server sapi-010:8081 weight=55 max_fails=3 fail_timeout=1s;

See the attached file for the rest:

ng-conf.zip

@abadcafe
Owner

abadcafe commented Dec 19, 2019

OK, I see. This case is caused by the pool used by the serverlist parser, which is the main conf's never-destroyed pool. Every parse may allocate some memory, and that memory is never freed.

This is intentional. nginx was designed to support only static configuration, and the upstream server structs built from that configuration are used in many corners of the code, so freeing an outdated upstream server struct is very hard unless you can accept a coredump. That is exactly what happens in many other dynamic upstream modules, for example GUI/nginx-upstream-dynamic-servers#30; the pull request in that issue is incomplete as well and will coredump on keep-alive connections.

As a workaround, you can use the Last-Modified or ETag HTTP header together with the 304 status code to tell the module the serverlist has not been refreshed. The module will then skip parsing the serverlist, and thus skip the memory allocation. Of course, new memory is still allocated whenever the list actually changes, but that amount is much smaller.
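
One simple way to get correct Last-Modified/304 behavior on the serverlist endpoint is to let nginx itself serve each list as a static file, since nginx's static file handler sends Last-Modified (and, by default, ETag) and answers conditional requests with 304 Not Modified on its own. A minimal sketch, assuming a hypothetical directory /data/serverlists/ holding files such as apiServerList:

```nginx
# Hypothetical serverlist origin: serve each list as a plain static file.
# The static handler emits Last-Modified/ETag and answers If-Modified-Since /
# If-None-Match with 304, so the polling module only re-parses the list when
# the file actually changes on disk.
server {
    listen 8080;

    location /upstream/ {
        alias /data/serverlists/;   # e.g. /upstream/apiServerList -> /data/serverlists/apiServerList
        default_type text/plain;
        if_modified_since exact;    # strict timestamp comparison (the default)
    }
}
```

Updating a list is then just rewriting its file atomically (write to a temp file and mv it over), which also bumps the mtime that Last-Modified is derived from.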

@jiucaiProductions
Author

jiucaiProductions commented Dec 24, 2019

I have added the Last-Modified header. In short-term observation memory was indeed fairly stable,
but after running for a while memory usage started to rise again. The two machines receive the same traffic and have identical specs.
During that period the serverlist response body and Last-Modified did not change at all.

Figure 1, normal (attachment: ng4)

Figure 2, with nginx-upstream-serverlist (attachment: ng5)

@abadcafe
Owner

Did you make any changes in between? Were the serverlists changing extremely frequently, or growing by a lot? Can you confirm the response code is 304? As long as it is 304, no new memory will ever be allocated.

@zhaofeng0019

Hi, I think I have solved both the memory crash and the memory leak problems of this kind of dynamic-resolve module. Could you try my module, or review it for me? Thanks.
https://github.com/zhaofeng0019/nginx-upstream-dynamic-resolve-servers

@zhaofeng0019

I also opened a PR against the original module that specifically fixes the memory problems; you can take a look at that too.
Thanks.
GUI/nginx-upstream-dynamic-servers#33

@cinquemb

Hi, I think I have solved both the memory crash and the memory leak problems of this kind of dynamic-resolve module. Could you try my module, or review it for me? Thanks.
https://github.com/zhaofeng0019/nginx-upstream-dynamic-resolve-servers

@zhaofeng0019 This looks pretty cool. There are a couple of things going on here: using a pool queue, avoiding exiting the process (and initializing that queue when starting a new process), and using ngx_http_upstream_init_round_robin vs init (as well as requiring a modification to the main source).

Do all of these need to be done together, or does each of them alone solve a different issue?

The reason I ask is that I use another third-party lib (https://github.com/gnosek/nginx-upstream-fair) and I'm running into issues (not really fast memory growth, since I use Last-Modified/ETag) where, after a while, nginx just starts trying to connect to the dummy server: 2020/08/31 07:36:29 [error] 29352#0: *10 connect() to 127.255.255.255:80 failed (101: Network is unreachable) while connecting to upstream. I wonder whether this has something to do with either the ngx_http_upstream_init_round_robin-vs-init change or the process-exit change (this module differs from nginx-upstream-dynamic-resolve-servers in that it replaces the servers inside the upstream rather than using the resolver, but there is some overlap where your changes are concerned).

@zhaofeng0019

cinquemb, thank you for looking into my code.

Modifying the main source is not necessary.
If you don't want to modify it,
you can look at my PR against the original project, https://github.com/GUI/nginx-upstream-dynamic-servers/pull/33/commits,
and try that; it may be helpful.

As for the third-party lib (https://github.com/gnosek/nginx-upstream-fair),
I'm busy these days and may not have time to check it out.

Thanks again.

@cinquemb

cinquemb commented Sep 1, 2020

yeah, I saw that PR @zhaofeng0019. The thing is that https://github.com/abadcafe/nginx-upstream-serverlist works differently from https://github.com/GUI/nginx-upstream-dynamic-servers, so it's not a one-to-one mapping to this project.

But I can test a couple of the changes you made (https://github.com/GUI/nginx-upstream-dynamic-servers/pull/33/commits) by porting them to this module and see if they clear up some of the things I'm seeing.

thanks!
