
pod watches stop sending data after a while, but remain connected in 1.18.4 #1755

Closed
blushingpenguin opened this issue Jul 29, 2020 · 21 comments
Assignees
Labels
Feedback General feedback networking question resolution/answer-provided Provided answer to issue, question or feedback.

Comments

@blushingpenguin

What happened:

After upgrading from 1.16.10 to 1.18.4, watches stop working after a period of time. The connection from the pod to the API server remains open (i.e. it can be seen with lsof), but no further watch data is sent.

What you expected to happen:

With versions below 1.16.10 (back to 1.13), the API server connection would periodically drop after a period of time (which I believe was expected) -- in that case we'd just reconnect.

How to reproduce it (as minimally and precisely as possible):

I don't have an easy reproduction (the code is a custom job scheduler written in .NET using the k8s csharp client) -- I can probably boil it down if reproduction is necessary.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    1.18.4
  • Size of cluster (how many worker nodes are in the cluster?)
    3 x b2ms
  • General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.)
    dotnet / node microservices
@ghost ghost added the triage label Jul 29, 2020
@ghost

ghost commented Jul 29, 2020

Triage required from @Azure/aks-pm

@ghost ghost added the action-required label Jul 31, 2020
@ghost ghost removed the triage label Aug 3, 2020
@ghost

ghost commented Aug 8, 2020

Action required from @Azure/aks-pm

@ghost ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label Aug 8, 2020
@palma21 palma21 added Feedback General feedback question and removed Needs Attention 👋 Issues needs attention/assignee/owner Top 10 Issue action-required labels Aug 10, 2020
@ghost ghost added the stale Stale issue label Oct 10, 2020
@ghost

ghost commented Oct 10, 2020

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

@blushingpenguin
Author

this was still happening as of a few days ago

@ghost ghost removed the stale Stale issue label Oct 10, 2020
@arsnyder16

@blushingpenguin I was experiencing this as well. At one point it was very intermittent and then it started happening consistently. I was advised by the AKS team to configure my watches to be shorter than 4 minutes, which is the default request timeout for the Azure load balancer; this did resolve my issue. Unfortunately these silent failures are problematic in that there are internal workloads within the cluster that have watches longer than 4 minutes.

From my understanding this is the load balancer on the control plane side, so its configuration is out of the user's control, but maybe we could get some clarification from someone on the AKS team @juan-lee @palma21

FWIW, I was on 1.17 and using Node.js, and I would see it very infrequently in previous versions as well. Not sure if/how the k8s version correlates to the issue.

Here are some related issues
#1877
#1052

@thomasfrederikhoeck

thomasfrederikhoeck commented Oct 12, 2020

@blushingpenguin I'm experiencing it with long-lived applications in the cluster that deploy k8s Jobs using in-cluster comms. The connection times out silently after 4 minutes.

I think the reason is that the load balancer AKS uses for in-cluster calls to the API has "TCP reset" disabled, but I don't think there is any way to change this.

@bergeron

bergeron commented Oct 13, 2020

Seeing this too. I think it happens when the watcher receives no events for > 4 minutes.

I wrote a C# program that reproduces the problem: https://gist.github.com/bergeron/10c88fd26aa683619fc75cfd85f63acf

When I modify the watched resource at least once every 4 minutes, everything works:

  • The watcher receives the modification events.
  • The watcher is closed after ~30 minutes (expected, the application should re-open the watch).

But when I modify the watched resource less often than every 4 minutes:

  • The watcher receives no modification events.
  • The watcher is never closed (no OnClosed event or exception).

This happens both when running the watcher outside the Kubernetes cluster (connecting to the API server via *.azmk8s.io:443), and also when running the watcher inside a pod in the Kubernetes cluster (connecting via KUBERNETES_SERVICE_HOST:KUBERNETES_SERVICE_PORT or 10.0.0.1:443).

I didn't find the AKS version to have any effect. It reproduced on AKS 1.16.7, 1.16.13, 1.16.15, 1.17.9, 1.17.11, 1.18.6, 1.18.8, 1.19.0. I don't see this problem on kubeadm 1.15. I haven't tested other Kubernetes environments.

Setting timeoutSeconds to < 4 minutes on the watch call seems to work though. The connection closes before it starts missing events and our application can restart the watch.
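For reference, here is a minimal sketch of the same workaround using the kubernetes Python client rather than the C# client from the gist (the namespace and event handling are placeholders): the watch asks the API server to end the stream after 230 seconds, comfortably under the ~4-minute idle timeout, so the client re-opens the connection before it can silently go dead.

from kubernetes import client, config, watch

# Sketch only: keep watch lifetimes below the ~4-minute load balancer idle
# timeout and re-open the watch in a loop.
config.load_incluster_config()  # use config.load_kube_config() outside the cluster
v1 = client.CoreV1Api()

w = watch.Watch()
while True:
    # timeout_seconds asks the API server to close the watch after 230 s,
    # i.e. before the ~4-minute idle timeout can drop the connection silently.
    for event in w.stream(v1.list_namespaced_pod, namespace="default",
                          timeout_seconds=230):
        print(event["type"], event["object"].metadata.name)
    # stream() returning means the watch ended cleanly; loop to restart it.
    # (A real client would also track resourceVersion to avoid replaying events.)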

@thomasfrederikhoeck

@bergeron Someone created a very similar script (Python though) on #1052 and only experienced it on AKS, not on AWS EKS.

@nilo19
Member

nilo19 commented Oct 15, 2020

I used the simple C# app from @bergeron and noticed that there is a TCP RST packet after ~5 min of idle time. Actually, this behavior is by design: the AKS underlay SLB sends a RST signal to the client after ~4 min of idle time. The client must handle the signal and re-establish the connection. There are a couple of ways to handle this:

  1. turn the watch timeout down to less than 4 min
  2. configure TCP keepalive

@blushingpenguin
Author

For #2, with the k8s C# client there is no way of turning on TCP keepalives as it uses HttpClient: dotnet/runtime#31267.
I'm not sure what you mean by #1.

I also don't see a TCP RST (if there were a RST, the connection would fail on the client -- that would be fine as it just needs to restart the watch -- however, as no RST is sent, the connection actually stays open from the client's perspective).

@thomasfrederikhoeck

thomasfrederikhoeck commented Oct 16, 2020

@nilo19 I never see a TCP RST. It almost seems like the SLB doesn't have RST enabled. I'm on AKS 1.17.7. NOTE: all of this is in-cluster comms.

If I apply the following Python script

import logging
import time

from kubernetes import client, config

logging.basicConfig(
    format='%(asctime)s %(levelname)-8s %(message)s',
    level=logging.DEBUG)

config.load_incluster_config()

v1 = client.CoreV1Api()
logging.info('Calling 1st time')
v1.list_namespaced_pod('default')
logging.info('Sleeping 5 minutes')
time.sleep(300)
logging.info('Calling 2nd time')

# this call will time out after 15 minutes
v1.list_namespaced_pod('default')
logging.info('OK')

I get the following logs.

2020-10-16 07:41:29,261 INFO     Calling 1st time
2020-10-16 07:41:29,344 DEBUG    response body: {Removed my AKS specifics}
2020-10-16 07:41:29,352 INFO     Sleeping 5 minutes
2020-10-16 07:46:29,447 INFO     Calling 2nd time
2020-10-16 08:02:07,806 WARNING  Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='10.0.0.1', port=443): Read timed out. (read timeout=None)")': /api/v1/namespaces/prefect/pods
2020-10-16 08:02:07,838 DEBUG    response body: {Removed my AKS specifics}

The following happens:

  1. The first call succeeds.
  2. The script waits for 5 min.
  3. A second call is made on the same connection.
  4. That call times out after ~15 min because no response is ever received (a sketch of a client-side timeout that would surface this sooner follows below).
  5. The retry then reconnects and succeeds.
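The ~15-minute hang in step 4 happens because urllib3 has no read timeout by default, so the dead connection is only noticed once the OS gives up on it. A minimal sketch of a mitigation, assuming the generated API methods of the Python client (they accept a client-side _request_timeout of (connect, read) seconds):

import logging
from kubernetes import client, config

config.load_incluster_config()
v1 = client.CoreV1Api()

try:
    # Fail after 30 s of silence instead of hanging for ~15 minutes on a
    # connection the load balancer has already dropped.
    v1.list_namespaced_pod('default', _request_timeout=(5, 30))
except Exception as exc:
    logging.warning('pod list timed out, retrying: %s', exc)
    v1.list_namespaced_pod('default', _request_timeout=(5, 30))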

@marwanad

@thomasfrederikhoeck are you capturing the tcpdumps on the node or the pod itself?

@thomasfrederikhoeck

@marwanad sorry, I don't understand what you mean. I run it as a Deployment on the cluster, using the python:3.7 image as the base image. I then look at the logs and get what I have described above. You might need to set up a Role and RoleBinding to allow the pod to connect to the k8s API.

@marwanad

marwanad commented Oct 19, 2020

@marwanad sorry, I don't understand what you mean. I run it as a Deployment on the cluster, using the python:3.7 image as the base image. I then look at the logs and get what I have described above. You might need to set up a Role and RoleBinding to allow the pod to connect to the k8s API.

My bad if I wasn't clear enough. I was curious about the part "I never see a TCP RST." - how are you validating that?

edit: All I'm saying is that looking at those logs isn't enough to tell you whether you receive a RST packet or not -- see kubernetes-client/python#1132. You need to capture a tcpdump from both the host and the pod.

@thomasfrederikhoeck

@marwanad pardon my ignorance, but what I meant was that the client never receives a message that the connection has been closed.

Isn't the issue you are pointing to the other way around, where the connection is intentionally closed by the client using the Python kubernetes package (but then never really closed)?

I guess it would be pretty easy for someone from the AKS team to check whether the SLB used for the k8s API has TCP Reset enabled or not, right?

@nilo19
Member

nilo19 commented Oct 20, 2020

@thomasfrederikhoeck the TCP reset on our LB is on by default and the idle timeout is 4 min. I created a watch and let it idle for 4 min, after which I could capture a TCP RST using tcpdump. I believe the issue arises because the k8s client fails to handle the RST packet.

@thomasfrederikhoeck

@nilo19 Okay, that makes sense. Thank you (and @marwanad ) for the answers 😄

@yangl900

yangl900 commented Oct 27, 2020

Hi @thomasfrederikhoeck, just want to let you know that I picked up this issue and did some testing. I'm still doing more testing, but I posted the current test code and results in this repo: https://github.com/yangl900/knet#client-python-tests.

My findings so far:

  • The k8s Python client specifically does NOT have TCP keepalive turned on.
  • AKS recently deployed an improvement so that a RST packet is sent to the client after 4 min 10 s of idle time. In my testing I can observe the RST packet, and I can see that the k8s Python client handles it well. I used the Python script you shared and it passed my testing consistently.

I'm still trying to see if there is a workaround to enable keepalive for urllib3; I'm not an expert on Python :)
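One possible shape for that workaround, as a sketch only (untested against AKS here, and the keepalive intervals are illustrative): urllib3 builds its sockets from HTTPConnection.default_socket_options, so extending that list with TCP keepalive options before the kubernetes client opens any connections should keep the connection from sitting idle for 4 minutes.

import socket
import urllib3

# Sketch: enable TCP keepalive for every connection urllib3 creates, so the
# connection to the API server never sits idle long enough to be dropped.
keepalive_opts = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
if hasattr(socket, 'TCP_KEEPIDLE'):  # the fine-grained knobs are Linux-only
    keepalive_opts += [
        (socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60),   # idle seconds before probing
        (socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30),  # seconds between probes
        (socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5),     # failed probes before giving up
    ]

urllib3.connection.HTTPConnection.default_socket_options = (
    urllib3.connection.HTTPConnection.default_socket_options + keepalive_opts
)

# Configure the kubernetes client only after the socket options are in place,
# so its connection pool picks them up.
from kubernetes import client, config
config.load_incluster_config()
v1 = client.CoreV1Api()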

You can deploy the same script to test it yourself:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["configmaps"]
  verbs: ["get", "watch", "list", "update", "patch", "delete", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
subjects:
# You can specify more than one "subject"
- kind: ServiceAccount
  name: default # "name" is case sensitive
  apiGroup: ""
roleRef:
  # "roleRef" specifies the binding to a Role / ClusterRole
  kind: Role #this must be Role or ClusterRole
  name: pod-reader # this must match the name of the Role or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: knet-apiserver-watcher-py
spec:
  selector:
    matchLabels:
      knet: apiserver-watcher-py
  replicas: 1
  template:
    metadata:
      labels:
        knet: apiserver-watcher-py
    spec:
      containers:
        - name: watcher
          image: yangl/apiserver-watcher-py
          imagePullPolicy: Always
        - name: tcpdump
          image: corfr/tcpdump
          command:
            - "/usr/sbin/tcpdump"
            - "-i"
            - "any"
            - "-nn"

@yangl900 yangl900 self-assigned this Oct 30, 2020
@yangl900

Hi @blushingpenguin @bergeron @arsnyder16 - just want to let you know that with the recent AKS infrastructure release, you should now always see a RST. We still recommend that the client side turn on TCP keepalive, but I understand not all k8s clients do that, e.g. Python does not have it today. Please let me know if you still experience any issues with watches. I have some testing results posted at https://github.com/yangl900/knet and will keep posting.

@thomasfrederikhoeck

@yangl900 sorry for the late reply. I had a chance to check it out now and everything is working smoothly for me. Thank you! :-)

@palma21 palma21 added the resolution/answer-provided Provided answer to issue, question or feedback. label Nov 3, 2020
@ghost

ghost commented Nov 5, 2020

Thanks for reaching out. I'm closing this issue as it was marked with "Answer Provided" and it hasn't had activity for 2 days.

This issue was closed.