Cellular IoT Devices Fail; Make Sure They Fail Gracefully

2023-01-30

Illustration: © IoT For All

There is a sort of paradox built into big cellular IoT deployments. The instant you send your devices into the field, you lose a lot of control over performance. Now and then, at least, connections in your cellular IoT devices will fail. You can’t control what people do with your devices, and some of that behavior will disrupt any network connection.

Speaking of networks, there’s another thing you can’t control. The last big, global survey of mobile network operators (MNOs) on the subject of failure dates from 2016, but even that dated data illustrates the limits of network infrastructure. Back then, 30 percent of responding MNOs said they had network outages and service problems up to three times a year. More striking, 34 percent admitted to more than 15 outages or “service degradations” per year. More recently, J.D. Power found that the proliferation of cellular devices in 2022 was leading to more widespread problems with “network quality.”

Cellular connectivity is great for IoT, and it should get better and better as new technologies like 5G New Radio go mainstream. But it probably won’t ever be 100 percent reliable. You can and should design devices for consistent connectivity. Still, when your products hit the market, expect the unexpected. If cellular IoT designers can’t prevent connectivity failures, they can program devices to fail gracefully. Here’s a glimpse of what that might look like.

“The instant you send your devices into the field, you lose a lot of control over performance. Now and then, at least, connections in your cellular IoT devices will fail.”

-Eseye

Designing for Graceful Failure

In the context of cellular IoT connectivity failure, what do we mean by graceful? Four things, really. Let’s look at each of these goals in turn. 

#1: Fail-Over Connectivity

As we mentioned, networks sometimes fail. But when one network fails, a well-designed device can fail over to a backup. Depending on the device, you may include fail-over modes that shift to Wi-Fi, satellite, or simply another cellular network. A successful fail-over will keep your device operating until it can reconnect to the primary network. But redundant connections can fail, too, and that can be dangerous. Picture smart traffic lights at a busy intersection, for example. That’s why you must also program firmware for another layer of protection, which brings us to the next item on our list.
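A minimal Python sketch of priority-ordered fail-over, assuming each interface exposes some kind of health check. The `Link` class and its `is_up` probe are hypothetical stand-ins for whatever the modem or connectivity stack actually provides. Re-running the selection on a timer is what lets the device move back to the primary network once it recovers.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Link:
    """A hypothetical network interface paired with a health probe."""
    name: str
    is_up: Callable[[], bool]  # True if the link can carry traffic right now


def select_link(links: Sequence[Link]) -> Optional[Link]:
    """Return the highest-priority link that is currently up.

    `links` is ordered by preference, primary first. Calling this
    periodically lets the device return to the primary network as soon
    as it recovers instead of staying on a backup indefinitely.
    Returns None when every link is down, which is the cue to enter a
    safe default mode.
    """
    for link in links:
        if link.is_up():
            return link
    return None


# Example: the primary cellular network is down, so the secondary is chosen.
links = [
    Link("cellular-primary", lambda: False),
    Link("cellular-secondary", lambda: True),
    Link("satellite", lambda: True),
]
active = select_link(links)
print(active.name if active else "no link")
```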

#2: Default Failure Modes

Your firmware must include instructions for what to do when connections are hopelessly lost. In the traffic light example, devices might default to a standard, non-smart but serviceable signal pattern that keeps cross traffic from colliding in the intersection. Safe failure modes will look different from one device to another. The trick is to anticipate real-world scenarios and design basic device behavior that keeps users safe until a connection can be re-established.
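The following sketch shows one way firmware might encode a default failure mode for the traffic light example. It assumes a hypothetical controller that falls back to a fixed-time cycle after `CONNECTION_TIMEOUT_S` seconds without hearing from its server; the phase names and durations are illustrative only, not traffic-engineering values. Time is passed in explicitly so the logic is easy to test.

```python
# Hypothetical fixed-time plan as (phase, seconds). Illustrative values only.
FIXED_CYCLE = [
    ("north_south_green", 30), ("north_south_yellow", 4), ("all_red", 2),
    ("east_west_green", 30), ("east_west_yellow", 4), ("all_red", 2),
]

CONNECTION_TIMEOUT_S = 60.0  # assumed: silence tolerated before falling back


class SignalController:
    """Adaptive when connected; fixed-time cycle once the link is lost."""

    def __init__(self, now: float) -> None:
        self.last_contact = now

    def on_server_message(self, now: float) -> None:
        # Any successful message from the server refreshes the contact time.
        self.last_contact = now

    def mode(self, now: float) -> str:
        if now - self.last_contact > CONNECTION_TIMEOUT_S:
            return "fixed_cycle"
        return "adaptive"

    @staticmethod
    def fixed_phase(now: float) -> str:
        """Phase of the fixed-time cycle at time `now`, in seconds."""
        cycle_len = sum(duration for _, duration in FIXED_CYCLE)
        t = now % cycle_len
        for phase, duration in FIXED_CYCLE:
            if t < duration:
                return phase
            t -= duration
        return "all_red"  # not expected to be reached; defaults to the safest phase
```

Keeping the fallback plan in local firmware, rather than fetching it from the server, is the point of the design: it has to work precisely when the server is unreachable.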

#3: Preventing Cascading Network Problems

Poorly programmed IoT devices are persistent, if nothing else. They don’t just make mistakes; they repeat them to the point of disaster. Say a smart thermometer is programmed to send notifications more frequently the higher the temperature rises. Then say the sensor breaks and the system interprets the lack of signal as a temperature of infinity. That device could start sending notifications every second, generating so much data that the network becomes congested. Other devices might then start repeatedly re-sending their own failed transmissions. In the end, that single runaway device could cause a signal storm that brings down the whole network.
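The thermometer failure comes from two gaps: a missing reading is treated as a real value, and nothing bounds how short the notification interval can get. A minimal sketch of both guards follows; the plausible sensor range and the interval floor and ceiling are assumed values chosen for illustration.

```python
import math
from typing import Optional

MIN_INTERVAL_S = 60.0               # assumed floor: at most one notification a minute
MAX_INTERVAL_S = 3600.0             # assumed ceiling: at least hourly when calm
PLAUSIBLE_RANGE_C = (-40.0, 125.0)  # assumed physical range of the sensor


def notification_interval(temp_c: Optional[float]) -> Optional[float]:
    """Seconds until the next notification, or None for a sensor fault.

    A missing, NaN, infinite, or out-of-range reading returns None, so the
    caller can report a fault instead of scheduling ever-faster notifications.
    """
    if temp_c is None or not math.isfinite(temp_c):
        return None
    lo, hi = PLAUSIBLE_RANGE_C
    if not lo <= temp_c <= hi:
        return None
    # Hotter means a shorter interval, scaled linearly from ceiling to floor.
    fraction = (temp_c - lo) / (hi - lo)
    interval = MAX_INTERVAL_S - fraction * (MAX_INTERVAL_S - MIN_INTERVAL_S)
    # Clamp so no input can push the interval below the floor.
    return max(MIN_INTERVAL_S, interval)
```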

Preventing this is a job for connectivity platforms, firmware designers, or both. Somewhere in the system, devices need a rate limit on data. That way, whatever the cause of a signal storm, the effect won’t be a cascading network failure.
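A token bucket is one common way to impose that rate limit on the device side. In the sketch below (an illustration, not any particular platform’s API), a device can send a short burst of up to `capacity` messages but no more than `rate` messages per second on average, so even a runaway sensor loop can’t flood the network. On real hardware, `now` would come from a monotonic clock such as Python’s `time.monotonic()`.

```python
class TokenBucket:
    """Token-bucket rate limiter.

    Allows bursts of up to `capacity` messages and a sustained average
    of `rate` messages per second.
    """

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a normal first burst succeeds
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True (and spend a token) if a message may be sent now."""
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `rate=1/60` and `capacity=5`, for example, a stuck loop calling `allow()` once per second for ten minutes would get roughly 14 messages through instead of 600.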

#4: Graceful Recovery

Finally, devices must reconnect to the network without tearing it down. The real risk following a network outage is the signal storm. If 100,000 devices try to reconnect at once, you’ll have a congestion problem that could start a traffic fiasco all over again. The simplest way to ensure graceful recovery is to program reconnection attempts with an exponential back-off. A device tries to reconnect; if that fails, it tries again, but the wait between attempts grows exponentially with each failure. That helps prevent the network collisions that lead to signal storms.
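A sketch of that reconnect loop in Python. The article describes plain exponential back-off; this version also adds random “full jitter” (waiting a random time between zero and the current back-off ceiling), a common refinement so that thousands of devices that lost service at the same moment don’t all retry on the same schedule. `try_connect` and `sleep` are injected callables (in firmware, a hypothetical modem attach routine and a real sleep such as `time.sleep`), and the base delay and cap are assumed values.

```python
import random
from typing import Callable, Optional


def reconnect_with_backoff(
    try_connect: Callable[[], bool],
    sleep: Callable[[float], None],
    base_s: float = 1.0,                  # assumed first back-off ceiling (seconds)
    cap_s: float = 3600.0,                # assumed maximum back-off ceiling
    max_attempts: Optional[int] = None,   # None means keep trying indefinitely
    rng: Optional[random.Random] = None,
) -> bool:
    """Retry `try_connect` with exponential back-off and full jitter.

    Returns True once a connection succeeds, or False if `max_attempts`
    attempts all fail.
    """
    if rng is None:
        rng = random.Random()
    attempt = 0
    while True:
        if try_connect():
            return True
        attempt += 1
        if max_attempts is not None and attempt >= max_attempts:
            return False
        # Ceiling doubles after each failure: base, 2*base, 4*base, ... up to cap_s.
        # The exponent is clamped so the multiplication cannot overflow a float.
        ceiling = min(cap_s, base_s * 2 ** min(attempt - 1, 32))
        # Full jitter: wait a random time in [0, ceiling] so devices that lost
        # service together spread out their retries instead of moving in lockstep.
        sleep(rng.uniform(0.0, ceiling))
```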

Include Experts

Of course, we can’t stress enough how different every IoT deployment is. The examples we discuss above may or may not apply to your project. The best thing you can do to create cellular IoT products that fail gracefully is to confer with proven IoT experts. Get started with a list of IoT services every product creator needs. You may not be able to prevent the occasional connection failure, but you can control your device’s response—and limit the impact on users. That’s more than good design. It’s downright graceful.
