The use of publicly accessible MQTT brokers is prevalent across numerous verticals and technology fields. I was able to identify systems related to energy production, hospitality, finance, healthcare, pharmaceutical manufacturing, building management, surveillance, workplace safety, vehicle fleet management, shipping, construction, natural resource management, agriculture, smart homes and far more.
Hackers have been sounding alarms about this for years, but the message has not reached many parts of the Internet. Many of these systems are clearly involved in high-power and potentially dangerous operations, and I think it is a safe bet that militaries have been probing these systems for years and have likely found many weak points that could be exploited in a conflict.
Over the past year, I have spent a bit of time analyzing exposed MQTT brokers on the Internet. In this post, I will outline some of these findings, including examples of data disclosures whose owners I was able to identify as well as some I could not. For a brief recap of MQTT, check out my post about a connected lock.
My MQTT discovery process was initially seeded by a Shodan export with some limited masscan to supplement my host set. My first step was to design a scanning harness. I considered incorporating standard Linux tools or something like paho-mqtt, but ultimately both of these methods have some performance trade-offs and added complexity.
Instead, I went the most direct route I could and wrote a short Python script to send raw packets for the MQTT handshake and dump the responses to a file. The script was configured to call recv up to 100 times with a two-second timeout. The goal was to capture enough data to recognize an interesting broker while avoiding excessive resource consumption.
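The script itself is not reproduced here, but the approach can be sketched roughly as follows. This is a minimal illustration, not the original code, and it assumes MQTT 3.1.1 over plain TCP on port 1883, no credentials, and a subscription to the `#` wildcard so that an open broker starts streaming whatever is published. The client ID `probe` and the 4096-byte read size are arbitrary, hypothetical choices.

```python
import socket
import struct
from typing import Optional


def encode_remaining_length(n: int) -> bytes:
    """Encode the MQTT 'remaining length' field: 7 bits per byte, high bit = continuation."""
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80
        out.append(byte)
        if n == 0:
            return bytes(out)


def mqtt_string(s: str) -> bytes:
    """UTF-8 string prefixed with its 2-byte big-endian length."""
    data = s.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def build_connect(client_id: str, keep_alive: int = 60) -> bytes:
    """MQTT 3.1.1 CONNECT: protocol name, level 4, clean-session flag, no username/password."""
    variable_header = mqtt_string("MQTT") + bytes([0x04, 0x02]) + struct.pack("!H", keep_alive)
    body = variable_header + mqtt_string(client_id)
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


def build_subscribe(topic: str, packet_id: int = 1) -> bytes:
    """SUBSCRIBE to a single topic filter at QoS 0."""
    body = struct.pack("!H", packet_id) + mqtt_string(topic) + bytes([0x00])
    return bytes([0x82]) + encode_remaining_length(len(body)) + body


def connack_return_code(data: bytes) -> Optional[int]:
    """Return the CONNACK return code (0 = accepted, 5 = not authorized) if data starts with one."""
    if len(data) >= 4 and data[0] == 0x20 and data[1] == 0x02:
        return data[3]
    return None


def probe(host: str, port: int = 1883, max_reads: int = 100, timeout: float = 2.0) -> bytes:
    """Send CONNECT + SUBSCRIBE('#') and collect raw bytes until a read times out,
    the peer closes the connection, or max_reads recv() calls have been made."""
    chunks = []
    # create_connection applies the timeout to the connect and to later recv() calls.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # MQTT 3.1.1 allows a client to send packets right after CONNECT without waiting for CONNACK.
        sock.sendall(build_connect("probe") + build_subscribe("#"))
        for _ in range(max_reads):
            try:
                data = sock.recv(4096)
            except socket.timeout:
                break
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

For a single host, `raw = probe("198.51.100.7")` followed by `connack_return_code(raw)` tells you whether anonymous access was accepted (0) or refused (for example 5, not authorized); anything after the CONNACK in `raw` is the broker's SUBACK and any PUBLISH traffic. Connection failures surface as `OSError` and are left to the caller. Writing the packets by hand keeps each probe to one socket and a bounded amount of reading, which is the trade-off described above.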
Sometimes it is easy to identify the organization associated with an open broker, as was the case with U-Tech; in other cases, it is exceedingly difficult. Generally, there are a few key markers that can help reveal the owner or operator of a vulnerable service.
A lot of this information can be obtained from IP search engines like Shodan, BinaryEdge or Censys. The general process I go through for identifying a server is as follows:
- Is there an associated DNS record?
- Is there a WHOIS record?
- Are there other services running that may reveal details about the system owner or the specific application?
  - TLS certificates – Is there an FQDN or otherwise descriptive name?
  - HTTP responses – Is there a website identifying the system or an interesting auth realm?
  - Server banners – Do SMTP, FTP or other protocols reveal an FQDN?
- Do the MQTT topic names reveal or indicate an organization or product?
- Does the published data reveal anything?
  - Google anything that looks like a model number or brand name.
  - Search for email addresses and consider the referenced domains.
  - Look for number sets that may be GPS coordinates.
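The last two checks lend themselves to a little automation across a large scan dump. The following is a rough sketch, assuming the payloads have already been decoded to text; the regular expressions are illustrative heuristics, not a complete grammar for email addresses or coordinates, and they will produce false positives that still need a human eye.

```python
import re
from typing import List, Set, Tuple

# Heuristic: local part, "@", then a dotted domain (captured).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")
# Heuristic: a decimal-degree pair such as "42.3601, -71.0589". Requiring 4+ decimal
# places skips many ordinary readings like "21.5, 22.0"; the lookbehind stops a match
# from starting in the middle of a longer number.
COORD_RE = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d{4,})\s*,\s*(-?\d{1,3}\.\d{4,})")


def email_domains(text: str) -> Set[str]:
    """Return the lowercased domains of anything that looks like an email address."""
    return {m.group(1).lower() for m in EMAIL_RE.finditer(text)}


def candidate_coordinates(text: str) -> List[Tuple[float, float]]:
    """Return (lat, lon) pairs whose values fall inside valid latitude/longitude ranges."""
    pairs = []
    for m in COORD_RE.finditer(text):
        lat, lon = float(m.group(1)), float(m.group(2))
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            pairs.append((lat, lon))
    return pairs
```

On a hypothetical payload such as `{"contact": "jane.doe@Example.com", "loc": "42.3601, -71.0589", "temp": "21.5, 22.0"}`, this yields the domain `example.com` and the single pair `(42.3601, -71.0589)`, which can then go into a WHOIS lookup and a map search respectively.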
This process was effective for identifying many hosts, but there was also a high rate of failure. A large number of these exposed MQTT brokers are running on public cloud servers without any strong indicators of the system owner. In many cases, I could identify what type of system I was looking at and/or where it was geographically, but I had no specific contact whom I could inform.
Once I had identified an owner, I would reach out via email with a templated letter indicating the IP address and a brief description of the problem. In some cases, when I thought the risk could be substantial, I would also make a phone call. This did not generally improve my chances of a successful resolution.
One of the systems I had identified turned out to be providing safety monitoring services for compressed natural gas (CNG) filling stations throughout the region. I had emailed, called repeatedly and invoked multiple CERTs for assistance, but it still took a full month (and a lot of personal energy) before I could get any response from the affected organization. Their response acknowledged the problem and explained that they were working on a fix, which would take some time.
They explained that fixing this issue involves rewriting part of a very old application for which they have no source code. They further reassured me that the data is not used in automated processes and so would not directly influence the monitored technology.
Presumably, this means there are human operators making the decisions, but it is unclear to me what capability they have to validate incoming data from the MQTT broker before acting on it.
However, some of the disclosures were far more successful, as was the case with this one:
I had identified this and another server as belonging to We Work based on the ‘WW’ naming convention and periodic ‘@wework.com’ email addresses. I further confirmed the identity by correlating addresses in the MQTT data with published We Work sites.
I reached out to We Work, which promptly disabled the system and explained that it had been a ‘non-prod hackathon project.’ I did not ask for further clarification, but a review of the exposed data leads me to believe they may have been using the S2 NetBox access control software.
After recognizing from my scan data that this host was interesting, I connected briefly with mosquitto_sub and immediately began receiving notifications as badges were swiped, doors were opened and alarms were silenced. The information included full names, email addresses, entry times and the specific door accessed at each building.
Regardless of whether this was a non-production system, the data did appear to be legitimate, with individuals coming and going at times that were reasonable for their time zones. The system seemed to be sharing real-time data from across the worldwide network of co-working facilities. It may not have been able to actuate devices or influence decisions, but if the data was real, it gave unauthorized parties access to a creepy level of detail about We Work customers.
Many of the interesting systems I came across related to vehicle tracking, but I could only ascertain limited information about what was being tracked or who was doing the tracking. In one case, I found URLs referencing, among other things, driver’s license photos.
In another case, I initially thought I was looking at the backend of a ride-sharing service in Mumbai based on references to rider referral programs and advertisement URLs. After an unsuccessful search to identify a specific organization using this server, I moved on to the next host in a list of tens of thousands.
It wasn’t until many months later, while reviewing the data again for this blog post, that I noticed something crucial. On closer inspection of the ad URLs, it occurred to me that the S3 bucket name in the URL might be related to the organization’s name. I Googled the S3 bucket acronym along with Mumbai and quickly identified the likely system owners.
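Pulling bucket names out of every URL in a scan dump could be automated so that clues like this one are harder to miss. A minimal sketch, assuming standard AWS commercial-region hostnames in either virtual-hosted style (`bucket.s3.region.amazonaws.com`) or path style (`s3.region.amazonaws.com/bucket/...`); other endpoint forms simply return `None`, and the bucket name `acme-ads` used below is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

AWS_SUFFIX = ".amazonaws.com"


def s3_bucket_from_url(url: str) -> Optional[str]:
    """Return the S3 bucket name referenced by url, or None if it does not look like S3."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if not host.endswith(AWS_SUFFIX):
        return None
    prefix = host[: -len(AWS_SUFFIX)]

    # Virtual-hosted style: "<bucket>.s3.<region>", "<bucket>.s3-<region>" or "<bucket>.s3".
    for marker in (".s3.", ".s3-"):
        idx = prefix.find(marker)
        if idx > 0:
            return prefix[:idx]
    if prefix.endswith(".s3"):
        return prefix[: -len(".s3")]

    # Path style: "s3", "s3.<region>" or "s3-<region>"; the bucket is the first path segment.
    if prefix == "s3" or prefix.startswith(("s3.", "s3-")):
        segments = [s for s in parsed.path.split("/") if s]
        return segments[0] if segments else None
    return None
```

For example, `s3_bucket_from_url("https://acme-ads.s3.ap-south-1.amazonaws.com/banners/a.png")` returns `"acme-ads"`, as does the path-style form `https://s3.ap-south-1.amazonaws.com/acme-ads/banners/a.png`. Interpreting the name (an acronym, in this case) and searching for it is still a manual step.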
What I thought was resold advertising space was actually the vendor advertising for itself. Although I never contacted this firm, the server is now locked down, and Shodan doesn’t indicate the same topic names on any new servers. It may be that I happened to be scanning during an early stage in their system deployment, or it may be that they independently noticed this configuration faux pas and resolved it.
While reviewing the scan data, I found countless systems handling opaque or obscure data, with little indication of how they might be attacked or what impact this might have. I have little doubt, however, that the next system on my list is a target of interest for attackers.
This system is a national lottery with jackpots worth upwards of $6M USD. The messages on this broker seemed to reveal details about lottery ticket sales, including which kiosk sold each ticket, what software version the kiosk was running, which attendant was working and some other binary data that I had no means to decipher.
I have speculated that this data may have something to do with the specific ticket purchased, but I had no means to confirm or disprove this. Interestingly, I did see that most of the lottery kiosks in this system were running ancient and unsupported versions of Android. I notified the lottery repeatedly, and although I never received a response, I did confirm that the server is no longer exposed.
Other Interesting Findings
Overall, I contacted or attempted to contact system owners for fewer than 50 IP addresses out of tens of thousands in my scan results. Some of the other interesting findings are briefly described below.
I came across a small number of MQTT brokers sharing structured data that looked similar to past work I had done with TensorFlow-based image analysis. These systems appear to be ML/AI-enabled IoT cameras, and some research suggested that at least some of them are retail store security cameras. The published data would typically include a snapshot URL and metadata about the number, age and gender of people in the image. Some would also periodically report other attributes, such as whether someone was wearing a hat or glasses or had a beard.
A lot of my scan data referenced turbine speeds, kilowatts produced and other telemetry points pertinent to power generation. The data was very generic, and the brokers were almost entirely hosted on public clouds. Some were definitely wind farms, but there may have been solar farms or hydro plants as well. I was not able to specifically identify a single one of these systems, and I honestly have no idea what the impact of their exposure could be.
It is unlikely that these systems were hardened against false or replayed telemetry data. The question remains as to what physical implications may be possible as the result of such an attack. For example, if a system handling CNG is identified, would it be possible for a terrorist to trigger an operator error similar to when 40 houses exploded due to excessive pressure on the lines from Columbia Gas of Massachusetts?
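To make "hardened against false or replayed telemetry" concrete, here is a minimal sketch of one common approach; nothing in these findings suggests the exposed systems did anything like it. Each message carries a per-device counter and an HMAC-SHA256 tag computed with a shared key, and the consumer drops anything with a bad tag (forged) or a counter it has already seen (replayed). The key, the device name `turbine-7` and the JSON envelope are hypothetical; a real deployment would also want TLS and broker authentication, and the counter state would need to survive restarts.

```python
import hashlib
import hmac
import json
from typing import Dict, Optional


def sign_reading(key: bytes, device_id: str, counter: int, reading: dict) -> bytes:
    """Wrap a reading with a per-device counter and an HMAC-SHA256 tag (hypothetical format)."""
    body = json.dumps({"device": device_id, "counter": counter, "reading": reading}, sort_keys=True)
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({"body": body, "tag": tag}).encode("utf-8")


class TelemetryVerifier:
    """Rejects messages with a bad tag (forged) or a non-increasing counter (replayed)."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.last_counter: Dict[str, int] = {}  # highest counter accepted per device

    def accept(self, message: bytes) -> Optional[dict]:
        envelope = json.loads(message)
        expected = hmac.new(self.key, envelope["body"].encode("utf-8"), hashlib.sha256).hexdigest()
        # Constant-time comparison so the tag check does not leak timing information.
        if not hmac.compare_digest(expected, envelope["tag"]):
            return None  # forged or corrupted
        body = json.loads(envelope["body"])
        device, counter = body["device"], body["counter"]
        if counter <= self.last_counter.get(device, -1):
            return None  # replayed or stale
        self.last_counter[device] = counter
        return body["reading"]
```

With this in place, re-publishing a captured message is rejected because its counter is no longer fresh, and injecting a fabricated reading fails unless the attacker also holds the key.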