"We Live?" - Monitoring and Connecting HLS Streams to Your Own Uptime Monitor

I use Uptime Robot to monitor just about everything for our business: our sites, the sites of service providers we use, specific ports, even heartbeat checks on our own bare-metal servers. But there was one thing that, until recently, I was having trouble using it for: live stream monitoring.

Our status page at status.midsouth.live

Now don't get me wrong, I had a monitor set up on the main domain of our CDN and our streaming provider, but that's not really worth a whole lot. The main site and web server could very easily be up while streaming and HLS delivery was not operating at all. I wanted a better solution.

Enter HLSAnalyzer.com - a company I found through Google that, for a base rate of $20/month for 5 streams, plus about $3.50/stream after that, will continuously monitor your HLS streams, report detailed stats, and send out alerts when there's a problem.

So I signed up and started testing and, well - it works really well. There are a few little nitpicks I have; namely, I wish it detected and resolved issues faster. See, it uses the buffer size of your stream to determine when there's an issue, and when that issue has been resolved. This is good for accuracy but bad for speed, since there are minimum thresholds involved in what constitutes an "alarm" state. Overall, it usually detects and reports an outage within 5 minutes of it occurring, and reports it resolved within 10 minutes of it actually being resolved. Not perfect, but not bad.
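Just to illustrate that trade-off (this is not HLSAnalyzer's actual logic - just a made-up sketch of how threshold-based alarming generally behaves):

```python
# Hypothetical sketch of threshold-based alarming - NOT HLSAnalyzer's real algorithm.
# Requiring several consecutive bad (or good) checks improves accuracy but delays alerts.
ALARM_AFTER = 5      # consecutive bad checks before declaring an outage
RECOVER_AFTER = 10   # consecutive good checks before declaring it resolved

def next_state(state, bad, good, buffer_ok):
    """Advance a simple UP/DOWN state machine based on one check of the stream buffer."""
    if buffer_ok:
        good, bad = good + 1, 0
        if state == "DOWN" and good >= RECOVER_AFTER:
            state = "UP"
    else:
        bad, good = bad + 1, 0
        if state == "UP" and bad >= ALARM_AFTER:
            state = "DOWN"
    return state, bad, good
```

Tune the thresholds down and you alert faster but risk false alarms; tune them up and you get the opposite.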

HLSAnalyzer Dashboard

HLSAnalyzer advertises itself as API-based, but with a web interface that can do pretty much everything you'd need - and that's accurate; I haven't had to write any code or touch the API to use the service.

Notifications can be delivered either through e-mail (that's what we'll be using) or HTTP POST calls.
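If you'd rather go the HTTP POST route, a tiny receiver is all you'd need on your end. Here's a minimal sketch - note that the JSON field names are placeholders, since I haven't dug into the actual payload HLSAnalyzer sends:

```python
# Minimal webhook receiver for HTTP POST notifications (hypothetical sketch).
# The payload field names are placeholders - check HLSAnalyzer's docs for the real ones.
from flask import Flask, request

app = Flask(__name__)

@app.route("/hls-alert", methods=["POST"])
def hls_alert():
    payload = request.get_json(force=True, silent=True) or {}
    stream_id = payload.get("stream_id", "unknown")   # placeholder key
    status = payload.get("status", "unknown")         # placeholder key
    print(f"Stream {stream_id} reported status: {status}")
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)
```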

HLSAnalyzer Dashboard

So, pretty much, you just configure the parameters, link it up to your HLS stream, and it'll alert you of up and down events - really easy. You can also view detailed stats about the stream. And if that's where you want to end things, if e-mail alerts alone are good enough for you, thanks for reading!

But I wanted to fold this into my usual uptime monitor, Uptime Robot, both to unify everything in one place and so it would feed into our status page. So let's dig into that...

Enter Zapier. If you haven't heard of Zapier before, it's basically a service for connecting different web-apps together. It works the same way as using various APIs, but it's for non-programmers. There's a free tier and some very reasonably-priced paid tiers.

Zapier

Now, keep in mind, I am using a Pro Zapier plan with the "paths" feature, which I'll demonstrate in this post. It should be possible to re-create what I've done here on a free Zapier plan by using two separate Zaps instead; however, I can't speak to what other limitations you may run into with the free plan.

Also keep in mind there are plenty of other services that do what Zapier does; this is just what I use. Just Google "Zapier alternatives" and you'll find a goldmine.

Moving on to the setup, you can make changes as desired, but here's what you'll need if you want to follow along exactly:

  • An HLSAnalyzer Account
  • A paid Zapier account with the 'paths' feature
  • A Google Drive / Docs Account
  • An Uptime Robot Account

So what I've done in Zapier is create a Zap that is triggered by an incoming e-mail; Zapier helpfully provides an e-mail address you can use to receive triggers. I've added this e-mail address as one of the recipient addresses in HLSAnalyzer. That's how HLSAnalyzer communicates with Zapier: it sends its notification e-mails to an inbox connected to the Zap.

Zapier E-mail Trigger

Entered in HLSAnalyzer

Next is the filter. Every HLSAnalyzer stream has a unique ID that it shows in the dashboard and, helpfully, also includes in its Up and Down alert e-mails. Using this ID, we can introduce a filter into the Zapier chain so that only the stream we actually want to monitor will trigger the Zap to move forward. If you want to know when ANY stream goes down, you don't need this filter step.

Link ID Zapier Filter

Moving on to our final step, which is a path. This path conditionally runs either the 'Pass' or 'Fail' route, depending on which keyword the received e-mail contains.

Zapier Condition

The fail path looks for the keyword "Outage Alert" in the subject of the e-mail; if that keyword is found, the path continues.

Zapier Condition

Once it continues, it does one thing: update the first cell in a Google Spreadsheet (any spreadsheet will do) with the word "OFFLINE". This spreadsheet is what our uptime monitor will watch to determine the status.

Update Spreadsheet

By now I think you're getting the picture - it's the same story for the "Pass" path. Just give it the keyword to look for in the subject...

Zapier Condition

And tell it to write "ONLINE" to the first cell of the Google Sheet. (It needs to be the same cell in the same sheet that the "OFFLINE" path uses; the status needs to flip from one value to the other, and both should never be present.)

Update Spreadsheet

Now wrap things up: test your steps, name your Zap, and enable it. We're done with the Zapier part!

Finished Zap
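As an aside: if you'd rather not use Zapier at all, everything above (filter on the stream ID, branch on the subject keyword, flip the cell) is simple enough to script yourself. Here's a rough sketch using the gspread library with a Google service account; the stream ID and spreadsheet name are placeholders, and you'd still need something to hand it the alert e-mail subjects:

```python
# Rough self-hosted equivalent of the Zap (hypothetical sketch).
# Assumes a Google service account with access to the sheet; STREAM_ID and
# the spreadsheet name are placeholders.
import gspread

STREAM_ID = "abc123"               # placeholder: your HLSAnalyzer stream ID
SHEET_NAME = "HLS Stream Status"   # placeholder: your spreadsheet name

def handle_alert_email(subject: str) -> None:
    """Mimic the Zap: filter by stream ID, branch on keyword, update cell A1."""
    if STREAM_ID not in subject:
        return  # not the stream we care about - same job as the Zapier filter step
    gc = gspread.service_account()   # reads credentials from the default location
    sheet = gc.open(SHEET_NAME).sheet1
    if "Outage Alert" in subject:
        sheet.update_acell("A1", "OFFLINE")
    else:
        # anything else from this stream is treated as the recovery/online notice here
        sheet.update_acell("A1", "ONLINE")
```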

Now we just need one thing before we proceed to our uptime monitor - the static HTML link to that spreadsheet. Navigate to your Google Spreadsheet, go to File > Share > Publish to web, make sure "Automatically republish when changes are made" is ticked, click Publish, and copy your URL.

Get the static URL
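If you want to sanity-check that the published link actually serves the keyword (which is essentially what Uptime Robot will be doing on a schedule), a quick script like this works - the URL is a placeholder for your own published link:

```python
# Quick check of the published sheet - the URL is a placeholder for your own "Publish to web" link.
import requests

PUBLISHED_URL = "https://docs.google.com/spreadsheets/d/e/<your-id>/pubhtml"  # placeholder

resp = requests.get(PUBLISHED_URL, timeout=10)
if "ONLINE" in resp.text:
    print("Stream is reported ONLINE")
else:
    print("Keyword not found - the stream is (or will be reported as) down")
```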

Now, finally, we're heading to Uptime Robot. We're going to create a new keyword monitor; it's going to alert us when the keyword (ONLINE) is NOT found, and it's going to point at, you guessed it, our spreadsheet.

Create uptime monitor
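For the record, the same monitor can also be created through Uptime Robot's v2 API. Here's a sketch based on their newMonitor endpoint; the API key and URL are placeholders, and keyword_type=2 should be the "alert when the keyword is NOT found" option:

```python
# Create the keyword monitor via the Uptime Robot v2 API (sketch - the dashboard works just as well).
# API key and URL are placeholders.
import requests

payload = {
    "api_key": "YOUR-UPTIMEROBOT-API-KEY",   # placeholder
    "format": "json",
    "type": 2,                                # 2 = keyword monitor
    "friendly_name": "HLS Stream Status",
    "url": "https://docs.google.com/spreadsheets/d/e/<your-id>/pubhtml",  # placeholder
    "keyword_type": 2,                        # 2 = alert when the keyword does NOT exist
    "keyword_value": "ONLINE",
}

resp = requests.post("https://api.uptimerobot.com/v2/newMonitor", data=payload, timeout=10)
print(resp.json())
```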

Set up your alert contacts and the rest of your standard fare for an uptime alert, and you're done! You now have an HLS uptime monitor that, as long as everything is working, will alert you within about 5-10 minutes of the stream going down, and within 5-10 minutes of it coming back up.

Keep in mind we get these values partly because of the previously mentioned way HLSAnalyzer works, but also because Google Sheets only updates the published static output every 5 minutes or so, which adds a bit of a variable. I did run an alternate test where, instead of Google Sheets, Zapier creates an HTML file containing the keyword in Amazon S3; that worked fine too and updated instantly, so it's also an option.
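If you go the S3 route by hand instead of through Zapier, it's a single upload call. A rough sketch with boto3 (bucket and file names are placeholders, and it assumes your AWS credentials are already configured):

```python
# Write the status keyword to a small HTML file in S3 (sketch; bucket/key are placeholders).
# Assumes AWS credentials are already configured for boto3.
import boto3

s3 = boto3.client("s3")

def write_status(status: str) -> None:
    s3.put_object(
        Bucket="my-status-bucket",   # placeholder
        Key="hls-status.html",       # placeholder
        Body=f"<html><body>{status}</body></html>".encode("utf-8"),
        ContentType="text/html",
    )

write_status("ONLINE")   # or "OFFLINE"
```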

As for 24/7 monitoring of just the server itself - I use Castr.io and their "Pre-Recorded Stream" feature to broadcast a continuous feed of a 240p, 500 kbps test video to our streaming provider, and THAT is the HLS URL we monitor. We keep it such low quality because, well, we pay for all the bandwidth we use.

Anyway, I hope this was helpful - take it easy!

By Michael W

Observations: The Power of SRT + Latency in Bandwidth Constrained Situations

Update (10/11/21): Added a note discussing the max supported latency of SRT

So to start this out, I need to establish some context: Over the past three or four days, I've been having internet issues at home. Particularly, with the upload connection.

With that in mind, early Sunday evening I was messing around and testing some off-site streaming decoders. I needed to send out a stream to test with, but quickly realized that wasn't a very feasible option with my internet in the state that it was.

However, out of curiosity, turning to my recent experience with the SRT protocol, I decided to try something just to see if it would work, and the results shocked me.

So what was that 'something'? We'll get to that.

The bulk of the rest of this story will be told through text and video, to better illustrate things.

I would recommend that you check out the videos in full-screen for clarity.

Also, please make sure the player controls aren't blocking anything on screen, as there are a few points in the videos where information is displayed at the very bottom of the frame.

So to start out, let's take a look at my upload speed just prior to this little experiment, measured minutes before on the same PC and the same connection.


There's one additional thing to note: I'll be using a lot of latency with SRT. SRT allows you to define the latency, and I'm taking advantage of that here. I'm not an expert, but it seems like the latency setting relates to some sort of buffer.

You can adjust the latency as shown in the video; in this setup it's entered in microseconds.

Update: Someone in a Facebook group where I shared this article mentioned that the max supported latency of SRT, per the specification, is 8 seconds. Based on some quick Googling, this seems right: 20 ms minimum, 8000 ms maximum. I haven't checked any technical sheets, but it's what every manufacturer and developer has listed on their site, so I'm going with it. Keep things at 8 seconds or less. I used 10 seconds in this test, and 12 in others, but this was just a test - better safe than sorry in production.
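Since the units are easy to mix up (the spec numbers above are in milliseconds, while the value entered in this setup is in microseconds), here's a tiny helper that builds an srt:// URL and caps the latency at 8 seconds - the host and port are placeholders:

```python
# Tiny helper for the unit conversion (sketch). The host and port are placeholders.
# The URL value here is in microseconds; many other tools take the latency in milliseconds.
SRT_MAX_LATENCY_MS = 8000   # per the spec note above

def srt_url(host: str, port: int, latency_seconds: float) -> str:
    latency_ms = min(int(latency_seconds * 1000), SRT_MAX_LATENCY_MS)
    return f"srt://{host}:{port}?latency={latency_ms * 1000}"  # value in microseconds

print(srt_url("ingest.example.com", 9999, 8))
# srt://ingest.example.com:9999?latency=8000000
```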


And of course, we'll want to discuss the encoding settings. They'll be the same for both the SRT and RTMP tests: H.264, superfast preset, 2.3 Mbps, 1280x720, 96 kbps audio. Pretty standard stuff. Part of the reason I used OBS for this test is that it allows more precise tuning of the encoding for SRT.
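I used OBS for the actual test, but if you wanted to reproduce roughly the same settings from the command line, an FFmpeg invocation might look like the sketch below - this assumes an FFmpeg build with SRT support, and the input file and ingest host are placeholders:

```python
# Rough FFmpeg equivalent of the OBS settings used here (sketch; needs an FFmpeg build with SRT support).
# Input file and ingest host are placeholders - the real test was done in OBS.
import subprocess

cmd = [
    "ffmpeg", "-re", "-i", "input.mp4",   # placeholder source
    "-c:v", "libx264", "-preset", "superfast",
    "-b:v", "2300k", "-s", "1280x720",
    "-c:a", "aac", "-b:a", "96k",
    "-f", "mpegts",
    "srt://ingest.example.com:9999?latency=8000000",   # placeholder host; latency in microseconds
]
subprocess.run(cmd, check=True)
```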

The encoding settings look like this


So with that out of the way, let's get to the experiment: streaming on this connection, reliably, with two different protocols - RTMP and SRT.

And now on to the show. First up, RTMP - let's see how it performs.

RTMP Streaming Test


It's...not good. In fact, it's unusable: over 90% of the frames sent to the CDN were dropped. That's to be expected - we're trying to push 2.3 Mbps down a connection that's fluctuating between 400 and 900 Kbps, and God only knows what the packet loss and jitter are like.

But, just for laughs, we're going to try the test again with SRT, and as discussed previously, we're going to give it a bunch of latency - 10 seconds in this instance.

SRT Streaming Test


It's kind of night and day, isn't it? One works and one...doesn't. But more than that, the SRT stream is actually pretty flawless. There's very little quality degradation and, aside from the increased latency, it looks just like a regular RTMP stream normally would.

I can't fully explain how this works - how I was able to effectively push 2.3 Mbps through a pipe that was being throttled down to 400 Kbps at times. I'd guess it's a combination of the large latency window, the buffering that comes with it, and SRT's retransmission-based error correction.

I likewise cannot guarantee this will be the case for you, or that it will be replicable over time. But I've had similar experiences with SRT before, and as someone who deals with a lot of customers broadcasting from rural areas, this is very encouraging. I'll keep experimenting with what this protocol can accomplish in bandwidth-challenged situations.

Michael Wilson