Always visualize your data: I visualized my news feeds and made some observations

Disclaimer: I’m just trying this blog platform out.
Normally I would post this to Google Plus, but it is shutting down. I'm used to that social network and I like posting things as throw-away snippets, so this is not going to be the kind of well-formatted, structured, directly useful post that IT bloggers usually write.
I read news via RSS. Hundreds of feeds produce thousands of new posts every day, so in 2015 I made a tool that I now use to filter those feeds, keeping only the most important entries. The tool itself is not the point of this article. You can look at an example of the resulting feed for /r/programming or read a short related discussion on /r/rss. I have finally visualized the current article “queue” to check that things work the way I imagine and that the data is not messed up.

Here we see the posts: the X-axis is post age in seconds and the Y-axis is the score. Red posts are the main candidates to be chosen by my ranking system. The plot looks fine and as expected. The /r/ruby data looks similar.

When my filtering tool ranks posts in a multireddit, it takes into account the specific subreddit each post was published in. This makes it smarter than Reddit's default algorithm. We can see that the three top-left posts (which Reddit would show at the top of the multireddit page) are not even red, just yellow.
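The following is only a hypothetical sketch of subreddit-aware ranking, not the actual formula of the tool described here. It assumes each post's score is divided by the median score of its own subreddit, so a modest score in a small subreddit can outrank a bigger score in a large one. The `Post` class and `rank_multireddit` function are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class Post:
    title: str
    subreddit: str
    score: int


def rank_multireddit(posts):
    """Rank posts by score relative to the median score of their own subreddit.

    Hypothetical normalization, used only to illustrate subreddit-aware ranking.
    """
    scores_by_sub = defaultdict(list)
    for post in posts:
        scores_by_sub[post.subreddit].append(post.score)
    # Median score per subreddit; max(..., 1) avoids dividing by zero or a negative baseline.
    baseline = {sub: max(median(scores), 1) for sub, scores in scores_by_sub.items()}
    return sorted(posts, key=lambda p: p.score / baseline[p.subreddit], reverse=True)


# Made-up example data: raw scores alone would order the titles a, b, c, d.
posts = [
    Post("a", "programming", 900),
    Post("b", "programming", 300),
    Post("c", "ruby", 60),
    Post("d", "ruby", 10),
]
print([p.title for p in rank_multireddit(posts)])  # ['c', 'a', 'b', 'd']
```

The median is used instead of the mean so that a single viral post does not shift the baseline of its whole subreddit; other baselines would work just as well for illustration.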

This is the Retweet count metric for a single Twitter account. Unlike Reddit scores, the values are not spread out evenly. Either Twitter does a good job of not showing subscribers only the tweets that are already popular, or this is a botnet. I'm not sure, because I'm subscribed to only one Twitter account and so have nothing to compare it with.

The number of Likes is strongly correlated with the number of Retweets. The points at Y=0 are retweets: you probably can't Like a retweet itself. That is fine and good for ranking. I'm not sure how one retweet got 1 Like. I can't easily find that exact entry in my database because I use array-type fields in Google Cloud Datastore, and GQL can't query by them.
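To put a number on "strongly correlated", one can compute the Pearson correlation coefficient. Below is a minimal sketch in plain Python, assuming each tweet is a dict with hypothetical `retweets`, `likes` and `is_retweet` keys. Retweets are dropped first because they sit at Y=0 and would drag the estimate down.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def likes_vs_retweets(tweets):
    """Correlation between Retweets and Likes, ignoring retweets.

    Assumes hypothetical dict keys: 'retweets', 'likes', 'is_retweet'.
    Retweets are excluded because they (almost) always have 0 Likes.
    """
    own = [t for t in tweets if not t["is_retweet"]]
    return pearson([t["retweets"] for t in own], [t["likes"] for t in own])
```

A value close to 1 means that Likes rise almost linearly with Retweets; in that case, either metric carries nearly the same ranking signal.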

The last plot shows what happens to the data when there were days on which the program could not read the source. Such errors are usually emailed to me by Stackdriver Error Reporting, but I don't have time to fix hobby projects immediately, so some days of data are lost.
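Such gaps can also be found without a plot. Here is a minimal sketch, assuming the entries' publication timestamps are available as naive `datetime` objects in a single time zone (for example UTC). `missing_days` is a hypothetical helper, not part of Stackdriver or Datastore.

```python
from datetime import date, datetime, timedelta


def missing_days(timestamps):
    """Return calendar days between the first and last timestamp that have no entries.

    Assumes naive datetimes in one time zone; time zones are not handled here.
    """
    if not timestamps:
        return []
    seen = {ts.date() for ts in timestamps}
    day, last = min(seen), max(seen)
    gaps = []
    while day <= last:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Made-up example: nothing was collected on the 3rd and 4th.
stamps = [datetime(2018, 10, 1, 9), datetime(2018, 10, 2, 18), datetime(2018, 10, 5, 7)]
print(missing_days(stamps))  # [datetime.date(2018, 10, 3), datetime.date(2018, 10, 4)]
```

Running such a check on a schedule would report lost days even when the error emails get ignored.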
If some metric were calculated incorrectly, for example always being equal to 0, we would see all these circles at Y=0. None of the plots had such an issue, so everything is fine.
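The same sanity check can be automated. The sketch below is hypothetical: it flags a metric when nearly all of its values are 0. The 0.95 threshold is an arbitrary assumption and should stay well above the share of legitimate zeros, such as retweets having 0 Likes. Entries are assumed to be dicts keyed by metric name.

```python
def suspicious_metrics(entries, metrics, max_zero_share=0.95):
    """Return metric names whose values are 0 in more than max_zero_share of entries.

    A metric that is (almost) always 0 is probably calculated incorrectly.
    The default threshold is an arbitrary assumption, not a tuned value.
    """
    flagged = []
    for name in metrics:
        values = [entry[name] for entry in entries if name in entry]
        if not values:
            continue  # metric absent everywhere: nothing to judge
        zero_share = sum(1 for v in values if v == 0) / len(values)
        if zero_share > max_zero_share:
            flagged.append(name)
    return flagged


# Made-up example: 'score' is always 0, 'likes' only sometimes.
entries = [{"score": 0, "likes": 3}, {"score": 0, "likes": 0}, {"score": 0, "likes": 5}]
print(suspicious_metrics(entries, ["score", "likes"]))  # ['score']
```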
From what I see on this platform, you can't delete an image once you have uploaded it. Also, it has no per-tag RSS feeds.
If you have questions or advice, or if you have spotted any major English grammar mistakes I make, feel free to comment, or send me a message if the platform has such a feature.