The world is full of information. Databases can store all sorts of data – users’ personal info, collections of pictures, common log entries, music playlists. For data to be easily readable, usable, and accessible we use database management systems (DBMS). Those programs do not only help with reading data but also managing it in different ways – change it to another form, link data together, sort it in a way you want, and finally analyze the information stored.
But before we begin the overview, let’s go over two major groups of databases – relational and non-relational. So how do they differ? The answer can be given with two single words – data structure.
Firstly, we have relational databases. These databases have their data structured in terms of rows which are grouped into tables (also called relations), and they use SQL (Structured Query Language) to interact with data, and that’s where the term “SQL databases” came from. The purpose of relational databases is to manage information organized in tables and to provide concepts of how those tables relate to one another.
In opposition to relational databases, we have non-relational databases, which do not shape stored data into tables. Even though the term “NoSQL” is often used to describe these databases, it does not mean that they do not use SQL. Part “No” in “NoSQL” stands for “Not Only” and indicates that they may or may not support SLQ-like languages. NoSQL uses different data structures (key-value pairs, lists, graphs, documents), which sometimes are considered more flexible than tables.
For the last decade number of non-relational databases grown and the gap between NoSQL and SQL databases shortened thanks to popular NoSQL DBMS’ such as MongoDB and Redis. But even considering that, relational databases are still holding strong leadership and are being used in 20% more cases. According to the information of popular internet portals (Stackoverflow, Percona, and DB-engines), the big three of DBMS’ used by developers all over the world are PostgreSQL, MongoDB, and MySQL. So we’ll take a look at those and two other popular DBMS’ Redis and Microsoft SQL Server.
PostgreSQL is an open-source DBMS used mostly for web applications. Although it was one of the earliest databases developed, its support for JSON gives it the ability to manage not only structured data but non-tabular data as well. It can be done to work on different OS’, provides easy tools for receiving information from other DBMS’, provides the choice of a procedural language to use, can work with huge chunks of information measured in TBs (although the speed still may suffer while performing large queries), and can be hosted in both physical and virtual environments. It should be noted that the documentation for PostgreSQL is not that straightforward, so it can be tricky to obtain specific information.
MongoDB is a NoSQL database that uses documents with JSON-like structure. It was designed to manage non-relational models, however, can also work with relational ones (although it may cause problems with performance). It does a great job in situations where other DBMS’ have difficulties, and the reason for this is that it was originally purposed to work with non-shaped data. There is an in-depth list of languages it can be used with, that’s why it is really simple to use no matter which ones you are working with. Moreover, users will have no trouble finding information about what they want to do because of meticulous documentation that is actively updating with new releases. The big downside is the fact that MongoDB has problems with transactions. Even though the latest versions added support for transactions, they still do not work well and may potentially lead to data corruption.
MySQL is yet another open-source DBMS commonly used for web and mobile development that can provide a big amount of functions for users. It is deployable on many OS’, can be managed to work with other Oracle databases if needed, and also has different user interfaces to be used with. But there also are some downsides – support is not available for the free version users, some features that other DBMS’ do automatically can’t be solved quickly with MySQL (such as incremental backups and work with OLAP), and it does not support very large database sizes. The greater focus was made on security, reliability, and low resource usage than on a big amount of features.
Redis is not only an open-source key-value database, it is a data store, cache, and message broker. The main advantage of Redis database is caching – its data is stored in cash (temporary storage), so it can be referred to faster when needed. It supports almost every popular language and tons of data structures – from strings and sets to hashes and maps of all kinds. This DBMS is easy to set up and work, although it lacks a better UI. But of course, there are some downsides with the first one being the high scaling of resource consumption as your data grows. Not only that but it also does not support multi-threading, so it won’t benefit from multi-core CPUs usage.
Microsoft SQL Server (MSSQL) is one of the most popular relational DBMS’ used, and there are reasons for that. First of all, it is simply easy to use, is extremely reliable, and has multiple data recovery options. Providing data to other applications is the most common usage of MSSQL, and T-SQL query language helps with that. Being one of the most used DBMS’, it has clear documentation and provides great support for users. It also gives a possibility to work with in-memory databases, but there is a downside – once you enable this option, it cannot be turned off easily. Another downside is that MSSQL does not provide data visualization and analysis tools. Although it has good performance, it also has high requirements when it comes to hardware, especially if you use the latest versions of it.
Sometimes it is important to not only save the information but also to see how it was changing over time. That kind of information is called time series data which means data with an attached timestamp, and is often used to store measurements of sensors, which is common to IoT. And there are DBMS’ that were specifically developed to work with this kind of information – time series databases. There are multiple databases that grant the ability to manage time series data, with InfluxDB holding the lead, and followed by TimescaleDB and Prometheus.
InfluxDB is a free database written with GO language, that provides InfluxQL language to work with measurements (shape of data used). It has a unique protocol for data that contains the name of the measurement, sets of specific tags and fields, and, finally, the timestamp that can have different accuracy. What makes it one of the most useful tools to use for managing time series data is that it has no limitations for the number of fields and their types, which makes its usage extremely flexible.
TimescaleDB would be an amazing option for you if PostgreSQL is already being used in your project, as it was built based on it. It operates with time series data quicker than its ancestor and is not hard to work with, so it would be an obvious choice.
Here is a good example of using the TimescaleDB in architecture for the Internet of Things projects – SensorOcean.com, an IoT platform that is based on a massive amount of time-series data from IoT devices and sensors.
And if you’re looking for a DBMS that can present time-related information in form of various graphs – Prometheus would be a great choice, as it is closely related to Grafana visualizing application.