NoSQL databases arose in response to the limitations of using SQL (Structured Query Language) for database queries. NoSQL databases store and manage data in ways that enable high operational speed and a level of flexibility not found in traditional relational database management systems (RDBMSs).
A recent report by Allied Market Research notes the demand for NoSQL databases is on the rise. In 2022, the worldwide NoSQL market generated $7.3 billion in sales, and is estimated to generate $86.3 billion by 2032—a compound annual growth rate of 28 percent for that period. Key factors driving global NoSQL market growth, according to the report, are the exploding demand for big data analytics, a need for more scalable and flexible enterprise database solutions, and the ubiquity of cloud computing platforms and technology.
If your enterprise is considering migrating to NoSQL, you may wonder how to choose the best NoSQL database for your data storage needs. With more than two dozen open source and commercial NoSQL databases available, you have plenty of options to choose from.
This article presents five questions to help guide your NoSQL database buying decision. See the end of the article for an overview of the leading NoSQL databases on the market today.
5 questions to ask before choosing a NoSQL database
- Is NoSQL the right choice?
- Which NoSQL data model do we need?
- What is the latency requirement?
- How important are scalability and data consistency?
- How do we want to deploy it?
Is NoSQL the right choice?
Before choosing a NoSQL database, it’s important to be certain that NoSQL is the best choice for your needs. Carl Olofson, research vice president at International Data Corp. (IDC), says “back office transaction processing, high-touch interactive application data management, and streaming data capture” are all good reasons for choosing NoSQL.
Even with these needs in mind, it is important to rule out the possibility that NoSQL is not the right fit for your enterprise, especially because there are tradeoffs to choosing NoSQL over a traditional RDBMS. “The first decision you need to make is why do you need a NoSQL database system,” says Craig Mullins, president and principal consultant at Mullins Consulting. “You need to first understand why an existing relational DBMS cannot fulfill your use case. Relational/SQL database systems are widely installed and most organizations have existing systems and applications deployed on RDBMS with skilled technicians to manage them.”
An alternative to replacing the RDBMS, says Mullins, is polyglot persistence—employing multiple data storage technologies within a single system so as to meet different data storage needs. Rather than “force-fitting everything into a relational mindset,” polyglot persistence lets developers and administrators “choose the appropriate data technology for each use case,” he says.
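As a rough illustration of polyglot persistence, the sketch below pairs two stores inside one application: an in-memory SQLite database stands in for the existing relational system, and a plain dictionary stands in for a key-value store. The class names (`SessionCache`, `OrderStore`) and the sample data are hypothetical, chosen only to show the idea of routing each kind of data to the technology that suits it.

```python
import sqlite3


class SessionCache:
    """Stand-in for a key-value store: values are fetched by key only."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, payload):
        self._data[session_id] = payload

    def get(self, session_id):
        return self._data.get(session_id)


class OrderStore:
    """Relational store for data that benefits from SQL and transactions."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
        )

    def add(self, customer, total):
        # The connection context manager commits on success, rolls back on error.
        with self._db:
            cur = self._db.execute(
                "INSERT INTO orders (customer, total) VALUES (?, ?)",
                (customer, total),
            )
        return cur.lastrowid

    def total_for(self, customer):
        row = self._db.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]


# Short-lived session state goes to the key-value store;
# orders that need aggregation go to the relational store.
sessions = SessionCache()
orders = OrderStore()
sessions.put("s-1", {"customer": "alice", "cart": ["book"]})
orders.add("alice", 12.5)
orders.add("alice", 7.5)
```

In a real deployment each class would wrap a client for an actual database; the point here is only that the application code decides per use case which store to call.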
NoSQL’s core strength is likely its decentralized, scalable, fault-tolerant design, Mullins says. “Most NoSQL database technology is implemented to scale and survive outages,” he says. “Additionally, most NoSQL options are lightweight and require less overhead than a relational DBMS, in terms of CPU and support.”
Which NoSQL data model do we need?
The four main types of NoSQL data models are key-value, document, column store, and graph. Each one fits a different use case. Mullins summarized the strengths of each type as follows:
- A key-value database is well suited to the high-availability, low-latency requirements of applications such as retail and mobile.
- A document database is best suited for event logging, online shopping, content management, and in-depth analytical processing.
- A column store database is good for event logging, content management, and counting and/or categorizing for analytics. Column stores can also be set up to automatically expire data.
- A graph database is well-suited for applications where data elements are interconnected and the number of relationships between them is undetermined. Examples in this use case include social media networks, recommendation engines, logistics and routing, location-aware systems, public transportation links, and network topologies.
“Choosing the right model is essential,” says Noel Yuhanna, vice president and principal analyst at Forrester Research. “The document model is the most popular, including the ability to store JSON documents optimally. The graph model focuses on interconnected data, while the key-value model focuses on a simple key-value pair retrieval, which is not as widely used.”
What data will be stored and how it will be accessed are essential in deciding which data model to choose, Yuhanna says. “Also, some vendor products support all models, which is the multi-model database, offering the flexibility of having multiple models.”
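To make the four models more concrete, here is a minimal sketch that represents similar sample data in each shape using plain Python structures. It is not tied to any product; the records, field names, and helper functions (`find_by_city`, `reachable`) are invented for illustration.

```python
import json
from collections import deque

# Key-value: the store sees only a key and an opaque value.
key_value = {"user:1": json.dumps({"name": "Alice", "city": "Oslo"})}

# Document: nested, self-describing records that can be queried by field.
documents = [
    {"_id": 1, "name": "Alice", "city": "Oslo", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "name": "Bob", "city": "Lima", "orders": []},
]

# Column store (wide-column): row key -> column family -> columns.
wide_column = {
    "user:1": {
        "profile": {"name": "Alice", "city": "Oslo"},
        "events": {"2024-01-01T10:00": "login"},
    },
}

# Graph: nodes plus explicit relationships between them.
nodes = {1: "Alice", 2: "Bob", 3: "Carol"}
follows = {1: [2], 2: [3], 3: []}


def find_by_city(city):
    """Document-style query: filter on a field inside each record."""
    return [doc["name"] for doc in documents if doc["city"] == city]


def reachable(start):
    """Graph-style query: names reachable by following edges (breadth-first)."""
    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in follows[current]:
            if neighbor not in seen and neighbor != start:
                seen.add(neighbor)
                queue.append(neighbor)
    return {nodes[n] for n in seen}
```

The access pattern differs in each case: a single key lookup, a filter on document fields, a read of one column family within a row, or a traversal across relationships. That difference is what the choice of model should follow.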
What is the latency requirement?
Is the latency requirement measured in milliseconds, fractions of a second, seconds, minutes, or longer?
“If the latency requirement is extremely small, as for a streaming data capture or real-time data-sharing application, one should look at a key-value store,” Olofson says. “Likewise if the data is a simple list or matrix.”
If the data is highly changeable in form and includes defined fields, a JSON document database might be more appropriate, Olofson says. This is also true for a high-touch interactive application, which is typically changed frequently to adjust for shifting requirements of the application and user.
“If the latency requirement is not so great and complex combinations must be supported, including bill-of-materials structures or complex groups of interrelated data, then one might consider a graph DBMS,” Olofson says.
How important are scalability and data consistency?
NoSQL databases can break down data into segments—or shards—which can be useful for large deployments running hundreds of terabytes, Yuhanna says.
“Sharding is an essential capability for NoSQL to scale databases,” Yuhanna says. “Customers often look for NoSQL solutions that can automatically expand and shrink nodes in horizontally scaled clusters, allowing applications to scale dynamically.”
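The basic idea of sharding can be sketched in a few lines: hash each key and use the result to pick a shard. The example below is a simplified illustration, not how any particular database does it; the function names and the modulo placement scheme are assumptions made for the sketch.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard each one hashes to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


placement = distribute([f"user:{i}" for i in range(1000)], num_shards=4)
```

One caveat with plain modulo placement is that changing the number of shards moves most keys to a different shard. Systems that expand and shrink clusters automatically typically rely on schemes such as consistent hashing or range-based partitioning to limit that data movement.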
Relational databases focus on ensuring data consistency for every transaction through ACID compliance. With NoSQL, by contrast, “you can choose data consistency to be eventually consistent or even relaxed,” Yuhanna says. “With eventual consistency, you can scale quickly and deliver high performance.”
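A toy model can show what eventual consistency means in practice. In the sketch below, two replicas accept writes independently and later reconcile by keeping the entry with the higher version number (a last-write-wins rule). The `Replica` class, the `sync` function, and the caller-supplied version numbers are all simplifying assumptions; production systems use mechanisms such as timestamps or vector clocks and must also handle ties and deletions.

```python
class Replica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Keep the incoming entry only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so each replica holds the newest."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


east = Replica("east")
west = Replica("west")
east.write("cart:1", ["book"], version=1)
west.write("cart:1", ["book", "pen"], version=2)

stale = east.read("cart:1")  # east has not yet seen the newer write
sync(east, west)
converged = east.read("cart:1")  # after syncing, both replicas agree
```

Between the second write and the call to `sync`, a reader that hits the `east` replica sees older data. That window of staleness is the price paid for letting each replica accept writes without coordinating first.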
How do we want to deploy it?
Some NoSQL databases can run on-premises, some run only in the cloud, and others run in a hybrid cloud environment, Yuhanna says.
“Also, some NoSQL has native integration with cloud architectures, such as running on serverless and Kubernetes environments,” Yuhanna says. “We have seen serverless as an essential factor for customers, especially those who want to deliver good performance and scale for their applications, but also want to simplify infrastructure management through automation.”
The leading NoSQL databases
Asking yourself and your organization the five questions introduced here will help you choose the right NoSQL database for your needs. Now, let’s look at some of the leading NoSQL databases on the market today.
Aerospike
Aerospike is an open source, distributed, real-time, high-performance NoSQL database designed for applications that cannot tolerate downtime and need high read and write throughput.
Aerospike is a multi-model NoSQL and graph database that supports simultaneous data models, has unlimited scale, and enables organizations to act in real-time across billions of transactions. According to the product documentation, Aerospike uses massive parallelism and a unified storage model to ensure the smallest possible server footprint.
The platform ingests and acts on streaming data at the edge and can combine edge data with data from systems of record, third-party sources, data warehouses, or data lakes for operational, transactional, or analytical workloads. Aerospike can run on premises or as a cloud-managed service.
AWS DynamoDB
Amazon DynamoDB is a serverless, NoSQL, fully managed database service that provides single-digit millisecond response times at any scale. A strong selling point of this database is that it enables organizations to develop and run applications while only paying for what they use.
This cloud-based service offers encryption at rest to protect sensitive data. It also enables users to create database tables that can store and retrieve any amount of data and serve any level of request traffic. Users can scale a table’s throughput capacity up or down without downtime or performance degradation, according to AWS. Developers and admins can use the AWS Management Console to monitor resource utilization and performance metrics.
DynamoDB also provides on-demand backup capability, allowing users to create full backups of tables for long-term retention and for regulatory compliance needs.
Couchbase
Couchbase Server, distributed by Couchbase Inc., is a multi-model database platform with JSON document support. It’s an open source NoSQL key-value and document database with a built-in cache. It’s suitable for enterprises that need a database that combines performance, multi-model support, scalability, and automation.
Organizations use the platform to support social media and mobile applications, content and metadata stores, e-commerce transactions, and other applications. It provides full support for documents, a flexible data model, indexing, full-text search, and MapReduce for real-time analytics.
DataStax
DataStax Astra DB is a fully managed, cloud-native, database-as-a-service built on Apache Cassandra. It scales dynamically and accelerates application development via a range of APIs and programming language options, so developers can build real-time applications fast and scale them without limits, according to the company.
Developers can readily ensure data security with Astra DB’s built-in security mechanisms such as Private Link, IP access controls, single sign-on, application tokens, and data encryption. Astra DB’s serverless architecture (built on microservices and API-first principles) scales automatically based on demand.
Google Bigtable
Bigtable from Google is an enterprise-grade NoSQL database service with low single-digit millisecond latency, limitless scale, and 99.999% availability, according to the company. It supports multi-tenant, mixed operational, and real-time analytical workloads.
Google says Bigtable is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data. Latency-sensitive workloads such as personalization are also a good fit for the platform. Bigtable automatically scales resources to adapt to server traffic, handling the associated sharding, replication, and query processing as needed.
MarkLogic
MarkLogic Server is a multi-model database that combines document, semantic graph, geospatial, and relational models into a single, scalable, operational database, according to MarkLogic. It provides native storage for JSON, XML, text, RDF triples, geospatial, and binaries, with unified search-and-query interface capabilities.
The database has a search engine built into its core, providing a single platform to load data from silos and search across all the data. As such, it does not require a bolt-on search engine for full-text search. MarkLogic Server also offers enterprise data security controls such as data loss prevention.
Microsoft Azure Cosmos DB
Azure Cosmos DB is a Microsoft Azure database service that supports multiple NoSQL models and a variety of data formats including JSON and binary data. Microsoft says the database is also fully managed, with Microsoft Azure handling all the underlying infrastructure so that developers can focus on their applications and data.
Azure Cosmos DB offers security tools such as data encryption and data access controls. It features automatic and instant scalability, and open source APIs for MongoDB, Cassandra, and other NoSQL engines.
MongoDB
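MongoDB, maintained by MongoDB Inc., is a free, source-available, cross-platform, document-oriented database. Versions released since October 2018 are published under the Server Side Public License (SSPL), while earlier versions used the GNU Affero General Public License; the drivers are available under the Apache License.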
It uses JSON-like documents with optional schemas, and incorporates operational best practices learned from optimizing thousands of deployments at organizations of all sizes. The company's cloud-based offering can handle database management, setup and configuration, software patching, monitoring, and backups. It operates as a distributed database cluster. Key features and capabilities include fully managed backup, point-in-time recovery, a real-time performance panel, and customizable alerting.
Redis
Redis Enterprise, sponsored by Redis Labs, is an open source, key-value NoSQL in-memory database that supports both relaxed and strong consistency, a flexible schema-less model, high availability, and ease of deployment.
The platform supports key-value; a variety of data structures such as lists, sets, bitmaps, and hashes; and a variety of models through pluggable modules such as search, graph, JSON, and XML. Redis Enterprise includes a real-time indexing, querying, and full-text search engine available on-premises and as a managed service in the cloud.
Copyright © 2024 IDG Communications, Inc.