Cassandra Interview Questions and Answers

Best Cassandra Interview Questions and Answers

Cassandra is now preferred by many companies, so there is strong demand for Cassandra certified professionals. As the need for Cassandra professionals is increasing day by day, they are paid well too. This is a one-stop resource for all aspirants who are preparing for a Cassandra interview. Here, we have compiled a list of the top 50 Cassandra interview questions and answers under the guidance of top recruiters in the industry. We have covered almost all the important Cassandra topics that interviewers frequently ask about. So, go through these questions and answers to brush up your Cassandra knowledge before you attend an interview.

These top 50 Cassandra interview questions and answers will help you crack interviews more confidently. Both freshers and experienced Cassandra professionals can use these interview questions to advance their careers. So, if you are looking for a career in the field of Cassandra, make use of these questions and answers to achieve your career goal. We wish you all success in your future career.

Cassandra is a free, open-source, distributed NoSQL database management system from Apache that is widely used to handle and maintain huge volumes of data without any single point of failure. This highly available and scalable big data system was originally developed at Facebook. Among the different kinds of NoSQL database management systems, Cassandra is classified as a wide-column store, combining features of key-value and column-oriented databases.

MongoDB vs. Cassandra

  • Data model: MongoDB uses a document-oriented data model, while Cassandra supports a data model similar to Google Bigtable.
  • Querying: MongoDB queries data using multiple indexes, while Cassandra queries data by key or by scan.

Cassandra vs. traditional RDBMS

  • Architecture: Cassandra is masterless, so there is no single point of failure, while a traditional RDBMS follows a master-slave core architecture in which there can be a single point of failure.
  • Availability and replication: Cassandra is highly available, while a traditional RDBMS generally replicates in a master-slave setup.
  • Data: Cassandra can hold dynamic, structured or unstructured data, while a traditional RDBMS holds structured data.

Some of the important features of Cassandra are listed below:

  • Efficient writes
  • Elastic scalability
  • High availability and fault tolerance
  • Easy data distribution
  • Flexible data storage
  • Tunable consistency
  • Cassandra query language

The different types of data models present in Cassandra are as follows:

  • Conceptual data model
  • Logical data model
  • Physical data model

Listed below are the different database elements found in Cassandra:

  • Cluster
  • Keyspace
  • Column family
  • CQL (Cassandra Query Language) Table

Cassandra has been run on both Linux and Windows, although Linux is the usual production platform and recent releases no longer ship Windows support.

The cluster is the outermost structure of Cassandra and acts as a container of keyspaces. It is also called a ring, because Cassandra assigns data to the nodes in the cluster by arranging them in the form of a ring, with the data replicated across several of those nodes.

The outermost container of data in Cassandra is the keyspace, which is a collection of column families. A keyspace is comparable to a database in a relational system: it has a name and a set of attributes that define keyspace-wide behavior.

The key parameters that are used in developing a keyspace in Cassandra are as follows:

  • Keyspace Name
  • Replication Strategy
  • Replication Factor
  • Durable Writes

CREATE KEYSPACE <identifier> WITH <properties> is the syntax used to create a keyspace in Cassandra.
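
To make the syntax above more concrete, here is a minimal sketch that assembles a CREATE KEYSPACE statement from the key parameters listed earlier. The helper name build_simple_keyspace and the keyspace name demo_ks are made up for illustration, and the sketch only covers the SimpleStrategy case. It only builds the statement text; running it would still require CQLSH or a driver.

```python
def build_simple_keyspace(name, replication_factor, durable_writes=True):
    """Return a CQL CREATE KEYSPACE statement that uses SimpleStrategy.

    Hypothetical helper for illustration only; it does not talk to a cluster.
    """
    # Replication strategy and replication factor go into the replication map.
    replication = "{'class': 'SimpleStrategy', 'replication_factor': %d}" % replication_factor
    # CQL booleans are written in lowercase (true / false).
    return "CREATE KEYSPACE %s WITH replication = %s AND durable_writes = %s;" % (
        name,
        replication,
        str(durable_writes).lower(),
    )


statement = build_simple_keyspace("demo_ks", 3)
print(statement)
```

The resulting string can be pasted into CQLSH as it is.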

Yes, we can add or remove a column family in a working cluster, but make sure that all of the conditions listed below are met before doing so:

  • The commit log should be cleared completely with nodetool drain
  • Turn off Cassandra and check that no data is left in the commit log
  • The SSTable files of the removed column families should be deleted

get_range_slices is the call (from the legacy Thrift API) used to iterate over all the rows in a column family. The iteration starts with an empty string as the start key. At the end of each iteration, the last key read becomes the start key for the following iteration.
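
The iteration pattern can be sketched as below. This is a simplified, in-memory stand-in and not the real client API: get_range and iterate_all are hypothetical names, keys are compared as plain strings, and the start key is assumed to be inclusive, which is why the repeated first key of each later page is skipped.

```python
def get_range(rows, start_key, count):
    """Stand-in for a range call: up to `count` keys >= start_key, in sorted order."""
    keys = sorted(k for k in rows if k >= start_key)
    return keys[:count]


def iterate_all(rows, page_size=3):
    """Walk over every key, one page at a time (page_size must be at least 2)."""
    seen = []
    start = ""  # the empty string sorts before every other key
    while True:
        page = get_range(rows, start, page_size)
        # The start key is inclusive, so drop it if it was already read.
        if seen and page and page[0] == seen[-1]:
            page = page[1:]
        if not page:
            break
        seen.extend(page)
        start = seen[-1]  # last key read becomes the next start key
    return seen


rows = {key: "value-" + key for key in ["a", "b", "c", "d", "e", "f", "g"]}
all_keys = iterate_all(rows)
print(all_keys)
```

A page size of 1 would never make progress here, because every page after the first would contain only the key that was already read.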

The durable writes option tells Cassandra whether or not to use the commit log for updates on the current keyspace. The default value of durable writes is TRUE, but this is not mandatory and it can be changed.

If we need to change the replica count or alter the durable writes property of a keyspace, we can make use of the ALTER KEYSPACE command.

This is one of the characteristics that have made Cassandra a preferred choice for many analysts, developers and big data architects. The main function of “tunable consistency” is that it synchronizes the data rows and keeps them up to date on their respective replicas. The consistency level can be chosen by the users as per their preference and requirement. There are two types of consistency in Cassandra: strong consistency and eventual consistency.
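
A short sketch of how the choice between the two plays out in practice. It relies on the commonly cited rule of thumb, which is not spelled out above, that reads are strongly consistent when the number of replicas contacted for a read plus the number contacted for a write exceeds the replication factor. The function names are made up for illustration.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority of the replication factor."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """Rule of thumb: a read overlaps the latest write when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                                  # 2
print(is_strongly_consistent(q, q, rf))   # True: quorum reads + quorum writes
print(is_strongly_consistent(1, 1, rf))   # False: eventual consistency only
```

With a replication factor of 3, quorum reads and quorum writes overlap on at least one replica, while reading and writing at a single replica gives only eventual consistency.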

The “Capture” command in Cassandra captures the output of a command and appends it to a particular file, while the “Consistency” command either displays the current consistency level or sets a new preferred consistency level.

To write a query in Cassandra, we can make use of CQL which is known as the Cassandra Query Language. In order to effectively interact with the database as and when required, Cassandra makes use of CQLSH.

CQLSH is the command-line shell for the Cassandra Query Language, and it is widely used to communicate with the database. The main functionalities of CQLSH are as follows:

  • It can define a schema
  • It can be used to insert data
  • It can be used to execute a query

The DROP TABLE command in CQLSH removes a table, together with all the data it contains, from the keyspace. To list the tables in a keyspace, use DESCRIBE TABLES instead.

In order to permanently delete all the rows of a table while keeping the table itself, we can make use of the TRUNCATE command in CQLSH.

The main configuration file of Cassandra is a YAML file. If you have made any changes to the cassandra.yaml file, restart the nodes so that the changes take effect.

Cassandra keeps copies, known as replicas, of each row based on its row key. The term “replication factor” denotes the number of nodes that hold a copy of each row of data.

Yes, we can change the replication factor of a live cluster but it is necessary to run a repair in order to modify the existing data’s replica count.

The strategy that determines how replicas are placed in the ring is known as the replication strategy. It decides which nodes receive the copies of a given key. The types of replication strategies are as follows:

  • Simple strategy
  • Network topology strategy

Simple strategy is meant for a single-datacenter cluster. It places the first replica on the node identified by the partitioner. The remaining replicas are assigned to the next nodes in the ring in a clockwise manner, without taking the datacenter location or the rack into account.
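
The clockwise placement can be sketched as follows. This is a toy model under stated assumptions: the ring is a plain dictionary from token to node name, the tokens 0 to 75 and the names node-a to node-d are invented, and a node is taken to own the token range that ends at its own token.

```python
import bisect


def simple_strategy_replicas(ring, key_token, replication_factor):
    """Pick replicas by walking the ring clockwise from the owner of key_token.

    `ring` maps each node's token to its name (hypothetical toy model).
    """
    tokens = sorted(ring)
    # First replica: the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if there is none.
    start = bisect.bisect_left(tokens, key_token) % len(tokens)
    replicas = []
    for step in range(min(replication_factor, len(tokens))):
        token = tokens[(start + step) % len(tokens)]
        replicas.append(ring[token])
    return replicas


ring = {0: "node-a", 25: "node-b", 50: "node-c", 75: "node-d"}
placement = simple_strategy_replicas(ring, 60, 3)
print(placement)  # ['node-d', 'node-a', 'node-b']
```

Note how the walk wraps around the end of the ring: the key with token 60 lands on node-d first and then continues with node-a and node-b.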

Network topology strategy is used when a cluster is deployed across several datacenters. It lets you specify how many replicas to place in each datacenter. This makes it possible to serve reads locally without incurring cross-datacenter latency, and it also helps in handling failures.

A node is a single machine that runs Cassandra, while a cluster is a collection of nodes that together hold the same set of data. We can group the nodes of a cluster into several datacenters, which can be used to serve clients located across wide geographical areas.

A row in Cassandra is a group of sorted columns and is the smallest unit that holds related data. Each component of a row can hold either data or metadata. The key elements of a row in Cassandra are as follows:

  • Row key
  • Column keys
  • Column values

A column family is an ordered collection of rows, each of which is itself an ordered collection of columns. As needed, we can add any number of columns to a column family at any time.

A Cassandra column is made up of the following components:

  • Column Name
  • Value
  • Time Stamp

A column (or set of columns) that uniquely identifies a row is known as the primary key. The three different types of primary keys are as follows:

  • Single primary key
  • Compound primary key
  • Composite partitioning key

Simple definitions of the different types of primary keys are given below:

  • A single primary key consists of only one column, which is defined as the primary key.
  • With a compound primary key, the data is first partitioned by the partition key and then clustered within each partition by the remaining columns.
  • A composite partitioning key uses several columns together as the partition key, which spreads the data over more partitions.

A partitioner is a hash function used in Cassandra and it is present on every node. It derives a token from the key of each row that is being inserted, and that token decides where the row is placed in the ring. A partitioner therefore converts a variable-length input into a fixed-length value.
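
The idea of turning a variable-length key into a fixed-length token can be sketched like this. MD5 from the Python standard library is used purely as a stand-in hash, and token_for is a hypothetical name. The tokens it produces will not match the tokens of a real cluster.

```python
import hashlib


def token_for(partition_key):
    """Map a key of any length to a signed 64-bit token (illustrative stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Keep the first 8 bytes so that every token has the same fixed size.
    return int.from_bytes(digest[:8], "big", signed=True)


short_token = token_for("a")
long_token = token_for("a much longer partition key than the first one")
print(short_token, long_token)
```

The same key always hashes to the same token, which is what allows every node to work out independently where a row belongs.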

Partitioners in Cassandra are of different types and they are as follows:

  • Murmur3 Partitioner
  • Random Partitioner
  • Byte Ordered Partitioner

A static table is similar to a relational database table in that it uses a fixed set of column names. A dynamic CQL table lets users pre-compute result sets and store them in a single row, which makes the data simpler to retrieve when required.

Primary keys that consist of one or more table columns are highly important and are mandatory while creating a table in Cassandra.

Make sure that all the points mentioned below are met while adding a new column in Cassandra:

  • The name of the new column must not match the name of any existing column
  • The table must not be defined with the compact storage option

Listed below are the stages through which Cassandra writes data:

  • Commitlog write
  • Memtable write
  • SSTable write

The commit log is a crash-recovery mechanism that supports Cassandra in achieving its durability goals.

SSTable stands for “Sorted String Table”. It is one of the Cassandra data files, and its main purpose is to store the data flushed from the memtable. Unlike a memtable, an SSTable is immutable: once the data is written, nothing is deleted from it and nothing further is added to it.

An SSTable in Cassandra is composed of the following:

  • Index file, which includes the Bloom filter and key offset pairs
  • Data file, which includes the actual column data

A Bloom filter is an off-heap data structure associated with an SSTable. Its main function is to check whether the requested data may be present in that SSTable before any disk I/O is performed.
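
A toy Bloom filter shows why this check is cheap. The sketch below is a generic illustration of the data structure, not Cassandra's implementation: the sizes, the use of SHA-256 and the class and method names are all assumptions. A negative answer is definite, while a positive answer only means the key might be present.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive several bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key):
        return all(self.bits[position] for position in self._positions(key))


sstable_filter = BloomFilter()
empty_filter = BloomFilter()
for row_key in ["user:1", "user:2", "user:3"]:
    sstable_filter.add(row_key)

print(sstable_filter.might_contain("user:2"))  # True, so the SSTable must be read
print(empty_filter.might_contain("user:2"))    # False, so the disk read is skipped
```

When the filter answers False, the read path can skip that SSTable entirely, which is where the I/O saving comes from.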

A memtable is an in-memory structure in Cassandra that temporarily holds the data that has been written. It stores the data in key/column form. There is a separate memtable for each column family, and column data can be retrieved from it by the specified key.

When a write request reaches a node, the following operations are carried out:

  • The request is first recorded in the commit log, and the data is then stored in the memtable
  • When the memtable is full, its data is flushed to an SSTable
  • Writes in Cassandra are partitioned automatically and replicated across the cluster. Cassandra periodically consolidates the SSTables and discards data that is no longer needed.
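
The steps above can be sketched with a small in-memory model. Everything here is a simplification for illustration: the Node class, the memtable_limit parameter and the rule of clearing the commit log on flush are assumptions, and replication across nodes is left out.

```python
class Node:
    """Toy model of the write path on a single node."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record used for crash recovery
        self.memtable = {}     # in-memory, most recent value per key
        self.sstables = []     # immutable, sorted snapshots
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # step 1: commit log
        self.memtable[key] = value            # step 2: memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # step 3: flush when full

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying

    def compact(self):
        # Merge SSTables from oldest to newest so newer values win.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]


node = Node(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # memtable is full, first flush
node.write("a", 3)
node.write("c", 4)   # second flush
sstables_before = len(node.sstables)
node.compact()
print(sstables_before, node.sstables)
```

After compaction, the two SSTables are consolidated into one and the stale value of key a is discarded.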

The snitch is the component that determines which rack and datacenter each node belongs to.

Snitch is of different types and some of them are listed below:

  • Dynamic snitching
  • SimpleSnitch
  • RackInferringSnitch
  • Ec2Snitch
  • PropertyFileSnitch
  • GossipingPropertyFileSnitch
  • Ec2MultiRegionSnitch
  • GoogleCloudSnitch
  • CloudstackSnitch

No, Cassandra does not support ACID transactions in the way that a relational database does.

Both a column and a super column are tuples that consist of a name and a value. In a column, the value is a string, whereas in a super column the value is a map of columns that can hold several data types.

In Cassandra, the SOURCE command is used to execute a file that contains CQL statements.
