RTR IR4 Blog

Cassandra

Set up a multi-node Cassandra cluster on a single machine. We will create a three-node Cassandra cluster on a single local machine. The cluster will have two nodes in one datacenter and a third node in another datacenter. For partitioner...

11 Mar 2020

Hue Oozie Hdfs

Hue provides a UI for the main Hadoop features: an Oozie workflow designer; launching and monitoring Oozie workflows, schedules, and bundles; a Hive query interface; a Pig query interface; an HDFS file browser; Hadoop shell access; notebooks with Hive, Pig,...

09 Mar 2020

Flink Streaming

Flink is a stream processing engine. It can run on a standalone cluster or on top of YARN or Mesos. Flink is scalable (thousands of nodes) and fault-tolerant: it survives failures while still guaranteeing exactly-once processing. Flink has good...

21 Feb 2020

Presto Hive Cassandra

Presto can connect to many different “big data” databases and data stores at once and query across them using SQL syntax. Presto is optimized for OLAP: analytical queries and data warehousing. Presto exposes JDBC, Command-Line,...

12 Feb 2020

Spark PostGres Cassandra MongoDb

Integrate Spark with Postgres, Cassandra, and MongoDB. Explore how to write Spark streaming data into databases. You cannot write streams directly into these databases, but you can write batches of data using foreachBatch. MongoDB stores JSON documents in collections...

03 Feb 2020

Twitter Custom Spark Receiver

Create your own Twitter feed, filtered to the handles you follow and excluding retweets by those handles, using a custom Spark receiver. Create a Twitter developer account at https://developer.twitter.com/apps. Get the credentials: create an app, then Keys and...

11 Jan 2020

Spark Partitioning

Spark isn’t totally magic: you need to think about how your data is partitioned. Some operations are expensive, and Spark won’t distribute the data on its own. Use .partitionBy() on an RDD before running a...

01 Jan 2020
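
The idea behind partitioning by key, as mentioned in the Spark Partitioning excerpt, can be illustrated without Spark at all. The sketch below is plain Python, not Spark's implementation: `partition_index` and `hash_partition` are hypothetical helper names, and `zlib.crc32` stands in for whatever hash function a real partitioner uses. It only shows the property that matters: records with the same key always land in the same partition.

```python
import zlib
from typing import List, Tuple

Pair = Tuple[str, int]


def partition_index(key: str, num_partitions: int) -> int:
    """Map a key to a partition number (hypothetical helper, CRC32-based)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def hash_partition(pairs: List[Pair], num_partitions: int) -> List[List[Pair]]:
    """Distribute (key, value) pairs so that equal keys share a partition."""
    partitions: List[List[Pair]] = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_index(key, num_partitions)].append((key, value))
    return partitions


# Made-up sample data for illustration only.
ratings = [("user1", 5), ("user2", 3), ("user1", 4), ("user3", 1), ("user2", 2)]
parts = hash_partition(ratings, 4)

for i, part in enumerate(parts):
    print(i, part)
```

Because every record for a given key sits in one partition, a key-based operation on that data does not need to pull matching records from other partitions.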

Kafka Cluster

Kafka is simply a collection of topics, each split into one or more partitions. A Kafka partition is a linearly ordered sequence of messages, where each message is identified by its index (offset). Kafka broker leader...

03 Dec 2019
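
The Kafka Cluster excerpt describes a partition as an ordered sequence of messages addressed by offset. A minimal in-memory model of that description follows. It is a sketch only: `Partition` and `Topic` are hypothetical classes written for illustration, not part of any Kafka client library, and they ignore brokers, replication, and persistence.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Partition:
    """Hypothetical stand-in for one partition: an append-only list."""
    messages: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append a message and return its offset (its index in the log)."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_messages (offset, message) pairs from offset on."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = min(offset + max_messages, len(self.messages))
        return [(i, self.messages[i]) for i in range(offset, end)]


@dataclass
class Topic:
    """Hypothetical topic: a name plus one or more partitions."""
    name: str
    partitions: List[Partition]


orders = Topic("orders", [Partition(), Partition()])
first = orders.partitions[0].append(b"created")
second = orders.partitions[0].append(b"paid")
other = orders.partitions[1].append(b"shipped")

print(first, second, other)
print(orders.partitions[0].read(1))
```

Offsets are per partition, so the first message in each partition gets offset 0, and ordering is only defined within a single partition.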

Understanding Consensus algorithms in Distributed Systems

Consensus algorithms allow a collection of machines to work as a coherent group that can survive the failures of some of its members. We have a collection of computers and want them all to agree on...

11 Nov 2019
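
To make "survive the failures of some of its members" concrete: the excerpt does not say which algorithm the post covers, but majority-based protocols such as Raft and Paxos make progress only while more than half of the members are reachable. Under that assumption, the arithmetic looks like this (`majority` and `tolerated_failures` are hypothetical helper names):

```python
def majority(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Members that can fail while a majority is still available."""
    return cluster_size - majority(cluster_size)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

Under this assumption, a four-member cluster tolerates no more failures than a three-member one, which is why odd cluster sizes are the common choice.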

Kafka Zookeeper Cluster Leader Election

This article configures a Kafka and ZooKeeper cluster and explores the leader election process. We will configure a Hyperledger blockchain using a Kafka cluster to understand this topic. In this deployment we will stand up a multi-node Apache...

02 Oct 2019