Flink Streaming

  • Flink is a stream processing engine. It can run on a standalone cluster, or on top of YARN or Mesos.
  • Flink is scalable (to thousands of nodes) and fault-tolerant: it survives failures while still guaranteeing exactly-once processing.
  • Flink has good Scala support, similar to Spark Streaming.
  • Flink can process data based on event time and ships with a windowing system (see the sketch after this list).
  • Flink supports real-time streaming.
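
As a taste of the event-time and windowing support mentioned above, here is a minimal sketch that counts events per key in 10-second tumbling event-time windows. It assumes a Flink 1.12-era DataStream Scala API (the same style as the word count example below); the Event case class, its fields, the sample elements, and the 5-second out-of-orderness bound are illustrative choices, not part of the original example.

    import org.apache.flink.streaming.api.TimeCharacteristic
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
    import org.apache.flink.streaming.api.windowing.time.Time

    // hypothetical event type used only for this sketch
    case class Event(key: String, timestampMillis: Long)

    object EventTimeWindowSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

        // a small in-memory stream stands in for a real source
        val events: DataStream[Event] = env.fromElements(
          Event("a", 1000L), Event("a", 4000L), Event("b", 7000L)
        )

        val counted = events
          // extract event-time timestamps, tolerating 5 seconds of out-of-order data
          .assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor[Event](Time.seconds(5)) {
              override def extractTimestamp(e: Event): Long = e.timestampMillis
            }
          )
          .map(e => (e.key, 1))
          .keyBy(_._1)
          // 10-second tumbling windows evaluated on event time
          .window(TumblingEventTimeWindows.of(Time.seconds(10)))
          .sum(1)

        counted.print()
        env.execute("Event-time windowing sketch")
      }
    }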

In this example we will create an application that reads from a socket stream and prints a running word count using Flink.

To be able to run Flink, the only requirement is to have a working Java 8 or 11 installation.

Following is an example of running the Flink socket streaming word count application from the IntelliJ IDE.

Scala Code
    import org.apache.flink.configuration.Configuration
    import org.apache.flink.streaming.api.TimeCharacteristic
    import org.apache.flink.streaming.api.scala._

    // wrapped in an object with a main method so it can be run directly from the IDE
    object SocketWordCount {
      def main(args: Array[String]): Unit = {
        // set up a local execution environment with the embedded web UI
        val conf: Configuration = new Configuration()
        val env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf)
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

        // read lines of text from the socket
        val text = env.socketTextStream("XXX.XXX.1.XXX", 1900)

        // split lines into words and keep a running count per word
        val counts: DataStream[(String, Int)] = text
          .flatMap(_.toLowerCase.split("\\W+"))
          .map((_, 1))
          .keyBy(0)
          .sum(1)

        // print the result stream to the console
        counts.print()

        // transformations are lazy; execute() actually submits and runs the job
        env.execute("Flink Streaming")
      }
    }
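
Before starting the job, something must already be listening on the configured host and port, otherwise socketTextStream will fail to connect. For a quick local test (with the host replaced by localhost), running nc -lk 1900 in a terminal and typing words into it is enough; every line entered is pushed into the stream.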

The Flink Web UI can be accessed at http://localhost:8081
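
If port 8081 is already in use, the web UI port can be pinned through the Configuration object that is passed to createLocalEnvironmentWithWebUI. A minimal sketch of just the environment-setup lines from the example above, where the port value 8082 is an arbitrary choice:

    import org.apache.flink.configuration.{Configuration, RestOptions}
    import org.apache.flink.streaming.api.scala._

    val conf = new Configuration()
    // bind the embedded web UI / REST endpoint to an explicit port
    conf.setInteger(RestOptions.PORT, 8082)
    val env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf)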

The Jobs section shows the job details along with its DAG

Task Manager Details

Results are printed to the console; each output line is prefixed with the index of the parallel subtask that produced it

Dependencies
  "org.apache.flink" %% "flink-scala" % flinkVersion ,
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion ,
  "org.apache.flink" %% "flink-streaming-java" % flinkVersion,
  "org.apache.flink" %% "flink-runtime-web" % flinkVersion