Deploy Spark Streaming in 5 Minutes

A quick-start guide to deploying Apache Spark Structured Streaming with hosted Spark. Get real-time data processing running fast.

StreamPark Team

Apache Spark’s Structured Streaming brings the power of Spark’s unified engine to real-time data processing. With hosted Spark, you can skip the cluster management and deploy streaming applications in minutes.

What is Spark Structured Streaming?

Structured Streaming is Spark’s stream processing engine built on the Spark SQL engine. It lets you express streaming computations the same way you’d express batch computations on static data.

Key features:

  • Unified batch and streaming — Same API for both
  • Exactly-once semantics — Reliable processing guarantees
  • Event-time processing — Handle late and out-of-order data
  • SQL support — Query streams with SQL
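Event-time processing is the feature that trips people up most often, so here is a minimal pure-Python sketch of the idea (not Spark code, just a toy model): events are bucketed by the timestamp they carry, not by arrival order, so a late or out-of-order record still lands in the correct window.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Toy model of event-time tumbling windows: each event is
    (event_time_seconds, event_type). Events are assigned to windows by
    their event time, so arrival order does not matter."""
    counts = defaultdict(int)
    for event_time, event_type in events:
        window_start = (event_time // window_seconds) * window_seconds
        counts[(window_start, event_type)] += 1
    return dict(counts)

# Out-of-order input: the event at t=30 arrives after the one at t=70,
# but is still counted in the 0-60s window.
events = [(10, "click"), (70, "view"), (30, "click")]
print(tumbling_window_counts(events))
# {(0, 'click'): 2, (60, 'view'): 1}
```

Spark adds watermarking on top of this idea to bound how long state for old windows is kept around.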

Quick Start: Your First Streaming Job

Let’s deploy a Spark Streaming application that processes real-time events.

Step 1: Log In to StreamPark

Access your dashboard at app.streampark.space.

Step 2: Create a Streaming Application

Navigate to Applications → Add Application and configure:

  • Name: event-processor
  • Execution Mode: Apache Spark
  • Application Type: Spark SQL or JAR

Step 3: Write Your Streaming Logic

For simple transformations, use Spark SQL:

CREATE TABLE events (
    event_id STRING,
    user_id STRING,
    event_type STRING,
    timestamp TIMESTAMP,
    properties MAP<STRING, STRING>
) USING kafka
OPTIONS (
    'kafka.bootstrap.servers' = 'kafka:9092',
    'subscribe' = 'events',
    'startingOffsets' = 'latest'
);

CREATE TABLE event_counts (
    window_start TIMESTAMP,
    window_end TIMESTAMP,
    event_type STRING,
    count BIGINT
) USING console;

INSERT INTO event_counts
SELECT
    window.start as window_start,
    window.end as window_end,
    event_type,
    count(*) as count
FROM events
GROUP BY
    window(timestamp, '1 minute'),
    event_type;
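To check your intuition about what this query emits, here is a pure-Python replay of the GROUP BY over a few sample rows shaped like the events table (a sketch for illustration only; the real aggregation runs incrementally inside Spark):

```python
from datetime import datetime, timedelta
from collections import Counter

def event_counts(rows):
    """Replays the query above: group rows into 1-minute tumbling windows
    by their event timestamp, counting per (window, event_type)."""
    counts = Counter()
    for row in rows:
        # window(timestamp, '1 minute') truncates to the minute boundary
        window_start = row["timestamp"].replace(second=0, microsecond=0)
        counts[(window_start, row["event_type"])] += 1
    return [
        {
            "window_start": start,
            "window_end": start + timedelta(minutes=1),
            "event_type": etype,
            "count": n,
        }
        for (start, etype), n in counts.items()
    ]

rows = [
    {"event_type": "click", "timestamp": datetime(2024, 1, 1, 12, 0, 5)},
    {"event_type": "click", "timestamp": datetime(2024, 1, 1, 12, 0, 40)},
    {"event_type": "view",  "timestamp": datetime(2024, 1, 1, 12, 1, 10)},
]
for r in event_counts(rows):
    print(r)
```

The two clicks fall in the 12:00-12:01 window and the view in 12:01-12:02, matching one output row per (window, event_type) pair.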

Step 4: Configure Resources

Parameter       | Development | Production
--------------- | ----------- | ----------
Executors       | 1-2         | 3-10+
Executor Memory | 2G          | 4-8G
Executor Cores  | 1-2         | 2-4
Driver Memory   | 1G          | 2-4G
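The table above maps onto standard Spark configuration properties. As a sketch, here is a hypothetical helper that renders a sizing profile as spark-submit flags; the configuration keys are real Spark properties, while the profile names and the chosen values are example picks from the table's ranges:

```python
# Example sizing profiles; values are illustrative picks from the
# ranges in the table above, not tested recommendations.
PROFILES = {
    "development": {
        "spark.executor.instances": "2",
        "spark.executor.memory": "2g",
        "spark.executor.cores": "2",
        "spark.driver.memory": "1g",
    },
    "production": {
        "spark.executor.instances": "6",
        "spark.executor.memory": "6g",
        "spark.executor.cores": "4",
        "spark.driver.memory": "4g",
    },
}

def spark_submit_args(profile):
    """Render one profile as a list of spark-submit --conf flags."""
    return [f"--conf {k}={v}" for k, v in PROFILES[profile].items()]

print(" ".join(spark_submit_args("development")))
```

In StreamPark these values go in the application's resource settings rather than on a command line, but the underlying properties are the same.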

Step 5: Deploy

Click Submit and StreamPark will:

  1. Package your application
  2. Provision Spark resources
  3. Start your streaming job
  4. Begin processing data

Monitoring Your Stream

StreamPark provides visibility into your running application:

  • Input Rate — Events per second consumed
  • Processing Rate — Events per second processed
  • Batch Duration — Time to process each micro-batch
  • Scheduling Delay — Queue time before processing

Why Hosted Spark?

Running Spark yourself requires cluster management, resource allocation, dependency management, monitoring setup, and security configuration.

With hosted Spark, you skip the infrastructure and focus on data processing.

When to Choose Spark Streaming

Use Spark Structured Streaming when:

  • You have existing Spark expertise or batch Spark jobs
  • You want unified batch and streaming processing
  • Your use case fits micro-batch latencies (seconds, not milliseconds)

Consider Flink when you need true streaming with sub-second latency.

StreamPark supports both — use the right engine for each job.

Next Steps

You’ve deployed your first Spark streaming job. Here’s what to explore next:

  1. Connect to Kafka — Process real event streams
  2. Add checkpointing — Enable fault tolerance
  3. Tune performance — Optimize throughput and latency

Ready to deploy streaming at scale? Start your free trial and have Spark running in minutes.
