Deploy Spark Streaming in 5 Minutes
A quick-start guide to deploying Apache Spark Structured Streaming with hosted Spark. Get real-time data processing running fast.
Apache Spark’s Structured Streaming brings the power of Spark’s unified engine to real-time data processing. With hosted Spark, you can skip cluster management and deploy streaming applications in minutes.
What is Spark Structured Streaming?
Structured Streaming is Spark’s stream processing engine built on the Spark SQL engine. It lets you express streaming computations the same way you’d express batch computations on static data.
Key features:
- Unified batch and streaming — Same API for both
- Exactly-once semantics — Reliable processing guarantees
- Event-time processing — Handle late and out-of-order data
- SQL support — Query streams with SQL
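To make the "same API" point concrete, here is a minimal Scala sketch (the path and field names are illustrative, not part of StreamPark): the streaming version of a batch job differs only in swapping `read` for `readStream` and supplying an explicit schema.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("unified-api").getOrCreate()
import spark.implicits._

// Batch: count events by type from static JSON files.
val batchDF = spark.read.json("/data/events/")
val batchCounts = batchDF.groupBy($"event_type").count()

// Streaming: the same transformation; only the read changes, and
// file-based streaming sources require an explicit schema.
val streamCounts = spark.readStream
  .schema(batchDF.schema)
  .json("/data/events/")
  .groupBy($"event_type")
  .count()
```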
Quick Start: Your First Streaming Job
Let’s deploy a Spark Streaming application that processes real-time events.
Step 1: Log In to StreamPark
Access your dashboard at app.streampark.space.
Step 2: Create a Streaming Application
Navigate to Applications → Add Application and configure:
- Name: event-processor
- Execution Mode: Apache Spark
- Application Type: Spark SQL or JAR
Step 3: Write Your Streaming Logic
For simple transformations, use Spark SQL:
```sql
-- Source: read events from a Kafka topic.
CREATE TABLE events (
  event_id STRING,
  user_id STRING,
  event_type STRING,
  timestamp TIMESTAMP,
  properties MAP<STRING, STRING>
) USING kafka
OPTIONS (
  'kafka.bootstrap.servers' = 'kafka:9092',
  'subscribe' = 'events',
  'startingOffsets' = 'latest'
);

-- Sink: print windowed counts to the console (handy for a first deploy).
CREATE TABLE event_counts (
  window_start TIMESTAMP,
  window_end TIMESTAMP,
  event_type STRING,
  count BIGINT
) USING console;

-- Count events per type over one-minute tumbling windows.
INSERT INTO event_counts
SELECT
  window.start AS window_start,
  window.end AS window_end,
  event_type,
  count(*) AS count
FROM events
GROUP BY
  window(timestamp, '1 minute'),
  event_type;
```
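If you pick the JAR application type instead, the same pipeline looks like this in Scala. This sketch makes explicit an assumption the SQL above makes implicitly: the Kafka value payload is JSON matching the events schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("event-processor").getOrCreate()
import spark.implicits._

// Schema of the JSON payload carried in the Kafka value field.
val eventSchema = new StructType()
  .add("event_id", StringType)
  .add("user_id", StringType)
  .add("event_type", StringType)
  .add("timestamp", TimestampType)
  .add("properties", MapType(StringType, StringType))

// Kafka rows arrive as binary key/value; decode and parse the JSON payload.
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka:9092")
  .option("subscribe", "events")
  .option("startingOffsets", "latest")
  .load()
  .select(from_json($"value".cast("string"), eventSchema).as("e"))
  .select("e.*")

// One-minute tumbling-window counts per event type; the watermark lets
// Spark accept events up to two minutes late before finalizing a window.
val query = events
  .withWatermark("timestamp", "2 minutes")
  .groupBy(window($"timestamp", "1 minute"), $"event_type")
  .count()
  .writeStream
  .outputMode("update")
  .format("console")
  .start()

query.awaitTermination()
```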
Step 4: Configure Resources
| Parameter | Development | Production |
|---|---|---|
| Executors | 1-2 | 3-10+ |
| Executor Memory | 2G | 4-8G |
| Executor Cores | 1-2 | 2-4 |
| Driver Memory | 1G | 2-4G |
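StreamPark applies these values for you from the form, but they map onto standard Spark properties. For reference, a development-sized session built by hand might look like this sketch:

```scala
import org.apache.spark.sql.SparkSession

// The "Development" column expressed as standard Spark properties.
val spark = SparkSession.builder()
  .appName("event-processor")
  .config("spark.executor.instances", "2")
  .config("spark.executor.memory", "2g")
  .config("spark.executor.cores", "2")
  .config("spark.driver.memory", "1g") // usually must be set at submit time, before the driver JVM starts
  .getOrCreate()
```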
Step 5: Deploy
Click Submit and StreamPark will:
- Package your application
- Provision Spark resources
- Start your streaming job
- Begin processing data
Monitoring Your Stream
StreamPark provides visibility into your running application:
- Input Rate — Events per second consumed
- Processing Rate — Events per second processed
- Batch Duration — Time to process each micro-batch
- Scheduling Delay — Queue time before processing
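These numbers come from Spark's per-micro-batch progress events, which you can also tap yourself. A small sketch, assuming a `SparkSession` named `spark` with a running query:

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Log input rate, processing rate, and batch durations for every micro-batch.
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    val p = event.progress
    println(s"input: ${p.inputRowsPerSecond} rows/s, " +
            s"processed: ${p.processedRowsPerSecond} rows/s, " +
            s"durations(ms): ${p.durationMs}")
  }
})
```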
Why Hosted Spark?
Running Spark yourself requires cluster management, resource allocation, dependency management, monitoring setup, and security configuration.
With hosted Spark, you skip the infrastructure and focus on data processing.
When to Choose Spark Streaming
Use Spark Structured Streaming when:
- You have existing Spark expertise or batch Spark jobs
- You want unified batch and streaming processing
- Your use case fits micro-batch latencies (seconds, not milliseconds)
Consider Flink when you need true streaming with sub-second latency.
StreamPark supports both — use the right engine for each job.
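"Micro-batch" is literal: each query fires on a trigger, and even the fastest trigger processes records in batches. A sketch, reusing the `events` stream from the Step 3 Scala example:

```scala
import org.apache.spark.sql.streaming.Trigger

// Fire a micro-batch every 10 seconds; end-to-end latency is therefore
// bounded below by the trigger interval plus per-batch processing time.
val query = events.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()
```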
Next Steps
You’ve deployed your first Spark streaming job. Here’s what to explore next:
- Connect to Kafka — Process real event streams
- Add checkpointing — Enable fault tolerance (a minimal sketch follows this list)
- Tune performance — Optimize throughput and latency
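For checkpointing, the change is a single sink option. This sketch continues the Step 3 Scala example (`events` is the Kafka stream defined there; the checkpoint path is illustrative and should point at durable storage such as HDFS or S3):

```scala
// Same query as the Step 3 sketch, now restartable: Spark records Kafka
// offsets and aggregation state under the checkpoint location, so a
// resubmitted job resumes exactly where the old one stopped.
val query = events
  .withWatermark("timestamp", "2 minutes")
  .groupBy(window($"timestamp", "1 minute"), $"event_type")
  .count()
  .writeStream
  .outputMode("update")
  .format("console")
  .option("checkpointLocation", "/checkpoints/event-processor")
  .start()
```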
Ready to deploy streaming at scale? Start your free trial and have Spark running in minutes.
StreamPark Team
Building the future of stream processing