博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
kafka 0.11 spark 2.11 streaming例子
阅读量:6070 次
发布时间:2019-06-20

本文共 3947 字,大约阅读时间需要 13 分钟。

 
""" Counts words in UTF8 encoded, '\n' delimited text received from the network every second. Usage: kafka_wordcount.py 
To run this on your local machine, you need to setup Kafka and create a producer first, see http://kafka.apache.org/documentation.html#quickstart and then run the example `$ bin/spark-submit --jars \ external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar \ examples/src/main/python/streaming/kafka_wordcount.py \ localhost:2181 test`"""from __future__ import print_functionimport sysfrom pyspark import SparkContextfrom pyspark.streaming import StreamingContextfrom pyspark.streaming.kafka import KafkaUtilsif __name__ == "__main__": if len(sys.argv) != 3: print("Usage: kafka_wordcount.py
", file=sys.stderr) exit(-1) sc = SparkContext(appName="PythonStreamingKafkaWordCount") ssc = StreamingContext(sc, 1) zkQuorum, topic = sys.argv[1:] kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1}) lines = kvs.map(lambda x: x[1]) counts = lines.flatMap(lambda line: line.split(" ")) \ .map(lambda word: (word, 1)) \ .reduceByKey(lambda a, b: a+b) counts.pprint() ssc.start() ssc.awaitTermination()

 

/spark-kafka/spark-2.1.1-bin-hadoop2.6# ./bin/spark-submit --jars ~/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar examples/src/main/python/streaming/kafka_wordcount.py localhost:2181 test

 其中:spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar在 http://search.maven.org/#search%7Cga%7C1%7Cspark-streaming-kafka-0-8-assembly 下载

kafka 使用0.11版本:

This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data. Since Kafka console scripts are different for Unix-based and Windows platforms, on Windows platforms use bin\windows\ instead of bin/, and change the script extension to .bat.

 the 0.11.0.0 release and un-tar it.
1
2
>
tar 
-xzf kafka_2.11-0.11.0.0.tgz
>
cd 
kafka_2.11-0.11.0.0

Kafka uses  so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node ZooKeeper instance.

1
2
3
> bin
/zookeeper-server-start
.sh config
/zookeeper
.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config
/zookeeper
.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...

Now start the Kafka server:

1
2
3
4
> bin
/kafka-server-start
.sh config
/server
.properties
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...

Let's create a topic named "test" with a single partition and only one replica:

1
> bin
/kafka-topics
.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic
test

We can now see that topic if we run the list topic command:

1
2
> bin
/kafka-topics
.sh --list --zookeeper localhost:2181
test

Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to.

Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message.

Run the producer and then type a few messages into the console to send to the server.

1
2
3
> bin
/kafka-console-producer
.sh --broker-list localhost:9092 --topic
test
This is a message
This is another message

Kafka also has a command line consumer that will dump out messages to standard output.

1
2
3
> bin
/kafka-console-consumer
.sh --bootstrap-server localhost:9092 --topic
test 
--from-beginning
This is a message
This is another message
本文转自张昺华-sky博客园博客,原文链接:http://www.cnblogs.com/bonelee/p/7435506.html
,如需转载请自行联系原作者
你可能感兴趣的文章
android实用测试方法之Monkey与MonkeyRunner
查看>>
「翻译」逐步替换Sass
查看>>
H5实现全屏与F11全屏
查看>>
处理excel表的列
查看>>
枸杞子也能控制脂肪肝
查看>>
Excuse me?这个前端面试在搞事!
查看>>
C#数据采集类
查看>>
quicksort
查看>>
检验函数运行时间
查看>>
【BZOJ2019】nim
查看>>
Oracle临时表空间满了的解决办法
查看>>
四部曲
查看>>
LINUX内核调试过程
查看>>
【HDOJ】3553 Just a String
查看>>
Java 集合深入理解(7):ArrayList
查看>>
2019年春季学期第四周作业
查看>>
linux环境配置
查看>>
ASP.NET MVC中从前台页面视图(View)传递数据到后台控制器(Controller)方式
查看>>
lintcode:next permutation下一个排列
查看>>
python 递归
查看>>