
flume

channel lock error while configuring flume's multiple sources using FILE channels

Question: Configuring multiple sources for an agent using FILE channels throws a lock error. Below is my config file.

a1.sources = r1 r2
a1.sinks = k1 k2
a1.channels = c1 c3
#sources
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=4444
a1.sources.r2.type=exec
a1.sources.r2.command=tail -f /opt/gen_logs/logs/access.log
#sinks
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/flume201
a1.sinks.k1.hdfs.filePrefix=netcat-
a1.sinks.k1.rollInterval=100
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.callTimeout=100000
a1.sinks.k2.type=hdfs
a1.sinks.k2.hdfs.path=/flume202
a1.sinks.k2.hdfs.filePefix=execCommand-
a1.sinks.k2.rollInterval=100
a1.sinks

2022-05-16 00:30:14    Category: Tech Share    hadoop   flume   flume-ng

Create hive table error to load Twitter data

Question: I am trying to create an external table and load Twitter data into it. While creating the table I get the following error and cannot track it down.

hive> ADD JAR /usr/local/hive/lib/hive-serdes-1.0-SNAPSHOT.jar
    > ;
Added [/usr/local/hive/lib/hive-serdes-1.0-SNAPSHOT.jar] to class path
Added resources: [/usr/local/hive/lib/hive-serdes-1.0-SNAPSHOT.jar]
hive> CREATE EXTERNAL TABLE tweets (
    >   id BIGINT,
    >   created_at STRING,
    >   source STRING,
    >   favorited BOOLEAN,
    >   retweeted_status STRUCT<
    >     text:STRING,
    >     user:STRUCT<screen_name:STRING,name:STRING>,
    >     retweet_count:INT>,
    >   entities STRUCT<
    >     urls:ARRAY<STRUCT<expanded_url:STRING>>,
    >     user_mentions:ARRAY<STRUCT<screen_name:STRING,name

2022-05-12 15:06:11    Category: Tech Share    hadoop   twitter   hive   flume   bigdata

channel lock error while configuring flume's multiple sources using FILE channels

Configuring multiple sources for an agent throws a lock error when using a FILE channel. Below is my config file.

a1.sources = r1 r2
a1.sinks = k1 k2
a1.channels = c1 c3
#sources
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=4444
a1.sources.r2.type=exec
a1.sources.r2.command=tail -f /opt/gen_logs/logs/access.log
#sinks
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/flume201
a1.sinks.k1.hdfs.filePrefix=netcat-
a1.sinks.k1.rollInterval=100
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.callTimeout=100000
a1.sinks.k2.type=hdfs
a1.sinks.k2.hdfs.path=/flume202
a1.sinks
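The excerpt is cut off before the channel definitions, so the cause has to be inferred, but a very common reason for this lock error is that every FILE channel defaults to the same on-disk checkpoint and data directories (under ~/.flume/file-channel), so two FILE channels in one agent contend for the same lock. A minimal sketch of the usual fix, giving c1 and c3 their own directories (the paths are illustrative):

```properties
# Each FILE channel needs its own checkpoint and data directories;
# sharing the defaults causes "channel lock" errors. Paths are examples.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/c1/checkpoint
a1.channels.c1.dataDirs = /var/flume/c1/data

a1.channels.c3.type = file
a1.channels.c3.checkpointDir = /var/flume/c3/checkpoint
a1.channels.c3.dataDirs = /var/flume/c3/data
```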

2022-04-29 15:14:04    Category: Q&A    hadoop   flume   flume-ng

Create hive table error to load Twitter data

I am trying to create an external table and load Twitter data into it. While creating the table I get the following error and cannot track it down.

hive> ADD JAR /usr/local/hive/lib/hive-serdes-1.0-SNAPSHOT.jar
    > ;
Added [/usr/local/hive/lib/hive-serdes-1.0-SNAPSHOT.jar] to class path
Added resources: [/usr/local/hive/lib/hive-serdes-1.0-SNAPSHOT.jar]
hive> CREATE EXTERNAL TABLE tweets (
    >   id BIGINT,
    >   created_at STRING,
    >   source STRING,
    >   favorited BOOLEAN,
    >   retweeted_status STRUCT<
    >     text:STRING,
    >     user:STRUCT<screen_name:STRING,name:STRING>,
    >     retweet_count:INT>,
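The excerpt ends before the actual error message, so the cause suggested here is an assumption: on newer Hive versions, user is a reserved keyword, and this well-known tweets DDL tends to fail exactly at the user:STRUCT<...> field unless the name is back-quoted. A hedged sketch of the affected fragment:

```sql
-- Assumption: the failure is Hive's reserved word `user`; back-quoting it
-- (or setting hive.support.sql11.reserved.keywords=false) usually resolves it.
retweeted_status STRUCT<
  text:STRING,
  `user`:STRUCT<screen_name:STRING, name:STRING>,
  retweet_count:INT>,
```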

2022-04-29 06:36:52    Category: Q&A    hadoop   twitter   hive   flume   bigdata

How to read data files generated by flume from twitter

Question / Answer 1: Download the file (hive-serdes-1.0-SNAPSHOT.jar) from this link: http://files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar
Then put the file in your $HIVE_HOME/lib.
Add the jar in the hive shell:
hive> ADD JAR file:///home/hadoop/work/hive-0.10.0/lib/hive-serdes-1.0-SNAPSHOT.jar
Create the table in hive:
hive> CREATE TABLE tweets (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<
    text:STRING,
    user:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  text
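The answer's DDL is truncated before the clause that actually ties the table to the downloaded jar. In Cloudera's twitter example that jar provides a JSON SerDe, so the statement typically ends along these lines (the LOCATION path is illustrative):

```sql
-- SerDe class shipped in hive-serdes-1.0-SNAPSHOT.jar; the path is an example.
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/flume/tweets';
```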

2022-04-24 16:01:13    Category: Tech Share    hadoop   twitter   flume

How to use flume for uploading zip files to hdfs sink

Question / Answer 1: Flume will try to read your files line by line unless you configure a specific deserializer. Deserializers let you control how a file is parsed and split into events. You could certainly follow the example of the BlobDeserializer, which is designed for PDFs and similar binary files, but I understand you actually want to unzip the files and then read them line by line. In that case you would need to write a custom deserializer that reads the Zip and writes events line by line. Here is the reference in the documentation: https://flume.apache.org/FlumeUserGuide.html#event-deserializers
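Following the answer's suggestion, here is a JDK-only sketch of the unzip-then-split logic such a custom deserializer would implement. The Flume EventDeserializer plumbing (mark/reset, the Builder) is deliberately omitted, byte[] payloads stand in for event bodies, and the class and method names are illustrative:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

// Sketch of the unzip-then-split step a custom Flume deserializer would
// perform: unpack each zip entry and emit one payload per line. Not Flume API.
public class ZipLineEvents {

    // Read every entry of the zipped stream and emit one payload per line.
    public static List<byte[]> eventsFromZip(InputStream zipped) {
        List<byte[]> events = new ArrayList<>();
        try {
            ZipInputStream zin = new ZipInputStream(zipped);
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) continue;
                // ZipInputStream stops at the current entry's end, so the
                // reader below only ever sees this entry's bytes.
                BufferedReader r = new BufferedReader(
                        new InputStreamReader(zin, StandardCharsets.UTF_8));
                String line;
                while ((line = r.readLine()) != null) {
                    events.add(line.getBytes(StandardCharsets.UTF_8));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return events;
    }

    // Build a tiny in-memory zip so the sketch needs no files on disk.
    public static byte[] sampleZip() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("access.log"));
            zos.write("line one\nline two\n".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        for (byte[] body : eventsFromZip(new ByteArrayInputStream(sampleZip()))) {
            System.out.println(new String(body, StandardCharsets.UTF_8));
        }
    }
}
```

Running main on the in-memory sample zip prints each line of access.log as its own payload, matching the one-event-per-line behaviour described in the answer.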

2022-04-19 10:00:51    Category: Tech Share    flume   flume-ng

How do I transform events in Flume and send them to another channel?

Flume has some ready-made components to transform events before pushing them further, like the RegexHbaseEventSerializer you can plug into an HBaseSink, and it is also easy to provide a custom serializer. I want to process events and send them on to the next channel. The closest thing to what I want is the Regex Extractor Interceptor, which accepts a custom serializer for regexp matches. But it does not replace the event body; it only appends new headers with the results, which makes the output flow heavier. I'd like to accept large events, like zipped HTML over 5 KB, parse them, and put out many slim messages, like urls
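Stock interceptors are essentially one-in/one-out (they may drop events or add headers, but are not designed to multiply events), so the fan-out itself usually ends up in custom code. A JDK-only sketch of just that step, turning one fat HTML body into many slim URL payloads (the class name, regex, and the choice of extracting URLs are illustrative assumptions, not Flume API):

```java
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.regex.*;

// Sketch of the fan-out step: one large event body in, many slim payloads out.
// byte[] stands in for a Flume event body; the URL regex is deliberately crude.
public class BodySplitter {
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>]+");

    // Extract every URL from the body and return each as its own payload.
    public static List<byte[]> split(byte[] body) {
        List<byte[]> out = new ArrayList<>();
        Matcher m = URL.matcher(new String(body, StandardCharsets.UTF_8));
        while (m.find()) {
            out.add(m.group().getBytes(StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] html = "<a href=\"http://a.example/x\">x</a> <a href=\"https://b.example/y\">y</a>"
                .getBytes(StandardCharsets.UTF_8);
        for (byte[] slim : split(html)) {
            System.out.println(new String(slim, StandardCharsets.UTF_8));
        }
    }
}
```

Each returned byte[] could then become the body of its own event in whatever custom component hosts this logic.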

2022-04-16 04:31:02    Category: Q&A    java   flume

How is flume distributed?

Question

2022-04-16 03:51:21    Category: Tech Share    bigdata   flume