Should Compaction within gc_grace_seconds Preserve Tombstones?

If I delete a row (creating a tombstone) and then run a major compaction within gc_grace_seconds, would one expect the tombstone to survive at least until gc_grace_seconds has elapsed?

Yes, the tombstone is expected to survive for gc_grace_seconds. The reason is that if a replica node is down at the point in time you delete the row, the delete must still have a chance to be propagated to that node later on. When the node comes back online and you run nodetool repair, it can pick up the tombstone. If the repair does not happen within gc_grace_seconds, the tombstone may already have been garbage-collected, and your deleted record might return from the dead.
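
For illustration, here is a minimal sketch of that cycle, assuming a hypothetical keyspace my_keyspace and table users (gc_grace_seconds is a per-table property that defaults to 864000 seconds, i.e. 10 days):

    -- Deleting a row writes a tombstone; the data is not physically removed yet.
    DELETE FROM my_keyspace.users WHERE user_id = 42;

    -- The tombstone must be retained for gc_grace_seconds so that a replica
    -- that was down at delete time can still receive it. From a shell on each
    -- node, run a repair well within that window, e.g.:
    --   nodetool repair my_keyspace users

Only once gc_grace_seconds has elapsed is compaction allowed to drop the tombstone.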

If, and only if, you are running a single-node cluster, you can safely set gc_grace_seconds to 0, since there are no other nodes that might miss the delete.
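
For example (keyspace and table names assumed, as in the sketch above), on a strictly single-node setup the grace period can be dropped so that compaction purges tombstones immediately:

    -- Safe only on a single-node cluster: no other replica can miss the delete.
    ALTER TABLE my_keyspace.users WITH gc_grace_seconds = 0;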

Have a look at this page on Cassandra operations, repair, and gc_grace_seconds.

