
duplicate-removal

PostgreSQL delete all but the oldest records

I have a PostgreSQL database that has multiple entries for the objectid, on multiple devicenames, but there is a unique timestamp for each entry. The table looks something like this:

address | devicename | objectid      | timestamp
--------+------------+---------------+------------------------------
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-02 17:36:41.011629+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-02 17:48:01.755559+00
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-03 15:37:09.06065+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-03 15:48:33.93128+00
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012

2021-06-15 01:17:24    Category: Q&A    sql   postgresql   duplicate-removal
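
For the PostgreSQL question above, one way to keep only the oldest row per device and object is a DELETE with a correlated EXISTS subquery. The sketch below is an illustration, not the accepted answer: it runs against an in-memory SQLite database through Python's sqlite3 module instead of PostgreSQL, and the table name `readings` and the column name `ts` (standing in for `timestamp`) are assumptions. Timestamps are stored as text and compared as strings, which is enough for the sample rows.

```python
import sqlite3


def delete_all_but_oldest(conn: sqlite3.Connection) -> int:
    """Delete every row that has an older row with the same devicename and objectid."""
    cur = conn.execute(
        """
        DELETE FROM readings
        WHERE EXISTS (
            SELECT 1
            FROM readings AS older
            WHERE older.devicename = readings.devicename
              AND older.objectid = readings.objectid
              AND older.ts < readings.ts
        )
        """
    )
    conn.commit()
    return cur.rowcount


# Hypothetical table mirroring the four complete sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (address TEXT, devicename TEXT, objectid TEXT, ts TEXT)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-02 17:36:41.011629+00"),
        ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-02 17:48:01.755559+00"),
        ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-03 15:37:09.06065+00"),
        ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-03 15:48:33.93128+00"),
    ],
)

deleted = delete_all_but_oldest(conn)
remaining = conn.execute(
    "SELECT devicename, ts FROM readings ORDER BY devicename"
).fetchall()
```

The oldest row of each group never matches the EXISTS condition, so it always survives. The same statement shape should carry over to PostgreSQL with a real timestamp column, though that is untested here.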

How can I delete duplicates in MongoDb?

I have a large collection (~2.7 million documents) in MongoDB, and there are a lot of duplicates. I tried running ensureIndex({id:1}, {unique:true, dropDups:true}) on the collection. Mongo churns away at it for a while before it reports "too many dups on index build with dropDups=true". How can I add the index and get rid of the duplicates? Or, the other way around, what's the best way to delete some dups so that Mongo can successfully build the index? For bonus points, why is there a limit to the number of dups that can be dropped?

2021-06-14 23:20:41    Category: Q&A    mongodb   indexing   duplicates   duplicate-removal

Delete Rows With Duplicate Data VBA

I am struggling with something that should be fairly straightforward, however, I have read at least 15 methods of doing this and cannot seem to get it to work. Here is a sample dataset:

9:30:01 584.7
9:30:01 590
9:30:01 595
9:30:02 584.51
9:30:03 584.62
9:30:04 584.44
9:30:05 584.05

I only want one row per second, so of the first 3 rows, only one needs to stay. I don't care if it is the first or the last, but the code I have been using keeps the last, 595 in this case. The way I am doing it is with a for loop that clears the contents of the row that has the same time as the row below it. I

2021-06-12 03:36:52    Category: Q&A    vba   excel   duplicate-removal
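
The "one row per second" rule from the VBA question above can be sketched outside Excel. The example below is in Python, not VBA, and assumes the rows are already sorted by time, as in the sample dataset. It collapses each run of consecutive rows with the same timestamp to its first row. The function name `first_row_per_second` is made up for illustration.

```python
from itertools import groupby


def first_row_per_second(rows):
    """Keep the first row of each run of consecutive rows sharing a timestamp."""
    # groupby only merges adjacent equal keys, so the input must be sorted by time.
    return [next(group) for _time, group in groupby(rows, key=lambda row: row[0])]


rows = [
    ("9:30:01", 584.7),
    ("9:30:01", 590),
    ("9:30:01", 595),
    ("9:30:02", 584.51),
    ("9:30:03", 584.62),
    ("9:30:04", 584.44),
    ("9:30:05", 584.05),
]
one_per_second = first_row_per_second(rows)
```

Building a new list avoids the usual pitfall of deleting rows from a range while looping over it, where indices shift under the loop.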

Remove all duplicate characters from NSString

How can I do this using standard methods (without manually iterating through the source string)? PS: In the end I want to get the sorted characters of the source string. I tried to use NSCharacterSet, but I can't find a method to convert a character set to a string (without iterating over the set).

2021-06-11 20:49:56    Category: Q&A    objective-c   nsstring   duplicate-removal
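
The goal in the NSString question above (each character once, in sorted order) amounts to building a set and sorting it. The sketch below shows that idea in Python, not Objective-C, so it does not answer the NSCharacterSet part of the question. The function name is hypothetical.

```python
def unique_sorted_chars(text: str) -> str:
    """Return each distinct character of text exactly once, in sorted order."""
    return "".join(sorted(set(text)))


result = unique_sorted_chars("mississippi")
```

Sorting here follows Unicode code point order, which is not necessarily the locale-aware order a user would expect for non-ASCII text.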

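Single Query to delete and display duplicate records

Question
One of the questions I was asked in an interview was this: a table has 100 records, 50 of which are duplicates. Is it possible, with a single query, to delete the duplicate records from the table and also select and display the remaining 50 records? Can this be done in a single SQL query? Thanks, SN

Answer 1
With SQL Server you would use something like this:

    DECLARE @Table TABLE (ID INTEGER, PossibleDuplicate INTEGER)
    INSERT INTO @Table VALUES (1, 100)
    INSERT INTO @Table VALUES (2, 100)
    INSERT INTO @Table VALUES (3, 200)
    INSERT INTO @Table VALUES (4, 200)

    DELETE FROM @Table
    OUTPUT Deleted.*
    FROM @Table t
    INNER JOIN (
        SELECT ID = MAX(ID)
        FROM @Table
        GROUP BY PossibleDuplicate
        HAVING COUNT(*) > 1
    ) d ON d.ID = t.ID

The OUTPUT clause shows the records that get deleted. Update: the query above deletes the duplicates and gives you the rows that were deleted, not the rows that remain. If that matters to you (all in all, the remaining 50 rows should be identical to the 50 deleted rows), you can use SQL Server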

2021-06-11 12:06:41    Category: Tech sharing    sql   sql-server   duplicate-removal

How to remove duplicate objects in PDF using ghostscript?

Using command-line Ghostscript, is it possible to remove duplicate embedded objects (images) in a PDF and replace them with a single instance? I have a 200+ page PDF with a background image and some smaller logos on each page. The file is very large, because the very same background image and logo binaries are embedded in each individual page, instead of being embedded once and then referenced on each page. I am not the creator of the PDF, so I cannot solve the problem at its source. (I do not want to shrink the images or reduce their quality, and I do not want to delete them completely.)

2021-06-11 04:40:53    Category: Q&A    pdf   command-line   ghostscript   duplicate-removal

AMQP Delay Delivery and Prevent Duplicate Messages

I have a system that will generate messages sporadically, and I would like to submit only zero or one message every 5 minutes. If no message is generated, nothing would be processed by the queue consumer. If a hundred identical messages are generated within 5 minutes, I only want one of those to be consumed from the queue. I am using AMQP (RabbitMQ). Is there a way to accomplish this within RabbitMQ or the AMQP protocol? Can I inspect a queue's contents to ensure that I don't insert a duplicate? It seems that queue inspection is a bad idea and not typically what should be done for a

2021-06-10 18:43:17    Category: Q&A    process   message-queue   delay   amqp   duplicate-removal
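
One way to get "at most one identical message per 5 minutes" without inspecting the queue is to suppress duplicates on the producer side, before anything is published. The sketch below illustrates only that idea in plain Python. It uses no RabbitMQ or AMQP feature and does not delay delivery. The class `WindowDeduplicator` and its method names are hypothetical, and it assumes a single producer process that can hold this state in memory.

```python
import time
from typing import Callable, Dict, Hashable


class WindowDeduplicator:
    """Allow each message key through at most once per time window."""

    def __init__(
        self,
        window_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._window = window_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._last_sent: Dict[Hashable, float] = {}

    def should_publish(self, message_key: Hashable) -> bool:
        """Return True if no message with this key was let through in the window."""
        now = self._clock()
        last = self._last_sent.get(message_key)
        if last is not None and now - last < self._window:
            return False  # a duplicate inside the window: drop it
        self._last_sent[message_key] = now
        return True


# Usage with a fake clock, to show the window behaviour without waiting.
fake_now = [0.0]
dedup = WindowDeduplicator(300.0, clock=lambda: fake_now[0])
first = dedup.should_publish("sensor-alert")   # let through
second = dedup.should_publish("sensor-alert")  # suppressed as a duplicate
fake_now[0] = 300.0
third = dedup.should_publish("sensor-alert")   # window elapsed, let through
```

The producer would call `should_publish` and only then hand the message to its AMQP client. The dictionary grows with the number of distinct keys, so a long-running producer would need to evict old entries.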

how to remove duplicate value in NSMutableArray

Question
I am scanning Wi-Fi information into an NSMutableArray, but a few duplicate values show up. I tried the following code but I still get duplicates:

    if ([scan_networks count] > 0) {
        NSArray *uniqueNetwork = [[NSMutableArray alloc] initWithArray:[[NSSet setWithArray:scan_networks] allObjects]];
        [scan_networks removeAllObjects];
        NSSortDescriptor *networkName = [[[NSSortDescriptor alloc] initWithKey:@"SSID_STR" ascending:YES] autorelease];
        NSArray *descriptors = [NSArray arrayWithObjects:networkName, nil];
        [scan_networks addObjectsFromArray:[uniqueNetwork sortedArrayUsingDescriptors:descriptors]];
    }

How can I solve this? Thanks.

Answer 1
You can use NSSet, but only when order doesn't matter; in that case go with this approach. I have used it and it gives the perfect answer. In

2021-06-10 17:39:31    Category: Tech sharing    iphone   nsmutablearray   duplicate-removal
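
A set removes only entries that compare equal as a whole. One possible reason the NSSet approach above still leaves duplicates, which the snippet does not confirm, is that two scan entries share an SSID but differ in some other field. The sketch below shows deduplication on a chosen key instead, in Python, not Objective-C. Only `SSID_STR` comes from the question. The `RSSI` field and the function name `unique_by_key` are assumptions for illustration.

```python
def unique_by_key(items, key):
    """Keep the first item seen for each distinct value of item[key]."""
    seen = set()
    unique = []
    for item in items:
        value = item[key]
        if value not in seen:
            seen.add(value)
            unique.append(item)
    return unique


# Two entries share an SSID but differ in signal strength, so they are not
# equal as whole dictionaries and a plain set-based approach would keep both.
networks = [
    {"SSID_STR": "home", "RSSI": -40},
    {"SSID_STR": "cafe", "RSSI": -70},
    {"SSID_STR": "home", "RSSI": -42},
]
unique_networks = sorted(
    unique_by_key(networks, "SSID_STR"), key=lambda network: network["SSID_STR"]
)
```

Unlike a set, this keeps the first occurrence and preserves input order until the final sort.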

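Duplicate photo searching with compare only pure imagedata and image similarity?

Question
I have about 600 GB of photos collected over 13 years, now stored on a FreeBSD ZFS server. The photos come from family computers, from several partial backups to different external USB hard drives, from images reconstructed after disk disasters, and from different photo manipulation software (iPhoto, Picasa, HP and many others :( ), in several deep subdirectories. In short: a TERRIBLE MESS with many duplicates.

So first I did the following:
    searched the tree for files of the same size (fast) and made md5 checksums for those files;
    collected the duplicated images (same size + same md5 = duplicate).

This helped a lot, but there are still many duplicates left:
    photos that differ only in the exif/iptc data added by some photo management software, while the image is the same (or at least "looks the same" and has the same dimensions);
    or they are only resized versions of the original image;
    or they are "enhanced" versions of the originals, etc.

Now the questions:
How can I find duplicates by checksumming only the "pure image bytes" of a JPG, without the exif/IPTC and similar meta information? That is, I want to filter out photo duplicates that differ only in their exif tags while the image is the same (so a file checksum doesn't work, but an image checksum could...). This is (I hope) not very complicated, but I need some direction. Which Perl module can extract the "pure" image data from a JPG file, usable for comparison or checksumming?

More complex:
How can I find "similar" images that are only resized versions of the originals, or "enhanced" versions of the originals (from some photo manipulation programs)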

2021-06-10 09:36:15    Category: Tech sharing    image   perl   bash   image-processing   duplicate-removal
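
The first pass described in the photo question above (group files by size, then checksum only files that share a size) can be sketched as follows. The sketch is in Python, not the Perl the asker wanted, and covers only exact byte-for-byte duplicates. It does not strip exif/IPTC data or detect resized images. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return lists of paths under root that have the same size and the same md5."""
    # Pass 1 (cheap): group every file by its size in bytes.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Pass 2 (expensive): hash only files whose size is shared with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5_of_file(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return groups
```

Calling `find_exact_duplicates` on a photo directory returns one sorted list of paths per duplicate set. Checksumming only the decoded pixel data would need an image library on top of this, which is outside this sketch.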

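How can I delete one of two perfectly identical rows?

Question
I am cleaning up a database table without a primary key (I know, I know, what were they thinking?). I cannot add a primary key, because there is a duplicate in the column that would become the key. The duplicate value comes from one of two rows that are identical in all respects. I can't delete the row via a GUI (in this case MySQL Workbench, but I'm looking for a database-agnostic approach) because it refuses to perform tasks on tables without a primary key (or at least a UQ NN column), and I cannot add a primary key, because there is a duplicate in the column that would become the key. The duplicate value comes from one... How can I delete one of the twins?

Answer 1
One option to solve your problem is to create a new table with the same schema, and then do:

    INSERT INTO new_table (SELECT DISTINCT * FROM old_table)

and then just rename the tables. You will, of course, need roughly as much spare disk space as the table itself occupies! It's not efficient, but it's very simple.

Answer 2

    SET ROWCOUNT 1
    DELETE FROM [table] WHERE ....
    SET ROWCOUNT 0

This will delete only one of the two identical rows.

Answer 3
Note that MySQL has its own extension of DELETE, namely DELETE ... LIMIT, which works the way you'd expect from LIMIT: http://dev.mysql.com/doc/refman/5.0/en/delete.html The MySQL-specific LIMIT row_count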

2021-06-10 05:02:38    Category: Tech sharing    sql   duplicate-removal
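
The copy-with-DISTINCT approach from Answer 1 above can be tried end to end on a throwaway database. The sketch below uses Python's sqlite3 module with an in-memory SQLite database, not MySQL. The two-column schema is an assumption made for the demonstration, and SQLite's form of the statement omits the parentheses around the SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A keyless table containing one pair of perfectly identical rows.
conn.execute("CREATE TABLE old_table (code TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO old_table VALUES (?, ?)",
    [("a", "x"), ("b", "y"), ("b", "y")],
)

# Copy the distinct rows into a new table with the same schema.
conn.execute("CREATE TABLE new_table (code TEXT, label TEXT)")
conn.execute("INSERT INTO new_table SELECT DISTINCT * FROM old_table")

# Swap the tables so the deduplicated copy takes over the original name.
conn.execute("DROP TABLE old_table")
conn.execute("ALTER TABLE new_table RENAME TO old_table")
conn.commit()

rows = conn.execute("SELECT code, label FROM old_table ORDER BY code").fetchall()
```

On a production table, indexes, constraints, and permissions would have to be recreated on the new table before the swap, and the copy needs the spare disk space the answer mentions.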