optimization

Scipy Newton Krylov Expects Square Matrix

I am trying to use scipy.optimize.newton_krylov() to solve a least-squares optimization problem, i.e. finding x such that (Ax - b)**2 = 0. My understanding is that A has to be m×n with m > n, b has to be m×1, and x will be n×1. When I try to run the optimization, I get an error: ValueError: expected square matrix, but got shape=(40, 6). Presumably this error concerns the computation of the Jacobian and not my input matrix A? But if so, how can I change the values I am providing to the functions to resolve the problem? Any advice would be appreciated. The following code reproduceses the error.
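
The excerpt cuts off before the reproducing code, but the error itself follows from the solver's contract: newton_krylov() is a root finder for square systems F(x) = 0 with F mapping R^n to R^n, so the Jacobian-vector products it builds need as many equations as unknowns, and a residual returning m > n values produces exactly this shape error. A minimal sketch of two workarounds, with made-up A and b for illustration: hand newton_krylov the square normal equations A^T(Ax - b) = 0, or use scipy.optimize.least_squares, which accepts rectangular residuals directly.

    import numpy as np
    from scipy.optimize import newton_krylov, least_squares

    rng = np.random.default_rng(0)
    A = rng.normal(size=(40, 6))    # m x n with m > n, illustrative data only
    b = rng.normal(size=40)

    # Workaround 1: the normal equations A^T (Ax - b) = 0 map R^6 -> R^6,
    # i.e. they are square, so newton_krylov accepts them.
    def normal_eq_residual(x):
        return A.T @ (A @ x - b)

    x_nk = newton_krylov(normal_eq_residual, np.zeros(6))

    # Workaround 2: least_squares is built for rectangular residuals.
    x_ls = least_squares(lambda x: A @ x - b, np.zeros(6)).x

    print(np.allclose(x_nk, x_ls, atol=1e-5))  # both reach the same minimizer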

2021-09-24 16:11:21    Category: Q&A    python   numpy   optimization   scipy

MongoDB [Index Optimization] performance during aggregation stage

I have around 50M documents in my Mongo database, called dma, and I use this aggregation to obtain the necessary data res and then manipulate it.

    async function FormContract(ownerRealm, id) {
        try {
            const res = await collection.aggregate([
                {
                    $match: {
                        date: {$gt: moment.utc().subtract(1, 'days').toDate(), $lt: moment.utc().toDate()},
                        id: id,                          // 45 values one by one
                        ownerRealm: {$in: ownerRealm}    // 20 values one-by-one
                    }
                },
                {
                    $group: {
                        _id: "$lastModified",
                        open_interest: {$sum: "$buyout"},
                        min: {$min: "$price"},
                        min_size: {$min: {$cond: [{$gte: ["$quantity", 200]}, "$price", "$min:$price"]}},
                        avg: {
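
The $match stage filters on two equality fields (id, ownerRealm) and one range field (date), so a compound index ordered per MongoDB's ESR (Equality, Sort, Range) guideline, with the equality fields before the range field, is the usual first step. A minimal sketch, written with pymongo rather than the question's mongoose; the connection string and collection name are placeholders:

    from pymongo import ASCENDING, MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    coll = client["dma"]["contracts"]                  # collection name assumed

    # Equality fields first (id, ownerRealm), the range field (date) last,
    # per the ESR guideline for compound indexes.
    coll.create_index(
        [("id", ASCENDING), ("ownerRealm", ASCENDING), ("date", ASCENDING)],
        name="id_1_ownerRealm_1_date_1",
    )

Running the aggregation under explain() afterwards should show an IXSCAN rather than a COLLSCAN for the $match stage.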

2021-09-24 02:08:13    Category: Q&A    javascript   mongodb   performance   optimization   mongoose

How to defeat hardware prefetcher in core i3/i7 in linux

Question: I am trying to find a way to defeat the hardware prefetcher's stream-pattern detection and access 4KB of data in random order, so that the prefetcher does not detect the pattern and prefetch it. Initially I thought of accessing all even-indexed data in a random pattern, because the hardware prefetcher always prefetches the next cache line (so when I access an even index, the adjacent odd-indexed data has already been prefetched). I wrote code to access all even-indexed data in a random pattern, but the results show that the prefetcher detected the pattern anyway (I don't understand how: there is no fixed stride, all the strides are random). While investigating why this happens, I found this post on Intel's forum: https://software.intel.com/en-us/forums/topic/473493 According to Dr. John D. McCalpin, "Dr. Bandwidth": Section 2.2.5.4 of the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (document 248966-028, July 2013) states that the streamer prefetcher "[d]etects and maintains up to 32 streams of data accesses. For each 4K byte page, you can maintain one forward and one backward stream." This means the L2 hardware prefetcher tracks the 16 most recently accessed 4KiB pages and remembers enough of the access pattern for those pages to track one forward stream and one backward stream. So, to defeat the L2 streamer prefetcher with "random" fetches
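
A common way to make accesses unpredictable to the streamer is a dependent pointer chase over a random single-cycle permutation: the address of each load comes out of the previous load, so there is no stride, forward or backward, for the prefetcher to latch onto. A sketch of generating such an order with Sattolo's algorithm; the timed traversal itself would have to be in C, as in the question, and the 64-lines-per-4KiB-page figure assumes 64-byte cache lines:

    import random

    def sattolo_cycle(n, seed=0):
        """Random permutation that is one single n-cycle: iterating
        p = perm[p] visits every slot exactly once before repeating."""
        rng = random.Random(seed)
        perm = list(range(n))
        for i in range(n - 1, 0, -1):
            j = rng.randrange(i)           # j < i guarantees a single cycle
            perm[i], perm[j] = perm[j], perm[i]
        return perm

    # 4 KiB page / 64-byte cache lines = 64 lines per page.
    chase = sattolo_cycle(64)
    p, order = 0, []
    for _ in range(64):
        order.append(p)
        p = chase[p]                       # next address depends on this load
    assert sorted(order) == list(range(64))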

2021-09-24 01:48:09    Category: Tech Share    c   linux   optimization   assembly

Are MySQL INSERT statements slower in huge tables?

Question: I can see how SELECT and UPDATE statements get slower as the table grows, but what about INSERT?

Answer 1: INSERT also gets slower, especially if you have many indexes that must be updated as well. However, there are differences between storage engines: MyISAM is faster for lots of SELECTs, while InnoDB is faster for lots of INSERTs/UPDATEs, because it uses row locking instead of table locking and because of the way it handles indexes.

Answer 2: INSERT also gets slower, because it has to maintain the indexes.

Answer 3: In general, yes: O(1) performance is rare anywhere, and keeping indexes up to date has a cost. The question is why this matters and what you can do about it in your particular case. Don't create indexes that are never used (for example, I keep finding tables with an extra single-column index on top of the primary key). Don't keep useless data in the table (or, if you rarely use the old data, consider moving it to an archive table/database). If insert speed is the problem and you don't care about auto-increment IDs, INSERT DELAYED may help you.

Answer 4: INSERT also gets slower.

Answer 5: If your table is small, you will be fine. But if your table grows large, inserts and updates will slow down; this is the process I use, and it works for me. The problem occurs even with InnoDB or MyISAM tables, which are not optimized for writes, and I solved it by using a second table to write temporary data (which can periodically update the main, large table). The main table, with more than 18 million records, is used for read-only lookups, and results are written to the second, small table. The problem is that inserts/updates into the large main table take a while.
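
Answer 5's write-behind pattern, as a minimal sketch: the table names, columns, and the PyMySQL driver are all assumptions, and a production version would move rows by key range so nothing is lost between the copy and the cleanup. Writes hit a small, lightly indexed staging table, and a periodic job moves rows into the big table in one bulk statement, so its indexes are updated in batches rather than per row.

    import pymysql

    conn = pymysql.connect(host="localhost", user="app", password="...",
                           database="mydb")

    def record_row(emp_id, payload):
        """Hot path: insert into the small, lightly indexed staging table."""
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO events_staging (emp_id, payload) VALUES (%s, %s)",
                (emp_id, payload),
            )
        conn.commit()

    def flush_staging():
        """Periodic job: one bulk move, so the big table's indexes are
        updated in a batch instead of once per incoming row."""
        with conn.cursor() as cur:
            cur.execute("INSERT INTO events SELECT * FROM events_staging")
            cur.execute("TRUNCATE TABLE events_staging")
        conn.commit()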

2021-09-23 22:42:18    Category: Tech Share    mysql   optimization

Optimizing near-duplicate value search

Question: I am trying to find near-duplicate values in a set of fields so that an administrator can clean them up. There are two criteria I match on: one string is entirely contained in the other and is at least 1/4 of its length; or the edit distance between the strings is less than 5% of their combined length. Pseudo-PHP code:

    foreach($values as $value){
        $matches = array();
        foreach($values as $match){
            if(
                (
                    $value['length'] < $match['length']
                    && $value['length'] * 4 > $match['length']
                    && stripos($match['value'], $value['value']) !== false
                ) || (
                    $match['length'] < $value['length']
                    && $match['length'] * 4 > $value['length']
                    && stripos($value['value'], $match['value']) !== false
                ) || (
                    abs($value['length'] - $match['length']) * 20 < ($value['length'] + $match['length'])
                    && 0 < ($match['changes'] = levenshtein(
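
Both criteria bound how different the two lengths may be, so sorting by length and breaking out of the inner loop once no longer candidate can qualify avoids most of the expensive edit-distance calls. A sketch of that pre-filtering in Python; the plain DP Levenshtein is illustrative only, and a C-backed library (e.g. python-Levenshtein) would be the practical choice:

    def levenshtein(a, b):
        """Textbook dynamic-programming edit distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                  # deletion
                               cur[j - 1] + 1,               # insertion
                               prev[j - 1] + (ca != cb)))    # substitution
            prev = cur
        return prev[-1]

    def near_duplicates(values):
        vals = sorted(set(values), key=len)
        pairs = []
        for i, shorter in enumerate(vals):
            for longer in vals[i + 1:]:
                too_long_to_contain = len(longer) >= 4 * len(shorter)
                too_long_for_edits = (len(longer) - len(shorter)) * 20 >= len(shorter) + len(longer)
                if too_long_to_contain and too_long_for_edits:
                    break  # sorted by length: every later candidate is longer still
                contained = not too_long_to_contain and shorter.lower() in longer.lower()
                close = not too_long_for_edits and \
                        levenshtein(shorter, longer) * 20 < len(shorter) + len(longer)
                if contained or close:
                    pairs.append((shorter, longer))
        return pairs

    print(near_duplicates(["alpha", "alphabet", "alpa", "zzz"]))
    # [('alpha', 'alphabet')]: "alpha" is contained in "alphabet" and is over 1/4 of it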

2021-09-23 21:11:04    Category: Tech Share    php   optimization   string-matching

Understanding performance impacts for mysql tuple search

I am working on a table structure like this (emp_data):

    id  dept_id  emp_id  emp_name  role
    1   101      1001    Tom       Good Worker
    2   101      1002    Dick      Smart Worker
    3   102      1001    Harry     Hard Worker
    4   103      1001    Kate      Nice Worker
    5   101      1003    Lucy      Great Worker

id is the uncontested primary key :) and (dept_id, emp_id) is a multi-column index. Now, I need to run some really big searches on combinations of (dept_id, emp_id). I use a tuple search, which goes like this:

    select * from emp_data
    where (dept_id, emp_id) in ((101, 1001), (101, 1002), (103, 1001));

This takes quite some time when the table is quite long. But if I do this, select
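
A likely culprit: before the optimizer improvements in MySQL 5.7, row-constructor IN lists were not used for index range scans, whereas the equivalent OR expansion of equality pairs does use the (dept_id, emp_id) composite index. A hedged sketch that builds the expanded, parameterized query in Python (PyMySQL assumed; connection details are placeholders):

    import pymysql

    pairs = [(101, 1001), (101, 1002), (103, 1001)]

    # Expand the tuple IN list into OR'ed equality pairs so the
    # (dept_id, emp_id) composite index is usable on older MySQL.
    predicate = " OR ".join("(dept_id = %s AND emp_id = %s)" for _ in pairs)
    sql = "SELECT * FROM emp_data WHERE " + predicate
    params = [v for pair in pairs for v in pair]

    conn = pymysql.connect(host="localhost", user="app", password="...",
                           database="mydb")
    with conn.cursor() as cur:
        cur.execute(sql, params)
        rows = cur.fetchall()

EXPLAIN on both forms should make the difference visible: the OR form shows a range scan on the composite index, while the tuple form may fall back to a full scan.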

2021-09-23 20:08:19    Category: Q&A    mysql   database   search   optimization   indexing

Why the most natural query (i.e. using INNER JOIN instead of LEFT JOIN) is very slow

Question: This query takes far too long.

    explain analyze
    select c.company_rec_id, c.the_company_code, c.company
    from tlist t
    -- it is questionable why this query becomes fast when using left join;
    -- the most natural query is inner join...
    join mlist m using(mlist_rec_id)
    join parcel_application ord_app using(parcel_application_rec_id)
    join parcel ord using(parcel_rec_id)
    join company c on c.company_rec_id = ord.client_rec_id -- ...questionable
    where
    ( 'cadmium' = ''
      or exists (
        select * from mlist_detail md
        where md.mlist_rec_id = m.mlist_rec_id
        and exists (
          select * from mlist_detail_parameter mdp
          join parameter p using
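
This symptom often comes down to join ordering: a LEFT JOIN constrains how the planner may reorder joins, so switching to LEFT JOIN can accidentally force a better plan than the one the planner picks for the freely reorderable INNER JOINs. A quick way to test that hypothesis is to pin the textual join order for the session; a sketch using psycopg2 (connection details are placeholders, and the query body is elided as in the excerpt):

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=app")   # placeholder DSN
    with conn.cursor() as cur:
        # join_collapse_limit = 1 makes the planner keep the joins in the
        # order they are written, mimicking the constraint LEFT JOIN imposes.
        cur.execute("SET join_collapse_limit = 1")
        cur.execute("EXPLAIN ANALYZE SELECT c.company_rec_id, ... ")  # the query from the question
        for (line,) in cur.fetchall():
            print(line)

If the pinned order is fast, the next step is usually fixing the planner's row estimates (ANALYZE, statistics targets) rather than leaving the setting in place.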

2021-09-23 19:14:25    Category: Tech Share    sql   postgresql   optimization

No performance gain after using multiprocessing for a queue-oriented function

The real code I want to optimize is too complicated to be included here, so here is a simplified example:

    def enumerate_paths(n, k):
        """
        John wants to go up a flight of stairs that has N steps.
        He can take up to K steps each time. This function enumerates
        all the different ways he can go up this flight of stairs.
        """
        paths = []
        to_analyze = [(0,)]
        while to_analyze:
            path = to_analyze.pop()
            last_step = path[-1]
            if last_step >= n:
                # John has reached the top
                paths.append(path)
                continue
            for i in range(1, k + 1):
                # possible paths from this point
                extended_path = path + (last_step + i,)
                to_analyze.append
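
With work units this cheap, shipping individual paths between processes costs more in pickling and queue traffic than the step itself, which is the classic reason multiprocessing shows no gain here. The standard fix is to coarsen the tasks so each worker enumerates an entire subtree. A sketch, under the assumption that the truncated line above ends in to_analyze.append(extended_path):

    from functools import partial
    from multiprocessing import Pool

    def enumerate_from(start, n, k):
        """Enumerate every path extending the given prefix (one coarse task)."""
        paths, to_analyze = [], [start]
        while to_analyze:
            path = to_analyze.pop()
            last = path[-1]
            if last >= n:
                paths.append(path)
                continue
            for i in range(1, k + 1):
                to_analyze.append(path + (last + i,))
        return paths

    def enumerate_paths_parallel(n, k):
        # One task per first move: a few large tasks instead of many tiny ones.
        seeds = [(0, i) for i in range(1, k + 1)]
        with Pool() as pool:
            results = pool.map(partial(enumerate_from, n=n, k=k), seeds)
        return [p for sub in results for p in sub]

    if __name__ == "__main__":
        assert sorted(enumerate_paths_parallel(10, 4)) == sorted(enumerate_from((0,), 10, 4))

For deeper trees, seeding with all two-move prefixes gives more tasks to balance across cores while keeping each one coarse.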

2021-09-23 18:07:20    Category: Q&A    python   performance   optimization   queue   multiprocessing

what makes Jsoup faster than HttpURLConnection & HttpClient in most cases

I want to compare performance for the three implementations mentioned in the title, so I wrote a little Java program to help me do this. The main method contains three blocks of testing; each block looks like this:

    nb = 0;
    time = 0;
    for (int i = 0; i < 7; i++) {
        double v = methodX(url);
        if (v > 0) {
            nb++;
            time += v;
        }
    }
    if (nb == 0) nb = 1;
    System.out.println("HttpClient : " + (time / ((double) nb)) + ". Tries " + nb + "/7");

The variable nb is used to skip failed requests. Now, methodX is one of:

    private static double testWithNativeHUC(String url){
        try {
            HttpURLConnection httpURLConnection = (HttpURLConnection)
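
Independent of the client library, a seven-iteration mean with no warm-up largely measures DNS caching, connection setup, and JIT compilation rather than the libraries themselves; on the JVM the robust tool for this is a harness like JMH. The methodology itself, sketched in Python for brevity: discard warm-up runs and report the median instead of the mean:

    import statistics
    import time
    import urllib.request

    def bench(fetch, url, warmup=3, runs=10):
        """Median wall time of fetch(url) after discarding warm-up runs."""
        for _ in range(warmup):      # primes DNS caches, connection pools, JITs
            fetch(url)
        samples = []
        for _ in range(runs):
            t0 = time.perf_counter()
            fetch(url)
            samples.append(time.perf_counter() - t0)
        return statistics.median(samples)

    def fetch_urllib(url):
        with urllib.request.urlopen(url) as resp:
            resp.read()

    print(bench(fetch_urllib, "https://example.com"))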

2021-09-23 16:16:31    Category: Q&A    java   optimization   jsoup   httpclient   httpurlconnection

Is there a way to optimise my Powershell function for removing pattern matches from a large file?

Question: I have a large text file (~20K lines of ~80 characters each). I also have a larger array of objects (~1500 items) containing patterns I wish to remove from the large text file. Note that if a pattern from the array appears on a line of the input file, I want to remove the whole line, not just the pattern. The input file is CSVish, with lines similar to:

    A;AAA-BBB;XXX;XX000029;WORD;WORD-WORD-1;00001;STRING;2015-07-01;;010;

The patterns in the array that I search each line of the input file for are similar to the XX000029 part of the line above. My somewhat naive function for achieving this currently looks like this:

    function Remove-IdsFromFile {
        param(
            [Parameter(Mandatory=$true,Position=0)]
            [string]$BigFile,
            [Parameter(Mandatory=$true,Position=1)]
            [Object[]]$IgnorePatterns
        )
        try{
            $FileContent = Get-Content $BigFile
        }catch{
            Write-Error $_
        }
        $IgnorePatterns | ForEach-Object {
            $IgnoreId = $_.IgnoreId
            $FileContent = $FileContent | Where
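
The usual cure is to collapse the ~1500 ids into one alternation regex and make a single pass over the file, rather than re-filtering the whole file once per id. A sketch of the idea in Python; the same approach carries over to PowerShell with [regex]::Escape and -notmatch, and the file names here are placeholders:

    import re

    def remove_ids_from_file(big_file, ignore_ids, out_file):
        # One compiled alternation of escaped ids instead of ~1500 passes.
        pattern = re.compile("|".join(map(re.escape, ignore_ids)))
        with open(big_file, encoding="utf-8") as src, \
             open(out_file, "w", encoding="utf-8") as dst:
            for line in src:          # stream line by line; no 20K-line array
                if not pattern.search(line):
                    dst.write(line)

    remove_ids_from_file("big.csv", ["XX000029", "XX000030"], "cleaned.csv")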

2021-09-23 14:59:20    Category: Tech Share    regex   powershell   optimization   powershell-4.0