天道酬勤,学无止境

Does the term “vectorization” mean different things in different contexts?

Based on what I've read before, vectorization is a form of parallelization known as SIMD. It allows processors to execute the same instruction (such as addition) on an array simultaneously.

However, I got confused when reading The Relationship between Vectorized and Devectorized Code regarding Julia's and R's vectorization performance. The post claims that devectorized Julia code (via loops) is faster than the vectorized code in both Julia and R, because:

This confuses some people who are not familiar with the internals of R. It is therefore worth noting how one improves the speed of R code. The process of performance improvement is quite simple: one starts with devectorized R code, then replaces it with vectorized R code and then finally implements this vectorized R code in devectorized C code. This last step is unfortunately invisible to many R users, who therefore think of vectorization per se as a mechanism for increasing performance. Vectorization per se does not help make code faster. What makes vectorization in R effective is that it provides a mechanism for moving computations into C, where a hidden layer of devectorization can do its magic.

It claims that R turns vectorized code, written in R, into devectorized code in C. If vectorization is faster (as a form of parallelization), why would R devectorize the code and why is that a plus?

评论

"Vectorization" in R, is a vector processing in R's interpreter's view. Take the function cumsum as an example. On entry, R interpreter sees that a vector x is passed into this function. However, the work is then passed to C language that R interpreter can not analyze / track. While C is doing work, R is just waiting. By the time that R's interpreter comes back to work, a vector has been processed. So in R's view, it has issued a single instruction but processed a vector. This is an analogy to the concept of SIMD - "single instruction, multiple data".

Not just the cumsum function that takes a vector and returns a vector is seen as "vectorization" in R, functions like sum that takes a vector and returns a scalar is also a "vectorization".

Simply put: whenever R calls some compiled code for a loop, it is a "vectorization". If you wonder why this kind of "vectorization" is useful, it is because a loop written by a compiled language is faster than a loop written in an interpreted language. The C loop is translated to machine language that a CPU can understand. However, if a CPU wants to execute an R loop, it needs R's interpreter's help to read it, iteration by iteration. This is like, if you know Chinese (the hardest human language), you can respond to someone speaking Chinese to you faster; otherwise, you need a translator to first translator Chinese to you sentence after sentence in English, then you respond in English, and the translator make it back to Chinese sentence by sentence. The effectiveness of communication is largely reduced.

x <- runif(1e+7)

## R loop
system.time({
  sumx <- 0
  for (x0 in x) sumx <- sumx + x0
  sumx
  })
#   user  system elapsed 
#  1.388   0.000   1.347 

## C loop
system.time(sum(x))
#   user  system elapsed 
#  0.032   0.000   0.030 

Be aware that "vectorization" in R is just an analogy to SIMD but not a real one. A real SIMD uses CPU's vector registers for computations hence is a true parallel computing via data parallelism. R is not a language where you can program CPU registers; you have to write compiled code or assembly code for that purpose.

R's "vectorization" does not care how a loop written in a compiled language is really executed; after all that is beyond R's interpreter's knowledge. Regarding whether these compiled code will be executed with SIMD, read Does R leverage SIMD when doing vectorized calculations?


More on "vectorization" in R

I am not a Julia user, but Bogumił Kamiński has demonstrated an impressive feature of that language: loop fusion. Julia can do this, because, as he points out, "vectorization in Julia is implemented in Julia", not outside the language.

This reveals a downside of R's vectorization: speed often comes at a price of memory usage. I am not saying that Julia won't have this problem (as I don't use it, I don't know), but this is definitely true for R.

Here is an example: Fastest way to compute row-wise dot products between two skinny tall matrices in R. rowSums(A * B) is a "vectorization" in R, as both "*" and rowSums are coded in C language as a loop. However, R can not fuse them into a single C loop to avoid generating the temporary matrix C = A * B into RAM.

Another example is R's recycling rule or any computations relying on such rule. For example, when you add a scalar a to a matrix A by A + a, what really happens is that a is first replicated to be a matrix B that has the same dimension with A, i.e., B <- matrix(a, nrow(A), ncol(A)), then an addition between two matrices are calculated: A + B. Clearly the generation of the temporary matrix B is undesired, but sorry, you can't do it better unless you write your own C function for A + a and call it in R. This is described as "such a fusion is possible only if explicitly implemented" in Bogumił Kamiński's answer.

To deal with the memory effects of many temporary results, R has a sophisticated mechanism called "garbage collection". It helps, but memory can still explode if you generate some really big temporary result somewhere in your code. A good example is the function outer. I have written many answers using this function, but it is particularly memory-unfriendly.

I might have been off-topic in this edit, as I begin to discuss the side effect of "vectorization". Use it with care.

  • Put memory usage in mind; there might be a more memory efficient vectorized implementation. For example, as mentioned in the linked thread on row-wise dot products between two matrices, c(crossprod(x, y)) is better than sum(x * y).
  • Be prepared to use CRAN R packages that have compiled code. If you find existing vectorized functions in R limited to do your task, explore CRAN for possible R packages that can do it. You can ask a question with your coding bottleneck on Stack Overflow, and somebody may point you to the right function in the right package.
  • Be happy to write your own compiled code.

I think it is worth to note that the post you are referring to does not cover all current functionality of vectorization in Julia.

The important thing is that vectorization in Julia is implemented in Julia, as opposed to R, where it is implemented outside of the language. This is explained in detail in this post: https://julialang.org/blog/2017/01/moredots.

The consequence of the fact that Julia can perform fusion of any sequence of broadcasted operations into a single loop. In other languages that provide vectorization such a fusion is possible only if explicitly implemented.

In summary:

  1. In Julia you can expect that vectorized code is as fast as a loop.
  2. If you perform a sequence of vectorized operations then in general you can expect Julia to be faster than R as it can avoid allocation of intermediate results of the computations.

EDIT:

Following the comment of 李哲源 here is an example showing that Julia is able to avoid any allocations if you want to increase all elements of a vector x by 1:

julia> using BenchmarkTools

julia> x = rand(10^6);

julia> @benchmark ($x .+= 1)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     819.230 μs (0.00% GC)
  median time:      890.610 μs (0.00% GC)
  mean time:        929.659 μs (0.00% GC)
  maximum time:     2.802 ms (0.00% GC)
  --------------
  samples:          5300
  evals/sample:     1

In the code .+= performs addition in place (adding $ in front of the expression is only needed for benchmarking, in normal code it would be x .+= 1). And we see that no memory allocation was done.

If we compare this to a possible implementation in R:

> library(microbenchmark)
> x <- runif(10^6)
> microbenchmark(x <- x + 1)
Unit: milliseconds
       expr      min       lq     mean   median       uq      max neval
 x <- x + 1 2.205764 2.391911 3.999179 2.599051 5.061874 30.91569   100

we can see that it not only saves memory, but also leads to a faster execution of the code.

受限制的 HTML

  • 允许的HTML标签:<a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd> <h2 id> <h3 id> <h4 id> <h5 id> <h6 id>
  • 自动断行和分段。
  • 网页和电子邮件地址自动转换为链接。

相关推荐
  • What is caching?
    I'm constantly hearing about person y had performance issue x which they solved through caching. Or, how doing x,y,z in your programs code can hurt your caching ability. Even in one of the latest podcasts, Jeff Atwood talks about how they cache certain values for speedy retrieval. There seems to be some ambiguity in the terms "cache" and "caching" and it has led me to be confused about it's meaning in different cases. Whether you are referring to application or database caching, cpu, etc and what that means. What is caching and what are the different types? From context I can get a sense of it
  • Why is vectorization, faster in general, than loops?
    Why, at the lowest level of the hardware performing operations and the general underlying operations involved (i.e.: things general to all programming languages' actual implementations when running code), is vectorization typically so dramatically faster than looping? What does the computer do when looping that it doesn't do when using vectorization (I'm talking about the actual computations that the computer performs, not what the programmer writes), or what does it do differently? I have been unable to convince myself why the difference should be so significant. I could probably be persuaded
  • What exactly do we mean by “branch”?
    Long story short... As far as I can tell, the term "branch" (in Git parlance) may refer to related but different things: a non-symbolic reference/pointer to a commit, the name of such a reference (e.g. "master"), the subgraph of the repository's commit DAG composed of all the commits reachable from the commit pointed to by such a reference. However, I've seen the term used to apparently refer to something other than those three possible usages (more details below). In a Git context, are there other valid and unambiguous usages of the term "branch" that my list above is missing? More details
  • What is “vectorization”?
    Several times now, I've encountered this term in matlab, fortran ... some other ... but I've never found an explanation what does it mean, and what it does? So I'm asking here, what is vectorization, and what does it mean for example, that "a loop is vectorized" ?
  • “分支”到底是什么意思?(What exactly do we mean by “branch”?)
    问题 Long story short... As far as I can tell, the term "branch" (in Git parlance) may refer to related but different things: a non-symbolic reference/pointer to a commit, the name of such a reference (e.g. "master"), the subgraph of the repository's commit DAG composed of all the commits reachable from the commit pointed to by such a reference. However, I've seen the term used to apparently refer to something other than those three possible usages (more details below). In a Git context, are there other valid and unambiguous usages of the term "branch" that my list above is missing? More details
  • What's the difference between a character, a code point, a glyph and a grapheme?
    Trying to understand the subtleties of modern Unicode is making my head hurt. In particular, the distinction between code points, characters, glyphs and graphemes - concepts which in the simplest case, when dealing with English text using ASCII characters, all have a one-to-one relationship with each other - is causing me trouble. Seeing how these terms get used in documents like Matthias Bynens' JavaScript has a unicode problem or Wikipedia's piece on Han unification, I've gathered that these concepts are not the same thing and that it's dangerous to conflate them, but I'm kind of struggling
  • Entity vs Model vs View Model
    I just spent some time reading about this terms (I don't use them that much since we don't have any MVC applications and I usually just say "model"), but I have the feeling these means different things depending on the context: Entity This is quite simple, it is one row in the database: 2) In relation to a database , an entity is a single person, place, or thing about which data can be stored. Model I often read, this is basically a combimation of entities to represent a full set of data, let's say an Addresslist-model of a customer would combine the entities customer, address and probably
  • What is the difference between attribute and property? [closed]
    Closed. This question needs details or clarity. It is not currently accepting answers. Want to improve this question? Add details and clarify the problem by editing this post. Closed 6 years ago. Improve this question These seem to mean the same thing. But what term is more appropriate in what context?
  • lvalue to rvalue implicit conversion
    I see the term "lvalue-to-rvalue conversion" used in many places throughout the C++ standard. This kind of conversion is often done implicitly, as far as I can tell. One unexpected (to me) feature of the phrasing from the standard is that they decide to treat lvalue-to-rvalue as a conversion. What if they had said that a glvalue is always acceptable instead of a prvalue. Would that phrase actually have a different meaning? For example, we read that lvalues and xvalues are examples of glvalues. We don't read that lvalues and xvalues are convertible to glvalues. Is there a difference in meaning
  • How to use SQLite in a multi-threaded application?
    I'm developing an application with SQLite as the database, and am having a little trouble understanding how to go about using it in multiple threads (none of the other Stack Overflow questions really helped me, unfortunately). My use case: The database has one table, let's call it "A", which has different groups of rows (based on one of their columns). I have the "main thread" of the application which reads the contents from table A. In addition, I decide, once in a while, to update a certain group of rows. To do this, I want to spawn a new thread, delete all the rows of the group, and re
  • Static/Dynamic vs Strong/Weak
    I see these terms bandied around all over the place in programming and I have a vague notion of what they mean. A search shows me that such things have been asked all over stack overflow in fact. As far as I'm aware Static/Dynamic typing in languages is subtly different to Strong/Weak typing but what that difference is eludes me. Different sources seem to use different meanings or even use the terms interchangeably. I can't find somewhere that talks about both and actually spells out the difference. What would be nice is if someone could please spell this out clearly here for me and the rest
  • Difference between microtask and macrotask within an event loop context
    I've just finished reading the Promises/A+ specification and stumbled upon the terms microtask and macrotask: see http://promisesaplus.com/#notes I've never heard of these terms before, and now I'm curious what the difference could be? I've already tried to find some information on the web, but all I've found is this post from the w3.org Archives (which does not explain the difference to me): http://lists.w3.org/Archives/Public/public-nextweb/2013Jul/0018.html Additionally, I've found an npm module called "macrotask": https://www.npmjs.org/package/macrotask Again, it is not clarified what the
  • Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?
    I have been involved in some debate with respect to libraries in Linux, and would like to confirm some things. It is to my understanding (please correct me if I am wrong and I will edit my post later), that there are two ways of using libraries when building an application: Static libraries (.a files): At link time, a copy of the entire library is put into the final application so that the functions within the library are always available to the calling application Shared objects (.so files): At link time, the object is just verified against its API via the corresponding header (.h) file. The
  • Reference — What does this symbol mean in PHP?
    What is this? This is a collection of questions that come up every now and then about syntax in PHP. This is also a Community Wiki, so everyone is invited to participate in maintaining this list. Why is this? It used to be hard to find questions about operators and other syntax tokens.¹ The main idea is to have links to existing questions on Stack Overflow, so it's easier for us to reference them, not to copy over content from the PHP Manual. Note: Since January 2013, Stack Overflow does support special characters. Just surround the search terms by quotes, e.g. [php] "==" vs "===" What should
  • In Windows: How do you programatically launch a process in administrator mode under another user context?
    Scenario I have a remote computer that I want to run installers (arbitrary executables) on programatically. These installers require two things: They must run in Administrator mode. They must run under a specific user context (Specifically, a local user who is a member of the Administrators group). This has proven to be very challenging. It appears as though there are a few external tools that exist that do this, but I am looking for a solution that comes with Windows. What a valid solution to this problem would look like From an elevated context (e.g. an elevated batch file or executable
  • Should I avoid multi-table (concrete) inheritance in Django by any means?
    Many experienced developers recommend against using Django multi-table inheritance because of its poor performance: Django gotcha: concrete inheritance by Jacob Kaplan-Moss, a core contributor of Django. In nearly every case, abstract inheritance is a better approach for the long term. I’ve seen more than few sites crushed under the load introduced by concrete inheritance, so I’d strongly suggest that Django users approach any use of concrete inheritance with a large dose of skepticism. Two Scoops of Django by Daniel Greenfield (@pydanny) Multi-table inheritance, sometimes called “concrete
  • What does “rep; nop;” mean in x86 assembly? Is it the same as the “pause” instruction?
    What does rep; nop mean? Is it the same as pause instruction? Is it the same as rep nop (without the semi-colon)? What's the difference to the simple nop instruction? Does it behave differently on AMD and Intel processors? (bonus) Where is the official documentation for these instructions? Motivation for this question After some discussion in the comments of another question, I realized that I don't know what rep; nop; means in x86 (or x86-64) assembly. And also I couldn't find a good explanation on the web. I know that rep is a prefix that means "repeat the next instruction cx times" (or at
  • Best way to get child nodes
    I was wondering, JavaScript offers a variety of methods to get the first child element from any element, but which is the best? By best, I mean: most cross-browser compatible, fastest, most comprehensive and predictable when it comes to behaviour. A list of methods/properties I use as aliases: var elem = document.getElementById('container'); var child = elem.children[0]; var child = elem.firstElementChild; // == children[0] This works for both cases: var child = elem.childNodes[0]; // or childNodes[1], see below That’s in case of forms, or <div> iteration. If I might encounter text elements
  • Why do we use Base64?
    Wikipedia says Base64 encoding schemes are commonly used when there is a need to encode binary data that needs be stored and transferred over media that are designed to deal with textual data. This is to ensure that the data remains intact without modification during transport. But is it not that data is always stored/transmitted in binary because the memory that our machines have store binary and it just depends how you interpret it? So, whether you encode the bit pattern 010011010110000101101110 as Man in ASCII or as TWFu in Base64, you are eventually going to store the same bit pattern. If
  • Are “data races” and “race condition” actually the same thing in context of concurrent programming
    I often find these terms being used in context of concurrent programming . Are they the same thing or different ?